Russian dictionary with concreteness/abstractness indices

Abstract

The demand for a Russian dictionary with indices of abstractness/concreteness of words has been expressed in a number of areas, including linguistics, psychology, neurophysiology and cognitive studies focused on imaging concepts in human cognitive systems. Although dictionaries of abstractness/concreteness have been compiled for a number of languages, Russian has until recently been viewed as an under-resourced language for lack of one. The Laboratory of Quantitative Linguistics of Kazan Federal University has implemented two methods of compiling dictionaries of abstract/concrete words: a survey of respondents and extrapolation of human estimates with the help of an original computer program. In this article, we provide a detailed description of the methodology used for assessing abstractness/concreteness of words by native Russian respondents, as well as the control algorithms validating the survey quality. The implementation of the methodology has enabled us to create a Russian dictionary (1500 words) with indices of concreteness/abstractness of words, including those missing in the Russian Semantic Dictionary by N.Yu. Shvedova (1998). We have also created three versions of a machine dictionary of abstractness/concreteness based on the extrapolation of the respondents' ratings. The third, most accurate version contains 22,000 words and was compiled using modern deep-learning neural networks. The paper provides statistical characteristics (histograms of the distribution of ratings, dispersion, etc.) of both the machine dictionary and the dictionary obtained by interviewing informants. The quality of the machine dictionary was validated on a test set of words by contrasting machine and human evaluations, with the latter viewed as more credible.
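As an illustration of what contrasting machine and human evaluations on a test set can look like, the sketch below computes a Pearson correlation over the words that both dictionaries share. The abstract does not say which agreement measure was actually used, so the choice of correlation, the function names `pearson` and `compare_ratings`, and the toy ratings are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def compare_ratings(human, machine):
    """Correlate two word -> rating dicts over the words present in both.

    Returns (correlation, number_of_shared_words).
    """
    shared = sorted(set(human) & set(machine))
    human_values = [human[w] for w in shared]
    machine_values = [machine[w] for w in shared]
    return pearson(human_values, machine_values), len(shared)


# Invented toy ratings (not taken from the dictionary described here).
human_ratings = {"стол": 4.8, "книга": 4.6, "идея": 1.9, "свобода": 1.5}
machine_ratings = {"стол": 4.5, "книга": 4.7, "идея": 2.2, "свобода": 1.8}

r, n_shared = compare_ratings(human_ratings, machine_ratings)
print(f"Pearson r over {n_shared} shared words: {r:.3f}")
```

A rank-based measure such as Spearman's coefficient could be substituted if only the ordering of words matters.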
The purpose of the paper is to give a detailed description of the methodology employed to create a concrete/abstract dictionary and to demonstrate, through concrete examples, how the dictionary can be applied in theoretical and applied research. The paper shows the practical use of this dictionary in six case studies: predicting the complexity of school textbooks as a function of the share of abstract words; comparing abstractness indices of Russian-English equivalents; assessing concreteness/abstractness of polysemantic words; contrasting ratings of different age groups of respondents; contrasting ratings of respondents with different levels of education; analyzing the concepts of “concreteness” and “specificity”.
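The first case study treats the share of abstract words in a text as a predictor of its complexity. A minimal sketch of such a feature is given below. The rating scale (higher means more concrete), the threshold of 3.0, the function name `abstract_share` and the toy ratings are assumptions for illustration only; real Russian text would also need lemmatization before dictionary lookup, which this sketch omits.

```python
import re


def abstract_share(text, ratings, threshold=3.0):
    """Share of abstract words among the words of `text` found in `ratings`.

    Assumptions for this sketch: `ratings` maps lowercase word forms to
    concreteness scores where higher means more concrete, and a word counts
    as abstract when its score is below `threshold`.
    """
    # Unicode-aware tokenization: runs of letters only (no digits, no "_").
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    rated = [t for t in tokens if t in ratings]
    if not rated:
        return 0.0
    abstract = [t for t in rated if ratings[t] < threshold]
    return len(abstract) / len(rated)


# Invented toy ratings (not taken from the dictionary described here).
toy_ratings = {"стол": 4.8, "книга": 4.6, "идея": 1.9, "свобода": 1.5}
toy_text = "Свобода и идея. Стол, книга, стол."

print(abstract_share(toy_text, toy_ratings))
```

Words missing from the dictionary are ignored here rather than counted as concrete, which keeps the share comparable across texts with different dictionary coverage.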

About the authors

Valery Solovyev

Kazan (Volga region) Federal University

Email: maki.solovyev@mail.ru
ORCID iD: 0000-0003-4692-2564

Doctor Habil. of Physics and Mathematics, Professor, Chief Researcher of the Text Analytics Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Yulia Volskaya

Kazan (Volga region) Federal University

Email: kovaleva95julia@mail.ru
ORCID iD: 0000-0001-8276-5864

Assistant Professor of the Department of Applied and Experimental Linguistics, and Junior Research Fellow of the Neurocognitive Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Mariia Andreeva

Kazan (Volga region) Federal University; Kazan State Medical University

Email: lafruta@mail.ru
ORCID iD: 0000-0002-5760-0934

PhD in Philology, Associate Professor of the Department of Foreign Languages

18 Kremlevskaya St., Kazan, 420008, Russia; 49 Butlerov St., Kazan, 420012, Russia

Artem Zaikin

Kazan (Volga region) Federal University

Principal contact for editorial correspondence.
Email: kaskrin@gmail.com
ORCID iD: 0000-0002-5596-3176

Doctor of Physics and Mathematics and Associate Professor of the Department of Mathematical Statistics

18 Kremlevskaya St., Kazan, 420008, Russia

References

  1. Andreeva, Mariia, Marina Solnyshkina, Artem Zaikin, Olga Bukach & Radif Zamaletdinov. 2020. Assessment of comparative abstractness: Quantitative approach. Proceedings of the Computational Models in Language and Speech Workshop (CMLS 2020) co-located with 16th International Conference on Computational and Cognitive Linguistics (TEL 2020). 132-144.
  2. Black, Paul. 2019. Manhattan distance. In Dictionary of Algorithms and Data Structures [Online]. http://www.nist.gov/dads/HTML/manhattanDistance.html. (accessed 19.04.2022)
  3. Bolognesi, Marianna, Christian Burgers & Tommaso Caselli. 2020. On abstraction: Decoupling conceptual concreteness and categorical specificity. Cognitive Processing 21 (3). 365-381. https://doi.org/10.1007/s10339-020-00965-9
  4. Borghi, Anna M., Ferdinand Binkofski, Cristiano Castelfranchi & Felice Cimatti. 2017. The challenge of abstract concepts. Psychological Bulletin 143. 263-292.
  5. Brysbaert, Marc, Amy Beth Warriner & Victor Kuperman. 2014a. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46 (3). 904-911.
  6. Brysbaert, Marc, Michaël Stevens, Simon De Deyne, Wouter Voorspoels & Gert Storms. 2014b. Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica 150. 80-84. https://doi.org/10.1016/j.actpsy.2014.04.010
  7. Chandola, Varun, Arindam Banerjee & Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3). 1-58.
  8. Charbonnier, Jean & Christian Wartena. 2019. Predicting word concreteness and imagery. In Proceedings of the 13th International Conference on Computational Semantics-Long Papers. 176-187.
  9. Cristianini, Nello & John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
  10. Coltheart, Max. 1981. The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology 33A. 497-505.
  11. Bailey, Dallin J., Christina Nessler, Kiera N. Berggren & Julie L. Wambaugh. 2020. An aphasia treatment for verbs with low concreteness: A pilot study. American Journal of Speech-Language Pathology 29 (1). 299-318.
  12. de Groot, Annette M. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory, and Cognition 15(5). 824-845. https://doi.org/10.1037/0278-7393.15.5.824
  13. Devitt, Ann & Carl Vogel. 2004. The Topology of WordNet: Some Metrics. GWC Proceedings. 106-111.
  14. Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
  15. Evans, James D. 1996. Straightforward Statistics for the Behavioral Sciences. Brooks/Cole Publishing, Pacific Grove.
  16. Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
  17. Fisher, Douglas, Nancy Frey & Diane Lapp. 2016. Text Complexity: Stretching Readers with Texts and Tasks. Corwin Press.
  18. Fliessbach, Klaus, Susanne Weis, Peter Klaver, Christian E. Elger & Bernhard Weber. 2006. The effect of word concreteness on recognition memory. NeuroImage 32 (3). 1413-1421. https://doi.org/10.1016/j.neuroimage.2006.06.007
  19. Gizatulina, Diana, Farida Ismaeva, Marina Solnyshkina, Ekaterina Martynova & Iskander Yarmakeev. 2020. Fluctuations of text complexity: The case of Basic State Examination in English. In SHS Web of Conferences 88. EDP Sciences.
  20. Ivanov, Vladimir & Valery Solovyev. 2021. The Relation of Categories of Concreteness and Specificity: Russian Data. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021”. http://www.dialog-21.ru/media/5260/ivanovvplussolovyevv049.pdf (accessed 19.04.2022).
  21. Joulin, Armand, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou & Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. arXiv:1612.03651.
  22. Kousta, Stavroula-Thaleia, Gabriella Vigliocco, David P. Vinson & Mark Andrews. 2011. The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General 140 (1). 14-34. https://doi.org/10.1037/a0021446
  23. Krioni, Nikolay K., Alexey D. Nikitin & Anastasiya V. Fillipova. 2008. Avtomatizirovannaya sistema analiza slozhnosti uchebnyh tekstov. Bulletin of Ufa State Technical University of Aviation 11. 1 (28). 101-107. (In Russ.)
      Kuznecov, Sergey A. 2006. Bol'shoy Tolkovy Slovar' Russkogo Yazyka. Norint. (In Russ.)
  24. Laming, Donald. 2004. Human Judgement: The Eye of the Beholder. London: Thomson Learning.
  25. Lukashevich, Natalia V. 2011. Thesauruses in Information Search Tasks. M.: Izd-vo Moskovskogo universiteta. (In Russ.)
  26. Köper, Maximilian & Sabine Schulte im Walde. 2016. Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). 2595-2598.
  27. McCarthy, Kathryn Soo, Danielle Siobhan McNamara, Marina I. Solnyshkina, Fanuza Kh. Tarasova & Roman V. Kupriyanov. 2019. The Russian language test: Towards assessing text comprehension. Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriia 2, Iazykoznanie; Volgograd 18 (4). 231-247.
  28. McNamara, Danielle, Arthur C. Graesser, Philip M. McCarthy & Zhiqiang Cai. 2014. Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  29. Mestres-Missé, Anna, Thomas F. Münte & Antoni Rodriguez-Fornells. 2014. Mapping concrete and abstract meanings to new words using verbal contexts. Second Language Research 30 (2). 191-223.
  30. Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.
  31. Miller, George A. 1998. Nouns in WordNet. In Christiane Fellbaum (ed.), WordNet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
  32. Mkrtychian, Nadezhda, Evgeny Blagovechtchenski, Diana Kurmakaeva, Daria Gnedykh, Svetlana Kostromina & Yury Shtyrov. 2019. Concrete vs. Abstract Semantics: From mental representations to functional brain mapping. Frontiers in Human Neuroscience 13. 267. https://doi.org/10.3389/fnhum.2019.00267
  33. Naumann, Daniela, Diego Frassinelli & Sabine Schulte im Walde. 2018. Quantitative semantic variation in the contexts of concrete and abstract words. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, New Orleans, LA. 76-85.
  34. Paivio, Allan. 1965. Abstractness, imagery, and meaningfulness in paired-associate learning. Journal of Verbal Learning and Verbal Behavior 4. 32-38. https://doi.org/10.1016/s0022-5371(65)80064-0
  35. Paivio, Allan. 1990. Dual coding theory. In Mental Representations: A Dual Coding Approach. Oxford: Oxford University Press. 53-83. https://doi.org/10.1093/acprof:oso/9780195066661.003.0004
  36. Della Rosa, Pasquale A., Eleonora Catricalà, Gabriella Vigliocco & Stefano F. Cappa. 2010. Beyond the abstract-concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods 42 (4). 1042-1048. https://doi.org/10.3758/BRM.42.4.1042
  37. Peti-Stantić, Anita, Maja Anđel, Vedrana Gnjidić, Gordana Keresteš, Nikola Ljubešić, Irina Masnikosa, Mirjana Tonković, Jelena Tušek, Jana Willer-Gold & Mateusz-Milan Stanojević. 2021. The Croatian Psycholinguistic Database: Estimates for 6000 Nouns, Verbs, Adjectives and Adverbs. Behavior Research Methods. 1-18. https://doi.org/10.3758/s13428-020-01533-x
  38. Reilly, Megan & Rutvik H. Desai. 2017. Effects of semantic neighborhood density in abstract and concrete words. Cognition 169. 46-53. https://doi.org/10.1016/j.cognition.2017.08.004
  39. Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104 (3). 192-233.
  40. Sadoski, Mark, William A. Kealy, E. T. Goetz & Allan Paivio. 1997. Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology 89 (3). 518-526. https://doi.org/10.1037/0022-0663.89.3.518
  41. Sadoski, Mark. 2001. Resolving the effects of concreteness on interest, comprehension, and learning important ideas from text. Educational Psychology Review 13(3). 263-281.
  42. Schmid, Hans-Jörg. 2000. English Abstract Nouns as Conceptual Shells: From Corpus to Cognition. Topics in English Linguistics. Berlin: Mouton de Gruyter.
  43. Schwanenflugel, Paula J. & Edward J. Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition 9 (1). 82-102. https://doi.org/10.1037/0278-7393.9.1.82
  44. Schwanenflugel, Paula J., Carolyn Akin & Wei-Ming Luh. 1992. Context availability and the recall of abstract and concrete words. Memory & Cognition 20 (1). 96-104. https://doi.org/10.3758/bf03208259
  45. Snefjella, Bryor, Michel Généreux & Victor Kuperman. 2019. Historical evolution of concrete and abstract language revisited. Behavior Research Methods 51 (4). 1693-1705.
  46. Solnyshkina, Marina I., Radif R. Zamaletdinov, Ehl'zara Gizzatullina-Gafiyatova, Diana Gizatulina & Maria Begaeva. 2021. Mnogofaktorny analiz slozhnosti teksta. Inostrannye Yazyki v Shkole. 28-34. (In Russ.)
  47. Solovyev, Valery D., Vladimir V. Ivanov & Rauf B. Akhtiamov. 2019a. Dictionary of abstract and concrete words of the Russian language: A methodology for creation and application. Journal of Research in Applied Linguistics 10. 215-227.
  48. Solovyev, Valery, Mariia Andreeva, Marina Solnyshkina, Radif Zamaletdinov, Andrey Danilov & Dina Gaynutdinova. 2019b. Computing concreteness ratings of Russian and English most frequent words: Contrastive approach. In the Proceedings of the 12th International Conference on Developments in eSystems Engineering (DeSE). 403-408.
  49. Solovyev, Valery D., Vladimir V. Bochkarev & S. V. Khristoforov. 2020a. Generation of a dictionary of abstract/concrete words by a multilayer neural network. Journal of Physics: Conference Series 1680 (1). 012046.
  50. Solovyev, Valery, Marina Solnyshkina, Mariia Andreeva, Andrey Danilov & Radif Zamaletdinov. 2020b. Text Complexity and Abstractness: Tools for the Russian Language. Proceedings of the International Conference “Internet and Modern Society”. 75-87.
  51. Solovyev, Valery. 2021. Concreteness/Abstractness Concept: State of the Art. Advances in Intelligent Systems and Computing 1358. 275-283.
  52. Spreen, Otfried & Rudolph W. Schulz. 1966. Parameters of abstraction, meaningfulness, and pronunciability for 329 nouns. Journal of Verbal Learning and Verbal Behavior 5. 459-468.
  53. Taylor, Linda & Cyril J. Weir. 2012. IELTS Collected Papers 2: Research in Reading and Listening Assessment 2. Cambridge University Press.
  54. Turney, Peter D. & Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37. 141-188.
  55. Vergallito, Alessandra, Marco Alessandro Petilli & Marco Marelli. 2020. Perceptual modality norms for 1,121 Italian words: A comparison with concreteness and imageability scores and an analysis of their impact in word processing tasks. Behavior Research Methods. 1-18.
  56. Vinogradov, Victor V. 2001. Russian language (Grammatical studies of a word). Russian Language. (In Russ.)
  57. Vol'skaia, Iulia A. 2020. Creating a dictionary of abstract beings in the Russian language: A criterion for selecting vocabulary. Philology and Culture 1 (59). 13-17. (In Russ.)
  58. Volskaya, Yulia A., Irina S. Zhuravkina & Alexander P. Lobanov. 2020. Dictionary of abstract words of the Russian language: Nouns with high numerical measure of abstractness. International Journal of Criminology and Sociology 9. 2398-2405.
  59. Wang, X. & Y. Bi. 2021. Idiosyncratic tower of Babel: Individual differences in word-meaning representation increase as word abstractness increases. Psychological Science 32 (10). 1617-1635.
  60. Yao, Zhao, Jia Wu, Yanyan Zhang & Zhenhong Wang. 2017. Norms of valence, arousal, concreteness, familiarity, imageability, and context availability for 1,100 Chinese words. Behavior Research Methods 49. 1374-1385. https://doi.org/10.3758/s13428-016-0793-2
  61. Zhuravkina, Irina, Valery Soloviev, Alexander Lobanov & Andrey Danilov. 2020. Comparative analysis of concreteness abstractness of Russian words. In Conference of Open Innovation Association, FRUCT. 464-470.
  62. Lyashevskaya, Olga N. & S.A. Sharoff. 2009. New Russian frequency dictionary. (In Russ.) http://dict.ruslang.ru/freq.php (accessed 28.12.2021).
  63. Small Academic Dictionary. 1981-1984. (In Russ.) https://gufo.me/dict/mas (accessed 28.05.2021).
  64. Russian National Corpus. (In Russ.) http://www.ruscorpora.ru/ (accessed 28.12.2021).
  65. Russian Semantic Dictionary. 1998. In Shvedova N.Yu. (ed.). ‘Azbukovnik’ (In Russ.)
  66. RuThes Thesaurus. (In Russ.) http://www.labinform.ru/pub/ruthes/index.htm (accessed 28.12.2021).
  67. Technologies of Compiling Semantic Electronic Dictionaries. (In Russ.) https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html (accessed 28.12.2021).
  68. Cohmetrix. http://cohmetrix.com/ (accessed 28.12.2021).
  69. Corpus of Contemporary American English. https://www.english-corpora.org/coca (accessed 28.05.2021).
  70. Google Books Ngram. https://books.google.com/ngrams (accessed 28.12.2021).
  71. FastText. Library for efficient text classification and representation learning. https://fasttext.cc/ (accessed 28.12.2021).

Copyright © Solovyev V., Volskaya Y., Andreeva M., Zaikin A., 2022

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
