Russian dictionary with concreteness/abstractness indices

Valery D. Solovyev; Соловьев Валерий Дмитриевич; Yulia A. Volskaya; Вольская Юлия Александровна; Mariia I. Andreeva; Андреева Мария Игоревна; Artem A. Zaikin; Заикин Артем Александрович

doi:10.22363/2687-0088-29475

Authors: Solovyev V.D.¹, Volskaya Y.A.¹, Andreeva M.I.¹,², Zaikin A.A.¹
Affiliations:
1. Kazan (Volga region) Federal University
2. Kazan State Medical University
Issue: Vol 26, No 2 (2022): Computational Linguistics and Discourse Complexology
Pages: 515-549
Section: Articles
URL: https://journals.rudn.ru/linguistics/article/view/31336
DOI: https://doi.org/10.22363/2687-0088-29475

Abstract

A Russian dictionary with abstractness/concreteness indices of words is in demand in a number of fields, including linguistics, psychology, neurophysiology and cognitive studies focused on imaging concepts in human cognitive systems. Although such dictionaries have been compiled for a number of languages, Russian has until recently been regarded as an under-resourced language because it lacked one. The Laboratory of Quantitative Linguistics of Kazan Federal University has implemented two methods of compiling dictionaries of abstract/concrete words: a survey of respondents and extrapolation of human estimates by means of an original computer program. In this article, we describe in detail the methodology used to collect abstractness/concreteness ratings of words from native Russian respondents, as well as the control algorithms that validate the survey quality. Applying this methodology has enabled us to create a Russian dictionary (1500 words) with concreteness/abstractness indices, including words missing from the Russian Semantic Dictionary by N.Yu. Shvedova (1998). We have also created three versions of a machine dictionary of abstractness/concreteness by extrapolating the respondents' ratings. The third, most accurate version contains 22,000 words and was compiled using deep neural networks. The paper provides statistical characteristics (histograms of the distribution of ratings, dispersion, etc.) of both the machine dictionary and the dictionary obtained by surveying informants. The quality of the machine dictionary was validated on a test set of words by comparing machine and human ratings, with the latter treated as more credible.
The purpose of the paper is to describe in detail the methodology employed to create the concrete/abstract dictionary and to demonstrate, using specific examples, how the dictionary can be applied in theoretical and applied research. The paper shows the practical use of the dictionary in six case studies: predicting the complexity of school textbooks as a function of the share of abstract words; comparing abstractness indices of Russian-English equivalents; assessing concreteness/abstractness of polysemous words; contrasting ratings of different age groups of respondents; contrasting ratings of respondents with different levels of education; analyzing the concepts of “concreteness” and “specificity”.
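
As a rough illustration of the first case study, the share of abstract words in a text can be computed from a ratings dictionary. The sketch below is not the authors' implementation: the function name `abstract_share`, the threshold of 3.0, the whitespace tokenization and the reading that a higher rating means a more abstract word are all assumptions made for illustration. The four sample ratings are the Group 1 values given for these words (as English glosses) in Tables 1 and 2.

```python
from typing import Dict, Iterable

# Sample ratings: Group 1 values from Tables 1 and 2 (English glosses).
# Assumption: a higher rating means a more abstract word.
RATINGS: Dict[str, float] = {
    "ball": 1.767,
    "fog": 2.367,
    "opportunity": 3.833,
    "suffering": 3.933,
}


def abstract_share(
    tokens: Iterable[str],
    ratings: Dict[str, float],
    threshold: float = 3.0,  # hypothetical cut-off, not taken from the paper
) -> float:
    """Return the fraction of rated tokens whose rating exceeds `threshold`.

    Tokens missing from `ratings` are ignored; 0.0 is returned if none are rated.
    """
    rated = [ratings[t] for t in tokens if t in ratings]
    if not rated:
        return 0.0
    abstract = sum(1 for r in rated if r > threshold)
    return abstract / len(rated)


text = "the ball rolled into the fog without suffering"
share = abstract_share(text.lower().split(), RATINGS)
print(f"share of abstract words: {share:.3f}")
```

A real application would need lemmatization of Russian word forms before the lookup and a motivated choice of threshold; both are outside the scope of this sketch.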

Keywords

concreteness, abstractness, digital dictionary, Russian, academic texts

Full Text

Fig. 1. Histogram of ratings distribution

Fig. 2. Distribution of ratings difference prior to and after filtration

Fig. 3. Histogram of distribution of ratings dispersion

Fig. 4. Dispersions distribution with superimposed moving average line

Fig. 5. Ratings plot based on two age groups

Table 1. Words with major negative difference of ratings

Word	Group 1 ratings	Group 2 ratings	Word	Group 1 ratings	Group 2 ratings
scouting	1.533	2.133	criterion	1.867	3.267
report	1.667	2.167	hunt	2.8	3.267
knot	1.867	2.4	gift	2.967	3.467
agent	1.967	2.467	exertion	3.067	3.6
decree	2.133	2.633	rate	2.767	3.633
lecture	2	2.667	method	3.167	3.667
acquaintance	2.133	2.667	intention	3.333	3.833
interview	1.833	2.8	dullness	3.467	4
inquiry	2.133	2.8	complaint	3.1	4.033
ordeal	2.333	2.867	concentration	3.7	4.267
wedding	2.3	2.967	opportunity	3.833	4.4
Sunday	2.5	2.967	suffering	3.933	4.433
statistics	2.5	3.067
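
The heading of Table 1 suggests that the difference is taken as the Group 1 rating minus the Group 2 rating, so that every row comes out negative. A minimal sketch of that arithmetic follows, assuming this sign convention (the table itself does not state it) and using three rows copied from the table; the helper name `rating_difference` is hypothetical.

```python
from typing import Dict, List, Tuple

# (word, Group 1 rating, Group 2 rating), copied from three rows of Table 1.
ROWS: List[Tuple[str, float, float]] = [
    ("scouting", 1.533, 2.133),
    ("criterion", 1.867, 3.267),
    ("interview", 1.833, 2.800),
]


def rating_difference(group1: float, group2: float) -> float:
    """Assumed convention: Group 1 minus Group 2, rounded to 3 decimals."""
    return round(group1 - group2, 3)


diffs: Dict[str, float] = {
    word: rating_difference(g1, g2) for word, g1, g2 in ROWS
}

# Print the rows with the most negative difference first.
for word, d in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(f"{word}\t{d:+.3f}")
```

Under the same convention, the rows of Table 2 would all give positive values.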

Table 2. Words with major positive difference of ratings

Word	Group 1 ratings	Group 2 ratings	Word	Group 1 ratings	Group 2 ratings
ball	1.767	1.267	prey	2.900	2.433
fog	2.367	1.433	nourishment	2.967	2.467
hall	2.133	1.500	symbol	3.033	2.533
filming	2.567	2.067	turn	3.467	2.900
task	2.588	2.080	change	3.600	3.100
meeting	2.600	2.133	career	3.633	3.167
section	3.067	2.200	secret	3.800	3.167
tempo	3.533	2.300	strain	3.794	3.260
certificate	2.833	2.367	implementation	3.900	3.433
standard	3.133	2.400	restoration	4.300	3.567

Fig. 6. Ratings dispersion based on respondents’ education

Table 3. Russian-English ratings

#	Russian word	Rating (Rus.)	Rating (Eng.)	Rating difference	English word
1	sila	340	339	1	strength
2	derevo	606	604	2	tree
3	effekt	288	295	7	effect
	…	…	…	…	…
771	administratcija	559	331	268	administration
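
The first three rows of Table 3 are consistent with the rating difference being the absolute difference between the Russian and English ratings. The sketch below works under that assumption; the column definition is inferred from these rows rather than stated in the table, and `Pair` and `abs_difference` are hypothetical names. Row 771 as printed does not follow this pattern (559 and 331 differ by 228, not 268), so it is left out.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    russian: str
    rating_rus: int
    rating_eng: int
    english: str


# Rows 1-3 of Table 3.
PAIRS: List[Pair] = [
    Pair("sila", 340, 339, "strength"),
    Pair("derevo", 606, 604, "tree"),
    Pair("effekt", 288, 295, "effect"),
]


def abs_difference(pair: Pair) -> int:
    """Absolute difference between the Russian and English ratings."""
    return abs(pair.rating_rus - pair.rating_eng)


differences = [abs_difference(p) for p in PAIRS]
for pair, diff in zip(PAIRS, differences):
    print(f"{pair.russian}\t{pair.english}\t{diff}")
```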

Fig. 7. Ratings difference in word combinations

Fig. 8. Ratings difference

Fig. 9. Negative difference between machine dictionary and survey results data

Fig. 10. Positive difference between machine dictionary and survey results data

Fig. 11. Ratings with extreme and mean values

Fig. 12. Word distribution in two-dimensional space of concreteness-specificity (Ivanov & Solovyev 2021)

Table 4. Russian Academic Corpus

Grade	Sciences (words)	Humanities (words)	Total (words)
1	21304	4757	26061
2	29284	28235	57519
3	53565	-	53565
4	51489	24621	76110
5	102467	19527	121994
6	-	159664	159664
7	75205	111788	186993
8	-	273251	273251
9	88335	390821	479156
10	207271	656072	863343
11	-	436322	436322
Total	628920	2105058	2733978

Table 5. Ratings of recalls and textbooks

Subject	Grade	Mean rating
Primary school	1-4	+0.34
Biology	5-7	+0.49
Biology	9-10	+0.15
History	10-11	0
Social studies	5-8	-0.11
Social studies	9-11	-0.15
Literature	6-8	+0.08
Literature	9-11	-0.14
MT Texts	5	+0.12
OT Texts	5	+0.17
Recalls	5	+0.27

About the authors

Valery D. Solovyev

Kazan (Volga region) Federal University

Email: maki.solovyev@mail.ru
ORCID iD: 0000-0003-4692-2564

Doctor Habil. of Physics and Mathematics, Professor, Chief Researcher of the Text Analytics Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Yulia A. Volskaya

Kazan (Volga region) Federal University

Email: kovaleva95julia@mail.ru
ORCID iD: 0000-0001-8276-5864

Assistant Professor of the Department of Applied and Experimental Linguistics, and Junior Research Fellow of the Neurocognitive Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Mariia I. Andreeva

Kazan (Volga region) Federal University; Kazan State Medical University

Email: lafruta@mail.ru
ORCID iD: 0000-0002-5760-0934

PhD in Philology, Associate Professor of the Department of Foreign Languages

18 Kremlevskaya St., Kazan, 420008, Russia; 49 Butlerov St., Kazan, 420012, Russia

Artem A. Zaikin

Kazan (Volga region) Federal University

Author for correspondence.
Email: kaskrin@gmail.com
ORCID iD: 0000-0002-5596-3176

Doctor of Physics and Mathematics, Associate Professor of the Department of Mathematical Statistics

18 Kremlevskaya St., Kazan, 420008, Russia

References

Andreeva, Mariia, Marina Solnyshkina, Artem Zaikin, Olga Bukach & Radif Zamaletdinov. 2020. Assessment of comparative abstractness: Quantitative approach. Proceedings of the Computational Models in Language and Speech Workshop (CMLS 2020) co-located with 16th International Conference on Computational and Cognitive Linguistics (TEL 2020). 132-144.
Black, Paul. 2019. Manhattan distance. In Dictionary of Algorithms and Data Structures [Online]. http://www.nist.gov/dads/HTML/manhattanDistance.html. (accessed 19.04.2022)
Bolognesi, Marianna, Christian Burgers & Tommaso Caselli. 2020. On abstraction: Decoupling conceptual concreteness and categorical specificity. Cognitive Processing 21 (3). 365-381. https://doi.org/10.1007/s10339-020-00965-9
Borghi, Anna M., Ferdinand Binkofski, Cristiano Castelfranchi & Felice Cimatti. 2017. The challenge of abstract concepts. Psychological Bulletin 143. 263-292.
Brysbaert, Marc, Amy Beth Warriner & Victor Kuperman. 2014a. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46 (3). 904-911.
Brysbaert, Marc, Michaël Stevens, Simon De Deyne, Wouter Voorspoels & Gert Storms. 2014b. Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica 150. 80-84. https://doi.org/10.1016/j.actpsy.2014.04.010
Chandola, Varun, Arindam Banerjee & Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3). 1-58.
Charbonnier, Jean & Christian Wartena. 2019. Predicting word concreteness and imagery. In Proceedings of the 13th International Conference on Computational Semantics-Long Papers. 176-187.
Cristianini, Nello & John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
Coltheart, Max. 1981. The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology 33A. 497-505.
Bailey, Dallin J., Christina Nessler, Kiera N. Berggren & Julie L. Wambaugh. 2020. An aphasia treatment for verbs with low concreteness: A pilot study. American Journal of Speech-Language Pathology 29 (1). 299-318.
de Groot, Annette M. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory, and Cognition 15(5). 824-845. https://doi.org/10.1037/0278-7393.15.5.824
Devitt, Ann & Carl Vogel. 2004. The Topology of WordNet: Some Metrics. GWC Proceedings. 106-111.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
Evans, James D. 1996. Straightforward Statistics for the Behavioral Sciences. Brooks/Cole Publishing, Pacific Grove.
Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
Fisher, Douglas, Nancy Frey & Diane Lapp. 2016. Text Complexity: Stretching Readers with Texts and Tasks. Corwin Press.
Fliessbach, Klaus, Susanne Weis, Peter Klaver, Christian E. Elger & Bernhard Weber. 2006. The effect of word concreteness on recognition memory. NeuroImage 32 (3). 1413-1421. https://doi.org/10.1016/j.neuroimage.2006.06.007
Gizatulina, Diana, Farida Ismaeva, Marina Solnyshkina, Ekaterina Martynova & Iskander Yarmakeev. 2020. Fluctuations of text complexity: The case of Basic State Examination in English. In SHS Web of Conferences 88. EDP Sciences.
Ivanov, Vladimir & Valery Solovyev. 2021. The Relation of Categories of Concreteness and Specificity: Russian Data. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021”. http://www.dialog-21.ru/media/5260/ivanovvplussolovyevv049.pdf (accessed 19.04.2022).
Joulin, Armand, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou & Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. arXiv:1612.03651.
Kousta, Stavroula-Thaleia, Gabriella Vigliocco, David P. Vinson & Mark Andrews. 2011. The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General 140 (1). 14-34. https://doi.org/10.1037/a0021446
Krioni, Nikolay K., Alexey D. Nikitin & Anastasiya V. Fillipova. 2008. Avtomatizirovannaya sistema analiza slozhnosti uchebnyh tekstov. Bulletin of Ufa State Technical University of Aviation 11. 1 (28). 101-107. (In Russ.)
Kuznecov, Sergey A. 2006. Bol'shoy Tolkovy Slovar' Russkogo Yazyka. Norint. (In Russ.)
Laming, Donald. 2004. Human Judgement: The Eye of the Beholder. London: Thomson Learning.
Lukashevich, Natalia V. 2011. Thesauruses in Information Search Tasks. M.: Izd-vo Moskovskogo universiteta. (In Russ.)
Köper, Maximilian & Sabine Schulte im Walde. 2016. Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). 2595-2598.
McCarthy, Kathryn Soo, Danielle Siobhan McNamara, Marina I. Solnyshkina, Fanuza Kh. Tarasova & Roman V. Kupriyanov. 2019. The Russian language test: Towards assessing text comprehension. Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriia 2, Iazykoznanie 18 (4). 231-247.
McNamara, Danielle, Arthur C. Graesser, Philip M. McCarthy & Zhiqiang Cai. 2014. Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
Mestres-Missé, Anna, Thomas F. Münte & Antoni Rodriguez-Fornells. 2014. Mapping concrete and abstract meanings to new words using verbal contexts. Second Language Research 30 (2). 191-223.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.
Miller, George A. 1998. Nouns in WordNet. In Christiane Fellbaum (ed.), WordNet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
Mkrtychian, Nadezhda, Evgeny Blagovechtchenski, Diana Kurmakaeva, Daria Gnedykh, Svetlana Kostromina & Yury Shtyrov. 2019. Concrete vs. Abstract Semantics: From mental representations to functional brain mapping. Frontiers in Human Neuroscience 13. 267. https://doi.org/10.3389/fnhum.2019.00267
Naumann, Daniela, Diego Frassinelli & Sabine Schulte im Walde. 2018. Quantitative semantic variation in the contexts of concrete and abstract words. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, New Orleans, LA. 76-85.
Paivio, Allan. 1965. Abstractness, imagery, and meaningfulness in paired-associate learning. Journal of Verbal Learning and Verbal Behavior 4. 32-38. https://doi.org/10.1016/s0022-5371(65)80064-0
Paivio, Allan. 1990. Dual Coding Theory, in Mental Representations: A Dual Coding Approach. Oxford: Oxford University Press. 53-83. https://doi.org/10.1093/acprof:oso/9780195066661.003.0004
Della Rosa, Pasquale A., Eleonora Catricalà, Gabriella Vigliocco & Stefano F. Cappa. 2010. Beyond the abstract-concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods 42 (4). 1042-1048. https://doi.org/10.3758/BRM.42.4.1042
Peti-Stantić, Anita, Maja Anđel, Vedrana Gnjidić, Gordana Keresteš, Nikola Ljubešić, Irina Masnikosa, Mirjana Tonković, Jelena Tušek, Jana Willer-Gold & Mateusz-Milan Stanojević. 2021. The Croatian Psycholinguistic Database: Estimates for 6000 Nouns, Verbs, Adjectives and Adverbs. Behavior Research Methods. 1-18. https://doi.org/10.3758/s13428-020-01533-x
Reilly, Megan & Rutvik H. Desai. 2017. Effects of semantic neighborhood density in abstract and concrete words. Cognition 169. 46-53. https://doi.org/10.1016/j.cognition.2017.08.004
Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104 (3). 192-233.
Sadoski, Mark, William A. Kealy, E. T. Goetz & Allan Paivio. 1997. Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology 89(3). 518-526. https://doi.org/10.1037/0022-0663.89.3.518
Sadoski, Mark. 2001. Resolving the effects of concreteness on interest, comprehension, and learning important ideas from text. Educational Psychology Review 13(3). 263-281.
Schmid, Hans-Jörg. 2000. English Abstract Nouns as Conceptual Shells: From Corpus to Cognition. Topics in English Linguistics. Berlin: Mouton de Gruyter.
Schwanenflugel, Paula J. & Edward J. Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition 9 (1). 82-102. https://doi.org/10.1037/0278-7393.9.1.82
Schwanenflugel, Paula J., Carolyn Akin & Wei-Ming Luh. 1992. Context availability and the recall of abstract and concrete words. Memory & Cognition 20 (1). 96-104. https://doi.org/10.3758/bf03208259
Snefjella, Bryor, Michel Généreux & Victor Kuperman. 2019. Historical evolution of concrete and abstract language revisited. Behavior Research Methods 51 (4). 1693-1705.
Solnyshkina, Marina I., Radif R. Zamaletdinov, Ehl'zara Gizzatullina-Gafiyatova, Diana Gizatulina & Maria Begaeva. 2021. Mnogofaktorny analiz slozhnosti teksta. Inostrannye Yazyki v Shkole. 28-34. (In Russ.)
Solovyev, Valery D., Vladimir V. Ivanov & Rauf B. Akhtiamov. 2019a. Dictionary of abstract and concrete words of the Russian language: A methodology for creation and application. Journal of Research in Applied Linguistics 10. 215-227.
Solovyev, Valery, Mariia Andreeva, Marina Solnyshkina, Radif Zamaletdinov, Andrey Danilov & Dina Gaynutdinova. 2019b. Computing concreteness ratings of Russian and English most frequent words: Contrastive approach. In the Proceedings of the 12th International Conference on Developments in eSystems Engineering (DeSE). 403-408.
Solovyev, Valery D., Vladimir V. Bochkarev & S. V. Khristoforov. 2020a. Generation of a dictionary of abstract/concrete words by a multilayer neural network. Journal of Physics: Conference Series 1680 (1). 012046.
Solovyev, Valery, Marina Solnyshkina, Mariia Andreeva, Andrey Danilov & Radif Zamaletdinov. 2020b. Text Complexity and Abstractness: Tools for the Russian Language. Proceedings of the International Conference “Internet and Modern Society”. 75-87.
Solovyev, Valery. 2021. Concreteness/Abstractness Concept: State of the Art. Advances in Intelligent Systems and Computing 1358. 275-283.
Spreen, Otfried & Rudolph W. Schulz. 1966. Parameters of abstraction, meaningfulness, and pronunciability for 329 nouns. Journal of Verbal Learning and Verbal Behavior 5. 459-468.
Taylor, Linda & Cyril J. Weir. 2012. IELTS Collected Papers 2: Research in Reading and Listening Assessment 2. Cambridge University Press.
Turney, Peter D. & Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37. 141-188.
Vergallito, Alessandra, Marco Alessandro Petilli & Marco Marelli. 2020. Perceptual modality norms for 1,121 Italian words: A comparison with concreteness and imageability scores and an analysis of their impact in word processing tasks. Behavior Research Methods. 1-18.
Vinogradov, Victor V. 2001. Russian language (Grammatical studies of a word). Russian Language. (In Russ.)
Vol'skaia, Iulia A. 2020. Creating a dictionary of abstract beings in the Russian language: A criterion for selecting vocabulary. Philology and Culture 1 (59). 13-17. (In Russ.)
Volskaya, Yulia A., Irina S. Zhuravkina & Alexander P. Lobanov. 2020. Dictionary of the abstract words of the Russian language: Nouns with high numerical measure of abstractness. International Journal of Criminology and Sociology 9. 2398-2405.
Wang, X. & Y. Bi. 2021. Idiosyncratic tower of Babel: Individual differences in word-meaning representation increase as word abstractness increases. Psychological Science 32(10). 1617-1635.
Yao, Zhao, Jia Wu, Yanyan Zhang & Zhenhong Wang. 2017. Norms of valence, arousal, concreteness, familiarity, imageability, and context availability for 1,100 Chinese words. Behavior Research Methods 49. 1374-1385. https://doi.org/10.3758/s13428-016-0793-2
Zhuravkina, Irina, Valery Soloviev, Alexander Lobanov & Andrey Danilov. 2020. Comparative analysis of concreteness abstractness of Russian words. In Conference of Open Innovation Association, FRUCT. 464-470.
Lyashevskaya, Olga N. & S.A. Sharoff. 2009. New Russian frequency dictionary. (In Russ.) http://dict.ruslang.ru/freq.php (accessed 28.12.2021).
Small Academic Dictionary. 1981-1984. (In Russ.) https://gufo.me/dict/mas (accessed 28.05.2021).
Russian National Corpus. (In Russ.) http://www.ruscorpora.ru/ (accessed 28.12.2021).
Russian Semantic Dictionary. 1998. Shvedova, N.Yu. (ed.). Azbukovnik. (In Russ.)
RuThes Thesaurus. (In Russ.) http://www.labinform.ru/pub/ruthes/index.htm (accessed 28.12.2021).
Technologies of Compiling Semantic Electronic Dictionaries. (In Russ.) https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html (accessed 28.12.2021).
Cohmetrix. http://cohmetrix.com/ (accessed 28.12.2021).
Corpus of Contemporary American English. https://www.english-corpora.org/coca (accessed 28.05.2021).
Google Books Ngram. https://books.google.com/ngrams (accessed 28.12.2021).
FastText. Library for efficient text classification and representation learning. https://fasttext.cc/ (accessed 28.12.2021).
