Russian dictionary with concreteness/abstractness indices

Abstract

The demand for a Russian dictionary with indices of abstractness/concreteness of words has been expressed in a number of areas, including linguistics, psychology, neurophysiology and cognitive studies focused on imaging concepts in human cognitive systems. Although such dictionaries have been compiled for a number of languages, Russian has until recently been regarded as an under-resourced language for lack of one. The Laboratory of Quantitative Linguistics of Kazan Federal University has implemented two methods of compiling dictionaries of abstract/concrete words: a survey of respondents and extrapolation of human estimates with the help of an original computer program. In this article, we provide a detailed description of the methodology used for assessing abstractness/concreteness of words by native Russian respondents, as well as the control algorithms that validate the survey quality. The implementation of the methodology has enabled us to create a Russian dictionary (1500 words) with indices of concreteness/abstractness of words, including words missing from the Russian Semantic Dictionary by N.Yu. Shvedova (1998). We have also created three versions of a machine dictionary of abstractness/concreteness based on the extrapolation of the respondents' ratings. The third, most accurate version contains 22,000 words and was compiled using deep learning with neural networks. The paper provides statistical characteristics (histograms of the distribution of ratings, dispersion, etc.) of both the machine dictionary and the dictionary obtained by interviewing informants. The quality of the machine dictionary was validated on a test set of words by contrasting machine and human evaluations, with the latter viewed as more credible.
The purpose of the paper is to give a detailed description of the methodology employed to create a concrete/abstract dictionary, and to demonstrate with specific examples how the dictionary can be applied in theoretical and applied research. The paper shows the practical use of this vocabulary in six case studies: predicting the complexity of school textbooks as a function of the share of abstract words; comparing abstractness indices of Russian-English equivalents; assessing concreteness/abstractness of polysemantic words; contrasting ratings of different age groups of respondents; contrasting ratings of respondents with different levels of education; analyzing the concepts of “concreteness” and “specificity”.
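The validation step described above, contrasting machine and human evaluations on a test set of words, can be illustrated with a minimal sketch. The abstract does not state which agreement statistics were used, so the Pearson correlation and mean absolute difference below are assumptions, as are the function names, the toy words and their ratings (an assumed 1-5 scale on which higher means more abstract).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_ratings(human, machine):
    """Compare two {word: rating} dictionaries on the words they share."""
    shared = sorted(set(human) & set(machine))
    h = [human[w] for w in shared]
    m = [machine[w] for w in shared]
    mean_abs_diff = sum(abs(a - b) for a, b in zip(h, m)) / len(shared)
    return {"n": len(shared), "pearson": pearson(h, m), "mae": mean_abs_diff}


# Hypothetical ratings for illustration only; not taken from the dictionary.
human = {"stol": 1.2, "derevo": 1.4, "metod": 3.2, "vozmozhnost": 4.0}
machine = {"stol": 1.4, "derevo": 1.3, "metod": 3.0, "vozmozhnost": 4.3}

result = compare_ratings(human, machine)
print(result)
```

Treating the human ratings as the reference, a high correlation together with a small mean absolute difference would suggest that the extrapolated ratings track the survey well on the test words.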

Full Text

Fig. 1. Histogram of ratings distribution

Fig. 2. Distribution of ratings difference prior to and after filtration

 

Fig. 3. Histogram of distribution of ratings dispersion

Fig. 4. Dispersions distribution with superimposed moving average line

 

Fig. 5. Ratings plot based on two age groups

 

Table 1. Words with major negative difference of ratings

Word            Group 1 ratings   Group 2 ratings   Word            Group 1 ratings   Group 2 ratings
scouting        1.533             2.133             criterion       1.867             3.267
report          1.667             2.167             hunt            2.8               3.267
knot            1.867             2.4               gift            2.967             3.467
agent           1.967             2.467             exertion        3.067             3.6
decree          2.133             2.633             rate            2.767             3.633
lecture         2                 2.667             method          3.167             3.667
acquaintance    2.133             2.667             intention       3.333             3.833
interview       1.833             2.8               dullness        3.467             4
inquiry         2.133             2.8               complaint       3.1               4.033
ordeal          2.333             2.867             concentration   3.7               4.267
wedding         2.3               2.967             opportunity     3.833             4.4
Sunday          2.5               2.967             suffering       3.933             4.433
statistics      2.5               3.067

Table 2. Words with major positive difference of ratings

Word            Group 1 ratings   Group 2 ratings   Word             Group 1 ratings   Group 2 ratings
ball            1.767             1.267             prey             2.900             2.433
fog             2.367             1.433             nourishment      2.967             2.467
hall            2.133             1.500             symbol           3.033             2.533
filming         2.567             2.067             turn             3.467             2.900
task            2.588             2.080             change           3.600             3.100
meeting         2.600             2.133             career           3.633             3.167
section         3.067             2.200             secret           3.800             3.167
tempo           3.533             2.300             strain           3.794             3.260
certificate     2.833             2.367             implementation   3.900             3.433
standard        3.133             2.400             restoration      4.300             3.567
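As a rough illustration of how the differences behind Tables 1 and 2 can be derived, the sketch below subtracts the Group 2 rating from the Group 1 rating for a handful of words copied from the two tables and sorts the result. The sign convention (Group 1 minus Group 2) is an assumption inferred from the table captions, and the function name is hypothetical.

```python
# Group 1 / Group 2 mean ratings copied from a few rows of Tables 1 and 2.
ratings = {
    "criterion": (1.867, 3.267),
    "interview": (1.833, 2.8),
    "Sunday": (2.5, 2.967),
    "fog": (2.367, 1.433),
    "tempo": (3.533, 2.3),
    "ball": (1.767, 1.267),
}


def rating_differences(table):
    """Return (word, group1 - group2) pairs, most negative difference first."""
    diffs = [(word, round(g1 - g2, 3)) for word, (g1, g2) in table.items()]
    return sorted(diffs, key=lambda pair: pair[1])


diffs = rating_differences(ratings)
for word, diff in diffs:
    print(f"{word:10s} {diff:+.3f}")
```

Under this convention, the words from Table 1 end up at the negative end of the list and the words from Table 2 at the positive end.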

 

Fig. 6. Ratings dispersion based on respondents’ education

 

Table 3. Russian-English ratings

#     Russian word       Rating (Rus.)   Rating (Eng.)   Rating difference   English word
1     sila               340             339             1                   strength
2     derevo             606             604             2                   tree
3     effekt             288             295             7                   effect
...
771   administratcija    559             331             268                 administration

 

Fig. 7. Ratings difference in word combinations

 

Fig. 8. Ratings difference

 

Fig. 9. Negative difference between machine dictionary and survey results data

 

Fig. 10. Positive difference between machine dictionary and survey results data

Fig. 11. Ratings with extreme and mean values

 

Fig. 12. Word distribution in two-dimensional space of concreteness-specificity (Ivanov & Solovyev 2021)

 

Table 4. Russian Academic Corpus

        Number of words
Grade   Sciences   Humanities   Total
1       21304      4757         26061
2       29284      28235        57519
3       53565      -            53565
4       51489      24621        76110
5       102467     19527        121994
6       -          159664       159664
7       75205      111788       186993
8       -          273251       273251
9       88335      390821       479156
10      207271     656072       863343
11      -          436322       436322
Total   628920     2105058      2733978

 

Table 5. Ratings of recalls and textbooks

Subject          Grade   Mean rating
Primary school   1-4     +0.34
Biology          5-7     +0.49
Biology          9-10    +0.15
History          10-11   0
Social studies   5-8     -0.11
Social studies   9-11    -0.15
Literature       6-8     +0.08
Literature       9-11    -0.14
MT Texts         5       0.12
OT Texts         5       0.17
Recalls          5       0.27
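A text-level figure of the kind reported in Table 5 can be obtained by averaging dictionary ratings over the words of a text. The sketch below is only an illustration under stated assumptions: the function name, the toy dictionary, the 1-5 scale and the threshold of 3.0 for counting a word as abstract are all hypothetical, tokens are matched without lemmatization, and the sketch does not reproduce the scale used in Table 5.

```python
import re


def text_abstractness(text, dictionary, threshold=3.0):
    """Mean rating and share of abstract words among tokens found in the dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    rated = [dictionary[t] for t in tokens if t in dictionary]
    if not rated:
        return None  # no token of the text is covered by the dictionary
    mean_rating = sum(rated) / len(rated)
    # Assumed rule: a word counts as abstract if its rating exceeds the threshold.
    abstract_share = sum(r > threshold for r in rated) / len(rated)
    return {
        "rated_tokens": len(rated),
        "mean_rating": mean_rating,
        "abstract_share": abstract_share,
    }


# Hypothetical toy dictionary (higher rating = more abstract).
toy_dictionary = {"ball": 1.3, "fog": 1.4, "method": 3.7, "opportunity": 4.4}
sample = "The method gave every ball an opportunity in the fog."

stats = text_abstractness(sample, toy_dictionary)
print(stats)
```

Both outputs relate to the first case study named in the abstract, where textbook complexity is treated as a function of the share of abstract words. A real pipeline for Russian would also need lemmatization before dictionary lookup.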

 

About the authors

Valery D. Solovyev

Kazan (Volga region) Federal University

Email: maki.solovyev@mail.ru
ORCID iD: 0000-0003-4692-2564

Doctor Habil. of Physics and Mathematics, Professor, Chief Researcher of the Text Analytics Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Yulia A. Volskaya

Kazan (Volga region) Federal University

Email: kovaleva95julia@mail.ru
ORCID iD: 0000-0001-8276-5864

Assistant Professor of the Department of Applied and Experimental Linguistics, and Junior Research Fellow of the Neurocognitive Research Laboratory

18 Kremlevskaya St., Kazan, 420008, Russia

Mariia I. Andreeva

Kazan (Volga region) Federal University; Kazan State Medical University

Email: lafruta@mail.ru
ORCID iD: 0000-0002-5760-0934

PhD in Philology, Associate Professor of the Department of Foreign Languages

18 Kremlevskaya St., Kazan, 420008, Russia; 49 Butlerov St., Kazan, 420012, Russia

Artem A. Zaikin

Kazan (Volga region) Federal University

Author for correspondence.
Email: kaskrin@gmail.com
ORCID iD: 0000-0002-5596-3176

Doctor of Physics and Mathematics and Associate Professor of the Department of Mathematical Statistics

18 Kremlevskaya St., Kazan, 420008, Russia

References

  1. Andreeva, Mariia, Marina Solnyshkina, Artem Zaikin, Olga Bukach & Radif Zamaletdinov. 2020. Assessment of comparative abstractness: Quantitative approach. Proceedings of the Computational Models in Language and Speech Workshop (CMLS 2020) co-located with 16th International Conference on Computational and Cognitive Linguistics (TEL 2020). 132-144.
  2. Black, Paul. 2019. Manhattan distance. In Dictionary of Algorithms and Data Structures [Online]. http://www.nist.gov/dads/HTML/manhattanDistance.html. (accessed 19.04.2022)
  3. Bolognesi, Marianna, Christian Burgers & Tommaso Caselli. 2020. On abstraction: Decoupling conceptual concreteness and categorical specificity. Cognitive Processing 21 (3). 365-381. https://doi.org/10.1007/s10339-020-00965-9
  4. Borghi, Anna M., Ferdinand Binkofski, Cristiano Castelfranchi & Felice Cimatti. 2017. The challenge of abstract concepts. Psychological Bulletin 143. 263-292.
  5. Brysbaert, Marc, Amy Beth Warriner & Victor Kuperman. 2014a. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46 (3). 904-911.
  6. Brysbaert, Marc, Michaël Stevens, Simon De Deyne, Wouter Voorspoels & Gert Storms. 2014b. Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica 150. 80-84. https://doi.org/10.1016/j.actpsy.2014.04.010
  7. Chandola, Varun, Arindam Banerjee & Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3). 1-58.
  8. Charbonnier, Jean & Christian Wartena. 2019. Predicting word concreteness and imagery. In Proceedings of the 13th International Conference on Computational Semantics-Long Papers. 176-187.
  9. Cristianini, Nello & John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
  10. Coltheart, Max. 1981. The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology 33A. 497-505.
  11. Dallin, J Bailey, Christina Nessler, Kiera N Berggren & Julie L Wambaugh. 2020. An Aphasia treatment for verbs with low concreteness: A pilot study. American Journal of Speech-Language Pathology 29 (1). 299-318.
  12. de Groot, Annette M. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory, and Cognition 15(5). 824-845. https://doi.org/10.1037/0278-7393.15.5.824
  13. Devitt, Ann & Carl Vogel. 2004. The Topology of WordNet: Some Metrics. GWC Proceedings. 106-111.
  14. Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
  15. Evans, James D. 1996. Straightforward Statistics for the Behavioral Sciences. Brooks/Cole Publishing, Pacific Grove.
  16. Fellbaum, Christiane. 1998. Wordnet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
  17. Fisher, Douglas, Nancy Frey & Diane Lapp. 2016. Text Complexity: Stretching Readers with Texts and Tasks. Corwin Press.
  18. Fliessbach, Klaus, Susanne Weis, Peter Klaver, Christian E. Elger & Bernhard Weber. 2006. The effect of word concreteness on recognition memory. NeuroImage 32 (3). 1413-1421. https://doi.org/10.1016/j.neuroimage.2006.06.007
  19. Gizatulina, Diana, Farida Ismaeva, Marina Solnyshkina, Ekaterina Martynova & Iskander Yarmakeev. 2020. Fluctuations of text complexity: The case of Basic State Examination in English. In SHS Web of Conferences 88. EDP Sciences.
  20. Ivanov, Vladimir & Valery Solovyev. 2021. The Relation of Categories of Concreteness and Specificity: Russian Data. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021”. http://www.dialog-21.ru/media/5260/ivanovvplussolovyevv049.pdf (accessed 19.04.2022).
  21. Joulin, Armand, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou & Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. arXiv:1612.03651.
  22. Kousta, Stavroula-Thaleia, Gabriella Vigliocco, David P. Vinson & Mark Andrews. 2011. The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General 140 (1). 14-34. https://doi.org/10.1037/a0021446
  23. Krioni, Nikolay K., Alexey D. Nikitin & Anastasiya V. Fillipova. 2008. Avtomatizirovannaya sistema analiza slozhnosti uchebnyh tekstov. Bulletin of Ufa State Technical University of Aviation 11. 1 (28). 101-107. (In Russ.) Kuznecov, Sergey A. 2006. Bol'shoy Tolkovy Slovar' Russkogo Yazyka. Norint. (In Russ.)
  24. Laming, Donald. 2004. Human Judgement: The Eye of the Beholder. London: Thompson Learning.
  25. Lukashevich, Natalia V. 2011. Thesauruses in Information Search Tasks. M.: Izd-vo Moskovskogo universiteta. (In Russ.)
  26. Köper, Maximilian & Sabine Schulte im Walde. 2016. Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). 2595-2598.
  27. McCarthy, Kathryn Soo, Danielle Siobhan Mcnamara, Marina I. Solnyshkina, Fanuza Kh. Tarasova & Roman V. Kupriyanov. 2019. The Russian language test: Towards assessing text comprehension. Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Serii a 2, Iazykoznanie; Volgograd 18 (4). 231-247.
  28. McNamara, Danielle, Arthur C. Graesser, Philip M. Mccarthy & Zhiqiang Cai. 2014. Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge, MA: Cambridge University Press.
  29. Mestres-Missé, Anna, Thomas F. Münte & Antoni Rodriguez-Fornells. 2014. Mapping concrete and abstract meanings to new words using verbal contexts. Second Language Research 30 (2). 191-223.
  30. Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.
  31. Miller, George A. 1998. Nouns in WordNet. In Christiane Fellbaum (ed.), WordNet: An Electronic Lexical Database. MIT Press. Cambridge, Massachusetts.
  32. Mkrtychian, Nadezhda, Evgeny Blagovechtchenski, Diana Kurmakaeva, Daria Gnedykh, Svetlana Kostromina & Yury Shtyrov. 2019. Concrete vs. Abstract Semantics: From mental representations to functional brain mapping. Frontiers in Human Neuroscience 13. 267. https://doi.org/10.3389/fnhum.2019.00267
  33. Naumann, Daniela, Diego Frassinelli & Sabine Schulte im Walde. 2018. Quantitative semantic variation in the contexts of concrete and abstract words. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, New Orleans, LA. 76-85.
  34. Paivio, Allan. 1965. Abstractness, imagery, and meaningfulness in paired-associate learning. Journal of Verbal Learning and Verbal Behaviour 4. 32-38. https://doi.org/10.1016/s0022-5371(65)80064-0
  35. Paivio, Allan. 1990. Dual Coding Theory, in Mental Representations: A Dual Coding Approach. Oxford: Oxford University Press. 53-83. https://doi.org/10.1093/acprof:oso/9780195066661.003.0004
  36. Della Rosa, Pasquale A., Eleonora Catricalà, Gabriella Vigliocco & Stefano F. Cappa. 2010. Beyond the abstract-concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods 42 (4). 1042-1048. https://doi.org/10.3758/BRM.42.4.1042
  37. Peti-Stantić, Anita, Maja Anđel, Vedrana Gnjidić, Gordana Keresteš, Nikola Ljubešić, Irina Masnikosa, Mirjana Tonković, Jelena Tušek, Jana Willer-Gold & Mateusz-Milan Stanojević. 2021. The Croatian Psycholinguistic Database: Estimates for 6000 Nouns, Verbs, Adjectives and Adverbs. 1-18. https://doi.org/10.3758/s13428-020-01533-x
  38. Reilly, Megan, & Rutvik H. Desai. 2017. Effects of semantic neighborhood density in abstract and concrete words. Cognition 169. 46-53. https://doi.org/10.1016/j.cognition.2017.08.004
  39. Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104 (3). 192-233.
  40. Sadoski, Mark, William A. Kealy, E. T. Goetz & Allan Paivio. 1997. Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology 89 (3). 518-526. https://doi.org/10.1037/0022-0663.89.3.518
  41. Sadoski, Mark. 2001. Resolving the effects of concreteness on interest, comprehension, and learning important ideas from text. Educational Psychology Review 13(3). 263-281.
  42. Schmid, Hans-Jörg. 2000. English Abstract Nouns as Conceptual Shells: From Corpus to Cognition. Topics in English Linguistics. Berlin: Mouton de Gruyter.
  43. Schwanenflugel, Paula J. & Edward J. Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition 9 (1). 82-102. https://doi.org/10.1037/0278-7393.9.1.82
  44. Schwanenflugel, Paula J., Carolyn Akin & Wei-Ming Luh. 1992. Context availability and the recall of abstract and concrete words. Memory & Cognition 20 (1). 96-104. https://doi.org/10.3758/bf03208259
  45. Snefjella, Bryor, Michel Généreux & Victor Kuperman. 2019. Historical evolution of concrete and abstract language revisited. Behavior Research Methods 51 (4). 1693-1705.
  46. Solnyshkina, Marina I., Radif. R. Zamaletdinov, Ehl'zara Gizzatullina-Gafiyatova, Diana Gizatulina & Maria Begaeva. 2021. Mnogofaktorny analiz slozhnosti teksta. Inostrannye Yazyki v Shkole. 28-34. (In Russ.)
  47. Solovyev, Valery D., Vladimir V. Ivanov & Rauf B. Akhtiamov. 2019a. Dictionary of abstract and concrete words of the Russian language: A methodology for creation and application. Journal of Research in Applied Linguistics 10. 215-227.
  48. Solovyev, Valery, Mariia Andreeva, Marina Solnyshkina, Radif Zamaletdinov, Andrey Danilov & Dina Gaynutdinova. 2019b. Computing concreteness ratings of Russian and English most frequent words: Contrastive approach. In the Proceedings of the 12th International Conference on Developments in eSystems Engineering (DeSE). 403-408.
  49. Solovyev, Valery D., Vladimir V. Bochkarev & S. V. Khristoforov. 2020a. Generation of a dictionary of abstract/concrete words by a multilayer neural network. Journal of Physics: Conference Series 1680 (1). 012046.
  50. Solovyev, Valery, Marina Solnyshkina, Mariia Andreeva, Andrey Danilov & Radif Zamaletdinov. 2020b. Text Complexity and Abstractness: Tools for the Russian Language. Proceedings of the International Conference “Internet and Modern Society”. 75-87.
  51. Solovyev, Valery. 2021. Concreteness/Abstractness Concept: State of the Art. Advances in Intelligent Systems and Computing 1358. 275-283.
  52. Spreen, Otfried & Rudolph W. Schulz. 1966. Parameters of abstraction, meaningfulness, and pronunciability for 329 nouns. Journal of Verbal Learning and Verbal Behavior 5. 459-468.
  53. Taylor, Linda & Cyril J. Weir. 2012. IELTS Collected Papers 2: Research in Reading and Listening Assessment 2. Cambridge University Press.
  54. Turney, Peter D. & Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37. 141-188.
  55. Vergallito, Alessandra, Marco Alessandro Petilli & Marco Marelli. 2020. Perceptual modality norms for 1,121 Italian words: A comparison with concreteness and imageability scores and an analysis of their impact in word processing tasks. Behavior Research Methods. 1-18.
  56. Vinogradov, Victor V. 2001. Russian language (Grammatical studies of a word). Russian Language. (In Russ.)
  57. Vol'skaia, Iulia A. 2020. Creating a dictionary of abstract beings in the Russian language: A criterion for selecting vocabulary. Philology and Culture 1 (59). 13-17. (In Russ.)
  58. Volskaya, Yulia A., Irina S. Zhuravkina & Alexander P. Lobanov. 2020. Dictionary of abstract the words of the Russian language: Nouns with high numerical measure of abstractness. International Journal of Criminology and Sociology 9. 2398-2405.
  59. Wang, X. & Y Bi. 2021. Idiosyncratic tower of Babel: Individual differences in word-meaning representation increase as word abstractness increases. Psychological Science 32(10). 1617-1635.
  60. Yao, Zhao, Jia Wu, Yanyan Zhang & Zhenhong Wang. 2017. Norms of valence, arousal, concreteness, familiarity, imageability, and context availability for 1,100 Chinese words. Behavior Research Methods 49. 1374-1385. https://doi.org/10.3758/s13428-016-0793-2
  61. Zhuravkina, Irina, Valery Soloviev, Alexander Lobanov & Andrey Danilov. 2020. Comparative analysis of concreteness abstractness of Russian words. In Conference of Open Innovation Association, FRUCT. 464-470.
  62. Lyashevskaya, Olga N. & S.A. Sharoff. 2009. New Russian frequency dictionary. (In Russ.) http://dict.ruslang.ru/freq.php (accessed 28.12.2021).
  63. Small Academic Dictionary. 1981-1984. (In Russ.) https://gufo.me/dict/mas (accessed 28.05.2021).
  64. Russian National Corpus. (In Russ.) http://www.ruscorpora.ru/ (accessed 28.12.2021).
  65. Russian Semantic Dictionary. 1998. In Shvedova N.Yu. (ed.). ‘Azbukovnik’ (In Russ.)
  66. RuThes Thesaurus. (In Russ.) http://www.labinform.ru/pub/ruthes/index.htm (accessed 28.12.2021).
  67. Technologies of Compiling Semantic Electronic Dictionaries. (In Russ.) https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html (accessed 28.12.2021).
  68. Cohmetrix. http://cohmetrix.com/ (accessed 28.12.2021).
  69. Corpus of Contemporary American English. https://www.english-corpora.org/coca (accessed 28.05.2021).
  70. Google Books Ngram. https://books.google.com/ngrams (accessed 28.12.2021).
  71. FastText. Library for efficient text classification and representation learning. https://fasttext.cc/ (accessed 28.12.2021).

Copyright (c) 2022 Solovyev V.D., Volskaya Y.A., Andreeva M.I., Zaikin A.A.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
