Textometr: an online tool for automated complexity level assessment of texts for Russian language learners

Abstract

Assessing text accessibility is an urgent and labor-intensive task in preparing texts for teaching Russian as a foreign language. At the same time, the procedure of assigning a text to one of the levels on the CEFR scale (from A1 to C2) is well formalized and described in the professional literature, which opens opportunities for its automation. This paper presents Textometr, a new free web-based tool that estimates the CEFR level of any given Russian text and reports other key statistics relevant for adapting it for foreign students. The automated level assessment is based on a regression model trained on a dataset of more than 800 texts from Russian textbooks for foreigners, using several machine learning and natural language processing methods. In addition to the CEFR level, the tool provides information relevant for adapting the text to educational tasks: lists of keywords and of words for a potential vocabulary list, statistics on text coverage by frequency lists and CEFR-graded vocabulary lists (lexical minima), a frequency list of the text, and a forecast of the reading time. The tool's shortcomings at the current stage of development and suggested ways to address them are also discussed. Finally, the results of testing the tool's quality and the directions for its further development are reported. Textometr can provide helpful information not only to teachers and guidance teachers, but also to textbook authors and publishers, who can use it to check whether the text content complies with the declared level and educational goals.

Full Text

 

Figure 1. Interface of Textometr

Figure 2. Average sentence length values by CEFR Level

 

Parameter values obtained by Textometr for a text from the “Zhili-byli” textbook

| Parameter | Value |
|---|---|
| Text level declared in the textbook | A1 |
| Level predicted by Textometr | A1. Elementary level |
| Words | 200 |
| Unique words | 121 |
| Lexical diversity | 0.6 |
| Sentences | 22 |
| Average sentence length | 6.57 |
| Keywords | Крым, Ялта, поезд, Симферополь, час, автобус, интересный, поездка |
| Most useful words | Берег, деревня, во-первых, купе, выбирать, пешком, задание, через, во-вторых, домашний, есть, самый, необычный, узнавать |
| Text coverage by A1 vocabulary list | 87% of text |
| Words out of A1 vocabulary list | Во-первых, выбирать, ботанический, уютный, экспресс, задание, купе, через, необычный, современность, чудесный, деревня, пешком, во-вторых, ореанда, самый, берег, домашний, городок, узнавать |
| Text coverage by A2 vocabulary list | 92% of text |
| Words out of A2 vocabulary list | Современность, чудесный, во-первых, купе, ботанический, уютный, экспресс, во-вторых, ореанда, ну, необычный, городок, узнавать |
| Text coverage by B1 vocabulary list | 95% of text |
| Words out of B1 vocabulary list | Современность, чудесный, купе, ботанический, уютный, экспресс, ореанда, необычный, городок |
| Text coverage by B2 vocabulary list | 98% of text |
| Words out of B2 vocabulary list | Современность, городок, ореанда, ботанический |
| Text coverage by C1 vocabulary list | 98% of text |
| Words out of C1 vocabulary list | Городок, ореанда, ботанический |
| Text coverage by frequency list 5 000 | 96% of text |
| Useful words that are out of lexical minima | Узнавать |
| Rare words | Экспресс, ботанический, ореанда |
| Reading for detail will take | 7 min |
| Skimming will take | 4 min |
| Possible grammar topics | Prepositional case |
| Frequency list of the text | В 13; мы 13; и 9; быть 6; на 6; я 5; час 4; интересный 3; Крым 3; ... |
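Several of the parameters in the table above are counts over the lemmatized text. The following minimal Python sketch illustrates how lexical diversity, coverage by a vocabulary list, and a frequency list could be computed. It is not Textometr's actual implementation: the function names, the toy lemma list, and the toy vocabulary are assumptions made for illustration, and the input is assumed to be already tokenized and lemmatized.

```python
from collections import Counter


def lexical_diversity(lemmas):
    """Ratio of unique lemmas to all lemmas (a type-token ratio)."""
    return len(set(lemmas)) / len(lemmas) if lemmas else 0.0


def coverage(lemmas, vocabulary):
    """Return the share of running words found in `vocabulary`
    and the sorted unique lemmas that are missing from it."""
    if not lemmas:
        return 0.0, []
    known = sum(1 for lemma in lemmas if lemma in vocabulary)
    missing = sorted(set(lemmas) - set(vocabulary))
    return known / len(lemmas), missing


def frequency_list(lemmas, top_n=None):
    """Lemmas with their counts, most frequent first."""
    return Counter(lemmas).most_common(top_n)


# Hypothetical, already lemmatized toy text (not taken from the textbook).
lemmas = ["мы", "быть", "в", "Крым", "и", "мы", "ехать", "в", "Ялта", "на", "поезд"]
# Hypothetical stand-in for a graded vocabulary list (lexical minimum).
toy_vocabulary = {"мы", "быть", "в", "и", "на", "ехать", "поезд"}

share, missing = coverage(lemmas, toy_vocabulary)
print(f"Lexical diversity: {lexical_diversity(lemmas):.2f}")
print(f"Coverage: {share:.0%}, words out of list: {missing}")
print("Most frequent:", frequency_list(lemmas, 2))
```

With the counts reported in the table, the same ratio gives 121 / 200 = 0.605, which matches the reported lexical diversity of 0.6 after rounding.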

 


About the authors

Antonina N. Laposhina

Pushkin State Russian Language Institute

Author for correspondence.
Email: ANLaposhina@pushkin.institute

leading expert, Laboratory of Cognitive and Linguistic Studies

6 Akademika Volgina St, Moscow, 117485, Russian Federation

Maria Yu. Lebedeva

Pushkin State Russian Language Institute

Email: MULebedeva@pushkin.institute

Candidate of Philology, leading researcher of the Laboratory of Cognitive and Linguistic Research, Associate Professor of the Department of Methods of Teaching Russian as a Foreign Language

6 Akademika Volgina St, Moscow, 117485, Russian Federation

References

  1. Alexander, P.A., & Jetton, T.L. (1996). The role of importance and interest in the processing of text. Educational Psychology Review, 8(1), 89–121.
  2. Arutyunov, A.R. (1990). Theory and practice of creating a textbook of the Russian language for foreigners. Moscow: Russkii Yazyk Publ. (In Russ.)
  3. Bim, I.L. (1977). Methods of teaching foreign languages as a science and problems of a school textbook. Moscow: Russkii Yazyk Publ. (In Russ.)
  4. Chen, X., & Meurers, D. (2016). Characterizing text difficulty with word frequencies. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (June 16, 2016), 11, 84–94. San Diego, CA, USA.
  5. DuBay, W. (2004). The principles of readability. Costa Mesa, CA: Impact Information.
  6. Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. The Elementary School Journal, 115(2), 210–229.
  7. Karpov, N., Baranova, J., & Vitugin, F. (2014). Single-sentence readability prediction in Russian. Proceedings of Analysis of Images, Social Networks, and Texts conference (AIST), (3), 91–100.
  8. Keskisärkkä, R., & Jönsson, A. (2013). Investigations of synonym replacement for Swedish. Northern European Journal of Language Technology, (3), 41–59.
  9. Laposhina, A.N. (2018). Insights from an experimental study on the text complexity for Russian as a foreign language. The Dynamics of Linguistic and Cultural Processes in Modern Russia: Proceedings of the VI Congress of ROPRYAL, (6), 1544–1549. (In Russ.)
  10. Laposhina, A.N. (2020). A corpus of Russian textbook materials for foreign students as an instrument of an educational content analysis. Russian Language Abroad, (6(283)), 22–28. (In Russ.)
  11. Laposhina, A.N., & Lebedeva, M.U. (2019). Corpus approach to vocabulary selection for learning Russian as a foreign language. Slavica Helsingiensia, (52), 359–368. (In Russ.)
  12. Laposhina, A.N., Veselovskaya, T.S., Lebedeva, M.U., & Kupreshchenko, O.F. (2018). Automated text readability assessment for Russian second language learners. Dialogue 2018: Proceedings of the International Conference, 17(24), 396–406.
  13. Mikk, Ya.A. (1981). Optimizing the complexity of educational text: A help for authors and editors. Moscow: Prosveshchenie Publ. (In Russ.)
  14. Miller, L.V., Politova, L.V., & Rybakova, I.A. (2016). Once upon a time... 28 Russian lessons for beginners: Textbook. Saint Petersburg: Zlatoust Publ. (In Russ.)
  15. Morkovkin, V.V. (Ed.). (2003). The system of lexical minima of the modern Russian language: 10 lexical lists: From 500 to 5000 of the most important Russian words. Moscow: Astrel Publ. (In Russ.)
  16. Nation, P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, (63), 59–81.
  17. Qian, D.D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52(3), 513–536.
  18. Reynolds, R. (2016). Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories. Proceedings of the 11th Workshop on the Innovative Use of NLP for Building Educational Applications, 11, 289–300.
  19. Sharoff, S., Kurella, S., & Hartley, A. (2008). Seeking needles in the web’s haystack: Finding texts suitable for language learners. Proceedings of the 8th Teaching and Language Corpora Conference (TaLC-8) (pp. 365–370). Lisbon.
  20. Sharoff, S., Umanskaya, E., & Wilson, J. (2013). A frequency dictionary of Russian: Core vocabulary for learners. New York: Routledge.
  21. To, V., & Le, T. (2013). Lexical density and readability: A case study of English textbooks. Proceedings of the Australian Systemic Functional Linguistics Association Conference (October 1–3, 2013) (pp. 61–71). Melbourne.
  22. Tomina, Yu.A. (1985). Objective assessment of the language difficulty of texts (description, narration, reasoning, argumentation) (Candidate dissertation, Moscow). (In Russ.)
  23. Vyatyutnev, M.N. (1984). Textbook theory of Russian as a foreign language (methodological foundations). Moscow: Russkii Yazyk Publ. (In Russ.)
  24. Zaliznak, A.A. (1967). Russian nominal inflection. Moscow: Nauka Publ. (In Russ.)


Copyright (c) 2021 Laposhina A.N., Lebedeva M.Y.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
