Textometr: an online tool for automated complexity level assessment of texts for Russian language learners
- Authors: Laposhina A.N.¹, Lebedeva M.Y.¹
- Affiliations: ¹ Pushkin State Russian Language Institute
- Issue: Vol 19, No 3 (2021)
- Pages: 331-345
- Section: Mediadidactics and electronic means of instruction
- URL: https://journals.rudn.ru/russian-language-studies/article/view/27498
- DOI: https://doi.org/10.22363/2618-8163-2021-19-3-331-345
Abstract
Assessing whether a text is accessible to learners is an urgent and labor-intensive task in preparing materials for teaching Russian as a foreign language. At the same time, the procedure for assigning a text to one of the levels of the CEFR scale (from A1 to C2) is well formalized and described in the professional literature, which makes it possible to automate. This paper presents Textometr, a new free web-based tool that estimates the CEFR level of any given Russian text and reports other key statistics relevant for adapting the text for foreign students. The automated level assessment is based on a regression model trained on a dataset of more than 800 texts from Russian textbooks for foreigners, using several machine learning and natural language processing methods. In addition to the CEFR level, the tool provides information relevant for adapting the text to educational tasks: lists of keywords and of words for a potential vocabulary list, statistics on text coverage by frequency lists and CEFR-graded vocabulary lists (lexical minima), a frequency list of the text, and an estimate of the time needed to read it. The tool's shortcomings at the current stage of development and suggested ways of addressing them are also discussed. Finally, the results of testing the tool's quality and the directions for its further development are reported. Textometr can provide helpful information not only to teachers and guidance teachers, but also to textbook authors and publishers, who can use it to check that a text complies with its declared level and educational goals.
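Because the level is predicted by a regression model, its raw output is a continuous number that still has to be turned into a CEFR label. The abstract does not say how Textometr does this. The sketch below is only an illustration under an assumed numeric scale (1.0 for A1 up to 6.0 for C2) with rounding to the nearest level; the function name `score_to_cefr` and the scale itself are hypothetical.

```python
import math

# CEFR levels in ascending order of difficulty.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def score_to_cefr(score: float) -> str:
    """Map a continuous score to the nearest CEFR label.

    Assumption (not from the paper): 1.0 corresponds to A1 and 6.0 to C2.
    Scores outside that range are clamped to the nearest end of the scale.
    """
    # Round half up so that, e.g., 2.5 is treated as level 3 (B1).
    nearest = math.floor(score + 0.5)
    # Convert to a zero-based index and clamp it to the valid range.
    index = max(0, min(nearest - 1, len(CEFR_LEVELS) - 1))
    return CEFR_LEVELS[index]


if __name__ == "__main__":
    for raw in (1.2, 3.6, 7.3):
        print(raw, "->", score_to_cefr(raw))
```

One reason to prefer regression over plain classification here is that the levels are ordered, so a prediction of 1.4 carries the extra information that the text sits between A1 and A2.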
Figure 1. Interface of Textometr
Parameter values of the text from “Zhili-byli” textbook obtained by Textometr
Parameter | Value |
Text level declared in the textbook | A1 |
Level predicted by Textometr | A1. Elementary level |
Words | 200 |
Unique words | 121 |
Lexical diversity | 0.6 |
Sentences | 22 |
Average sentence length | 6.57 |
Keywords | Крым, Ялта, поезд, Симферополь, |
Most useful words | Берег, деревня, во-первых, купе, выбирать, |
Text coverage by A1 vocabulary list | 87% of text |
Words out of A1 vocabulary list | Во-первых, выбирать, ботанический, уютный, экспресс, |
Text coverage by A2 vocabulary list | 92% of text |
Words out of A2 vocabulary list | Современность, чудесный, во-первых, купе, |
Text coverage by B1 vocabulary list | 95% of text |
Words out of B1 vocabulary list | Современность, чудесный, купе, ботанический, |
Text coverage by B2 vocabulary list | 98% of text |
Words out of B2 vocabulary list | Современность, городок, ореанда, ботанический |
Text coverage by C1 vocabulary list | 98% of text |
Words out of C1 vocabulary list | Городок, ореанда, ботанический |
Text coverage by frequency list 5 000 | 96% of text |
Useful words that are out of lexical minima | Узнавать |
Rare words | Экспресс, ботанический, ореанда |
Detailed reading will take | 7 min |
Skimming will take | 4 min |
Possible grammar topics | Prepositional case |
Frequency list of the text | В 13; мы 13; и 9; быть 6; на 6; я 5; |
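Several of the parameters in the table are simple counts and ratios. The lexical diversity value is consistent with the ratio of unique words to all words (121 / 200 ≈ 0.6), and coverage by a vocabulary list is the share of the text's words found in that list. The sketch below shows how such figures could be computed from a lemmatized text. It is an illustration only, not Textometr's implementation: the function `text_statistics`, the toy lemma list, and the toy vocabulary set are all hypothetical, and lemmatization is assumed to have been done beforehand.

```python
from collections import Counter


def text_statistics(lemmas, vocabulary):
    """Compute basic lexical statistics for a list of lemmas.

    `vocabulary` is a set of lemmas standing in for a graded word list
    (for example, an A1 lexical minimum).
    """
    total = len(lemmas)
    freq = Counter(lemmas)
    unique = len(freq)
    # Tokens whose lemma appears in the vocabulary list.
    covered = sum(count for lemma, count in freq.items() if lemma in vocabulary)
    return {
        "words": total,
        "unique_words": unique,
        "lexical_diversity": unique / total if total else 0.0,
        "coverage": covered / total if total else 0.0,
        "out_of_list": sorted(lemma for lemma in freq if lemma not in vocabulary),
        "frequency_list": freq.most_common(),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration.
    lemmas = ["мы", "ехать", "в", "Крым", "на", "поезд", "мы", "быть", "в", "Ялта"]
    a1_list = {"мы", "ехать", "в", "на", "поезд", "быть"}
    stats = text_statistics(lemmas, a1_list)
    print(stats["lexical_diversity"], stats["coverage"], stats["out_of_list"])
    # Check against the table: 121 unique words out of 200.
    print(round(121 / 200, 1))
```

Counting lemmas rather than word forms matters for a highly inflected language such as Russian, since otherwise each case form of the same word would count as a separate unique word.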
About the authors
Antonina N. Laposhina
Pushkin State Russian Language Institute
Author for correspondence.
Email: ANLaposhina@pushkin.institute
leading expert, Laboratory of Cognitive and Linguistic Studies
6 Akademika Volgina St, Moscow, 117485, Russian Federation
Maria Yu. Lebedeva
Pushkin State Russian Language Institute
Email: MULebedeva@pushkin.institute
Candidate of Philology, leading researcher of the Laboratory of Cognitive and Linguistic Research, Associate Professor of the Department of Methods of Teaching Russian as a Foreign Language
6 Akademika Volgina St, Moscow, 117485, Russian Federation
References
- Alexander, P.A., & Jetton, T.L. (1996). The role of importance and interest in the processing of text. Educational Psychology Review, 8(1), 89–121.
- Arutyunov, A.R. (1990). Theory and practice of creating a textbook of the Russian language for foreigners. Moscow: Russkii Yazyk Publ. (In Russ.)
- Bim, I.L. (1977). Methods of teaching foreign languages as a science and problems of a school textbook. Moscow: Russkii Yazyk Publ. (In Russ.)
- Chen, X., & Meurers, D. (2016). Characterizing text difficulty with word frequencies. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (June 16, 2016), 11, 84–94. San Diego, CA, USA.
- DuBay, W. (2004). The principles of readability. Costa Mesa, CA: Impact Information.
- Graesser, A.C., McNamara, D.S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. The Elementary School Journal, 115(2), 210–229.
- Karpov, N., Baranova, J., & Vitugin, F. (2014). Single-sentence readability prediction in Russian. Proceedings of Analysis of Images, Social Networks, and Texts conference (AIST), (3), 91–100.
- Keskisärkkä, R., & Jönsson, A. (2013). Investigations of synonym replacement for Swedish. Northern European Journal of Language Technology, (3), 41–59.
- Laposhina, A.N. (2018). Insights from an experimental study on the text complexity for Russian as a foreign language. The Dynamics of Linguistic and Cultural Processes in Modern Russia: Proceedings of the VI Congress of ROPRYAL, (6), 1544–1549. (In Russ.)
- Laposhina, A.N. (2020). A corpus of Russian textbook materials for foreign students as an instrument of an educational content analysis. Russian Language Abroad, (6(283)), 22–28. (In Russ.)
- Laposhina, A.N., & Lebedeva, M.U. (2019). Corpus approach to vocabulary selection for learning Russian as a foreign language. Slavica Helsingiensia, (52), 359–368. (In Russ.)
- Laposhina, A.N., Veselovskaya, T.S., Lebedeva, M.U., & Kupreshchenko, O.F. (2018). Automated text readability assessment for Russian second language learners. Dialogue 2018: Proceedings of the International Conference, 17(24), 396–406.
- Mikk, Ya.A. (1981). Optimizing the complexity of educational text: A help for authors and editors. Moscow: Prosveshchenie Publ. (In Russ.)
- Miller, L.V., Politova, L.V., & Rybakova, I.A. (2016). Once upon a time... 28 Russian lessons for beginners: Textbook. Saint Petersburg: Zlatoust Publ. (In Russ.)
- Morkovkin, V.V. (Ed.). (2003). The system of lexical minima of the modern Russian language: 10 lexical lists: From 500 to 5000 of the most important Russian words. Moscow: Astrel Publ. (In Russ.)
- Nation, P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, (63), 59–81.
- Qian, D.D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning, 52(3), 513–536.
- Reynolds, R. (2016). Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories. Proceedings of the 11th Workshop on the Innovative Use of NLP for Building Educational Applications, 11, 289–300.
- Sharoff, S., Kurella, S., & Hartley, A. (2008). Seeking needles in the web’s haystack: Finding texts suitable for language learners. Proceedings of the 8th Teaching and Language Corpora Conference (TaLC-8) (pp. 365–370). Lisbon.
- Sharoff, S., Umanskaya, E., & Wilson, J. (2013). A frequency dictionary of Russian: Core vocabulary for learners. New York: Routledge.
- To, V., & Le, T. (2013). Lexical density and readability: A case study of English textbooks. Proceedings of the Australian Systemic Functional Linguistics Association Conference (October 1–3, 2013) (pp. 61–71). Melbourne.
- Tomina, Yu.A. (1985). Objective assessment of the language difficulty of texts (description, narration, reasoning, argumentation) (Candidate dissertation, Moscow). (In Russ.)
- Vyatyutnev, M.N. (1984). Textbook theory of Russian as a foreign language (methodological foundations). Moscow: Russkii Yazyk Publ. (In Russ.)
- Zaliznak, A.A. (1967). Russian nominal inflection. Moscow: Nauka Publ. (In Russ.)