Lexical enrichment of philological textbooks: corpus and statistical approaches

Abstract

The study is motivated by the need for objective data on vocabulary frequency in Russian language textbooks and on vocabulary acquisition when Russian is taught as a native language at school. The article describes the creation of a frequency dictionary of philological textbooks based on a linguistic corpus of Russian language and literature textbooks for grades 5–7. Philological textbooks present an average model of the Russian language and literature: they reflect topics relevant to the student and gradually increase in lexical complexity. The aim of the article is to assess lexical enrichment in philological textbooks for grades 5–7 and to improve the methodology for compiling frequency lists. The material is a corpus of 66 Russian language and literature textbooks with a total size of 1,553,224 tokens. Corpus and computational linguistics methods, comparative and contrastive analysis, and statistical methods (the IKSWEB program, the Google Colab environment, and the Pandas, NLTK, and Pymorphy libraries) showed that the frequency list for grade 5 comprises 8,984 lemmas; for grade 6, 7,572 lemmas; and for grade 7, 7,321 lemmas. Vocabulary “enrichment” amounts to 258 lexemes in grade 6 and 150 lexemes in grade 7. The lexical core of the three frequency lists consists of words from the thematic groups “Philological terms”, “Verbs denoting educational actions”, “Nature”, “Family and friendly relations”, “Art”, and “Time”. The grade 6 “enrichment” includes archaisms and historicisms, terms denoting forms of the national language, and word-formation terms. The grade 7 “enrichment” comprises linguistic terms from the group “Names of verb forms”, words on the theme “Religion”, and socio-political vocabulary. The frequency lists confirmed the hypothesis that texts in modern Russian language and literature textbooks are thematically balanced and that linguistic terminology forms their core. Future work could apply the same analysis to educational texts in philology and other subjects from senior school textbooks in order to identify intra-subject and meta-subject links.
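
The two central operations of the abstract, building a lemma frequency list per grade and extracting the vocabulary that is new at the next grade, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `lemmatize` parameter (a stand-in for a morphological analyzer such as Pymorphy), the `min_freq` threshold, and the toy texts are all assumptions made for illustration, and "enrichment" is assumed here to mean lemmas present in the later list but absent from the earlier one.

```python
import re
from collections import Counter


def lemma_frequencies(text, lemmatize=str.lower):
    """Count lemma occurrences in a text.

    `lemmatize` is a hypothetical stand-in for a real morphological
    analyzer; by default it only lowercases each token.
    """
    tokens = re.findall(r"\w+", text)
    return Counter(lemmatize(token) for token in tokens)


def enrichment(prev_list, next_list, min_freq=1):
    """Lemmas that occur in next_list (at least min_freq times) but not in prev_list.

    The result is sorted by descending frequency, then alphabetically.
    The threshold is an assumption, not a value taken from the article.
    """
    new_items = {
        lemma: freq
        for lemma, freq in next_list.items()
        if lemma not in prev_list and freq >= min_freq
    }
    return sorted(new_items.items(), key=lambda item: (-item[1], item[0]))


# Toy stand-ins for the grade 5 and grade 6 sub-corpora.
grade5 = lemma_frequencies("noun verb noun mood folklore")
grade6 = lemma_frequencies("noun suffix suffix archaism verb")

print(enrichment(grade5, grade6))  # [('suffix', 2), ('archaism', 1)]
```

Passing a real lemmatizer instead of `str.lower` and raising `min_freq` would be the natural next steps, since enrichment lists as short as those reported in the abstract suggest that some frequency cut-off is involved.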

Full Text

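Table 1
Size of the research corpus

| Grade | Subject | Textbooks | Volume in word forms |
|---|---|---|---|
| 5 | Russian | 12 | 352,332 |
| 6 | Russian | 12 | 323,259 |
| 7 | Russian | 8 | 355,296 |
| In total | Russian | 32 | 1,030,887 |
| 5 | Literature | 12 | 184,936 |
| 6 | Literature | 12 | 178,619 |
| 7 | Literature | 10 | 158,782 |
| In total | Literature | 34 | 522,337 |
| TOTAL | | 66 | 1,553,224 |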

Table 2
Frequency of philological vocabulary in textbooks of grades 5–7

| Grade 5: Lemma | Freq (ipt) | Grade 6: Lemma | Freq (ipt) | Grade 7: Lemma | Freq (ipt) |
|---|---|---|---|---|---|
| right | 128 | sound | 52 | participle | 67 |
| today | 125 | Russia | 44 | to write | 39 |
| future | 101 | story | 39 | passive | 31 |
| noun | 98 | to read | 38 | verbal participle | 29 |
| at first | 98 | category | 37 | adverb | 26 |
| infinitive | 92 | morning | 37 | verbal | 26 |
| mood | 69 | professionalism | 34 | to write | 35 |
| fragment | 59 | Old Slavonism | 32 | to name | 21 |
| folklore | 39 | barely | 24 | prodigal | 18 |
| qualitative | 16 | Pecheneg | 22 | circumstantial | 17 |
Figure 1. Normalized frequency of the “coverage”
Source: Compiled by Kh.N. Galimova, E.V. Martynova, S.A. Moskvitcheva using the Microsoft Excel program.

Figure 2. Vocabulary enrichment lists at stages 5→6 and 6→7
Source: Compiled by Kh.N. Galimova, E.V. Martynova, S.A. Moskvitcheva using the Microsoft Excel program.

Table 3
Vocabulary enrichment in textbooks of Grades 6 and 7

| Rank | Grades 5→6: Lemma | Frequency | Grades 6→7: Lemma | Frequency |
|---|---|---|---|---|
| 1 | category | 37 | participle | 67 |
| 2 | definitive | 36 | passive | 31 |
| 3 | professionalism | 34 | verbal participle | 29 |
| 4 | Old Slavonic | 32 | adverb | 26 |
| 5 | suffix | 28 | verbal | 26 |
| 6 | archaism | 26 | saint | 19 |
| 7 | arshin | 26 | cultural | 19 |
| 8 | jargon | 24 | covenant | 17 |
| 9 | Pecheneg | 22 | circumstantial | 17 |
| 10 | historicism | 22 | opponent | 11 |

Table 4
Thematic vocabulary enrichment in textbooks of Grades 6–7

| Grades 5→6: Vocabulary | Share of words, % | Grades 6→7: Vocabulary | Share of words, % |
|---|---|---|---|
| Obsolete words (archaisms and historicisms) | 25 | Names of verb forms | 25 |
| Terms characterizing forms of the national language | 25 | Religion | 30 |
| Word-formation terms | 35 | Socio-political | 25 |
| Other | 15 | Other | 20 |

1 Alekseev P.M. Frequency dictionaries: a textbook. St. Petersburg: St. Petersburg University Publishing House, 2001. 156 p. (In Russ.).

2 Zipf's law: f · r = c, where f is the frequency of a word in the text, r is its rank (ordinal number), and c is a constant whose value differs across languages.

3 Baranov M.T., Ladyzhenskaya T.A., Trostentsova L.A., et al. Russian language. Grade 6: a textbook for general education organizations: in 2 parts / scientific ed. N.M. Shansky. 5th ed. Moscow: Prosveshchenie, 2015. 191 p. and 175 p. (In Russ.).

4 Here and below, the number in parentheses is the word's frequency in the frequency dictionary of the corresponding grade.

5 Order of the Ministry of Education of the Russian Federation No. 858 of 21 September 2022 “On approval of the federal list of textbooks permitted for use in the implementation of state-accredited educational programs of primary general, basic general, and secondary general education by organizations carrying out educational activities, and on setting the deadline for the use of excluded textbooks”. (In Russ.).

6 The bibliographic data of the research corpus and the list of sources are available on the website of the Research Laboratory “Multidisciplinary Text Investigation”. URL: http://surl.li/zgmoqu (accessed: 24.06.2024).

7 Federal Institute of Industrial Property. URL: https://www.fips.ru/elektronnye-servisy/informatsionno-poiskovaya-sistema/index.php (accessed: 15.05.2024).

8 SEO tools. URL: https://iksweb.ru/ (accessed: 15.05.2024).

9 Welcome to Colab! URL: colab.research.google.com/ (accessed: 15.05.2024).

10 PANDAS. URL: https://blog.skillfactory.ru/glossary/pandas/ (accessed: 15.05.2024).

11 NLTK. URL: https://www.nltk.org/ (accessed: 15.05.2024).

12 Morphological analyzer pymorphy2. URL: https://pymorphy2.readthedocs.io/en/stable/ (accessed: 15.05.2024).

13 Shteinfeldt E.A. Frequency dictionary of the modern Russian literary language: 2,500 most common words / ed. V.A. Itskovich. Tallinn: Research Institute of Pedagogy of the USSR, 1968. 316 p. (In Russ.).

14 Shansky N.M., Daunene Z.P., Bakeeva N.Z., Gaidarova M.P., Karasheva N.B., Sudavichene L.V. 4,000 most common words of the Russian language / ed. N.M. Shansky, full member of the USSR Academy of Pedagogical Sciences. Moscow: Russkii Yazyk, 1979. 712 p. (In Russ.).

15 Frequency dictionary of the Russian language / ed. L.N. Zasorina. Moscow: Russkii Yazyk, 1977. 936 p. (In Russ.).

16 Frequency dictionary of general scientific vocabulary / gen. ed. E.M. Stepanova. Moscow: Moscow University Press, 1970. (In Russ.).

17 Dictionary of Pushkin's language: in 4 volumes / ed.-in-chief V.V. Vinogradov, academician of the USSR Academy of Sciences. 2nd ed., enlarged / Russian Academy of Sciences, V.V. Vinogradov Russian Language Institute. Moscow: Azbukovnik, 2000; Dictionary of Dostoevsky's language / ed.-in-chief Yu.N. Karaulov. Moscow: Azbukovnik, issue 1, 2001, 442 p.; issue 2, 2003, 510 p.; Dictionary of Marina Tsvetaeva's poetic language: in 4 volumes. Vol. 1: А–Г / ed. M.Yu. Belyakova. Moscow: Marina Tsvetaeva House Museum, 1996. 320 p. (In Russ.).

18 Lyashevskaya O.N., Sharov S.A. New frequency dictionary of Russian vocabulary. URL: http://dict.ruslang.ru/freq.php (accessed: 20.05.2024).

19 Russian National Corpus. URL: http://www.ruscorpora.ru (accessed: 24.06.2024).

20 Large explanatory dictionary of Russian nouns: over 15,000 nouns, ideographic description, synonyms, antonyms / ed. L.G. Babenko. 2nd ed., stereotyped. Moscow: AST-PRESS, 2008. 864 p. (In Russ.).

21 The numbers in parentheses are the combined shares of the vocabulary of the corresponding thematic group.

22 The number in parentheses is the word's normalized frequency in the corpus, Freq (ipt).

23 The number in parentheses is the word's frequency in the Grade 6 textbook, normalized per 1,000 word occurrences.

24 The number in parentheses is the word's frequency in the Grade 7 textbook, normalized per 1,000 word occurrences.

25 Work program (ID 4220440) of the school subject “Russian language. Basic level” for Grade 7 students. URL: https://1school-lobnya.ru/assets/files/program/2024-2025/2024_7_Русский%20язык.pdf (accessed: 12.06.2024).
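
Footnote 2 states Zipf's law as f · r = c. A minimal sketch of how that relation can be inspected for any frequency list is given below. The function name and the toy counts are assumptions for illustration; the counts are deliberately idealized so that the product is exactly constant, which real corpus data only approximates.

```python
from collections import Counter


def zipf_products(frequencies):
    """Return (rank, frequency, rank * frequency) for each item, most frequent first.

    Under Zipf's law the third value stays roughly constant across ranks.
    """
    ranked = sorted(frequencies.values(), reverse=True)
    return [(rank, freq, rank * freq) for rank, freq in enumerate(ranked, start=1)]


# Idealized toy counts (hypothetical): frequency falls off as 1 / rank.
toy_list = Counter({"a": 120, "b": 60, "c": 40, "d": 30})

for rank, freq, product in zipf_products(toy_list):
    print(rank, freq, product)
# 1 120 120
# 2 60 120
# 3 40 120
# 4 30 120
```

On a real frequency list the products drift, especially at the highest and lowest ranks, so the check is better read as a rough diagnostic than as a strict test.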

About the authors

Khalida N. Galimova

Kazan (Volga Region) Federal University

Author for correspondence.
Email: galikha@mail.ru
ORCID iD: 0000-0003-1817-5004
SPIN-code: 7931-3389

PhD in Philology, Senior Researcher at the Multidisciplinary Text Investigation Research Institute of Philology and Intercultural Communication

18 Kremlevskaya St, Kazan, 420008, Russian Federation

Ekaterina V. Martynova

Kazan (Volga Region) Federal University

Email: katerinamarty@yandex.ru
ORCID iD: 0000-0001-5883-0718
SPIN-code: 9431-7981

Senior Lecturer at the Department of Theory and Practice of Teaching Foreign Languages, Junior Researcher at the Multidisciplinary Text Investigation Research Institute of Philology and Intercultural Communication

18 Kremlevskaya St, Kazan, 420008, Russian Federation

Svetlana A. Moskvitcheva

RUDN University

Email: moskvitcheva-sa@rudn.ru
ORCID iD: 0000-0002-8047-7030
SPIN-code: 9596-7692

PhD in Philology, Associate Professor of the General and Russian Linguistics Department, Faculty of Philology

6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation

Copyright (c) 2024 Galimova K.N., Martynova E.V., Moskvitcheva S.A.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
