Linguistic profiling of educational and artistic texts
- Authors: Voronin K.V.1, Ismaeva F.H.1, Danilov A.V.1
- Affiliations: Kazan (Volga Region) Federal University
- Issue: Vol 22, No 4 (2024): LINGUISTIC PROFILES OF RUSSIAN TEXTS: GOING FROM FORM TO MEANING
- Pages: 555-578
- Section: Key Issues of Russian Language Research
- URL: https://journals.rudn.ru/russian-language-studies/article/view/42908
- DOI: https://doi.org/10.22363/2618-8163-2024-22-4-555-578
- EDN: https://elibrary.ru/AWRLUL
Abstract
Conducted within text analytics, one of the strategic directions of modern Russian linguistics, the research focuses on the linguistic profiling of educational and artistic texts. The identified genre features can inform the development of software systems and the processing of large volumes of language data. The study aims to reveal the ranges of linguistic parameters that differentiate educational (secondary) and artistic (primary) texts. It is based on 72 biographies from textbooks on Russian as a foreign language (RFL) and 90 excerpts from adventure stories. The genres were chosen for the contrastive study for two reasons: (1) both show a high degree of narrativity and action; (2) they differ in function, namely the informative function of biographies and the entertaining function of adventure stories. The research corpus comprises 120,932 words. Two data processing tools were used: (1) the linguistic parameters were calculated on the RuLingva platform; (2) the STATISTICA program was used to identify statistically significant differences between the two genres. The revealed genre-specific parameters include global and local overlaps of nouns and arguments, nouns in the prepositional and genitive cases, and past and present verb forms. Global and local overlaps of nouns and arguments were found to contribute to the high cohesion of biographies, because the naming of each subsequent event in a person's life is connected with the previous one. The genitive case prevails in biographies due to the use of nominative word combinations. Further research could develop a typology of genres based on the linguistic profiling of official and personal biographies, on the one hand, and adventure novels, on the other. An important aspect of further research is the integration of meaning matrices into the RuLingva text profiler to automate the linguistic profiling of texts.
Full Text
Table 1
Subcorpus of RFL biographies
Text code1 | Number of texts | Number of tokens |
AnNa_A2 | 3 | 1170 |
AnNa_B1 | 6 | 2718 |
Ag_A2 | 1 | 234 |
Ar_B2 | 6 | 2588 |
BeLu_A2 | 1 | 380 |
BoAg_C1 | 3 | 1595 |
FiDm_A2 | 1 | 142 |
Gr_A2 | 11 | 5015 |
JaSu_C2 | 5 | 1196 |
KaFr_B2 | 1 | 291 |
Li_C1 | 7 | 5750 |
Mo_B1 | 1 | 215 |
MoSi_A1 | 3 | 1187 |
OdNo_A1B2 | 8 | 2785 |
Sa_C1 | 6 | 1512 |
ShKu_B1 | 4 | 2366 |
TiKo_B1 | 5 | 1696 |
TOTAL | 72 | 30840 |
Table 2
Subcorpus of adventure stories
Text code2 | Number of texts | Number of tokens |
Pl_1973 | 21 | 21007 |
St_1974 | 2 | 2004 |
Vu_1975 | 12 | 12023 |
Bo_1984 | 22 | 21995 |
Ga_1990 | 23 | 23054 |
Kn_1990 | 4 | 4022 |
Ch_1991 | 6 | 5987 |
TOTAL | 90 | 90092 |
Figure 1. Fragment of the biography of M.V. Lomonosov from the RFL textbook by L.V. Moskovkin and L.V. Silvina
Table 3
Differences of linguistic parameters of the texts of two genres
Group of parameters | No. | Parameter | Mean, adventure stories (n = 90) | Mean, biographies (n = 72) | p-value* | Difference, % |
Discourse | 1 | Global noun overlap | 0.03 | 0.14 | < 0.01 | 314.7 |
Discourse | 2 | Local noun overlap | 0.06 | 0.18 | < 0.01 | 171.7 |
Discourse | 3 | Global argument overlap | 0.16 | 0.32 | < 0.01 | 100.6 |
Discourse | 4 | Local argument overlap | 0.32 | 0.46 | < 0.01 | 45.9 |
Morphological | 5 | Prepositional case (Noun) | 27.28 | 55.37 | < 0.01 | 103.0 |
Morphological | 6 | Genitive case (Noun) | 75.03 | 122.38 | < 0.01 | 63.1 |
Parts of speech | 7 | Average number of adjectives per sentence | 1.15 | 1.81 | < 0.01 | 58.3 |
Parts of speech | 8 | Average number of nouns per sentence | 3.76 | 5.4 | < 0.01 | 43.6 |
Parts of speech | 9 | Adverbs | 68.46 | 39.75 | < 0.01 | 41.9 |
Parts of speech | 10 | Adjectives | 98.73 | 138.79 | < 0.01 | 40.6 |
Parts of speech | 11 | Verb/Noun ratio | 0.54 | 0.34 | < 0.01 | 36.8 |
Parts of speech | 12 | Average number of adverbs per sentence | 0.8 | 0.52 | < 0.01 | 35.7 |
Parts of speech | 13 | Nouns | 321.25 | 412.6 | < 0.01 | 28.4 |
Parts of speech | 14 | Verbs | 172.49 | 137.44 | < 0.01 | 20.3 |
Parts of speech | 15 | Adjective/Noun ratio | 0.3 | 0.34 | < 0.01 | 9.0 |
Verb tense | 16 | Future tense (Verb) | 6.48 | 0.83 | < 0.01 | 87.2 |
Verb tense | 17 | Present tense (Verb) | 41.17 | 19.45 | < 0.01 | 52.7 |
Verb tense | 18 | Past tense (Verb) | 98.78 | 101.26 | < 0.01 | 2.5 |
Readability | 19 | FKGL (SIS) | 5.82 | 6.86 | < 0.01 | 17.9 |
Note. * p < 0.05 — statistically significant differences. ** The parameters with the largest detected proportion of differences are highlighted.
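The "Difference, %" column of Table 3 is not defined on this page. For most rows it appears consistent with the absolute difference between the two means expressed as a percentage of the adventure-story mean; that reading is an assumption, and the function name below is illustrative only. A minimal sketch:

```python
def relative_difference(adventure_mean: float, biography_mean: float) -> float:
    """Absolute difference of the means as a percentage of the adventure mean.

    Assumed formula: the page does not state how the column was computed.
    """
    return abs(biography_mean - adventure_mean) / adventure_mean * 100


# Means copied from Table 3 as (adventure stories, biographies).
table3_means = {
    "Prepositional case (Noun)": (27.28, 55.37),
    "Genitive case (Noun)": (75.03, 122.38),
    "Verbs": (172.49, 137.44),
    "Future tense (Verb)": (6.48, 0.83),
}

for name, (adventure, biography) in table3_means.items():
    print(f"{name}: {relative_difference(adventure, biography):.1f}%")
```

Rows whose means are rounded to two small digits (for example, global noun overlap, 0.03 vs 0.14) do not reproduce the published percentage exactly, presumably because the published values were computed from unrounded means.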
Figure 2. Differences in linguistic parameters of biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
Table 4
Metrics of parts-of-speech in biographies and adventure stories
Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
Verbs | 136–212 | 89.74–201.47 | 16.29 | 22.91 |
Nouns | 270–384 | 290.91–503.21 | 25.00 | 47.11 |
Adjectives | 53–134 | 81.82–226.87 | 16.77 | 26.47 |
Adverbs | 50–98 | 6.80–100 | 9.11 | 17.40 |
Verb/Noun ratio | 0.38–0.70 | 0.18–0.68 | 0.07 | 0.09 |
Adjective/Noun ratio | 0.16–0.44 | 0.24–0.46 | 0.05 | 0.05 |
Average number of nouns per sentence | 2.69–5.72 | 3.48–8.62 | 0.66 | 1.10 |
Average number of adjectives per sentence | 0.76–1.77 | 0.93–3.30 | 0.23 | 0.45 |
Average number of adverbs per sentence | 0.52–1.42 | 0.08–1.50 | 0.17 | 0.24 |
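The page does not say which significance test was run in the statistics package. As a rough plausibility check, the means in Table 3 and the standard deviations in Table 4 are enough to compute Welch's t statistic from summary statistics. The choice of Welch's test and the helper name `welch_t` are assumptions made for this sketch, not the authors' procedure.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations and sample sizes."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# "Verbs" row: adventure stories (n = 90) vs biographies (n = 72).
t, df = welch_t(172.49, 16.29, 90, 137.44, 22.91, 72)
print(f"t = {t:.2f}, df = {df:.1f}")
```

For the "Verbs" row the statistic comes out near 11, which would be consistent with the reported p < 0.01. A rank-based test might be preferable if the parameter distributions are skewed.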
Table 5
Discourse parameters of biographies and adventure stories
Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
Global noun overlap | 0.01–0.09 | 0.04–0.39 | 0.02 | 0.07 |
Local noun overlap | 0.00–0.24 | 0.03–0.62 | 0.05 | 0.13 |
Global argument overlap | 0.06–0.29 | 0.13–0.57 | 0.06 | 0.10 |
Local argument overlap | 0.04–0.57 | 0.10–1.14 | 0.13 | 0.21 |
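RuLingva's exact formulas for the overlap parameters are not given on this page. One common Coh-Metrix-style reading treats local overlap as the share of adjacent sentence pairs that have at least one noun in common, and global overlap as the same share over all sentence pairs. The sketch below implements that binary reading as an assumption; the upper bound of 1.14 in Table 5 suggests that the platform normalizes differently. The toy data are hypothetical and not taken from the corpus.

```python
from itertools import combinations


def overlap_scores(sentences):
    """Return (local, global) binary noun overlap.

    `sentences` is a list of sets, each holding the noun lemmas of one sentence.
    """
    def share(pairs):
        pairs = list(pairs)
        if not pairs:
            return 0.0
        # A pair counts as overlapping when the two sets share any lemma.
        return sum(1 for a, b in pairs if a & b) / len(pairs)

    local = share(zip(sentences, sentences[1:]))   # adjacent sentences only
    global_ = share(combinations(sentences, 2))    # every sentence pair
    return local, global_


# Hypothetical toy biography: noun lemmas per sentence.
toy_text = [
    {"lomonosov", "village"},
    {"lomonosov", "moscow", "academy"},
    {"academy", "science"},
    {"university", "moscow"},
]
local, global_ = overlap_scores(toy_text)
print(f"local = {local:.2f}, global = {global_:.2f}")
```

Under this reading, a biography that keeps naming the same person and institutions from sentence to sentence scores higher on both measures than a story whose referents change with each scene.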
Figure 3. Differences in discourse parameters of biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
Table 6
Occurrences of the category “Case of nouns” in biographies and adventure stories
Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
Prepositional case (Noun) | 13–42 | 22–93 | 6.31 | 17.7 |
Genitive case (Noun) | 50–135 | 40–229 | 12.3 | 35.7 |
Figure 4. Differences in Noun cases parameters in biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
Table 7
Verb tenses in biographies and adventure stories
Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
Future tense (Verb) | 0–21 | 0–9.09 | 3.89 | 1.94 |
Present tense (Verb) | 21–91 | 0–54.05 | 14.25 | 13.18 |
Past tense (Verb) | 45–147 | 44.84–142.86 | 21.86 | 22.09 |
Figure 5. Differences in Verb tense parameters in biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
1 Here and below, the text codes were devised by the authors (K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov); each code combines letters and numbers that mark the family names of the textbook authors and the grades.
2 Here and below, the text codes were devised by the authors (K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov); each code combines letters and numbers that mark the family names of the textbook authors and the grades.
About the authors
Konstantin V. Voronin
Kazan (Volga Region) Federal University
Email: voronin.konstantin@outlook.com
Assistant at the Department of theory and practice of teaching foreign languages, engineer of the research laboratory ‘Multidisciplinary Text Investigation’
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Farida H. Ismaeva
Kazan (Volga Region) Federal University
Email: fismaeva@yandex.ru
ORCID iD: 0000-0003-4496-0700
SPIN-code: 4728-3163
Scopus Author ID: 57191851333
ResearcherId: B-5420-2016
PhD in Philology, Associate Professor, Associate Professor of the Department of theory and practice of teaching foreign languages
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Andrew V. Danilov
Kazan (Volga Region) Federal University
Author for correspondence.
Email: tukai@yandex.ru
ORCID iD: 0000-0002-2358-1157
SPIN-code: 8525-5480
Scopus Author ID: 57008755500
ResearcherId: L-8745-2013
PhD in Philology, Associate Professor of the Department of bilingual and digital education, Senior researcher of the research laboratory ‘Multidisciplinary Text Investigation’
18 Kremlevskaya St, Kazan, 420008, Russian Federation
