Linguistic profiling of educational and artistic texts

Abstract

Conducted within text analytics, one of the strategic directions of modern Russian linguistics, the research focuses on the linguistic profiling of educational and artistic texts. The identified genre features can contribute to the development of software systems and to the processing of big language data. The study aims to reveal the ranges of linguistic parameters that differentiate educational (secondary) and artistic (primary) texts. It is based on 72 biographies from textbooks on Russian as a foreign language (RFL) and 90 excerpts from adventure stories. These genres were chosen for the contrastive study for two reasons: (1) both show a high degree of narrativity and action; (2) they differ in function, namely the informative function of biographies and the entertaining function of adventure stories. The research corpus comprises 120,932 words. Two data processing tools were used: (1) the linguistic parameters were calculated on the RuLingva platform; (2) the STATISTICA program was used to identify statistically significant differences between the two genres. The genre-specific parameters revealed include global and local overlaps of nouns and arguments, nouns in the prepositional and genitive cases, and past and present verb forms. Global and local overlaps of nouns and arguments were found to contribute to the high cohesion of biographies, because the nomination of each subsequent event in a person's life is connected with the previous one. The genitive case prevails in biographies due to the use of nominative word combinations. Prospects for further research lie in a typology of genres based on the linguistic profiling of official and personal biographies, on the one hand, and adventure novels, on the other. An important aspect of further research is the integration of meaning matrices into the RuLingva text profiler to automate the linguistic profiling of texts.

Full Text

Table 1
Subcorpus of RFL biographies

| Text code¹ | Number of texts | Number of tokens |
|---|---|---|
| AnNa_A2 | 3 | 1170 |
| AnNa_B1 | 6 | 2718 |
| Ag_A2 | 1 | 234 |
| Ar_B2 | 6 | 2588 |
| BeLu_A2 | 1 | 380 |
| BoAg_C1 | 3 | 1595 |
| FiDm_A2 | 1 | 142 |
| Gr_A2 | 11 | 5015 |
| JaSu_C2 | 5 | 1196 |
| KaFr_B2 | 1 | 291 |
| Li_C1 | 7 | 5750 |
| Mo_B1 | 1 | 215 |
| MoSi_A1 | 3 | 1187 |
| OdNo_A1B2 | 8 | 2785 |
| Sa_C1 | 6 | 1512 |
| ShKu_B1 | 4 | 2366 |
| TiKo_B1 | 5 | 1696 |
| TOTAL | 72 | 30840 |

Table 2
Subcorpus of adventure stories

| Text code² | Number of texts | Number of tokens |
|---|---|---|
| Pl_1973 | 21 | 21007 |
| St_1974 | 2 | 2004 |
| Vu_1975 | 12 | 12023 |
| Bo_1984 | 22 | 21995 |
| Ga_1990 | 23 | 23054 |
| Kn_1990 | 4 | 4022 |
| Ch_1991 | 6 | 5987 |
| TOTAL | 90 | 90092 |
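The subcorpus totals and the overall corpus size given in the abstract can be cross-checked from the per-source token counts. The short sketch below only re-adds the figures from Tables 1 and 2; the variable names are illustrative and not part of the study.

```python
# Token counts copied from Table 1 (RFL biographies) and Table 2 (adventure stories).
biography_tokens = [1170, 2718, 234, 2588, 380, 1595, 142, 5015, 1196,
                    291, 5750, 215, 1187, 2785, 1512, 2366, 1696]
adventure_tokens = [21007, 2004, 12023, 21995, 23054, 4022, 5987]

biography_total = sum(biography_tokens)
adventure_total = sum(adventure_tokens)
corpus_total = biography_total + adventure_total

# Share of the corpus taken up by the biography subcorpus.
biography_share = biography_total / corpus_total

print(biography_total, adventure_total, corpus_total)  # 30840 90092 120932
print(f"{biography_share:.1%}")
```

The sum matches the corpus size of 120932 reported in the abstract. The biography subcorpus is about a quarter of the corpus by tokens, which is worth keeping in mind when raw counts rather than normalised values are compared.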

 

Figure 1. Fragment of the biography of M.V. Lomonosov from the RFL textbook by L.V. Moskovkin and L.V. Silvina

 

Table 3
Differences in linguistic parameters between the texts of the two genres

| Group of parameters | No. | Parameter | Mean, adventure (n = 90) | Mean, biography (n = 72) | p-value* | Difference, % |
|---|---|---|---|---|---|---|
| Discourse | 1 | Global noun overlap | 0.03 | 0.14 | < 0.01 | 314.7 |
| | 2 | Local noun overlap | 0.06 | 0.18 | < 0.01 | 171.7 |
| | 3 | Global argument overlap | 0.16 | 0.32 | < 0.01 | 100.6 |
| | 4 | Local argument overlap | 0.32 | 0.46 | < 0.01 | 45.9 |
| Morphological | 5 | Prepositional case (Noun) | 27.28 | 55.37 | < 0.01 | 103.0 |
| | 6 | Genitive case (Noun) | 75.03 | 122.38 | < 0.01 | 63.1 |
| Parts of speech | 7 | Average number of adjectives per sentence | 1.15 | 1.81 | < 0.01 | 58.3 |
| | 8 | Average number of nouns per sentence | 3.76 | 5.4 | < 0.01 | 43.6 |
| | 9 | Adverbs | 68.46 | 39.75 | < 0.01 | 41.9 |
| | 10 | Adjectives | 98.73 | 138.79 | < 0.01 | 40.6 |
| | 11 | Verb/Noun ratio | 0.54 | 0.34 | < 0.01 | 36.8 |
| | 12 | Average number of adverbs per sentence | 0.8 | 0.52 | < 0.01 | 35.7 |
| | 13 | Nouns | 321.25 | 412.6 | < 0.01 | 28.4 |
| | 14 | Verbs | 172.49 | 137.44 | < 0.01 | 20.3 |
| | 15 | Adjective/Noun ratio | 0.3 | 0.34 | < 0.01 | 9.0 |
| Verb tense | 16 | Future tense (Verb) | 6.48 | 0.83 | < 0.01 | 87.2 |
| | 17 | Present tense (Verb) | 41.17 | 19.45 | < 0.01 | 52.7 |
| | 18 | Past tense (Verb) | 98.78 | 101.26 | < 0.01 | 2.5 |
| Readability | 19 | FKGL (SIS) | 5.82 | 6.86 | < 0.01 | 17.9 |

Note. * p < 0.05 indicates statistically significant differences. ** The parameters with the largest detected proportion of differences are highlighted.

Figure 2. Differences in linguistic parameters of biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
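The page does not state how the "Difference, %" column of Table 3 was obtained. The published values are consistent with the absolute difference between the two genre means expressed as a percentage of the adventure mean; the sketch below rests on that inference, and the function name is hypothetical.

```python
def percent_difference(mean_adventure: float, mean_biography: float) -> float:
    """Absolute difference between genre means as a percentage of the adventure mean.

    Assumption: this formula is inferred from the values in Table 3,
    not taken from a description given by the authors.
    """
    return abs(mean_biography - mean_adventure) / mean_adventure * 100


# (adventure mean, biography mean) pairs copied from Table 3.
rows = {
    "Prepositional case (Noun)": (27.28, 55.37),
    "Genitive case (Noun)": (75.03, 122.38),
    "Adverbs": (68.46, 39.75),
    "Future tense (Verb)": (6.48, 0.83),
}

for name, (adventure, biography) in rows.items():
    print(f"{name}: {percent_difference(adventure, biography):.1f}%")
```

For these rows the recomputed values agree with the table (103.0, 63.1, 41.9 and 87.2). For the overlap parameters the means are printed with only two decimals, so recomputing from them gives noticeably different percentages; the published figures were presumably derived from unrounded means.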

Table 4
Metrics of parts of speech in biographies and adventure stories

| Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
|---|---|---|---|---|
| Verbs | 136–212 | 89.74–201.47 | 16.29 | 22.91 |
| Nouns | 270–384 | 290.91–503.21 | 25.00 | 47.11 |
| Adjectives | 53–134 | 81.82–226.87 | 16.77 | 26.47 |
| Adverbs | 50–98 | 6.80–100 | 9.11 | 17.40 |
| Verb/Noun ratio | 0.38–0.70 | 0.18–0.68 | 0.07 | 0.09 |
| Adjective/Noun ratio | 0.16–0.44 | 0.24–0.46 | 0.05 | 0.05 |
| Average number of nouns per sentence | 2.69–5.72 | 3.48–8.62 | 0.66 | 1.10 |
| Average number of adjectives per sentence | 0.76–1.77 | 0.93–3.30 | 0.23 | 0.45 |
| Average number of adverbs per sentence | 0.52–1.42 | 0.08–1.50 | 0.17 | 0.24 |
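The page does not say which significance test was run in the statistics package. As an illustration only, the sketch below applies Welch's unequal-variance t-test to the summary statistics for the Verbs parameter (means from Table 3, standard deviations from Table 4, sample sizes from the corpus tables). The choice of test and the normal approximation to the p-value are assumptions, not the authors' procedure.

```python
import math
from statistics import NormalDist


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from means, SDs and sample sizes."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the first mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


# Verbs: adventure stories (n = 90) vs biographies (n = 72).
t, df = welch_from_summary(172.49, 16.29, 90, 137.44, 22.91, 72)

# Two-sided p-value via the normal approximation; acceptable only because df is large.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.2f}, df = {df:.0f}, approximate p = {p_approx:.3g}")
```

Working from summary statistics is a design compromise: it needs only the published tables, but a rank-based test such as Mann-Whitney would require the per-text values, which are not given here.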

Table 5
Discourse parameters of biographies and adventure stories

| Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
|---|---|---|---|---|
| Global noun overlap | 0.01–0.09 | 0.04–0.39 | 0.02 | 0.07 |
| Local noun overlap | 0.00–0.24 | 0.03–0.62 | 0.05 | 0.13 |
| Global argument overlap | 0.06–0.29 | 0.13–0.57 | 0.06 | 0.10 |
| Local argument overlap | 0.04–0.57 | 0.10–1.14 | 0.13 | 0.21 |

Figure 3. Differences in discourse parameters of biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.
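The exact way RuLingva computes noun overlap is not described on this page. A common Coh-Metrix-style reading treats local overlap as the share of adjacent sentence pairs that have a noun in common, and global overlap as the same share over all sentence pairs. The sketch below follows that reading as an assumption; the function names and the toy noun sets are invented for illustration. Since Table 5 reports a local argument overlap above 1, the tool's own measure is evidently not a plain proportion of this kind.

```python
from itertools import combinations


def overlap_share(pairs):
    """Share of sentence pairs that have at least one noun lemma in common."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for first, second in pairs if first & second) / len(pairs)


def local_noun_overlap(sentences):
    """Overlap computed over adjacent sentence pairs only."""
    return overlap_share(zip(sentences, sentences[1:]))


def global_noun_overlap(sentences):
    """Overlap computed over every pair of sentences in the text."""
    return overlap_share(combinations(sentences, 2))


# Toy input: one set of noun lemmas per sentence (invented, not from the corpus).
biography_like = [
    {"lomonosov", "village"},
    {"lomonosov", "moscow", "academy"},
    {"academy", "science"},
    {"scientist", "university"},
]

print(local_noun_overlap(biography_like))   # 2 of 3 adjacent pairs share a noun
print(global_noun_overlap(biography_like))  # 2 of 6 sentence pairs share a noun
```

Under this reading, a text that keeps naming the same person, place or institution from sentence to sentence scores high on both measures, which fits the higher overlap means reported for biographies.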

Table 6
Occurrences of the category “Case of nouns” in biographies and adventure stories

| Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
|---|---|---|---|---|
| Prepositional case (Noun) | 13–42 | 22–93 | 6.31 | 17.7 |
| Genitive case (Noun) | 50–135 | 40–229 | 12.3 | 35.7 |

Figure 4. Differences in noun case parameters in biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.

Table 7
Verb tenses in biographies and adventure stories

| Parameter | Range, adventure | Range, biography | Standard deviation, adventure | Standard deviation, biography |
|---|---|---|---|---|
| Future tense (Verb) | 0–21 | 0–9.09 | 3.89 | 1.94 |
| Present tense (Verb) | 21–91 | 0–54.05 | 14.25 | 13.18 |
| Past tense (Verb) | 45–147 | 44.84–142.86 | 21.86 | 22.09 |

 

Figure 5. Differences in verb tense parameters in biographies and adventure stories
Source: Calculated by K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov based on the Research corpus.

 

 

¹ Here and below, the text codes were developed by the authors of the study (K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov). Each textbook code consists of letters and a number marking the family names of the textbook authors and the grade.

² Here and below, the text codes were developed by the authors of the study (K.V. Voronin, F.Kh. Ismaeva, A.V. Danilov). Each textbook code consists of letters and a number marking the family names of the textbook authors and the grade.

About the authors

Konstantin V. Voronin

Kazan (Volga Region) Federal University

Email: voronin.konstantin@outlook.com

Assistant at the Department of theory and practice of teaching foreign languages, engineer of the research laboratory ‘Multidisciplinary Text Investigation’

18 Kremlevskaya St, Kazan, 420008, Russian Federation

Farida H. Ismaeva

Kazan (Volga Region) Federal University

Email: fismaeva@yandex.ru
ORCID iD: 0000-0003-4496-0700
SPIN-code: 4728-3163
Scopus Author ID: 57191851333
ResearcherId: B-5420-2016

PhD in Philology, Associate Professor, Associate Professor of the Department of theory and practice of teaching foreign languages

18 Kremlevskaya St, Kazan, 420008, Russian Federation

Andrew V. Danilov

Kazan (Volga Region) Federal University

Author for correspondence.
Email: tukai@yandex.ru
ORCID iD: 0000-0002-2358-1157
SPIN-code: 8525-5480
Scopus Author ID: 57008755500
ResearcherId: L-8745-2013

PhD in Philology, Associate Professor of the Department of bilingual and digital education, Senior researcher of the research laboratory ‘Multidisciplinary Text Investigation’

18 Kremlevskaya St, Kazan, 420008, Russian Federation

Copyright (c) 2024 Voronin K.V., Ismaeva F.H., Danilov A.V.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
