Natural language processing and discourse complexity studies

Abstract

The study presents an overview of discursive complexology, an integrative paradigm of linguistics, cognitive science and computational linguistics aimed at defining discourse complexity. The article comprises three main parts, which successively outline views on the category of linguistic complexity, the history of discursive complexology and modern methods of text complexity assessment. Distinguishing between linguistic complexity, text complexity and discourse complexity, we recognize the absolute nature of text complexity assessment and the relative nature of discourse complexity, which is determined by the linguistic and cognitive abilities of the recipient. Founded in the 19th century, text complexity theory still focuses on defining and validating complexity predictors and criteria of text perception difficulty. We briefly characterize the five stages of discursive complexology to date: formative, classical, the period of cloze tests, constructive-cognitive, and the period of natural language processing. We also present the theoretical foundations of Coh-Metrix, an automatic analyzer based on a five-level cognitive model of perception. By computing not only lexical and syntactic parameters but also text-level parameters, situation models and rhetorical structures, Coh-Metrix assesses discourse complexity with a high level of accuracy. We also show the benefits of natural language processing models and the wide range of application areas of text profilers and digital platforms such as LEXILE and ReaderBench. We view the parametrization and development of a complexity matrix for texts of various genres as the nearest prospect for discursive complexology, one which may enable more accurate inter- and intra-linguistic contrastive studies as well as automated selection and modification of texts for various pragmatic purposes.

About the authors

Marina Ivanovna Solnyshkina

Kazan Federal University

Email: mesoln@yandex.ru
ORCID iD: 0000-0003-1885-3039

Doctor Habil. (Philology), Professor at the Department of Theory and Practice of Foreign Language Teaching, Head of the “Text Analytics” Research Lab at the Institute of Philology and Intercultural Communication

18 Kremlevskaya str., Kazan, 420008, Russia

Danielle S. McNamara

Arizona State University

Email: Danielle.McNamara@asu.edu

Ph.D., Professor of Psychology in the Department of Psychology and Senior Scientist

Payne Hall, Tempe Campus, Suite 108, Mailcode 1104, USA

Radif Rifkatovich Zamaletdinov

Kazan Federal University

Author for correspondence.
Email: director.ifmk@gmail.com
ORCID iD: 0000-0002-2692-1698

Doctor Habil. (Philology), Professor, Director of the Institute of Philology and Intercultural Communication

18 Kremlevskaya str., Kazan, 420008, Russia

Copyright (c) 2022 Solnyshkina M.I., McNamara D.S., Zamaletdinov R.R.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
