Computational linguistics and discourse complexology: Paradigms and research methods

The dramatic expansion of modern linguistic research and the enhanced accuracy of linguistic analysis have become a reality thanks to the ability of artificial neural networks not only to learn and adapt, but also to carry out automated linguistic analysis and to select, modify and compare texts of various types and genres. The purpose of this article, and of the journal issue as a whole, is to present current areas of research in computational linguistics and linguistic complexology, and to provide a solid rationale for a new interdisciplinary field, discourse complexology. The review of trends in computational linguistics focuses on the following aspects of research: applied problems and methods, computational linguistic resources, the contribution of theoretical linguistics to computational linguistics, and the use of deep learning neural networks. The special issue also addresses the problem of objective and relative text complexity and its assessment. We focus on the two main approaches to assessing linguistic complexity: the “parametric approach” and machine learning. The findings of the studies published in this special issue indicate a major contribution of computational linguistics to discourse complexology, including new algorithms developed to solve discourse complexology problems. The issue outlines the research areas of linguistic complexology and provides a framework to guide its further development, including the design of a complexity matrix for texts of various types and genres, refinement of the list of complexity predictors, validation of new complexity criteria, and expansion of natural language databases.

About the authors

Valery Solovyev

Kazan (Volga Region) Federal University

Email: maki.solovyev@mail.ru
ORCID iD: 0000-0003-4692-2564

Doctor Habil. of Physical and Mathematical Sciences, Professor, Chief Researcher of “Text Analytics” Research Lab, Institute of Philology and Intercultural Communication

18 Kremlevskaya str., Kazan, 420008, Russia

Marina Solnyshkina

Kazan (Volga Region) Federal University

Email: mesoln@yandex.ru
ORCID iD: 0000-0003-1885-3039

Doctor Habil. of Philology, Professor of the Department of Theory and Practice of Teaching Foreign Languages, Head and Chief Researcher of “Text Analytics” Research Lab, Institute of Philology and Intercultural Communication

18 Kremlevskaya str., Kazan, 420008, Russia

Danielle McNamara

Arizona State University

Corresponding author.
Email: Danielle.McNamara@asu.edu
ORCID iD: 0000-0001-5869-1420

Ph.D., Professor of Psychology in the Psychology Department and Senior Scientist

Tempe Campus, Suite 108, Mailcode 1104, USA

Copyright © Solovyev V., Solnyshkina M., McNamara D., 2022

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
