Linguistic and statistical analysis of the lexical ‘Langue-Parole’ dichotomy in a restricted domain

Abstract

Developing new digital methods for analyzing the ‘Langue-Parole’ dichotomy is one of the most sought-after yet least researched problems in modern theoretical and applied linguistics, which motivates this study. Its purpose is to develop a methodology for the automated linguostatistical analysis of a domain-related lexical layer in the context of the ‘Langue-Parole’ dichotomy and to apply the methodology to the Russian-language domain “Research on athlete integrative physiology” (RAIP). The study draws on a Russian-language corpus of 56 RAIP domain texts, 300,000 wordforms in total, published in 2013-2020 in the scientific journals “People. Sport. Medicine” (formerly “SUSU Bulletin. Series ‘Education, Healthcare, Physical Culture’”), “Theory and Practice of Physical Culture”, etc. The key methodological approach is the ontological analysis of corpus data using statistical and linguistic modeling methods. Domain-specific language and speech are modeled by the corresponding lexicon and corpus, respectively, while the lexical ‘Langue-Parole’ dichotomy is represented by the values of linguistic-statistical parameters that describe how the domain concepts are verbalized in the lexicon and in the corpus. The computational parameters include the indices of lexical diversity, structural complexity, and conceptual syncretism, the correlation between lexical structural complexity and conceptual syncretism, and syncretic concept junction when verbalized in the corpus. The main results of the study are: 1) a methodology for analyzing the domain-specific lexical ‘Langue-Parole’ dichotomy, which can be ported to other domains and national languages; 2) the RAIP domain-related resources, including a language-independent ontology, a conceptually annotated Russian corpus, an onto-lexicon, and the linguistic-statistical parameter values of the lexical ‘Langue-Parole’ dichotomy; and 3) tools that automate certain stages of the study.
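The abstract does not define its indices, so the following is only an illustrative sketch of the general idea of comparing concept verbalization in a lexicon (langue) and a corpus (parole). It assumes that lexical diversity can be approximated by the number of distinct lexical units attached to each concept. The function name, the concept labels, and the sample data are all hypothetical, and the ratio computed at the end is not one of the study's parameters.

```python
from collections import defaultdict


def lexical_diversity(annotations):
    """Count the distinct lexical units that verbalize each concept.

    `annotations` is an iterable of (concept, lexical_unit) pairs.
    This counting scheme is an assumption, not the study's definition.
    """
    units = defaultdict(set)
    for concept, lexical_unit in annotations:
        # Case-fold so that "Fatigue" and "fatigue" count as one unit.
        units[concept].add(lexical_unit.lower())
    return {concept: len(forms) for concept, forms in units.items()}


# Hypothetical onto-lexicon entries (langue side).
lexicon = [
    ("FATIGUE", "fatigue"),
    ("FATIGUE", "exhaustion"),
    ("FATIGUE", "tiredness"),
    ("HEART_RATE", "heart rate"),
    ("HEART_RATE", "pulse"),
]

# Hypothetical conceptual annotations found in corpus texts (parole side).
corpus = [
    ("FATIGUE", "fatigue"),
    ("FATIGUE", "Fatigue"),
    ("FATIGUE", "exhaustion"),
    ("HEART_RATE", "heart rate"),
]

langue = lexical_diversity(lexicon)
parole = lexical_diversity(corpus)

# Share of each concept's lexicon units that actually occur in the corpus.
usage_ratio = {c: parole.get(c, 0) / langue[c] for c in langue}
```

With per-concept counts on both sides, any gap between what the lexicon offers and what the corpus uses becomes directly comparable, which is the kind of contrast the lexical dichotomy parameters are meant to capture.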

About the authors

Svetlana Sheremetyeva

South Ural State University

Author for correspondence.
Email: sheremetevaso@susu.ru
ORCID iD: 0000-0003-1245-4213

Doctor Habil. in Computational Linguistics, Professor, Head of the Innovative Language Technology R&D Center of the Institute of Linguistics and Intercultural Communication at South Ural State University. She has considerable teaching and research experience acquired both in Russia and abroad. Prof. Sheremetyeva worked as a key researcher and lecturer in computational linguistics at New Mexico State University (USA), Uppsala University (Sweden), and Copenhagen Business School (Denmark). She is a regular participant, reviewer, and program committee member of many international conferences on computational linguistics. Her research interests cover a wide range of NLP problems.

Chelyabinsk, Russia

Olga Babina

South Ural State University

Email: babinaoi@susu.ru
ORCID iD: 0000-0002-1733-6075

PhD in Computational Linguistics. She is Head of the Department of Linguistics and Translation and Deputy Head of the Innovative Language Technology R&D Center of the Institute of Linguistics and Intercultural Communication at South Ural State University. Her research interests include corpus linguistics, computational linguistics, natural language processing, and text mining and text analysis using machine learning methods.

Chelyabinsk, Russia

Copyright © Sheremetyeva S., Babina O., 2023

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
