Corpus Linguistics: Theory vs Methodology

The article presents a comprehensive study of the stages in the formation and development of corpus linguistics. Its purpose is to analyze the various scholarly approaches to the scientific significance of this linguistic discipline and to identify the set of concepts and criteria that form the foundation of the field. Corpus linguistics is one of the most promising and rapidly developing areas of language research. Whereas the linguistics of the 19th century set as its goal the study of language as such, the linguistics of the 21st century sees the relevance of research not in identifying absolute linguistic categories and meanings but in the practical application of linguistic knowledge. The relevance of the article is determined by the fact that the linguistic corpus holds vast potential that the scientific community has not yet fully comprehended: the text, which is the main object of corpus linguistics in its various forms of implementation, is one of the central components of the systems of language and of the speech and thinking activity of a modern native speaker of any language. The content and volume of linguistic corpora of various kinds make it possible to obtain reliable information about the actual contemporary use of a particular term: the corpus becomes a tool for analyzing how the term functions in morphology, syntax, and vocabulary as well as in the theory and practice of translation, and for identifying whether its usage belongs to the formal or the informal register. The novelty of the study's results supports the legitimacy of creating a new generation of corpus dictionaries and corpus grammars, developed and verified against a specific fixed corpus. At the same time, the author substantiates the proposition that the corpus basis of dictionaries and grammars increases their reliability and objectivity and avoids the subjectivity that is often characteristic of research based solely on a linguist's intuition.
The corpus is a medium for obtaining new scientific data, the comprehension of which is a priority for modern linguistic description and a necessity in the work of a modern researcher. From our point of view, the relevance and novelty of this article lie in the fact that corpus research is an essential requirement of the time, associated with a new quality of linguistic reality and meeting the needs of modern society. The article examines the main stages in the formation of corpus linguistics as a scientific field, characterizes the scientific concepts and approaches inherent in each of these stages, and provides an overview of the main conceptual provisions of corpus linguistics within the framework of domestic and foreign linguistics. The author analyzes in detail the polemics between representatives of various scientific schools, identifies the advantages of each approach, and traces the similarities and differences between approaches to the study of corpora at various historical stages of their formation. The review focuses on the role and place of corpus studies of language in modern linguistics and compares the arguments for and against the use of corpus technologies in linguistic description. Considerable attention is paid to the main criteria for classifying corpora; a brief overview of the best-known corpora is offered, and the prospects for their use in various fields of modern language science are discussed.

About the authors

Kamo P. Chilingaryan

Peoples’ Friendship University of Russia (RUDN University)

Author for correspondence.

Senior Lecturer, Institute of Hotel Business and Tourism

6, Miklukho-Maklaya str., Moscow, Russian Federation, 117198



Copyright (c) 2021 Chilingaryan K.P.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
