Distinctive Lexical Patterns in Russian Patient Information Leaflets: A Corpus-Driven Study


This methodologically oriented corpus-driven study focuses on distinctive patterns of language use in a specialized text type, namely Russian patient information leaflets. The study’s main goal is to identify keywords and recurrent sequences of words that account for the leaflets’ formulaicity; a secondary goal is to describe their discoursal functions. The keywords were identified using three metrics (G2, Hedges’ g and Neozeta), and the overlap between the three resulting keyword lists was explored. The overlapping keywords were then analyzed qualitatively in terms of their discoursal functions. As for the distinctive multi-word patterns, we focused on recurrent n-grams with the largest coverage in the corpus. These were identified using the Formulex method (Forsyth, 2015b), which provides data complementary to the more conservative n-gram and lexical bundles approaches. The results revealed that the most distinctive keywords were identified using the Hedges’ g metric, that the largest overlap occurred between the G2 and Neozeta metrics, and that the frequent use and discoursal functions of the identified lexical patterns correspond with the situational contexts and communicative purposes of patient information leaflets. It is hoped that this study will provide an opportunity for methodological reflection and inspire further corpus-driven research on distinctive recurrent lexical patterns (e.g., keywords, n-grams, lexical bundles) or, more generally, on formulaic language in texts originally written in Russian.
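
For readers unfamiliar with the keyness metrics named above, the following is a minimal sketch of how two of them are commonly computed and how the overlap between two keyword lists can be measured. It is not the study's own implementation: the function names and the top-k overlap measure are assumptions made for illustration, G2 is given in its common two-term log-likelihood form, Hedges' g uses the usual small-sample correction factor, and Neozeta is left out because its definition is not given in this text.

```python
import math
from statistics import mean, variance


def log_likelihood_g2(a, b, c, d):
    """G2 for a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens (two-term form)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:                   # 0 * ln(0) is treated as 0
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def hedges_g(x, y):
    """Hedges' g for two samples of per-text relative frequencies
    (x: study texts, y: reference texts); each needs at least 2 values."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return correction * (mean(x) - mean(y)) / math.sqrt(pooled_var)


def top_k_overlap(scores_a, scores_b, k):
    """Words shared by the k highest-scoring items of two keyness rankings
    (hypothetical overlap measure, used here only for illustration)."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return top_a & top_b
```

G2 works on pooled corpus frequencies, whereas Hedges' g compares per-text frequency distributions, so the two can rank the same word quite differently. That difference is what makes an overlap comparison between metrics informative.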

About the authors

Łukasz Grabowski

University of Opole; University of Ostrava

Email: lukasz@uni.opole.pl
pl. Kopernika 11a, Opole, 45-040, Poland; Dvořákova 7, Ostrava 1, 701 03, Czechia

References


  1. Altenberg, Bernd (1998). On the phraseology of spoken English: The evidence of recurrent word combinations. In: A. Cowie (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press, 101-122.
  2. Amosova, Natalia (1963). Osnovy angliiskoi frazeologii [Fundamentals of English Phraseology]. Leningrad: Izdatel’stvo Leningradskogo Universiteta (cited in Cowie 1998, 5-6, 215).
  3. Anic’kov, Igor’ (1992). Idiomatika i Semantika [Idiomatics and semantics]. Voprosy Jazykoznanija, 5, 136-150 (cited in Dobrovolskij & Filipenko 2007, 715).
  4. Appel, Randy and Trofimovich, Pavel (2015). Transitional probability predicts native and non-native use of formulaic sequences. International Journal of Applied Linguistics. Article first published online: 29 Jan 2015 (accessed on 26 February 2015).
  5. Baker, Paul (2010). Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.
  6. Baker, Paul, Gavin Brookes, and Craig Evans (2019). The Language of Patient Feedback: A Corpus Linguistic Study of Online Health Communication. London: Routledge.
  7. Bestgen, Yves (2018). Evaluating the frequency threshold for selecting lexical bundles by means of an extension of the Fisher’s exact test. Corpora, 13(2), 205-228.
  8. Biber, Douglas (2006). University Language. A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.
  9. Biber, Douglas, Susan Conrad, and Viviana Cortes (2003). Lexical bundles in speech and writing: An initial taxonomy. In Andrew Wilson, Paul Rayson, & Tony McEnery (eds.), Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech. Frankfurt/Main: Peter Lang, 71-92.
  10. Biber, Douglas, Susan Conrad, and Viviana Cortes (2004). If you look at ...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25 (3), 371-405.
  11. Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan (1999). The Longman grammar of spoken and written English. London: Longman.
  12. Bogusławski, Andrzej (1976). O zasadach rejestracji jednostek języka. Poradnik językowy 8, 356-364.
  13. Bogusławski, Andrzej (1978). Jednostki języka a produkty językowe. Problem tzw. orzeczeń peryfrastycznych. In: Mieczysław Szymczak (ed.), Z zagadnień słownictwa współczesnego języka polskiego. Wrocław: Zakład Narodowy im. Ossolińskich, 15-30.
  14. Buerki, Andreas (2017). Frequency consolidation among word N-grams: a practical procedure. In: Ruslan Mitkov (ed.), Computational and Corpus-Based Phraseology, Lecture Notes in Computer Science, vol. 10596. Cham: Springer, 432-446.
  15. Burrows, John (2007). All the way through: testing for authorship in different frequency strata. Literary and Linguistic Computing, 22 (1), 27-48.
  16. Cacchiani, Silvia (2006). Dis/similarities between Patient Information Leaflets in Britain and Italy: Implications for the Translator. New Voices in Translation Studies, 2, 28-43.
  17. Cacchiani, Silvia (2016). On intralinguistic translation from summaries of product characteristics to patient information leaflets. In: Giuliana Elena Garzone, Dermot Heaney & Giorgia Riboni (eds), LSP Research and Translation across Languages and Cultures. Newcastle upon Tyne: Cambridge Scholars Publishing, 219-251.
  18. Chlebda, Wojciech (1991). Elementy frazematki: wprowadzenie do frazeologii nadawcy. Opole: Wydawnictwo WSP.
  19. Chlebda, Wojciech (2009). Idiomatykon 4: gdzie jesteśmy, dokąd zmierzamy (i parę zdań o tym, skąd przychodzimy). In: Wojciech Chlebda (ed.), Podręczny idiomatykon polsko-rosyjski 4. Opole: Wydawnictwo Uniwersytetu Opolskiego, 9-38.
  20. Chlebda, Wojciech (2010). Nieautomatyczne drogi dochodzenia do reproduktów wielowyrazowych. In: Wojciech Chlebda (ed.), Na tropach reproduktów: w poszukiwaniu wielowyrazowych jednostek języka. Opole: Wydawnictwo Uniwersytetu Opolskiego, 15-35.
  21. Clerehan, Rosemary, Di Hirsh and Rachelle Buchbinder (2009). Medication information leaflets for patients: the further validation of an analytic linguistic framework. Communication & Medicine, 6 (2), 117-128.
  22. Cowie, Anthony (1998). Phraseology: Theory, analysis and applications. Oxford: Clarendon Press.
  23. Craig, Hugh and Kinney, Arthur F. (eds.) (2009). Shakespeare, Computers, and the Mystery of Authorship. Cambridge: Cambridge University Press.
  24. Dobrovolskij, Dmitri, and Tatjana Filipenko (2007). Russian phraseology. In: Harald Burger (ed.), Phraseologie: ein internationales Handbuch zeitgenoessischer Forschung, Vol. Berlin: Walter de Gruyter, 714-727.
  25. Dunning, Ted (1993). Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19 (1), 61-74.
  26. Eder, Maciej (2016). Słowa znaczące, słowa kluczowe, słowozbiory - o statystycznych metodach wyszukiwania wyrazów istotnych. Przegląd Humanistyczny, 3, 31-44.
  27. Ellis, Paul (2010). The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. Cambridge: Cambridge University Press.
  28. Erman, Britt and Warren, Beatrice (2000). The idiom principle and the open choice principle. Text, 20 (1), 29-62.
  29. Forsyth, Richard (2014a). Keysoft. Available at: http://www.richardsandesforsyth.net/software.html (accessed on 14 March 2017).
  30. Forsyth, Richard (2014b). Keysoft. User notes http://www.richardsandesforsyth.net/docs/formulib.pdf (accessed on 14 March 2017).
  31. Forsyth, Richard (2015a). Formulib: Formulaic Language Software Library. Available at: http://www.richardsandesforsyth.net/zips/formulib.zip (accessed on 30 November 2015).
  32. Forsyth, Richard (2015b). Formulib: Formulaic Language Software Library. User notes http://www.richardsandesforsyth.net/docs/formulib.pdf (accessed on 2 November 2015).
  33. Forsyth, Richard and Sharoff, Serge (2014). Document dissimilarity within and across languages: A benchmarking study. Literary and Linguistic Computing, 29 (1), 6-22.
  34. Forsyth, Richard, and Łukasz Grabowski (2015). Is there a formula for formulaic language? Poznań Studies in Contemporary Linguistics, 54 (1), 511-549.
  35. Foster, Pauline (2001). Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Martin Bygate, Peter Skehan and Merrill Swain (eds.), Researching pedagogic tasks: Second language learning, teaching, and testing. Harlow: Longman, 75-93.
  36. Gabrielatos, Costas and Marchi, Anna (2011). Keyness. Matching metrics to definitions. Paper presented at the conference Corpus Linguistics in the South: Theoretical-methodological challenges in corpus approaches to discourse studies - and some ways of addressing them. Portsmouth, United Kingdom, 5 Nov 2011. Available at: http://repository.edgehill.ac.uk/4100/7/Gabrielatos%26Marchi-Keyness-2011.pdf (accessed 15 October 2012).
  37. Gabrielatos, Costas (2018). Keyness Analysis: nature, metrics and techniques. In Charlotte Taylor and Anna Marchi (eds.), Corpus Approaches to Discourse: A Critical Review. Oxford: Routledge, 225-258.
  38. Gałkowski, Błażej (2006). Kompetencja formuliczna a problem kultury i tożsamości w nauczaniu języków obcych. Kwartalnik Pedagogiczny, 4, 163-180.
  39. Goźdź-Roszkowski, Stanisław (2011). Patterns of Linguistic Variation in American Legal English. A Corpus-Based Study. Frankfurt am Main: Peter Lang Verlag.
  40. Grabowski, Łukasz (2014). On Lexical Bundles in Polish Patient Information Leaflets: A Corpus-Driven Study. Studies in Polish Linguistics, 19 (1), 21-43.
  41. Grabowski, Łukasz (2015a). Keywords and lexical bundles within English pharmaceutical discourse: a corpus-driven description. English for Specific Purposes, 38, 23-33.
  42. Grabowski, Łukasz (2015b). Phrase frames in English pharmaceutical discourse: a corpus-driven study of intra-disciplinary register variation. Research in Language, 3, 266-291.
  43. Grabowski, Łukasz (2015c). Phraseology in English Pharmaceutical Discourse: A Corpus-Driven Study of Register Variation. Opole: Wydawnictwo Uniwersytetu Opolskiego.
  44. Grabowski, Łukasz (2018). Kilka słów o formuliczności z różnych perspektyw językoznawczych. In: Alicja Pstyga, Tatiana Kananowicz and Magdalena Buchowska (eds.), Słowo z perspektywy językoznawcy i tłumacza. Tom VII. Frazeologia z perspektywy językoznawcy i tłumacza. Gdańsk: Wydawnictwo Uniwersytetu Gdańskiego, 67-76.
  45. Grabowski, Łukasz and Jukneviciene, Rita (2016). Towards a refined inventory of lexical bundles: an experiment in the Formulex method. Kalbu Studijos/Studies About Languages, 29, 58-73.
  46. Granger, Sylviane and Meunier, Fanny (2008). Introduction: The many faces of phraseology. In: Sylviane Granger & Fanny Meunier (eds.), Phraseology: An interdisciplinary perspective. Amsterdam: John Benjamins, xix-xxx.
  47. Hardie, Andrew (2014). Statistical identification of keywords, lockwords and collocations as a two-step procedure. Paper delivered at the ICAME 35 conference, Nottingham, UK, March 2014. Available at: http://www.nottingham.ac.uk/conference/fac-arts/english/icame-35/documents/icame35-book-of-abstracts.pdf (accessed on 15 March 2017).
  48. Hedges, Larry (1981). Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators. Journal of Educational Statistics, 6 (2), 107-128.
  49. Hyland, Kenneth (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4-21.
  50. Ivanov, Vyacheslav (1957). Lingvisticheskie vzglyady E. D. Polivanova [Linguistic views of E.D. Polivanov]. Voprosy Jazykoznanija, 3, 55-76. Available at: http://vja.ruslang.ru/archive/1957-3.pdf (accessed 5 August 2019).
  51. Kecskes, Istvan (2016). Deliberate Creativity and Formulaic Language Use. In: Keith Allan, Alessandro Capone and Istvan Kecskes (eds.), Pragmemes and Theories of Language Use. Berlin: Springer, 3-20.
  52. Kilgarriff, Adam (2009). “Simple maths for keywords”. In: Michaela Mahlberg, Victorina González-Díaz and Catherine Smith (eds.), Proceedings of Corpus Linguistics Conference CL2009. University of Liverpool, UK, July 2009. Available at: https://www.sketchengine.co.uk/wpcontent/uploads/2015/04/2009-Simple-maths-for-keywords.pdf (accessed on 12 March 2017).
  53. Kilgarriff, Adam, Vit Baisa, Jan Bušta, Milos Jakubícek, Vojtech Kovář, Jan Michelfeit, Pavel Rychlý and Vit Suchomel (2014). The Sketch Engine: ten years on. Lexicography 1 (1), 7-36.
  54. Kunilovskaya, Maria, Natalia Morgoun and Alexey Pariy (2018). Learner vs. professional translations into Russian: Lexical profiles. Translation & Interpreting, 10 (1), 33-52. Available at: https://trans-int.org/index.php/transint/article/view/585/304 (accessed on 16 December 2018).
  55. Mel’cuk, Igor’ (1995). Phrasemes in language and phraseology in linguistics. In: Martin Everaert, Erik-Jan van der Linden, Andre Schenk and Rob Schreuder (eds.), Idioms: Structural and Psychological Perspectives. Hillsdale: Lawrence Erlbaum Associates, 167-232. Available at: http://bookre.org/reader?file=1500171&pg=175 (accessed 10 March 2014).
  56. Mel’cuk, Igor’ (1998). Collocations and Lexical Functions. In: Anthony Cowie (ed.), Phraseology: Theory, analysis and applications. Oxford: Clarendon Press, 21-53.
  57. Montalt Resurrecció, Vicent and Gonzalez Davies, Maria (2007). Medical Translation Step by Step. Translation Practices explained. Manchester: St. Jerome Publishing.
  58. Moon, Rosamund (2007). Corpus linguistic aspects of phraseology. In: Harald Burger (ed.), Phraseologie: ein internationales Handbuch zeitgenoessischer Forschung Vol. 2, Berlin: Walter de Gruyter, 1045-1059.
  59. Murakami, Akira, Paul Thompson, Susan Hunston and Dominik Vajn (2017). ‘What is this corpus about?’: using topic modelling to explore a specialised corpus. Corpora, 12 (2), 243-277.
  60. Myles, Florence and Cordier, Caroline (2017). Formulaic Sequence (FS) Cannot be an Umbrella Term in SLA: Focusing on Psycholinguistic FSs and Their Identification. Studies in Second Language Acquisition, 39, 3-28. Available at: https://www.cambridge.org/core/services/aop-cambridge-core/content/view/AFCD7233ACEC89C2A4314392127C5967/S027226311600036Xa.pdf/div-class-title-formulaic-sequence-fs-cannot-be-an-umbrella-term-in-sla-div.pdf (accessed on 12 December 2016).
  61. Nam, Daehyeon and Lee, Sungmin (2016). Lexical bundles in spoken and written Russian. Corpus Linguistics Research, 2, 46. Available at: http://www.kacl.or.kr/read.php?pageGubun=journal search&pageNm=article&search=&journal=Vol.%202&code=286336&issue=21290&Pa ge= 2&year=2016&searchType=&searchValue= (accessed on 12 March 2017).
  62. Nelson, Robert (2018). How ‘chunky’ is language? Some estimates based on Sinclair’s Idiom Principle. Corpora, 13(3), 431-460.
  63. O'Donnell, Matthew Brook (2011). The adjusted frequency list: A method to produce cluster-sensitive frequency lists. ICAME Journal, 35, 135-169.
  64. Pęzik, Piotr (2013). Wybrane aspekty reprezentatywności małych i średnich korpusów. In: Wojciech Chlebda (ed.), Na tropach korpusów. W poszukiwaniu optymalnych zbiorów tekstów. Opole: Wydawnictwo Uniwersytetu Opolskiego, 45-58.
  65. Pęzik, Piotr (2015). Using n-gram independence to identify discourse-functional lexical units in spoken learner corpus data. International Journal of Learner Corpus Research, 1 (2), 242-255.
  66. Pęzik, Piotr (2018). Facets of prefabrication. Perspectives on modelling and detecting phraseological units. Łódź: Wydawnictwo Uniwersytetu Łódzkiego.
  67. Phillips, Martin (1989). Lexical Structure of Text. Discourse Analysis Monographs 12. Birmingham: University of Birmingham (cited in Scott 2001: 110).
  68. Rosenfeld, Barry and Penrod, Steven (2011). Research Methods in Forensic Psychology. London: John Wiley and Sons (cited in Gabrielatos & Marchi 2011).
  69. Schmitt, Norbert and Carter, Ronald (2004). Formulaic sequences in action: An introduction. In: Norbert Schmitt (ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: John Benjamins, 1-22.
  70. Scott, Michael (1996-2017). WordSmith Tools. Liverpool: Lexical Analysis Software. Available at: http://www.lexically.net/wordsmith/ (accessed on 30 May 2017).
  71. Scott, Michael (2001). Mapping key words to problem and solution. In Michael Hoey, Michael Scott and Geoff Thompson (eds.), Patterns of text: In Honour of Michael Hoey. Amsterdam: John Benjamins, 109-127.
  72. Scott, Michael (2008). WordSmith Tools Help. Liverpool: Lexical Analysis Software.
  73. Sinclair, John (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
  74. Silge, Julia and Robinson, David (2017). Text Mining with R. A Tidy Approach. [Section 6: Topic modelling]. Sebastopol: O’Reilly Media.
  75. Stubbs, Michael (2011). Three concepts of keywords. In Michael Scott and Marina Bondi (eds.), Keyness in Texts. Amsterdam: John Benjamins, 21-42.
  76. Vinogradov, Victor (1947/1977). O osnovnykh tipakh frazeologicheskikh edinits v russkom yazyke [About Basic Types of Phraseological Units in Russian]. In Alexey Shakhmatov (ed.), Сборник статей и материалов [Collection of Papers and Materials]. Moscow: Nauka, 339-364 (cited in Cowie 1998, 2-4 and Dobrovolskij & Filipenko 2007, 714). Available at: http://www.philology.ru/linguistics2/vinogradov-77d.htm (accessed on 10 August 2012).
  77. Wood, David (2015). Fundamentals of Formulaic Language. London: Bloomsbury.
  78. Wood, David (ed.) (2010a). Perspectives on Formulaic Language: Acquisition and Communication. London: Continuum.
  79. Wood, David (ed.) (2010b). Formulaic Language and Second Language Speech Fluency. Background, Evidence and Classroom Applications. London: Continuum.
  80. Wray, Alison and Perkins, Michael (2000). The functions of formulaic language: an integrated model. Language & Communication, 20, 1-28.
  81. Wray, Alison (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
  82. Wray, Alison (2008). Formulaic language. Pushing the boundaries. Oxford: Oxford University Press.
  83. Wray, Alison (2009). Identifying formulaic language. Persistent challenges and new opportunities. In Roberta Corrigan, Edith Moravcsik, Hamid Ouali and Kathleen Wheatley (eds.), Formulaic Language. Vol. 1. Distribution and historical change. Amsterdam: John Benjamins, 27-51.

Copyright (c) 2019 Grabowski Ł.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
