Data visualization in Indian print media: a comparative study of English and Hindi newspapers

Advancing technology is affecting every aspect of life, and journalism is no exception. Digitalization is generating a huge amount of data, and the continuous advancement of computer science has made it possible to store and analyse these data and extract meaningful information from them. The term “data journalism” has become quite popular over the last decade. Data journalism means analysing data sets, extracting newsworthy information from them and passing it on to the public. Data visualization has a very important place in this process: it is used to communicate the information extracted from the data to users in a clear, interesting and engaging way. As the amount of data-based content in the news media has grown, so has the importance of data visualization. Data visualization improves the reading experience and helps readers to better understand data-based content. This preliminary study focuses on the use of data visualizations by English and Hindi newspapers in India and compares various aspects of that use. Content analysis with a quantitative approach was employed as the research method. The study reveals a large difference between the English and Hindi newspapers in every aspect of the use of data visualizations that was examined. The English newspaper used data visualizations more effectively than its Hindi counterpart.


Introduction
Digitalization, or computerization, is increasing rapidly in all areas of human life, and as a result a huge amount of data is being generated. The continuous advancement of computer science has made it possible to collect and analyse these data and extract useful information from them. In this scenario, the importance of data visualization is also growing rapidly. The main idea of data visualization is to present data visually in such a way that the human eye can derive meaning from their structure and patterns [1].
Like other fields, journalism has adopted advanced techniques of data storage, analysis and visualization. As a result, data journalism came into existence. Modern techniques of data analysis have made it possible to analyse documents running to thousands or even millions of pages and to reveal the interrelations among them. As a result, many major news stories have reached the public. Most media organizations now recognize the usefulness of data journalism and are giving the necessary training to their staff [2].
The use of data in journalism is not new, but modern technologies have made it possible to extract newsworthy information by holistically analysing large data sets. New technologies have also made data visualization more effective and easier. WikiLeaks's secret data sets are considered to be the beginning of current data journalism, but data journalism is not limited to investigative news. Uskali and Kuutti (2015) classified data journalism into two main categories: investigative data journalism and general data journalism, the latter also called daily data journalism. According to them, investigative data journalism needs a lot of time, from a few months to, in some cases, years. It requires advanced data skills, and unofficial data, confidential data and data leaks have great significance in it. In general data journalism, on the other hand, journalists have only a few hours or days to complete their data stories. Here, basic data skills are required, and public and open-access data sets are used. They also described a third type, real-time data journalism, which is based on the automatic creation of news stories and is mainly practised in sports and financial news. At present it is at a nascent stage [3].
Although this study is mainly focused on data visualization, a discussion of data journalism is also important because data visualization is a part of data journalism. Gray, Chambers and Bounegru (2012) explained the process of data journalism and divided it into three main parts: gathering data, filtering data, and visualizing data. According to them, data visualization is an integral part of data journalism and must be used in data-driven storytelling [2]. A data journalist can successfully present a complex data-based story with the help of engaging infographics.
Having discussed data journalism and data visualization, the discussion now turns to language-related aspects. Hindi and English together account for more than half of Indian print media in terms of both circulation and number of publications. According to the latest report by the Registrar of Newspapers for India (RNI) (2019), about 41% of all registered publications are in Hindi and 12% in English. Following the same trend, Hindi has the highest share of total circulation (45%), followed by English (12%). Hindi and English together thus account for 53% of Indian print media by number of publications and 57% by circulation [4].
This study aims to compare newspapers in two different languages, English and Hindi. In India, Hindi and English differ not only in linguistic features but also in their socioeconomic dimensions. Readers of English newspapers and readers of Hindi newspapers have different socioeconomic status. So, while comparing the data visualization of Hindi and English newspapers, a brief discussion of the socioeconomic dimensions of this issue is relevant.
India has a complex language scenario. A total of 22 Indian languages, including Hindi, are included in the Eighth Schedule of the Indian Constitution [5]. Many other languages are also demanding inclusion in the Eighth Schedule. However, Article 343(1) of the Indian Constitution declares Hindi the official language of the Union. According to Kumar (2014), the public sphere of India splits into two major divisions: English and vernacular. Some people describe it as the division between India and Bharat. The distribution of the twelve subcategories of the socioeconomic classification (SEC) system, generally used by market researchers, shows that the higher categories consist of the English-speaking public [6].
English is considered an important tool for academic, economic and social advancement in India. Even politicians who appeal to pride in the mother tongue send their own children to English-medium schools [7]. English thus enjoys a superior position in Indian society: it is regarded as the language of higher education and better employment, and as a marker of upper social status. Although Hindi is the most widely spoken language in India, English enjoys the higher status as far as socioeconomic dimensions are concerned. This situation may also affect the content of newspapers in these two languages, including their data visualization.

Review of literature
In the field of mass communication, researchers of data journalism have often made data visualization an important part of their studies. Some researchers have also done independent research on data visualization, but their number is relatively small. Highlighting the importance of data journalism, Arthur (2010) quoted Tim Berners-Lee, the inventor of the World Wide Web, as saying that the news stories of the future would come not from chatting in the bar but from poring over rows of data. Berners-Lee declared that data-driven journalism is the future [8]. Along the same lines, Glover and Beard (2017) found that data journalism is becoming popular among journalists and media organizations. Data-based reporting has now taken root in the media, and a large number of journalists are equipping themselves with the skills of data journalism. According to journalists, data-based journalism presents the truth in a more credible way [9].
A number of scholars have defined data visualization. According to Kirk (2016), data visualization is the visual representation of data to facilitate understanding [10]. Weber, Engebretsen and Kennedy (2018) also define data visualization as a visual representation of data generated to strengthen the cognitive processing and the social application of the data represented. They considered graphs, charts, maps and timelines, or a combination of these, to be classic data visualizations [11].
As mentioned above, data visualization is an integral part of data journalism, and a number of scholars have expressed the same view. According to Uskali and Kuutti (2015), data visualization is an essential part of data journalism, and every data journalist should have at least basic data visualization skills [3]. Lorenz also identified data visualization as one of the three main dimensions of data journalism [12]. According to Stalph (2018), data visualization is the fundamental feature of data-driven stories. General data-based stories, or daily data-driven news, can differentiate themselves from traditional news stories by relying on data sources as their main sources and on data visualizations [13].

ЖУРНАЛИСТИКА. Новые медиа 557
A number of studies have been conducted on the use of different types of data visualizations in journalism. Stalph (2018) found bar charts to be the most popular data visualization, followed by maps and line charts [13]. According to Loosen, Reimer and De Silva-Schmidt (2017), charts and maps are the most popular data visualizations because they are quite easy to create and can be made with free software. According to them, the time available to journalists also determines the type of visualization [14]. In a content analysis of data journalism award finalists (2012 to 2015), Young, Hermida and Fulda (2018) also found maps and graphs to be the most commonly used data visualizations [15].

Objectives of the study
The objectives of this study were the following:
- to compare the use of data visualizations for presenting news in select English and Hindi newspapers;
- to find out the different types of data visualizations used in select English and Hindi newspapers;
- to examine and compare the data sources of data visualizations used in select English and Hindi newspapers;
- to analyse the importance given to data visualizations in select English and Hindi newspapers.

Research methodology
Content analysis with a quantitative approach was employed to achieve the objectives of this study. Content analysis has been a very popular research method in communication research. According to Krippendorff (1989), the mass media have been the most important area for content analysis, and the available literature is full of content analysis studies of newspapers, books, magazines, films, television programming and radio broadcasts [16]. A number of definitions of content analysis are available, and a large section of scholars defines it as a quantitative method. According to Berelson (1952), content analysis "is a research technique for the objective, systematic, and quantitative description of the manifest content of communication" [17. P. 18]. Therefore, in this study, content analysis was applied for a quantitative description of different aspects of data-driven news content in Indian English and Hindi newspapers.
Sampling. To draw a suitable sample for this content analysis, a multi-stage sampling approach was applied. All Indian English and Hindi newspapers constituted the population of this study, so sampling was required, and it was done in two stages: first, the selection of Indian English and Hindi newspapers and, second, the selection of dates. For the selection of newspapers, purposive sampling was employed. According to Wimmer and Dominick (2011), purposive sampling is a nonprobability sampling method that chooses units of the population with certain features or qualities and filters out those that do not meet these criteria [18]. Accordingly, circulation was fixed as the criterion for the selection of newspapers, and the English and Hindi newspapers with the highest circulation were selected as samples for this study. According to the latest report of the RNI (2019), the largest circulated multi-edition daily was "Dainik Bhaskar", a Hindi newspaper, and the second largest circulated multi-edition daily was "The Times of India", an English daily [4]. So, the New Delhi editions of "The Times of India" and "Dainik Bhaskar" were selected as the sample for this study.
For the selection of dates, constructed week sampling was applied. According to Riffe, Aust and Lacy (1993), constructed week sampling is more suitable than purely random or consecutive day sampling for the content analysis of newspapers, where the content of weekdays and weekends may differ or content planning may be different for different days, so representation of all seven days is required [19]. Hester and Dougall (2007) also advocated constructed week sampling for better results in news content analysis. According to them, constructed week sampling should be used to control the bias of cyclic trends in news coverage [20]. Hence, constructed week sampling was employed in this study too.
One constructed week was taken from two months, January and February 2020. This constructed week was created by selecting Monday from the first week, Tuesday from the second week, Wednesday from the third week, Thursday from the fourth week, Friday from the fifth week, Saturday from the sixth week and Sunday from the seventh week. It thus represented all seven days of the week.
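The date selection described above can be sketched in a few lines of Python. This is only an illustration of the procedure: the paper does not list the sampled dates, so the starting Monday used here (6 January 2020, the first Monday of that month) and the function name `constructed_week` are assumptions.

```python
from datetime import date, timedelta


def constructed_week(first_monday: date) -> list[date]:
    """Pick Monday of week 1, Tuesday of week 2, ... Sunday of week 7."""
    if first_monday.weekday() != 0:  # Monday is 0 in the datetime module
        raise ValueError("first_monday must fall on a Monday")
    # Week i begins i weeks after the first Monday; the wanted weekday
    # lies i days into that week.
    return [first_monday + timedelta(weeks=i, days=i) for i in range(7)]


# Assumption: the first sampled week starts on Monday, 6 January 2020.
sample_dates = constructed_week(date(2020, 1, 6))
for d in sample_dates:
    print(d.strftime("%A %d %B %Y"))
```

Under this assumed starting point, all seven dates fall within January and February 2020, which is consistent with the sampling window stated in the text.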

Findings and analysis
This section discusses the key findings of the study, which present a comparative picture of the English and Hindi newspapers in the context of data visualization.
Use of data visualization. The findings revealed that the English newspaper ("The Times of India") used data visualization significantly more than the Hindi newspaper ("Dainik Bhaskar"). Figure 1 illustrates this difference clearly. In the sample taken for the study, a total of 121 data visualizations were used by the two newspapers. The English newspaper used almost three times as many data visualizations as its Hindi counterpart. From this it is clear that the English newspaper gives more importance to data visualization.
Themes/subjects of data visualizations. Theme-wise analysis of data visualizations was also one of the objectives of this study. Table 1 presents the related findings. Both the Hindi and the English newspaper made the greatest use of data visualization to present information related to business, economics and finance. This theme had the highest share in both newspapers, although its share was lower in the English newspaper than in the Hindi one: around 43% of the data visualizations used in the Hindi newspaper were related to business, economics and finance, whereas this theme had a share of around 32% in the English newspaper. Politics took second position (approx. 25%) in the English newspaper, whereas environment had the second highest share (approx. 23%) in the Hindi newspaper. Developmental issues had the third highest share in both newspapers.
The findings also showed that the English newspaper used data visualization to present information on more subjects than the Hindi newspaper, so the subject diversity of its visualizations was higher. Table 1 shows that "The Times of India" used data visualizations in eight theme-related categories, while "Dainik Bhaskar" used them in only six. In the last category, "Other", which covers the remaining topics, the share of the English newspaper is almost twice that of the Hindi newspaper.
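The percentage shares reported in this section follow from a simple tally of coded items per newspaper. The sketch below shows one way such a tally could be computed. It assumes that each visualization has been coded as a (newspaper, theme) pair. The function name `category_shares` and the records in `coded_items` are hypothetical placeholders, not the study's data or the authors' actual procedure.

```python
from collections import Counter


def category_shares(coded_items):
    """Return {newspaper: {category: percent}}, rounded to two decimals."""
    totals = Counter(paper for paper, _ in coded_items)  # items per newspaper
    counts = Counter(coded_items)  # items per (newspaper, category) pair
    shares = {}
    for (paper, category), n in counts.items():
        shares.setdefault(paper, {})[category] = round(100 * n / totals[paper], 2)
    return shares


# Hypothetical coded records, for illustration only (not the study's data).
coded_items = [
    ("English", "business"), ("English", "business"), ("English", "politics"),
    ("Hindi", "business"), ("Hindi", "business"), ("Hindi", "business"),
    ("Hindi", "environment"),
]

shares = category_shares(coded_items)
for paper, by_category in shares.items():
    print(paper, by_category)
```

Computing shares within each newspaper, rather than over the pooled sample, keeps the two newspapers comparable even though they contributed very different numbers of visualizations.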
Types of usage of data visualizations. Newspapers use data visualizations in three different ways. The first is visualization with story, in which data visualizations are used along with a news story: there is a data-driven story written as text, and visualizations present its key information. The second is only visualization, or stand-alone visualization. As the name suggests, there is a headline and the whole story is presented through data visualizations alone. Text is used very sparingly, as bullet points or to a limited extent to make the visualizations a little clearer. The third type is relevant visualization. Here too, data visualizations accompany a text story, but they do not directly present the information of the news story. Instead, they present background information related to that news or relevant data in the context of that story. Figures 2 and 3 illustrate the usage types of data visualizations in the English and Hindi newspapers respectively.
The English newspaper recognized data visualization as an independent storytelling tool more than its Hindi counterpart did, and used it more seriously. "The Times of India" used 54% of its data visualizations as an independent tool to present the story. Visualization with story had the second highest share, followed by relevant visualization. "Dainik Bhaskar", the Hindi newspaper, used data visualizations mostly (67%) with stories. The stand-alone visualization category ranked second, followed by relevant visualization.
Types of data visualizations. There are many different types of data visualizations that newspapers can use to present information in a more interesting, easy-to-understand and creative way.
Table 2 shows the different types of visualizations used by the English and Hindi newspapers. Note. Percentages were rounded to two decimal places.
The findings suggest that the English newspaper was more creative and sincere than its Hindi counterpart in its use of data visualizations. It gave more importance to data visualization and also worked harder on it. The English newspaper used ten different types of data visualizations, while the Hindi newspaper used only five. In other words, the English newspaper's data visualizations had twice the variety of the Hindi newspaper's.
"The Times of India" used bar charts the most (approx. 22%). Bar chart with table ranked second (approx. 16%). The line chart (approx. 15%) and the table (approx. 14%) ranked third and fourth respectively. Notably, the shares of bar chart with table, line chart and table were almost the same, with only small differences between them. The Hindi newspaper, on the other hand, depended mainly on tables (80%). The bar chart had the second highest share (10%), but there was a huge gap between the use of tables and bar charts. The pie chart, column chart and map were used equally, but significantly less.
Data sources of data visualizations. Data visualizations are based on data, and the sources of these data are also a variable of this study. Table 3 shows the data sources of the data visualizations used in the English and Hindi newspapers.
The difference between the English and Hindi newspapers is also visible in the data sources of their data visualizations. The English newspaper gave great importance (41.76%) to the government and its various agencies, whereas the Hindi newspaper used government sources relatively less (23.33%). Both newspapers lagged far behind in generating their own data. The analysis also includes a category "Other", which covers all sources outside the categories given above, for example private companies, NGOs and private research institutes. Data visualizations whose source was not explicitly mentioned were also put in the "Other" category. During the study, it was observed that the Hindi newspaper omitted an explicit data source far more often than the English newspaper.
Placement of data visualizations. The importance of a news item or element in a newspaper is also determined by its placement. If a news item or element finds a place on the first page, it is considered more important, and the inside pages are less important than the front page. But this privileged status of the first page has started to be challenged. Jacket pages have become a trend as a way to attract advertisers. These jacket pages are placed even before the first page, and advertisements are printed on them. Earlier these jackets were without a masthead, but now they come with a masthead to attract advertisers further. The jackets have reduced the importance of the first page. If two jacket pages are added on a given day, the first page actually becomes the fifth page, although the page numbers printed on the newspaper still start from the first page [21]. Apart from the full jacket, the half jacket is also popular, and half jackets are often attached along with full jackets. Half jackets carry advertisements as well as news stories. Both newspapers included in this study use jackets.
The importance that various newspapers give to data visualizations can also be understood from the pages on which they place them. That is why in this research the placement of visualizations has also been studied. Figures 4 and 5 illustrate this.
The placement of data visualizations also suggests that the English newspaper attaches more importance to them. Since the jacket pages come even before the first page, their importance cannot be considered lower than that of the first page. Hence, to determine importance, the findings can be studied by combining the jacket pages and the first page. The English newspaper placed more than half (57%) of its data visualizations on the jacket pages or the first page. The Hindi newspaper, on the other hand, placed almost two-thirds of its data visualizations on the inside pages, and only 33% of them found a place on the jacket pages or the first page. During the study period, the English newspaper "The Times of India" appeared almost daily with half jacket pages, and sometimes two half jacket pages were published. These half jacket pages were printed with half mastheads and carried news along with advertisements. The newspaper published a large amount of data visualization based news content on the other side of the half jacket.

Conclusion
This study concluded that both the English and the Hindi newspaper used data visualizations, but the English newspaper used them in greater quantity, more seriously and with more creativity than its Hindi counterpart. The English newspaper used almost three times as many data visualizations as the Hindi newspaper. Data visualizations were used the most in business, economics and finance related news. This trend was the same in both newspapers, but the English newspaper had higher subject diversity.
The English newspaper recognized data visualization as an independent storytelling tool more than its Hindi counterpart did. Data visualizations were also used more creatively in the English newspaper. "The Times of India" used ten different types of data visualizations, while "Dainik Bhaskar" used only five, so the English newspaper's data visualizations had twice the variety of the Hindi newspaper's. The bar chart emerged as the most popular data visualization in the English newspaper, whereas the Hindi newspaper used the table the most. As far as data sources are concerned, the newspapers of both languages lagged far behind in generating their own data. The English newspaper gave more importance to the government and its various agencies as data sources than its Hindi counterpart. It was also observed that the Hindi newspaper omitted an explicit data source far more often than the English newspaper.
The placement of data visualizations also indicates that the English newspaper gave more importance to them. It placed more than half of its data visualizations on the jacket pages or the first page. The Hindi newspaper, on the other hand, placed almost two-thirds of its data visualizations on the inside pages.
Finally, this study concludes that, despite having almost four times the circulation of their English counterparts, Hindi newspapers lag behind in using data visualization effectively. This negatively affects the reading experience of Hindi newspaper readers, since effective use of data visualization improves the reading experience and helps readers to better understand data-based news content.