Semantics of Big Data in Corporate Management Systems
- Authors: Novikova G.M., Azofeifa E.J.
Affiliations:
- Peoples’ Friendship University of Russia (RUDN University)
- Issue: Vol 26, No 4 (2018)
- Pages: 383-392
- Section: Computer Science
- URL: https://journals.rudn.ru/miph/article/view/20227
- DOI: https://doi.org/10.22363/2312-9735-2018-26-4-383-392
Abstract
Modern developments in engineering, telecommunications, and information and computer technologies make it possible to collect, process and store huge volumes of data. One of the first applications of Big Data was the creation of corporate repositories that use the gathered information for analysis and strategic decision-making. However, unsystematic collection of information leads to the storage and processing of a large amount of non-essential data, while important information falls out of the analysts’ view. It is therefore important to analyze the semantics and purpose of data collection, which determine both the collection technology and infrastructure and the direction of subsequent processing and use of Big Data, by means of metrics that reduce data volume and leave only essential information to process. As a first step towards this goal, we present an approach to formalizing corporate Big Data using a partially observable Markov decision process (POMDP), and we show that it aligns naturally with the corporate governance system.
Full Text
1. Background
The term Big Data is used today to refer to large volumes and a wide variety of structured and unstructured data. One of the first applications of Big Data was the creation of corporate repositories that use the collected information for analysis and strategic decision-making in Business Intelligence class systems. Today, technology makes it possible not only to store, process and analyze huge amounts of data, but also to generate and transmit them with technical and telecommunication tools. Various technical devices are sources of data used in projects such as digital cities, digital government and smart homes. In addition, Industry 4.0 is a unified concept of industrial production based on product life-cycle management [1] and a smart production strategy, which involves the Internet of Things, cloud computing and cybernetic systems. The emergence of Industry 4.0, together with the development of technical means of data generation and transmission, expands the use of Big Data in the activities of artificially created objects [2], primarily in the creation of new mechanisms that improve the governance of corporations. However, the growing amount of information raises the problem of selecting essential, reliable and consistent information. Big Data can not only reduce entropy and improve the quality of the control system, but can also increase the entropy of a system that lacks mechanisms to combat the noise that distorts information. When working with information, it is necessary to understand for what purpose Big Data is collected and processed, what its source is, and how to weed out information that is not essential for a given subject area or class of tasks. Clearly, work with Big Data cannot be limited to Shannon's statistical theory of information, which measures the quantity of information but not its meaning.
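To make the last point concrete, here is a minimal Python sketch (an illustration added for this text, not part of the authors' method) showing that Shannon entropy depends only on symbol frequencies: two hypothetical management reports with opposite meanings receive exactly the same entropy. The function name `shannon_entropy` and the sample messages are assumptions chosen for the example.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical reports: the same words, opposite business meaning.
good_news = "sales up costs down".split()
bad_news = "sales down costs up".split()

print(shannon_entropy(good_news))  # 2.0
print(shannon_entropy(bad_news))   # 2.0
```

The measure cannot tell the two reports apart, which is why the semantics and context of the data have to be modeled separately.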
The pressing problem is to identify the semantics of information so that data can be collected and processed purposefully and systematically, and to create a suitable infrastructure and set of metrics for Big Data. This task is of particular practical importance in the creation and operation of cybernetic systems, the class of systems to which a corporation belongs.
Received 8th October, 2018.
2. Contextual semantics of Big Data
Semantics, broadly speaking, is the relationship between language expressions and the world, real or imagined. Semantics is connected with pragmatics, and in some cases the semantics of a concept is identical to its pragmatics. Today, a more precise definition of "semantics" is needed, one that replaces units of language with units of information and considers different types of information: colors, sounds, images, linguistic and numerical symbols, and even emotional (tonal) signs, since each of these types can be mapped to a digital analog. In particular, there already exist such fields of knowledge as phonosemantics (a direction in linguistics which suggests that speech sounds, or phonemes, can carry meaning in themselves), the psychology (semantics) of color, personal semantics and the semantics of the artistic image, among others.
Another important refinement of the definition of semantics is the presence of context, without which the semantics of a concept cannot be determined accurately. Montague [3] postulates that no word can be identified correctly in the absence of an environment or context. When defining semantics, it is necessary to determine the semantic object (a symbol, a number, a set of letters, a sentence . . . ), its source (corporate business processes and systems, corporate mail and websites, media, social networks . . . ), its atoms (elementary semantic objects with uniquely determined semantics), its context (the semantic fragment needed to determine the meaning of individual atoms) and other characteristics that establish relationships between semantic objects and contexts (classification signs, movement of resources, internal and external states of objects . . . ). A semantic object has no semantics, and is therefore neither a semantic atom nor a set of semantic atoms, if the given context admits no interpretation of it that has a true meaning. The exact meaning of a semantic object cannot be determined if: a) the context or its meaningful parts are missing, b) the semantic object is not a semantic atom, or c) no semantic characteristic is given.
3. Enterprise management system as an area of context formation
What can serve as the context for Big Data in a corporate management system (CMS)? First of all, the context is the subject area itself (the company's type of activity), together with entities such as objects, relationships, properties, activities, object states, and complex entities representing clusters, such as situations in enterprise and environment management. Depending on the context, Big Data represents characteristics of these entities, as well as the presence or absence of the entities and their properties. However, there is also a universal semantic context specific to corporate governance as a type of activity, which can be applied to management in any subject area. When the CMS is regarded as the field in which the Big Data context is formed, a corporation is a complex dynamic system aimed at long-term profit and sustainable development. Managing such a system involves different management contours, goals and objectives, which can be granular in essence [3]. The objectives of managing a dynamic system such as a corporation are shown in Fig. 1.
The corresponding management systems that underpin the achievement of these goals are listed in Table 1. It is important to understand that, in addition to developing the corporate system (product line development, new activities and markets, technologies and resources), it is necessary to improve the management system itself. Therefore, quality management systems (QMS), whose goal is TQM (Total Quality Management, which covers not only the quality of products and services, technologies and resources, but also the quality of the integrated management system), occupy an important place in an integrated corporate management system (ICMS) (see Fig. 2).
Figure 1. Purposes of dynamic system control
Table 1. Management purposes in the control system
Control system (control cycle): purpose of management
- Operational (regular) management system: stable operation of the system
- System of strategic management: system development
- System of crisis management: adaptation and prevention of negative impacts and crisis states
- System of situational management: exit from crisis situations
Figure 2. Integrated corporate management system
In addition to the control cycle, other criteria guide the division of the ICMS into subsystems or control objects. Subsystems can be singled out according to the type of control object in the strategic management system, and they are likewise distinguished in the operational management system: the human resources management system, customer relationship management system, production management system, equipment maintenance and repair management system, etc. Besides the type of production activity, the control cycle and the control object, the context of Big Data semantics is determined by the task that arises in the management process. Many tasks are specific to a particular management cycle, but the tasks associated with the Deming cycle (Plan-Do-Check-Act), namely planning, accounting, control, analysis and decision-making, are solved in every cycle [4].
Tasks such as prediction, modeling and goal-setting are also solved in the strategic management cycle. The crisis management contour adds further tasks: diagnosing the state of the control object and recognizing objects, characteristics and situations. The QMS, in turn, solves tasks concerning the development of corrective and preventive actions, and is associated with subsystems such as investment planning, analysis and forecasting, and checking and monitoring. Requirements for solving control problems are becoming stricter and force management to continually revise its priorities when defining objectives. Therefore, the primary task is to adapt to internal and external changes in the environment [5] that affect both the properties of the system itself, including the control system, and the product range and its properties, production technologies, business types and their integration [6], etc. Under these conditions, the need for Big Data analysis and processing arises not only in strategic management, but also in operational and crisis management, especially since Big Data can nowadays be processed in real time [7].
4. Ontological approach to semantics of Big Data
Beer draws an isomorphism between a corporation and a biological (living) system such as the human being [8]. In the process of life, living organisms continually increase their entropy and thus approach the dangerous state of maximum entropy, which is death. They stay alive by continually drawing negative entropy, or negentropy, from their environment [9]: negative entropy is what the organism feeds on. The means by which an organism maintains itself at a sufficiently high level of order (equivalently, a sufficiently low level of entropy) consists, in fact, in continually extracting order from its environment.
By analogy with a living organism, as the uncertainty of the external and internal environment grows, the system expands its information search space, increasing entropy and the probability of making wrong decisions. Big Data can therefore both increase and decrease entropy in the system. On the one hand, it is a growing flow of structured and unstructured data containing non-factors (incompleteness, unreliability, inconsistency) that must be processed and analyzed in decision-making. On the other hand, solving tasks such as classification and clustering on the basis of the identified semantics of the data is the means by which the corporation, like an organism, maintains itself at a sufficiently high level of order. Thus, the complexity of the environment and of the control object generates entropy, which can be reduced by means of Big Data processing tools. Under what condition does Big Data processing reduce entropy in the system and increase negentropy, thereby contributing to greater order? The most important factor is a meaningful, ordered collection of Big Data, based on its semantics and in accordance with the selected context [10].
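A toy illustration of this condition, under stated assumptions: if each hypothetical record carries a context tag (here, a business area) and a category label, restricting collection to the selected context can shrink the entropy of the category distribution the analyst has to deal with. The data, tags and helper names below are invented for illustration. Filtering does not lower entropy in general; it does so here because the selected context is dominated by a few categories.

```python
from collections import Counter
from math import log2


def entropy_of_labels(labels):
    """Shannon entropy (bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical records: (context tag, category label).
records = [
    ("sales", "order"), ("sales", "order"), ("sales", "order"),
    ("sales", "return"), ("hr", "hire"), ("hr", "leave"),
    ("maintenance", "repair"), ("social", "post"),
]


def select_context(records, context):
    """Keep only the labels of records collected for the given context."""
    return [label for tag, label in records if tag == context]


h_all = entropy_of_labels([label for _, label in records])
h_sales = entropy_of_labels(select_context(records, "sales"))
print(round(h_all, 3), round(h_sales, 3))  # 2.406 0.811
```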
Big Data can also indicate the presence of [11]:
- relationships, characteristics, states and situations that develop between the object and its environment, the appearance of objects with new properties, of new elements and system states, as well as the emergence of new laws and standards;
- changes and new trends in sales, technologies and management, and the presence of elements whose properties and purposes contradict the goals and properties of the corporate system;
- preferences of users and consumers of products, as well as the compliance or non-compliance of the consumer properties of products with their declared properties;
- violations of the laws of management and operation, a mismatch in the degree of diversity between the control system and the control object, and the accumulation of facts contradicting the legislative basis at both the corporate and the branch level.
The list is not exhaustive, but it shows how diverse Big Data semantics can be. At the same time, the uncertainty space shrinks when Big Data is projected onto control contours, objects and tasks. The domain ontology and the ontology of the ICMS lie at the core of the definition of Big Data semantics. An ontology combines data into a single system, which it in turn completes, verifies and updates [12]. Contextual semantics, based on the ontology of the ICMS and the corresponding subject area, is a method of reducing the diversity of data and the resulting entropy. The ontology gives form to the context, which is the basis for determining the semantics of Big Data, and is also the key to sorting and transforming (saturating, updating) the data at the preprocessing and standardization stages of the data collection architecture (see Fig. 3).
Figure 3. Moving from Big Data to Clear Data
A set of threshold values can be established for the information collected in ontologies, using features such as materiality (significance); context, which is defined by metadata (type of production activity, business task, management cycle, goal, object and task of management); and completeness and sufficiency (needed to minimize redundancy and duplication of information). Concrete examples of ontology applications can be found in the artificial intelligence approach to education, where ontologies can serve as mindtools for tutoring systems [13]. Fuzzy ontologies, in turn, can be used to tackle complicated and heterogeneous control tasks with granular properties, and can also serve as a linguistic basis for effective communication between cognitive agents [14].
5. Characterization of Big Data in a corporation
Laney [15] described the kind of information that we now refer to as Big Data by three main characteristics, the 3 V's: volume (large numbers of records), velocity (the frequency of generation and/or of handling, recording and publishing), and variety (structured, semi-structured and unstructured data types). Since then, authors have added factors such as veracity (the extent to which the data contains noise, uncertainty and error), value (the extent to which insights can be extracted and the data can be repurposed), and several other characteristics gathered in [16].
Specifically, measures have been added concerning exhaustivity (the ability of a system to capture the entire population within its data generation, rather than a sample); resolution (presentation of the minimal elements instead of aggregates), which can be coarse- or fine-grained; indexicality (accompanying metadata that uniquely identifies the device, site and time/date of generation, along with other characteristics); relationality (the possibility to link data that share common fields and to identify relationships between datasets); extensionality (adaptability and flexibility of data generation); and scalability (the extent to which a system can cope with varying data flow).
6. A POMDP quality model for Big Data environments
We now formalize the quality management system, taking into account the Big Data characteristics described in the literature. To capture the various sources of uncertainty in real-world corporations, we model the system with an appropriate probabilistic framework, namely a partially observable Markov decision process (POMDP) [17]. We therefore treat the Big Data characteristics as possible sources of uncertainty in obtaining ideal information from a set of states.
Let $F$ be a set of fields and $T$ a set of data types, so that $\tau : F \to T$ is a function assigning a data type to each field. We further define $H$ as a set of headers, a header being an ordered tuple of fields $h \in F^{|F|}$. With $I$ a set of indexes and $L$ a set of logs or records, a database, our main information destination, is obtained by applying the relation $D : H \times I \to L^{|I| \times |F|}$, which associates headers, indexes and records. Accordingly, $\mathcal{D} = \{D_0, D_1, \ldots, D_n\}$ denotes a set of databases.
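The database formalization above can be mirrored in a small Python sketch. It is only one possible reading of the definitions: the field names, the dictionary `FIELD_TYPES` standing in for the function that assigns a data type to each field, and the helper `make_database` standing in for the relation between headers, indexes and records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Hypothetical fields and the function assigning a data type to each field.
FIELD_TYPES: Dict[str, type] = {"order_id": int, "customer": str, "amount": float}


@dataclass(frozen=True)
class Database:
    header: Tuple[str, ...]           # ordered tuple of fields
    rows: Dict[int, Tuple[Any, ...]]  # index -> record


def make_database(header: Tuple[str, ...], indexed_records: Dict[int, tuple]) -> Database:
    """Associate a header, a set of indexes and records, checking field types."""
    for field in header:
        if field not in FIELD_TYPES:
            raise KeyError(f"unknown field: {field}")
    rows = {}
    for index, record in indexed_records.items():
        if len(record) != len(header):
            raise ValueError(f"record {index} has {len(record)} values, expected {len(header)}")
        for field, value in zip(header, record):
            if not isinstance(value, FIELD_TYPES[field]):
                raise TypeError(f"{field}={value!r} is not of type {FIELD_TYPES[field].__name__}")
        rows[index] = tuple(record)
    return Database(header, rows)


orders = make_database(
    ("order_id", "customer", "amount"),
    {0: (1, "ACME", 10.5), 1: (2, "Globex", 99.0)},
)
databases = [orders]  # a set of databases D_0, ..., D_n
print(orders.rows[1])  # (2, 'Globex', 99.0)
```

Type checking against the field-to-type mapping is a design choice of this sketch; the text itself only states that each field has an assigned data type.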
Roughly speaking, a POMDP is a Markov decision process in which an agent, being in a certain state, takes actions that result in a probabilistic state transition and a corresponding reward. Unlike an ordinary MDP, however, the agent cannot read the true state of the system directly; instead, it receives an observation and maintains a probability distribution over the set of states, known as a belief state. In the present study, we associate the generation of such belief states in a corporation with the degree to which Big Data characteristics are present in its databases. For this purpose, we assign a coefficient in $[0, 1]$ to each Big Data characteristic studied: $\beta_{\mathrm{vol}}$ for volume, $\beta_{\mathrm{vel}}$ for velocity, $\beta_{\mathrm{var}}$ for variety, $\beta_{\mathrm{ver}}$ for veracity, $\beta_{\mathrm{exh}}$ for exhaustivity, $\beta_{\mathrm{res}}$ for resolution, $\beta_{\mathrm{ind}}$ for indexicality, $\beta_{\mathrm{rel}}$ for relationality, $\beta_{\mathrm{val}}$ for value, $\beta_{\mathrm{ext}}$ for extensionality, and $\beta_{\mathrm{sca}}$ for scalability. Defining $S$ as a finite set of states, we consider each state to be derived from database records. This gives rise to an association between database logs and states in the form of the function $\sigma : F^{|F|} \times L^{|F|} \to S$. From the point of view of an agent with incomplete knowledge, the set $S$ can be ordered to reflect perceived similarities between states, so that the agent's observations yield sets of similar states. We consider such similarity to depend on the particular configuration of the Big Data factors. Thus, we define a partial order on the state space $S$ as the result of a function $\mathrm{ord} : [0, 1]^4 \to S^{|S|}$, which takes as inputs the four coefficients that directly concern the way data is structured in a given domain.
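For readers less familiar with POMDPs, the standard Bayesian belief update from the general POMDP literature (not the set-valued observation model proposed in this paper) is sketched below: after taking action a and observing o, the new belief in state s' is proportional to O(o | s', a) times the sum over s of P(s' | s, a) b(s). The two-state example, its probabilities, and the nested-dictionary layout of the transition and observation tables are assumptions made for illustration.

```python
def update_belief(belief, action, observation, trans, obs):
    """Standard POMDP belief update.

    belief: dict state -> probability
    trans[s][a][s2]: probability of moving from s to s2 under action a
    obs[s2][a][o]: probability of observing o in s2 after action a
    """
    new_belief = {}
    for s2 in belief:
        # Predicted probability of s2 before seeing the observation.
        predicted = sum(trans[s][action][s2] * p for s, p in belief.items())
        # Weight by the likelihood of the observation in s2.
        new_belief[s2] = obs[s2][action][observation] * predicted
    norm = sum(new_belief.values())
    if norm == 0:
        raise ValueError("observation is impossible under the current belief")
    return {s: p / norm for s, p in new_belief.items()}


# Hypothetical two-state corporation monitored by the QMS.
trans = {
    "stable": {"monitor": {"stable": 0.9, "crisis": 0.1}},
    "crisis": {"monitor": {"stable": 0.1, "crisis": 0.9}},
}
obs = {
    "stable": {"monitor": {"alarm": 0.2, "quiet": 0.8}},
    "crisis": {"monitor": {"alarm": 0.8, "quiet": 0.2}},
}
belief = {"stable": 0.5, "crisis": 0.5}
belief = update_belief(belief, "monitor", "alarm", trans, obs)
print(belief)  # the belief now favors "crisis"
```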
Following this definition, we introduce some helper functions: $\mathrm{idx} : S \times S^{|S|} \to \mathbb{N}$, which returns the index of a state in an ordered state space; $\mathrm{off} : [0, 1] \to \mathbb{Z}$, an observation offset from the real state in the state-space poset, which depends on a single characteristic coefficient, denoted here $\beta_{\mathrm{off}}$; and $\mathrm{rad} : [0, 1]^2 \to \mathbb{N}$, a state neighborhood radius, which depends on two characteristic coefficients. This gives rise to a major equation of the Big Data POMDP:
$$\mathrm{obs}^*(h^*, r^*) = \bigcup_{i = k - \mathrm{off}(\beta_{\mathrm{off}})}^{k + \mathrm{off}(\beta_{\mathrm{off}})} s_i, \qquad k = \mathrm{idx}\big(\sigma(h^*, r^*), S^*\big), \tag{1}$$
where $\sigma(h^*, r^*)$ is the state derived from the header $h^*$ and record $r^*$, $s_i \in S^*$, and $S^* = \mathrm{ord}(\boldsymbol{\beta}_{\mathrm{str}})$, with $\boldsymbol{\beta}_{\mathrm{str}} \in [0, 1]^4$ the four coefficients concerning the way data is structured. Equation 1 defines a function returning an observation, or belief state, i.e. a set of states believed to contain the true state; it is further refined by a function $b : S^{|S|} \to [0, 1]^{|S|}$ that returns a probability distribution over a set of states. Having formalized the basic functioning of the Big Data POMDP, it remains to define a set $A$ of actions, which give rise to conditional transition probabilities between states via the function $P : S \times A \times S \to [0, 1]$. Considering the corporation (specifically, the QMS) as an agent, it is subject to a reward function $\rho : S \times A \times [0, 1]^4 \to \mathbb{R}$, which depends on the current state, the action taken, and four characteristic coefficients, collected here in $\boldsymbol{\beta}_{\mathrm{flow}} \in [0, 1]^4$. These coefficients are closely related to the ability of the corporation's information systems to react promptly to the significant flow of information in Big Data environments. Finally, we introduce the discount factor $\gamma \in [0, 1)$, kept strictly below one so that the infinite sum in Eq. 2 converges for bounded rewards. Its role is shown in the main equation of the Big Data POMDP:
$$G = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right], \tag{2}$$
where $r_t = \rho\big(\mathrm{st}(t), a, \boldsymbol{\beta}_{\mathrm{flow}}\big)$, with $a \in A$, is the reward at time step $t$, and $\mathrm{st} : \mathbb{N} \to S$ is a function associating each time step with its corresponding state. Equation 2 gives the expected future discounted reward $G$: the goal of the system is to choose an action at each time step so as to maximize $G$.
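Equations 1 and 2 can be sketched in a few lines of Python under simplifying assumptions that the text does not fix: the ordered state space is a plain list, the offset is a non-negative integer, the window is clipped at the ends of the list, and the expectation in Eq. 2 is replaced by the return of a single finite reward sequence. The function names are hypothetical.

```python
def observation(ordered_states, true_state, offset):
    """Eq. (1) sketch: the set of states within `offset` positions of the
    true state in the ordered state space, clipped to the list bounds."""
    if offset < 0:
        raise ValueError("offset is assumed to be non-negative")
    k = ordered_states.index(true_state)  # index of the true state
    low = max(0, k - offset)
    high = min(len(ordered_states) - 1, k + offset)
    return set(ordered_states[low:high + 1])


def discounted_return(rewards, gamma):
    """Eq. (2) sketch for one finite trajectory: sum of gamma**t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


states = ["s0", "s1", "s2", "s3", "s4"]
print(sorted(observation(states, "s1", 2)))  # ['s0', 's1', 's2', 's3']

# A small immediate reward followed by a large delayed one.
rewards = [1.0, 0.0, 0.0, 10.0]
print(discounted_return(rewards, 0.1))  # close to 1.01: delayed reward barely counts
print(discounted_return(rewards, 0.9))  # close to 8.29: delayed reward dominates
```

With a discount factor near zero the delayed reward contributes almost nothing, and with a discount factor near one it dominates the return.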
When the discount factor $\gamma$ is close to zero, the system focuses only on immediate rewards; when it approaches one, the system directs its actions towards increasing future rewards. In the QMS, the discount factor represents the balance between the strategic and crisis management systems, on the one hand, and the operational and situational management systems, on the other: a value of $\gamma$ between these extremes would correspond to an equal ratio of operational to strategic decisions. The difference between strategic and crisis management, and likewise between operational and situational management, lies in the set $A$ of actions, which can be partitioned beforehand according to each system.
7. Conclusion
The development of means for generating and transmitting Big Data expands the scope of its use in the activities of corporations, especially in the creation of new mechanisms that improve the corporate governance system. However, if we approach the collection and processing of Big Data without analyzing its semantics, believing that we can solve the variety of problems arising in the management process by extracting structured data and knowledge from the information chaos, we will get increasing entropy due to non-factors: incompleteness, unreliability and inconsistency. A systematic approach to the collection and processing of Big Data is needed, and we regard it as a new generation of sensors based on data semantics. Semantics, in turn, is determined by the context in which the data is generated and used. In this regard, we took a first step towards the formalization of Big Data in an ICMS as a data collection architecture and a processing procedure. Based on the literature, we described several characteristics that Big Data needs to fulfill, and we constructed a decision model based on them.
Specifically, we proposed a partially observable Markov decision process (POMDP) to translate uncertainty in the data into probabilistic observations of states. With the corporation as an agent, we defined a method for parameterizing the model towards strategic (crisis) management or operational (situational) management, and a method for distinguishing between the actions corresponding to each of those systems.
About the authors
Galina M Novikova
Peoples’ Friendship University of Russia (RUDN University)
Author for correspondence.
Email: novikova_gm@mail.ru
Associate Professor, Candidate of Technical Sciences, Associate Professor of Department of Information Technologies of Peoples’ Friendship University of Russia (RUDN University)
6, Miklukho-Maklaya Str., Moscow, 117198, Russian Federation
Esteban J Azofeifa
Peoples’ Friendship University of Russia (RUDN University)
Email: esteban.azofeifa@gmail.com
post-graduate student of Information Technologies of Peoples’ Friendship University of Russia (RUDN University)
6, Miklukho-Maklaya Str., Moscow, 117198, Russian Federation
References
- V. B. Tarasov, Life-Cycle Management of Products and Enterprises: a Key Aspect of Grid Enterprises Engineering, in: Proceedings of the XVIIth Scientific and Practical Conference IP & UZ-2014, MESI, Enterprise engineering and knowledge management, Moscow, 2014, pp. 245-255, in Russian.
- R. M. Yusupov, B. V. Sokolov, A. I. Ptushkin, A. V. Ikonnikova, S. A. Posturyaev, E. G. Tsivirko, Analysis of the State of Research on the Problems of Life Cycle Management of Artificially Created Objects, in: Proceedings of SPIIRAN 2011, Vol. 16, 2011, pp. 37-109, in Russian.
- R. Montague, Pragmatics and Intensional Logic, Semantics of Modal and Intensional Logic (1981) 223-253.
- V. G. Eliferov, V. V. Repin, Process Approach to Management: Business Process Modeling, Mann Ivanov Ferber, Moscow, 2013, in Russian.
- G. Novikova, Intellectual Technology in Corporate Management Systems, Engine 4 (2012) 58-59, in Russian.
- S. L. Nimmagadda, T. Reiners, L. C. Wood, On Big Data-Guided Upstream Business Research and its Knowledge Management, Journal of Business Research 89 (2018) 143-158.
- X. Zheng, Z. Cai, Real-Time Big Data Delivery in Wireless Networks: A Case Study on Video Delivery, IEEE Transactions on Industrial Informatics 13 (4) (2017) 2048-2057.
- S. Beer, Brain of the Firm, ISBN 978-5-397-00156-4, Librokom, 2009.
- Z. Li, J. Jiang, Entropy Model of Dissipative Structure on Corporate Social Responsibility, IOP Conference Series: Earth and Environmental Science 69 (1) (2017) 012126.
- A. Wahyudi, G. Kuk, M. Janssen, A Process Pattern Model for Tackling and Improving Big Data Quality, Information Systems Frontiers 20 (2018) 457.
- G. Novikova, E. Azofeifa, Domain Theory Verification Using Multi-Agent Systems, Procedia Computer Science 103 (2017) 120-125.
- A. Gladun, J. Rogushina, Ontologies in enterprise systems, Corporate system 1, in Russian.
- T. A. Gavrilova, I. A. Leshcheva, D. V. Leshchev, Use of Ontologies as a Didactic Means, Artificial Intelligence 3 (2000) 34-39, in Russian.
- V. B. Tarasov, A. P. Kalutskaya, M. N. Svyatkina, Granular, Fuzzy and Linguistic Ontologies for Providing Mutual Understanding between Cognitive Agents, Open Semantic Technologies for Intelligent Systems (OSTIS-2012) (2012) 267-278, in Russian.
- D. Laney, 3-D Data Management: Controlling Data Volume, Velocity and Variety, Application Delivery Strategies by META Group Inc.
- R. Kitchin, G. McArdle, What Makes Big Data, Big Data? Exploring the Ontological Characteristics of 26 Datasets, Big Data & Society 3 (1) (2016) 1-10.
- G. Monahan, State of the Art-A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, Management Science 28 (1) (1982) 1-16.