Simulation of non-stationary event flow with a nested stationary component

A method for constructing an ensemble of time series trajectories with a non-stationary flow of events and a non-stationary empirical distribution of the values of the observed random variable is described. We consider a special model that is similar in properties to some real processes, such as changes in the price of a financial instrument on the exchange. It is assumed that a random process is represented as an attachment of two processes – stationary and non-stationary. That is, the length of a series of elements in the sequence of the most likely event (the most likely price change in the sequence of transactions) forms a non-stationary time series, and the length of a series of other events is a stationary random process. It is considered that the flow of events is non-stationary Poisson process. A software package that solves the problem of modeling an ensemble of trajectories of an observed random variable is described. Both the values of a random variable and the time of occurrence of the event are modeled. An example of practical application of the model is given.


Introduction
In [1]- [4], a model is presented for predicting the sample distribution function of a non-stationary time series over a certain horizon determined by the level of non-stationary series. The non-stationarity level is a special statistic that is collected from end-to-end samples of a given length, in the form of a distribution of distances between sample distributions in the C norm. The result of these works was the creation of a software package that generates an ensemble of time series trajectories, the distribution of which evolves in accordance with a kinetic equation that preserves the normalization and meets the observed properties of the series: preserving the trend or changing it to the opposite. The time in these works was considered to be the sequence number of the event, i.e. the observation of a random process was carried out at constant intervals.
In practice, there are often situations where time intervals themselves are a random process. This is the specifics of Queuing systems, a special case of which is the dynamics of exchange transactions [5]- [7]. The purchase price of a financial instrument and the time interval between two consecutive acts of sale are two dimensions that characterize this time series. Similar properties are found in the series of durations of telephone or Internet connections, sequences of earthquake magnitudes, polluting emissions in megacities, and other events, the moments of occurrence of which, as well as their values, are random.
Stock market forecasting using time series analysis has been considered by a vast number of research papers. Among the works most related to the topic we should note [8], which analyses point process models that account for the market noise, and various applications of theoretical models [9], [10] towards describing the price movements of financial instruments. Notably, neither of the existing models considers a possibility of nonparametric simulation for the ensemble trajectory analysis of two-dimensional time series (a moment of the transaction -a result of the transaction). The given article proposes an approach to modelling price fluctuation trajectories changing their statistical properties with time.
Traditional time series analysis uses assumptions about the stationarity of the corresponding distribution function (hereinafter referred to as FD). The corresponding methods are described in textbooks on mathematical statistics [11] and books on market analysis methods [12]. These methods include: regression trend selection in the sense of MNC; time series co-integration, which forms a stationary time series (Box-Jenkins, 1972); autoregressive models (Dickey-fuller, 1979).
Adaptive time series models are also considered: multiparametric models of short-term forecasting, in which part of the parameters at each next step in time changes depending on the mismatch of the forecast and the fact (brown, Holt, winters, 1990Holt, winters, -2000, as well as models of weighted moving averages. In the case of non-equidistant time series, we consider QMS models with stationary event flows of various types [13] or, alternatively, systems with double stochasticity [14]. Other stochastic models are also used (A. N. Shiryaev [15] et al., see, for example., [16]), in which the properties of stationary random processes are investigated.
As a result, the results of the analysis of stationary models in practice depend on the sample length and on the current time point. This imposes restrictions on the reliability of the results obtained when testing certain management strategies.
Generating an ensemble of trajectories of a non-equidistant non-stationary time series is thus a practically important task, the solution of which will allow modeling various control functions of the observed random process and optimizing them. This paper presents a software package that implements a time series model with embedding processes of different levels of stationarity.

Method for generating a non-equidistant time series
Generation of a non-equidistant non-stationary time series is based on the following assumptions about the structure of the event flow [5], [6]: -there is a certain period of time, called the period, within which the normalized per unit function of the flow intensity is set; -there is a relatively small part (the first 10-15 % of the period) of the time interval that allows us to estimate the predicted number of events for the period, so that in fact the time series model is built on the remaining part of the period after making the appropriate observations for the start of the process; -we consider a sequence of events with the same values that are most likely (for example, a sequence of absolute price increments of consecutive transactions excluding zero increments), called the "first series"; -the duration of the first series by the number of events is a non-stationary random process; -the sequence of values of other events is considered ("second series"); -the duration of the second series by the number of events is a stationary random process; -the distribution of trend movements over time intervals is a stationary random process, the actual trend is realized by skewing the probability of price increments to take positive or negative values. The assumptions made allow us to build a model of a time series that has properties close to those observed in practice. In particular, in modeling the price movement of individual transactions on the exchange the most likely increase is one point by absolute value. Let's describe the input data for this particular problem.
At the first stage of preparing data for modeling the trajectory of a time series, statistics are collected: -distribution function ( ) the expert selected the trend of price movements for the duration of time general movement of the price trajectory up or down; -probability ± positive and negative price growth on expert selected fragments of trend movements, + + − = 1; -parameter of non-stationary Poisson event flow Λ ( , ) at a time interval Δ ( ) = [ − ; ] inside the period (in relation to stock exchanges this is a single trading session); -distribution function >1 ( ) series of increments, the absolute value of which more than one conditional item, depending on the number events; -joint distribution density =1 ( , ′ ; , ) lengths and their increments ′ for a series of absolute price increments per conditional item on the sample length events in a moment of time . The statistics collected determine the probability ( − , ) number events over time Δ ( ) formula The value entered here ( − , ) called the intensity of the flow in the interval Δ ( ). This is the average number of events over the specified period. It is defined by the formula We believe that events are independent, and the flow is ordinary. We assume the time of aggregation of events to be equal = 1 minutes.
Then set the expected number events on the time horizon time series simulation. It is necessary in order to perform the normalization of the intensity profile Λ ( , ) that's the number of events.
At the next stage from the distribution ( ) the random series of numbers is generated in units of measurement of the time taken in the flow parameter, and ∑ = . ( Condition (3) determines the total number macro-movements up and down and their duration, at each interval the probability + of price movements in a single event up is set and thus the probability of − = 1 − + price movements down is determined.
Random whole numbers are then generated out of distribution (1), that give a number of events during the minute 1 at intervals Δ 1 ( ), where is the current minute number. There is a number of events for this generation (i.e. deals) Next, a sample of numbers ±1 is generated with total length from the piecewise-stationary distribution of probabilities ± according to the random number of macro movements out (3). This sample identifies a price increment sign in a single event. From the density of the distribution function =1 ( , ′ ; , ) by method [3] there are features which are involved in the construction of the Liouville equation to simulate the evolution of distribution =1 ( ; , ) from a time interval Δ 1 ( ) in the interval Δ 1 ( + 1): Thus, from formula (6), non-stationary distributions of the lengths of series of increments per conditional unit are known. Functions =1 ( , ′ ; , ) are calculated in the sliding window, so that their appearance is also affected by the flow parameters selected in the previous stages of the simulation and the lengths of up and down trend intervals.
After the functions =1 ( ; , ) are calculated, samples of lengths are constructed from them as from analogs of general aggregates 1, , 2, , … in such an amount that their sum is equal to the predicted number of transactions from (4): The length of a series of increments per conditional unit is interrupted by a series of increments of large values. Series of the second type, as already mentioned, have a stationary distribution >1 ( ) in length. A random set of integers is generated from this distribution 1, , 2, , … , equal to the series lengths of the specified second type. Further, the lengths of the series , and , alternate until their total length is equal . Then a similar construction begins in the next time interval Δ 1 ( + 1).
The increment signs in all these transactions are determined by a sequence of random signs ±1, which was generated in the previous stages of the simulation.
The generation of a time series with a stationary distribution function is based on the usual algorithm, which is based on the following statement (see, for example, [3]). Let be a random variable with continuous FD ( ).
the elements of the series { } can be calculated. Appeal FD into (8) is possible because of its strict monotony. If FD the series is not stationary, we used a model of the evolution of sample density distribution function (next SDDF), so on a given forecast horizon by length selection forecast data SDDF ( , + ), = 1, 2, … , are constructed. After that a stationary evenly distributed on [0; 1] series of numbers { } length , equal to the forecast horizon. Selective ones FD ( , + ), = 1, 2, … , are also being built according to the model SDDF ( , + ). Let 0 be an initial point in time at which the forecast begins to be built. Then, in subsequent moments of time one of the possible trajectories of a random process for which SDDF changes from ( , 0 ) to ( , 0 + ), is modeled using the formula for the reversal of the corresponding time-local distribution function moving in a sliding window : Thus, a model of the event trajectory for a single trading session is built.

Algorithm for modeling unsteady flow of events
In practice, the flow parameter Λ ( , ) is built directly on observations. Let be a number of events in minute of the day. Then the number of events in a day is and the daily rationed intensity profile is determined by the formula = .
After that, the average number of events (i.e., the actual flow parameter) in the interval Δ ( ) can be entered by building a weighted average daily activity profile for a certain period of time T days'. To do this, enter the average intensity ( ) in j minute and average number of ticks for day. Then the weighted average normalized activity profile is determined: Thus, let the average number of events per day be defined and equal to . Than average number of ticks for interval (minutes) until the time (minutes) is Because the profile ( ) is rationed per unit, it can be considered as a probability of intensity by minutes in a day. Its distribution function is there that's why It defines the event flow model for any moments in time and intervals .

The structure of the software complex
This section contains information about the software package for modeling and calculating statistics for non-stationary non-equidistant time series [16].
1. General information. is based on a method based on the decomposition of the considered time series into stationary and non-stationary components. First, the distribution function of the studied random variable is constructed by its value, and the highest probability is found. Next, we consider sequences of events of two types: those consisting only of the values that have the highest probability, and all the others. For each of the two types of sequences, distributions of these sequences by length are constructed, and then the resulting distributions are tested for stationarity. If one of the components is stationary with the accepted accuracy, then we believe that the filtering has been performed and the model is adequate. The program diagram is shown in Figure 1. (c) Calculation of statistics for a non-stationary marked time series is carried out in 2 stages. On the first one, a matrix of statistics values is built, where the tick number is located on one line, and the number of points for calculation is located on the column. In the next step, assuming that the event flow is ordinary and a number of event moments are described by the Poisson distribution, the resulting tick density mask is used to move from this matrix to the results in terms of moments and time intervals. 4. Input data.
(a) The input data is: -a time series presented in the format of a set of records with values of a random variable; -the time points at which these values were recorded.

Example of a computational experiment
The time series of tick increments of the RTS index is considered. A fragment of the original series is shown in Figure 2.
Distribution of absolute price increases during the trading session is given on Figure 3. One point on the chart corresponds to an increase of 10 points in the RTS index.
The non-stationary index (see [3]) for this series is shown in Figure 4. A series is considered stationary if the index is less than or equal to one. Otherwise, the series is non-stationary at the election of the corresponding lengths.
From Figure 4 it follows that the distribution function of absolute increments of a number of distinct ticks of the RTS becomes stationary at the length of 7 thousand events and then remains stationary. The most noticeable unsteadiness is manifested at the length of 2 thousand ticks. On the one hand, it would be convenient to work with a stationary distribution. However, the sample distribution becomes stationary around the end of the daily trading session, whereas decisions must be made based on data for shorter periods of time, when the distribution is significantly non-stationary.  The most likely is an increase in the price of one conditional point in absolute value, the probability of this event is 0.84. As a result of filtering, the initial tick series of absolute increments is represented as an alternation of two rows -increments by 1 point and other increments. Elements of each of the ranks are integers in the duration of episodes of each type. For series from the duration of the series, the non-stationary indices are considered ( Figure 5).
From the graphs on Figure 5 it can be seen that the non-stationary index of the first row is greater than one in samples up to 10 thousand, while the second row is approximately stationary in almost all samples. This means that the nonstationarity is inherent in the sequence of increments by 1 absolute point, because this series is nonstationally interrupted by the second series, the duration of which is a stationary random process.
Note that a trading day contains an average of 250 thousand ticks, of which about a quarter (i.e. only 60 thousand) are ticks with non-zero increments. Then an average of 4 thousand events of the second type occur per day. As can be seen from Figure 4, at such lengths, the first row does not yet become stationary, but for the analysis of intra-day changes in the distribution function of the first type, this is no longer relevant, because the day has ended. Therefore, it is interesting to model the time series of durations of the first type of series on samples of smaller lengths, for example, on samples of lengths of 1-2 thousand ticks. The quasi-stationary distribution over the duration of series of the second type is shown in Figure 6.
The one unit length of the series of the second type is most likely, the remaining lengths fit into an exponential relationship with determination 0,995: The average length of the series of the second type, as well as the standard deviation, is 1.  The module allows you to test a trading algorithm on an ensemble of nonstationary trajectories and more accurately optimize the parameters of this algorithm compared to testing on a stationary trajectory of a large sample.