Simulation of non-stationary event flow with a nested stationary component
- Authors: Pleshakov R.V.
- Affiliations: Keldysh Institute of Applied Mathematics
- Issue: Vol 28, No 1 (2020)
- Pages: 35-48
- Section: Modeling and Simulation
- URL: https://journals.rudn.ru/miph/article/view/23695
- DOI: https://doi.org/10.22363/2658-4670-2020-28-1-35-48
Abstract
A method is described for constructing an ensemble of time series trajectories with a non-stationary flow of events and a non-stationary empirical distribution of the values of the observed random variable. We consider a special model whose properties are similar to those of some real processes, such as changes in the price of a financial instrument on an exchange. The random process is assumed to be a nesting of two processes, one stationary and one non-stationary: the lengths of series of the most likely event (the most likely price change in a sequence of transactions) form a non-stationary time series, while the lengths of series of other events form a stationary random process. The flow of events is assumed to be a non-stationary Poisson process. A software package that models an ensemble of trajectories of the observed random variable is described. Both the values of the random variable and the times of occurrence of the events are modeled. An example of practical application of the model is given.
Full Text
Introduction
In [1]-[4], a model is presented for predicting the sample distribution function of a non-stationary time series over a horizon determined by the non-stationarity level of the series. The non-stationarity level is a special statistic collected from end-to-end samples of a given length, in the form of a distribution of distances between sample distributions in the C norm. The result of these works was a software package that generates an ensemble of time series trajectories whose distribution evolves according to a kinetic equation that preserves the normalization and reproduces the observed properties of the series: preserving the trend or reversing it. Time in these works was taken to be the sequence number of the event, i.e. the random process was observed at constant intervals.
In practice, the time intervals themselves are often a random process. This is typical of queueing systems, a special case of which is the dynamics of exchange transactions [5]-[7]. The price of a financial instrument and the time interval between two consecutive transactions are the two dimensions that characterize such a time series. Similar properties are found in series of durations of telephone or Internet connections, sequences of earthquake magnitudes, polluting emissions in megacities, and other events whose moments of occurrence, as well as their values, are random.
Stock market forecasting by means of time series analysis has been considered in a vast number of research papers. Among the works most closely related to the topic we note [8], which analyses point process models that account for market noise, and various applications of theoretical models [9], [10] to describing the price movements of financial instruments.
Notably, none of the existing models considers nonparametric simulation for the ensemble trajectory analysis of two-dimensional time series (the moment of a transaction and the result of the transaction). This article proposes an approach to modeling price fluctuation trajectories whose statistical properties change with time.
Traditional time series analysis assumes that the corresponding distribution function (hereinafter FD) is stationary. The corresponding methods are described in textbooks on mathematical statistics [11] and books on market analysis methods [12]. These methods include regression trend selection in the least-squares sense; time series co-integration, which forms a stationary time series (Box-Jenkins, 1972); and autoregressive models (Dickey-Fuller, 1979). Adaptive time series models are also considered: multiparametric models of short-term forecasting, in which some of the parameters change at each time step depending on the mismatch between the forecast and the observed value (Brown, Holt, Winters, 1990-2000), as well as models of weighted moving averages. For non-equidistant time series, queueing system models with stationary event flows of various types [13] or, alternatively, systems with double stochasticity [14] are considered. Other stochastic models are also used (A. N. Shiryaev [15] et al., see, for example, [16]), in which the properties of stationary random processes are investigated.
Consequently, the results of analysing stationary models in practice depend on the sample length and on the current time point. This limits the reliability of the results obtained when testing particular management strategies. Generating an ensemble of trajectories of a non-equidistant non-stationary time series is thus a practically important task, the solution of which makes it possible to model various control functions of the observed random process and to optimize them.
This paper presents a software package that implements a time series model with nested processes of different levels of stationarity.
1. Method for generating a non-equidistant time series
Generation of a non-equidistant non-stationary time series is based on the following assumptions about the structure of the event flow [5], [6]:
- there is a certain time interval, called the period, within which the flow intensity function normalized to unity is set;
- there is a relatively small part of the period (its first 10-15 %) that allows us to estimate the predicted number of events for the period, so that in fact the time series model is built on the remaining part of the period, after the appropriate observations of the start of the process have been made;
- we consider sequences of events with the same, most likely, value (for example, a sequence of absolute price increments of consecutive transactions, excluding zero increments), called the "first series";
- the duration of the first series, measured in the number of events, is a non-stationary random process;
- sequences of the values of other events are also considered (the "second series");
- the duration of the second series, measured in the number of events, is a stationary random process;
- the distribution of trend movements over time intervals is a stationary random process; the actual trend is realized by skewing the probability that price increments take positive or negative values.
These assumptions allow us to build a time series model whose properties are close to those observed in practice. In particular, when modeling the price movement of individual transactions on the exchange, the most likely increment is one point in absolute value. We now describe the input data for this particular problem.
At the first stage of preparing data for modeling the trajectory of a time series, the following statistics are collected:
- the distribution function $F(\tau)$ of the durations $\tau$ of expert-selected trend price movements, i.e. of the time intervals of general movement of the price trajectory up or down;
- the probabilities $p_\pm$ of positive and negative price increments on the expert-selected fragments of trend movements, $p_+ + p_- = 1$;
- the parameter $\Lambda(T, t)$ of the non-stationary Poisson event flow on the time interval $\Delta_T(t) = [t - T; t]$ inside the period (for stock exchanges the period is a single trading session);
- the distribution function $F_{>1}(n)$ of the lengths $n$, measured in the number of events, of series of increments whose absolute value exceeds one conditional point;
- the joint distribution density $f_{=1}(n, n'; N, t)$ of the lengths $n$ and their increments $n'$ for series of absolute price increments of one conditional point, on a sample of $N$ events at time $t$.
The collected statistics determine the probability $P_k(t - T, t)$ of $k$ events on the interval $\Delta_T(t)$ by the formula
$$P_k(t - T, t) = \frac{(\Lambda(T, t))^k}{k!} \exp(-\Lambda(T, t)), \qquad \Lambda(T, t) = \lambda(t - T, t)\, T. \qquad (1)$$
The quantity $\lambda(t - T, t)$ introduced here is called the intensity of the flow on the interval $\Delta_T(t)$. It characterizes the average number of events over the specified interval and is defined by the formula
$$\lambda(t - T, t) = \frac{1}{T} \sum_{k=1}^{\infty} k P_k(t - T, t). \qquad (2)$$
We assume that the events are independent and that the flow is ordinary. The aggregation time of events is taken to be $T = 1$ minute. Next, the expected number $K$ of events on the time horizon of the simulation is set. It is needed in order to normalize the intensity profile $\Lambda(T, t)$ to this number of events.
At the next stage, a random series of durations $\tau_i$ is generated from the distribution $F(\tau)$, in the time units used for the flow parameter, such that their sum equals the simulation time horizon, denoted here by $T_h$:
$$\sum_i \tau_i = T_h. \qquad (3)$$
Condition (3) determines the total number of macro-movements up and down and their durations. On each interval the probability $p_+$ of an upward price movement in a single event is set, which also determines the probability $p_- = 1 - p_+$ of a downward movement.
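The first generation stage described above can be illustrated with a minimal Python sketch. It is not the NSTS implementation (which the paper says is written in C#/C++). The function names `simulate_event_counts` and `simulate_signs`, the flat intensity profile, and the fixed trend durations are assumptions made only for illustration; in the method itself the durations are drawn from the trend duration distribution and the profile is estimated from data.

```python
import numpy as np

def simulate_event_counts(profile, expected_events, rng):
    """Draw per-minute event counts from a non-stationary Poisson flow.

    profile: non-negative intensity weights per minute (hypothetical input);
    expected_events: expected total number of events on the horizon.
    """
    profile = np.asarray(profile, dtype=float)
    profile = profile / profile.sum()      # normalize the profile to unit sum
    lam = expected_events * profile        # Poisson parameter for each minute
    return rng.poisson(lam)

def simulate_signs(counts, durations, p_up, rng):
    """Assign a +1/-1 sign to every event, piecewise by trend segment.

    durations: trend segment lengths in minutes (must sum to len(counts));
    p_up: probability of an upward move inside each segment.
    """
    p_minute = np.repeat(np.asarray(p_up, dtype=float), durations)
    if p_minute.size != len(counts):
        raise ValueError("trend durations must sum to the number of minutes")
    p_event = np.repeat(p_minute, counts)  # one probability per event
    return np.where(rng.random(p_event.size) < p_event, 1, -1)

# Illustrative run: 60 minutes, flat profile, two trend segments.
rng = np.random.default_rng(seed=1)
counts = simulate_event_counts(np.ones(60), expected_events=600, rng=rng)
signs = simulate_signs(counts, durations=[20, 40], p_up=[0.7, 0.3], rng=rng)
```

In this sketch the realized total `counts.sum()` is random and only its expectation equals `expected_events`, which mirrors the normalization of the intensity profile to the expected number of events.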
Random integers $k_t$ are then generated from distribution (1); they give the number of events within one minute on the intervals $\Delta_1(t)$, where $t$ is the current minute number. The number of events (i.e. deals) in this realization is
$$K = \sum_{t=1}^{T_h} k_t, \qquad (4)$$
where $T_h$ is the number of minutes in the simulation horizon. Next, a sample of numbers $\pm 1$ of total length $K$ is generated from the piecewise-stationary distribution of the probabilities $p_\pm$, following the random sequence of macro-movements from (3). This sample determines the sign of the price increment in each single event.
From the density $f_{=1}(n, n'; N, t)$ we obtain, by the method of [3], the functions
$$f_{=1}(n; N, t) = \sum_{n'} f_{=1}(n, n'; N, t), \qquad u(n; N, t)\, f_{=1}(n; N, t) = \sum_{n'} n' f_{=1}(n, n'; N, t), \qquad (5)$$
which enter the Liouville equation that models the evolution of the distribution $f_{=1}(n; N, t)$ from the time interval $\Delta_1(t)$ to the interval $\Delta_1(t + 1)$:
$$f_{=1}(n; N, t + 1) = f_{=1}(n; N, t) + f_{=1}(n - 1; N, t)\, u(n - 1; N, t) - f_{=1}(n; N, t)\, u(n; N, t). \qquad (6)$$
Thus, formula (6) gives the non-stationary distributions of the lengths of series of increments of one conditional unit. The functions $f_{=1}(n, n'; N, t)$ are calculated in a sliding window, so their form is also affected by the flow parameters selected in the previous stages of the simulation and by the lengths of the upward and downward trend intervals.
After the functions $f_{=1}(n; N, t)$ are calculated, samples of lengths $n_{1,t}, n_{2,t}, \dots$ are drawn from them, treating them as analogs of general populations, in such a number that their sum equals the predicted number of transactions from (4):
$$\sum_i n_{i,t} = k_t. \qquad (7)$$
A series of increments of one conditional unit is interrupted by a series of increments of larger values. Series of the second type, as already mentioned, have a stationary length distribution $F_{>1}(n)$. A random set of integers $m_{1,t}, m_{2,t}, \dots$, equal to the lengths of the series of the second type, is generated from this distribution. The lengths $n_{i,t}$ and $m_{i,t}$ then alternate until their total length equals $k_t$. After that, a similar construction begins in the next time interval $\Delta_1(t + 1)$.
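The evolution step and the alternation of the two kinds of series can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the author's code: the names `liouville_step` and `alternate_runs` are hypothetical, the velocity values are assumed to lie in [0, 1] so that the updated density stays non-negative, the flux out of the last length bin is set to zero so that normalization is preserved on a finite grid, and the last run is truncated so that the total matches the required number of events exactly.

```python
import numpy as np

def liouville_step(f, u):
    """One step of the discrete Liouville update for the run-length density.

    f[i] is the probability of length i + 1, u[i] the matching velocity.
    Assumption: 0 <= u <= 1, and nothing flows out of the last bin.
    """
    f = np.asarray(f, dtype=float)
    u = np.asarray(u, dtype=float)
    flux = f * u                 # mass leaving each length bin
    flux[-1] = 0.0               # closed boundary keeps the sum equal to one
    f_next = f - flux            # outflow from bin n
    f_next[1:] += flux[:-1]      # inflow from bin n - 1
    return f_next

def alternate_runs(f_first, f_second, total, rng):
    """Alternate type I and type II run lengths until `total` events are covered."""
    p1 = np.asarray(f_first, dtype=float)
    p2 = np.asarray(f_second, dtype=float)
    p1, p2 = p1 / p1.sum(), p2 / p2.sum()
    lengths1 = np.arange(1, p1.size + 1)
    lengths2 = np.arange(1, p2.size + 1)
    runs, covered, kind = [], 0, 1
    while covered < total:
        if kind == 1:
            length = int(rng.choice(lengths1, p=p1))
        else:
            length = int(rng.choice(lengths2, p=p2))
        length = min(length, total - covered)   # truncate the final run
        runs.append((kind, length))
        covered += length
        kind = 2 if kind == 1 else 1
    return runs

# Illustrative values only.
f_next = liouville_step([0.5, 0.3, 0.2], [0.2, 0.1, 0.5])
runs = alternate_runs(f_next, [0.8, 0.2], total=50, rng=np.random.default_rng(0))
```

Each element of `runs` is a pair (series type, length); expanding it and multiplying by the previously generated signs would give the increments of one minute of the simulated trajectory.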
The increment signs in all these transactions are determined by the sequence of random signs $\pm 1$ generated in the previous stages of the simulation.
The generation of a time series with a stationary distribution function is based on the standard algorithm, which rests on the following statement (see, for example, [3]). Let $\xi$ be a random variable with a continuous FD $F(x)$. Then the random variable $\eta = F(\xi)$ is uniformly distributed on [0; 1]. Therefore, at the first step we generate a sequence of numbers $\{\eta_k\}$ uniformly distributed on [0; 1], and then the elements of the series $\{\xi_k\}$ are calculated by the formula
$$\eta_k = F(\xi_k), \qquad \xi_k = F^{-1}(\eta_k). \qquad (8)$$
The inversion of the FD in (8) is possible because the FD is strictly monotonic.
If the FD of the series is not stationary, we use a model of the evolution of the sample distribution density function (hereinafter SDDF): for a sample length $N$ and a given forecast horizon $h$, the forecast SDDFs $f_N(x, t + k)$, $k = 1, 2, \dots, h$, are constructed. Then a stationary series of numbers $\{\eta_k\}$ uniformly distributed on [0; 1] is generated, of length $h$ equal to the forecast horizon. The sample FDs $F_N(x, t + k)$, $k = 1, 2, \dots, h$, are built from the model SDDFs $f_N(x, t + k)$. Let $t_0$ be the initial moment at which the forecast starts. Then, at subsequent moments, one of the possible trajectories of the random process for which the SDDF changes from $f_N(x, t_0)$ to $f_N(x, t_0 + h)$ is modeled by inverting the corresponding time-local distribution function moving in a sliding window of length $N$:
$$\xi_k = F_N^{-1}(\eta_k, t_0 + k). \qquad (9)$$
Thus, a model of the event trajectory for a single trading session is built.
2. Algorithm for modeling a non-stationary flow of events
In practice, the flow parameter $\Lambda(T, t)$ is built directly from observations. Let $k_j$ be the number of events in minute $j$ of the day. Then the number of events in a day is
$$K = \sum_{j=1}^{1440} k_j, \qquad (10)$$
and the daily normalized intensity profile is determined by the formula
$$\rho_j = \frac{k_j}{K}. \qquad (11)$$
After that, the average number of events (i.e. the actual flow parameter) on the interval $\Delta_T(t)$ can be introduced by building a weighted average daily activity profile over a certain period of $T'$ days. To do this, we introduce the average intensity $\bar{k}(j)$ in minute $j$ and the average number of ticks per day $\bar{K}$. Then the weighted average normalized activity profile is
$$\rho(j) = \frac{\bar{k}(j)}{\bar{K}}. \qquad (12)$$
Thus, let the average number of events per day be given and equal to $\bar{K}$. Then the average number of ticks on an interval of $T$ minutes ending at time $t$ (in minutes) is
$$\Lambda(T, t) = \sum_{j=1}^{T} \bar{k}(t - j + 1) = \bar{K} \sum_{j=1}^{T} \rho(t - j + 1). \qquad (13)$$
Because the profile $\rho(j)$ is normalized to unity, it can be regarded as a probability distribution of the intensity over the minutes of the day. Its distribution function is
$$R(t) = \sum_{j=1}^{t} \rho(j), \qquad (14)$$
and therefore
$$\Lambda(T, t) = \bar{K} \cdot (R(t) - R(t - T)) = \sum_{j=0}^{T-1} \Lambda(1, t - j). \qquad (15)$$
This defines the event flow model for any moments of time $t$ and intervals $T$.
3. The structure of the software complex
This section contains information about the software package for modeling and calculating statistics for non-stationary non-equidistant time series [16].
1. General information.
   1. Program name: "Module for modeling and calculating statistics for non-stationary non-equidistant time series" (NSTS).
   2. The complex requires .NET 4.0.
   3. The system is written in the C# and C++ programming languages.
2. Functional purpose of the complex.
   1. The complex is designed to build a set of non-stationary time series that have the properties characteristic of a given series or set of series. It also calculates standard sample statistics for non-stationary time series with a random distribution of the time intervals between consecutive events.
   2. The NSTS module implements the following main functions:
      - generating a specified number of non-stationary time series with a non-stationary event flow;
      - calculating sample statistics for a set of time series;
      - calculating sample statistics for functionals defined along the trajectory of such two-dimensional time series.
   3. There are no functional restrictions on the described operations.
3. Description of the logical structure.
   1. The complex consists of the following main components:
      - validation of the input data;
      - calculation of the tick density mask as a function of time;
      - calculation of the distribution density of tick increments;
      - calculation of the distribution density of the durations of consecutive ticks with the most likely increment value;
      - a block of statistical functions;
      - a block of financial mathematics functions for building statistics;
      - construction of a stationary series for the most likely absolute increment;
      - construction of non-stationary series based on tick increment distribution masks;
      - combination of the stationary and non-stationary components.
   2. The non-equidistant time series generation module is based on the decomposition of the considered time series into stationary and non-stationary components. First, the distribution function of the values of the studied random variable is constructed, and the value with the highest probability is found. Next, we consider sequences of events of two types: those consisting only of the most probable value, and all the others. For each of the two types, the distribution of sequence lengths is constructed, and the resulting distributions are tested for stationarity. If one of the components is stationary within the accepted accuracy, we consider that the filtering has been performed and the model is adequate. The program diagram is shown in Figure 1.
Figure 1. Diagram of the time series generation module
   3. Statistics for a non-stationary marked time series are calculated in two stages. In the first stage, a matrix of statistic values is built, in which the rows correspond to the tick number and the columns to the number of points used in the calculation. In the second stage, assuming that the event flow is ordinary and that the number of events is described by the Poisson distribution, the resulting tick density mask is used to convert this matrix into results expressed in terms of moments of time and time intervals.
4. Input data.
   1. The input data are:
      - a time series presented as a set of records with the values of a random variable;
      - the time points at which these values were recorded.
   2. The settings block for generating an ensemble of series includes the following options:
      - the target number of series;
      - the time interval for generation;
      - the distribution density of tick increments;
      - the level of trend slope for the target series.
5. Output data.
   1. The output of the generation module is a set of time series.
   2. The output of the statistical calculation unit is a series of calculated values and the moments of time at which these values were obtained.
   3. The output also contains the distributions of statistics calculated over the full file as functions of the sample length, such as:
      - volatility;
      - autocorrelation;
      - the Hurst exponent.
4. Example of a computational experiment
The time series of tick increments of the RTS index is considered. A fragment of the original series is shown in Figure 2. The distribution of absolute price increments during the trading session is given in Figure 3. One point on the chart corresponds to an increment of 10 points in the RTS index. The non-stationarity index (see [3]) for this series is shown in Figure 4. A series is considered stationary if the index is less than or equal to one. Otherwise, the series is non-stationary on samples of the corresponding lengths.
From Figure 4 it follows that the distribution function of the absolute increments of the series of distinct RTS ticks becomes stationary at a length of 7 thousand events and then remains stationary. The non-stationarity is most noticeable at a length of 2 thousand ticks. On the one hand, it would be convenient to work with a stationary distribution. However, the sample distribution becomes stationary only around the end of the daily trading session, whereas decisions must be made on the basis of data for shorter periods of time, when the distribution is significantly non-stationary.
Figure 2. Fragment of the RTS index series
Figure 3. SDDF of the absolute increments in conditional points
The most likely price increment is one conditional point in absolute value; the probability of this event is 0.84. As a result of filtering, the initial tick series of absolute increments is represented as an alternation of two series: increments of 1 point and other increments. The elements of each series are integers equal to the durations of the episodes of each type. For these series of durations, the non-stationarity indices are computed (Figure 5).
The graphs in Figure 5 show that the non-stationarity index of the first series is greater than one on samples of up to 10 thousand, while the second series is approximately stationary on almost all samples. This means that the non-stationarity is inherent in the sequence of increments of 1 absolute point, because this series is interrupted in a non-stationary manner by the second series, whose duration is a stationary random process. Note that a trading day contains an average of 250 thousand ticks, of which about a quarter (i.e. only 60 thousand) are ticks with non-zero increments. Then an average of 4 thousand events of the second type occur per day.
Figure 4. Non-stationarity index of the absolute increments of the series of distinct RTS ticks
Figure 5. Non-stationarity index of the series of types I and II
As can be seen from Figure 4, at such lengths the first series has not yet become stationary, but for the analysis of intra-day changes in the distribution function of the first type this is no longer relevant, because the day has ended. Therefore, it is of interest to model the time series of durations of the first type of series on samples of smaller lengths, for example on samples of 1-2 thousand ticks.
The quasi-stationary distribution of the durations of series of the second type is shown in Figure 6. A length of one unit is the most likely for series of the second type; the remaining lengths fit an exponential dependence with a coefficient of determination of 0.995: $f_2(n) = 0.76\, e^{-0.85 n}$, $n \geqslant 2$. The average length of the series of the second type, as well as its standard deviation, is 1.
Figure 6. Distribution of series of the second type by duration
Figure 7 shows an example of how the time series generation module works. The bold line indicates the source series, and the remaining curves are the results of the program.
Figure 7. Example of the operation of the time series generation module
The module allows a trading algorithm to be tested on an ensemble of non-stationary trajectories and its parameters to be optimized more accurately than by testing on a single stationary trajectory of a large sample.
Conclusion
The described software package makes it possible to model a non-stationary queueing system in which both the event flow and the value of the random variable itself are non-stationary processes. In addition to exchange series, the objects of modeling can be actual queueing systems in which the flow of phone calls or of requests to visit a site and download certain information has non-stationary characteristics. For such systems the complex makes it possible to optimize a control functional. This functional can be the algorithm of a trading system on the exchange, the blocking of certain requests to a site, etc.
In addition, the complex makes it possible to collect complex nonlinear statistics on an ensemble of trajectories, which cannot be obtained in practice from a single realization of a non-stationary time series.
About the authors
Ruslan V. Pleshakov
Keldysh Institute of Applied Mathematics
Author for correspondence.
Email: ruslanplkv@gmail.com
PhD student
4, Miusskaya Sq., Moscow, 125047, Russian Federation
References
- A. D. Bosov and Y. N. Orlov, “Kinetic and hydrodynamic approach to the non-stationary time series forecasting on the base of Fokker-Planck equation [Kinetiko-gidrodinamicheskiy podkhod k prognozirovaniyu nestatsionarnykh vremennykh ryadov na osnove uravneniya Fokkera-Planka],” in Proceedings of MIPT [Trudy MFTI], 4. 2012, vol. 4, pp. 134-140, in Russian.
- Y. N. Orlov and S. L. Fedorov, “Modeling and statistical analysis of functionals set on samples from a non-stationary time series,” in Preprints of IPM im. M. V. Keldysh, 43. 2014, in Russian.
- Y. N. Orlov, Kinetic methods for studying non-stationary time series [Kineticheskiye metody issledovaniya nestatsionarnykh vremennykh ryadov]. Moscow: MIPT, 2014, in Russian.
- Y. N. Orlov and S. L. Fedorov, Methods of numerical modeling of nonstationary random walk processes [Metody chislennogo modelirovaniya protsessov nestatsionarnogo sluchaynogo bluzhdaniya]. Moscow: MIPT, 2016, in Russian.
- Y. N. Orlov and K. P. Osminin, “Sample distribution function construction for non-stationary time-series forecasting [Postroyeniye vyborochnoy funktsii raspredeleniya dlya prognozirovaniya nestatsionarnogo vremennogo ryada],” Mathematical modeling, no. 9, pp. 23-33, 2008, in Russian.
- D. S. Kirillov, O. V. Korob, N. A. Mitin, Y. N. Orlov, and R. V. Pleshakov, “On the stationary distributions of the Hurst indicator for the non-stationary marked time series [Raspredeleniya pokazatelya Hurst nestatsionarnogo markirovannogo vremennogo ryada],” in Preprints of IPM im. M. V. Keldysh, 11. 2013, in Russian.
- M. H. Numan Elsheikh, D. O. Ogun, Y. N. Orlov, R. V. Pleshakov, and V. Z. Sakbaev, “Averaging of random semigroups and quantization [Usredneniye sluchaynykh polugrupp i neodnoznachnost’ kvantovaniya gamil’tonovykh sistem],” in Preprints of IPM im. M. V. Keldysh, 19. 2014, in Russian.
- E. Bacry, S. Delattre, M. Hoffmann, and J. F. Muzy, “Modeling microstructure noise with mutually exciting point processes,” Quantitative Finance, vol. 13, no. 1, pp. 65-77, 2013.
- G. Bhardwaj and N. R. Swanson, “An empirical investigation of the usefulness of ARFIMA models for predicting macroeconomic and financial time series,” Journal of Econometrics, vol. 131, pp. 539-578, 2006. doi: 10.1016/j.jeconom.2005.01.016.
- P. Embrechts, T. Liniger, and L. Lin, “Multivariate Hawkes Processes: an Application to Financial Data,” Journal of Applied Probability, vol. 48A, pp. 367-378, 2011. doi: 10.1239/jap/1318940477.
- N. S. Kremer and B. A. Putko, Econometrica [Ekonometrika]. Moscow: UNITY-DANA, 2005, in Russian.
- D. E. Bestens, V. M. van der Berth, and D. Wood, Neural networks and financial markets: decision-making in trading operations [Neyronnyye seti i finansovyye rynki: prinyatiye resheniy v torgovykh operatsiyakh]. Moscow: TVP, 1998, in Russian.
- A. Zeifman, A. Korotysheva, K. Kiseleva, V. Korolev, and S. Shorgin, “On the bounds of the rate of convergence and stability for some queueing models [Ob otsenkakh skorosti skhodimosti i ustoychivosti dlya nekotorykh modeley massovogo obsluzhivaniya],” Informatics and Applications, vol. 8, no. 3, pp. 19-27, 2014, in Russian. doi: 10.14357/19922264140303.
- V. I. Khimenko, “Scatterplots in Analysis of Random Streams of Events [Diagrammy rasseyaniya v analize sluchaynykh potokov sobytiy],” Informatsionno-upravlyayushchiye sistemy, no. 4, pp. 85-93, 2016, in Russian. doi: 10.15217/issn1684-8853.2016.4.85.
- M. V. Zhitlukhin, A. A. Muravlyov, and A. N. Shiryaev, “On confidence intervals for Brownian motion changepoint times,” Russian Mathematical Surveys, vol. 71, no. 1, pp. 159-160, 2016. doi: 10.1070/RM9702.
- R. V. Pleshakov, “NSTS Software package for modeling non-stationary non-equidistant time series,” 2018.