Study of the impact of the COVID-19 pandemic on international air transportation

Time Series Forecasting has always been a very important area of research in many domains because many different types of data are stored as time series. Given the growing availability of data and computing power in recent years, Deep Learning has become a fundamental part of the new generation of Time Series Forecasting models, obtaining excellent results. As different time series problems are studied in many different fields, a large number of new architectures have been developed in recent years. This has also been simplified by the growing availability of open source frameworks, which make the development of new custom network components easier and faster. In this paper three different Deep Learning architectures for Time Series Forecasting are presented: Recurrent Neural Networks (RNNs), the most classical and widely used architecture for Time Series Forecasting problems; Long Short-Term Memory (LSTM) networks, an evolution of RNNs developed to overcome the vanishing gradient problem; and Gated Recurrent Units (GRU), another evolution of RNNs, similar to LSTM. The article is devoted to modeling and forecasting the cost of international air transportation in a pandemic using deep learning methods. The author builds time series models of the American Airlines (AAL) stock prices for a selected period using LSTM, GRU and RNN recurrent neural network models and compares their forecast accuracy.

Introduction
In 2020, there was a significant drop in the quotations of American Airlines (AAL) associated with the COVID-19 pandemic and a record decrease in the number of air trips worldwide. The generally accepted econometric methods of modeling and forecasting financial time series turned out to be ineffective in these conditions even for short-term forecasts [1], [2]. In the present paper, methods for modeling and forecasting international air traffic during the 2019-2020 pandemic are explored using recurrent neural networks with different architectures. As the object of research, the daily quotes of the American company AAL, traded on the NASDAQ exchange, were selected; data from September 27, 2005 to September 30, 2020 were taken from the information portal Yahoo Finance [3]. The shares of this US company were selected due to its leading position in the international air transportation market and the high trading turnover of its shares on the NASDAQ exchange, which in turn provides liquidity and reflects investor interest in this exchange commodity [4]. Using the example of the value of AAL shares, we will try to build a reliable forecast using deep learning methods, in particular recurrent neural networks [5]-[7].

Pre-processing of input data
As input data for the neural network model, we take a sequence consisting of the following values:
- Open_{t-1}: the opening price for the previous trading day;
- Low_{t-1}: the minimum price for the previous trading day;
- High_{t-1}: the maximum price for the previous trading day;
- Volume_{t-1}: the number of shares sold and bought during the previous trading day;
- Close_{t-1}: the closing price for the previous trading day.
Based on these inputs, the neural networks generate an output value that can be interpreted as the predicted closing quotation for the current day. For the correct operation of the neural networks, the data must be normalized to the range [0, 1], and training and test samples must be created from the initial data set of 3636 observations in the ratio 80:20. Thus, 2909 observations for the training sample and 727 observations for the test sample were obtained. Table 1 shows a fragment of the input data.
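The preprocessing steps above (min-max normalization to [0, 1] and a chronological 80:20 split) can be sketched as follows. The synthetic array is a stand-in for the real AAL OHLCV data, with columns ordered as in the list above; note that lagging the features by one day loses one row, so the counts here differ by one from the 2909/727 split in the text.

```python
import numpy as np

# Hypothetical OHLCV data: rows are trading days, columns are
# Open, Low, High, Volume, Close (stand-ins for the real AAL data).
rng = np.random.default_rng(42)
data = rng.uniform(10.0, 60.0, size=(3636, 5))

# Min-max normalization to [0, 1], column by column.
col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min)

# Inputs are the previous day's values; the target is today's Close
# (column index 4 in this layout).
X = normalized[:-1]       # features at t-1
y = normalized[1:, 4]     # Close at t

# Chronological 80:20 split (no shuffling for time series).
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(X_train.shape, X_test.shape)  # (2908, 5) (727, 5)
```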
The Date and Adj Close columns must be removed from the obtained data. Table 2 presents descriptive statistics of the input data: the average closing price is $27.13 and the standard deviation is $16.74.
To study the statistical properties of the data further, let us build scatter diagrams of the profitability of the opening price against the closing price, as well as of the profitability of the closing price shifted by one lag against the closing price today. To calculate the profitability, we will use the following formula [8]-[10]:

r_t = (P_t - P_{t-1}) / P_{t-1},

where r_t is the profitability, P_{t-1} is the previous observation value, and P_t is the value for the current time period.
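A minimal sketch of this profitability calculation, using hypothetical prices in place of the real quotes:

```python
import numpy as np

# Daily closing prices (hypothetical values standing in for AAL quotes).
close = np.array([50.0, 51.0, 49.47, 52.0])

# Profitability (simple return): r_t = (P_t - P_{t-1}) / P_{t-1}
returns = (close[1:] - close[:-1]) / close[:-1]

print(returns)  # first value: (51 - 50) / 50 = 0.02
```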
The scatter diagram of the profitability of the opening and closing prices is shown in figure 1.
Figure 1 shows that there is no correlation between the variables under consideration. Next, we construct a histogram of the distribution of the profitability of closing prices (figure 2). Most of the observations lie in the range from -0.1 to 0.1; that is, in most observations the price changed by between -10% and 10% in one period. To test the hypothesis that the distribution of the closing price profitability is a special case of the normal distribution, we use the Shapiro-Wilk and Jarque-Bera tests. The Jarque-Bera test rejected the null hypothesis at a significance level of α = 0.05, and the result of the Shapiro-Wilk test coincided with it. This means that the profitability of closing prices has a distribution different from the normal one. To check the stationarity of the profitability series, we use the Dickey-Fuller test, which is one of the unit root tests. A time series y_t has a unit root if its first differences form a stationary series, i.e. a series whose properties do not change over time. This condition is written as y_t ∼ I(1) if the series of first differences Δy_t = y_t - y_{t-1} is stationary, Δy_t ∼ I(0) [11]. If a time series has a unit root, then it is not stationary but is an integrated time series of the first order [12]-[14]. As one would expect, the observed profitability series has no unit root and is therefore stationary. For convenience, the input data are normalized; the results are presented in table 3. We now turn to the description of the main models of recurrent neural networks and their application to the analysis of financial time series.
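Both normality tests are available in scipy.stats; a sketch on synthetic heavy-tailed returns standing in for the real profitability series:

```python
import numpy as np
from scipy import stats

# Hypothetical daily returns standing in for the AAL closing-price
# profitability series (Student-t with 3 d.o.f. is heavy-tailed,
# so clearly non-normal).
rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=3000) * 0.02

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution.
sw = stats.shapiro(returns)

# Jarque-Bera test: H0 = skewness and kurtosis match a normal distribution.
jb = stats.jarque_bera(returns)

alpha = 0.05
sw_p, jb_p = sw.pvalue, jb.pvalue
print("Shapiro-Wilk rejects normality:", sw_p < alpha)
print("Jarque-Bera rejects normality:", jb_p < alpha)
```

The Dickey-Fuller unit root test is not in scipy; it is available separately, e.g. as `adfuller` in the statsmodels package.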

Basic recurrent neural network
The architecture of the proposed basic recurrent neural network (RNN) is as follows. A matrix with a dimension of 1 by 5 is fed to the input of the neural network, then the values are passed to a recurrent layer with 25 neurons, after which the operation is repeated and the values are again fed to a recurrent layer with 25 neurons. At the penultimate step, the values are passed to an aggregating layer of 5 neurons, and the result is output as the predicted value. The hidden layers use the hyperbolic tangent as the activation function. This activation function is nonlinear, which makes stacking layers meaningful: a composition of purely linear layers would itself be a linear function, whereas nonlinear activations let each additional layer add expressive power. Another advantage of the hyperbolic tangent is that it is a smooth, non-binary function taking values in the range (-1, 1), which prevents the network from being overloaded by large values. The hyperbolic tangent is very similar to the sigmoid, with the difference that it has a larger gradient. On the aggregating layer, a linear function is used as the activation function. The proposed neural network model and all procedures for its training and testing were implemented with the Keras library for the Python programming language [15].
The mean squared error (MSE) is used as the loss function, and the optimization is performed using the Adam algorithm. The epochs parameter of the fit function reflects how many times the sample is passed through the neural network; in this case epochs = 150. The batch_size parameter is responsible for the size of the so-called batch: when the training sample is too large, it is divided into parts called batches. Thus, the training set with 2909 observations is divided into 291 batches of size 10, except for the last one, which contains 9 observations, so 291 iterations are required to pass one epoch.
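The architecture and training settings described above can be sketched in Keras roughly as follows. The final Dense(1) output layer is an assumption, since the text only says that the aggregating layer's result is output as a single predicted value:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(1, 5)),                       # 1-by-5 input matrix
    layers.SimpleRNN(25, activation="tanh",
                     return_sequences=True),         # first recurrent layer
    layers.SimpleRNN(25, activation="tanh"),         # second recurrent layer
    layers.Dense(5, activation="linear"),            # aggregating layer
    layers.Dense(1, activation="linear"),            # predicted closing price
])

# MSE loss and the Adam optimizer, as in the text.
model.compile(optimizer="adam", loss="mse")

# Shape check on dummy inputs; real training would use
# model.fit(X_train, y_train, epochs=150, batch_size=10).
pred = model.predict(np.zeros((4, 1, 5)), verbose=0)
print(pred.shape)  # (4, 1)
```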
Due to the tendency of recurrent neural networks to overfit, it is necessary to apply regularization algorithms [10], [16]. Here the early stopping method is used, which tracks the value of the loss: if over 20 epochs the improvement is less than 0.000002, training of the model is stopped. The graph of the loss function on the training sample is shown in figure 3. After training and validating the neural network, we construct a forecast of closing prices for the test sample. For better visual clarity, the predicted values are shifted ten units up. The forecast for the last 50 observations of the test sample is displayed for closer visual examination (figure 4). It can be seen from the figure that the neural network predicts closing prices quite closely.
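The early stopping rule described above maps onto the standard Keras EarlyStopping callback roughly as follows (monitoring the training loss, with the threshold and patience values from the text):

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="loss",        # track the training loss
    min_delta=0.000002,    # minimum change that counts as an improvement
    patience=20,           # stop after 20 epochs without improvement
)

# Passed to training as:
# model.fit(X_train, y_train, epochs=150, batch_size=10,
#           callbacks=[early_stop])
print(early_stop.patience)
```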

Neural network with a gated recurrent unit
A recurrent neural network based on a cell architecture with a gated recurrent unit (GRU) repeats the structure of the basic RNN model. The input layer takes the values of a matrix with a dimension of 1 by 5. The values then pass sequentially through two recurrent layers with 25 neurons each and the hyperbolic tangent as the activation function. The aggregating layer has 5 neurons with a linear activation function; after processing by the last layer, the predicted value is output. It should be noted that the default activation function for layers with the GRU architecture is the hyperbolic tangent [16], [17]. The loss plot for the GRU recurrent neural network is shown in figure 5.

Neural network with long short-term memory (LSTM)
Just like the previous networks, the recurrent neural network with long short-term memory (LSTM) repeats the same construction. An input that accepts a 1-by-5 matrix transmits information to two recurrent layers with 25 neurons per layer and the hyperbolic tangent as the activation function. Then an aggregating layer of five neurons with a linear activation function passes the value to the output layer.
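Since all three models share the same layout and differ only in the recurrent cell, they can be sketched with a single helper. This is a simplification under the same assumption as before (a final Dense(1) output layer):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(cell):
    """Build the shared architecture with the given recurrent layer class."""
    return keras.Sequential([
        keras.Input(shape=(1, 5)),
        cell(25, activation="tanh", return_sequences=True),
        cell(25, activation="tanh"),
        layers.Dense(5, activation="linear"),
        layers.Dense(1, activation="linear"),
    ])

rnn = build_model(layers.SimpleRNN)
gru = build_model(layers.GRU)
lstm = build_model(layers.LSTM)

# Gated cells carry extra gate weights (3 gates for GRU, 4 for LSTM),
# so they have more trainable parameters than the basic RNN.
print(rnn.count_params(), gru.count_params(), lstm.count_params())
```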
The closing price prediction plot calculated using the recurrent neural network with the LSTM architecture is shown in figure 8; the forecasted values are shifted ten points up for readability. Based on the plot, we can conclude that this neural network predicts the required values quite accurately.
The forecast of the closing price for the entire LSTM test sample and for the last 50 values is shown in figures 9 and 10 respectively. For this recurrent neural network, MSE = 0.8508 and R² = 0.99.
Let us display a comparative plot of losses during training of the considered neural network architectures (figure 11). The basic RNN demonstrated the highest loss values on the training set: except for individual random epochs, its loss was greater than that of the others. The LSTM and GRU recurrent neural networks have close loss values on the training set. It is worth noting that the early stopping algorithm triggered for all types of recurrent neural networks: for the RNN model it stopped training at 71 epochs, for GRU at 72, and the smallest number of epochs, 62, was required to train the neural network built with the LSTM architecture.
Table 4 shows the values of the mean squared error and the coefficient of determination R² for all constructed neural networks.
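The two reported metrics can be computed directly; a minimal sketch with hypothetical actual and predicted prices:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted closing prices.
y_true = np.array([20.0, 21.0, 19.5, 22.0, 23.0])
y_pred = np.array([20.2, 20.8, 19.9, 21.7, 23.1])

# MSE ≈ 0.068, R² ≈ 0.9585 for these values.
print(round(mse(y_true, y_pred), 4), round(r_squared(y_true, y_pred), 4))
```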

Discussion of results of computer experiments
In the process of investigating the impact of the COVID-19 pandemic on AAL stock quotes, recurrent neural network models with various architectures were built: networks with long short-term memory (LSTM) cells, networks with gated recurrent unit (GRU) cells, and a basic recurrent network. The constructed models were analyzed, and their results on the training and test data were compared. The analysis found that the neural network with long short-term memory (LSTM) cells coped best with the task of predicting the data under study.
Summing up, all networks showed a satisfactory result, but they predict the price with a certain delay, which may entail unplanned financial losses. In view of this, it can be concluded that these models are not suitable for carrying out short-term operations in the financial market, cannot serve as an indicator that helps to improve the efficiency of a trading strategy, and cannot be used for risk management tasks.

Conclusion
The purpose of the article was to investigate the quality of various neural network models that predict the closing price of a stock. In the course of the study, sufficiently accurate results of modeling and forecasting financial time series were obtained for the daily closing prices of shares of the American airline AAL, which confirmed the effectiveness of the proposed deep neural network models. However, in the context of the practical application of the developed models, it is necessary to take into account the time delays in obtaining forecast results, as well as the horizon of financial forecasting.