Time Series Analysis

adaskin25@gmail.com

Abstract—Because of the rotational components on quantum circuits, some quantum neural networks based on variational circuits can be considered equivalent to the classical Fourier networks. In addition, they can be used to predict the Fourier coefficients of continuous functions. Time series data indicates the state of a variable in time. Since some time series data can also be considered as continuous functions, we can expect quantum machine learning models to perform many data analysis tasks successfully on time series data. Therefore, it is important to investigate new quantum logics for temporal data processing and to analyze intrinsic relationships of data on quantum computers. In this paper, we go through the quantum analogues of classical data preprocessing and forecasting with ARIMA models by using simple quantum operators that require only a few quantum gates. Then we discuss future directions and some of the tools/algorithms that can be used for temporal data analysis on quantum computers.

Index Terms—quantum machine learning, quantum algorithms, quantum optimization, quantum time series, quantum temporal data

I. INTRODUCTION

It is shown that nature itself is explained better by quantum physics than by classical physics. The idea of quantum computers was put forward by Feynman [1] and other pioneers [2] so as to simulate and understand intrinsic physical properties of nature in a better way: Simulating quantum systems with many particles is difficult on classical computers because the computational complexity in general grows exponentially with the number of involved particles. On the other hand, quantum computers are based on qubits implemented by quantum particles, which in turn indicates that, in principle, we can increase the computational power of quantum computers exponentially by adding controllable new qubits. Therefore, the difficulty of simulating nature on classical computers becomes the main advantage of quantum computers. Unfortunately, since the input/output of quantum computers are still classical ones and zeros, this necessitates a state preparation that involves classical processing of the data and a measurement that requires many repetitions of the same computation. Therefore, the complexity boundaries in classical computation mostly remain the same for quantum computing. Nonetheless, it is still possible to design more efficient algorithms by employing quantum entanglement and superposition (quantum parallelism): Some known examples are Shor's integer factoring algorithm [3], Grover's search algorithm in an unordered list [4], and quantum principal component analysis (PCA) [5].

A. Classical neural networks

The representation theorem indicates that any multivariate continuous function can be represented as a superposition of functions of one variable and addition [6, 7]. An artificial neural network can be considered as a composition of simple nonlinear activation functions [8]: Let $I^n$ be the n-dimensional unit cube $[0, 1]^n$ and $C(I^n)$ be the space of continuous functions on $I^n$. In addition, let $\sigma$ be a sigmoidal function. Given any $f \in C(I^n)$ and $\epsilon > 0$, there is a $G$ of the following form:

$$G(x) = \sum_{j=1}^{n} \alpha_j \, \sigma(w_j^T x + \theta_j) \qquad (1)$$

for which $|G(x) - f(x)| < \epsilon$.

Gallant et al. [9] showed a single feed-forward neural network where the output is squashed by using a cosine activation function: i.e., by considering $\sigma$ as cosine, sine, or exponential (i.e., in the form $e^{i w_j^T x + \theta_j}$) functions, one can approximate the Fourier transform of a continuous function. These networks are often called Fourier neural networks (FNNs). On the other hand, Fourier analysis of data also sheds light on the generalization performance of deep neural networks [10].
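To make the approximation in Eq. (1) concrete, the following minimal sketch fits only the output weights $\alpha_j$ of a one-layer cosine-activation network with fixed random frequencies $w_j$ and phases $\theta_j$ to samples of a continuous function by least squares; the target function, the number of hidden units, and the frequency range are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target continuous function on [0, 1] (an arbitrary choice for the demo).
f = lambda x: np.sin(2 * np.pi * x) + 0.3 * np.cos(6 * np.pi * x)

# One hidden layer with a cosine activation: G(x) = sum_j alpha_j * cos(w_j * x + theta_j).
m = 40                                   # number of hidden units
w = rng.uniform(0, 8 * np.pi, m)         # fixed random frequencies
theta = rng.uniform(0, 2 * np.pi, m)     # fixed random phases

x = np.linspace(0, 1, 200)
Phi = np.cos(np.outer(x, w) + theta)     # design matrix of cosine features

# Fit only the output weights alpha_j by least squares.
alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)

G = Phi @ alpha
print("max |G(x) - f(x)| on the grid:", np.max(np.abs(G - f(x))))
```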
B. Time series problems

Temporal data indicates the state of a variable in time, which could be collected through a wide variety of monitoring devices. Examples include financial market data; sensor data, which may indicate temperature, humidity, and so on; and medical data collected by different medical monitoring devices. Depending on the collection method, the data can be continuous or discrete. Continuous temporal data sets are generally referred to as time series data [11]. The data mining tasks on time series data include clustering, classification, and forecasting: In general, it is required to specialize the known data analysis algorithms in order to use them in the analysis of time series data: e.g., see the classification with deep learning for time series data [12] and the distance-based time series classification [13]. It is also shown that time series data can be modeled by using Fourier deep neural networks [14], which are briefly explained above.
Fig. 1. General picture for processing temporal data on quantum computers (the measurement step is not shown).
intervals of size k: $[t_1, t_k], [t_{k+1}, t_{2k}]$, and so on. Then, the data points are assigned into the corresponding bins and the mean values of the bins are used for the analyses:

$$y'_{i+1} = \frac{\sum_{r=1}^{k} y_{i \cdot k + r}}{k} \qquad (13)$$

Here, the number of data points is reduced by a factor of k.

Utilizing the Hadamard gates on certain qubits, one can obtain the average of the neighboring states on certain quantum states. After finding the average, we can also disregard the rest of the state by collapsing the quantum state onto the part of the system we are interested in. One example could be as follows:

• Let $|y_i\rangle = (y_{i1}, \dots, y_{id})^T$. We multiply this quantum state by $U_{bin} = I^{\otimes \log_2(d/k)} \otimes H^{\otimes \log_2 k}$, where $H$ is the Hadamard gate:

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (14)$$

If we apply $U_{bin}$ to the quantum state $|y_i\rangle$, then when the first $\log_2 k$ qubits are in the $|0 \dots 0\rangle$ state, we have an equivalent of $y'_{i+1}$, which could be considered the reduced data points.

The classical moving average smoothing uses overlapping bins, which can be easily done in the quantum case by utilizing a quantum operator of the form $H^{\otimes \log_2 k} \otimes I^{\otimes \log_2(d/k)}$.
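A minimal classical simulation of this Hadamard-based binning, assuming amplitude-encoded data and the qubit ordering used below: the amplitudes of the branch in which the within-bin qubits are $|0\dots 0\rangle$ are proportional to the bin means.

```python
import numpy as np

def hadamard_n(q):
    """Return H^{(x) q} as a dense matrix."""
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    H = np.array([[1.0]])
    for _ in range(q):
        H = np.kron(H, H1)
    return H

d, k = 16, 4                              # d data points, bins of size k (both powers of two)
rng = np.random.default_rng(1)
y = rng.normal(size=d)
state = y / np.linalg.norm(y)             # amplitude-encoded data |y>

# U_bin = I^{log2(d/k)} (x) H^{log2 k}; with this ordering the last log2(k)
# qubits enumerate the position inside a bin.
U_bin = np.kron(np.eye(d // k), hadamard_n(int(np.log2(k))))
out = U_bin @ state

# Amplitudes where the within-bin qubits are |0...0> sit at indices 0, k, 2k, ...
quantum_bins = out[::k]                               # proportional to the bin means
classical_bins = state.reshape(-1, k).mean(axis=1)

print(np.allclose(quantum_bins, np.sqrt(k) * classical_bins))  # True
```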
C. Data transformation

In classical data analyses, the temporal data is often made smaller or discretized through data transformation methods such as the discrete wavelet and Fourier transforms (DWT and DFT, respectively).

Similar to the Fourier transform, the DWT represents a data (function) in terms of an orthonormal basis. One of the simplest wavelet transforms is the Haar transform, which can be described by an orthogonal matrix: e.g., the 4 × 4 Haar transform is

$$H_4 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{pmatrix} \qquad (15)$$

Similarly to the Walsh-Hadamard transform, the unnormalized Haar transform matrix can be obtained by the following recurrence:

$$H_{2N} = \begin{pmatrix} H_N \otimes [1, 1] \\ I_N \otimes [1, -1] \end{pmatrix}. \qquad (16)$$

The transformation of an input vector x is described by $y' = H_N x$. From this product, the rows of the Haar matrix allow us to draw different properties of the temporal data: For instance, for $H_4$, the first row takes the average of the input vector, while the second row measures the low frequencies. The rest of the rows measure different parts of the input vector.

In the wavelet transform, y can be represented with the basis vectors that form $H_n$:

$$y = \sum_{i=1}^{q} a_i w_i, \qquad (17)$$

where $w_i$ is the ith basis vector and $a_i$ is the normalized coefficient. The dimension reduction is done by disregarding the lowest $a_i$s.

In quantum computing, since the Haar matrix is orthogonal and can be built from the recurrence that is described by the Kronecker product, it can be implemented as a quantum gate in a similar fashion to the Hadamard gate. Therefore, given $|y\rangle$, one can easily obtain $y'$. The elimination of the lowest parts of $y'$ can be done through either sampling or measuring a few qubits. This leads to a problem of finding the best qubits that give the maximum reduction with the minimum information loss.
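A minimal sketch of the Haar-based reduction, assuming the unnormalized recurrence in Eq. (16) and an arbitrary toy series: the matrix is built recursively, the smallest-magnitude coefficients are dropped, and the vector is reconstructed from the remaining ones (the pseudo-inverse is used because the unnormalized rows are orthogonal but not unit length).

```python
import numpy as np

def haar_unnormalized(n):
    """Unnormalized Haar matrix of size n x n (n a power of two), built from
    the recurrence H_{2N} = [[H_N kron [1, 1]], [I_N kron [1, -1]]]."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        N = H.shape[0]
        top = np.kron(H, np.array([[1.0, 1.0]]))
        bottom = np.kron(np.eye(N), np.array([[1.0, -1.0]]))
        H = np.vstack([top, bottom])
    return H

n = 8
rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=n))        # a toy "time series"

H = haar_unnormalized(n)
coeffs = H @ x                           # wavelet coefficients a_i

# Dimension reduction: keep only the 4 largest-magnitude coefficients.
keep = np.argsort(np.abs(coeffs))[-4:]
truncated = np.zeros_like(coeffs)
truncated[keep] = coeffs[keep]

x_approx = np.linalg.pinv(H) @ truncated  # reconstruct from the kept coefficients
print("reconstruction error:", np.linalg.norm(x - x_approx))
```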
1) Quantum Fourier transform (QFT): The discrete Fourier transform (DFT) can be used to transform a data vector into a linear combination of sinusoidal series. The similarity of two vectors can be measured by taking the Euclidean distance of their Fourier coefficients. In quantum computing, the DFT, which has the complexity bound O(n log n) for an n-dimensional vector, can be implemented with O(log n) computational steps. It provides the main speedup of many quantum algorithms such as integer factoring. The quantum Fourier transform of the data $|y\rangle$ can be obtained by applying the operation QFT. If we have two data vectors $y_1$ and $y_2$ of the same dimension n (we assume this for convenience and brevity in notation; it can be easily generalized to any dimension), we first construct the following quantum state:

$$|y\rangle = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \qquad (18)$$

Then we apply the quantum Fourier transform to each part of the vector:

$$\begin{pmatrix} QFT & \\ & QFT \end{pmatrix} |y\rangle = \begin{pmatrix} QFT\, y_1 \\ QFT\, y_2 \end{pmatrix} \qquad (19)$$

Here note that

$$\begin{pmatrix} QFT & \\ & QFT \end{pmatrix} = I \otimes QFT, \qquad (20)$$

where I is a 2 × 2 identity matrix. We then apply the Hadamard gate to the first qubit:

$$\left(H \otimes I^{\otimes \log(n)}\right)\begin{pmatrix} QFT\, y_1 \\ QFT\, y_2 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} QFT\, y_1 + QFT\, y_2 \\ QFT\, y_1 - QFT\, y_2 \end{pmatrix} \qquad (21)$$

We can also start with a superposition of the input state:

$$|y\rangle = \frac{1}{\sqrt{2}}\left(|y_1\rangle - |y_2\rangle\right). \qquad (22)$$

Then we apply the QFT:

$$QFT\, |y\rangle = \frac{1}{\sqrt{2}}\left(QFT\, |y_1\rangle - QFT\, |y_2\rangle\right). \qquad (23)$$

This also generates a kind of the desired distance.
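A minimal classical sketch of Eqs. (18)-(21), with the unitary DFT (np.fft.fft scaled by $1/\sqrt{n}$) standing in for the QFT; the vectors are arbitrary. The probability of measuring the first qubit as $|1\rangle$ encodes the squared Euclidean distance between the two halves of the encoded vector, which by unitarity equals the distance between their Fourier coefficients.

```python
import numpy as np

n = 8
rng = np.random.default_rng(3)
y1 = rng.normal(size=n)
y2 = rng.normal(size=n)

# Amplitude-encode the stacked vector (y1; y2), Eq. (18).
state = np.concatenate([y1, y2])
state = state / np.linalg.norm(state)

qft = lambda v: np.fft.fft(v) / np.sqrt(len(v))   # unitary DFT as a stand-in for the QFT

# Apply the QFT to each half (Eq. (19)), then a Hadamard on the first qubit (Eq. (21)).
top, bottom = qft(state[:n]), qft(state[n:])
minus = (top - bottom) / np.sqrt(2)               # lower branch of Eq. (21)

p1 = np.sum(np.abs(minus) ** 2)                   # Prob(first qubit = 1)
# Since the QFT is unitary, this equals half the squared distance of the halves.
print(np.isclose(p1, 0.5 * np.sum((state[:n] - state[n:]) ** 2)))  # True
```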
As mentioned before, we can also use a separate register for each data point at a different time: e.g., the quantum state $|y_1\rangle|y_2\rangle$ is for the data measured at times t = 1 and t = 2. In this case we apply the QFT to both registers and then use the SWAP test to create the distance measure between data points:

$$(QFT \otimes QFT)\, |y_1\rangle |y_2\rangle = QFT\,|y_1\rangle\; QFT\,|y_2\rangle \qquad (24)$$
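A minimal classical simulation of the SWAP-test readout, assuming the standard construction in which the probability of measuring the ancilla in $|0\rangle$ is $(1 + |\langle y_1|y_2\rangle|^2)/2$; the vectors and the number of shots are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def normalized(v):
    return v / np.linalg.norm(v)

y1 = normalized(rng.normal(size=8))
y2 = normalized(rng.normal(size=8))

# SWAP test: P(ancilla = 0) = (1 + |<y1|y2>|^2) / 2.
overlap = np.vdot(y1, y2)
p0 = 0.5 * (1.0 + np.abs(overlap) ** 2)

# Estimate the overlap from a finite number of repetitions (measurements).
shots = 10_000
zeros = rng.binomial(shots, p0)
overlap_sq_est = 2.0 * zeros / shots - 1.0

# For real, normalized vectors: ||y1 - y2||^2 = 2 - 2 <y1|y2>.
print(f"estimated |<y1|y2>|^2 = {overlap_sq_est:.3f}, true = {np.abs(overlap)**2:.3f}")
print(f"Euclidean distance    = {np.linalg.norm(y1 - y2):.3f}")
```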
In all cases, the Euclidean distance can be obtained from the measurement results. However, if d number of $y_i$ are given as classical input, the first approach requires O(log(dn)) number of qubits and the third approach requires O(d log(n)) since a separate register is used for each $y_i$. Since the second approach uses the superpositioned state, the number of qubits is O(log(n)), which is less than the first and third approaches. On the other hand, the classical processing time is almost the same for all approaches and is O(nd) since it requires a linear scan of every data point. However, if the $|y_i\rangle$ are stored on a quantum random access memory (qRAM), then all approaches require only O(d) number of queries to the qRAM to load the data into quantum registers.

IV. TIME SERIES FORECASTING

If the statistical parameters of a time series such as the mean and variance do not change with time, it is called stationary: i.e., the probabilistic distribution of the parameters in any time interval is the same as, or very close to, that of the time interval found by shifting from this interval. The series is non-stationary if the distributions change with time, which is mostly the case when we deal with real-world data: e.g., if we look at the number of daily COVID cases, we see that the daily cases change based on the season and other factors over the past few years. A nonstationary series can be made stationary by different operations on the time series data.

Then we apply laddered Toffoli operations (similar to a bit adder; see quantum adders in Ref. [23]) to the second part of the state to convert it into the following:

$$\frac{1}{\sqrt{2}}\begin{pmatrix} |y\rangle \\ -U_{adder}|y\rangle \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \\ -y_{n-1} \\ -y_0 \\ \vdots \\ -y_{n-2} \end{pmatrix} \qquad (28)$$

Then, we apply a Hadamard gate again to the first qubit to obtain the following:

$$\left(H \otimes I^{\otimes n-1}\right)\frac{1}{\sqrt{2}}\begin{pmatrix} y_0 \\ \vdots \\ y_{n-1} \\ -y_{n-1} \\ \vdots \\ -y_{n-2} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} y_0 - y_{n-1} \\ y_1 - y_0 \\ \vdots \\ y_{n-1} - y_{n-2} \\ y_0 + y_{n-1} \\ y_1 + y_0 \\ \vdots \\ y_{n-1} + y_{n-2} \end{pmatrix} \qquad (29)$$

Note that by applying Hadamard gates on different qubits and some swap operations, we can easily build y'' or any higher-order differencing from the above state. For instance, one can obtain a seasonal differencing as follows: For a natural number s < i, let the seasonal difference be defined as

$$y'_i = y_i - y_{i-s}. \qquad (30)$$
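A minimal classical emulation of Eqs. (28)-(29), assuming the adder acts as a cyclic shift: a Hadamard on the top qubit of the stacked vector (y; −shift(y)) places the first differences in one half of the state and the pairwise sums in the other.

```python
import numpy as np

n = 8
rng = np.random.default_rng(5)
y = rng.normal(size=n)

# State after the "laddered Toffoli" (bit-adder) step, Eq. (28):
# the second half holds -y shifted cyclically by one position.
state = np.concatenate([y, -np.roll(y, 1)]) / np.sqrt(2)

# A Hadamard on the first (top) qubit acts on the two halves of the vector.
top, bottom = state[:n], state[n:]
after = np.concatenate([top + bottom, top - bottom]) / np.sqrt(2)

differences = after[:n]                 # (y_i - y_{i-1}) / 2, cyclically, as in Eq. (29)
print(np.allclose(differences, (y - np.roll(y, 1)) / 2))   # True
```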
In the autoregressive model, the value at any timestamp is determined as a linear combination of the values in the preceding window of some length p (the AR(p) model) [11]:

$$y_t = \sum_{i=1}^{p} a_i \, y_{t-i} + c + \epsilon_t, \qquad (32)$$

where c and the coefficients $a_1, \dots, a_p$ are learned through a learning process.

In the moving average model (MA(q)), the behavioral attribute value at any timestamp is determined from the unexpected historical variations in the time series data (shocks):

$$y_t = \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t, \qquad (33)$$

where c is the mean value and the coefficients $b_i$ are learned from the data. A more powerful and general model can be obtained by combining the two aforementioned models (ARMA(p, q)):

$$y_t = \sum_{i=1}^{p} a_i \, y_{t-i} + \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t. \qquad (34)$$

Autoregressive moving average (ARMA) models work best with stationary data (when the mean, variance, and autocorrelation do not change in time). Non-stationary data are handled by integrating a series of differences into the model (ARIMA(p, d, q)). For instance, with the first-order difference (d = 1) the model can be written as:

$$y'_t = \sum_{i=1}^{p} a_i \, y'_{t-i} + \sum_{i=1}^{q} b_i \, \epsilon_{t-i} + c + \epsilon_t. \qquad (35)$$
for time series data, there are many classical packages such
We can easily generate the above by using a variational as Facebook’s- Prophet [32] and ARIMA (Autoregressive
quantum circuit including two and single qubit gates (as done Integrated Moving Average) models. There is also need to
in many of quantum machine learning models) and apply to design a quantum packages-that can be used with the current
the quantum state prepared in one of the following forms: quantum libraries such as IBM-QISKIT[33]-for time series
data to forecast and do other tasks.
!
|yi
|yi |i , θy |yi + θ |i , or . (36)
|i
A. Data representation and Measurement
Note that |yi may also represent the state after differencing. Representation of data on quantum computers may not be
Also note that the angle values of the gates in the circuit without noise. Therefore, the small changes in any feature
are related to the parameters a and b so that we can obtain of data may not be observed on quantum state by using
the parameterized model in Eq.(35) or any of the previous simple measurements. If one qubit is used for one feature,
models. Furthermore, note that following the Ref.[29] where then we expect any change in the input data to impact the
it is shown how to map the Schmidt decomposition of a measurement statistics of a qubit. However, representing whole
vector into a quantum circuit, one can also prepare a quantum feature vector as a quantum state with fewer qubits enforces an
operator whose first row is the parameters a and b. automatic dimension reduction in the measurement and hence
a1 . . . ap b1 . . . bq may cause a significant data loss that impedes the prediction
• ... •
... •
results.
. (37)
.. .. .. .. ..
. . . . B. Quantum optimization for autoregressive models
If the dimensions of a and b are k, this operator can be Note that autoregressive models can be also combined
generated by using O(k) number of quantum operations. We with quantum neural networks as done with the classical
map the parameters to the angles of the rotation gates. Then, neural networks [34]. Also note that autoregressive models
we adjust the rotation gate parameters by using a classical can be formulated as combinatorial optimization problems
optimization technique as done in the variational solvers. in the framework of graph learning [35]. In addition, it is
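A minimal sketch of this hybrid loop, in which a simple classical stand-in replaces the circuit: the angle-to-coefficient map (cosines of the rotation angles) is an arbitrary assumption, and a classical optimizer adjusts the angles to minimize the one-step prediction error.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
y = np.sin(0.25 * np.arange(200)) + 0.05 * rng.normal(size=200)   # toy stationary series
p = 3

def coefficients(angles):
    # Stand-in for the circuit: rotation angles -> model coefficients.
    return np.cos(angles)

def loss(angles):
    a = coefficients(angles)
    preds = np.array([a @ y[t - p:t][::-1] for t in range(p, len(y))])
    return np.mean((y[p:] - preds) ** 2)

# The classical optimizer adjusts the rotation-gate angles, as in variational solvers.
result = minimize(loss, x0=rng.uniform(0, np.pi, p), method="Nelder-Mead")
print("learned AR coefficients:", np.round(coefficients(result.x), 3))
print("mean squared error:", round(result.fun, 5))
```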
Since the value of p determines the seasonality (period) in the time series, the ARIMA model indicates the auto-correlation between a value and its neighbors. In some ARIMA-like models, p can be considered as the period of the time series. On quantum computers, the value of p can be found along with the other optimization parameters by using a classical optimization. In addition, since it is the period of the data, one can also adapt the quantum phase estimation algorithm to find this period, as done in Shor's factoring algorithm.

C. Forecasting Models As Circulant Matrix Operators

If the equation for $y_t$ in any of the models is stacked up for the values t = 0, 1, ..., T − 1, the above generic models can be rewritten as a system of equations constructed by circulant matrices [30]. The circulant matrices can be implemented very efficiently on quantum computers since their eigenspace is formed by the Fourier transform (see Ref. [31] for a circuit implementation of these matrices). Therefore, one can use those matrices in conjunction with variational quantum circuits to optimize the parameters of the forecasting models.
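A quick numerical check of the property that makes circulant matrices convenient here: a circulant matrix is diagonalized by the discrete Fourier transform, so applying it amounts to element-wise multiplication in Fourier space. The coefficient vector below is an arbitrary example.

```python
import numpy as np
from scipy.linalg import circulant

# Circulant matrix built from AR-like coefficients padded with zeros.
c = np.array([0.0, 0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0])
C = circulant(c)

y = np.arange(8, dtype=float)

# The eigenvalues of a circulant matrix are the DFT of its first column, and its
# eigenvectors form the Fourier basis: C y = IDFT( fft(c) * fft(y) ).
via_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(y)).real
print(np.allclose(C @ y, via_fft))   # True
```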
V. DISCUSSION AND FUTURE DIRECTIONS

In this paper, we go through data processing and forecasting with ARIMA models and show how to implement them on quantum computers. As a future direction, more example data analyses are needed to investigate algorithms and methods for preprocessing and transforming time series data into the Hilbert space, in which quantum computers may work better than classical computers, and to investigate quantum machine learning methods and algorithms for univariate and/or multivariate time series data to forecast, classify, and cluster the data. In addition, for time series data there are many classical packages, such as Facebook's Prophet [32] and ARIMA (Autoregressive Integrated Moving Average) models. There is also a need to design quantum packages, usable with current quantum libraries such as IBM-Qiskit [33], for forecasting and other tasks on time series data.

A. Data representation and Measurement

Representation of data on quantum computers may not be without noise. Therefore, small changes in any feature of the data may not be observed on the quantum state by using simple measurements. If one qubit is used for one feature, then we expect any change in the input data to impact the measurement statistics of that qubit. However, representing the whole feature vector as a quantum state with fewer qubits enforces an automatic dimension reduction in the measurement and hence may cause a significant data loss that impedes the prediction results.

B. Quantum optimization for autoregressive models

Note that autoregressive models can also be combined with quantum neural networks, as done with classical neural networks [34]. Also note that autoregressive models can be formulated as combinatorial optimization problems in the framework of graph learning [35]. In addition, it is shown that a quadratic unconstrained optimization formulation can be used for forecasting in finance [36].
With the help of these and other similar studies, autoregressive models can be formulated as combinatorial optimization in the form of quadratic unconstrained binary optimization (QUBO). Therefore, quantum optimization approaches such as adiabatic quantum computation [37], the quantum unconstrained bounded optimization algorithm [38], and quantum power iteration [39] can be used for these models.
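To make the QUBO formulation concrete, a toy construction (the binary precision, offsets, and series are arbitrary assumptions): each AR coefficient is expanded in a fixed-point binary basis, the least-squares objective becomes a quadratic form over binary variables, and the tiny instance is solved by brute force where one of the quantum optimization approaches cited above would be used.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)

# Toy AR(2) series: y_t = 0.75*y_{t-1} - 0.25*y_{t-2} + noise.
y = np.zeros(120)
for t in range(2, 120):
    y[t] = 0.75 * y[t - 1] - 0.25 * y[t - 2] + 0.05 * rng.normal()

p, bits = 2, 3
X = np.column_stack([y[p - 1 - i:len(y) - 1 - i] for i in range(p)])  # lags y_{t-1}, y_{t-2}
target = y[p:]

# Fixed-point encoding: a_i = -1 + sum_b w_{i,b} * 2^{-b}, with binary w (a toy choice).
weights = 2.0 ** -np.arange(bits)                 # [1, 0.5, 0.25]
B = np.kron(np.eye(p), weights)                   # (p, p*bits) decoding matrix
r = target + X @ np.ones(p)                       # absorb the -1 offsets into the target
M = X @ B

# QUBO matrix for ||r - M w||^2 over binary w (w_i^2 = w_i folds linear terms into the diagonal).
Q = M.T @ M
Q[np.diag_indices_from(Q)] -= 2.0 * (M.T @ r)

# Brute-force the tiny QUBO; a quantum optimizer would replace this loop.
best_w = min(product([0, 1], repeat=p * bits), key=lambda w: np.array(w) @ Q @ np.array(w))
a_qubo = B @ np.array(best_w) - 1.0

a_ls, *_ = np.linalg.lstsq(X, target, rcond=None)
print("QUBO solution :", a_qubo)
print("least squares :", np.round(a_ls, 3))
```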
REFERENCES

[1] R. P. Feynman, "Quantum mechanical computers," Foundations of Physics, vol. 16, no. 6, pp. 507-531, 1986.
[2] P. Benioff, "The computer as a physical system: A microscopic quantum mechanical Hamiltonian model of computers as represented by Turing machines," Journal of Statistical Physics, vol. 22, no. 5, pp. 563-591, 1980.
[3] P. W. Shor, "Algorithms for quantum computation: discrete logarithms and factoring," in Proceedings 35th Annual Symposium on Foundations of Computer Science. IEEE, 1994, pp. 124-134.
[4] L. K. Grover, "Quantum mechanics helps in searching for a needle in a haystack," Physical Review Letters, vol. 79, no. 2, p. 325, 1997.
[5] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum principal component analysis," Nature Physics, vol. 10, no. 9, pp. 631-633, 2014.
[6] A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," in Doklady Akademii Nauk, vol. 114, no. 5. Russian Academy of Sciences, 1957, pp. 953-956.
[7] R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," in Proceedings of the International Conference on Neural Networks, vol. 3. IEEE Press, New York, NY, USA, 1987, pp. 11-14.
[8] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303-314, 1989.
[9] A. R. Gallant and H. White, "There exists a neural network that does not make avoidable mistakes," in ICNN, 1988, pp. 657-664.
[10] Z.-Q. John Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma, "Frequency principle: Fourier analysis sheds light on deep neural networks," Communications in Computational Physics, vol. 28, no. 5, pp. 1746-1767, 2020. [Online]. Available: http://global-sci.org/intro/article_detail/cicp/18395.html
[11] C. C. Aggarwal et al., Data Mining: The Textbook. Springer, 2015, vol. 1.
[12] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, "Deep learning for time series classification: a review," Data Mining and Knowledge Discovery, vol. 33, no. 4, pp. 917-963, 2019.
[13] A. Abanda, U. Mori, and J. A. Lozano, "A review on distance based time series classification," Data Mining and Knowledge Discovery, vol. 33, no. 2, pp. 378-412, 2019.
[14] M. S. Gashler and S. C. Ashmore, "Modeling time series data with deep Fourier neural networks," Neurocomputing, vol. 188, pp. 3-11, 2016.
[15] V. Dunjko and P. Wittek, "A non-review of quantum machine learning: trends and explorations," Quantum Views, vol. 4, p. 32, 2020.
[16] V. Dunjko and H. J. Briegel, "Machine learning & artificial intelligence in the quantum domain: a review of recent progress," Reports on Progress in Physics, vol. 81, no. 7, p. 074001, 2018.
[17] P. Singh, G. Dhiman, and A. Kaur, "A quantum approach for time series data based on graph and Schrödinger equations methods," Modern Physics Letters A, vol. 33, no. 35, p. 1850208, 2018.
[18] D. Emmanoulopoulos and S. Dimoska, "Quantum machine learning in finance: Time series forecasting," arXiv preprint arXiv:2202.00599, 2022.
[19] M. Schuld, R. Sweke, and J. J. Meyer, "Effect of data encoding on the expressive power of variational quantum-machine-learning models," Physical Review A, vol. 103, no. 3, p. 032430, 2021.
[20] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, "Continuous-variable quantum neural networks," Physical Review Research, vol. 1, no. 3, p. 033063, 2019.
[21] A. Daskin, "A simple quantum neural net with a periodic activation function," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2018, pp. 2887-2891.
[22] V. Giovannetti, S. Lloyd, and L. Maccone, "Quantum random access memory," Physical Review Letters, vol. 100, no. 16, p. 160501, 2008.
[23] M. A. Nielsen and I. Chuang, Quantum Computation and Quantum Information, 2002.
[24] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, "Data re-uploading for a universal quantum classifier," Quantum, vol. 4, p. 226, 2020.
[25] M. Schuld and N. Killoran, "Quantum machine learning in feature Hilbert spaces," Physical Review Letters, vol. 122, no. 4, p. 040504, 2019.
[26] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, "Hierarchical quantum classifiers," npj Quantum Information, vol. 4, no. 1, pp. 1-8, 2018.
[27] H. Buhrman, R. Cleve, J. Watrous, and R. De Wolf, "Quantum fingerprinting," Physical Review Letters, vol. 87, no. 16, p. 167902, 2001.
[28] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, "Exponential improvement in precision for simulating sparse Hamiltonians," in Forum of Mathematics, Sigma, vol. 5. Cambridge University Press, 2017.
[29] A. Daskin, A. Grama, G. Kollias, and S. Kais, "Universal programmable quantum circuit schemes to emulate an operator," The Journal of Chemical Physics, vol. 137, no. 23, p. 234112, 2012.
[30] D. S. G. Pollock, "Circulant matrices and time-series analysis," International Journal of Mathematical Education in Science and Technology.