Efficient On-Line Computation of Visibility Graphs

Abstract—A visibility algorithm maps time series into complex networks following a simple criterion. The resulting visibility graph has recently proven to be a powerful tool for time series analysis. However, its straightforward computation is time-consuming and rigid, motivating the development of more efficient algorithms. Here we present a highly efficient method to compute visibility graphs with the further benefit of flexibility: on-line computation. We propose an encoder/decoder approach, where an on-line adjustable binary search tree acts as the codec of the time series, from which the visibility graph is then decoded.

Both of these approaches, comprising the current existing methods to compute visibility graphs, are off-line algorithms, as they require all the data points in the time series to be available before the graph is constructed. Consequently, the integration of new data points normally requires re-computing the visibility graph from scratch, representing a major computational bottleneck.
From the definition of visibility it immediately follows that, for a set visibility criterion, the visibility graph associated to a given time series is unique. Moreover, any two subsequent data points of the time series are always connected by an edge, thus visibility graphs are connected and Hamiltonian [20]. In addition, visibility graphs are also invariant to re-scaling on both horizontal and vertical axes (i.e., the first point on either side of a node i remains visible from i no matter how far apart they are), and invariant to vertical and horizontal translations (i.e., only the relative values of points determine visibility relations).

In Figure 1.f. we show both the natural and horizontal visibility criteria at work on an arbitrary time series. Notice that horizontal visibility is a more stringent criterion than natural visibility, meaning that if two points are horizontally visible then they are also trivially visible when using the natural visibility criterion. Consequently, the horizontal visibility graph of a time series is always a sub-graph of the natural visibility graph associated to the same time series.

III. STATE OF THE ART

A straightforward approach to compute visibility graphs consists in checking whether any of the points of the time series is visible or not from every other point. This corresponds to evaluating the visibility criterion for every pair of points in the time series. Since we consider visibility as a symmetric relation, the total number of checks needed to obtain the visibility graph of a time series of n data points is equal to n(n − 1)/2, corresponding to a O(n²) time complexity.

In the case of horizontal visibility, one can take a step further and safely assume that no point after a value larger than the current value t_a will be horizontally visible from t_a. This observation effectively reduces the time complexity of the construction to O(n log(n)) and, in the case of noisy (stochastic or chaotic) signals, it can be proved that this algorithm has an average-case time complexity O(n) [20]. Nevertheless, all pairs of points need to be checked in the case of natural visibility. From now on, this simple approach will be referred to as the basic method for both natural and horizontal visibility computation¹.
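As an illustration, below is a minimal sketch of the basic method for both criteria (the function names and the assumption of unit-spaced samples are ours, not the authors'; the natural visibility condition used is the standard straight-line criterion, and the horizontal variant applies the early-stopping observation above):

```python
def natural_visibility_edges(series):
    """Basic O(n^2) natural visibility: accept the edge (a, b) iff every
    intermediate point lies strictly below the straight line joining
    (a, series[a]) and (b, series[b])."""
    n = len(series)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges


def horizontal_visibility_edges(series):
    """Horizontal visibility with early stopping: once a value at least
    as large as series[a] appears, no later point is visible from a."""
    n = len(series)
    edges = []
    for a in range(n):
        blocker = float('-inf')  # tallest point strictly between a and b
        for b in range(a + 1, n):
            if blocker < min(series[a], series[b]):
                edges.append((a, b))
            if series[b] >= series[a]:
                break  # a larger value acts as a wall for a
            blocker = max(blocker, series[b])
    return edges
```

For example, natural_visibility_edges([3, 1, 2, 4]) yields [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], while the horizontal variant returns every edge except (1, 3), illustrating the sub-graph property noted above.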
As an improved alternative for visibility computation, Lan
et al. presented a ‘Divide & Conquer’ approach [17]. This
algorithm reduces the average-case time complexity of the natural visibility graph construction to O(n log(n)) and significantly reduces computation time for most balanced time series.
The basic idea behind the ‘Divide & Conquer’ algorithm
is related to the horizontal visibility optimisation mentioned
above. Once the maximum value M of the time series is
known, one can safely assume that the points on the right of
M will not be naturally visible from the points on the left of
M (the point M is effectively acting as a wall between the two
sides of the time series). The same argument is then applied
recursively on the two halves of the time series separated
by M , where the local maxima subsequently found at each
level are connected with an edge to the maxima at the level
immediately above them. From now on, this improved method
will be referred to as ‘Divide & Conquer’ (or DC for short).
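To make the recursion concrete, the following is a minimal sketch of the DC idea (our illustration, not Lan et al.'s reference implementation; it assumes unit-spaced samples and breaks ties at the maximum arbitrarily). Since every edge crossing the interval maximum must be incident to it, a single outward sweep from the maximum, keeping the steepest slope seen so far, finds those edges before recursing on the two halves.

```python
def nvg_divide_conquer(series):
    """Sketch of 'Divide & Conquer' natural visibility construction:
    the interval maximum blocks visibility between its two sides, so
    only edges incident to it cross; recurse on the two halves."""
    edges = []

    def build(lo, hi):
        if lo >= hi:
            return
        m = max(range(lo, hi + 1), key=lambda i: series[i])
        # Sweep outwards from the maximum in both directions: point j is
        # visible from m iff its slope relative to m is steeper than
        # that of every point between them.
        for step in (1, -1):
            steepest = float('-inf')
            j = m + step
            while lo <= j <= hi:
                slope = (series[j] - series[m]) / abs(j - m)
                if slope > steepest:
                    edges.append((min(m, j), max(m, j)))
                    steepest = slope
                j += step
        build(lo, m - 1)   # left half: the maximum acts as a wall
        build(m + 1, hi)   # right half

    build(0, len(series) - 1)
    return edges
```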
Both the basic method and DC are off-line approaches, meaning that they require all the points of the time series to be accessible at the beginning of the computation. This rigid requirement limits the applicability of visibility graphs, especially in fields like telecommunications or finance, where there is a constant incoming flow of new data to be processed and assimilated. Moreover, in such big data scenarios, one tends to favour an initial overall high-level analysis that will reveal the need for further processing. This work-flow would benefit from dynamic algorithms, unlike the ones presented above.

Fig. 1. Representation of the different steps of the proposed algorithm for visibility graph computation. In section A, the sample time series and its corresponding maximum binary search tree. Section B represents the connections deduced by the first connectivity rule. The second and third connectivity rules are illustrated in sections C and D respectively. Section E shows the remaining checks needed to ascertain natural visibility. Finally, section F reports the horizontal and natural visibility graphs associated to the original time series.

¹ The original Fortran 90 implementations of basic algorithms to construct visibility graphs can be found at http://www.maths.qmul.ac.uk/~lacasa/Software.html
Fig. 4. Visual representation of the proposed method to merge two maxima binary trees, covering both append (A) and insert (B) operations. This corresponds to an on-line scenario where a new batch (red) needs to be incorporated into an existing structure (blue).

the right branch as its index is larger than the chosen blue root, leaving the left branch of the chosen blue root untouched. Consequently, the right blue child is to be compared with the red root. In this case, the red root has a larger value and so it will take the right branch position in the resulting tree. Now it is the turn of the right blue child to descend down the red tree. Since the blue child happens to be the lowest value in the series, it will simply descend layers following the binary search tree rules until it reaches an empty spot.

Usually, as one may observe in Figure 4, the children of the nodes that travel down in depth are not included in the level comparison. However, when new data is to be inserted into the existing series, the child of the node travelling down could have an index corresponding to the other branch of the resulting tree. In this case, the connection between the node and that child will be broken thereafter. For example, in Figure 4.B, this situation takes place in layer 3, where Node 7, the child of Node 2, belongs on the right branch of Node 5, unlike its parent.
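The append case (Figure 4.A), where every index in the new batch is larger than every index in the existing tree, reduces to the recursive root comparison just described. Below is a minimal sketch of that case under our own naming (Node and merge_append are illustrative, not the authors' API; the insert case of Figure 4.B, where parent–child connections can be broken, is not covered):

```python
class Node:
    """A node of the maximum binary search tree (codec): nodes are
    ordered left-to-right by time index, with the largest value of the
    encoded interval at the root."""
    def __init__(self, index, value):
        self.index = index
        self.value = value
        self.left = None    # sub-tree of points with smaller indices
        self.right = None   # sub-tree of points with larger indices


def merge_append(blue, red):
    """Merge `red` (every index larger than any index in `blue`) into
    `blue` by comparing roots layer by layer: the larger value stays on
    top and the other root descends into the appropriate branch."""
    if blue is None:
        return red
    if red is None:
        return blue
    if blue.value > red.value:
        # The blue root stays; red can only affect its right branch,
        # since all red indices are larger than the blue root's index.
        blue.right = merge_append(blue.right, red)
        return blue
    # The red root wins; the remaining blue tree belongs entirely on
    # its left branch, leaving red's own left descendants to compete.
    red.left = merge_append(blue, red.left)
    return red
```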
VI. NUMERICAL EXPERIMENTS

In this section we present empirical results in order to show how the proposed visibility algorithm compares to the state of the art. All the code related to this paper and necessary to run the following experiments is implemented in Python 2.7 and freely available online². The machine used in the simulations is an early 2015 MacBook Pro Retina with a 2.9GHz Intel Core i5 processor and 16GB of RAM.

² Available at https://github.com/delialia/bst
Fig. 5. Computation time of the natural visibility graph (nvg, second row) and horizontal visibility graph (hvg, third row) of different time series (examples
on first row) using the current visibility algorithms: Basic, Divide & Conquer (DC), and the proposed binary search tree (BST) method. Each point at every
series size is the mean of the computation time for 10 series of that size.
To put the presented algorithm into context [17], in Figure 5 we report the computation time needed by the current visibility algorithms on different synthetic time series of increasing length. Since the actual efficiency of each algorithm depends to some extent on the character of the original time series, we considered uniform random noise (which has no structure and on average produces almost-balanced binary search trees), a Conway series (which has a quite rich structure and corresponds to a quite unbalanced tree), and a random walk series (which represents the more realistic scenario of a signal with both structure and noise).

In the first case we observe the largest gap in computation time between the basic algorithm and the more efficient ones, as it corresponds to the aforementioned average case where both algorithms (DC and the proposed one) significantly reduce the number of operations. Such differences are more prominent in the computation of the horizontal visibility graph.

Additionally, in Figure 6 we present a similar computation-time analysis over real samples of speech (English language) [10] and financial data [23]. Figure 6 is particularly interesting as it clearly shows a correlation between computation time and the time series structure (please note the different scale for computation time). Even though the computation times may differ, the distributions for the DC and proposed methods seem to vary very little between data types, in comparison to the relatively high spread observed for the basic algorithm.

The horizontal visibility computation remains stable in both the DC and proposed method, and could potentially be considered independent of the data type up to a scaling factor in computation time. This behaviour was expected, as the proposed method is fully defined by the aforementioned connectivity rules and has average-case time complexity O(n log n).

On the other hand, Figure 6 suggests that the efficiency of the computation of natural visibility graphs is subject to wider fluctuations. The position of the maximum in the time series affects the efficiency of both the DC and the proposed method, as it will determine the number of additional visibility checks needed to obtain the natural visibility graph.

An English speech time series will typically have its maximum somewhere towards the middle section of the signal (since we rarely tend to raise our voice at the end of our speech). Therefore, for a speech time series, the proposed codec will most probably produce an almost balanced binary search tree, yielding a time complexity of O(n log n). For this reason, one may observe a wider gap in computation time between the basic method and the faster alternatives for the speech data in Figure 6 than for the financial time series.
Fig. 6. Computation time of the current and proposed visibility algorithms for 100 speech and finance time series of 1000 points. The speech time series are sampled from the training TIMIT dataset [10]. The finance time series correspond to the 2013 quarterly data used in [23].
In terms of computation time, the proposed method and the DC one are closely related. They are both quicker than the basic implementation for both natural and horizontal visibility, and they both present similar trends for increasing time series size (Figure 5). However, the proposed algorithm has proven to consistently be the quickest option for horizontal visibility graph computation. On the other hand, the DC algorithm in general performs better than the proposed method for natural visibility computation. Even though at this point both DC and the proposed method seem equally good options for fast visibility computation, the presented algorithm has the additional property of allowing on-line assimilation of new data, which is something not easily achievable in either the basic approach or the DC algorithm.

The most straightforward way to assess the on-line functionality of the proposed method is to compare it with the equivalent off-line approach. In our case, this directly relates to the binary tree codec. Given a batch of new points to be added to the time series visibility analysis, in the off-line approach the new batch is simply added to the time series itself and then the binary tree codec must be re-computed from scratch. In the proposed on-line approach, the new batch is encoded into its own binary tree that is then merged with the existing codec using the procedure detailed in Algorithm 2. Note that the decoding step remains the same for the on-line and off-line approaches, and so the comparison will essentially be between computing a codec from scratch (off-line) and merging two codecs into a single binary search tree (on-line).

Figure 7 shows how much quicker the computation of the on-line method (codec for new data + merging) is in comparison to the computation time of the off-line approach (codec from scratch), for different time series and batch sizes. In particular, the on-line approach is always better if the new batch to be added is equal to or bigger than the existing time series, especially for large time series.
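Under the same illustrative naming as the merge sketch above (the hypothetical Node and merge_append), this comparison can be reproduced roughly as follows, where build_tree is our stand-in for computing the codec from scratch by recursively placing the interval maximum at the root:

```python
import random

def build_tree(values, offset=0):
    """Off-line codec sketch: build the maximum binary search tree of a
    series from scratch, placing the interval maximum at each root."""
    if not values:
        return None
    m = max(range(len(values)), key=lambda i: values[i])
    node = Node(offset + m, values[m])
    node.left = build_tree(values[:m], offset)
    node.right = build_tree(values[m + 1:], offset + m + 1)
    return node

old = [random.random() for _ in range(1000)]
batch = [random.random() for _ in range(1000)]

# Off-line: re-encode the whole concatenated series from scratch.
offline_codec = build_tree(old + batch)

# On-line: encode only the new batch, then merge it into the old codec.
online_codec = merge_append(build_tree(old),
                            build_tree(batch, offset=len(old)))
```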