Data Traffic Prediction Using Fuzzy Methods
Anthony Chiaratti¹, Pedro Dias de Oliveira Carvalho²
¹ Email: achiaratti@gmail.com
² Email: pedrodoc@gmail.com
1. Introduction
Managing the traffic flow on data links is essential to avoid congestion and the loss of valuable information during periods of heavy use. The problem dates back to early telephone links, which connected each subscriber directly to another; as networks expanded, dedicating a channel to each user became impractical, and the natural solution was to multiplex the signals over a shared medium.
On the other hand, FTS (Fuzzy Time Series) methods are expected to cope with the inherent diversity of the information in data traffic prediction. There is also a very large body of publications proposing improvements to FTS methods, and each method imposes its own restrictions on the data. This paper provides an overview and comparison of several FTS methods applied to traffic throughput, which shows large value changes between samples, seasonality and information diversity; these characteristics make data traffic prediction a great challenge.
Figure 1 – Time series
Figure 2 – ACF (autocorrelation function)
The first and highest correlation is with the previous hour; in other words, whether traffic grows or decreases depends on the last sample. As discussed previously, a daily season can be noticed in each 24-hour period. The correlation of each successive period decreases as it gets further from the most recent data; however, after reaching its minimum, the correlation starts to increase again until a new peak at a lag of 169 hours, very close to the expected weekly season of 24 × 7 = 168 hours.
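The lag structure described above can be checked with a sample autocorrelation function. The sketch below is a minimal pure-Python version using the usual biased estimator; the helper name `sample_acf` and the synthetic two-sinusoid series (daily period 24 h, weekly period 168 h) are illustrative assumptions standing in for the real traffic data, not the authors' code.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag.

    Biased estimator: each lag-k cross-product sum is divided by the total
    sum of squared deviations, so r_0 is exactly 1.
    """
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Hypothetical synthetic series: 8 weeks of hourly values with a daily (24 h)
# and a weekly (168 h) component, used here instead of real traffic.
hours = range(24 * 7 * 8)
y = [10 + math.sin(2 * math.pi * h / 24) + 0.5 * math.sin(2 * math.pi * h / 168)
     for h in hours]
acf = sample_acf(y, 200)
```

On this synthetic series the largest autocorrelation between lags 150 and 189 falls exactly at lag 168; real traffic, as noted above, may place the weekly peak a lag or so away.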
3. Methodology
All FTS methods are based on the same logic, so, for illustration only, the Chen (1996) algorithm will be presented. For the other algorithms, only their differences from Chen's method will be discussed.
Step 1 defines the universe of discourse U from the historical data, which must include the minimum value Dmin and the maximum value Dmax. With D1 and D2 positive numbers, U = [Dmin − D1, Dmax + D2].
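Step 1 can be sketched as follows. Because the partitioning steps are not reproduced in this text, the `equal_intervals` helper is an assumption: it simply cuts U into consecutive intervals of a fixed length, as with the equal-length intervals used later. Both function names and the sample values are hypothetical.

```python
def universe_of_discourse(data, d1, d2):
    """Step 1: U = [Dmin - D1, Dmax + D2] with D1, D2 > 0."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("D1 and D2 must be positive")
    return min(data) - d1, max(data) + d2


def equal_intervals(lower, upper, length):
    """Assumed partition step: consecutive intervals of a fixed length
    covering [lower, upper]; the last one is cut off at `upper`."""
    if length <= 0:
        raise ValueError("interval length must be positive")
    intervals = []
    start = lower
    while start < upper:
        end = min(start + length, upper)
        intervals.append((start, end))
        start = end
    return intervals


# Hypothetical values, e.g. hourly traffic in megabytes.
lower, upper = universe_of_discourse([13, 20, 17], d1=3, d2=5)
intervals = equal_intervals(lower, upper, length=5)
```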
Step 4 fuzzifies the historical data, that is, it assigns each data value to a fuzzy set.
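A minimal fuzzification sketch, assuming Chen-style triangular fuzzy sets where each set A_i has its maximum membership on interval u_i, so assigning a value reduces to locating its interval. The function name and the boundary convention (a value on a shared edge goes to the upper interval) are illustrative choices.

```python
import bisect


def fuzzify(data, intervals):
    """Return, for each value, the index i of the fuzzy set A_i whose interval contains it.

    `intervals` are sorted, contiguous (start, end) pairs. A value lying on a
    shared boundary is assigned to the upper interval.
    """
    starts = [lo for lo, _ in intervals]
    labels = []
    for v in data:
        if not intervals[0][0] <= v <= intervals[-1][1]:
            raise ValueError(f"value {v} lies outside the universe of discourse")
        # bisect_right finds the last interval whose start is <= v
        labels.append(bisect.bisect_right(starts, v) - 1)
    return labels


labels = fuzzify([13, 20, 17, 25], [(10, 15), (15, 20), (20, 25)])
```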
Step 5 establishes the FLRs (Fuzzy Logical Relationships), which describe how the fuzzy set at one time step leads to the fuzzy set at the next. In Chen's method, recurrent (repeated) relationships are eliminated.
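The FLR step can be sketched by pairing consecutive fuzzy labels and grouping them by their left-hand side into FLR groups (FLRGs), keeping each right-hand side only once to mirror the elimination of recurrence in Chen's method. The function name is hypothetical and the labels are the integer indices of the fuzzy sets.

```python
def build_flrgs(labels):
    """Group Chen-style FLRs A_i -> A_j by their left-hand side A_i.

    Each consecutive pair (label at t-1, label at t) is one FLR. Repeated
    right-hand sides are stored only once (recurrence is ignored).
    """
    groups = {}
    for prev, curr in zip(labels, labels[1:]):
        rhs = groups.setdefault(prev, [])
        if curr not in rhs:
            rhs.append(curr)
    return groups


groups = build_flrgs([0, 1, 0, 1, 2, 1])
```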
Step 6 calculates the forecast value following simple rules. If the FLR is one-to-one, the forecast is the midpoint of the interval it leads to. If the FLR is one-to-many, the forecast is the mean of the midpoints of the intervals it leads to.
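The two rules of Step 6 can be sketched as below. The extra case where the current set has no FLRG is not covered by the text above; the fallback used here (the midpoint of the current set's own interval) is an assumption, labeled as such in the code. The function name is hypothetical.

```python
def chen_forecast(current_label, groups, intervals):
    """One-step forecast from Chen-style FLRGs.

    one-to-one  -> midpoint of the single interval it leads to
    one-to-many -> mean of the midpoints of all intervals it leads to
    """
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    rhs = groups.get(current_label, [])
    if not rhs:
        # Assumption (not in the text above): no FLRG, use the current midpoint.
        return midpoints[current_label]
    return sum(midpoints[j] for j in rhs) / len(rhs)


intervals = [(10, 15), (15, 20), (20, 25)]
groups = {0: [1], 1: [1, 2]}
```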
Alternatively, Step 6 can define weights for each observed FLRG. A classical method uses linear weights from 1 to N (the number of relationships), normalized by the sum of the weights. A further comparison will be made using exponential weights, first proposed by Lee and Javedani (2011).
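A minimal sketch of the weighted forecast, assuming the FLRG keeps its right-hand sets in chronological order with recurrences retained (as in the weighted variants compared in the results). Equal and linear weights follow the description above; the exponential form w_k = C^(k−1) is an assumption chosen to match the "C=2" label in the results and may differ from the exact formulation of Lee and Javedani (2011). The function name is hypothetical.

```python
def weighted_forecast(rhs_midpoints, scheme="linear", c=2.0):
    """Weighted mean of the midpoints in one chronological FLRG.

    rhs_midpoints: midpoints of the right-hand sets, oldest first, repeats kept.
    scheme: "equal"       -> w_k = 1
            "linear"      -> w_k = k, k = 1..N
            "exponential" -> w_k = c ** (k - 1)   (assumed form)
    The weights are normalized by their sum.
    """
    n = len(rhs_midpoints)
    if scheme == "equal":
        weights = [1.0] * n
    elif scheme == "linear":
        weights = [float(k) for k in range(1, n + 1)]
    elif scheme == "exponential":
        weights = [c ** (k - 1) for k in range(1, n + 1)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(w * m for w, m in zip(weights, rhs_midpoints)) / sum(weights)
```

With midpoints [10, 20, 30], equal weights give 20, linear weights give 140/6 ≈ 23.33 and exponential weights with C = 2 give 170/7 ≈ 24.29: the more recent relationships pull the forecast harder as the weighting becomes steeper.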
4. Evaluation
Motivated by the reasons presented in the introduction and abstract, the classic methods will first be evaluated on their own and then compared with the naïve method. Note that, given our motivation, the one-step-ahead forecast is not the most important one; the free-running forecast is. We chose a 24-hour horizon so that network engineers can be alerted when congestion is predicted and can allocate resources preventively.
Hourly traffic for a wide area was extracted from the database and divided into two sets. The first set was used for algorithm training, running the methods presented in the previous section and their seasonal variants.
The second set was used for validation, with one-step-ahead forecasts and a 24-hour free run.
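The two validation modes differ only in what is fed back after each step, which the sketch below makes explicit. `one_step` is a placeholder for any fitted one-step forecaster (for example, one of the FTS methods above); both function names are hypothetical.

```python
def one_step_ahead(history, validation, one_step):
    """Forecast each validation point using the actual values observed so far."""
    series = list(history)
    forecasts = []
    for actual in validation:
        forecasts.append(one_step(series))
        series.append(actual)  # the real value is appended
    return forecasts


def free_run(history, one_step, horizon=24):
    """Free-running forecast: each prediction is fed back as the next input."""
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step(series)
        forecasts.append(y_hat)
        series.append(y_hat)  # the forecast, not a real value, is appended
    return forecasts


def naive(series):
    """Naïve forecaster: repeat the last value."""
    return series[-1]
```

In the free run, errors compound because later steps never see real data, which is why the 24-hour horizon is the harder and more relevant test here.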
For simplicity, this universe was divided into equal intervals using Huarng's (2001) effective length: his paper proposes taking half of the mean of the absolute differences between consecutive observations and rounding it to a defined base. Table 3 shows the intervals obtained from the training data, in megabytes.
Table 3 – Intervals
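The effective-length rule can be sketched as follows. The base table is approximated here as powers of ten ((0.1, 1] → 0.1, (1, 10] → 1, (10, 100] → 10, and so on); treat that mapping and the function name as assumptions rather than a transcription of Huarng's paper.

```python
import math


def huarng_effective_length(data):
    """Average-based interval length: half of the mean absolute first
    difference, rounded to an (assumed) power-of-ten base."""
    diffs = [abs(b - a) for a, b in zip(data, data[1:])]
    half_mean = sum(diffs) / len(diffs) / 2
    if half_mean <= 0:
        raise ValueError("series must vary to define an interval length")
    # half_mean in (base, 10 * base] selects the base
    base = 10 ** (math.ceil(math.log10(half_mean)) - 1)
    return round(half_mean / base) * base


length = huarng_effective_length([0, 130, 270])
```

Here the absolute differences are 130 and 140, their mean is 135, half of it is 67.5, the base is 10, and the rounded length is 70.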
$$\mathrm{MASE} = \frac{\frac{1}{T}\sum_{t=1}^{T} |e_t|}{\frac{1}{T-1}\sum_{t=2}^{T} |Y_t - Y_{t-1}|}$$
where T is the total number of time steps and $|e_t| = |Y_t - \hat{Y}_t|$ is the absolute error, the difference between the actual value $Y_t$ and the forecast value $\hat{Y}_t$. The denominator is the mean absolute error of the one-step-ahead naïve forecast.
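For seasonal data with period m, the denominator uses the seasonal naïve forecast $Y_{t-m}$ instead:

$$\mathrm{MASE} = \frac{\frac{1}{T}\sum_{t=1}^{T} |e_t|}{\frac{1}{T-m}\sum_{t=m+1}^{T} |Y_t - Y_{t-m}|}$$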
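Both MASE variants fit in one small function, with m = 1 giving the naïve denominator and a larger m (for instance 24 for the daily season) giving the seasonal one. This is a sketch of the formulas as written above, where the denominator is computed over the same T observations as the errors; the function name is hypothetical.

```python
def mase(actual, forecast, m=1):
    """Mean absolute error scaled by the mean absolute (seasonal) naïve error.

    m = 1  -> denominator uses |Y_t - Y_{t-1}|
    m > 1  -> denominator uses |Y_t - Y_{t-m}|
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    T = len(actual)
    if T <= m:
        raise ValueError("need more than m observations")
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / T
    # 0-based t = m..T-1 corresponds to t = m+1..T in the formula
    naive = sum(abs(actual[t] - actual[t - m]) for t in range(m, T)) / (T - m)
    return mae / naive
```

A value below 1 means the method beats the corresponding naïve forecast on average.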
5. Results
After careful analysis, we decided to compare one-step-ahead forecasts from the naïve method, Chen, Yu with equal weights, Yu with linear weights, Yu with exponential weights (C = 2) and the differential transformation.
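The differential transformation is not detailed in this section. A common reading, assumed here, is that the FTS model is fitted to first differences and level forecasts are rebuilt by adding the predicted differences back to the last known value. The helper names are hypothetical.

```python
def difference(series):
    """First differences d_t = Y_t - Y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def forecast_from_differences(last_value, predicted_differences):
    """Rebuild levels: Y_hat(t+1) = Y(t) + d_hat(t+1), chaining each step."""
    out = []
    level = last_value
    for d in predicted_differences:
        level += d
        out.append(level)
    return out
```

For a one-step-ahead forecast, `last_value` is the latest actual observation and a single predicted difference is used; for a free run, the predicted differences are chained as above.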
Figure 4 – Forecast data 1 hour ahead
[Figure: Real, Naive, Yu_EqualWeight and Yu_LinearWeight series; horizontal axis 1 to 154, vertical axis 0 to 18,000,000]
For the 24-hour free run, AR-FTS presents the best results compared with the first-order methods. The data exhibit very difficult behavior and remain a challenge to forecast well.
7. References
[1] Ian Angus, Telemanagement #187
[2] https://en.wikipedia.org/wiki/Erlang_distribution
[3] http://ita.ee.lbl.gov/html/contrib/LBL-CONN-7.html
[6] A fuzzy time series-Markov chain model with an application to forecast the exchange rate between the Taiwan and US dollar (2011)