An Integrated Machine Learning Model For Day-Ahead Electricity Price Forecasting
Abstract: This paper proposes a novel model for short-term
electricity price forecasting that integrates two machine learning
techniques: Bayesian Clustering by Dynamics (BCD) and Support
Vector Machine (SVM). First, a BCD classifier clusters the input
data set into several subsets in an unsupervised manner. Then, a
group of 24 SVMs, one for each hour of the next day's electricity
price profile, is fitted to the training data of each subset in a
supervised way. To demonstrate its effectiveness, the proposed
model has been trained and tested on historical energy price data
from the New England electricity market.
Index Terms: Electricity Price, Non-stationarity, Machine
Learning.
I. INTRODUCTION
S. Fan, K. Kaneko and L. Chen are with Osaka Sangyo University, 3-1-1
Nakagaito, Daito, Osaka, 574-0013, Japan (e-mail: fanshu@hotmail.com;
k-kaneko@eic.osaka-sandai.ac.jp; chen@eic.osaka-sandai.ac.jp).
James R. Liao is with the Western Farmers Electric Cooperative, Anadarko,
OK, 73005 USA (e-mail: j_liao@wfec.com).
1-4244-0178-X/06/$20.00 ©2006 IEEE
PSCE 2006
Fig. 1. Hourly electricity price in NEPOOL from March 1, 2003 to July 31, 2005 (horizontal axis: Hour)
data in the same segment can be modeled by the same SVR because they
share similar properties. Note that each subset has 24 SVRs, one to
train on and predict the price of each hour of the next day.
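The bookkeeping implied by "24 SVRs for each subset" can be sketched as follows. This is a minimal illustration, not the authors' implementation: `MeanRegressor` is a hypothetical stand-in for an SVR (it only predicts the mean of its training targets), and the names `fit_subset_models`, `predict_day` and the sample layout are assumptions made for this example. The subset labels are assumed to come from a separate clustering step.

```python
from collections import defaultdict
from statistics import mean

HOURS = 24  # one model per hour of the next day


class MeanRegressor:
    """Hypothetical stand-in for an SVR: predicts the mean training target."""

    def fit(self, X, y):
        self.value_ = mean(y)
        return self

    def predict(self, x):
        return self.value_


def fit_subset_models(samples, make_model=MeanRegressor):
    """Fit one model per (subset, hour) pair.

    samples: iterable of (subset_id, hour, features, target), where
    subset_id is the label given by the clustering step and hour is 0..23.
    Returns {subset_id: [model for hour 0, ..., model for hour 23]}.
    """
    grouped = defaultdict(lambda: ([], []))
    for subset_id, hour, features, target in samples:
        X, y = grouped[(subset_id, hour)]
        X.append(features)
        y.append(target)

    models = {}
    for (subset_id, hour), (X, y) in grouped.items():
        bank = models.setdefault(subset_id, [None] * HOURS)
        bank[hour] = make_model().fit(X, y)
    return models


def predict_day(models, subset_id, features_by_hour):
    """Predict a 24-hour price profile with the model bank of one subset."""
    bank = models[subset_id]
    return [bank[h].predict(features_by_hour[h]) for h in range(HOURS)]
```

Any regressor exposing `fit` and `predict` could be passed as `make_model`; an hour with no training samples in a subset is left as `None` in this sketch and would need handling in practice.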
[Diagram: the BCD classifier receives the price series P0 and the variables that determine the prior probability (L, W, F) and assigns the data to Subset 1, Subset 2, ..., Subset n; each subset feeds its own group SVR1, SVR2, ..., SVR24.]
Table II. List of input data of the SVM network for regular days
Input    Variable name        Lagged value (hours)
1-9      Hourly price (L1)    24, 25, 26, 48, 72, 96, 120, 144, 168
10-19    Hourly load (T1)     -1, 0, 1, 24, 48, 72, 96, 120, 144, 168
The hour of load prediction is taken as 0: lag 0 represents the target
instant, and a lag of 24 hours means a value measured 24 hours
earlier than the hour of prediction.
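The lag convention of Table II can be illustrated with a short sketch that assembles the 19 inputs for one target hour. The function name `build_inputs` and the list-based data layout are assumptions for this example. Lag -1 is read here as the hour after the target, which presumes a load forecast is available for that hour.

```python
# Lags taken from Table II (in hours, relative to the target hour).
PRICE_LAGS = [24, 25, 26, 48, 72, 96, 120, 144, 168]      # inputs 1-9
LOAD_LAGS = [-1, 0, 1, 24, 48, 72, 96, 120, 144, 168]     # inputs 10-19


def build_inputs(price, load, t):
    """Return the 19 inputs for target hour index t.

    price, load: hourly sequences indexed by hour. load must extend to
    t + 1 because lag -1 refers to the hour after the target (assumed
    here to be a forecast value).
    """
    if t < max(PRICE_LAGS) or t + 1 >= len(load):
        raise IndexError("not enough history or future load around t")
    price_part = [price[t - lag] for lag in PRICE_LAGS]
    load_part = [load[t - lag] for lag in LOAD_LAGS]
    return price_part + load_part
```

For example, with `t = 200` the first price input is `price[176]` and the first load input is `load[201]`.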
Fig. 2. Integrated machine learning model for electricity price forecasting
Input    Variable name
1-10     Price series P0
11-13    Variables to determine prior probability
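$$X_k = \begin{pmatrix} X_{k1} \\ \vdots \\ X_{k m_k} \end{pmatrix}, \qquad x_k = \begin{pmatrix} x_{k1} \\ \vdots \\ x_{k m_k} \end{pmatrix}$$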
Given a set of possible clustering models, the task is to rank
them according to their posterior probability. The posterior
probability of the model $M_C$ is computed by Bayes' theorem as

$$P(M_C \mid x) \propto P(M_C)\, f(x \mid M_C) \tag{3}$$

where $P(M_C)$ is the prior probability of $M_C$ and $f(x \mid M_C)$ is the
marginal likelihood. Assuming independent uniform prior
distributions on the model parameters and a symmetric
Dirichlet distribution on the cluster probability $p_k$, the marginal
likelihood of each cluster model $M_C$ can be computed in
closed form by solving the integral:
$$f(x \mid M_C) = \int f(x \mid \theta_C)\, f(\theta_C)\, d\theta_C \tag{4}$$

$$f(x \mid M_C) = \frac{\Gamma(1)}{\Gamma(1 + m)} \prod_{k=1}^{c} \frac{\Gamma(m_k/m + m_k)}{\Gamma(m_k/m)} \cdot \frac{RSS_k^{(q - n_k)/2}\, \Gamma\!\left(\frac{n_k - q}{2}\right)}{(2\pi)^{(n_k - q)/2}\, \det(X_k^T X_k)^{1/2}} \tag{5}$$

where $n_k$ is the dimension of the vector $x_k$, and
$RSS_k = x_k^T \left(I_n - X_k (X_k^T X_k)^{-1} X_k^T\right) x_k$ is the residual sum of
squares in cluster $C_k$. When all clustering models are a priori
equally likely, the posterior probability $P(M_C \mid x)$ is proportional
to the marginal likelihood $f(x \mid M_C)$, which becomes our
probabilistic scoring metric.

$$D(S_i, S_j) = \sqrt{\sum_{t=1} (x_{it} - x_{jt})^2} \tag{6}$$
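The ranking step can be sketched in a few lines. This is an illustrative helper under stated assumptions, not the authors' code. It assumes that the log marginal likelihood of each candidate clustering model has already been computed, and the function names are hypothetical. Working in log space with a shift by the maximum avoids numerical underflow when the likelihoods are very small.

```python
import math


def posterior_from_log_scores(log_marginals, log_priors=None):
    """Normalize log f(x|M_C) + log P(M_C) into posterior probabilities."""
    if log_priors is None:
        # Equal priors cancel out, so the marginal likelihood alone decides.
        log_priors = [0.0] * len(log_marginals)
    scores = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def rank_models(log_marginals, log_priors=None):
    """Return model indices sorted from most to least probable."""
    post = posterior_from_log_scores(log_marginals, log_priors)
    return sorted(range(len(post)), key=lambda i: post[i], reverse=True)
```

With equal priors, `rank_models` orders the candidates by marginal likelihood alone, which matches the scoring metric described above.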
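$$
\begin{aligned}
\min_{\omega,\, b,\, \xi,\, \xi^*} \quad & \frac{1}{2}\, \omega^T \omega + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) \\
\text{subject to} \quad & y_i - \left(\omega^T \phi(x_i) + b\right) \le \varepsilon + \xi_i^*, \\
& \left(\omega^T \phi(x_i) + b\right) - y_i \le \varepsilon + \xi_i, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, n
\end{aligned}
\tag{7}
$$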
where $x_i$ is mapped to a higher dimensional space by the
function $\phi$, and $\xi_i^*$ is the slack variable of the upper training error
($\xi_i$ of the lower) subject to the $\varepsilon$-insensitive tube
$|(\omega^T \phi(x_i) + b) - y_i| \le \varepsilon$. The constant $C > 0$ determines the
trade-off between the flatness and the losses. The parameters that
control regression quality are the cost of error $C$, the width of
the tube $\varepsilon$, and the mapping function $\phi$.
The constraints of (7) imply that most data $x_i$ are put inside the
tube $\varepsilon$. If $x_i$ is not in the tube, there is an error $\xi_i$ or $\xi_i^*$, which
the objective function tends to minimize. SVR avoids
under-fitting and over-fitting of the training data by minimizing
$\sum_{i=1}^{n} (\xi_i + \xi_i^*)$
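To make the role of the slack variables in (7) concrete, the sketch below evaluates them for given predictions. It is an illustration under assumptions, not a solver. The predicted values are taken as already available, the weight vector is a plain list, and the function names are hypothetical. Each slack is the smallest non-negative value that satisfies its constraint in (7).

```python
def slacks(y, f, eps):
    """Return (xi, xi_star) for one sample, following the constraints of (7).

    xi      : amount by which the prediction f exceeds y + eps
    xi_star : amount by which the prediction f falls below y - eps
    """
    xi = max(0.0, (f - y) - eps)
    xi_star = max(0.0, (y - f) - eps)
    return xi, xi_star


def svr_objective(w, targets, predictions, C, eps):
    """Evaluate (1/2) w^T w + C * sum(xi_i + xi_i^*) for fixed predictions."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    loss = sum(sum(slacks(y, f, eps)) for y, f in zip(targets, predictions))
    return flatness + C * loss
```

A prediction inside the tube contributes nothing to the loss term, so only samples outside the tube influence the trade-off controlled by `C`.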
C. Numerical Results
The criteria used in this paper to compare performance are the mean
absolute error (MAE) and the mean absolute percentage error
(MAPE), which measure forecasting accuracy.
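The two criteria can be computed as in the sketch below. The paper's exact formulas are not reproduced in this text, so the standard definitions are assumed here: MAE is the mean of the absolute errors, and MAPE is the mean of the absolute errors relative to the actual values, expressed in percent.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same unit as the inputs."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no actual value is 0."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)
```

Because MAPE divides by the actual price, hours with very low prices weigh heavily in it, which is one reason to report MAE alongside it.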
Numerical results obtained with the proposed method are presented.
To demonstrate the effectiveness of the integrated structure, an
SVM network without the BCD classifier is also used for
forecasting. The MAEs and MAPEs for the two test months are
shown in Table III.
Table III. Forecasting results of May and August 2005
               Single SVM                 Integrated architecture
               MAE ($/MWh)   MAPE (%)     MAE ($/MWh)   MAPE (%)
May 2005       3.78          6.83         2.80          5.35
August 2005    8.04          8.11         6.58          6.82
Average        5.91          7.47         4.69          6.08
[Figure: actual versus forecast price; vertical axis: Price ($/MWh); horizontal axis: Hour]
V. CONCLUSION
Then, in each subset, different SVMs have been used to fit the
input data belonging to different market states in a supervised
manner. The proposed method has been applied to the
prediction of the New England spot market price. In the
numerical experiments it is more accurate than a single SVM
network without the BCD classifier, with an average MAPE of
6.08 against 7.47 and an average MAE of 4.69 against 5.91 over
the two test months (Table III).
Shu Fan received his B.S., M.S. and Ph.D. degrees from the Department of
Electrical Engineering, Huazhong University of Science and Technology (HUST),
in 1995, 2000 and 2004, respectively. He is presently carrying out postdoctoral
research at Osaka Sangyo University. His research interests are power system
control, high-power power electronics and hybrid intelligent forecasting
systems.
James R. Liao (M'89) received the M.S. degree from the University of
Missouri, Rolla, in 1980, and the Ph.D. degree from the University of
Oklahoma, Norman, in 1992, both in electrical engineering. Since 1980, he has
been with the Western Farmers Electric Cooperative, Anadarko, OK. He was a
Transmission/Generation Systems Analyst from 1980 to 1985 and an EMS
System Software Engineer from 1985 to 1999. Since 1999, he has been
Principal Operations Engineer. Dr. Liao is a Registered Professional Engineer
in the State of Oklahoma.
Luonan Chen (M'94, SM'98) received the B.E. degree from HUST, Wuhan,
China, in 1984, and the M.E. and Ph.D. degrees from Tohoku University,
Sendai, Japan, in 1988 and 1991, respectively. Since 1997, he has been a faculty
member of Osaka Sangyo University, Osaka, Japan, where he is currently a Professor
with the Department of Electrical Engineering and Electronics. His fields of
interest are nonlinear dynamics and optimization in power systems.