Detection - and - Classification of Line
3, MAY 2021
Abstract—We present a novel transient fault detection and classification approach for power transmission lines based on a graph convolutional neural network. Compared with the existing techniques, the proposed approach treats the explicit spatial information in sampling sequences as prior knowledge and has stronger feature extraction ability. On this basis, a framework for transient fault detection and classification is created. A graph structure is generated to provide topology information to the task. Our approach takes the adjacency matrix of the topology graph and the bus voltage signals during a sampling period after transient faults as inputs, and rapidly outputs the predicted classification results. Furthermore, the proposed approach is tested in various situations and its generalization ability is verified by experimental results. The results show that the proposed approach can detect and classify transient faults more effectively than the existing techniques, and that it is practical for online transmission line protection owing to its rapidness, high robustness and generalization ability.

Index Terms—Graph convolutional network (GCN), power transmission line, fault detection and classification, spatio-temporal data, topology information.

I. INTRODUCTION

the transmission systems can the operator take effective emergency control actions according to the classification results, which facilitates the location of the faults and reduces the time of eliminating them.

For transient fault detection and classification, the extraction of fault features is a key task. Different from fault identification based on image data [3], the feature extraction of voltage and current data involved in this paper is more abstract. Early research is based on transmission line fault mechanism models. In reference [4], the mechanism of fault current generation and the fault features are analyzed by establishing the expression of the fault current. Reference [5] uses the fault equivalent circuit to determine the fault current and a threshold to classify the faults. These studies derive the expression of the fault current or voltage through fault mechanism analysis, and finally make a fault diagnosis. Such model-driven techniques may achieve good results under specific scenarios, but poor generalizability is their drawback. The key reason is that a single model cannot fully depict the various mechanisms involved in electrical events, and it becomes invalid in variable environments [6]. Moreover, these methods usually require many assumptions, and the modeling process involves a lot of manual calculation which is time-consuming
explore the spatio-temporal correlation of abnormal data in reference [14], but the spatio-temporal matrix in it does not introduce the connection relationship between nodes. In other words, the spatial relationship of data is not provided as prior knowledge.

As pointed out above, fault detection and classification methods have evolved from model-driven to data-driven. With the power industry gradually becoming intelligent, the more complex characteristics of transient faults and the multi-variation of transmission system operation make data-driven fault detection and classification techniques still the leading approaches at present [1], [16]. Our review finds that even though many studies have investigated new approaches to improve fault detection and classification, few researchers have considered the explicit spatial relations among the fault data of transmission systems. However, power system data, as a typical "industrial big data", is indeed a kind of spatio-temporal data [17], [18]. Although the spatio-temporal correlation of fault data is vital, it is difficult to introduce it explicitly into the detection and classification task through the existing techniques. Our work aims to fill this gap. Following the well-established research line of GCN, we come up with more new ideas as we deal with graph-structured data.

Graph NN was first introduced by Bruna et al. [19]. It applies convolutional layers to graph-structured data rather than just regular data such as images. Whereas a CNN cannot effectively process irregular data, a GCN lets researchers make effective use of explicit spatial information. Due to the universality and diversity of graph-structured data in our life, the development and application of GCN are rising rapidly. It has been successfully applied in recommendation systems, social networks, life science and other fields [20]. In our view, the power system topology is naturally a graph, and the edge information in the graph can also be extended to "electrical distance". The topological structure of the power transmission system and large amounts of measurement data provide a new opportunity for proper applications of GCN in power systems. In our work, a GCN based on power topology is used to detect and classify transient faults.

Briefly, this paper has the following contributions:
1) To the best of our knowledge, this is the first work leveraging GCN to implement the transient fault detection and classification task.
2) A drawback of the existing techniques is pointed out: the effect of explicit spatial information has not been taken into account. Therefore, we provide a novel idea of embedding the spatio-temporal relations between data into detection and classification models. In brief, we propose to regard the transmission line topology as a graph and to utilize topology parameters to construct graph elements.
3) In addition, we introduce the receiver operating characteristic (ROC) curve instead of only using accuracy to characterize the overall performance of the classifier [21]. Further, comparison studies are implemented from three aspects to analyze the generalization ability of our proposed method and the existing machine learning techniques.

The rest of our paper is organized as follows. Section II introduces the graph convolutional network. Section III discusses the problem statement and the proposed method framework. Section IV is the case study. Section V summarizes our work.

II. GRAPH CONVOLUTIONAL NEURAL NETWORK

In this section, we first introduce the graph structure and provide an example to illustrate the workflow of graph convolution.

A. Graph Structure

In our work, we treat a transmission system as an undirected graph $G = (V, E, A)$ with $N$ nodes $v_i \in V$, edges $(v_i, v_j) \in E$, a (weighted) adjacency matrix $A \in \mathbb{R}^{N \times N}$ and a degree matrix $D_{ii} = \sum_j A_{ij}$. The structure of the undirected graph is depicted in Fig. 1.

[Fig. 1: six nodes V1 to V6 connected by the weighted edges e12, e15, e16, e23 and e24.]
Fig. 1. Undirected graph structure with nodes $v_i$ and edge weights $e_{ij}$ ($i, j = 1, 2, 3, 4, 5, 6$).

The adjacency matrix $A$ represents the connection relationships of all nodes in a graph, as follows:

\[
A = \begin{bmatrix}
0 & e_{12} & 0 & 0 & e_{15} & e_{16} \\
e_{21} & 0 & e_{23} & e_{24} & 0 & 0 \\
0 & e_{32} & 0 & 0 & 0 & 0 \\
0 & e_{42} & 0 & 0 & 0 & 0 \\
e_{51} & 0 & 0 & 0 & 0 & 0 \\
e_{61} & 0 & 0 & 0 & 0 & 0
\end{bmatrix}_{6 \times 6}
\tag{1}
\]

where $e_{ij}$ in the matrix represents the correlation between the $i$th node and the $j$th node. If two nodes are connected by an edge, $e_{ij}$ is equal to the weight coefficient of this edge; if not, then $e_{ij} = 0$. It is worth noting that $e_{ij} = e_{ji}$ in the undirected graph. Besides, the degree matrix $D$ is diagonal, and each diagonal element equals the sum of the weights of the edges incident to the corresponding node (for unit weights, the number of adjacent nodes).

B. Workflow of Graph Convolution

Graph convolution was originally derived from graph theory and the convolution theorem with the purpose of applying it to graph data processing [22]. Through constant refinement and optimization of the model, the expression of GCN has become more understandable.

In practical application, we utilize the commonly used GCN for the graph convolution operation, in the form of feature transfer and aggregation through a self-normalized adjacency matrix. This GCN was proposed by Kipf et al. [23], and its one-layer operation is as follows:

\[
Z = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X W\right) = \sigma\left(\hat{A} X W\right)
\tag{2}
\]
458 CSEE JOURNAL OF POWER AND ENERGY SYSTEMS, VOL. 7, NO. 3, MAY 2021
where $\tilde{A} = A + I$ is the adjacency matrix with self-loops. A self-loop maintains the information of the target node itself in the convolution, which is a required design strategy in GCN. $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree matrix of $\tilde{A}$, so $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ represents the self-normalized adjacency matrix. $W$ is the trainable weight matrix and $\sigma(\cdot)$ is the activation function.

The workflow for graph convolution is depicted in Fig. 2. Firstly, taking the graph in Fig. 1 as an example, we assume that the feature of node $v_i$ is $X_i = [x_{i1} \ldots x_{in}]^T$, so $X \in \mathbb{R}^{6 \times n}$. Secondly, multiplying $\hat{A}$ by $X$ transfers and aggregates the features of the adjacent nodes, as shown in the middle part of Fig. 2. Finally, $Z$ is the output of this GCN layer, in which every node contains first-order neighborhood information. It is easy to deduce that the output neurons obtained through $k$ GCN layers can express $k$-order neighborhood information (spatial information). Therefore, the hidden layer data of GCN can provide more prior information for the model training, so that the trained hidden layer neurons have a deeper feature expression ability.

III. DETECTION AND CLASSIFICATION OF TRANSIENT FAULTS BASED ON GRAPH CONVOLUTIONAL NEURAL NETWORK

A. Problem Statement

Transmission line transient faults can be divided into single-phase ground faults, two-phase short circuit faults, two-phase ground faults and three-phase short circuit faults. Common causes of these faults are lightning strike, wind deviation, pollution flashover, icing, external force, bird damage and some internal faults of the system. The severity of the four main types of faults is obviously different. The three-phase short circuit fault is the most harmful fault in the power transmission system and requires the shortest clearing time. The single-phase ground fault is not as harmful as other kinds of faults, but should not be neglected due to its high occurrence frequency, accounting for more than 90% of the total faults. When a transient fault occurs in a transmission line, the nodal voltage will drop to different degrees. To show the characteristics of different faults, some tests are implemented on a small transmission system with few nodes. Fig. 3 shows the changes of the nodal voltage waveform before and after the occurrence of various faults.

1) Normal Condition
In the normal condition of a transmission system, the nodal voltage usually stays around 1.0 p.u. Unlike the steady state simulation data mentioned in other papers, slight voltage fluctuations may occur in the transient data under normal operating conditions.

2) Fault Condition
As shown in Fig. 3, the amplitude of the voltage reduction differs between nodes and between fault conditions. Furthermore, the same fault influences different nodes differently because of their different distances from the fault location.

The final purpose of our work is to detect the occurrence of faults and to determine which kind of fault occurs by learning deep representations of system nodal voltages.

B. Construction of Transmission System Graph Structure

In this section we elaborate on how to construct the graph structure data in a transmission system. The proposed way to construct the graph can be generalized to other engineering tasks involving topology structure in power systems. Here we take the IEEE 9-bus system as an example to illustrate the construction process of the graph, which is shown in Fig. 4.

First of all, we define the nodes and their features. Obviously, nodes here are the bus nodes in the transmission system, whose features are bus voltages or other electrical data.

Secondly, edges are defined. The lines in the topology are the edges of the graph, which means that if there is a line between two nodes, an edge can be assumed to exist between

[Fig. 2: input node features $X \in \mathbb{R}^{6 \times n}$ and edge weights; aggregation $Y = \hat{A}X$ including self-loops; one GCN layer $Z = \sigma(\hat{A}XW) = \sigma(YW)$ with $W \in \mathbb{R}^{n \times m}$; output $Z \in \mathbb{R}^{6 \times m}$ carrying spatial information.]
Fig. 2. Workflow of graph convolution (left part: input nodal features and edge weights; middle part: transfer and aggregate nodal features; right part: output new representations of nodal features).
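As a minimal sketch of the one-layer operation in (2), the NumPy snippet below builds the graph of Fig. 1, forms the self-normalized adjacency matrix, and applies one layer. The unit edge weights, the feature sizes n = 4 and m = 3, the random X and W, and the use of ReLU for σ are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a symmetric adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops
    d = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, W):
    """One layer Z = sigma(A_hat X W), with ReLU assumed as sigma."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Edges of Fig. 1 (1-based node labels); unit weights are an assumption.
edges = [(1, 2), (1, 5), (1, 6), (2, 3), (2, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0

rng = np.random.default_rng(0)
n, m = 4, 3                                 # hypothetical feature sizes
X = rng.normal(size=(6, n))                 # node features, one row per node
W = rng.normal(size=(n, m))                 # trainable weights (random here)

A_hat = normalized_adjacency(A)
Z = gcn_layer(A_hat, X, W)
print(Z.shape)
```

Nodes 3 and 4 are not adjacent, so the corresponding entry of `A_hat` is zero, whereas the same entry of `A_hat @ A_hat` is positive: after two layers each of them receives information from the other through node 2, which is the k-order neighborhood property in a small example.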
TONG et al.: DETECTION AND CLASSIFICATION OF TRANSMISSION LINE TRANSIENT FAULTS BASED ON GRAPH CONVOLUTIONAL NEURAL NETWORK 459
these two nodes in the graph.

[Figure: nodal voltage (p.u.) versus sampling number for four fault types: (a) Single-phase-ground, (b) Two-phase, (c) Two-phase-ground, (d) Three-phase.]

Fig. 4. Graph construction (bus → node; line → edge; line impedance → edge weight).

In addition, we reckon that edges should be informative. The consideration of nodal features and the existence of edges only covers the topological connections in spatial information, but does not quantify the correlation between nodes. In reality, if the bus nodes are far apart from each other, the similarity between them may not be high even if they are connected. If the edge weights are not considered according to the actual
scene, the aggregation of neighborhood information will be less accurate in the process of graph convolution. Therefore, we create a calculation criterion to get the weight of each edge using the line parameters. The calculation formula is as follows:

\[
e_{ij} =
\begin{cases}
\dfrac{1}{\sqrt{R_{ij}^2 + X_{ij}^2}} & \text{if nodes } i, j \text{ are connected} \\
0 & \text{else}
\end{cases}
\tag{3}
\]

where $e_{ij}$ represents the edge weight coefficient between node $i$ and node $j$, and $R_{ij}$ and $X_{ij}$ represent the line resistance and reactance parameters respectively. The significance of this equation is that a longer distance or a larger impedance indicates a smaller correlation between nodes. The idea of introducing this criterion comes from the definition of edges in relevant applications of social recommendation [24]. In the interpersonal graph, edge information is used to represent the "user-user" relationship, which is usually called "social distance". Therefore, we come up with the idea of extending the "social distance" to the "electrical distance" in the power system. Specifically, "social distance" is the quantification of the relationship between people, while "electrical distance" is the quantification of the spatio-temporal correlation of power data. We propose (3) as one reasonable way of constructing edge weights. In reality, there can be various ways of defining edge information according to different requirements.

Finally, we construct the graph structure based on the transmission system, as shown in Fig. 4.

C. Workflow of GCN Based on Fault Detection and Classification

After building the graph structure, the next step is to build a GCN model for fault detection and classification. Concretely, this network is expected to output a non-faulty result when no fault occurs, and to detect the corresponding fault when a specific fault type occurs.

To illustrate the whole workflow concisely, voltages are selected as the nodal features. Then the input matrix $X$ of GCN can be written in the following form (IEEE 9-bus system as an example):

\[
X = \begin{bmatrix}
u_{1,t_1} & u_{1,t_2} & \cdots & u_{1,t_s} \\
u_{2,t_1} & u_{2,t_2} & \cdots & u_{2,t_s} \\
\vdots & \vdots & \ddots & \vdots \\
u_{9,t_1} & u_{9,t_2} & \cdots & u_{9,t_s}
\end{bmatrix}_{9 \times s}
\tag{4}
\]

where each row of $X$ represents the series of voltage magnitudes of a particular node in the IEEE 9-bus system and $s$ is the number of voltage samples in a sampling period. We consider using a multi-layer GCN for supervised fault classification. We first construct the self-normalized adjacency matrix $\hat{A}$ according to the edge weights calculated by formula (3), and then our feedforward network takes the simple form:

\[
H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right), \quad l = 0, 1, 2, \ldots
\tag{5}
\]

Here, $\sigma$ is an activation function such as ReLU or Sigmoid, and $H^{(0)} \in \mathbb{R}^{n \times s}$ is equal to $X$; if we assume that the first hidden layer has $h$ feature maps, then $W^{(0)} \in \mathbb{R}^{s \times h}$ is an input-to-hidden weight matrix. The $l$th hidden layer likewise determines the vector dimension through its number of feature maps. The convolution operation itself is described in Section II-B and is not repeated here. We assume that the final GCN layer output is $H \in \mathbb{R}^{n \times m}$.

Since it is a classification task, the last layer of our model needs to be a fully connected layer, as shown in Fig. 5. The output features of the last hidden layer are stacked into a long feature vector $S^{(i)}$, which is used as the input vector of a softmax classifier. The length of $S^{(i)}$, $n_s$, is calculated as

\[
n_s = n \times m
\tag{6}
\]

[Figure: the output features $h_{1,1}, \ldots, h_{n,m}$ of the last GCN layer are stacked into the vector $S^{(i)}$ and fed to a softmax classifier that outputs the fault type.]
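The steps from (3) to (6) can be sketched end to end as follows. This is only an illustration under stated assumptions: the 3-bus toy system, its line parameters, the layer sizes, the random weights and the fully connected matrix `W_fc` are hypothetical, and ReLU is assumed for σ.

```python
import numpy as np

def edge_weight_matrix(n_nodes, lines):
    """Adjacency from Eq. (3): e_ij = 1 / sqrt(R_ij^2 + X_ij^2) if connected, else 0.

    `lines` holds (i, j, R, X) tuples with 0-based bus indices.
    """
    A = np.zeros((n_nodes, n_nodes))
    for i, j, r, x in lines:
        A[i, j] = A[j, i] = 1.0 / np.hypot(r, x)
    return A

def normalized_adjacency(A):
    """Self-normalized adjacency D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, weights):
    """Eq. (5): H^(l+1) = ReLU(A_hat H^(l) W^(l)), starting from H^(0) = X."""
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)
    return H

def softmax(z):
    z = z - z.max()                          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 3-bus system: (bus i, bus j, R, X) in per unit.
lines = [(0, 1, 0.03, 0.04), (1, 2, 0.06, 0.08)]
A = edge_weight_matrix(3, lines)
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(1)
s, h, m, n_classes = 8, 5, 4, 5              # samples per node, layer widths, classes
X = 1.0 + 0.01 * rng.normal(size=(3, s))     # voltages near 1.0 p.u.
weights = [rng.normal(size=(s, h)), rng.normal(size=(h, m))]

H = gcn_forward(A_hat, X, weights)           # shape (n, m)
S = H.reshape(-1)                            # Eq. (6): length n_s = n * m
W_fc = rng.normal(size=(S.size, n_classes))  # assumed fully connected layer
probs = softmax(S @ W_fc)
print(S.size, probs.shape)
```

The line with the larger impedance receives the smaller weight, so under (3) electrically distant buses contribute less to each other's aggregated features.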
where $t^{(i)}$ represents the $i$th class of the output layer. Finally, the network is backpropagated and trained according to the error between the output category and the real fault label.

D. Framework for Fault Detection and Classification

The overall framework of the GCN-based fault detection and classification approach proposed in this paper is shown in Fig. 7, which includes: (i) Construct the transmission line model in the PSCAD/EMTDC simulation software in order to generate massive fault data samples through Python scripts. (ii) Build a graph classification model, including graph structure construction and parameter settings of the neural network. (iii) Visualize the results. The three steps communicate through the Python API, and together integrate simulation, classification and analysis.

IV. CASE STUDY

A. Transmission System Studied and Data Acquisition

To obtain massive labeled transient fault samples, we build a simulated power transmission system with reference to the IEEE 39-bus standard test system. Considering that the generation of fault samples needs an excellent transient simulation environment, PSCAD/EMTDC is chosen as the simulation software. Fig. 6 shows the electrical single-line diagram of the 39-bus system. The IEEE 39-bus standard test system consists of 10 generators, 12 three-phase transformers, 34 transmission lines and some loads.

This task requires sufficient training data, so we generate a series of labeled, independent and identically distributed samples by setting up different fault locations, fault types and fault impedances. We leverage the "mrhc-automation" library (PSCAD-Python interface library) to realize batch simulations, thereby avoiding repetitive manual operations.

B. Selection of Network Parameters

1) Simulation Parameters
As mentioned above, we build an IEEE 39-bus standard system simulation model based on PSCAD/EMTDC. To obtain independent and identically distributed samples, we configure three types of fault parameters, i.e. different fault types, fault positions, and fault resistances, as shown in Table I. The fault inception time is 1.0 s and the fault duration is 0.1 s. The sampling frequency in PSCAD is set to 4 kHz. That is to say, 400 fault sampling values are generated in the fault period of 0.1 s. We take 80 out of the 400 sampling values at equal intervals as nodal features. The transmission system used in this paper has 34 transmission lines, and each line has 10 fault points. Therefore, we get 5 (fault types) × 34 (lines) × 10 (positions) × 7 (resistances) = 11900 samples, which are divided into a training set and a testing set at a ratio of 7:3. We consider these 11900 samples as standard fault data.

TABLE I
FAULT PARAMETERS USED FOR SIMULATION
Fault parameter         Values or types
Fault type              Single-phase-G, two-phase, two-phase-G, three-phase, non-faulty
Fault position          1/10, 2/10, ..., 10/10 of length of lines
Fault resistance (Ω)    0.01, 1, 10, 20, 30, 40, 50

2) GCN Parameters
The hyperparameter selection of GCN is depicted in Table II. As reference [26] shows, the number of hidden layers in a graph convolution network is usually set to 2 or 3. Deep graph convolutional networks suffer from excessive smoothing [23], which can be simply explained as the features of all nodes tending to become homogeneous. Therefore, we finally
choose a model with 3 hidden layers by testing and comparing the effects of different numbers of layers. The numbers of hidden neurons are determined to be 150, 300 and 150 respectively through constant tuning and optimization [27], and ReLU is selected as the activation function of each layer. When propagating to the last hidden layer, the dimension of the nodal features becomes 150, while the number of nodes remains 39. Thus the input size of the softmax classifier is 39 × 150 = 5850. For supervised multi-classification problems, we usually choose the cross-entropy error as the cost function because it can be used to calculate the loss through a simple derivative and has a fast rate of convergence [28]. The calculation formula is as follows:

\[
CE(p, q) = -\sum_{i=1}^{C} p_i \log(q_i)
\tag{9}
\]

where $C$ represents the number of categories, $p_i$ is the true value and $q_i$ is the predicted value.

[Figure: framework flow. Transmission system data input (generator, line, transformer and load parameters) → fault simulation under the fault working conditions (single-phase-G, two-phase, two-phase-G, three-phase-G, non-faulty) → fault data types and normalization → training set and testing set, combined with topology information.]

TABLE II
HYPERPARAMETERS OF GCN
Hyperparameter    Values or types
GCN layer         Three layers (150, 300, 150 neurons)
Loss function     Cross-entropy loss
Optimizer         Adam

The Adam algorithm is chosen as the optimizer owing to its fast convergence speed, high learning efficiency and small memory requirement. It is exceedingly suitable for processing large data set pairs and has great processing capacity for sparse data and data with noise samples [29]. In our test, Adam performs better than other optimizers such as Stochastic Gradient Descent (SGD) and Batch Gradient Descent (BGD). Therefore, Adam is finally selected to realize the automatic adjustment of the learning rate.

C. Performance of the Proposed Method with Standard Data

To validate the overall performance of our proposed method, we demonstrate its detection and classification effect from three aspects: classification performance in various situations, response speed, and robustness. Performance comparison with common machine learning methods is indispensable. In this section, we first test the classification performance of our method with data obtained in a standard system.

For clarity, the accuracies and recall rates of the proposed method for the classification of the five types are calculated and listed in Table III. The overall classification accuracy is 98.28%, and the classification accuracy for each type is higher than 97.4%. This result shows that our proposed method is capable of classifying faults with quite high accuracies.

TABLE III
CLASSIFICATION RESULTS OF THE PROPOSED METHOD FOR DIFFERENT FAULT TYPES
Fault type        Accuracy (%)    Recall (%)
Single-phase-G    98.18           98.18
Two-phase         98.04           98.04
Two-phase-G       97.76           97.76
Three-phase-G     97.48           97.48
Non-faulty        99.93           99.93
(Average)         98.28           98.28

Further, the classification performance of the proposed method is compared with that of the common machine learning algorithms including support vector machine (SVM), decision tree (DT), K nearest neighbor algorithm (KNN), random forest (RF), linear regression (LR), naive bayes algorithm (NB), fully connected network (FCN) and convolutional neural
network (CNN). The first six methods are traditional classification algorithms, while the latter two are end-to-end neural network classification algorithms. To compare the performance of various classifiers more comprehensively, we use the receiver operating characteristic (ROC) curve to measure the classification effect, which is a comprehensive index that can best reflect the overall performance of a classifier in classification problems [30]. The horizontal axis of the ROC curve represents the false positive rate (FPR), while the vertical axis represents the true positive rate (TPR). The two are defined as follows:

\[
\mathrm{FPR} = \frac{\sum_{i=1}^{n} \mathrm{FP}_i}{\sum_{i=1}^{n} (\mathrm{FP}_i + \mathrm{TN}_i)}
\tag{10}
\]

\[
\mathrm{TPR} = \frac{\sum_{i=1}^{n} \mathrm{TP}_i}{\sum_{i=1}^{n} (\mathrm{TP}_i + \mathrm{FN}_i)}
\tag{11}
\]

where $n$ is the number of fault types, T/F means true or false, P/N means positive or negative, and $\mathrm{TP}_i$/$\mathrm{FP}_i$ denotes the TP/FP of the $i$th type. So FPR represents the proportion of actual negative samples that are predicted as positive. Similarly, TPR represents the proportion of actual positive samples that are predicted as positive. By setting different thresholds for the softmax output, we get different (FPR, TPR) points which constitute the ROC curve. One of the great advantages of the ROC curve is that when the distribution of positive and negative samples changes, the curve's shape remains basically unchanged. Therefore, this evaluation index can not only reduce the interference brought by different testing sets, but also measure the performance of a model more objectively. Further, the area under curve (AUC) is calculated in order to quantify classification performance. The results are shown in Fig. 8.

[Fig. 8: "ROC curve comparison of different methods", true positive rate versus false positive rate. Average AUC per method: SVM 0.9484, RF 0.9657, KNN 0.9779, LR 0.9608, NB 0.9260, DT 0.8092, FCN 0.9916, CNN 0.9918, GCN 0.9994.]
Fig. 8. Comparison of ROC curves.

According to the definitions of TPR and FPR, the ideal goal should be TPR = 1 and FPR = 0. Moreover, the AUC of the ideal goal is 1.0. In other words, the closer a ROC curve is to the point (0, 1), the better the classification performance will be. We can tell from Fig. 8 that the ROC curve of GCN is the closest to the ideal classification goal (AUC = 0.9994). This result shows that the proposed method not only has high classification accuracy and recall rate, but also has remarkable comprehensive performance.

D. Performance of the Proposed Method With Renewable Energy Generation Integration

In order to simulate the operation of a real transmission grid, more environmental factors need to be considered. New energy power generation has increasingly become a hot spot in the industry. Since more and more renewable energy generation is connected to the power grid, we add a renewable energy module to the IEEE 39-bus system to simulate this situation.

We introduce a wind turbine into the IEEE 39-bus system to simulate the renewable energy generation integration, as shown in Fig. 9.

[Fig. 9: single-line topology of the 39-bus system showing the bus numbers, the generators (G) and the added wind generator.]
Fig. 9. Topology of IEEE39 transmission system with a wind turbine.

The selected wind turbine is a PSCAD-based calculation model [31], and we connect it to the No. 3 bus of the IEEE 39-bus system. We set the fault parameters as before, and get 3500 samples for testing the generalizability of the trained model. Fig. 10(a) and (b) represent the voltage waveforms of partial nodes under the two-phase short circuit fault. It can be seen that the characteristics of the fault data before and after the wind turbine integration are apparently different.

The classification accuracies of the five types of fault data with the wind turbine are shown in Table IV below. We can see that when the characteristics of fault data become complicated due to the wind turbine integration, the well-trained model can still identify the fault with an averaged accuracy of 97.68%.

In addition, we depict the loss and accuracy curves of the training set and the testing set to verify that the model is less susceptible to over-fitting. According to the curves shown in Fig. 11(a), the loss of the testing set does not increase and
remains very low in the later stage of training convergence, indicating that the model is not subject to over-fitting [32].

[Fig. 10: voltage (p.u.) versus feature number (time dimension) for (a) Standard system and (b) System with the wind turbine.]
Fig. 10. Fault waveforms before and after the wind turbine integration.

TABLE IV
CLASSIFICATION ACCURACY (%) OF THE PROPOSED METHOD IN THE PRESENCE OF WIND TURBINE GENERATOR
Fault type        Standard system with wind turbine
Single-phase-G    97.48
Two-phase         97.20
Two-phase-G       97.20
Three-phase-G     96.64
Non-faulty        99.86
(Average)         97.68

E. Performance of the Proposed Method with Bad Data

Data measurement and acquisition usually bring a lot of bad data in the real power grid. Therefore, we add some bad data to the standard fault data to further test the performance of the model.

Three types of bad data are considered in our paper:
1) Inaccurate measuring is simulated by multiplying standard measurements by a random number ranging from 0.75 to 1.25 and is set as 1% of the total sampling data.
2) Asynchronous sampling is simulated by selecting 5% of all PMUs and randomly moving the measurements forward or backward by $n$ sampling values ($n \in [1, 5]$, $n \in \mathbb{Z}$).
3) Data loss is simulated by arbitrarily discarding sampling points and is set as 1% of the total sampling data.

We add the three types of bad data to the original sample set, and get 11900 new samples which are still divided into a training set and a testing set at a ratio of 7:3.

Figures 12 and 13 represent the voltage waveforms of 39 nodes before and after adding bad data under the single-phase ground fault. It is obvious that the waveform of fault data after adding bad data is more complicated. As can be seen from Table V, the averaged detection and classification accuracy of fault samples with bad data is still up to 96.71%. Results of the testing set indicate that the proposed approach tolerates bad data well.

TABLE V
CLASSIFICATION ACCURACY (%) OF THE PROPOSED METHOD IN THE PRESENCE OF BAD DATA
Fault type        Standard fault samples    Fault samples with bad data
Single-phase-G    98.18                     96.86
Two-phase         98.04                     96.78
Two-phase-G       97.76                     95.24
Three-phase-G     97.48                     96.64
Non-faulty        99.93                     98.04
(Average)         98.28                     96.71

Similarly, the loss and accuracy curves of training and testing are depicted in Fig. 14(a) and (b). We can see that
[Figure: (a) curves of loss and (b) curves of accuracy versus epoch (0 to 40); legend entries include Standard_Train_Acc and Wind_Test_Acc.]

[Figure: "Voltage Waveform of 39 nodes with bad data", voltage (p.u.) versus numbering of voltage sampling values, with bad data and standard data overlaid.]

[Figure: (a) curves of loss versus epoch and "Training and Testing Accuracy", accuracy (%) versus epoch.]
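The three bad-data types of Section IV-E can be sketched as below for one sample, stored as a node-by-time voltage matrix. Several details are assumptions made only for illustration: each row is treated as one PMU, "1% of the total sampling data" is read as 1% of the matrix entries, a shifted series is padded with its edge value, and discarded points are represented by zeros.

```python
import numpy as np

def shift_row(row, k):
    """Shift a series by k samples (k > 0 delays, k < 0 advances), padding with the edge value."""
    if k > 0:
        return np.concatenate([np.full(k, row[0]), row[:-k]])
    return np.concatenate([row[-k:], np.full(-k, row[-1])])

def inject_bad_data(X, rng):
    """Return a corrupted copy of X (nodes x samples) and the indices of the lost points."""
    X = X.astype(float).copy()
    n_nodes, _ = X.shape
    k = max(1, round(0.01 * X.size))

    # 1) Inaccurate measuring: scale 1% of the entries by a factor in [0.75, 1.25].
    idx = rng.choice(X.size, size=k, replace=False)
    X.flat[idx] = X.flat[idx] * rng.uniform(0.75, 1.25, size=k)

    # 2) Asynchronous sampling: shift 5% of the nodes by 1 to 5 samples, either direction.
    n_async = max(1, round(0.05 * n_nodes))
    for node in rng.choice(n_nodes, size=n_async, replace=False):
        shift = int(rng.integers(1, 6)) * int(rng.choice([-1, 1]))
        X[node] = shift_row(X[node], shift)

    # 3) Data loss: discard 1% of the entries (zero is an assumed placeholder).
    lost = rng.choice(X.size, size=k, replace=False)
    X.flat[lost] = 0.0
    return X, lost

rng = np.random.default_rng(0)
X_clean = np.ones((39, 80))                  # 39 nodes, 80 samples at 1.0 p.u.
X_bad, lost = inject_bad_data(X_clean, rng)
print(X_bad.shape, len(lost))
```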
2 27 97
38
1
3 18 17 G 94
G
Accuracy (%)
39 16 21
91
15
G 88
4 14 24 36
85 0.5 kHZ
2 kHZ
13 23 4 kHZ
5 10 kHZ
6 82
9 12 19 accuracy=97%
7 11 20 22 0.0000 0.0125 0.0250 0.0375 0.0500 0.0750 0.1000
Time after the fault (s)
10
8
31 32 34 33 35 Fig. 17. Response speed for fault identification at different sampling
G G G G G frequencies, “0” point on the x-axis represents the moment of the fault
occurrence.
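As a rough illustration of the first two bad-data types described in Section E (inaccurate measurement and asynchronous sampling), the perturbations can be sketched in Python with NumPy. The function names, the (PMU x sample) array layout, the edge-hold padding used when a row is shifted, and the rounding of 5% of the PMUs to a whole number are assumptions made for this sketch; the paper does not specify them.

```python
import numpy as np


def add_inaccurate_measurements(x, rng, fraction=0.01, low=0.75, high=1.25):
    """Scale a random `fraction` of all samples by a factor drawn from [low, high)."""
    out = np.array(x, dtype=float, order="C")   # C-ordered copy, input is untouched
    flat = out.reshape(-1)                      # view onto `out`
    n_bad = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_bad, replace=False)
    flat[idx] *= rng.uniform(low, high, size=n_bad)
    return out, idx


def add_asynchronous_sampling(x, rng, pmu_fraction=0.05, max_shift=5):
    """Shift the rows of a random subset of PMUs by n samples, n in [1, max_shift].

    Assumption: vacated positions are filled with the nearest edge value
    instead of wrapping around.
    """
    out = np.array(x, dtype=float, order="C")
    n_pmus = out.shape[0]
    n_bad = max(1, int(round(pmu_fraction * n_pmus)))
    rows = rng.choice(n_pmus, size=n_bad, replace=False)
    shifts = rng.integers(1, max_shift + 1, size=n_bad) * rng.choice([-1, 1], size=n_bad)
    for r, s in zip(rows, shifts):
        row = out[r].copy()
        if s > 0:                 # measurements moved backward (delayed)
            out[r, s:] = row[:-s]
            out[r, :s] = row[0]
        else:                     # measurements moved forward (advanced)
            out[r, :s] = row[-s:]
            out[r, s:] = row[-1]
    return out, rows, shifts


# Hypothetical input: 39 PMUs, 100 voltage samples each, all at 1.0 p.u.
rng = np.random.default_rng(0)
standard = np.ones((39, 100))
scaled, bad_idx = add_inaccurate_measurements(standard, rng)
shifted, bad_rows, bad_shifts = add_asynchronous_sampling(standard, rng)
```

Both functions return the indices they modified, so a test set can record which samples were corrupted.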
activation function. Further, the dropout [37] mechanism is also used in the CNN classifier. The evaluation results are depicted in Fig. 18.
[Figure: Anti-noise performance of methods. y-axis: Accuracy (%).]
model could still reach over 96%, which is quite an encouraging result. To illustrate the robustness of our method more convincingly, data waveforms under various noise levels are shown in Fig. 19. The waveform curves of five colors in each subfigure represent the data of the five categories, respectively. The subfigures show that the raw data become very chaotic when the SNR is 25 dB, let alone 15 dB. Moreover, this is only the voltage waveform of a single node of the transmission system. If the voltages of all nodes in the whole system are considered, the task will be much more
[Figure: voltage waveforms under noise. Panels: (a) SNR=15dB; (b) SNR=25dB; (c) SNR=35dB. x-axis: Sampling number (episode); y-axis: Voltage (p.u.).]
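To make the noise levels discussed above concrete, the following sketch adds noise to a voltage sequence at a target SNR in dB. It assumes additive white Gaussian noise and defines the SNR on the mean power of the whole sequence; neither choice is stated in this section, and the function names are hypothetical.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power = 10^(snr_db / 10)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to `clean`."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noisy, dtype=float) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


# Hypothetical nodal voltage sequence in p.u. (not data from the paper).
rng = np.random.default_rng(0)
t = np.arange(80)
clean = 0.6 + 0.1 * np.sin(2 * np.pi * t / 20)
noisy_by_snr = {snr: add_noise_at_snr(clean, snr, rng) for snr in (15, 25, 35)}
```

Every 10 dB reduction in SNR multiplies the noise power by 10, so at 15 dB the noise power is 100 times that at 35 dB.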
Firstly, high impedance faults mainly occur in distribution networks (15 kV–25 kV), and power transmission systems with higher voltage levels have a low probability of high impedance faults [38]. For instance, at high voltage levels, the grounding medium may break down when a high impedance ground fault occurs, and the high impedance fault then becomes a low impedance fault.
Secondly, a high impedance fault cannot be identified through the features of nodal voltage alone. As shown in Fig. 20(a) and (b), the nodal voltage waveform of a single-phase ground fault with a resistance of 1 ohm is very similar to that of a three-phase short-circuit fault with a resistance of 100 ohm. When the fault resistance is 300 ohm, the nodal voltage waveform of a three-phase short-circuit fault even approaches that of the normal operating condition, as depicted in Fig. 20(c). These conditions make it very difficult for purely data-driven methods to identify the fault types accurately. In general, high impedance problems require many effective features, such as the functional relationship between fault resistance and voltage variation, before these features can be used for big-data-level identification.
Thirdly, the proposed approach starts from the perspective of mass data processing in the power grid. In fact, to detect and classify various fault conditions (including high impedance fault conditions, etc.), we need to combine the proposed method with traditional protection theory to form a complete fault identification system [38]. The advantage of our method is that it undertakes the role of data analysis when the amount of data in the power grid is huge, so that a concise and clear conclusion can be drawn from the overall analysis.
To determine the impedance boundary within which our method can identify the five types of faults, we added extra experiments, the results of which are depicted in Fig. 21. A total of 3500 samples are simulated and tested (700 samples for each fault resistance). It can be seen from Fig. 21 that the highest fault impedance that our detection and classification model can classify with an accuracy of not less than 95% is 55 ohm. In addition, the model can identify fault samples with a fault impedance of 63 ohm with an accuracy of not less than 90%.
2) Validity of Adjacency Matrix
In this paper, we use the GCN-based method to detect and classify power system transient faults, and its main advantage is the explicit extraction of spatio-temporal relations between data. Further, we add comparative experiments to verify the effective role of spatial information in fault detection and
[Fig. 20 panels: (a) Nodal voltage waveform (single phase fault, 1 ohm); (b) Nodal voltage waveform (three phase fault, 100 ohm); (c) Nodal voltage waveform (three phase fault, 300 ohm). x-axis: Sampling number (episode); y-axis: Voltage (p.u.).]
TABLE VII
CLASSIFICATION ACCURACIES (%) OF GCN BASED ON DIFFERENT MATRICES
Matrix type                   Only added in the first layer   Added in the first two layers   Added in the first three layers
Gaussian matrix               96.47                           96.18                           95.29
Uniform matrix                85.88                           72.83                           72.55
All-ones matrix               75.91                           75.07                           73.95
Identity matrix               96.76                           96.76                           96.76
Unweighted adjacency matrix   97.87                           97.65                           98.23
Weighted adjacency matrix     97.76                           97.76                           98.28
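The matrix types compared in Table VII can be sketched as follows. This is a minimal illustration: the distributions used for the Gaussian and uniform matrices (standard normal and [0, 1)), the derivation of the unweighted matrix from the nonzero pattern of the weighted one, and the generic propagation step relu(M H W) are all assumptions for this sketch and are not taken from the paper's GCN layer definition.

```python
import numpy as np


def substitute_matrices(a_weighted, rng):
    """Build matrices of the same dimension as A, one per row of Table VII."""
    a_weighted = np.asarray(a_weighted, dtype=float)
    n = a_weighted.shape[0]
    return {
        "gaussian": rng.normal(size=(n, n)),            # assumed N(0, 1) entries
        "uniform": rng.uniform(size=(n, n)),            # assumed U[0, 1) entries
        "all_ones": np.ones((n, n)),
        "identity": np.eye(n),
        "unweighted_adjacency": (a_weighted != 0).astype(float),
        "weighted_adjacency": a_weighted.copy(),
    }


def propagate(m, h, w):
    """Generic graph-convolution style step, relu(M @ H @ W); no normalization applied."""
    return np.maximum(m @ h @ w, 0.0)


# Hypothetical 3-node weighted adjacency matrix, node features, and layer weights.
rng = np.random.default_rng(0)
a = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
matrices = substitute_matrices(a, rng)
h = rng.normal(size=(3, 4))
w = rng.normal(size=(4, 2))
outputs = {name: propagate(m, h, w) for name, m in matrices.items()}
```

In this generic step, the identity matrix leaves every node's features unmixed with those of other nodes, while the all-ones matrix gives every node the same aggregated output.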
classification tasks.
In essence, the difference between GCN and general neural networks lies in the adjacency matrix, which is used to represent topological information. Therefore, we replace the weighted adjacency matrix A with different matrices (with the same dimension as A) in the GCN framework, and retrain the model to compare the detection and classification results. The results are shown in Table VII.
[Figure: Impedance boundary experiment. y-axis: Accuracy (%).]
The above experiments show that explicitly extracting the spatio-temporal relations between nodal data helps to improve the accuracy of transient fault detection and classification, and that the adjacency matrix is the key factor.
V. CONCLUSION
This paper presents a novel method for the detection and classification of power transient faults. Considering that electric power data are a kind of spatio-temporal data, we regard the transmission line topology as a graph, so as to construct a graph classification model. Firstly, we propose a method for defining nodes and edge weights in the power grid topology.
Dongxia Zhang received the M.S. degree in Electrical Engineering from the Taiyuan University of Technology, Taiyuan, Shanxi, China, in 1992 and the Ph.D. degree in Electrical Engineering from Tsinghua University, Beijing, China, in 1999. From 1992 to 1995, she was a Lecturer with Taiyuan University of Technology. Since 1999, she has been working at China Electric Power Research Institute. She is the co-author of four books, and more than 40 articles. Her research interests include power system analysis and planning, big data and AI applications in power systems. She is an Associate Editor of Proceedings of the CSEE.
Qi Ding is currently pursuing the master degree from Shanghai Jiao Tong University. His current research interests include model compression and machine learning application on Smart Grid.
Haosen Yang received the B.S. degree from the South China University of Technology in 2017, and the M.S. degree in Shanghai Jiao Tong University, 2019. He is pursuing the Ph.D. degree from the School of Electronics and Electrical Engineering, Shanghai Jiao Tong University. His research interests include voltage stability and state estimation of power grids, machine learning and data science.
Xin Shi (S’19) received the Ph.D. degree from the School of Electronics and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China. He is now a Lecturer in the School of Control and Computer Engineering, North China Electric Power University, Beijing, China. His research interests include power system analysis, random matrix theory, and machine learning.