Efficient Identification of The Pareto Optimal Set
Australia
April 2014
1 Introduction
Many real-world optimization applications in engineering involve problems where an analytical expression of the objective function is unavailable. Such problems usually require either an underlying numerical model or expensive experiments to be conducted. In an optimization setting, where objective functions are evaluated repeatedly, evaluation demands may result in an unaffordably high cost for obtaining solutions. Therefore, the number of function evaluations is limited by the available resources. Consequently, the solution of such global optimization problems is challenging because many global optimization methods require a large number of function evaluations.
This task becomes even more difficult in the case of multiple conflicting
objectives, where there is no single optimal solution optimizing all objective
functions simultaneously. Rather, there exists a set of solutions representing the
best possible trade-offs among the objectives — the so-called Pareto optimal
solutions forming a Pareto optimal set. Unreasonably high evaluation costs could
also prevent designers from comprehensively exploring the decision space and
learning about possible trade-offs. In these cases, it is essential to find reliable
and efficient methods for estimating the Pareto optimal set within a limited
number of objective function evaluations.
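As an illustration of the Pareto optimality notion used throughout, the following minimal sketch (our own helper names, assuming minimization) checks dominance between objective vectors and extracts the nondominated subset of a finite sample:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and \
           any(ai < bi for ai, bi in zip(a, b))

def pareto_set(points):
    """Return the nondominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For a finite sample of evaluated vectors, `pareto_set` returns exactly the set of best trade-offs described above.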
Recently, researchers have developed methods to solve expensive problems by
exploiting knowledge acquired during the solution process [1]. Knowledge of past
2
The Pareto active learning (PAL) method for predicting the Pareto optimal
set at low cost has been proposed in [10]. Like ParEGO, it employs a Gaussian
process (GP) model to predict objective function values. PAL classifies all sampled decision vectors as Pareto optimal or not based on the predicted objective function values. The classification accuracy is controlled by a user-defined parameter, which enables a trade-off between evaluation cost and predictive accuracy.
2 Background
vectors and their labels “nondominated” and “dominated” are used to train a classifier, for example, a support vector machine or a naive Bayes classifier. After a classifier is trained, unevaluated decision vectors comprising a set XU are given as input to the classifier to predict their labels. The classifier provides a probability p for each decision vector x that it belongs to the nondominated class. If the probability of nondominance p is not lower than a predefined probability pnond, the decision vector is included in the predicted Pareto optimal set P. Then a new decision vector xe is selected for evaluation. The selection strategy implemented in this paper is simply to select the decision vector xe whose probability of nondominance p is closest to a predefined value pnext. Next, the evaluated set E and its labels are updated and used to retrain the classifier. The method continues until the number of evaluations reaches a predefined limit n.
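One iteration of this loop can be sketched as follows (a simplified illustration with hypothetical names; `predict_prob` stands in for any trained probabilistic classifier):

```python
def epic_iteration(XU, predict_prob, p_nond=0.5, p_next=0.5):
    """One EPIC-style step: classify unevaluated vectors, build the
    predicted Pareto optimal set P, and pick the next vector to evaluate.

    XU           -- list of unevaluated decision vectors
    predict_prob -- callable returning the probability of nondominance
    """
    probs = [predict_prob(x) for x in XU]
    # Predicted Pareto optimal set: probability of nondominance >= p_nond.
    P = [x for x, p in zip(XU, probs) if p >= p_nond]
    # Next evaluation: the vector whose probability is closest to p_next.
    x_next = min(zip(XU, probs), key=lambda xp: abs(xp[1] - p_next))[0]
    return P, x_next
```

After `x_next` is evaluated, it moves from XU to the evaluated set E, the labels are recomputed, and the classifier is retrained.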
If the sampling of the design space is sufficiently dense, and the number of evaluations n is sufficiently large, then EPIC should provide a good approximation of the Pareto optimal set. The efficiency of EPIC is determined by
imation of the Pareto optimal set. The efficiency of EPIC is determined by
the classification quality and the strategy for determining which decision vector
should be evaluated next.
It is clear that the next vector to be evaluated should be selected on the basis of maximal information gain or uncertainty reduction, which would help to improve classification. We assume that evaluating a decision vector near the boundary that separates the two classes is most likely to provide more information and improve classifier performance. An obvious choice is to set pnext = 0.5. However, further thought suggests that alternative values of pnext may be preferable. The initial set used to train the classifier is a small subset of the sampled decision space, and the vectors labelled “nondominated” might be misclassified since they are assigned with respect to the evaluated vectors. Therefore it may be better to first evaluate decision vectors that are most likely nondominated (i.e., pnext > 0.5) in order to obtain a better representation of the nondominated class. Later, it
4 Experimental Results
4.1 Experimental setup
To assess the performance of EPIC, we compare it to PAL and ParEGO. This requires suitable performance measures. One of the measures is based
on a hypervolume (HV) metric [13] that describes the spread of the solutions
over the Pareto optimal set in the objective space as well as the closeness of
the solutions to it. Moreover, HV is the only measure that reflects Pareto dominance [14]. That is, if one set entirely dominates another, the HV of the
former will be greater. As PAL and EPIC are based on classification, the quality
of prediction is also measured by the percentage of correctly classified decision
vectors. We also calculated other metrics, such as set coverage; however, they proved less informative and are not included in this work.
The performance of the methods was measured at every iteration to assess the
progress obtained after each decision vector evaluation. We calculated the HV
metric of evaluated decision vectors for all the methods. For ease of comparison,
we considered the ratio between the HV obtained by each method and the HV of
the true Pareto optimal set. It should be noted that the HV value calculated using the
evaluated vectors does not decrease with an increasing number of evaluations.
For the average HV metric calculation, when the PAL method terminated early in some runs and did not use the maximum number of iterations, we reused the nondominated set evaluated at its last iteration for the remaining iterations.
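For two objectives, the HV with respect to a reference point can be computed with a simple sweep over the nondominated set. The sketch below (our illustration, assuming minimization and mutually nondominated input, not the implementation used in the experiments) shows the idea:

```python
def hypervolume_2d(points, ref):
    """Hypervolume (area) dominated by a set of mutually nondominated
    2-D objective vectors, bounded above by the reference point `ref`
    (minimization: smaller objective values are better)."""
    hv = 0.0
    prev_f2 = ref[1]
    for f1, f2 in sorted(points):  # ascending f1, hence descending f2
        # Each point contributes the rectangle between it, the previous
        # point's f2 level, and the reference point.
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```

The HV ratio reported below is this quantity for the evaluated vectors divided by the HV of the true Pareto optimal set.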
The methods were tested on the following set of standard benchmark prob-
lems in multiobjective optimization with different features:
OKA2 [15]. This problem has two objective functions and three decision variables. Its true Pareto optimal set in the objective space is a spiral-shaped curve, and the density of the Pareto optimal solutions in the objective space is low. (The reference point is R = (6.00, 6.0483).)
Kursawe [16]. This problem has two objective functions and a scalable number
of decision variables. In our experiment, three decision variables were used.
Its Pareto optimal set in the decision space is disconnected and symmetric,
and disconnected and concave in the objective space. (The reference point
is R = (-3.8623, 25.5735).)
ZDT3 [17]. This problem has two objective functions and three decision variables. The Pareto optimal set in the objective space consists of several noncontiguous convex parts. However, there is no discontinuity in the decision space. (The reference point is R = (2.0000, 2.6206).)
Viennet [18]. This problem consists of three objective functions and two decision variables. It was not solved with the PAL algorithm because the implementation provided by the authors is suitable only for problems with two objective functions. (The reference point is R = (9.249, 62.68, 1.1964).)
DTLZ4 [19]. This problem is scalable and has M objective functions and k + M − 1 decision variables, where k = 10 as recommended by the authors. We solved this problem with 5, 6 and 7 objectives and, respectively, 14, 15 and 16 decision variables. (The reference points for the problems with 5, 6 and 7 objectives are R = (3.9324, 3.2452, 3.4945, 3.4114, 3.3022), R = (3.9943, 3.2319, 3.3666, 3.1851, 3.3236, 3.2196) and R = (3.7703, 3.3593, 3.3192, 3.3825, 3.4326, 3.2446, 3.3209), respectively.)
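For reference, the ZDT3 objectives take the following standard form (our transcription of the usual definition, with x in [0, 1]^n and n = 3 in our experiments):

```python
import math

def zdt3(x):
    """Standard ZDT3 objectives (minimization) for x in [0, 1]^n."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    # The sine term produces the noncontiguous convex parts of the front.
    h = 1.0 - math.sqrt(f1 / g) - (f1 / g) * math.sin(10.0 * math.pi * f1)
    return f1, g * h
```

The sine term in h is what breaks the Pareto optimal front into several disconnected pieces, while the decision space itself remains continuous.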
To classify decision vectors as dominated and nondominated we applied a support vector machine (SVM) [20]. The basic idea of SVM classifiers is to choose the hyperplane that has the maximum distance between itself and the nearest example of each class [20, 21]. SVMs are computationally efficient classifiers and can deal with both linear and nonlinear as well as separable and nonseparable problems. We used an SVM with a radial basis function kernel, which allows nonlinear relations between class labels and features to be captured by mapping the data to a higher-dimensional space. The drawback of using SVMs is that they
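The radial basis function kernel itself is simple to state; the following sketch (our illustration, with gamma as the kernel width parameter) shows the similarity measure that induces the nonlinear mapping:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2). Values near 1 mean
    the vectors are close; this similarity measure lets an SVM draw
    nonlinear decision boundaries in the original feature space."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

In practice, a library implementation such as scikit-learn's SVC with an RBF kernel and probability estimates enabled can supply the nondominance probabilities the method requires.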
[Figure: Correct classification (%) and HV (%) versus iterations, comparing PAL, EPIC and ParEGO on the OKA2, Kursawe, ZDT3 and Viennet problems (PAL omitted for Viennet), and EPIC and ParEGO on DTLZ4 with 5, 6 and 7 objectives.]
5.1 Conclusions
Possible future research includes developing a strategy for selecting more than
one decision vector at each iteration for evaluation. Such a strategy might employ a clustering approach to choose more diverse vectors in the objective space in order to ensure a better approximation of the Pareto optimal set, in the sense of a uniform distribution of the vectors in the objective space. Also, we need to explore the
influence of the probability values pnext and pnond used to select and classify
decision vectors, and to investigate how these can be chosen in an automatic
way based on knowledge about the problem at hand.
References