Matlab
Abstract: The importance of the Dempster-Shafer theory (DST) for modeling and propagating uncertainty has grown in the recent past. An obstacle for wider application of this theory in industrial practice is the lack of software support for analysts. The few tools available depend on floating point arithmetic and do not consider the inherently interval-based nature of the DST to the full extent. Therefore, the obvious next step is to combine the DST ideas with those from interval arithmetic. An additional advantage of employing interval methods is the guarantee that the results obtained on a computer are mathematically correct. In this paper, we introduce a new verified DST implementation for MATLAB based on the previously developed IPP Toolbox. It extends this software using interval arithmetic and simultaneously takes care of the rounding errors. After giving a short overview of the Dempster-Shafer theory and interval methods, we describe the main features of the new toolbox and show its potential using several examples.
Keywords: Dempster-Shafer, interval arithmetic, MATLAB, INTLAB.

I. INTRODUCTION

The experience of the last decades shows that while the design process in many application fields becomes shorter due to time-to-market pressure, the requirements on numerical accuracy and performance grow stricter. However, engineers lack precise knowledge regarding the process and its input data in early design stages. Therefore, to assess how reliable a system is, they have to deal with uncertainty. That is the reason why methods to propagate uncertainties through the system gain more and more importance. The overall imprecision in the outcome can be specified by providing upper and lower bounds on all possible results using interval or other verified methods. As further options, probability theory, the Dempster-Shafer theory or the Bayes theory can be used. In this paper, we concentrate on the use of the Dempster-Shafer theory (DST), the significance of which for modeling and propagating uncertainty has grown recently [1]. However, the few existing DST implementations [1], [2], [3] rely on floating point arithmetic and do not exploit to the full extent the inherently interval-based nature of the theory.

A method is called verified if it guarantees the correctness of its output. In this context, interval arithmetic [4] is a widely used approach to verifying results obtained on a computer. It provides a (multidimensional) box described in terms of floating point arithmetic which is guaranteed to contain the exact result. Besides allowing for uncertainty in parameters,
interval arithmetic helps to generate more realistic mathematical models or to take into account measurement errors.

In this paper, we describe a new DST implementation that makes use of the advantages of interval methods. The DSI Toolbox (Dempster-Shafer with intervals) is a MATLAB package based on the software IPP Toolbox [1], which is available in MATLAB [5] and R [6]. The new implementation is designed to define, aggregate and evaluate precise Dempster-Shafer structures by using directed rounding and verified methods made accessible in MATLAB by the INTLAB library [7].

This paper is structured as follows. First, we give a brief introduction to the fundamentals of the DST and the interval methods. In the next section, we describe the main features of the DSI Toolbox and discuss several basic usage examples after a short overview of the IPP and INTLAB libraries. In section IV, we exemplify the applicability of the new software using tasks from fault tree analysis and non-monotonous uncertainty propagation. We conclude by recapitulating the main results and providing a perspective for future research.

II. FUNDAMENTALS

In this section, we briefly describe the fundamentals of the Dempster-Shafer theory and interval methods.

A. The Dempster-Shafer Theory

The Dempster-Shafer theory [8] allows us to combine evidence from different experts or other sources and provides a measure of confidence that a given event occurs. A special feature of this theory is the possibility to characterize uncertainties arising because of the lack of knowledge as discrete probability assignments associated with the power set of values $X$. Due to the presence of imprecision, it is only possible to compute a lower and an upper bound (belief and plausibility) of the probability of a subset of $X$. The DST equivalent of a random variable is the belief variable, which is characterized by its basic probability assignment (BPA) $m$. If $A_1, \dots, A_n$ are the sets of interest where each $A_i \in 2^X$, then

\[
m : 2^X \to [0, 1], \qquad \sum_{i=1}^{n} m(A_i) = 1, \qquad m(\emptyset) = 0. \tag{1}
\]
In the continuous case, we restrict the sets $A_i$ to intervals of the form $[\underline{x}, \overline{x}]$, $\underline{x} \le \overline{x}$, where $\underline{x}$ denotes the lower bound or infimum and $\overline{x}$ the upper bound or supremum, for computational simplicity. Such intervals can be considered as evidence by an expert, for example, [10, 20] hours lifetime for a sensor. The restriction in (1) concerning the sum of masses shows the necessity to normalize real-life evidence because experts tend to provide BPAs for which it does not hold. The plausibility (worst case) and belief (best case) functions can be defined with the help of the BPAs for all $i = 1, \dots, n$ and $Y \subseteq X$ as

\[
PL(Y) := \sum_{A_i \cap Y \neq \emptyset} m(A_i), \qquad BEL(Y) := \sum_{A_i \subseteq Y} m(A_i). \tag{2}
\]

Every element $A$ with a nonzero mass is known as a focal element. All focal elements together with their masses are called a random set. If two or more experts provide different estimations in the same areas, the BPAs have to be aggregated. There exist several methods for this purpose [8], of which Dempster's rule and mixing based on arithmetic averaging are those used in this paper.

B. Interval Arithmetic

Interval arithmetic ([4] and the references therein) is a well-developed field of mathematics with applications in many areas of engineering, medical science, (bio)mechanics and others. It belongs to the group of verified methods, that is, methods that guarantee the correctness of the outcome of a simulation using mathematically exact proofs. An interval $[\underline{x}, \overline{x}]$, where $\underline{x}$ is the lower and $\overline{x}$ the upper bound, is defined as $[\underline{x}, \overline{x}] = \{x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x}\}$. For any operation $\circ \in \{+, -, \cdot, /\}$ and intervals $[\underline{x}, \overline{x}]$, $[\underline{y}, \overline{y}]$, the corresponding interval operation can be defined as

\[
[\underline{x}, \overline{x}] \circ [\underline{y}, \overline{y}] = [\min(\underline{x} \circ \underline{y}, \underline{x} \circ \overline{y}, \overline{x} \circ \underline{y}, \overline{x} \circ \overline{y}), \; \max(\underline{x} \circ \underline{y}, \underline{x} \circ \overline{y}, \overline{x} \circ \underline{y}, \overline{x} \circ \overline{y})].
\]

It can be shown that the result of an interval operation is also an interval. Every possible combination $x \circ y$ with $x \in [\underline{x}, \overline{x}]$ and $y \in [\underline{y}, \overline{y}]$ lies inside this interval. (For division of intervals, usually $0 \notin [\underline{y}, \overline{y}]$ is assumed.)

To be able to work with this definition on a computer using finite precision arithmetic, the concept of machine intervals is necessary. They are represented by floating point numbers for the lower and upper bounds. To obtain the corresponding machine interval for the real interval $[\underline{x}, \overline{x}]$, the lower bound is rounded down to the largest representable machine number less than or equal to $\underline{x}$, and the upper bound is rounded up to the smallest machine number greater than or equal to $\overline{x}$. These notions can be extended to define interval vectors and matrices. There are a number of software libraries implementing this theory in different programming languages such as C++ or Fortran and computer algebra packages such as Maple or MATLAB.

III. DSI TOOLBOX: THE DEMPSTER-SHAFER THEORY WITH INTERVAL ARITHMETIC

The main focus of this section is the newly implemented DSI Toolbox, which combines the DST approach with rigorous interval methods. This toolbox is implemented in MATLAB and uses INTLAB for basic interval functionalities. After a short overview of the IPP Toolbox from which the DSI software originated, we outline the functionalities of INTLAB used by the DSI and finally describe the new toolbox in detail and with usage examples.

A. IPP Toolbox
The Imprecise Probability Propagation (IPP) Toolbox [1] is a collection of methods for uncertainty quantification and propagation in the framework of the Dempster-Shafer theory and imprecise probabilities. This library uses floating point arithmetic as a computational basis. It is available as a MATLAB version [5] and an extended R package [6]. The IPP Toolbox contains a broad range of methods for practical application of the DST such as construction of belief functions from bounds on distributions, computation of empirical BPAs from data, evaluation of BPA fits using Kolmogorov-Smirnov tests, various aggregation methods, propagation through arbitrary monotonous and non-monotonous system functions and computation of statistical properties. The toolbox was applied in several case studies, such as [9], [10].

The feature we would like to concentrate on is the propagation of uncertainties through non-monotonous system functions. This task can be a major obstacle for using DST in practice. In most applications, the analysts either employ a Monte Carlo sampling approach or simply propagate all focal elements of (small) discrete BPAs through the system. In each case, a large number of intervals needs to be propagated through the system function $F(x)$. In the conventional DST, this is formally done by solving two optimization problems for each (multidimensional) interval $[\underline{x}, \overline{x}]$:

\[
[\underline{F}, \overline{F}] = F([\underline{x}, \overline{x}]) = \left[ \min_{x \in [\underline{x}, \overline{x}]} F(x), \; \max_{x \in [\underline{x}, \overline{x}]} F(x) \right].
\]
The amount of computing power required in this step varies widely with the complexity of the function representing the system model. While monotonous functions do not increase it substantially, complex, non-monotonous ones render the propagation task very time consuming. The IPP supports three different propagation algorithms:

Monotonous: A fast algorithm for monotonous functions (increasing/decreasing/mixed). Only the two point values $F(\underline{x})$, $F(\overline{x})$ have to be evaluated because of the monotonicity property.
Regular: An approximation algorithm that evaluates the function value in all corners of the focal element. The assumption is that the extrema of $F([\underline{x}, \overline{x}])$ are located on the boundary of $[\underline{x}, \overline{x}]$.
Optimization: An algorithm compatible with arbitrary non-monotonous functions, which uses gradient descent optimization to propagate focal elements.
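The difference between the first two schemes can be illustrated with a short Python sketch. It is not code from the IPP Toolbox: the function names, the test function `shifted_square` and the dense grid search used as a reference are assumptions made for illustration, and plain floating point arithmetic is used throughout.

```python
from itertools import product


def propagate_monotonous(f, lo, hi):
    """Two evaluations suffice when f is monotonous on [lo, hi]."""
    a, b = f(lo), f(hi)
    return (min(a, b), max(a, b))


def propagate_corners(f, box):
    """'Regular' scheme: evaluate f at every corner of a multidimensional box."""
    values = [f(*corner) for corner in product(*box)]
    return (min(values), max(values))


def propagate_sampling(f, lo, hi, steps=10_000):
    """Dense grid search, used here only as a non-rigorous reference range."""
    values = [f(lo + (hi - lo) * k / steps) for k in range(steps + 1)]
    return (min(values), max(values))


def shifted_square(x):
    # Hypothetical system function with an interior minimum at x = 1.
    return (x - 1.0) ** 2


corner_range = propagate_corners(shifted_square, [(0.0, 3.0)])
reference_range = propagate_sampling(shifted_square, 0.0, 3.0)
monotone_range = propagate_monotonous(shifted_square, 1.0, 3.0)
print(corner_range)     # corners only: the interior minimum is missed
print(reference_range)  # the grid search comes close to the minimum at x = 1
print(monotone_range)   # on [1, 3] the function is increasing
```

On the focal element [0, 3], the corner evaluation reports a lower bound of 1 although the function reaches 0 inside the interval, which is the kind of error the regular algorithm can make for non-monotonous functions.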
In the case of non-monotonous functions, the regular algorithm is fast but provides only a coarse approximation of the true propagation result, whereas the optimization algorithm can be very time consuming, as one gradient descent run is carried out for each interval propagated. In section IV-B, we show that such computations can be performed more effectively using interval methods in the newly developed DSI Toolbox.

B. INTLAB

Interval Laboratory INTLAB [7] is a MATLAB library implementing the basics of (multiple precision) interval arithmetic for real and complex numbers (cf. section II-B) along with providing linear algebra methods for intervals. Besides, it features automatic differentiation up to the second order, rigorous real and complex interval standard functions, accurate summation, dot product and matrix-vector residuals. We can use either the infimum-supremum notation $[\underline{x}, \overline{x}]$ or the midpoint-radius notation $\langle (\underline{x} + \overline{x})/2, (\overline{x} - \underline{x})/2 \rangle$ for intervals. Point intervals in INTLAB can be defined by using the function intval(x), where x is a real number. A mathematically correct interval enclosure of a real number x that is not itself a floating point number is given by intval('x'). By specifying the number as a string, we can ensure that the result is enclosed between the greatest machine number smaller and the smallest machine number greater than x.

One important feature of this library is directed rounding. With the help of the INTLAB function setround(par), it is possible to change the current rounding mode. If par is equal to 1, the rounding mode is set to positive infinity (upwards), for par=-1 to negative infinity (downwards), and for par=0 to round-to-nearest. This function can be used to generate an exact enclosure of a piece of evidence. INTLAB supports a number of interval standard functions such as sine or cosine.
Using them, it is easy to evaluate an interval enclosure of an arbitrary function given as a combination of standard ones and interval operations $\{+, -, \cdot, /\}$. The current version of INTLAB is written completely in MATLAB to ensure that identical code runs on different machines. The requirement for such portability is that the considered architecture uses IEEE 754 arithmetic and can switch the rounding mode permanently.

It is important to keep in mind that interval vector and matrix operations are fast in INTLAB due to its extensive use of BLAS (Basic Linear Algebra Subprograms) routines. Rump shows in [11] that the unrestricted use of the midpoint-radius interval notation and level-3 BLAS leads to very fast algorithms. However, loops and nonlinear tasks slow down the computations, which is characteristic of MATLAB as a whole.

C. Main Features of the DSI Toolbox

The IPP Toolbox described in section III-A uses floating point arithmetic and therefore does not exploit to the full extent the inherently interval-based nature of the DST. With this software as a basis, we developed a new verified implementation called DSI Toolbox (Dempster-Shafer with intervals) for MATLAB
to work with rigorous DST structures that rely on interval calculus and directed rounding. DSI contains both functions from the IPP Toolbox, which were rewritten to take into account all rounding errors and adjusted to intervals, and newly designed functions.

The main task of the new toolbox is to guarantee correctness of the solution. For that purpose, we take care of all rounding errors that might occur during the computation by enclosing real numbers in their corresponding machine intervals. Note that we do not take into account the modeling error present in DST or other probability-based methods. A further goal is to provide enclosures with minimal overestimation, that is, with the minimum degree of conservativeness or pessimism. That is why we use sharp matrix multiplication provided by INTLAB which, however, needs slightly more CPU time.

Using DSI, we can define DST structures either directly by their focal elements with masses (routine dsistruct) or by cumulative distribution functions such as the triangular or the Weibull distribution (dsitriangleinv, dsiweibullinv). In the first case, the masses of the resulting structure have to be normalized according to Eq. (1). The normalization is not trivial in our case because we represent masses as point or very tight intervals to provide verified solutions. However, it is not our goal at the moment to introduce interval masses with diameters considerably greater than a couple of ulps because it is difficult to think of a practical interpretation in this case. Our approach is as follows. If a random set consists of $n$ focal elements $A_1, \dots, A_n$ with interval masses $m(A_i) = [\underline{m}(A_i), \overline{m}(A_i)]$, then the normalized masses $m_{new}(A_i)$ for $i = 1, \dots, n$ are computed as a hull of the two intervals

\[
\underline{m}(A_i) \Big/ \sum_{j=1}^{n} \underline{m}(A_j), \qquad \overline{m}(A_i) \Big/ \sum_{j=1}^{n} \overline{m}(A_j).
\]
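The normalization step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the DSI routine: the pairing of lower bounds with the sum of lower bounds (and likewise for upper bounds) is one reading of the formula above, the helper names are hypothetical, and widening each quotient by one floating point step with `math.nextafter` (Python 3.9 or later) merely imitates outward rounding without giving a verified enclosure.

```python
import math


def widen(lo, hi):
    """Push both bounds outward by one floating point step."""
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))


def hull(a, b):
    """Smallest interval containing the intervals a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def normalize_masses(masses):
    """masses: list of (lower, upper) pairs; returns normalized interval masses."""
    sum_lo = math.fsum(m[0] for m in masses)
    sum_hi = math.fsum(m[1] for m in masses)
    result = []
    for lo, hi in masses:
        q_lo = lo / sum_lo
        q_hi = hi / sum_hi
        # Each quotient is treated as a point interval widened by one step,
        # then the hull of the two widened intervals is taken.
        result.append(hull(widen(q_lo, q_lo), widen(q_hi, q_hi)))
    return result


raw = [(2.0, 2.0), (1.0, 1.0), (3.0, 3.0)]  # hypothetical unnormalized masses
normalized = normalize_masses(raw)
for lo, hi in normalized:
    print(lo, hi)
```

For the point masses 2, 1 and 3 used here, the resulting intervals are a few ulps wide and contain 1/3, 1/6 and 1/2, so the sum of the lower bounds does not exceed one and the sum of the upper bounds is not below one.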
To optimize the CPU time, we compute all steps using vector-matrix operations.

Let us consider further functionalities of the DSI Toolbox using the following example. Two experts give estimations about a robot failure. The first expert provides an assessment in the form of a triangular distribution function. The important feature which the DSI Toolbox offers in this case is the possibility to define it with an uncertain mode and lower/upper bounds. To obtain the corresponding BPA, the user has to specify the number of samples to be computed. For a subdivision consisting of floating point numbers, which means a subdivision of purely point intervals in our framework, we recommend using $2^y$ samples with an arbitrary positive integer $y$. In Fig. 1, the solution space of the triangular distribution with lower bound [1, 2], upper bound [10, 12], mode [4, 6] and $2^{12}$ samples is shown. This space lies between the belief and the plausibility function defined in Eq. (2). To compute plausibility, all upper bounds of the masses are sorted in ascending order and stored as point intervals in a matrix. To compute function values, we iterate from one to the length of the matrix, adding the current element to the result of the last iteration. The solution array is sorted in ascending order with the last element equal to one. The belief function can be computed in parallel to the plausibility by using the lower bounds.

Figure 1. Triangular distribution with lb=[1, 2], ub=[10, 12] and mode=[4, 6] (curves: Pl, Bel)

The second expert provides an assessment in the form of a BPA directly. Using the DSI Toolbox, we define the BPA by the routine dsistruct([infsup(1,3),2/6; infsup(1.5,6),1/6; infsup(5,15),3/6]). Here, infsup(x,y) is the standard INTLAB function to define an interval in infimum-supremum notation. In this example, the first focal element [1, 3] has the mass 2/6, the second ([1.5, 6]) 1/6 and the third ([5, 15]) 3/6. In Fig. 2, the solution space of this BPA is shown.
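For the second expert's BPA, the belief and plausibility bounds can be reproduced with a small Python sketch. It applies Eq. (2) directly to events of the form "value at most x" rather than using the sorted-matrix accumulation of the toolbox; the function names are hypothetical, and exact fractions replace the interval masses of DSI.

```python
from fractions import Fraction

# Focal elements (lower, upper) with masses, taken from the example in the text.
bpa = [
    ((1.0, 3.0), Fraction(2, 6)),
    ((1.5, 6.0), Fraction(1, 6)),
    ((5.0, 15.0), Fraction(3, 6)),
]


def belief(bpa, y_lo, y_hi):
    """Sum the masses of focal elements contained in [y_lo, y_hi]."""
    return sum((m for (lo, hi), m in bpa if y_lo <= lo and hi <= y_hi), Fraction(0))


def plausibility(bpa, y_lo, y_hi):
    """Sum the masses of focal elements that intersect [y_lo, y_hi]."""
    return sum((m for (lo, hi), m in bpa if lo <= y_hi and y_lo <= hi), Fraction(0))


def cumulative_bounds(bpa, x):
    """Bel and Pl of the event 'value <= x', i.e. Y = (-inf, x]."""
    neg_inf = float("-inf")
    return belief(bpa, neg_inf, x), plausibility(bpa, neg_inf, x)


for x in (0.5, 2.0, 4.0, 6.0, 15.0):
    bel, pl = cumulative_bounds(bpa, x)
    print(x, bel, pl)
```

At x = 4, for instance, only the focal element [1, 3] lies completely below x, so the belief is 1/3, while [1, 3] and [1.5, 6] both intersect the event, so the plausibility is 1/2.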
To aggregate these two structures, we use Dempster's rule and mixing. DSI provides the routines dsidempstersrule(x) and dsiwmixing(x,y), where x is an array containing all BPAs (two in our example) and y is an array of corresponding weights. The main issue with these two functions is the CPU time. It is acceptable for the weighted mixing because only matrix multiplication is in use. In contrast, Dempster's rule needs to be computed for every element in the first BPA, giving a complexity of $O(n^2)$. Unlike the IPP Toolbox, we evaluate the necessary intersections using matrices and the fast INTLAB function intersect(x,y) to accelerate computations. In Fig. 3, the results of the application of Dempster's rule and unweighted mixing for the two BPAs from our example are shown. The BPAs and their aggregation by Dempster's rule are computed in 6.932 seconds on an Intel Core 2 Duo @ 2.1 GHz platform with 2 GB RAM. The overall CPU time for computing the BPAs and their unweighted mixing is only 0.0925 seconds on the same platform. Fig. 3 shows in addition that Dempster's rule aggregates only intersecting focal elements while the mixing takes into account all of them.
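Both aggregation rules can be written down compactly in Python. The sketch below follows the textbook definitions of Dempster's rule and weighted mixing and is not the implementation of dsidempstersrule or dsiwmixing; the function names, the dictionary representation and the two small expert BPAs are assumptions for illustration, and exact fractions are used instead of interval masses.

```python
from fractions import Fraction


def intersect(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def dempsters_rule(bpa1, bpa2):
    """Combine two BPAs given as dicts {(lo, hi): mass}."""
    combined = {}
    conflict = Fraction(0)
    for a, ma in bpa1.items():  # double loop: O(n^2) intersections
        for b, mb in bpa2.items():
            c = intersect(a, b)
            if c is None:
                conflict += ma * mb  # mass that would fall on the empty set
            else:
                combined[c] = combined.get(c, Fraction(0)) + ma * mb
    if conflict == 1:
        raise ValueError("total conflict: the BPAs cannot be combined")
    return {c: m / (1 - conflict) for c, m in combined.items()}


def weighted_mixing(bpas, weights):
    """Weighted arithmetic average of the masses of several BPAs."""
    total = sum(weights)
    mixed = {}
    for bpa, w in zip(bpas, weights):
        for a, m in bpa.items():
            mixed[a] = mixed.get(a, Fraction(0)) + Fraction(w) * m / total
    return mixed


expert1 = {(1, 3): Fraction(1, 2), (5, 8): Fraction(1, 2)}    # hypothetical
expert2 = {(2, 4): Fraction(1, 2), (10, 12): Fraction(1, 2)}  # hypothetical

print(dempsters_rule(expert1, expert2))
print(weighted_mixing([expert1, expert2], [1, 1]))
```

In this example only [1, 3] and [2, 4] intersect, so Dempster's rule puts the entire mass on [2, 3], whereas mixing keeps all four focal elements with mass 1/4 each.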
Figure 2. (Plot of the Pl and Bel curves of the BPA; axis label: units of interest.)

Figure 3. (Plot legend: Pl: Dempster's rule, Bel: Dempster's rule, Pl: Mixing, Bel: Mixing.)

IV. EXAMPLES

In this section, we first show briefly how fault tree analysis can be performed using the DSI Toolbox and then consider a simple example of uncertainty propagation through a non-monotonous system function. In each case, we compare the obtained results with those from the IPP Toolbox.

A. A Simple Example of Fault Tree Analysis
The Dempster-Shafer theory can be used in fault tree analysis [9]. Consider a robot arm consisting of three rigid links $L_1$, $L_2$, $L_3$ and three joints $J_1$, $J_2$, $J_3$ shown in Fig. 4. Each joint is driven by an autonomous motor. At the end of the arm, a utility is mounted which can only be manipulated by modifying the angles of the three joints.
Figure 4.

The distribution of the time to failure for each motor is assessed by three experts. The first expert estimates it for $J_1$ and $J_2$ to be in the intervals [700, 1400] and [1000, 1500], respectively. The second expert provides an opinion in the form of cumulative distribution functions for motors two and three. The failure distribution for motor two follows a triangular distribution between 700 and 1200 hours with an uncertain mode of [800, 900]; for motor three, it is the same type of distribution in the interval [900, 1500] hours with mode [1300, 1400]. The third expert assesses motor three only and states that its time to failure is between 1000 and 1200 hours of work with 80 percent confidence and between 1250 and 1700 hours with 20 percent confidence. We suppose that the robot is out of order only if all motors fail at the same time, which corresponds to the fault tree in Fig. 5.

Figure 5.

In analogy to the definition from [12], we can compute the failure probability at the AND gate by using Dempster's rule. We aggregate the evidence for the motors with the help of unweighted mixing because every expert has the same confidence level. We prefer mixing to Dempster's rule in this case because the latter ignores focal elements without intersection, which leads to information loss. After that, we obtain the belief and plausibility for the root element of the fault tree, that is, the overall failure probability for the robot shown in Fig. 6.

Figure 6.

The probability of a complete fault is zero for less than 900 hours of work. The robot fails with a certainty of 100 percent after 1400 hours. Note that the experts and their estimations are hypothetical in this example. The results are therefore not objective. The plausibility and belief values for failure probabilities between 900 and 1400 hours can be computed by the function dsigetprob(x). To provide a comparison between the IPP (the R version) and DSI toolboxes, we ran a benchmark with 200, 1024 and 2048 samples. IPP supplies two sampling techniques called dsodf and dsadf [1], [13], the former of which is an outer discretization generating small and sharp intervals while the latter is more conservative and computes larger intervals. In Tab. I, we show results obtained with both methods.

Table I
COMPARISON OF CPU TIMES FOR THE FAULT TREE

samples   IPP (dsadf)   IPP (dsodf)   DSI
200       0.4641 s      0.5226 s      0.3368 s
1024      2.9110 s      4.0561 s      1.6899 s
2048      11.3253 s     11.3214 s     4.5552 s

Each cell of the table contains the corresponding CPU time on the same platform as in section III. The DSI Toolbox is faster than the IPP for this example.

B. Uncertainty Propagation for Non-monotonous Functions

In this example, we demonstrate the potential of using interval methods and the DSI Toolbox for propagation of uncertainties through non-monotonous system models. We consider the function $\sin(x^2)$, which is simple but highly non-monotonous. The argument $x$ follows a normal distribution
with the uncertain lower and upper bounds lying in the intervals [0, 0.1] and [99.9, 100], respectively. We use the Monte Carlo sampling approach to propagate this uncertainty through a hypothetical system described by $\sin(x^2)$ in DSI and IPP (the R version). In the latter toolbox, the monotonous and regular methods described in section III-A produce wrong results. The only possible propagation algorithm for IPP is therefore the optimization-based one, which we used for the comparison. The results are shown in Tab. II and in Fig. 7 (for 1024 samples with the dsodf sampling method in IPP).

As demonstrated in the table, the DSI Toolbox is considerably faster than the IPP. The reason for this is that the DSI exploits the ability of interval methods (and INTLAB in particular) to compute interval enclosures of functions over interval arguments directly rather than via optimization problems. Directly means that enclosures of functions consisting of combinations of $\{+, -, \cdot, /\}$ and standard functions such as trigonometric ones, as well as their compositions, are evaluated by substituting their interval counterparts for them on an operation-by-operation basis (the so-called natural interval extension). That is, the optimization problem for the complex original function does not have to be solved. For such interval extensions, the relation $F_{Opt} \subseteq F_{Nat}$ always holds, where $F_{Opt}$ is the enclosure obtained via optimization and $F_{Nat}$ the natural one.
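The operation-by-operation evaluation can be made concrete with a small Python sketch of a natural interval extension of $\sin(x^2)$. The helper names `isqr`, `isin` and `sin_of_square` are hypothetical and do not come from INTLAB or DSI, and the code uses ordinary floating point arithmetic without directed rounding, so unlike INTLAB it does not deliver verified bounds.

```python
import math


def isqr(x):
    """Interval square of x = (lo, hi)."""
    lo, hi = x
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))


def isin(x):
    """Interval sine: endpoint values plus interior extrema at pi/2 + k*pi."""
    lo, hi = x
    if hi - lo >= 2.0 * math.pi:
        return (-1.0, 1.0)
    values = [math.sin(lo), math.sin(hi)]
    # Extrema of sine lie at pi/2 + k*pi; even k gives +1, odd k gives -1.
    k = math.ceil((lo - math.pi / 2.0) / math.pi)
    while math.pi / 2.0 + k * math.pi <= hi:
        values.append(1.0 if k % 2 == 0 else -1.0)
        k += 1
    return (min(values), max(values))


def sin_of_square(x):
    """Natural interval extension of F(x) = sin(x^2)."""
    return isin(isqr(x))


def corners(x):
    """Range estimate from the two endpoints only."""
    a, b = math.sin(x[0] ** 2), math.sin(x[1] ** 2)
    return (min(a, b), max(a, b))


focal = (1.0, 2.0)
print(sin_of_square(focal))  # upper bound 1: x^2 passes through pi/2
print(corners(focal))        # the endpoint evaluation misses that maximum
```

No optimization run is needed here: one interval square and one interval sine per focal element are enough, which is the source of the speed-up described above.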
Table II
COMPARISON OF CPU TIMES FOR THE NON-MONOTONOUS PROPAGATION

samples   IPP (dsadf)   IPP (dsodf)   DSI
100       0.2850 s      0.2870 s      0.0585 s
1000      2.7190 s      2.6920 s      0.8237 s
1000      47.518 s      48.407 s      8.1557 s
Figure 7. (Plot title: solution space of $\sin(x^2)$.)
Note that $F_{Opt}$ obtained by a floating point based method is not proved to contain all possible solutions whereas $F_{Nat}$ is. However, $F_{Nat}$ might become too conservative if the function is evaluated for interval arguments with large widths (cf. Fig. 7). The results obtained with IPP lie inside the verified bounds provided by DSI. In this case, the reason is not only the conservativeness of enclosures because the arguments are tight intervals. The difference might also result from inaccuracies of the floating point based algorithm.

V. CONCLUSIONS AND OUTLOOK

In this paper, we presented a new toolbox for MATLAB implementing a combination of the Dempster-Shafer theory with rigorous interval arithmetic. With its help, it is possible to work with DST structures in a natural way and take into account all rounding errors. We demonstrated the applicability of the toolbox using several examples and showed that it worked faster than the floating point based IPP Toolbox from which it originated. Besides, we demonstrated that it was more advantageous to use DSI for uncertainty propagation through systems described by non-monotonous functions, both in terms of CPU time and correctness of results.

Our future work will consist of combining the verified DST with the fault tree analysis and Markov chains.

REFERENCES

[1] P. Limbourg, "Imprecise probabilities for predicting dependability of mechatronic systems in early design stages," Ph.D. dissertation, University of Duisburg-Essen, 2007.
[2] A. Martin, "Implementing general belief function framework with a practical codification for low complexity," in Advances and Applications of DSmT for. American Research Press, 2009, vol. Collected Works, Vol. 3, pp. 217-273.
[3] G. Nassreddine, F. Abdallah, and T. Denoeux, "A state estimation method for multiple model systems using belief function theory," in Information Fusion, 2009. 12th International Conference on Information Fusion (FUSION 09), 2009, pp. 506-513.
[4] E. Moore, B. Kearfott, and M. Cloud, Introduction to Interval Analysis. Society for Industrial Mathematics, 2009, vol. 1.
[5] "The MathWorks Deutschland - MATLAB - The language of technical computing," 2009. [Online]. Available: http://www.mathworks.de/products/matlab/
[6] "The R project for statistical computing." [Online]. Available: http://www.r-project.org/
[7] S. Rump, "INTLAB - INTerval LABoratory," Developments in Reliable Computing, pp. 77-104, 1999.
[8] S. Ferson, V. Kreinovich, L. Ginzburg, D. Myers, and K. Sentz, "Constructing probability boxes and Dempster-Shafer structures," no. SAND2002-4015, 2003.
[9] P. Limbourg, R. Savic, J. Petersen, and H. D. Kochs, "Modelling uncertainty in fault tree analyses using evidence theory," Journal of Risk and Reliability, vol. 222, pp. 291-302, 2008.
[10] E. de Rocquigny, N. Devictor, and S. Tarantola, Uncertainty in Industrial Practice - A guide to quantitative uncertainty management. United Kingdom: John Wiley & Sons, 2008, vol. 1.
[11] S. Rump, "Fast and parallel interval arithmetic," BIT, vol. 39(3), pp. 539-560, 1999. [Online]. Available: http://www.ti3.tu-harburg.de/paper/rump/Ru99b.pdf
[12] H. Traczinski, "Integration von Algorithmen und Datentypen zur validierten Mehrkörpersimulation in MOBILE," Ph.D. dissertation, Universität Duisburg-Essen, 2006.
[13] F. Tonon, "Using random set theory to propagate epistemic uncertainty through a mechanical system," Reliability Engineering and System Safety, vol. 85, no. 1-3, pp. 169-181, 2004.