Deriving CDF of Kolmogorov-Smirnov Test Statistic: Jan Vrbik

Applied Mathematics, 2020, 11, 227-246
https://www.scirp.org/journal/am
ISSN Online: 2152-7393
ISSN Print: 2152-7385
Deriving CDF of Kolmogorov-Smirnov Test

Statistic
Jan Vrbik
Department of Mathematics and Statistics, Brock University, St. Catharines, Ontario, Canada
How to cite this paper: Vrbik, J. (2020) Abstract

Deriving CDF of Kolmogorov-Smirnov Test
Statistic. Applied Mathematics, 11, 227-246. In this review article, we revisit derivation of the cumulative density function
https://doi.org/10.4236/am.2020.113018 (CDF) of the test statistic of the one-sample Kolmogorov-Smirnov test. Even
though several such proofs already exist, they often leave out essential details
Received: November 25, 2019
necessary for proper understanding of the individual steps. Our goal is filling
Accepted: March 14, 2020
Published: March 17, 2020 in these gaps, to make our presentation accessible to advanced undergra-
duates. We also propose a simple formula capable of approximating the exact
Copyright © 2020 by author(s) and distribution to a sufficient accuracy for any practical sample size.
Scientific Research Publishing Inc.
This work is licensed under the Creative
Keywords
Commons Attribution International
License (CC BY 4.0). Kolmogorov-Smirnov Test, Asymptotic Distributions, Generating Functions,
http://creativecommons.org/licenses/by/4.0/ Jacobi Theta
Open Access
1. Introduction
The article’s goal is to present a comprehensive summary of deriving the distri-
bution of the usual Kolmogorov-Smirnov test statistic, both in its exact and ap-
proximate form. We concentrate on practical aspects of this exercise, meaning
that
• reaching a modest (three significant digit) accuracy is usually considered
quite adequate,
• computing critical and P-values of the test is the primary objective, implying
that it is the upper tail of the distribution which is most important,
• methods capable of producing practically instantaneous results are preferable
to those taking several seconds, minutes, or more,
• simple, easy to understand (and to code) techniques have a great conceptual
advantage over complex, black-box type algorithms.
This is the reason why our review excludes some existing results (however
deep and mathematically interesting they may be); we concentrate only on the
DOI: 10.4236/am.2020.113018 Mar. 17, 2020 227 Applied Mathematics

J. Vrbik
most relevant techniques (this is also the reason why our bibliography is delibe-
rately far from complete).
1.1. Test Statistic

The Kolmogorov-Smirnov one-sample test works like this: the null hypothesis
states that a random independent sample of size n has been drawn from a spe-
cific (including the value of each of its parameters, if any) continuous distribu-
tion. The test statistics (denoted Dn ) is the largest (in the limit-superior sense)
absolute-value difference between the corresponding empirical cumulative den-
sity function (CDF) and the theoretical CDF, denoted F ( x ) , of the hypothe-
sized distribution; the former is defined by
1 n
Fe ( x ) = ∑ I X <x
def
(1)
n i =1 i
where X 1 , X 2 ,, X n are the individual sample values and I X i < x is the usual
indicator function (equal to 1 when X i is smaller than x, equal to 0 otherwise).
1
Note that Fe ( x ) is a step function which starts at 0 and increases, by at
n
each X i , until it reaches the value of 1.
To complete the test, we need to know the CDF of Dn under the assumption
that the null hypothesis is correct. Deriving this CDF is a difficult task; there are
several exact techniques for doing that; in this article, we expound only the ma-
jor ones. We then derive the n → ∞ limit of the resulting distribution, to serve
as an approximation when n is relatively large. Since the accuracy of this limit is
not very impressive (unless n is extremely large), we show how to remove the
1 1
-proportional, -proportional, etc. error of this approximation, making it
n n
sufficiently accurate for samples of practically any size.
1.2. Transforming to  ( 0,1)
The first thing we do is to define
Ui = F ( X i ) (2)
def
where F ( x ) is the CDF of the hypothesized distribution; the U1 ,U 2 ,,U n

then constitute (under the null hypothesis) a random independent sample from
the uniform distribution over the ( 0,1) interval, the new theoretical CDF is
then simply F ( u ) = u . It is important to realize that doing this does not change
the vertical distances between the empirical and theoretical CDFs; it transforms
only the corresponding horizontal scale as Figure 1 and Figure 2 demonstrate
(the original sample is from Exponential distribution).
This implies that the resulting value of Dn (and consequently, its distribu-
tion) remains the same. We can then conveniently assume (from now on) that
our sample has been drawn from  ( 0,1) ; yet the results apply to any hypothe-
sized distribution.
DOI: 10.4236/am.2020.113018 228 Applied Mathematics

J. Vrbik
Figure 1. Both CDFs, before
Figure 2. and after transformation.
1.3. Discretization
In this article, we aim to find the CDF of Dn , namely
Pr ( Dn ≤ d ) (3)
1 2 n
only for a discrete set of n values of d, namely for d = , ,, , even though
n n n
 1 
Dn is a continuous random variable whose support is the  ,1 interval.
 2n 
This proves to be sufficient for any (but extremely small) n, since our discrete
results can be easily extended to all values of d by a sensible interpolation.
There are technique capable of yielding exact results for any value of d (see [1]
or [2]), but they have some of the disadvantages mentioned above and will not
be discussed here in any detail; nevertheless, for completeness, we present a Ma-
thematica code of Durbin’s algorithm in the Appendix.
2. Linear-Algebra Solution
This, and the next two section, are all based mainly on [3], later summarized by
[4].

J. Vrbik
We start by defining n + 1 integer-valued random variables
n ⋅ ( Fe ( d i ) − d i )
def
Ti = (4)
i
where d i = , i = 0,1,2,, n ; note that n ⋅ Fe ( d i ) equals the number of the
n
U i observations which are smaller than d i , also note that T0 and Tn are
always identically equal to 0. We can then show that
Claim 1. Dn > d j if and only if at least one of the Ti values is equal to j or
–j.
Proof. When Ti = j , then there is a value of d to the left of d i such that
j
Fe ( d ) − d > j , implying that Dn > ; similarly, when Ti = − j then there is a
n
value of d to the right of d i such that Fe ( d ) − d < − j , implying the same.
To prove the reverse, we must first realize that no one-step decrease in the
T0 , T1 ,, Tn sequence can be bigger than 1 (this happens when there are no ob-
servations between the corresponding d i and d i +1 ); this implies that the T se-
quence must always pass through all integers between the smallest and the larg-
est value ever reached by T.
Since n ⋅ Dn > j implies that either n ⋅ ( Fe ( d ) − d ) has exceeded the value
of j at some d, or it has reached a value smaller than −j, it then follows that at
least one Ti has to be equal to either j or −j respectively. ■
2.1. Total-Probability Formula

Now, consider the sample space of all possible (integer) values of T1 , T2 ,, Tn −1 ,
and a fixed integer J between 1 and n − 1 inclusive (we use the capital font to
emphasize J’s special role in all subsequent formulas). If Ti is the first of the
T1 , T2 ,, Tn −1 random variables to reach the value of either J or –J, we denote
the corresponding event A i and Bi respectively ( C means that none of the
Tis have ever reached either J or −J); A1 , A 2 ,, A n −1 , B1 , B 2 ,, B n −1 , C then
constitute a partition of this sample space.
By a routine application of the formula of total probability, we can write, for
any k between 1 and n − J ( Tk = J cannot happen for any other T)
n −1 n −1
Pr (Tk =
J)=
∑ Pr ( Ai ) ⋅ Pr (Tk =
J | A i ) + ∑ Pr ( Bi ) ⋅ Pr (Tk =
J | Bi )
=i 1 =i 1 (5)
+ Pr ( C ) ⋅ Pr (Tk =
J | C)
We know that, given C , Tk = J could not have happened. Similarly, given

A i (given Bi ), Tk = J cannot happen any earlier than at k ≥ i ( k > i ). And
finally, Pr ( Bi ) is equal to 0 when i < J (we need at least J steps to reach
Ti = − J from T0 = 0 ). We can thus simplify (5) to read
k k −1
Pr (Tk =
J)=
∑ Pr ( Ai ) ⋅ Pr (Tk =
J | A i ) + ∑ Pr ( Bi ) ⋅ Pr (Tk =
J | Bi ) (6)
=i 1 =i J
where 1 ≤ k ≤ n − J , with the understanding that an empty sum (lower limit

exceeding the upper limit) equals to 0.

J. Vrbik
From (4) it is obvious that Tk = J is equivalent to having (exactly to be un-

k
derstood from now on) k + J observations smaller than .The correspond-
n
ing probability is the same as that of getting k + J successes in a binomial-type
k
experiment with n trials and a single-success probability of ; we will denote it
n
k
nk + J   .
n
Similarly, Tk = J | A i has the same probability as = Tk J= | Ti J (earlier
values of T becoming irrelevant), which means that, out of the remaining
n − i − J observations, k − i must be in the ( d i , d k ) interval; this probability
k −i
is equal to nk −− ii − J  .
n −i
Finally, Pr (Tk = J | Bi ) = Pr (Tk = J | Ti = − J ) , which means that, out of the
remaining n − i + J observations, k − i + 2 J must be in the ( di , d k ) interval;
k −i
this probability equals to in−−Ji + J  .
n −i
2.2. Resulting Equations

We can thus simplify (6) to
k k  k − i  k −1 n −i + J  k − i 
nk + J   = ∑ Pr ( A i ) ⋅ nk −− ii − J   + ∑ Pr ( Bi ) ⋅ k − i + 2 J   (7)
=  n  i 1= n −i i J n −i
(with 1 ≤ k ≤ n − J ), where the  coefficients are readily computable. This
constitutes n − J linear equations for the unknown values of
Pr ( A1 ) , Pr ( A 2 ) ,, Pr ( A n − J ) , Pr ( B J ) , Pr ( B J +1 ) ,, Pr ( B n −1 ) .
By the same kind of reasoning we can show that, for any k between J and
n −1
k −2J k
∑ Pr ( Ai ) ⋅ Pr (Tk =
−J ) =
Pr (Tk = − J | Bi ) (8)
− J | A i ) + ∑ Pr ( Bi ) ⋅ Pr (Tk =
=i 1 =i J
(note that the T sequence needs at least 2J steps to reach -J at Tk from J at Ti ),

leading to
 k  k −2J k −i k n −i + J  k − i 
nk − J   = ∑ Pr ( A i ) ⋅ nk −− ii −− 2J J   + ∑ Pr ( Bi ) ⋅ k − i   (9)
=   i 1=
n  n − i  i J n −i
when J ≤ k ≤ n .
Combining (7) and (9), we end up with the total of 2 ( n − J ) linear equa-
tions for the same number of unknowns. Furthermore, these equations have a
“doubly triangular” form, meaning that proceeding in the right order, i.e.
Pr ( A1 ) , Pr ( B J ) , Pr ( A 2 ) , Pr ( B J +1 ) , , we are always solving only for a single
unknown (this is made obvious by the next Mathematica code).
Having found the solution, we can then compute (based on Claim 1)
n−J n −1
Pr ( Dn >=
dJ ) ∑ Pr ( Ai ) + ∑ Pr ( Bi ) (10)
=i 1 =i J

J. Vrbik
which yields a single value of the desired CDF (or rather, of its complement) of
Dn . To get the full (at least in the discretized sense) picture of the distribution,
the procedure now needs to be repeated for each possible value of J.
The whole algorithm can be summarized by the following Mathematica code
(note that instead of superscripts, interpreted by Mathematica as powers, we
have to use “overscripts”).
(for improved efficiency, we use only the relevant range of J values).

The program takes over one minute to execute; the results are displayed in
Figure 3.
We can easily interpolate values of the corresponding table to convert it into a
continuous function, thereby finding any desired value to a sufficient accuracy.
The main problem with this algorithm lies in its execution time, which in-
creases (like most matrix-based computation) with roughly the third power of n.
This makes the current approach rather prohibitive when dealing with samples
consisting of thousands of observations.
In this context it is fair to mention that none of our programs have been op-
timized for run-time efficiency; even though some improvement in this regard is
definitely possible, we do not believe that it would substantially change our gen-
eral conclusions.
Figure 3. Pr(D300 > d).

J. Vrbik
3. Generating-Function Solution
We now present an alternate way of building the same (discretized, but other-
wise exact) solution. We start by defining the following function of two integer
arguments
def ii + j
pij = (11)
( i + j )!
Note that, when i + j is negative (i is always positive), pij is equal to 0.
k
Claim 2. The binomial probability in   can be expressed in terms of
n
three such p functions, as follows
m−k
 k  p ⋅p
k
in   = i − k mn − i − m + k (12)
m pn − m
Proof.
k i (m − k )
n −i
⋅
i! ( n − i )!
n −i
n!  k   m − k 
i
n k 
=i   =    (13)
 m  i!( n − i )!  m   m  mn
n!
■
Note that  has the value of 0 whenever the number of successes (the sub-
script) is either negative or bigger than n (the superscript). Similarly, 00 is al-
ways equal to 1.
3.1. Modified Equations

The new function (11) enables us to express (7) and (9) in the following manner:
pkJ ⋅ pn−−J k k
p0k − i ⋅ pn−−J k k −1 pk2 −J i ⋅ pn−−J k
= ∑ Pr ( A ) ⋅ + ∑ Pr ( B ) ⋅ (14)
pn−−J i pnJ − i
i i
= p0n i 1= i J
and
pk− J ⋅ pnJ − k k −1 pk−2−Ji ⋅ pnJ − k k
p0k − i ⋅ pnJ − k
= ∑ Pr ( A ) ⋅ + ∑ Pr ( B ) ⋅ (15)
pn−−J i pnJ − i
i i
= p0n i 1= i J
respectively.
Cancelling pn−−J k in each term of (14) and multiplying by p0n yields
k −1
k
pn pn
=pkJ ∑ pn0−i Pr ( A i ) ⋅ p0k −i + ∑ pn0−i Pr ( Bi ) ⋅ pk2 −J i (16)
=i 1 =
−J i J J
which can be written as

k k −1
pkJ = ∑ ai ⋅ p0k −i + ∑ bi ⋅ pk2 −J i (17)
=i 1 =i J
(for any positive integer k), by defining

p0n
Pr ( A i )
def
ai = (18)
pn−−J i

J. Vrbik
and
p0n
Pr ( Bi )
def
bi = (19)
pnJ − i
Note that n has disappeared from (17), making ai and bi potentially infi-
nite sequences (consider letting n have any positive value; in that sense ai is
well defined for any i from 1 to ∞ and bi for any i from J to ∞ ). Once we
solve for these two sequences, converting them back to Pr ( A i ) and Pr ( Bi )
for any specific value of n is a simple task; this approach thus effectively deals
with all n at the same time!
Similarly modifying (15) results in
k −1 k
pk− J = ∑ ai ⋅ pk−2−Ji + ∑ bi ⋅ p0k −i (20)
=i 1 =i J
(for any k > J ), utilizing the previous definition of ai and bi . The equations,
together with (17), constitute an infinite set of linear equations for elements of
the two sequences. To find the corresponding solution, we reach for a different
mathematical tool.
3.2. Generating Functions

Let us introduce the following generating functions
∞
a (t ) ∑ ak ⋅ t k
def
G= (21)
k =1
b (t ) ∑ bk ⋅ t k
def
G=
k =1
∞
G j ( t ) =δ j ,0 + ∑ pkj ⋅ t k
def
k =1
where j is a non-negative integer, and δ j ,0 (Kronecker’s δ ) is equal to 1 when

j = 0 , equal to 0 otherwise.
Multiplying (17) by t k and summing over k from 1 to ∞ yields
GJ ( t ) = Ga ( t ) ⋅ G0 ( t ) + Gb ( t ) ⋅ G2 J ( t ) (Gj)
since ∑ i =1 ai ⋅ p0k − i is the coefficient of t k in the expansion of Ga ( t ) ⋅ G0 ( t ) ,

k
and ∑ i = J bi ⋅ pk2 −J i is the coefficient of t k in the expansion of Gb ( t ) ⋅ G2 J ( t ) ;

k −1
combining two sequences in this manner is called their convolution. Note the
importance (for correctness of the Ga ⋅ G0 result) of including δ j ,0 in the de-
finition of G0 ( t ) .
Similarly, it follows from (20) that
G− J ( t ) = Ga ( t ) ⋅ G−2 J ( t ) + Gb ( t ) ⋅ G0 ( t ) (22)
3.3. Resulting Solution

The last two (simple, linear) equations can be so easily solved for Ga ( t ) and
Gb ( t ) that we do not even quote the answer.
Going back to a specific sample size n, we now need to find the value of (10),

J. Vrbik
namely
∑ +∑
n −1 n −i n −1
−J a ⋅p bi ⋅ pnJ − i
=i 1 =i
n
i 1
(23)
0 p
which follows from solving (18) and (19) for Pr ( A i ) and Pr ( Bi ) respectively.
The numerator of the last expression is clearly (by the same convolution argu-
ment) the coefficient of t n in the expansion of
Ga ( t ) ⋅ G− J ( t ) + Gb ( t ) ⋅ GJ ( t ) (24)
An important point is that, in actual computation, the G functions need to be

expanded only up to and including the t n term, making them long but other-
wise simple polynomials.
The algorithm to find Pr ( Dn > d J ) then requires us to build G0 ( t ) , GJ ( t ) ,
G− J ( t ) , G2 J ( t ) and G−2 J ( t ) , and Taylor-expand, up to the same t n term,
2G0 ( t ) GJ ( t ) G− J ( t ) − G− J ( t ) G2 J ( t ) − GJ ( t ) G−2 J ( t )
2 2
GD ( t ) =
def
(25)
(G ( t )
0
2
)
− G2 J ( t ) G−2 J ( t ) ⋅ p0n
which is obtained by substituting the solution to (Gj) and (22) into (24), and
further dividing by p0n ; Pr ( Dn > d J ) is then provided by the resulting coeffi-
cient of t n .
Note that, based on the same expansion, we can get Pr ( Dn > d J ) for any
smaller n as well, just by correspondingly replacing the value of p0n . Neverthe-
less, the process still needs to be repeated with all relevant values of J.
The corresponding Mathematica code looks as follows:
It produces results identical to those of the matrix-algebra algorithm, but has

several advantages: the coding is somehow easier, it (almost) automatically yields
results for any n ≤ 300 (not a part of our code) and it executes faster (taking
about 17 seconds). Nevertheless, its run-time still increases with roughly the
third power of n, thus preventing us from using it with a much larger value of n.
We now proceed to find several approximate solutions of increasing accuracy,
all based on (25).
4. Asymptotic Solution
As we have seen, neither of the previous two solutions is very practical (and ul-
timately not even feasible) as the sample size increases. In that case, we have to
switch to using an approximate (also referred to as asymptotic) solution.

J. Vrbik
Large-n Formulas
First, we must replace the old definition of pij , namely (11), by
def i i + j ⋅ e−i
pij = (26)
( i + j )!
Note that this does not affect (12), nor any of the subsequent formulas up to and
including (25), since the various e − i factors always cancel out.
Also note that the definition can be easily extended to real (not just integer)
arguments by using Γ ( i + j + 1) in place of ( i + j )! , where Γ denotes the
usual gamma function.
1) Laplace representation
Note that, from now on, the summations defining the G functions in (21) stay
infinite (no longer truncated to the first n terms only).
Consider a (rather general) generating function
∞
(t ) ∑ pk ⋅ t k
def
=
G (27)
k =0
and an integer n ( p may be implicitly a function of n as well as k); our goal is to

find an approximation for p n as n increases.
After replacing k and t with two new variables x and s, thus
k= n ⋅ x (28)
 s
=t exp  − 
 n
(
G e− s n ) becomes
∞
∑ p x ⋅n exp ( − s ⋅ x ) (29)
x =0
1
in steps of
n
1
Making the assumption that expanding p x ⋅n in powers of results in
n
q( x)  1  q( x)
=
p x ⋅n + O 3 2   (30)
n n  n
(and our results do have this property), then (29) is approximately equal to
∞
1
⋅ ∑ q ( x ) exp ( − s ⋅ x ) +  (31)
n x =0
1
in steps of
n
which, in the n → ∞ limit, yields the following (large-n) approximation to

G ( e− s n ) :
∞
L(s) ∫0 q ( x ) exp ( − s ⋅ x ) dx
def
= (32)
Note that L ( s ) is the so-called Laplace transform of q ( x ) ; we call it the

Laplace representation of G.
q (1)
To find an approximate value of the coefficient of t n (i.e. p n  ) of (27),
n

J. Vrbik
we need to find the so-called inverse Laplace transform (ILT) of L ( s ) yielding

the corresponding q ( x ) then substitute 1 for x and divide by n (this is the gist
of the technique of this section).
To improve this approximation, q ( x ) itself and consequently L ( s ) can be
1
expanded in further powers of (done eventually; but currently we concen-
n
trate on the n → ∞ limit).
2) Approximating Gj
Let us now find Laplace representation of our G j , i.e. the last line of (21),
further divided by n (this is necessary to meet (30), yet it does not change
(25) as long as p0n of that formula is divided by n as well). To find the cor-
responding q ( x ) , we need the n → ∞ limit of
(n ⋅ x ) exp ( −n ⋅ x ) n
n⋅ x + j
pkj
n⋅ = (33)
n ( n ⋅ x + j )!
To be able to reach a finite answer, j itself needs to be replaced by z n ; note
that doing that with our J changes Pr ( Dn > d J ) to Pr n ⋅ Dn > z . ( )
It happens to be easier to take the limit of the natural logarithm of (33),
namely
( x ⋅ n + z n ) ln ( x ⋅ n ) − x ⋅ n + 12 ln n − ln ( x ⋅ n + z n )! (34)
instead.
With the help of the following version of Stirling’s formula (ignore its last
term for the time being)
1 1
ln ( m!)  m ln m − m − ln m + ln 2 π + + (35)
2 12m
and of (we do not need the last two terms as yet)
( )
ln x ⋅ n + z n  ln ( x ⋅ n ) +
x n
z
−
z2
+
z3
−
z4
2 x 2 n 3 x 3n 3 2 4 x 4 n 2
+ (36)
we get (this kind of tedious algebra is usually delegated to a computer)

z2
ln q ( x )  − − ln 2 πx +  (37)
2x
We thus end up with
 z2 
Gj e ( −s n
) 1 ∞
exp  −
 2x
− x ⋅ s
 dx =
exp − 2 z 2 s ( )
n

n →∞
→ ∫
2π 0 x 2s
(38)
j
where z = ; this follows from (32) and the following result:
n
Claim 3.
 v 
exp  − − x ⋅ s 
π
Iv =
def ∞
∫0
 x
x
 dx =
s
(
⋅ exp −2 v ⋅ s ) (39)

J. Vrbik
when v and s are positive

Proof. Since
 v 
exp  − − x ⋅ s 
dI v ∞  x  dx
dv ∫0
= (40)
x3 2
and
 v
exp  − s ⋅ y − 
 ∞ y v v dI
∫0
Iv =
v
⋅
s ⋅ y2
dy =⋅ v
s dv
(41)
s⋅ y
v
after the x = substitution. Solving the resulting simple differential equa-
s⋅ y
tion for I v yields
Iv = (
c ⋅ exp 2 v ⋅ s ) (42)
where c is equal to
∞ exp ( − x ⋅ s ) ∞ (
exp −u 2 ⋅ s ) π
=I0 ∫0 =
x
dx ∫0 u
=⋅ 2udu
s
(43)
the last being a well-known integral (related to Normal distribution). ■

To find the n → ∞ limit of (25), we first evaluate the right hand side of (38)
with j = −2 J , − J ,0, J and 2J, getting
(
G0 e − s n ) → 1
(44)
n →∞
n 2s
(
GJ e − s n
=
)
G− J e − s n
 →
( )
exp − z 2 s ( )
n →∞
n n 2s
(
G2 J e − s n
=
)
G−2 J e − s n
 →
( )
exp −2 z 2 s ( )
n →∞
n n 2s
J
where z = (always positive).
n
3) Approximating GD
The corresponding Laplace representation of (25) further divided by n, let us
denote it L D / n ( s ) , is then equal to
E E2
2⋅ − 2⋅
2s 2s 2 s 2 s =2 ⋅ E ⋅ 2 π =2⋅
2π ∞
⋅ ∑ ( −1) E k
k −1
(45)
1− E  1
2
1 + E 2s 2 s k =1
 ⋅
 2 s  2π
where=
def
( )
E exp −2 z 2 s . This is based on substituting the right-hand sides of
(44) into (25), and on the following result:
p0n n ne− n n 1
=
lim ⋅ n lim= (46)
n →∞ n n →∞ n! 2π

J. Vrbik
(Stirling’s formula again); the last limit also makes it clear why we had to divide
(25) by n: to ensure getting a finite result again.
We now need to find the q D / n ( x ) function corresponding to (45), i.e. the
q D / n (1)
latter’s ILT, and convert it to p n = according to (30); this yields an ap-
n
proximation for the coefficient of t n in the expansion of (25), still divided by n.
q (1)
(
The ultimate answer to Pr nDn > z is thus D / n
n
) ⋅n = q D / n (1) .
Since the ILT of

π k π
s
⋅E =
s
(
⋅ exp −2kz 2 s ) (47)
(where k is a positive integer) is equal to

 2 z 2k 2 
exp  − 
 x 
(48)
x
(this follows from (32) and (38), after replacing z by z ⋅ k ), its contribution to
q D / n (1) is
(
exp −2z 2 k 2 ) (49)
Applied to the last line of (45), this leads to

Pr ( nDn > z 
n →∞ )
→ 20 ( z ) (50)
or, equivalently,
Pr ( )
nDn ≤ z  1 − 20 ( z ) (51)
where
∞
0 ( z ) =
∑ ( −1)
k −1
( )
def
exp −2 z 2 k 2 (52)
k =1
 1 
Note that the error of this approximation is of the O   type, which
 n
1
means that it decreases, roughly (since there are also terms proportional to ,
n
1 1
32
, etc.), with . Also note that the right hand side of (51) can be easily
n n
evaluated by calling a special function readily available (under various names) with
most symbolic programming languages, for example “JacobiTheta4(0, exp(−2∙z2))”
of Maple or “EllipticTheta[4, 0, Exp[−2z2]]” of Mathematica.
The last formula has several advantages over the approach of the previous two
sections: firstly, it is easy and practically instantaneous to evaluate (the infinite
series converges rather quickly only between 2 and 10 terms are required to
reach a sufficient accuracy when 0.3 < z the CDF is practically zero otherwise),
secondly, it is automatically a continuous function of z (no need to interpolate),
and finally, it provides an approximate distribution of nDn for all values of n

J. Vrbik
(the larger the n, the better the approximation).

But a big disappointment is the formula’s accuracy, becoming adequate only
when the sample size n reaches thousands of observations; for smaller samples,
an improvement is clearly necessary. To demonstrate this, we have computed
the difference between the exact and approximate CDF when n = 300 ; see Fig-
ure 4, which is in agreement with a similar graph of [2].
We can see that the maximum possible error of the approximation is over 1.5%
(when computing the probability of D300 > 0.046 ); errors of this size are gener-
ally not considered acceptable.
5. High-Accuracy Solution
Results of this section were obtained (in a slightly different form, and building
on previously published results) by [5] and further expounded by a more accessible
1
[6]; their method is based on expanding (in powers of ) the matrix-algebra
n
solution. Here we present an alternate approach, similarly expanding the gene-
rating-function solution instead; this appears an easier way of deriving the indi-
1 1
vidual and -proportional corrections to (50). We should mention that
n n
1
the cited articles include the 3 2 -proportional correction as well; it would not
n
be difficult to extend our results in the same manner, if deemed beneficial.
To improve accuracy of our previous asymptotic solution, (34) and, conse-
1 1
quently, (38) have to be extended by extra and -proportional terms
n n
(note that (35) and (36) were already presented in this extended form), getting
(
G j e− s n )
n
 z2   z 3 − 3z ⋅ x z 6 − 12 z 4 x + 27 z 2 x 2 − 6 x 3 
exp  − − s ⋅ x  ⋅ 1 + 2
+ 4
+ 
∫
∞  2x   6x n 72 x n  dx
0
2π ⋅ x
=
(
exp − 2 z 2 s ) ⋅ 1 + s ⋅ z  2s
+
z 2 s 2 − 3 2 z 2 s 3 + 3s 
+ 
2s  3 n 18n 
  (53)
j def
where the  sign corresponds to a positive (negative) z = , respectively.
n
The corresponding tedius algebra is usually delegated to a computer (it is no
longer feasible to show all the details here), the necessary integrals are found by
differentiating each side of the equation in (38) with respect to z2, from one up
to four times.
The last expression represents an excellent approximation to the G functions
of (44), with the exception of
∞
 ks 
( ) ∑p
def
G0 e − s n = k
0 ⋅ exp  −  (54)
k =0  n

J. Vrbik
Figure 4. Error of asymptotic solution (n = 300).
which now requires a different approach.

Claim 4.
(
G0 e − s n ) 1
+
1
+
2s
+ (55)
n 2 s 3 n 12n
Proof. The following elegant proof has been suggested by [7].
It is well known that the value of Lambert W ( z ) function is defined as a so-
lution to we w = z , and that its Taylor expansion is given by
( −k )
k −1
∞
∑ k!
zk (56)
k =1
implying that
∞
k k − k (1+ λ )
∑ k! e
k =0
d
1 + W −e − λ −1
=
dλ
( ) (57)
■
Differentiating
we w = −e − λ −1 (58)
dw
with respect to λ , cancelling e w , and solving for yields
dλ
dw w
= − (59)
dλ 1+ w
implying that
∞
kk − k (1+ λ ) 1 def 1
∑ k! e = =
1+ w u
(60)
k =0
where u (being equal to 1 + w ) is now the solution of

( u − 1) eu =
−e − λ (61)
rather than (58). Solving the last equation for λ and expanding the answer in

J. Vrbik
powers of u results in
u2 u3 u4
λ= + + + (62)
2 3 4
Inverting the last power series (which can be easily done to any number of terms)
yields the following expansion:
2λ ( 2λ )
32
u= 2λ − + + (63)
3n 36
1 s
Similarly expanding , replacing λ by and further dividing by n
u n
proves our claim.
Having achieved more accurate approximation for all our G functions, and
with the following extension of (46)
n ne− n n 1  1 
 ⋅ 1 − +  (64)
n! 2 π  12n 
we can now complete the corresponding refinement of (45) by substituting all

these expansions into (25), further divided by n. This results in
2 E+ 1 2π E 1
L D / n ( s )  2π ⋅ ⋅ + ⋅ ⋅
1 + E+ 2 s n 6 (1 + E ) 2 s
(65)
2π  E E  2π z⋅E
+ ⋅ −  2s − ⋅ ⋅ 2s + 
n  9 (1 + E ) 18 (1 − E )  n 9 (1 + E )2
The last formula consists of two types of corrections: replacing E by

def   1  
E+ = exp  −2  z +  2s  (66)
  6 n 
1
in its leading term removes the -proportional error of (45); the remaining
n
1
terms similarly represent the -proportional correction; the error of (65) is
n
 1 
thus of the O  3 2  type.
n 
Note that
 2s s 
E+  E  1 + + +   (67)
 3 n 9n 
enables us to express (65) in terms of E only; this is needed for its explicit verifi-
cation (something we leave to a computer).
What we must do now is to convert (65) to the corresponding q D / n (1) , thus
approximating the coefficient of t n in the expansion of (25). We already pos-
sess the answer for the first two terms of (65), which are both identical to (45),
 1 
except that 0 ( z ) needs to be replaced by 0  z +  in the first case, and
 6 n
divided by 12 in the second one.

J. Vrbik
To convert the remaining terms of (65) to their q D / n (1) contribution, we

must first expand them in powers of E, then take the ILT of individual terms of
these expansions, and finally set x equal to 1; the following table helps with the
last two steps:
(68)
(the first row has already been proven; the remaining three follow by differen-
tiating both of its sides with respect to zk (taken as a single variable), up to three
times).
This results in the following replacement
∞
E
⋅ 2 s → ∑ ( −1) e −2 z k 4k 2 z 2 − 1
k −1
( )
2 2
2π ⋅
1+ E k =1
∞
E
(
⋅ 2 s → ∑ e −2 z k 4k 2 z 2 − 1 )
2 2
2π ⋅ (69)
1− E k =1
∞
 −2  −2 z 2 k 2
2π ⋅
zE
⋅ 2s → z ∑  e (8k 3 z 3 − 6kz )
(1 + E ) k =1  k − 1 
2
where all three series are still fast-converging. Note that the binomial coefficient
( −1)
k −1
of the last sum equals to k.
We can then present our final answer for Pr ( nDn > z ) in the manner of
the following Mathematica code; the resulting KS function can then compute
(practically instantaneously) this probability for any n and z.
The resulting improvement in accuracy over the previous, asymptotic ap-

proximation is quite dramatic; Figure 5 again displays the difference between
the exact and approximate CDF of D300 .
This time, the maximum error has been reduced to an impressive 0.0036%,
this happens when computing Pr ( 0.027 < D300 < 0.0475) ; note that potential
errors become substantially smaller in the right hand tail (the critical part) of the
distribution. Most importantly, when the same computation is repeated with
n = 10 , the corresponding graph indicates that errors of the new approximation

J. Vrbik
Figure 5. Error of high-accuracy solution (n = 300).
can never exceed 0.20%; such an accuracy would be normally considered quite
adequate (approximating Student’s t30 by Normal distribution can yield an er-
ror almost as large as 1%).
As mentioned already, the approximation of Pr ( nDn > z ) can be made
even more accurate by adding, to the current expansion, the following extra
n − 3 2 -proportional correction
z ∞
+
27n 3 2 k =1
k −1
(
∑ ( −1) exp −2 z 2k 2 )
(70)
 107 k   4  78 
× k  k 2 +
2
+ 3 ( −1)  ⋅  1 − k 2 z 2  − + 16k 4 z 4 
 5   3  5 
At n = 300 , this reduces the corresponding error by a factor of 4; nevertheless,

from a practical point of view, such high accuracy is hardly ever required. Fur-
thermore, the new term reduces the maximum error of the n = 10 result from
the previous 0.17% only to 0.10%; even though this represents an undisputable
improvement, it is achieved at the expense of increased complexity. Note that
1
adding higher ( 2 -proportional, etc.) terms of the expansion would no longer
n
(at n = 10 ) improve its accuracy, since the expansion starts diverging (a phe-
nomenon also observed with, and effectively inherited from, the Stirling expan-
sion); this happens quite early when n is small (and, when n is large, higher ac-
curacy is no longer needed).
When simplicity, speed of computation, and reasonable accuracy are desired
in a single formula, the next section presents a possible solution.
Final Simplification
1
We have already seen that the -proportional error is removed by the fol-
n
lowing trivial modification of (50)

J. Vrbik
Pr ( ) 
nDn ≤ z =1 − 20  z +

1 

6 n
(71)
Note that this amounts only to a slight shift of the whole curve to the left, but
1
leaves us with a full O   -type error.
n
When willing to compromise, [8] has taken this one step further: it is possible
to show that, by extending the argument of 0 to
1 z −1
z+ + (72)
6 n 4n
1
yields results which are very close to achieving the full -proportional correction
n
of (65) as well; this is a fortuitous empirical results which can be easily verified
computationally (when n = 10 , the maximum error of the last approximation
increases to 0.27%, for n = 300 it goes up to 0.0096% still practically negligi-
ble).
6. Conclusions and Summary

In this article, we hope to have met two goals:
• explaining, in every possible detail, the traditional derivations (two of them
yielding exact results, several of them being approximate) of the Dn distri-
bution,
• proposing the following simple modification of the commonly used formula:
  z −1 2 
2
( )
∞
1
nDn ≤ z  1 + 2 ∑ ( −1) exp  −2  z + + (73)
k
k 
4n 
Pr
  6 n 
k =1  
making it accurate enough to be used as a practical substitute for exact results
even with relatively small samples. Furthermore, the right hand side of this
formula can be easily evaluated by computer software (see the comment fol-
lowing (52)).
Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this
paper.
References
[1] Durbin, J. (1973) Distribution Theory for Tests Based on the Sample Distribution
Function. Society for Industrial and Applied Mathematics, Philadelphia.
https://doi.org/10.1137/1.9781611970586
[2] Marsaglia, G., Tsang, W.W. and Wang, J. (2003) Evaluating Kolmogorov’s Distri-
bution. Journal of Statistical Software, 8, 1-4. https://doi.org/10.18637/jss.v008.i18
[3] Feller, W. (1948) On the Kolmogorov-Smirnov Limit Theorems for Empirical Dis-
tributions. Annals of Mathematical Statistics, 19, 177-189.
https://doi.org/10.1214/aoms/1177730243
[4] Kendall, M.G. and Stuart, A. (1973) The Advanced Theory of Statistics. Vol. 2,

J. Vrbik
Hafner Publishing Company, New York, 468-477.

[5] Chang, L.C. (1956) On the Exact Distribution of the Statistics of A. N. Kolmogorov
and Their Asymptotic Expansion. Acta Mathematica Sinica, 6, 55-81.
[6] Pelz, W. and Good, I.J. (1976) Approximation the Lower Tail Areas of the Kolmo-
gorov-Smirnov One Sample Statistic. Journal of he Royal Statistical Society, Series
B, 38, 152-156. https://doi.org/10.1111/j.2517-6161.1976.tb01579.x
[7] https://math.stackexchange.com/q/3247174
[8] Vrbik, J. (2018) Small-Sample Corrections to Kolmogorov-Smirnov Test Statistic.
Pioneer Journal of Theoretical and Applied Statistics, 15, 15-23.
Appendix
The following Mathematica function computes the exact Pr ( Dn ≤ d ) for any
value of d; using it to produce a full graph of the corresponding CDF will work
only for a sample size not much bigger than 700, since the algorithm’s computa-
tional time increases exponentially with not only n, but also with increasing val-
ues of d.
Nevertheless, computing only a single value of this function (such as a P value

of an observed Dn ) becomes feasible even for a substantially bigger sample size;
for example: typing KS[3000, 0.031467] results in 0.994855, taking about 13
seconds on an average computer. Increasing n any further would necessitate
switching to one of the (at that point, extremely accurate) approximations of our
article.

Deriving CDF of Kolmogorov-Smirnov Test Statistic: Jan Vrbik

Uploaded by

Copyright:

Available Formats

Deriving CDF of Kolmogorov-Smirnov Test Statistic: Jan Vrbik

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Deriving CDF of Kolmogorov-Smirnov Test Statistic: Jan Vrbik

Uploaded by

Copyright:

Available Formats

Applied Mathematics, 2020, 11, 227-246

Deriving CDF of Kolmogorov-Smirnov Test

How to cite this paper: Vrbik, J. (2020) Abstract

DOI: 10.4236/am.2020.113018 Mar. 17, 2020 227 Applied Mathematics

1.1. Test Statistic

1.2. Transforming to  ( 0,1)

The first thing we do is to define

where F ( x ) is the CDF of the hypothesized distribution; the U1 ,U 2 ,,U n

DOI: 10.4236/am.2020.113018 228 Applied Mathematics

Figure 1. Both CDFs, before

Figure 2. and after transformation.

DOI: 10.4236/am.2020.113018 229 Applied Mathematics

We start by defining n + 1 integer-valued random variables

2.1. Total-Probability Formula

We know that, given C , Tk = J could not have happened. Similarly, given

where 1 ≤ k ≤ n − J , with the understanding that an empty sum (lower limit

DOI: 10.4236/am.2020.113018 230 Applied Mathematics

From (4) it is obvious that Tk = J is equivalent to having (exactly to be un-

2.2. Resulting Equations

(note that the T sequence needs at least 2J steps to reach -J at Tk from J at Ti ),

DOI: 10.4236/am.2020.113018 231 Applied Mathematics

(for improved efficiency, we use only the relevant range of J values).

Figure 3. Pr(D300 > d).

DOI: 10.4236/am.2020.113018 232 Applied Mathematics

3.1. Modified Equations

which can be written as

(for any positive integer k), by defining

DOI: 10.4236/am.2020.113018 233 Applied Mathematics

3.2. Generating Functions

where j is a non-negative integer, and δ j ,0 (Kronecker’s δ ) is equal to 1 when

since ∑ i =1 ai ⋅ p0k − i is the coefficient of t k in the expansion of Ga ( t ) ⋅ G0 ( t ) ,

and ∑ i = J bi ⋅ pk2 −J i is the coefficient of t k in the expansion of Gb ( t ) ⋅ G2 J ( t ) ;

3.3. Resulting Solution

DOI: 10.4236/am.2020.113018 234 Applied Mathematics

An important point is that, in actual computation, the G functions need to be

It produces results identical to those of the matrix-algebra algorithm, but has

DOI: 10.4236/am.2020.113018 235 Applied Mathematics

and an integer n ( p may be implicitly a function of n as well as k); our goal is to

which, in the n → ∞ limit, yields the following (large-n) approximation to

Note that L ( s ) is the so-called Laplace transform of q ( x ) ; we call it the

DOI: 10.4236/am.2020.113018 236 Applied Mathematics

we need to find the so-called inverse Laplace transform (ILT) of L ( s ) yielding

we get (this kind of tedious algebra is usually delegated to a computer)

DOI: 10.4236/am.2020.113018 237 Applied Mathematics

when v and s are positive

the last being a well-known integral (related to Normal distribution). ■

DOI: 10.4236/am.2020.113018 238 Applied Mathematics

Since the ILT of

(where k is a positive integer) is equal to

Applied to the last line of (45), this leads to

DOI: 10.4236/am.2020.113018 239 Applied Mathematics

(the larger the n, the better the approximation).

DOI: 10.4236/am.2020.113018 240 Applied Mathematics

Figure 4. Error of asymptotic solution (n = 300).

which now requires a different approach.

where u (being equal to 1 + w ) is now the solution of

DOI: 10.4236/am.2020.113018 241 Applied Mathematics

we can now complete the corresponding refinement of (45) by substituting all

The last formula consists of two types of corrections: replacing E by