
TWMS J. App. and Eng. Math. V.14, N.4, 2024, pp. 1374-1389

TESTING LOCAL HYPOTHESES WITH DIFFERENT TYPES OF INCOMPLETE DATA

M. BOUKELOUA¹, §

¹ Laboratoire de Génie des Procédés pour le Développement Durable et les Produits de Santé (LGPDDPS), Ecole Nationale Polytechnique de Constantine, Algeria; Laboratoire de Biostatistique, Bioinformatique et Méthodologie Mathématique Appliquées aux Sciences de la Santé (BIOSTIM), Faculté de Médecine, Université Salah Boubnider Constantine 3, Algeria.
e-mail: boukeloua.mohamed@gmail.com; ORCID: https://orcid.org/0000-0002-5522-498X.
§ Manuscript received: March 25, 2023; accepted: September 19, 2023.
TWMS Journal of Applied and Engineering Mathematics, Vol.14, No.4, © Işık University, Department of Mathematics, 2024; all rights reserved.
Abstract. In this work, we consider a general framework of incomplete data which includes many types of censoring and truncation models. Under this framework, and assuming that the distribution of interest has a parametric form, we propose local tests for simple and composite hypotheses on the parameter. These tests are based on φ-divergence, Wald and Rao statistics. We study the asymptotic behaviour of these statistics under the null hypothesis. For the φ-divergence statistics, we study the asymptotic behaviour under the alternative as well, which allows us to approximate the power function of the proposed tests. We also propose local tests of homogeneity, which serve to compare the distributions of two samples. Finally, we present the results of an application on real data.

Keywords: Local tests, censored data, truncated data, φ-divergences, tests of homogeneity.

AMS Subject Classification: 62F03, 62F05, 62N03.

1. Introduction
Hypothesis testing constitutes an essential issue in statistics. One of the most popular types of hypothesis tests is the parametric test of the simple null hypothesis H0 : θT = θ0 against the alternative H1 : θT ≠ θ0, where θ ∈ Θ is a parameter that describes the distribution of the population, θT is the true value of θ and θ0 is a fixed value in the parameter space Θ. Many tests of this type of hypothesis have been studied in the literature, such as the Wald, Rao and likelihood ratio tests. Recently, the theory of φ-divergences between measures, introduced by [1], has been widely applied in statistics. [2, 3] used this theory to study some parametric and semiparametric models. [4] and [5] used it to study semiparametric copula models for complete and censored data, respectively. Furthermore, [6] proposed φ-divergence tests of the hypothesis H0 against H1.
All the tests we have cited above compare the two distributions characterized by θT and θ0 globally, i.e., on the whole support of the variable of interest. However, it happens in certain situations that the two distributions are different but very close on a part of this support; [7] gave a real-life example of such a situation. In this kind of situation, the conclusion of a test over the whole support of the distribution may differ from the one obtained when we focus on a specific part of the support. Motivated by this fact, [8] introduced local φ-divergences, which quantify the discrepancy between two distributions on only a part of their support. Using these local φ-divergences, [7] proposed local test statistics of the hypothesis H0 against H1 and gave the asymptotic distribution of these statistics under both the null and the alternative hypotheses.
The tests we have discussed so far are based on complete observations of the variable of interest. However, in practice, censoring and/or truncation phenomena may prevent the observation of the variable of interest and provide only partial information about it. The presence of such phenomena considerably affects the statistical analysis of the data. In the present paper, we consider a general framework of incomplete data which includes several types of censoring and truncation. Under this framework, we propose local φ-divergence tests on the parameter θ. The φ-divergence technique is a useful tool in local hypothesis testing since it facilitates the construction of the local test statistic: it suffices to multiply the integrand in the φ-divergence by a kernel that concentrates on a specific part of the integration domain. Moreover, the φ-divergence technique makes it possible to determine the asymptotic distribution of the test statistic under the alternative hypothesis, which is not possible with classical approaches. This yields an approximation to the power function of the φ-divergence based test. Concerning our contributions in this paper, we draw on the work of [7] to propose local φ-divergence tests of the simple null hypothesis H0 : θT = θ0 under the general framework of incomplete data. We give the asymptotic distribution of the statistics of these tests under both the null and the alternative hypotheses, which allows us to approximate the power of these tests. We also propose local Wald and Rao type tests and provide the asymptotic distribution of their statistics under H0. Then, we study local composite null hypotheses with incomplete data. For these, we propose local φ-divergence, Wald, Rao and Lagrange multiplier tests and provide the asymptotic distribution of their statistics under the null hypothesis. We also study the asymptotic behaviour of the φ-divergence test statistics under the alternative hypothesis. Furthermore, we consider the problem of comparing the distributions of two samples of incomplete data. Following [7], we propose local φ-divergence and Wald tests of homogeneity and provide the asymptotic distribution of their statistics under the null hypothesis, as well as the asymptotic distribution of the φ-divergence test statistics under the alternative hypothesis. Finally, we apply our proposed tests to a real data set of times to breast retraction for breast cancer patients.

The rest of the paper is organized as follows. In Section 2, we present the types of censored and truncated data on which our study is based. In Section 3, we give our main results. An application to real data is presented in Section 4 and Section 5 gives some conclusions and perspectives. The proofs are relegated to Appendix A.

2. Some types of incomplete data


We start by presenting some types of censoring and truncation, and we give the form of the likelihood function for each type. Let X be a positive real random variable (r.r.v.) of interest. We assume that the distribution of X belongs to a parametric family {Pθ , θ ∈ Θ} (Θ being an open set of R^d), dominated by a σ-finite measure m. Denote by fθ the Radon-Nikodym derivative of Pθ with respect to m. We also assume that X may be censored and/or truncated and we denote by (Z, ∆) the couple of observed variables. In what follows, (Z_i, ∆_i)_{1≤i≤n} represents a sample of independent and identically distributed (i.i.d.) copies of the couple (Z, ∆) and (z_i, δ_i) represents a realization of (Z_i, ∆_i). From now on, for any random variable V, P_V, F_V and S_V denote respectively the probability distribution, the distribution function and the survival function of V; and when P_V is absolutely continuous with respect to m, f_V = dP_V/dm represents its Radon-Nikodym derivative. Moreover, for any function ψ : R → R, we denote by ψ(x⁻) = lim_{t→x, t<x} ψ(t) the left limit of ψ at x, when this limit exists, and for any vector or matrix A, we denote by A^⊤ the transpose of A. Here are some types of incomplete data.
Right censored data
In this case, we observe the variables Z = min(X, R) and ∆ = 1_{X≤R}, where R is the right censoring variable, assumed to be positive and independent of X, and 1_{·} denotes the indicator function. The likelihood function of (Z, ∆) is given by

L(θ) = ∏_{i=1}^{n} ( f_θ(z_i) S_R(z_i⁻) )^{δ_i} ( S_X(z_i; θ) f_R(z_i) )^{1−δ_i}.

Since we are interested in the parameter θ, we only keep the factors that depend on θ. So, we study the following pseudo-likelihood function

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i), where g_θ(z_i, δ_i) = f_θ(z_i)^{δ_i} S_X(z_i; θ)^{1−δ_i}.   (1)
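To fix ideas, here is a minimal numerical sketch (not part of the paper) of the MPLE under right censoring, assuming an exponential working model f_θ(x) = θ e^{−θx}, S_X(x; θ) = e^{−θx}; the data, the working model and the scipy-based optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative right-censored sample: z = observed times, d = 1 if uncensored (X <= R).
z = np.array([0.8, 1.3, 2.1, 0.5, 3.0, 1.7])
d = np.array([1,   0,   1,   1,   0,   1  ])

def neg_log_pseudo_likelihood(theta):
    # log g_theta(z_i, delta_i) = delta_i*log f_theta(z_i) + (1-delta_i)*log S_X(z_i; theta),
    # for the assumed exponential model f_theta(x) = theta*exp(-theta*x), S_X(x; theta) = exp(-theta*x).
    if theta <= 0:
        return np.inf
    log_f = np.log(theta) - theta * z
    log_S = -theta * z
    return -np.sum(d * log_f + (1 - d) * log_S)

res = minimize_scalar(neg_log_pseudo_likelihood, bounds=(1e-6, 50.0), method="bounded")
theta_hat = res.x  # MPLE of theta under the exponential working model
print(theta_hat)
```

For this exponential working model the maximizer has the closed form ∑δ_i / ∑z_i, which can be used to check the numerical result.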


Doubly censored data
In this case, we observe the variables Z = max(min(X, R), L) and

∆ = 1 if L < X ≤ R,  ∆ = 2 if X > R,  ∆ = 3 if X ≤ L,

where R (resp. L) is the right (resp. left) censoring variable, with 0 ≤ L ≤ R almost surely (a.s.), and (L, R) is independent of X. The likelihood function of (Z, ∆) is given by

L(θ) = ∏_{i=1}^{n} ( f_θ(z_i) (S_R(z_i⁻) − S_L(z_i⁻)) )^{1_{δ_i=1}} ( S_X(z_i; θ) f_R(z_i) )^{1_{δ_i=2}} ( F_X(z_i; θ) f_L(z_i) )^{1_{δ_i=3}}.

As in the previous case, we study the following pseudo-likelihood function

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i), where g_θ(z_i, δ_i) = f_θ(z_i)^{1_{δ_i=1}} S_X(z_i; θ)^{1_{δ_i=2}} F_X(z_i; θ)^{1_{δ_i=3}}.   (2)
Interval censored data, case 1 (current status data)
In this case, we observe the couple (Z, ∆), where Z is a positive random variable independent of X and ∆ = 1_{X≤Z}. The likelihood function of (Z, ∆) is given by

L(θ) = ∏_{i=1}^{n} F_X(z_i; θ)^{δ_i} S_X(z_i; θ)^{1−δ_i} f_Z(z_i)

and the pseudo-likelihood function is given by

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i), where g_θ(z_i, δ_i) = F_X(z_i; θ)^{δ_i} S_X(z_i; θ)^{1−δ_i}.   (3)


Interval censored data, case 2
In this case, we observe the variables Z = (R, L) and

∆ = 1 if L < X ≤ R,  ∆ = 2 if X > R,  ∆ = 3 if X ≤ L,

where R and L are positive variables such that L < R a.s. and (R, L) is independent of X. Let (r_i, l_i, δ_i)_{1≤i≤n} be a realization of the sample (R_i, L_i, ∆_i)_{1≤i≤n}. The likelihood function of (Z, ∆) is given by

L(θ) = ∏_{i=1}^{n} ( F_X(r_i; θ) − F_X(l_i; θ) )^{1_{δ_i=1}} S_X(r_i; θ)^{1_{δ_i=2}} F_X(l_i; θ)^{1_{δ_i=3}} f_{R,L}(r_i, l_i)

and the pseudo-likelihood function is given by

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i), where
g_θ(z_i, δ_i) = g_θ(r_i, l_i, δ_i) = ( F_X(r_i; θ) − F_X(l_i; θ) )^{1_{δ_i=1}} S_X(r_i; θ)^{1_{δ_i=2}} F_X(l_i; θ)^{1_{δ_i=3}}.   (4)
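As an illustration tied to the application in Section 4, the following sketch evaluates the pseudo-likelihood (4) under an assumed log-normal working model and maximizes it numerically; the inspection intervals, starting values and optimizer settings are placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import lognorm
from scipy.optimize import minimize

# Illustrative case-2 interval-censored data: (l, r) inspection times, delta in {1, 2, 3}.
l = np.array([4.0, 10.0, 6.0, 15.0])
r = np.array([9.0, 20.0, 12.0, 30.0])
delta = np.array([1, 2, 1, 3])

def neg_log_pl(params):
    # Log-normal working model: log X ~ N(mu, sigma^2), so F_X(x; theta) = lognorm.cdf(x, s=sigma, scale=exp(mu)).
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    F = lambda x: lognorm.cdf(x, s=sigma, scale=np.exp(mu))
    # Contribution per observation: F(r)-F(l) if delta=1, S(r)=1-F(r) if delta=2, F(l) if delta=3.
    contrib = np.where(delta == 1, F(r) - F(l),
               np.where(delta == 2, 1.0 - F(r), F(l)))
    return -np.sum(np.log(np.clip(contrib, 1e-300, None)))

res = minimize(neg_log_pl, x0=[2.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)
```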
The LTRC data model
Let R (resp. L) be a positive censoring (resp. truncation) variable independent of X. In the left truncated and right censored (LTRC) data model, we observe Z = (Y, L) (where Y = min(X, R)) and ∆1 = 1_{X≤R} whenever Y ≥ L (i.e., when the observation is not left truncated). We also observe the truncation indicator ∆2 = 1_{Y≥L}. Set ∆ = (∆1, ∆2) and let (y_i, l_i, δ_{1i}, δ_{2i})_{1≤i≤n} be a realization of the sample (Y_i, L_i, ∆_{1i}, ∆_{2i})_{1≤i≤n}. The likelihood function is given by

L(θ) = ∏_{i=1}^{n} ( f_θ(y_i) S_R(y_i⁻) )^{δ_{1i} δ_{2i}} ( S_X(y_i; θ) f_R(y_i) )^{(1−δ_{1i}) δ_{2i}} ( f_θ(y_i) S_R(y_i⁻) / S_X(l_i; θ) )^{δ_{1i}(1−δ_{2i})} ( S_X(y_i; θ) f_R(y_i) / S_X(l_i; θ) )^{(1−δ_{1i})(1−δ_{2i})}

and the pseudo-likelihood function is given by

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i), where
g_θ(z_i, δ_i) = g_θ(y_i, l_i, δ_{1i}, δ_{2i}) = f_θ(y_i)^{δ_{1i} δ_{2i}} S_X(y_i; θ)^{(1−δ_{1i}) δ_{2i}} ( f_θ(y_i) / S_X(l_i; θ) )^{δ_{1i}(1−δ_{2i})} ( S_X(y_i; θ) / S_X(l_i; θ) )^{(1−δ_{1i})(1−δ_{2i})}.   (5)

3. Main results
We will propose local tests on the parameter θ under a general framework of incomplete data which includes all the types of censoring and truncation described in the previous section. We begin by defining this general framework.

3.1. General framework of incomplete data. In the general framework, we assume that the variable of interest X is not completely observed. So, instead of observing X, we observe the variables Z and ∆, where ∆ is a discrete variable that indicates which variable is observed (the variable of interest or another latent variable). Under this framework, the pseudo-likelihood function is defined by

L(θ) = ∏_{i=1}^{n} g_θ(z_i, δ_i),

where g_θ is the pseudo-density of (Z, ∆). In the particular cases studied in the previous section, g_θ has one of the forms (1)–(5), depending on the considered type of incomplete data.

3.2. Local φ-divergences. Our study is based on the local φ-divergences between the two functions g_{θ^(1)} and g_{θ^(2)} (θ^(1) and θ^(2) being two elements of Θ). Let E and F be the supports of the variables Z and ∆, respectively. These supports vary according to the considered case. For example, for right censored data, when Z is absolutely continuous, E is a subset of R₊ and when Z is discrete, E is a subset of N. As for F, it is equal to {0, 1}. The dominating measure m is the Lebesgue measure when Z is absolutely continuous and the counting measure when Z is discrete. We also denote by µ the counting measure on F. Following [8] and [7], we define the local φ-divergence between g_{θ^(1)} and g_{θ^(2)} by

D_φ^ω(g_{θ^(1)}, g_{θ^(2)}) = D_φ^ω(θ^(1), θ^(2)) = ∫_F ∫_E g_ω(z, δ) g_{θ^(2)}(z, δ) φ( g_{θ^(1)}(z, δ) / g_{θ^(2)}(z, δ) ) dm(z) dµ(δ),

where ω is a fixed point of Θ and φ is a real valued convex function defined on [0, +∞[. We assume that φ belongs to the class of convex functions

Φ = { φ : φ is strictly convex at 1, φ(1) = φ′(1) = 0, 0·φ(0/0) = 0, 0·φ(u/0) = u lim_{v→+∞} φ(v)/v }.

Local φ-divergences are based on the choice of the kernel g_ω for the fixed value ω ∈ Θ, which determines the part of the support of X on which we focus our analysis. The local φ-divergence D_φ^ω(θ^(1), θ^(2)) satisfies D_φ^ω(θ^(1), θ^(2)) ≥ 0, with equality if and only if g_{θ^(1)} = g_{θ^(2)}. Moreover, we assume that the parametric family {g_θ , θ ∈ Θ} is identifiable, i.e., g_{θ^(1)} = g_{θ^(2)} implies that θ^(1) = θ^(2) for all θ^(1) and θ^(2) ∈ Θ.
In the sequel, we will be interested in some tests on the parameter θ with local simple and composite null hypotheses. We will also study some local tests of homogeneity.
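For a concrete picture of this definition, the following sketch (our illustration, not the paper's procedure) evaluates D_φ^ω numerically for the right-censored pseudo-density (1) under an exponential working model, summing over δ ∈ {0, 1} and integrating over z; the choice φ = φ_KL and the normal localizing kernel g_ω(z, δ) = k_ω(z) are arbitrary illustrative assumptions.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Working model (illustrative): right-censored exponential pseudo-density from Eq. (1),
# g_theta(z, d) = f_theta(z)^d * S_X(z; theta)^(1-d) with f_theta(z) = theta*exp(-theta*z).
def g(theta, z, d):
    return (theta * np.exp(-theta * z)) ** d * np.exp(-theta * z) ** (1 - d)

# Illustrative localizing kernel g_omega(z, d) = k_omega(z): a normal density centred at omega,
# which concentrates the divergence on the region around z = omega (our choice, not the paper's).
def g_omega(z, d, omega=2.0, s=0.3):
    return norm.pdf(z, loc=omega, scale=s)

phi_KL = lambda x: x * np.log(x) - x + 1  # Kullback-Leibler generator

def local_divergence(theta1, theta2, phi=phi_KL):
    # D_phi^omega(theta1, theta2): sum over delta in {0, 1} of the integral over z of
    # g_omega * g_theta2 * phi(g_theta1 / g_theta2); the kernel is negligible beyond z = 10.
    total = 0.0
    for d in (0, 1):
        integrand = lambda z: g_omega(z, d) * g(theta2, z, d) * phi(g(theta1, z, d) / g(theta2, z, d))
        val, _ = integrate.quad(integrand, 0.0, 10.0)
        total += val
    return total

print(local_divergence(1.2, 1.0))
```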

3.3. Local tests with simple null hypothesis. In this paragraph, we will study local tests with a simple null hypothesis. In particular, we will construct local φ-divergence, Wald and Rao test statistics and we will determine their asymptotic distributions under the null hypothesis, using standard assumptions in the parametric setting. For the local φ-divergence statistics, we will also determine the asymptotic distribution under the alternative hypothesis, which allows us to give an approximation to the power function.
Let θT be the true value of θ and θ0 be a fixed point in Θ. As in [7], we consider the local null hypothesis defined by

H0^ω : g_{θT}(z, δ) = g_{θ0}(z, δ) for a given g_ω, ω ∈ Θ,

which we write briefly as H0^ω : θT = θ0.
To test the hypothesis H0^ω against the alternative H1^ω : θT ≠ θ0, we make use of the following local φ-divergence test statistic

T_{φ,n}^ω(θ̂_n, θ0) = 2n D_φ^ω(θ̂_n, θ0) / φ″(1),

where θ̂_n is the maximum pseudo-likelihood estimate (MPLE) of θT.
We will also study Wald and Rao tests of the hypothesis H0^ω against H1^ω. For that, let us define the Fisher information matrix

I(θ) = ( ∫_F ∫_E f_{Z,∆}(z, δ; θ) (∂ log g_θ(z, δ)/∂θ_i) (∂ log g_θ(z, δ)/∂θ_j) dm(z) dµ(δ) )_{1≤i,j≤d}

and the local information matrix

I^ω(θ) = ( ∫_F ∫_E g_ω(z, δ) g_θ(z, δ) (∂ log g_θ(z, δ)/∂θ_i) (∂ log g_θ(z, δ)/∂θ_j) dm(z) dµ(δ) )_{1≤i,j≤d}.

The local Wald and Rao test statistics are defined respectively by

W_n^ω = n (θ̂_n − θ0)^⊤ I^ω(θ̂_n) (θ̂_n − θ0)

and

R_n^ω = (1/n) U_n(θ0)^⊤ I^ω(θ0)^{−1} U_n(θ0),

where

U_n(θ0) = ( ∑_{i=1}^{n} ∂ log g_θ(Z_i, ∆_i)/∂θ_1, . . . , ∑_{i=1}^{n} ∂ log g_θ(Z_i, ∆_i)/∂θ_d )^⊤ |_{θ=θ0}.

We will give the asymptotic distribution of the test statistics T_{φ,n}^ω(θ̂_n, θ0), W_n^ω and R_n^ω under the following assumptions.
H1: The third partial derivatives of g_θ(z, δ) with respect to θ exist for all θ ∈ Θ.
H2: The first, second and third partial derivatives of g_θ(z, δ) with respect to θ are bounded in absolute value by functions α(z, δ), β(z, δ) and γ(z, δ) respectively, with ∫_F ∫_E α(z, δ) dm(z) dµ(δ) < ∞, ∫_F ∫_E β(z, δ) dm(z) dµ(δ) < ∞ and ∫_F ∫_E γ(z, δ) f_{Z,∆}(z, δ; θ) dm(z) dµ(δ) < ∞.
H3: For each θ ∈ Θ, the matrices I(θ) and I^ω(θ) exist, they are positive definite and their elements are continuous functions of θ.
H4: The function φ ∈ Φ is twice continuously differentiable with φ″(1) > 0.
H5: For each θ0 ∈ Θ, there exists an open neighborhood N(θ0) such that for all θ ∈ N(θ0) and 1 ≤ i, j ≤ d we have

∂/∂θ_i ∫_F ∫_E g_ω(z, δ) g_{θ0}(z, δ) φ( g_θ(z, δ)/g_{θ0}(z, δ) ) dm(z) dµ(δ) = ∫_F ∫_E g_ω(z, δ) g_{θ0}(z, δ) ∂/∂θ_i [ φ( g_θ(z, δ)/g_{θ0}(z, δ) ) ] dm(z) dµ(δ),

∂²/∂θ_i ∂θ_j ∫_F ∫_E g_ω(z, δ) g_{θ0}(z, δ) φ( g_θ(z, δ)/g_{θ0}(z, δ) ) dm(z) dµ(δ) = ∫_F ∫_E g_ω(z, δ) g_{θ0}(z, δ) ∂²/∂θ_i ∂θ_j [ φ( g_θ(z, δ)/g_{θ0}(z, δ) ) ] dm(z) dµ(δ)

and these expressions are continuous on N(θ0).
In the sequel, (V_n)_{n∈N} represents a sequence of independent and identically distributed standard normal random variables.
Theorem 3.1. Under H0^ω and the assumptions H1–H5, the statistics T_{φ,n}^ω(θ̂_n, θ0) and W_n^ω converge in distribution to ∑_{i=1}^{r} a_i V_i², where r = rank( I(θ0)^{−1} I^ω(θ0) I(θ0)^{−1} ) and a_1, . . . , a_r are the non zero eigenvalues of the matrix I^ω(θ0) I(θ0)^{−1}.
Moreover, the statistic R_n^ω converges in distribution to ∑_{i=1}^{s} b_i V_i², where s = rank( I(θ0) I^ω(θ0)^{−1} I(θ0) ) and b_1, . . . , b_s are the non zero eigenvalues of the matrix I^ω(θ0)^{−1} I(θ0).
From this theorem, the critical region of the local φ-divergence test at level α ∈ (0, 1) is CR = { T_{φ,n}^ω(θ̂_n, θ0) > q_{1−α} }, where q_{1−α} is the (1−α)-quantile of the limiting distribution of T_{φ,n}^ω(θ̂_n, θ0). In practice, the quantile q_{1−α} can be approximated by a Monte Carlo approach, as described in [7]. The critical regions of the local Wald and Rao tests can be defined in the same way.
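One plausible implementation of such a Monte Carlo approximation (a sketch under our own assumptions, not a transcription of the procedure in [7]) estimates the weights a_1, . . . , a_r by the eigenvalues of I^ω(θ̂_n) I(θ̂_n)^{−1} and simulates the weighted sum of squared standard normals of Theorem 3.1:

```python
import numpy as np

def mc_quantile(I_hat, I_omega_hat, alpha=0.05, n_rep=100_000, seed=None):
    # Estimate the weights as the eigenvalues of I^omega * I^{-1}, then simulate
    # sum_i a_i V_i^2 with V_i i.i.d. standard normal and return its (1-alpha)-quantile.
    rng = np.random.default_rng(seed)
    a = np.linalg.eigvals(I_omega_hat @ np.linalg.inv(I_hat)).real
    a = a[np.abs(a) > 1e-12]                      # keep the non-zero eigenvalues
    chi2 = rng.standard_normal((n_rep, a.size)) ** 2
    samples = chi2 @ a
    return np.quantile(samples, 1 - alpha)

# Usage with illustrative 2x2 information matrices:
I_hat = np.array([[2.0, 0.3], [0.3, 1.5]])
I_omega_hat = np.array([[0.8, 0.1], [0.1, 0.6]])
print(mc_quantile(I_hat, I_omega_hat))
```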
The next theorem gives the asymptotic distribution of T_{φ,n}^ω(θ̂_n, θ0) under the alternative hypothesis H1^ω.

Theorem 3.2. Under H1^ω and the assumptions H1–H5, we have

√n ( D_φ^ω(θ̂_n, θ0) − D_φ^ω(θT, θ0) ) →_D N(0, σ²),

where σ² = T^⊤ I(θT)^{−1} T with T = (t_1, . . . , t_d)^⊤, t_i = ∂D_φ^ω(θ, θ0)/∂θ_i |_{θ=θT}, 1 ≤ i ≤ d.

Thanks to this theorem, we can approximate the power function θT ∈ Θ ↦ π(θT) = P_{θT}(CR). Indeed, we have

π(θT) ≈ 1 − F_N( (√n/σ) ( (q_{1−α}/(2n)) φ″(1) − D_φ^ω(θT, θ0) ) ),

where F_N is the cumulative distribution function of the standard normal distribution. From this approximation, we can compute the sample size that ensures a specified power π. Let n0 be the positive root of the equation

π = 1 − F_N( (√n/σ) ( (q_{1−α}/(2n)) φ″(1) − D_φ^ω(θT, θ0) ) ),

which can be written as

n0 = ( a + b − √( a(a + 2b) ) ) / ( 2 D_φ^ω(θT, θ0)² ),

where a = σ² ( F_N^{−1}(1 − π) )² and b = q_{1−α} φ″(1) D_φ^ω(θT, θ0). The required sample size is then ⌊n0⌋ + 1 (⌊x⌋ denotes the integer part of x). In practice, we can replace θT by θ̂_n in D_φ^ω(θT, θ0), and σ and q_{1−α} can be estimated by the Monte Carlo approach described in [7].
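A direct transcription of this sample-size rule, assuming that D_φ^ω(θT, θ0), σ and q_{1−α} have already been estimated (the numerical values below are placeholders):

```python
import numpy as np
from scipy.stats import norm

def required_sample_size(D_omega, sigma, q_alpha, phi_dd1, power):
    # n0 = (a + b - sqrt(a*(a + 2b))) / (2*D_omega^2), with
    # a = sigma^2 * (Phi^{-1}(1 - power))^2 and b = q_alpha * phi''(1) * D_omega.
    a = sigma**2 * norm.ppf(1.0 - power)**2
    b = q_alpha * phi_dd1 * D_omega
    n0 = (a + b - np.sqrt(a * (a + 2.0 * b))) / (2.0 * D_omega**2)
    return int(np.floor(n0)) + 1

# Illustrative values (D_omega, sigma and q_alpha would come from estimates and Monte Carlo):
print(required_sample_size(D_omega=0.05, sigma=0.4, q_alpha=3.1, phi_dd1=1.0, power=0.9))
```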

3.4. Local tests with composite null hypothesis. In this paragraph, we will study local tests with a composite null hypothesis. In particular, we will construct local φ-divergence, Wald, Rao and Lagrange multiplier test statistics and we will determine their asymptotic distributions under the null hypothesis. We will also determine the asymptotic behaviour of the local φ-divergence statistics under the alternative hypothesis, which allows us to conclude that the local φ-divergence test is consistent.
Consider the local composite null hypothesis

H0^ω : h(θT) = 0 against the alternative H1^ω : h(θT) ≠ 0,

where h is a function defined from Θ to R^p (p < d). The hypothesis H0^ω can be transformed into a simple one by considering a function h̃ : B ⊆ R^{d−p} → Θ such that H0^ω and H1^ω are equivalent to the hypotheses

H0^ω : θT = h̃(β) and H1^ω : θT ≠ h̃(β),
for some β ∈ B.
Let θ̃_n be the MPLE of θT satisfying the constraint h(θ̃_n) = 0. Under the assumptions
H6: For all θ ∈ Θ such that h(θ) = 0, the matrix H(θ) = ∇_θ h(θ) exists, it has full rank and its elements are continuous functions of θ,
and
H7: For all β ∈ B, the matrix H̃(β) = ∇_β h̃(β) exists, it has full rank and its elements are continuous functions of β,
we have, in view of Lemma 3.1 of [7],

√n ( θ̃_n − θT ) →_D N( 0, ( I_d − I^{−1}(θT) B(θT) ) I^{−1}(θT) ( I_d − B(θT) I^{−1}(θT) ) ),

where B(θ) = H(θ) ( H^⊤(θ) I^{−1}(θ) H(θ) )^{−1} H^⊤(θ).
We also set A(θ) = B(θ) I^{−1}(θ) I^ω(θ) I^{−1}(θ) B(θ).
We are interested in the following test statistics of the hypothesis H0^ω against H1^ω.
- The local φ-divergence statistic

T_{φ,n}^ω(θ̂_n, θ̃_n) = 2n D_φ^ω(θ̂_n, θ̃_n) / φ″(1).

- The Wald statistic (a numerical sketch is given after this list)

W_n^{ω,c} = n h(θ̂_n)^⊤ ( H(θ̂_n)^⊤ I^ω(θ̂_n)^{−1} H(θ̂_n) )^{−1} h(θ̂_n).

- The Rao statistic

R_n^{ω,c} = (1/n) U_n(θ̃_n)^⊤ [ I^ω(θ̃_n) ]^{−1} U_n(θ̃_n).

- The Lagrange multipliers statistic:
Consider the constrained optimization problem

max_{θ∈Θ} L(θ) subject to h(θ) = 0.

The Lagrangian of this problem is L(θ) + h(θ)^⊤ λ, where λ is the Lagrange multiplier. Let (θ̃_n, λ̃_n) be the solution of this problem; the Lagrange multipliers test statistic is defined by

M_n^{ω,c} = (1/n) λ̃_n^⊤ Γ^ω(θ̃_n) λ̃_n,

where Γ^ω(θ) = H^⊤(θ) [ I^ω(θ) ]^{−1} H(θ).
We also set Γ(θ) = H^⊤(θ) [ I(θ) ]^{−1} H(θ).
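Here is a minimal sketch of how the composite Wald statistic W_n^{ω,c} above can be evaluated numerically, assuming that h, its Jacobian H and an estimate of the local information matrix I^ω are available as functions; the two-parameter constraint and the placeholder information matrix are illustrative assumptions, not the paper's.

```python
import numpy as np

def composite_wald(theta_hat, n, h, H, I_omega):
    # W_n^{omega,c} = n * h(theta_hat)^T [ H(theta_hat)^T I^omega(theta_hat)^{-1} H(theta_hat) ]^{-1} h(theta_hat),
    # where h maps Theta to R^p and H(theta) is the d x p Jacobian of h.
    hv = h(theta_hat)                                   # p-vector
    Hm = H(theta_hat)                                   # d x p matrix
    M = Hm.T @ np.linalg.inv(I_omega(theta_hat)) @ Hm   # p x p matrix
    return n * hv @ np.linalg.solve(M, hv)

# Illustrative two-parameter example with the constraint h(theta) = theta_1 - theta_2:
h = lambda t: np.array([t[0] - t[1]])
H = lambda t: np.array([[1.0], [-1.0]])
I_omega = lambda t: np.array([[0.9, 0.2], [0.2, 0.7]])  # placeholder local information matrix
print(composite_wald(np.array([1.4, 1.1]), n=200, h=h, H=H, I_omega=I_omega))
```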
Now, we will give the asymptotic distributions of these statistics under H0^ω.

Theorem 3.3. Under H0^ω and the assumptions H1–H3, H6 and H7, we have
i) If H4 and H5 are satisfied, then

T_{φ,n}^ω(θ̂_n, θ̃_n) →_D ∑_{i=1}^{r1} a_i V_i²,

where r1 = rank( I(θT)^{−1} A(θT) I(θT)^{−1} ) and a_1, . . . , a_{r1} are the non zero eigenvalues of A(θT) I(θT)^{−1}.
ii)

W_n^{ω,c} →_D ∑_{i=1}^{r2} b_i V_i²,

where r2 = rank( H(θT)^⊤ I(θT)^{−1} H(θT) ( H(θT)^⊤ I^ω(θT)^{−1} H(θT) )^{−1} H(θT)^⊤ I(θT)^{−1} H(θT) ) and b_1, . . . , b_{r2} are the non zero eigenvalues of ( H(θT)^⊤ I^ω(θT)^{−1} H(θT) )^{−1} H(θT)^⊤ I(θT)^{−1} H(θT).
iii)

R_n^{ω,c} →_D ∑_{i=1}^{r3} c_i V_i²,

where r3 = rank( B(θT) I(θT)^{−1} B(θT) I^ω(θT)^{−1} B(θT) I(θT)^{−1} B(θT) ) and c_1, . . . , c_{r3} are the non zero eigenvalues of I^ω(θT)^{−1} B(θT) I(θT)^{−1} B(θT).
iv)

M_n^{ω,c} →_D ∑_{i=1}^{r4} d_i V_i²,

where r4 = rank( Γ^ω(θT) ) and d_1, . . . , d_{r4} are the non zero eigenvalues of Γ(θT)^{−1} Γ^ω(θT).
The following theorem deals with the asymptotic behaviour of the local φ-divergence test statistic under H1^ω.

Theorem 3.4. Assume that there exists a unique θ* ∈ Θ that maximizes E(log g_θ(Z, ∆)) under the constraint h(θ) = 0. Then, under H1^ω and the assumptions H1–H7, the test statistic T_{φ,n}^ω(θ̂_n, θ̃_n) tends in probability to infinity.

From this theorem, we deduce that the power of the local φ-divergence test tends to 1 as n tends to infinity, i.e., it is a consistent test.

3.5. Local tests of homogeneity. In this paragraph, we will study local tests of homogeneity. In particular, we will construct local φ-divergence and Wald test statistics and we will determine their asymptotic distributions under the null hypothesis. As in the previous paragraphs, we will also determine, for the local φ-divergence statistics, the asymptotic distribution under the alternative hypothesis, which allows us to give an approximation to the power function.
Let (Z_i, ∆_i)_{1≤i≤n} be an observed sample associated to the variable of interest X, with probability density function f_{θ^(1)}, and let (Z̃_i, ∆̃_i)_{1≤i≤m} be an observed sample associated to the variable of interest X̃, with probability density function f_{θ^(2)}. We assume that the two samples are independent and that the sample sizes n and m are asymptotically linked by the relation

lim_{n,m→∞} m/(m + n) = ρ ∈ (0, 1).

We want to test the local null hypothesis H0^ω : θ^(1) = θ^(2) against the alternative H1^ω : θ^(1) ≠ θ^(2). For that, we use the local φ-divergence and Wald test statistics. The first one is defined by

T_{φ,n,m}^ω(θ̂_n^(1), θ̂_m^(2)) = 2nm D_φ^ω(θ̂_n^(1), θ̂_m^(2)) / ( (m + n) φ″(1) ),

where θ̂_n^(1) and θ̂_m^(2) are the MPLE's of θ^(1) and θ^(2), based on the samples (Z_i, ∆_i)_{1≤i≤n} and (Z̃_i, ∆̃_i)_{1≤i≤m}, respectively.
Moreover, the Wald statistic is given by

W_{n,m}^ω = nm ( θ̂_n^(1) − θ̂_m^(2) )^⊤ [ m I^ω(θ̂_n^(1))^{−1} + n I^ω(θ̂_m^(2))^{−1} ]^{−1} ( θ̂_n^(1) − θ̂_m^(2) ).
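The two-sample Wald statistic translates directly into a few lines of code; in this sketch the parameter estimates and the local information matrix are placeholder assumptions.

```python
import numpy as np

def homogeneity_wald(theta1_hat, theta2_hat, n, m, I_omega):
    # W_{n,m}^omega = n*m * (th1 - th2)^T [ m*I^omega(th1)^{-1} + n*I^omega(th2)^{-1} ]^{-1} (th1 - th2),
    # where I_omega(theta) returns an estimate of the local information matrix at theta.
    diff = theta1_hat - theta2_hat
    M = m * np.linalg.inv(I_omega(theta1_hat)) + n * np.linalg.inv(I_omega(theta2_hat))
    return n * m * diff @ np.linalg.solve(M, diff)

# Illustrative call with placeholder estimates and a constant local information matrix:
I_omega = lambda t: np.array([[0.9, 0.2], [0.2, 0.7]])
print(homogeneity_wald(np.array([3.0, 0.5]), np.array([3.2, 0.6]), n=48, m=46, I_omega=I_omega))
```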

The next theorem gives the asymptotic distributions of these statistics under H0^ω.

Theorem 3.5. Assume that the assumptions H1–H5 hold. Then, under H0^ω, the statistics T_{φ,n,m}^ω(θ̂_n^(1), θ̂_m^(2)) and W_{n,m}^ω converge in distribution, as n and m tend to infinity, to ∑_{i=1}^{r} a_i V_i², where r = rank( I(θ^(1))^{−1} I^ω(θ^(1)) I(θ^(1))^{−1} ) and a_1, . . . , a_r are the non zero eigenvalues of the matrix I^ω(θ^(1)) I(θ^(1))^{−1}.

In order to get an approximation of the power of the local φ-divergence test, we will give the asymptotic distribution of T_{φ,n,m}^ω(θ̂_n^(1), θ̂_m^(2)) under the alternative hypothesis H1^ω.

Theorem 3.6. Assume that the assumptions H1–H5 hold and that the function φ also satisfies the following assumption.
H8: For all 1 ≤ i ≤ d, we have

∂/∂θ_i^(1) ∫_F ∫_E g_ω(z, δ) g_{θ^(2)}(z, δ) φ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) dm(z) dµ(δ) = ∫_F ∫_E g_ω(z, δ) (∂g_{θ^(1)}/∂θ_i^(1))(z, δ) φ′( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) dm(z) dµ(δ)

and

∂/∂θ_i^(2) ∫_F ∫_E g_ω(z, δ) g_{θ^(2)}(z, δ) φ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) dm(z) dµ(δ) = ∫_F ∫_E g_ω(z, δ) [ (∂g_{θ^(2)}/∂θ_i^(2))(z, δ) φ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) − (∂g_{θ^(2)}/∂θ_i^(2))(z, δ) ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) φ′( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) ] dm(z) dµ(δ).

Then, under H1^ω, we have

√( nm/(m + n) ) ( D_φ^ω(θ̂_n^(1), θ̂_m^(2)) − D_φ^ω(θ^(1), θ^(2)) ) →_D N( 0, σ_φ²(θ^(1), θ^(2)) ), as n, m → ∞,

where

σ_φ²(θ^(1), θ^(2)) = ρ T_1^⊤ I(θ^(1))^{−1} T_1 + (1 − ρ) T_2^⊤ I(θ^(2))^{−1} T_2,

with T_1 = (t_11, . . . , t_1d)^⊤,

t_1i = ∫_F ∫_E g_ω(z, δ) (∂g_{θ^(1)}/∂θ_i^(1))(z, δ) φ′( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) dm(z) dµ(δ),

and T_2 = (t_21, . . . , t_2d)^⊤,

t_2i = ∫_F ∫_E g_ω(z, δ) [ (∂g_{θ^(2)}/∂θ_i^(2))(z, δ) φ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) − (∂g_{θ^(2)}/∂θ_i^(2))(z, δ) ( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) φ′( g_{θ^(1)}(z, δ)/g_{θ^(2)}(z, δ) ) ] dm(z) dµ(δ).

The critical region of the local φ-divergence test at level α ∈ (0, 1) is given by CR = { T_{φ,n,m}^ω(θ̂_n^(1), θ̂_m^(2)) > q_{1−α} }, where q_{1−α} is the (1 − α)-quantile of the asymptotic distribution of T_{φ,n,m}^ω(θ̂_n^(1), θ̂_m^(2)) under H0^ω. Proceeding as in the one sample case, we can approximate the power function as follows:

π ≈ 1 − F_N( ( 1/σ_φ(θ^(1), θ^(2)) ) √( nm/(m + n) ) ( q_{1−α} ((m + n)/(nm)) φ″(1)/2 − D_φ^ω(θ^(1), θ^(2)) ) ),

where F_N is the cumulative distribution function of the standard normal distribution.

4. Real data application


[10, 11] reported a study on the cosmetic results of breast cancer patients, treated either by radiotherapy only or by radiotherapy and chemotherapy. During the period from 1976 to 1980, the patients were followed in order to record the time to cosmetic retraction of the breast. At the beginning, the patients were observed every 4 to 6 months but, as their recovery progressed, the checking times became more distant. Therefore, the time of breast retraction is case 2 interval censored. This dataset has also been used by [12] and [13, 14], who suggested the log-normal distribution to fit the data. For our part, we consider the sample of patients treated by radiotherapy and chemotherapy (composed of 48 patients) and we compare its distribution with the log-normal distribution with parameter θ0 = (m0, σ0²) = (3, 0.7²). The graphs of the density of this distribution and of the kernel estimated density of the data are given in the left panel of Figure 1. We use the gamma kernel to calculate the estimated density of the data. Overall, the two graphs are distant, but they are very close in a certain zone at the right. To highlight this zone, we add, in the right panel of Figure 1, the graph of the truncated normal kernel with parameter ω = (µ_ω, σ_ω²) = (25, 0.3²). This latter is given by

k_ω(x) = ( 1 / ( σ_ω √(2π) F_N(µ_ω/σ_ω) ) ) exp( −(x − µ_ω)² / (2σ_ω²) ) 1_{x>0}.

Figure 1. Graphs of the log-normal density with parameter θ0 = (3, 0.7²) and the kernel estimated density of the data.

To confirm our observation, we use the global and local φ-divergence, Wald and Rao tests on this data set, at the significance level α = 0.05. The divergences we use are the Kullback-Leibler (KL), modified Kullback-Leibler (KLm) and the λ-power divergences introduced by [15] (for different values of λ). They correspond respectively to the functions

φ_KL(x) = x log(x) − x + 1,  φ_KLm(x) = − log(x) + x − 1  and  φ_λ(x) = ( x^{λ+1} − x − λ(x − 1) ) / ( λ(λ + 1) )

(for λ ≠ 0 and λ ≠ −1). For λ = −0.5 (resp. λ = 1), we obtain the Hellinger (resp. the χ²) divergence. In the case of global tests, the critical value q is the (1 − α)-quantile of the χ²₂ distribution and, in the case of local tests, it is calculated from Theorem 3.1. The kernel we use for local tests is g_ω(r, l) = k_ω(r) k_ω(l). Our obtained results are given in Tables 1 and 2.
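For reference, here is a small sketch (our own transcription, with the truncated-normal normalization taken from the displayed formula for k_ω) of the kernel and of the divergence generators used in these tests:

```python
import numpy as np
from scipy.stats import norm

# Truncated normal kernel k_omega used in the application, with omega = (mu_omega, sigma_omega) = (25, 0.3):
def k_omega(x, mu=25.0, sigma=0.3):
    c = sigma * np.sqrt(2 * np.pi) * norm.cdf(mu / sigma)   # normalizing constant over (0, infinity)
    return np.where(x > 0, np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / c, 0.0)

# Product kernel used for the local tests on the case-2 interval-censored data:
def g_omega(r, l):
    return k_omega(r) * k_omega(l)

# Divergence generators: Kullback-Leibler, modified KL, and the lambda-power family of [15].
phi_KL  = lambda x: x * np.log(x) - x + 1
phi_KLm = lambda x: -np.log(x) + x - 1
def phi_power(x, lam):
    # lam = -0.5 gives the Hellinger divergence, lam = 1 the chi-square divergence.
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))
```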

The test                           The test statistic   q          Decision
KL divergence                      10748.1              5.991465   Reject H0
Modified KL divergence             30891.67             5.991465   Reject H0
Hellinger divergence (λ = −0.5)    14660.05             5.991465   Reject H0
Power divergence with λ = 0.5      9189.61              5.991465   Reject H0
χ² divergence (λ = 1)              8586.346             5.991465   Reject H0
Wald                               126.7467             5.991465   Reject H0
Rao                                2.850749             5.991465   Accept H0

Table 1. The obtained results for the global tests.

The test                           The test statistic   q          Decision
KL divergence                      0.116991             3.105955   Accept H0
Modified KL divergence             0.152990             3.105955   Accept H0
Hellinger divergence (λ = −0.5)    0.131920             3.105955   Accept H0
Power divergence with λ = 0.5      0.106491             3.105955   Accept H0
χ² divergence (λ = 1)              0.099273             3.105955   Accept H0
Wald                               1.234309             3.105955   Accept H0
Rao                                1434.473             2890.549   Accept H0

Table 2. The obtained results for the local tests.

Except for the Rao test, all tests reject the global null hypothesis. Moreover, all tests accept the local null hypothesis, which confirms our observation from Figure 1.

5. Conclusions
Under a general framework of incomplete data, we have introduced local tests in parametric models for simple and composite null hypotheses. We have also introduced local tests of homogeneity. These tests are based on φ-divergence, Wald and Rao statistics. In the future, it would be interesting to look at local model selection (see [9]) for incomplete data. It would also be interesting to study local nonparametric procedures for testing goodness-of-fit, homogeneity and independence.

References
[1] Csiszár, I., (1963), Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl., 8, pp. 85-108.
[2] Broniatowski, M. and Keziou, A., (2009), Parametric estimation and tests through divergences and
the duality technique, Journal of Multivariate Analysis, 100(1), pp. 16-36.
[3] Broniatowski, M. and Keziou, A., (2012), Divergences and duality for estimation and test under
moment condition models, Journal of Statistical Planning and Inference, 142(9), pp. 2554-2573.
[4] Bouzebda, S. and Keziou, A., (2010), New estimates and tests of independence in semiparametric
copula models, Kybernetika, 46(1), pp. 178-201.
[5] Boukeloua, M., (2021), Study of semiparametric copula models via divergences with bivariate censored
data, Communications in Statistics-Theory and Methods, 50(23), pp. 5429-5452.
[6] Salicrú, M., Morales, D., Menéndez, M. L. and Pardo, L., (1994), On the applications of divergence
type measures in testing statistical hypotheses, Journal of Multivariate Analysis, 51(2), pp. 372-391.
[7] Avlogiaris, G., Micheas, A. and Zografos, K., (2016), On testing local hypotheses via local divergence,
Statistical Methodology, 31, pp. 20-42.
[8] Avlogiaris, G., Micheas, A. and Zografos, K., (2016), On local divergences between two probability
measures, Metrika, 79(3), pp. 303-333.
[9] Avlogiaris, G., Micheas, A. C. and Zografos, K., (2019), A criterion for local model selection, Sankhyā
A: The Indian Journal of Statistics, 81(2), pp. 406-444.
[10] Beadle, G. F., Silver, B., Botnick, L., Hellman, S. and Harris, J. R., (1984), Cosmetic results following
primary radiation therapy for early breast cancer, Cancer, 54, pp. 2911-2918.
[11] Beadle, G. F., Come, S., Henderson, I. C., Silver, B., Hellman, S. and Harris, J. R., (1984), The effect of adjuvant chemotherapy on the cosmetic results after primary radiation treatment for early stage breast cancer, International Journal of Radiation Oncology, Biology and Physics, 10, pp. 2131-2137.
[12] Finkelstein, D. M. and Wolfe, R. A., (1985), A semiparametric model for regression analysis of interval-
censored failure time data, Biometrics, 41, pp. 933-945.
[13] Lindsey, J. C. and Ryan, L. M., (1998), Tutorial in biostatistics: Methods for interval-censored data,
Statistics in Medicine, 17(2), pp. 219-238.
[14] Lindsey, J. C. and Ryan, L. M., (1999), Erratum to "Tutorial in biostatistics: Methods for interval-censored data" [Statistics in Medicine 17 (1998) 219-238], Statistics in Medicine, 18(7), p. 890.

[15] Cressie, N. and Read, T. R., (1984), Multinomial goodness-of-fit tests, Journal of the Royal Statistical
Society: Series B (Methodological), 46(3), pp. 440-464.
[16] Dik, J. J. and de Gunst, M. C. M., (1985), The distribution of general quadratic forms in normal
variables, Statistica Neerlandica, 39(1), pp. 14-26.
[17] Pardo, L., (2006), Statistical inference based on divergence measures, Chapman & Hall/CRC, Madrid.
[18] Sen, P. K. and Singer J. M., (1993), Large sample methods in statistics: an introduction with appli-
cations, Chapman & Hall/CRC, New York.

Appendix A. Proofs
Proof of Theorem 3.1. - The result for T_{φ,n}^ω(θ̂_n, θ0) can be proved following the same steps as the proof of Theorem 2.1 of [7]. The difference lies in the formulas of D_φ^ω(θ, θ0), I(θ) and I^ω(θ), where the functions f_θ and f_ω in [7] are respectively replaced by g_θ and g_ω. In particular, using a Taylor expansion and the facts that ∇_θ D_φ^ω(θ, θ0)|_{θ=θ0} = 0 and ∇_θ^⊤ ∇_θ D_φ^ω(θ, θ0)|_{θ=θ0} = φ″(1) I^ω(θ0), we get

D_φ^ω(θ̂_n, θ0) = (1/2) (θ̂_n − θ0)^⊤ φ″(1) I^ω(θ0) (θ̂_n − θ0) + o_p(n^{−1}),

so that

T_{φ,n}^ω(θ̂_n, θ0) = n (θ̂_n − θ0)^⊤ I^ω(θ0) (θ̂_n − θ0) + o_p(1),

and the claimed result follows, thanks to Corollary 2.1 of [16], from the fact that

√n (θ̂_n − θ0) →_D N(0, I(θ0)^{−1}).   (6)

- The results for W_n^ω and R_n^ω follow by the same steps as the proof of Theorem 2.2 of [7]. Here too, the difference lies in the formulas of I^ω(θ) and U_n(θ), where the functions f_θ and f_ω in [7] are respectively replaced by g_θ and g_ω. In particular, the convergence of W_n^ω follows from (6) and the fact that I^ω(θ̂_n) →_P I^ω(θ0), thanks to Corollary 2.1 of [16]. Moreover, the convergence of R_n^ω follows, once again, from this corollary and the fact that

(1/√n) U_n(θ0) →_D N(0, I(θ0)).

Proof of Theorem 3.2. The proof is similar to that of Theorem 9.2 of [17]. 
Proof of Theorem 3.3. - The proof of i) is similar to that of Theorem 3.1 of [7].
- The proof of ii) is similar to that of Theorem 3.2 of [7].
- Proof of iii):
Thanks to equations (5.6.20), page 242, and (5.6.2), page 237, of [18], we have

(1/√n) U_n(θ̃_n) = (1/√n) U_n(θT) − I(θT) √n (θ̃_n − θT) + o_P(1)
= I(θT) √n (θ̂_n − θT) − I(θT) √n (θ̃_n − θT) + o_P(1).

So, Lemma 3.1 of [7] allows us to write

(1/√n) U_n(θ̃_n) = I(θT) √n (θ̂_n − θT) − I(θT) ( I_d − I(θT)^{−1} B(θT) ) √n (θ̂_n − θT) + o_P(1)
= B(θT) √n (θ̂_n − θT) + o_P(1).

Therefore

(1/√n) U_n(θ̃_n) →_D N(0, B(θT) I(θT)^{−1} B(θT)), as n → ∞,

and since I^ω(θ̃_n)^{−1} →_P I^ω(θT)^{−1}, we deduce by Corollary 2.1 of [16] that

R_n^{ω,c} →_D ∑_{i=1}^{r3} c_i V_i²,

where r3 = rank( B(θT) I(θT)^{−1} B(θT) I^ω(θT)^{−1} B(θT) I(θT)^{−1} B(θT) ) and c_1, . . . , c_{r3} are the non zero eigenvalues of I^ω(θT)^{−1} B(θT) I(θT)^{−1} B(θT).
- Proof of iv):
Equations (5.6.23), page 243, and (5.6.2), page 237, of [18] allow us to write

(1/√n) λ̃_n = −( H(θT)^⊤ I(θT)^{−1} H(θT) )^{−1} H(θT)^⊤ I(θT)^{−1} (1/√n) U_n(θT) + o_P(1)
= −( H(θT)^⊤ I(θT)^{−1} H(θT) )^{−1} H(θT)^⊤ √n (θ̂_n − θT) + o_P(1)
= −Γ(θT)^{−1} H(θT)^⊤ √n (θ̂_n − θT) + o_P(1).   (7)

Otherwise, the Taylor-Young formula permits us to write

√n h(θ̂_n) = √n h(θT) + √n H(θT)^⊤ (θ̂_n − θT) + o_P( √n (θ̂_n − θT) )
= √n H(θT)^⊤ (θ̂_n − θT) + o_P(1),

so √n H(θT)^⊤ (θ̂_n − θT) = √n h(θ̂_n) + o_P(1). Combining this with (7), we get

(1/√n) λ̃_n = −√n Γ(θT)^{−1} h(θ̂_n) + o_P(1)

and

M_n^{ω,c} = (1/n) λ̃_n^⊤ Γ^ω(θ̃_n) λ̃_n
= n h(θ̂_n)^⊤ Γ(θT)^{−1} Γ^ω(θ̃_n) Γ(θT)^{−1} h(θ̂_n) + o_P(1)
= n h(θ̂_n)^⊤ Γ(θT)^{−1} Γ^ω(θT) Γ(θT)^{−1} h(θ̂_n) + o_P(1),

by the continuity of Γ^ω(θ) in θ. Moreover, proceeding as in the proof of Theorem 5.4.1 of [18], we get

√n ( h(θ̂_n) − h(θT) ) = √n h(θ̂_n) →_D N(0, Γ(θT)).

So, the claimed result follows from Corollary 2.1 of [16].

Proof of Theorem 3.4. Proceeding as in [7] (proof of Theorem 3.1), we get

T_{φ,n}^ω(θ̂_n, θ̃_n) = n ( (θ̂_n − θ̃_n)^⊤ I^ω(θ̃_n) (θ̂_n − θ̃_n) + o_p( ‖θ̂_n − θ̃_n‖² ) ).

So, the claimed result follows from the fact that θ̂_n →_P θT and θ̃_n →_P θ* ≠ θT.
Proof of Theorem 3.5. - To obtain the asymptotic distribution of T_{φ,n,m}^ω, one can proceed as in Theorem 4.1 of [7].
- For W_{n,m}^ω, we have, under H0^ω,

W_{n,m}^ω = mn ( θ̂_n^(1) − θ̂_m^(2) )^⊤ [ m ( I^ω(θ^(1)) + o_P(1) )^{−1} + n ( I^ω(θ^(1)) + o_P(1) )^{−1} ]^{−1} ( θ̂_n^(1) − θ̂_m^(2) )
= ( mn/(m + n) ) ( θ̂_n^(1) − θ̂_m^(2) )^⊤ ( I^ω(θ^(1)) + o_P(1) ) ( θ̂_n^(1) − θ̂_m^(2) ).

In view of [17], page 443, we have

√( mn/(m + n) ) ( θ̂_n^(1) − θ̂_m^(2) ) →_D N( 0, I(θ^(1))^{−1} ),   (8)

which implies that

√( mn/(m + n) ) ( θ̂_n^(1) − θ̂_m^(2) ) = O_P(1).

So

W_{n,m}^ω = ( mn/(m + n) ) ( θ̂_n^(1) − θ̂_m^(2) )^⊤ I^ω(θ^(1)) ( θ̂_n^(1) − θ̂_m^(2) ) + o_P(1)

and the claimed result follows from (8), thanks to Corollary 2.1 of [16].

Proof of Theorem 3.6. The proof follows by the same arguments used in [17], pages
441-442. 

Mohamed Boukeloua is currently an associate professor of mathematics at the National Polytechnic Institute of Constantine, Algeria. He is a member of the Laboratory of Process Engineering for Sustainable Development and Health Products in the same institute. He is also a member of the Laboratory of Biostatistics, Bioinformatics and Mathematical Methodology Applied to Health Sciences. He holds a PhD in Mathematical Statistics. His research interests are parametric and nonparametric inference, censored data, φ-divergences and their applications, copula models, kernel estimation, nonparametric regression, dependent data and Bayesian inference.
