Estimation Theory:
We have to estimate the unknown parameters of a parent universe on the basis of results obtained from various samples. The general theory concerned with estimation is known as the theory of estimation.
Estimation of population parameters (mean, variance, proportion, correlation coefficient, etc.) from the corresponding sample statistics is called statistical inference.
Example:
1) A manufacturer may be interested in estimating the future demand for his product in the market.
2) Manufacturer of bulbs or tubes may be interested in estimating the average life of his
product.
3) A manufacturer of equipment may be interested in knowing the quality of the product by estimating the proportion of defective pieces in the lot.
Let a random sample x₁, x₂, x₃, …, xₙ₋₁, xₙ from a parent universe have density function f(x, θ), where θ is an unknown parameter to be estimated. Our problem is to obtain an estimate of θ in terms of the sample values. There are infinitely many ways of choosing an estimator, and our problem is to choose the best estimator. By best we mean that the distribution of the estimator should be concentrated near the true value of the parameter θ.
Point estimation:
A particular value of a statistic which is used to estimate a given parameter is known as a point estimate of the parameter.
Let us suppose that some characteristic of the elements in a population is represented by the random variable X which has pdf f(x, θ), where the form of f is known except that it contains an unknown parameter θ. If θ were somehow determined, then f would be completely specified. The problem of point estimation is to pick a statistic T(X₁, …, Xₙ) that represents or estimates θ. Thus an estimator is a statistic T that estimates the parameter θ, whereas an estimate is the numerical value of T for a particular realization x₁, x₂, …, xₙ.
For example,
$$S^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
is a point estimator of the population variance.
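As a quick illustration of point estimation, the sketch below (a minimal example in Python, with a small hypothetical sample stored in `data`) computes the sample mean and the sample variance S² as point estimates of the population mean and variance.

```python
import numpy as np

# Hypothetical sample drawn from some parent population
data = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 10.9, 11.7])

n = len(data)
x_bar = data.mean()                           # point estimate of the population mean
s_sq = ((data - x_bar) ** 2).sum() / (n - 1)  # S^2 with divisor (n - 1)

print("point estimate of mean:", x_bar)
print("point estimate of variance (S^2):", s_sq)
print("check against numpy:", data.var(ddof=1))
```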
Parameter Space:
The set of all admissible values of the parameter θ associated with the population pdf f(x, θ) is called the parametric space, denoted by Θ.
The parameter θ may be a vector of parameters. For example, for a normal population N(μ, σ²), θ = (μ, σ²) is a vector of the parameters μ and σ². Hence the parametric space is
$$\Theta=\{(\mu,\sigma^2):\ -\infty<\mu<\infty,\ \sigma^2>0\},$$
when both μ and σ² are unknown. If μ = μ₀ (given) and σ² is not known, then
$$\Theta=\{(\mu_0,\sigma^2):\ \sigma^2>0\}.$$
Consistency:
Let X₁, X₂, … be a sequence of independent and identically distributed random variables with common pdf f(x, θ), θ ∈ Θ. A sequence of point estimators Tₙ = Tₙ(X₁, …, Xₙ) is called consistent if Tₙ →ᴾ θ for each θ ∈ Θ, or if for every ε > 0
$$P(|T_n-\theta|>\varepsilon)\to 0\ \text{ as }\ n\to\infty,$$
or if
$$\lim_{n\to\infty}P(|T_n-\theta|<\varepsilon)=1\quad\forall\ \varepsilon>0.$$
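The definition above can be checked empirically. The sketch below (a minimal Monte Carlo illustration, where the true mean θ, the tolerance ε and the sample sizes are chosen arbitrarily) estimates P(|Tₙ − θ| > ε) for the sample mean Tₙ = x̄ of N(θ, 1) data and shows it shrinking as n grows, which is exactly what consistency requires.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 5.0, 0.2, 5000     # true mean, tolerance, Monte Carlo replications

for n in (10, 50, 200, 1000):
    samples = rng.normal(loc=theta, scale=1.0, size=(reps, n))
    T_n = samples.mean(axis=1)                      # the estimator: sample mean
    prob = np.mean(np.abs(T_n - theta) > eps)       # empirical P(|T_n - theta| > eps)
    print(f"n = {n:5d}   P(|T_n - theta| > {eps}) ~ {prob:.4f}")
```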
Invariance property of consistent estimators.
Theorem: If Tₙ is a consistent estimator of θ and Ψ(θ) is a continuous function of θ, then Ψ(Tₙ) is a consistent estimator of Ψ(θ).
Proof: Since Tₙ is a consistent estimator of θ, Tₙ →ᴾ θ as n → ∞, i.e. for every ε > 0, η > 0, there exists a positive integer m(ε, η) such that
$$P\{|T_n-\theta|<\varepsilon\}>1-\eta,\quad\forall\ n\ge m.$$
Since Ψ(·) is a continuous function, for every ε₁ > 0, however small, there exists ε > 0 such that
$$|T_n-\theta|<\varepsilon\ \Rightarrow\ |\Psi(T_n)-\Psi(\theta)|<\varepsilon_1.$$
For two events A and B, if A ⇒ B, then P(A) ≤ P(B). Hence
$$P\{|\Psi(T_n)-\Psi(\theta)|<\varepsilon_1\}\ \ge\ P\{|T_n-\theta|<\varepsilon\}\ >\ 1-\eta,\quad\forall\ n\ge m,$$
i.e. Ψ(Tₙ) →ᴾ Ψ(θ). Thus Ψ(Tₙ) is a consistent estimator of Ψ(θ).
Cramér-Rao Inequality
Theorem: If t is an unbiased estimator of γ(θ), a function of the parameter θ, then
$$\operatorname{var}(t)\ \ge\ \frac{\left[\dfrac{d}{d\theta}\gamma(\theta)\right]^2}{E\left[\left(\dfrac{\partial}{\partial\theta}\log L\right)^2\right]}\ =\ \frac{[\gamma'(\theta)]^2}{I(\theta)},$$
where I(θ) is the information on θ supplied by the sample.
Proof: Regularity conditions for the Cramér-Rao inequality:
1) The parameter space Θ is a non-degenerate open interval on the real line R¹ = (−∞, ∞).
2) For almost all x = (x₁, x₂, …, xₙ) and for all θ ∈ Θ, ∂L(x, θ)/∂θ exists; the exceptional set, if any, is independent of θ.
3) The range of integration is independent of the parameter θ, so that f(x, θ) can be differentiated under the integral sign.
4) The conditions of uniform convergence of integrals are satisfied, so that differentiation under the integral sign is valid.
5) $I(\theta)=E\left[\left\{\dfrac{\partial}{\partial\theta}\log L(x,\theta)\right\}^2\right]$ exists and is positive for all θ ∈ Θ.
Let x be a random variable following the probability density function f(x, θ), and let L be the likelihood function of the random sample (x₁, x₂, …, xₙ) from this population. Then
$$L=L(x,\theta)=\prod_{i=1}^{n}f(x_i,\theta).$$
Since L is the joint probability density function of (x₁, x₂, …, xₙ),
$$\int L(x,\theta)\,dx=1,\qquad\text{where}\quad\int dx=\int\!\!\int\cdots\int dx_1\,dx_2\cdots dx_n.$$
Differentiating with respect to θ and using the regularity conditions given above, we get
$$\int\frac{\partial L}{\partial\theta}\,dx=0$$
$$\int\frac{1}{L}\frac{\partial L}{\partial\theta}\,L\,dx=\int\left(\frac{\partial}{\partial\theta}\log L\right)L\,dx=0$$
$$\Rightarrow\ E\left(\frac{\partial}{\partial\theta}\log L\right)=0\qquad\left[\because\ E(X)=\int_{-\infty}^{\infty}x\,f(x)\,dx\right]$$
Since t is an unbiased estimator of γ(θ), E(t) = ∫ t L dx = γ(θ). Differentiating with respect to θ,
$$\int t\,\frac{\partial L}{\partial\theta}\,dx=\gamma'(\theta)\ \Rightarrow\ \int t\left(\frac{\partial}{\partial\theta}\log L\right)L\,dx=\gamma'(\theta)$$
$$\Rightarrow\ E\left[t\cdot\frac{\partial}{\partial\theta}\log L\right]=\gamma'(\theta)$$
$$\operatorname{cov}\left(t,\frac{\partial}{\partial\theta}\log L\right)=E\left[t\cdot\frac{\partial}{\partial\theta}\log L\right]-E(t)\,E\left(\frac{\partial}{\partial\theta}\log L\right)=\gamma'(\theta).$$
We have, for the correlation coefficient r between any two variables x and y,
$$|r|\le 1,\qquad r=\frac{\operatorname{cov}(x,y)}{\sigma_x\,\sigma_y}$$
$$\Rightarrow\ r^2=\frac{[\operatorname{cov}(x,y)]^2}{\operatorname{var}(x)\operatorname{var}(y)}\le 1\ \Rightarrow\ [\operatorname{cov}(x,y)]^2\le\operatorname{var}(x)\cdot\operatorname{var}(y).$$
Taking x = t and y = ∂ log L/∂θ, and noting that var(∂ log L/∂θ) = E[(∂ log L/∂θ)²] since E(∂ log L/∂θ) = 0, we get
$$[\gamma'(\theta)]^2\le\operatorname{var}(t)\cdot E\left[\left(\frac{\partial}{\partial\theta}\log L\right)^2\right]$$
$$\Rightarrow\ \operatorname{var}(t)\ge\frac{[\gamma'(\theta)]^2}{E\left[\left(\dfrac{\partial}{\partial\theta}\log L\right)^2\right]}.$$
Condition for the equality sign in the Cramér-Rao inequality: the equality sign holds if and only if
$$\frac{\partial}{\partial\theta}\log L=\lambda(\theta)\,\{t-\gamma(\theta)\}$$
for some function λ(θ) of θ, in which case
$$\operatorname{var}(t)=\left|\frac{\gamma'(\theta)}{\lambda(\theta)}\right|.$$
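As a numerical check of the inequality, the sketch below (illustrative only; the normal model, known σ and the chosen sample size are assumptions) compares the simulated variance of the sample mean from N(μ, σ²) with the Cramér-Rao lower bound σ²/n that applies when estimating γ(θ) = μ with σ known.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 2.0, 3.0, 25, 20000   # assumed true values and sample size

means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
empirical_var = means.var(ddof=1)          # simulated variance of the unbiased estimator x-bar
cr_bound = sigma**2 / n                    # Cramér-Rao bound for estimating mu (sigma known)

print("simulated var(x-bar):", round(empirical_var, 4))
print("Cramér-Rao bound sigma^2/n:", round(cr_bound, 4))   # x-bar attains the bound
```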
Efficiency:
If the sampling distributions of two statistics have the same mean, the statistic with the smaller variance is called an efficient estimator of the mean, while the other statistic is called an inefficient estimator.
If, of two consistent estimators T₁, T₂ of a certain parameter θ, we have V(T₁) < V(T₂) for all n, then T₁ is said to be more efficient than T₂.
If T₁ is the most efficient estimator with variance V₁ and T₂ is any other estimator with variance V₂, then the efficiency E of T₂ is defined as
$$E=\frac{V_1}{V_2},\qquad E\le 1.$$
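The sketch below (a simulation under an assumed normal population) estimates the efficiency of the sample median relative to the sample mean when estimating the centre of a normal distribution; the ratio V(mean)/V(median) comes out near 2/π, roughly 0.64, so the median is the less efficient of the two estimators here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 101, 20000                      # odd n so the median is a single observation

samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
var_mean = samples.mean(axis=1).var(ddof=1)
var_median = np.median(samples, axis=1).var(ddof=1)

efficiency = var_mean / var_median        # E = V1 / V2 with T1 = sample mean (most efficient)
print("var(mean):", round(var_mean, 5))
print("var(median):", round(var_median, 5))
print("efficiency of median ~", round(efficiency, 3), " (theory: 2/pi ~ 0.637)")
```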
Sufficiency:
An estimator T n is said to be sufficient if it contains all the information in the sample regarding the
population parameter θ .
If T = T(x₁, x₂, …, xₙ) is an estimator of a parameter θ, based on a sample x₁, x₂, …, xₙ of size n from the population with density f(x, θ), such that the conditional distribution of x₁, x₂, …, xₙ given T is independent of θ, then T is a sufficient estimator of θ.
Testing of Hypothesis
Hypothesis:
A statistical hypothesis is a statement or assertion about a population parameter (or about the form of the population distribution) which is to be verified on the basis of sample information.
Example :
1) A quality control manager is to determine whether a process is working properly.
2) A drug chemist is to decide whether a new drug is really effective in curing a disease.
3) A statistician has to decide whether a given coin is biased.
Simple Hypothesis:
If the statistical hypothesis specifies the population completely then it is termed as a simple
statistical hypothesis.
Example :
If x₁, x₂, …, xₙ is a random sample of size n from a normal population with mean μ and variance σ², then the hypothesis
$$H_0:\ \mu=\mu_0,\ \sigma^2=\sigma_0^2$$
is a simple hypothesis.
Composite Hypothesis:
A hypothesis which does not specify the population completely is termed a composite hypothesis.
For example, if x₁, x₂, …, xₙ is a random sample of size n from a normal population with mean μ and variance σ², then each of the following is a composite hypothesis:
I. H₀: μ = μ₀ (σ² unspecified)
II. H₀: σ² = σ₀² (μ unspecified)
III. μ < μ₀, σ² < σ₀²
IV. μ = μ₀, σ² < σ₀²
Test of Hypothesis:
A test of statistical hypothesis is a procedure or a rule for deciding whether to accept or reject
the hypothesis on the basis of sample values obtained.
For example, let x₁, x₂, …, xₙ be a random sample from N(μ, 4). Also let H: μ ≤ 15. One possible test is as follows: reject H if and only if the observed sample mean x̄ exceeds some specified constant.
A test is usually described in terms of some statistic T = T(x₁, …, xₙ) which reduces the experimental data. Such a statistic associated with the test is called a test statistic.
Null hypothesis:
A hypothesis which is tested under the assumption that it is true is called a null hypothesis. The null hypothesis asserts that there is no (significant) difference between the statistic and the population parameter, and that whatever difference is observed is merely due to fluctuations in sampling from the same population. It is denoted by H₀.
R.A. Fisher: "The null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true."
Symbolically, the above definition can be explained as follows. Let X₁, …, Xₙ be a random sample from a population with distribution function F_θ, θ ∈ Θ, where Θ is the parametric space.
Then the null hypothesis is
$$H_0:\ \theta\in\Theta_0\subset\Theta.$$
The null hypothesis is simple if Θ₀ is a singleton set; otherwise it is composite.
Alternative Hypothesis
Any hypothesis which contradicts the null hypothesis, i.e. the hypothesis we accept whenever we reject the null hypothesis, is called the alternative hypothesis and is denoted by H₁.
The two hypotheses H₀ and H₁ are such that if one is true, the other is false, and vice versa. Symbolically,
$$H_1:\ \theta\in\Theta-\Theta_0.$$
The alternative hypothesis is simple if Θ − Θ₀ is a singleton set; otherwise it is composite.
If we have to test whether the population mean μ has a specified value μ₀, then the null hypothesis is
$$H_0:\ \mu=\mu_0.$$
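To make the H₀/H₁ terminology concrete, the sketch below (a minimal one-sample z-test, with the data, μ₀, the known σ and α = 0.05 all hypothetical) tests H₀: μ = μ₀ against H₁: μ ≠ μ₀ for a normal population with known variance.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample and assumed known population standard deviation
data = np.array([51.2, 49.8, 50.6, 52.1, 48.9, 50.3, 51.7, 49.5])
mu0, sigma, alpha = 50.0, 1.5, 0.05

z = (data.mean() - mu0) / (sigma / np.sqrt(len(data)))   # test statistic under H0
p_value = 2 * (1 - norm.cdf(abs(z)))                     # two-sided P-value

print("z =", round(z, 3), " P-value =", round(p_value, 4))
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```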
Unit - 4
Non-Parametric Test-
Almost all the exact (small) sample tests of significance are based on the fundamental assumption
that-
1) The parent population is normal.
2) They are concerned with testing or estimating the means and variances of these populations.
The tests which deal with the parameters of the population are known as parametric tests; parametric techniques are not distribution-free. Thus, a parametric test is a test whose model specifies certain conditions about the parameters of the population from which the samples are drawn.
On the other hand, a non-parametric test is a test that does not depend on the particular form of the basic frequency function from which the samples are drawn, i.e. a non-parametric test does not make any assumption regarding the form of the population. This means the non-parametric techniques are distribution-free.
The hypotheses of a non-parametric test are concerned with something other than the value of a population parameter. A large number of these tests exist; a few of the better known and more widely used ones are listed below (a brief library-based sketch follows the list).
1) The sign test for paired data, where positive or negative signs are substituted for quantitative values.
2) A rank sum test, often called the Mann-Whitney U test, which can be used to determine whether two independent samples have been drawn from the same population. It uses more information than the sign test.
3) Another rank sum test, the Kruskal-Wallis test, which generalises the analysis of variance to enable us to dispense with the assumption that the populations are normally distributed.
4) The one-sample runs test, a method for determining the randomness with which sampled items have been selected.
5) Rank correlation, a method for doing correlation analysis when the data are not available in numerical form, but when there is information sufficient to rank the data first, second, and so on.
6) The Kolmogorov-Smirnov test, another method for determining the goodness of fit between an observed sample and a theoretical distribution.
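Several of the tests listed above are available in scipy.stats. The sketch below (illustrative calls on small made-up samples; interpretation still depends on the usual assumptions of each test) shows the Mann-Whitney U test, the Kruskal-Wallis test, Spearman's rank correlation and the Kolmogorov-Smirnov goodness-of-fit test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 30)          # made-up sample 1
y = rng.normal(0.5, 1.0, 30)          # made-up sample 2
z = rng.normal(1.0, 1.0, 30)          # made-up sample 3

print(stats.mannwhitneyu(x, y))                 # rank sum test for two independent samples
print(stats.kruskal(x, y, z))                   # rank-based analogue of one-way ANOVA
print(stats.spearmanr(x, y))                    # rank correlation
print(stats.kstest((x - x.mean()) / x.std(ddof=1), "norm"))  # goodness of fit to N(0, 1)
```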
Assumptions—
Certain assumptions associated with non-parametric tests are:
a) Sample observations are independent.
b) The variable under study is continuous.
c) The probability density function is continuous.
d) Lower order moments exist.
These assumptions are fewer and much weaker than those associated with parametric tests. However, by replacing values with ranks or signs we lose information: if we represent "189.42" by the rank "5", we lose the information contained in the value 189.42, since the value could become 1189.42 and still be the fifth largest value in the list. Consequently, non-parametric tests are often not as efficient or sharp as parametric tests.
Sign Test:
The sign test is specifically designed for testing hypotheses about the median of any continuous population. Like the mean, the median is a measure of the centre or location of a distribution; therefore the sign test is sometimes called a test for location.
Let x₁, x₂, …, xₙ be a random sample from a population with unknown median θ. Suppose we are required to test the hypothesis H₀: θ = θ₀ (some specified value) against the one-sided alternative H₁: θ < θ₀ or H₁: θ > θ₀, or the two-sided alternative H₁: θ ≠ θ₀.
If the sample comes from a distribution with median θ₀, then on average half of the observations will be greater than θ₀ and half will be smaller than θ₀.
The name of the test comes from the fact that it is based on the direction (the plus or minus signs) of the observations, not on their numerical magnitude.
Steps:
1) Replace each observation greater than θ₀ by a plus (+) sign and each observation smaller than θ₀ by a minus (−) sign. Sample values equal to θ₀ may be ignored.
2) Count the number of plus (+) signs and denote it by r, and the number of minus (−) signs by s, with r + s ≤ n.
The conditional distribution of r given r + s is binomial with p = P[X > θ₀]. The number r of plus signs may therefore be used to test H₀, which is equivalent to testing the hypothesis about the binomial parameter, i.e. H₀: p = 1/2.
Note: If the P-value ≤ α, then H₀ is rejected; otherwise H₀ is accepted.
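A minimal sketch of these steps (the sample and the hypothesised median θ₀ are made up) counts plus and minus signs and computes a two-sided binomial P-value for H₀: p = 1/2 directly from the binomial distribution.

```python
import numpy as np
from scipy.stats import binom

# Made-up sample and hypothesised median theta0
data = np.array([14.2, 15.8, 13.9, 16.4, 15.1, 14.8, 16.9, 15.3, 13.7, 16.1])
theta0 = 14.0
alpha = 0.05

diffs = data[data != theta0] - theta0   # drop observations equal to theta0
r = int(np.sum(diffs > 0))              # number of plus signs
s = int(np.sum(diffs < 0))              # number of minus signs
n = r + s

# Two-sided P-value for H0: p = 1/2 from the binomial(n, 1/2) distribution
p_value = min(1.0, 2 * binom.cdf(min(r, s), n, 0.5))

print("r =", r, " s =", s, " P-value =", round(p_value, 4))
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```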
Unit-5
Analysis of Variance:
Analysis of variance is one of the most powerful tools of statistical analysis and this technique is used
to test whether the difference between the means of three or more populations is significant or not.
The systematic procedure of this statistical technique was first developed by R.A. Fisher, and the F-distribution was named in his honour. Earlier, this technique was used in agricultural experiments; now analysis of variance is widely used in the natural, social and physical sciences.
Example:
1) We can examine, using the technique of analysis of variance, whether different varieties of seeds or fertilizers or soils differ significantly or not as regards average yields of land.
2) A manager of a firm may use this technique to know whether there is a significant difference in the average sale figures of different salesmen employed by the firm.
3) The difference in various types of drugs manufactured by a company to cure a particular
disease may be studied through this technique.
Thus, through the ANOVA technique one can, in general, investigate any number of factors which are hypothesized or said to influence the dependent variable. One may as well investigate the differences amongst various categories within each of these factors, which may have a large number of possible values.
The technique of analysis of variance splits up the total variance into various components. Usually the variance (or total variance) is split into two parts:
(a) Variance between the samples.
(b) Variance within the samples.
Assumptions in Analysis of Variance : The Analysis of Variance (ANOVA) is based on the following
assumptions :
1) The samples are independently or randomly drawn from the populations.
2) All the populations from which samples have been drawn are normally distributed.
3) The variances of all the populations are equal.
Technique of Analysis of Variance (ANOVA): The observations may be classified according to one factor (criterion) or two factors. The classifications according to one factor and two factors are respectively called one-way classification and two-way classification.
We have to make two estimates of the population variance, viz. one based on the between-sample variance and the other based on the within-sample variance. The two estimates of the population variance are then compared with the F-test.
One-way classification:
In one-way classification the observations are classified according to one factor.
Example:
The yields of several plots of land may be classified according to one or more types of fertilisers.
Following are the methods by which we can perform ANOVA (a worked sketch follows this list):
(a) Direct Method
(b) Shortcut Method
(c) Coding Method
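The sketch below (a direct-method computation on three made-up fertiliser groups) builds the between-sample and within-sample sums of squares, forms the F-ratio, and cross-checks it against scipy.stats.f_oneway.

```python
import numpy as np
from scipy.stats import f_oneway, f

# Made-up yields for three fertiliser types (one factor, three levels)
groups = [np.array([20.0, 21.5, 19.8, 22.1]),
          np.array([24.3, 23.1, 25.0, 24.8]),
          np.array([18.9, 19.5, 20.2, 18.4])]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between samples
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)              # within samples

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
F = ms_between / ms_within
p_value = 1 - f.cdf(F, k - 1, n_total - k)

print("F =", round(F, 3), " P-value =", round(p_value, 5))
print("scipy check:", f_oneway(*groups))
```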
Design of Experiments:-
Gamma Distribution:-
A continuous random variable X is said to follow the Gamma distribution with parameter λ, denoted γ(λ), if its probability density function is
$$f(x)=\begin{cases}\dfrac{e^{-x}\,x^{\lambda-1}}{\Gamma(\lambda)}, & \lambda>0,\ 0<x<\infty\\[2mm] 0, & \text{otherwise}\end{cases}$$
where $\Gamma(\lambda)=\int_0^\infty e^{-x}x^{\lambda-1}\,dx$ is the gamma function.
Moment Generating Function of the Gamma Distribution:
$$M_X(t)=E\!\left(e^{tX}\right)=\int_0^\infty e^{tx}\,\frac{e^{-x}x^{\lambda-1}}{\Gamma(\lambda)}\,dx=\frac{1}{\Gamma(\lambda)}\int_0^\infty e^{-x(1-t)}x^{\lambda-1}\,dx.$$
On putting x(1 − t) = y, i.e. x = y/(1 − t) and dx = dy/(1 − t) (for t < 1),
$$M_X(t)=\frac{1}{\Gamma(\lambda)}\int_0^\infty e^{-y}\left(\frac{y}{1-t}\right)^{\lambda-1}\frac{dy}{1-t}=\frac{1}{\Gamma(\lambda)(1-t)^{\lambda}}\int_0^\infty e^{-y}y^{\lambda-1}\,dy$$
$$\left[\because\ \int_0^\infty e^{-y}y^{\lambda-1}\,dy=\Gamma(\lambda)\right]$$
$$\therefore\ M_X(t)=\frac{\Gamma(\lambda)}{\Gamma(\lambda)(1-t)^{\lambda}}=(1-t)^{-\lambda},\qquad t<1.$$
Hence this is the expression for the moment generating function of the Gamma distribution.
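The closed form above can be checked numerically. The sketch below (with λ and t chosen arbitrarily, t < 1) integrates e^{tx} f(x) with scipy and compares the result with (1 − t)^{−λ}.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

lam, t = 2.5, 0.3                      # shape parameter and MGF argument (t < 1)

def integrand(x):
    # e^{tx} times the gamma(lam) density e^{-x} x^{lam-1} / Gamma(lam)
    return np.exp(t * x) * np.exp(-x) * x ** (lam - 1) / gamma_fn(lam)

numeric, _ = quad(integrand, 0, np.inf)
closed_form = (1 - t) ** (-lam)

print("numerical  M_X(t):", round(numeric, 6))
print("closed form (1-t)^(-lambda):", round(closed_form, 6))
```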
Moments: from the moment generating function, μ′₁ = λ (mean), and the central moments are μ₂ = λ, μ₃ = 2λ, μ₄ = 3λ(λ + 2).
Skewness:
$$\beta_1=\frac{\mu_3^2}{\mu_2^3}=\frac{4\lambda^2}{\lambda^3}=\frac{4}{\lambda}.$$
Kurtosis:
$$\beta_2=\frac{\mu_4}{\mu_2^2}=\frac{3\lambda(2+\lambda)}{\lambda^2}=\frac{3(2+\lambda)}{\lambda}.$$
Additive property of the Gamma Distribution:
Let Xᵢ (i = 1, 2, 3, …, k) be independent gamma variates with parameters λᵢ. We know that the moment generating function of Xᵢ is
$$M_{X_i}(t)=(1-t)^{-\lambda_i}.$$
Since the moment generating function of the sum (X₁ + X₂ + ⋯ + X_k) of independent variates is the product of their individual moment generating functions,
$$M_{X_1+X_2+\cdots+X_k}(t)=\prod_{i=1}^{k}(1-t)^{-\lambda_i}=(1-t)^{-(\lambda_1+\lambda_2+\cdots+\lambda_k)},$$
which is the moment generating function of a gamma variate with parameter λ₁ + λ₂ + ⋯ + λ_k. Therefore, the result follows by the uniqueness theorem of moment generating functions.
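A quick simulation of the additive property (shape parameters and sample size chosen arbitrarily, scale fixed at 1): sums of independent gamma variates are compared against a single gamma distribution with the summed parameter using a Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
shapes = [1.5, 2.0, 0.8]               # parameters lambda_1, lambda_2, lambda_3
size = 50000

# Sum of independent gamma variates with unit scale
total = sum(rng.gamma(shape=lam, scale=1.0, size=size) for lam in shapes)

# Compare with a gamma distribution whose parameter is the sum of the lambdas
ks = stats.kstest(total, "gamma", args=(sum(shapes),))
print("sum of shapes:", sum(shapes))
print("sample mean / var:", round(total.mean(), 3), round(total.var(), 3))  # both ~ sum of lambdas
print(ks)   # a large p-value is consistent with the additive property
```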
Beta Distribution:-
There are two types of Beta distribution:
I. Beta distribution of the first kind
II. Beta distribution of the second kind
Beta Distribution of the First Kind:-
The continuous random variable X which is distributed according to the probability law
$$f(x)=\begin{cases}\dfrac{1}{B(\mu,v)}\,x^{\mu-1}(1-x)^{v-1}, & (\mu,v)>0,\ 0<x<1\\[2mm] 0, & \text{otherwise}\end{cases}$$
is said to follow the Beta distribution of the first kind, where
$$B(m,n)=\int_0^1 x^{m-1}(1-x)^{n-1}\,dx=\frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m+n)}$$
is the beta function.
The r-th moment about the origin is
$$\mu'_r=\int_0^1 x^r\,\frac{1}{B(\mu,v)}\,x^{\mu-1}(1-x)^{v-1}\,dx=\frac{1}{B(\mu,v)}\int_0^1 x^{\mu+r-1}(1-x)^{v-1}\,dx$$
$$\left[\because\ B(m,n)=\int_0^1 x^{m-1}(1-x)^{n-1}\,dx=\frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m+n)}\right]$$
$$\mu'_r=\frac{B(\mu+r,v)}{B(\mu,v)}=\frac{\Gamma(\mu+r)\,\Gamma(v)}{\Gamma(\mu+r+v)}\cdot\frac{\Gamma(\mu+v)}{\Gamma(\mu)\,\Gamma(v)}=\frac{\Gamma(\mu+v)\,\Gamma(\mu+r)}{\Gamma(\mu)\,\Gamma(\mu+r+v)}.$$
In what follows we use the recurrence relation Γ(n + 1) = n Γ(n).
Now, mean:
$$\mu'_1=\frac{\Gamma(\mu+v)\,\Gamma(\mu+1)}{\Gamma(\mu)\,\Gamma(\mu+v+1)}=\frac{\Gamma(\mu+v)\cdot\mu\,\Gamma(\mu)}{\Gamma(\mu)\cdot(\mu+v)\,\Gamma(\mu+v)}$$
$$\mu'_1=\frac{\mu}{\mu+v}\quad(\text{mean}).$$
Similarly,
$$\mu'_2=\frac{\Gamma(\mu+v)\,\Gamma(\mu+2)}{\Gamma(\mu)\,\Gamma(\mu+v+2)}=\frac{\mu(\mu+1)\,\Gamma(\mu)\,\Gamma(\mu+v)}{(\mu+v+1)(\mu+v)\,\Gamma(\mu+v)\,\Gamma(\mu)}$$
$$\mu'_2=\frac{\mu(\mu+1)}{(\mu+v+1)(\mu+v)}.$$
Since the variance is μ₂ = μ′₂ − (μ′₁)²,
$$\mu_2=\frac{\mu(\mu+1)}{(\mu+v+1)(\mu+v)}-\left(\frac{\mu}{\mu+v}\right)^2=\frac{\mu(\mu+1)}{(\mu+v+1)(\mu+v)}-\frac{\mu^2}{(\mu+v)^2}$$
$$\mu_2=\frac{\mu\left[(\mu+1)(\mu+v)-\mu(\mu+v+1)\right]}{(\mu+v)^2(\mu+v+1)}=\frac{\mu\left[\mu^2+\mu v+\mu+v-\mu^2-\mu v-\mu\right]}{(\mu+v)^2(\mu+v+1)}$$
$$\mu_2=\frac{\mu v}{(\mu+v)^2(\mu+v+1)}\quad(\text{variance}).$$
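The mean and variance just derived can be verified against scipy (the parameter values below are arbitrary; scipy's beta distribution uses the same first-kind density with shape parameters a = μ, b = v).

```python
from scipy.stats import beta

mu, v = 3.0, 5.0                       # arbitrary shape parameters

formula_mean = mu / (mu + v)
formula_var = mu * v / ((mu + v) ** 2 * (mu + v + 1))

dist = beta(mu, v)                     # Beta distribution of the first kind
print("mean:     formula =", round(formula_mean, 6), " scipy =", round(dist.mean(), 6))
print("variance: formula =", round(formula_var, 6), " scipy =", round(dist.var(), 6))
```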
Beta Distribution of the Second Kind:-
The continuous random variable X which is distributed according to the probability law
$$f(x)=\begin{cases}\dfrac{1}{B(\mu,v)}\,\dfrac{x^{\mu-1}}{(1+x)^{\mu+v}}, & (\mu,v)>0,\ 0<x<\infty\\[2mm] 0, & \text{otherwise}\end{cases}$$
is said to follow the Beta distribution of the second kind.
The r-th moment about the origin is
$$\mu'_r=\int_0^\infty x^r\,\frac{1}{B(\mu,v)}\,\frac{x^{\mu-1}}{(1+x)^{\mu+v}}\,dx=\frac{1}{B(\mu,v)}\int_0^\infty\frac{x^{\mu+r-1}}{(1+x)^{(\mu+r)+(v-r)}}\,dx$$
$$\left[\because\ B(m,n)=\int_0^\infty\frac{x^{m-1}}{(1+x)^{m+n}}\,dx\right]$$
$$\mu'_r=\frac{1}{B(\mu,v)}\,B(\mu+r,\,v-r)=\frac{\Gamma(\mu+r)\,\Gamma(v-r)}{\Gamma(\mu+v)}\cdot\frac{\Gamma(\mu+v)}{\Gamma(\mu)\,\Gamma(v)}$$
$$\mu'_r=\frac{\Gamma(\mu+r)\,\Gamma(v-r)}{\Gamma(\mu)\,\Gamma(v)},\qquad r<v.\qquad(3)$$
In particular, when r = 1, from eqn (3):
Mean,
$$\mu'_1=\frac{\Gamma(\mu+1)\,\Gamma(v-1)}{\Gamma(\mu)\,\Gamma(v)}=\frac{\mu\,\Gamma(\mu)\,\Gamma(v-1)}{\Gamma(\mu)\,(v-1)\,\Gamma(v-1)}$$
$$\mu'_1=\frac{\mu}{v-1}\quad(\text{mean}).$$
Similarly, for r = 2,
$$\mu'_2=\frac{\Gamma(\mu+2)\,\Gamma(v-2)}{\Gamma(\mu)\,\Gamma(v)}=\frac{(\mu+1)\,\mu\,\Gamma(\mu)\,\Gamma(v-2)}{\Gamma(\mu)\,(v-1)(v-2)\,\Gamma(v-2)}$$
$$\mu'_2=\frac{\mu(\mu+1)}{(v-1)(v-2)}.$$
and variance
$$\mu_2=\mu'_2-(\mu'_1)^2=\frac{\mu(\mu+1)}{(v-1)(v-2)}-\frac{\mu^2}{(v-1)^2}$$
$$\mu_2=\frac{\mu}{v-1}\left[\frac{\mu+1}{v-2}-\frac{\mu}{v-1}\right]=\frac{\mu}{v-1}\left[\frac{(\mu+1)(v-1)-\mu(v-2)}{(v-1)(v-2)}\right]$$
$$\mu_2=\frac{\mu}{v-1}\left[\frac{\mu v-\mu+v-1-\mu v+2\mu}{(v-1)(v-2)}\right]=\frac{\mu}{v-1}\cdot\frac{\mu+v-1}{(v-1)(v-2)}$$
$$\mu_2=\frac{\mu(\mu+v-1)}{(v-1)^2(v-2)}\quad(\text{variance}),\qquad v>2.$$
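Similarly, the Beta distribution of the second kind corresponds to scipy's betaprime distribution, so the formulas μ/(v − 1) and μ(μ + v − 1)/[(v − 1)²(v − 2)] can be checked directly (the shape parameters below are arbitrary, with v > 2 so that the variance exists).

```python
from scipy.stats import betaprime

mu, v = 3.0, 6.0                       # arbitrary shape parameters, v > 2

formula_mean = mu / (v - 1)
formula_var = mu * (mu + v - 1) / ((v - 1) ** 2 * (v - 2))

dist = betaprime(mu, v)                # Beta distribution of the second kind
print("mean:     formula =", round(formula_mean, 6), " scipy =", round(dist.mean(), 6))
print("variance: formula =", round(formula_var, 6), " scipy =", round(dist.var(), 6))
```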
Exam Plan
Unit-I
1) Estimation Theory
a) Point estimation
b) Interval estimation
2) Parametric Space
3) Consistency and sufficient condition for consistency
4) Unbiasedness
5) Efficiency and most efficient estimator
6) Sufficiency and Properties of sufficient estimators.
7) Properties of Maximum Likelihood Estimators(M.L.E)
8) Method of Minimum Variance(M.M.V)
9) Cramér-Rao Inequality and conditions for the equality sign in the C.R. Inequality
Unit-II
11) Hypothesis and Example of it.
12) Simple Hypothesis and Example of it.
13) Composite Hypothesis
14) Test of Hypothesis
15) Null Hypothesis
16) Alternative Hypothesis
17) Two-types of errors in sampling
a) Type I errors
b) Type II errors
18) Probability forms
19) Critical Region
20) Critical Value
21) Best Critical Region
22) One Sided Test, Left Sided Test & Right Sided Test
Exam Question
1) Explain the method of maximum likelihood. State the properties of the maximum likelihood
estimator.
2) State and prove the Cramér-Rao Inequality. Let x₁, x₂, …, xₙ be a random sample from N(μ, σ²) where σ² is known. Obtain the M.V.U.E. for μ.
3) Explain with suitable illustrative examples, the problem of estimation, and the criteria of
unbiasedness, consistency, efficiency and sufficiency for an estimator.
4) Find the maximum likelihood estimators of θ for a random sample x₁, x₂, …, xₙ from a distribution:
a) with probability density function $f(x)=\dfrac{1}{\theta}\,e^{-x/\theta};\ 0<x<\infty$
b) with probability mass function $p(x)=\dfrac{e^{-\theta}\theta^{x}}{x!};\ x=0,1,2,\ldots$
5) Explain the problem of estimation in statistical theory, with a suitable example.
If x₁, x₂, x₃, …, xₙ are independent random observations from a normal population with mean μ and variance σ², where σ² is finite, and
$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad s^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$