Engineering Mathematics - IV (15MAT41) Module-V: SAMPLING THEORY and Stochastic Process
By
Dr. K.S.BASAVARAJAPPA
Professor and Head,
Department of Mathematics,
Bapuji Institute of Engineering and Technology,
Davangere-4, Karnataka
E-mail: ksbraju@hotmail.com
Module-V: Sampling Theory and Stochastic Process
Sampling
Sampling distribution
Standard error
Test of Hypothesis for means and proportions
Confidence limits for means
Student’s t – distribution
Chi - square distribution as a test of goodness of fit
Stochastic processes
Probability vector
Stochastic matrices, Fixed points
Regular stochastic matrices
Markov chains
Higher transition probability
Simple problems
Sampling Theory
Sampling theory is the field of statistics that deals with the collection, analysis and
interpretation of data gathered from random samples of a population under study. The application
of sampling theory is concerned with the proper selection of observations from the population that
will constitute the random sample.
Sampling aims at gathering maximum information about the population with the minimum effort,
cost and time. Sampling theory also determines the reliability of such estimates. The logic of sampling
theory is the logic of induction, in which we pass from the particular (sample) to the general (population).
Such a generalization from sample to population is called Statistical Inference.
It also involves the use of probability theory, along with prior knowledge about the population
parameters, to analyse the data from the random sample and develop conclusions from the
analysis. The normal distribution, along with related probability distributions, is most heavily
utilized in developing the theoretical background for sampling theory.
Population Size (N)
It is the total number of members of the population, denoted by 'N'.
Sample Size (n)
It is the number included in the sample, denoted by 'n'.
Main objectives of sampling
To obtain the maximum information about the population with the minimum effort
To state the limits of accuracy of estimates based on samples
Systematic Sampling
It involves the selection of sample units at equal intervals after all the units in the population have
been arranged in some order
Multistage Sampling
It refers to a sampling procedure which is carried out in several stages
There is often considerable interest in whether the sampling distribution can be approximated
by an asymptotic distribution, which corresponds to the limiting case either as the number of
random samples of finite size, taken from an infinite population and used to produce the
distribution, tends to infinity, or when just one equally-infinite-size "sample" is taken of that same
population.
Sampling Distribution
It is the probability law which the statistic follows if repeated random samples of a fixed size
are drawn from a specified population
The probability distribution of the sample mean, computed over repeated samples, is called the
sampling distribution of the sample mean.
Two important sampling distributions (large samples)
Sampling distribution of the sample mean (x̄)
If x̄ denotes the mean of a random sample of size n drawn from a population with mean µ and
standard deviation σ, then the sampling distribution of x̄ is approximately a normal distribution
with
Mean = µ and standard deviation (the standard error of x̄) = σ/√n
Consider all possible samples of size 'n' which can be drawn from a given population at
random. For each sample, we can compute the mean. The means of the samples will not be identical. If we
group these different means according to their frequencies, the frequency distribution so formed is known
as the sampling distribution of the mean. Similarly, we have the sampling distribution of the standard
deviation, etc.
STANDARD ERROR
The standard deviation of the sampling distribution of a statistic is called its standard error. Thus the
standard deviation of the sampling distribution of means is called the standard error of means. The
reciprocal of the standard error is called precision.
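As a quick illustration of the relation standard error = σ/√n, the following Python sketch (a normal population and the values of µ, σ and n are assumed purely for illustration; numpy is assumed available) draws many repeated samples, computes their means, and compares the spread of those means with σ/√n:

import numpy as np

# Minimal simulation: draw many random samples of size n, compute each sample
# mean, and compare the S.D. of those means with the theoretical sigma/sqrt(n).
rng = np.random.default_rng(0)

population_mean, population_sd = 50.0, 10.0   # assumed population parameters
n, num_samples = 36, 20000                    # sample size and number of repeated samples

sample_means = rng.normal(population_mean, population_sd, size=(num_samples, n)).mean(axis=1)

empirical_se = sample_means.std(ddof=0)       # standard error estimated from the simulation
theoretical_se = population_sd / np.sqrt(n)   # sigma / sqrt(n)

print(empirical_se, theoretical_se)           # both are close to 10/6 = 1.667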
TESTING HYPOTHESIS
To reach decisions about populations on the basis of sample information, we make certain
assumptions about the populations involved. Such assumptions, which may or may not be true, are
called statistical hypotheses. By testing a hypothesis is meant a process for deciding whether to
accept or reject the hypothesis.
The method consists in assuming the hypothesis to be correct and then computing the probability of
getting the observed sample. If this probability is less than a certain pre-assigned value, the
hypothesis is rejected.
The American statistician J. Neyman (1894-1981) and the English statistician E. S. Pearson (1895-1980),
son of Karl Pearson, developed a systematic theory of tests around 1930.
ERRORS
If a hypothesis is rejected when it should have been accepted, we say that a Type I error has been
made. On the other hand, if a hypothesis is accepted when it should have been rejected, we
say that a Type II error has been made. The only way to reduce both types of errors is to increase the
sample size, if possible.
NULL HYPOTHESIS
The hypothesis formulated for the sake of rejecting it, under the assumption that it is true, is called
the null hypothesis and is denoted by H0. To test whether one procedure is better than another, we
assume that there is no difference between the procedures. Similarly, to test whether there is a
relationship between two variates, we take the hypothesis that there is no relationship.
By accepting a null hypothesis, we mean that, on the basis of the statistic calculated from the
sample, we do not reject the hypothesis. It does not, however, imply that the hypothesis is proved
to be true. Nor does its rejection imply that it is disproved.
LEVEL OF SIGNIFICANCE
The probability level below which we reject the hypothesis is known as the level of significance.
The region such that, if the sample value falls in it, the hypothesis is rejected, is known as the
critical region. We generally take two critical regions which cover 5% and 1% of the area of the normal curve.
The shaded portion in the figure corresponds to the 5% level of significance. Thus the probability of
the value of the variate falling in the critical region is the level of significance.
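The commonly used two-tailed critical values 1.96 (5% level) and 2.58 (1% level) of the standard normal variate can be checked with a short Python snippet (scipy is assumed to be available):

from scipy.stats import norm

# Two-tailed critical values of the standard normal variate for the
# 5% and 1% levels of significance.
for alpha in (0.05, 0.01):
    z_crit = norm.ppf(1 - alpha / 2)     # upper critical point, e.g. 1.96 for 5%
    print(f"alpha = {alpha}: |Z| must exceed {z_crit:.2f} to be significant")
# prints 1.96 for the 5% level and 2.58 for the 1% level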
TEST OF SIGNIFICANCE
The procedure which enables us to decide whether to accept or reject the hypothesis is called the
test of significance. The test examines whether the differences between the sample values and
the population values are so large that they signify evidence against the hypothesis, or whether these
differences are small enough to be accounted for by fluctuations of sampling.
CONFIDENCE LIMITS
For a large sample of size n with mean x̄ drawn from a population with standard deviation σ, the 95% confidence limits for the population mean are x̄ ± 1.96 σ/√n, and the 99% confidence limits are x̄ ± 2.58 σ/√n.
Example:
A sample of 900 items has mean 3.4 and S.D 2.61. Can it be regarded as drawn from a population
with mean 3.25 at 5% level of significance?
Solution:
Given n = 900, sample mean x̄ = 3.4, population mean µ = 3.25, S.D. σ = 2.61. Then the
standard normal variate (SNV) is
Z = (x̄ - µ)√n / σ
= (3.4 - 3.25)(√900)/2.61 = 1.73
| Z | = 1.73 < 1.96
Conclusion: The given sample can be regarded as one drawn from the population with mean 3.25
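A minimal Python check of this large-sample Z-test (the small difference from 1.73 is only rounding):

import math

# Large-sample Z-test for a mean, reproducing the worked example above.
n, sample_mean, pop_mean, sd = 900, 3.4, 3.25, 2.61

z = (sample_mean - pop_mean) * math.sqrt(n) / sd
print(round(z, 2))        # about 1.72 (the notes round to 1.73), below the 5% critical value 1.96
print(abs(z) < 1.96)      # True: accept that the sample comes from a population with mean 3.25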
Example:
A sugar factory is expected to sell sugar in 100 kg bags. A sample of 144 bags gives the mean as 99
kg and the S.D. as 4 kg. Can we conclude that the factory is working as per standards?
Solution:
Given n = 144, Sample Mean = 99,
Population mean µ = 100, S.D. σ = 4. Then the standard normal variate (SNV) is
Z = (x̄ - µ)√n / σ
= (99 - 100)(√144)/4 = -3
I Z I = 3 > 1.96
Conclusion: Factory is not working as per standards
EXAMPLE PROBLEMS
A coin is tossed 400 times and it turns up head 216 times. Discuss whether the coin may be an
unbiased one.
A die is tossed 960 times and it falls with 5 upwards 184 times. Is the die biased?
Balls are drawn from a bag containing an equal number of black and white balls, each ball being
replaced before drawing another. In 2250 drawings, 1018 black and 1232 white balls have been
drawn. Do you suspect some bias on the part of the drawer?
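A minimal Python sketch of the test of a proportion, applied to the first problem above (coin tossed 400 times, 216 heads), where the observed number of heads is compared with np using the standard error √(npq):

import math

# Z-test for a proportion: coin tossed 400 times, 216 heads, testing p = 1/2.
n, successes, p = 400, 216, 0.5

expected = n * p
z = (successes - expected) / math.sqrt(n * p * (1 - p))   # (observed - expected) / sqrt(npq)
print(z)                  # 1.6
print(abs(z) < 1.96)      # True: no evidence at the 5% level that the coin is biased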
SAMPLING VARIABLES
Let us consider sampling of a variable such as weight, height, etc. Each member of the population
gives a value of the variable, and the population is a frequency distribution of the variable. Thus a
random sample of size 'n' from the population is the same as selecting 'n' values of the variable from
those of the distribution.
Properties of t - Distribution
This curve is symmetrical about the line t = 0, like the normal curve, since only even powers of t
appear in its equation. But it is more peaked than the normal curve with the same S.D. The t-curve
approaches the horizontal axis less rapidly than the normal curve. Also, the t-curve attains its
maximum value at t = 0, so that its mode coincides with the mean.
Fig - 2
SIGNIFICANCE TEST OF A SAMPLE MEAN
Compute t = (x̄ - µ)√n / s, where s² = ∑(x - x̄)²/(n - 1), and find the table value of t for the given df (degrees of freedom).
If the calculated value of t > t_0.05, then the difference between x̄ and µ is said to be
significant at the 5% level of significance.
If the calculated value of t > t_0.01, then the difference is said to be significant at the 1% level of
significance.
If the calculated value of t < t_0.05, then the difference is said to be consistent with the hypothesis
that µ is the mean of the population.
Example:
The nine items of a sample have the values 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of these values differ significantly from the assumed population mean 47.5?
x       d = x – 48    d²
45         -3          9
47         -1          1
50          2          4
52          4         16
48          0          0
47         -1          1
49          1          1
53          5         25
51          3          9
Total      10         66
Taking A = 48 and n = 9, we find the mean and S.D. as
Mean of the sample x̄ = A + (∑d)/n
= 48 + 10/9 = 49.1
Variance σ² = (∑d²)/n - ((∑d)/n)²
= 66/9 - (10/9)² = 494/81 = 6.1
Then S.D. σ = 2.47
For df = 9 - 1 = 8, the table value t_0.05 = 2.31.
Therefore
t = (x̄ - µ)√(n - 1) / σ
= (49.1 – 47.5)(√8)/2.47 = 1.83
| t | = 1.83
t (calculated value) = 1.83 < t (table value) = 2.31
Conclusion:
Since the calculated value of t is less than the table value, the value of t is not significant at the 5% level of significance.
The test provides no evidence against the population mean being 47.5; the sample is consistent with µ = 47.5.
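A short Python check of this example from the raw data, using the same textbook form of t (the small difference from 1.83 comes only from rounding x̄ and σ in the hand computation):

import math

# One-sample t computation for the data above, with mu = 47.5.
# t = (xbar - mu) * sqrt(n - 1) / sigma, where sigma uses divisor n;
# this equals (xbar - mu) * sqrt(n) / s with s computed with divisor n - 1.
x = [45, 47, 50, 52, 48, 47, 49, 53, 51]
mu = 47.5
n = len(x)

xbar = sum(x) / n
sigma = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)     # divisor n

t = (xbar - mu) * math.sqrt(n - 1) / sigma
print(round(xbar, 2), round(sigma, 2), round(t, 2))          # about 49.11, 2.47, 1.85 (< 2.31)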
Example:
A certain stimulus administered to each of 12 patients gives the following increases in blood pressure,
x: 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6. Can it be concluded that the stimulus will in general be accompanied
by an increase in blood pressure?
Solution:
Given n = 12, population mean µ = 0,
Sample mean x̄ = (5 + 2 + 8 - 1 + 3 + 0 - 2 + 1 + 5 + 0 + 4 + 6)/12
= 2.583,
S.D. σ = √(∑(x – x̄)²/n) = 2.9571
Then t = (x̄ - µ)√(n - 1)/σ = (2.583)(√11)/2.9571 = 2.9
For df = 11, the table value t_0.05 = 2.201. Since 2.9 > 2.201, the value of t is significant at the 5% level of significance, so it can be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure.
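A quick cross-check with SciPy's one-sample t-test (scipy assumed available); ttest_1samp uses s with divisor n - 1, which gives the same value of t as the form used above:

from scipy import stats

# One-sample t-test for the blood-pressure data against a population mean of 0.
x = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]

t_stat, p_value = stats.ttest_1samp(x, popmean=0)
print(round(t_stat, 2))        # about 2.9
print(p_value / 2 < 0.05)      # True: halving the two-sided p for the one-sided test of an increase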
Example:
Eleven students were given a test, and after a period of extra coaching they were given a second test of equal difficulty. Do the marks indicate that the students have benefitted by the extra coaching?
Solution: We compute the mean and the standard deviation of the difference between the marks of
the two tests as under:
Let d = marks in test 2 – marks in test 1 (the increase in marks), then
Mean of d = (∑d)/n = 11/11 = 1
Then the variance s² = ∑(d - d̄)²/(n - 1)
= 50/10 = 5,
then the S.D. s = 2.24
Assuming that students have not been benefitted by extra coaching, it implies that the mean of the
difference between the marks of the two tests is zero ( µ = 0 )
Then by Student's t-test
t = ((mean of d - µ)/s) √n
= (1 - 0) (√11)/2.24
= 1.48 < t_0.05 = 2.228 for df = 10
Conclusion:
Here t = 1.48 < t_0.05 = 2.228, so the value of t is not significant at the 5% level of significance.
The test provides no evidence that the students have benefitted by extra coaching.
Example:
A machinist is making engine parts with an axle diameter of 0.7 inches. A random sample of 10 parts
shows a mean diameter of 0.742 inches and an S.D. of 0.04 inches. On the basis of this sample, would you say
that the work is inferior?
Solution:
Given n = 10,
Population mean µ = 0.7,
Sample mean x̄ = 0.742,
S.D. σ = 0.04
Then t = (x̄ - µ)√(n - 1)/σ = (0.742 - 0.7)(√9)/0.04 = 3.15
For df = 9, the table value t_0.05 = 2.262. Since 3.15 > 2.262, the difference is significant at the 5% level, i.e. the sample mean differs significantly from the specified diameter; hence the work is inferior.
Goodness of Fit
The value of chi-square is used to test whether the deviations of the observed frequencies from the
expected frequencies are significant or not. It is also used to test how well a set of observations fits
a given distribution.
Chi-square therefore provides a test of goodness of fit and may be used to examine the validity of
some hypothesis about an observed frequency distribution.
For the calculated value of χ² = ∑(O - E)²/E (O = observed frequency, E = expected frequency) and the
given df, we can find the corresponding probability p from the chi-square table.
If p < 0.05, the observed value of chi-square is significant at the 5% level of significance.
If p < 0.01, the value is significant at the 1% level.
If p > 0.05, it is a good fit and the value is not significant.
Example: In experiments on pea breeding, the following frequencies of seeds were obtained.
Theory predicts that the frequencies should be in proportions 9:3:3:1. Examine the correspondence
between theory and experiment.
Problem:
A die is thrown 264 times and the number x appearing on the face has the following frequency distribution:
x : 1   2   3   4   5   6
f : 40  32  28  58  54  60
Calculate χ² and test whether the die is unbiased.
Solution: If the die is unbiased, the expected frequency for each face is 264/6 = 44.
Observed frequency (O) : 40  32  28  58  54  60
Expected frequency (E) : 44  44  44  44  44  44
χ² = ∑(O - E)²/E = (16 + 144 + 256 + 196 + 100 + 256)/44 = 968/44 = 22
The table value of chi-square for 5 df at the 5% level is 11.07. Since the calculated value 22 > 11.07, the
hypothesis that the die is unbiased is rejected.
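The same goodness-of-fit computation as a short Python sketch:

# Chi-square goodness-of-fit computation for the die problem above.
observed = [40, 32, 28, 58, 54, 60]
expected = [44] * 6          # 264/6 = 44 under the hypothesis of an unbiased die

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)                  # 22.0
print(chi2 > 11.07)          # True: reject the hypothesis that the die is unbiased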
Problem:
Fit a Poisson distribution to the following data and test for its goodness of fit at level of
significance 0.05:
x : 0  1  2  3  4
Problem:
Fit a Poisson distribution to the following data and test for its goodness of fit, given that
χ²_0.05 = 9.49 for df = 4:
x         : 0    1   2   3  4
frequency : 122  60  15  2  1
Solution: The mean of the data is (0·122 + 1·60 + 2·15 + 3·2 + 4·1)/200 = 100/200 = 0.5, so the
expected Poisson frequencies 200·e^(-0.5)(0.5)^x / x! are, to the nearest integer, 121, 61, 15, 3, 0.
Observed frequency (O) : 122  60  15  2  1
Expected frequency (E) : 121  61  15  3  0
Pooling the last two classes (so that no expected frequency is zero),
χ² = ∑(O - E)²/E = (1)²/121 + (1)²/61 + 0 + 0 ≈ 0.025
Since 0.025 is far below the table value 9.49, the Poisson distribution is a very good fit to the data.
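A Python sketch of the same fitting-and-testing procedure; the pooling of the sparse tail classes mirrors the hand computation above:

import math

# Fit a Poisson distribution by the sample mean and compute chi-square.
observed = [122, 60, 15, 2, 1]
N = sum(observed)                                        # 200
mean = sum(x * f for x, f in enumerate(observed)) / N    # 0.5

expected = [N * math.exp(-mean) * mean**x / math.factorial(x) for x in range(len(observed))]

# Pool the last two classes so that no expected count is (nearly) zero.
obs_pooled = observed[:3] + [observed[3] + observed[4]]
exp_pooled = expected[:3] + [expected[3] + expected[4]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
print(round(chi2, 3))     # a very small value, far below the table value 9.49: a good fit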
Example:
A set of 5 similar coins is tossed 320 times and the following results are obtained:
No. of heads (x) : 0   1   2    3    4   5
Frequency (f)    : 6   27  72   112  71  32
Test the hypothesis that "the data follow the binomial distribution" using the chi-square test.
Solution: The probability of getting a head with a single coin is p(H) = 1/2 = p (say); then the
probability of not getting a head is q = 1 - p = 1/2.
The expected frequencies for 0, 1, 2, 3, 4, 5 heads are the successive terms of the binomial expansion
N(q + p)^5 = N [ q^5 + 5q^4 p + 10q^3 p^2 + 10q^2 p^3 + 5q p^4 + p^5 ],
where N = ∑f = 6 + 27 + 72 + 112 + 71 + 32 = 320.
With p = q = 1/2 this gives the expected frequencies (E) = 10, 50, 100, 100, 50, 10.
Then by the chi-square test,
χ² = ∑(O - E)²/E = (6 - 10)²/10 + (27 - 50)²/50 + (72 - 100)²/100 + (112 - 100)²/100 + (71 - 50)²/50 + (32 - 10)²/10 = 78.68
The table value of chi-square for 5 df at the 5% level is 11.07. Since 78.68 > 11.07, the hypothesis is rejected, i.e. the data do not follow the binomial distribution.
Example:
Genetic theory states that children having one parent of blood type A and the other of type B will
always be of one of the three types A, AB, B, and that the proportions of the three types will on
average be 1 : 2 : 1. A report states that out of 300 children having one A parent and one B parent,
30% were found to be of type A, 45% of type AB and the remainder of type B. Test the
hypothesis by the χ² test that "the observed results support the genetic theory" (the table value of
χ² at 0.05 is 5.991 for 2 df).
Solution:
The observed frequencies of the given types are:
Type A  : 30% of 300 children = 30 × 300/100 = 90
Type AB : 45% of 300 children = 45 × 300/100 = 135
Type B  : 25% of 300 children = 25 × 300/100 = 75
Then we take the observed frequencies (O) as O : 90, 135, 75.
On the genetic theory the three types occur in the ratio 1 : 2 : 1, so the expected frequencies (E) are E : 75, 150, 75.
χ² = ∑(O - E)²/E = (90 - 75)²/75 + (135 - 150)²/150 + (75 - 75)²/75 = 3 + 1.5 + 0 = 4.5
Since 4.5 < 5.991 (the table value of χ² at 0.05 for 2 df), the value of χ² is not significant; the observed results support the genetic theory.
Stochastic process
Probability vector
Stochastic matrices, Fixed points
Regular stochastic matrices
Markov chains
Higher transition probability
Simple problems
Stochastic process :
In probability theory and related fields, a stochastic or random process is a mathematical object
usually defined as a collection of random variables. Historically, the random variables were
associated with or indexed by a set of numbers, usually viewed as points in time, giving the
interpretation of a stochastic process representing numerical values of some system randomly
changing over time.
Stochastic processes are widely used as mathematical models of systems and phenomena
that appear to vary in a random manner, in sciences such as
Biology
Chemistry
Ecology
Neuroscience
Physics
and in technology and engineering fields such as
Image processing
Signal processing
Information theory
Computer science
Cryptography
Telecommunications
Furthermore, seemingly random changes in financial markets have motivated the extensive
use of stochastic processes in finance.
Introduction:
A stochastic or random process can be defined as a collection of random variables that is indexed
by some mathematical set, meaning that each random variable of the stochastic process is uniquely
associated with an element in the set.
The set used to index the random variables is called the index set.
Historically, the index set was some subset of the real line, such as the natural numbers, giving the
index set the interpretation of time.
Each random variable in the collection takes values from the same mathematical space known as
the state space.
A probability vector is a convenient, compact notation for describing the distribution of a discrete
random variable.
A probability vector, each of whose components is non-negative, is denoted by
V = ( V1, V2, ..., Vn )
with the sum of its components equal to unity:
∑ Vi = 1, i = 1 to n, where Vi ≥ 0.
Stochastic Matrix
A square matrix P = [ Pij ] in which every row is a probability vector is called a stochastic
matrix,
or
P = [ Pij ] is a square matrix with each row being a probability vector
Regular stochastic matrix: a stochastic matrix P is said to be regular if all the entries of some power of P are positive.
Example: consider the stochastic matrix
A = [ 0     1  ]
    [ 1/2  1/2 ]
Then
A² = [ 0     1  ] [ 0     1  ] = [ 1/2  1/2 ]
     [ 1/2  1/2 ] [ 1/2  1/2 ]   [ 1/4  3/4 ]
Every entry of A² is positive, and each row of A² is again a probability vector:
row I gives 1/2 + 1/2 = 1 and
row II gives 1/4 + 3/4 = 1
Therefore A is a regular stochastic matrix
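A numpy sketch checking the regularity of A and approximating its fixed probability vector (the vector v with vA = v, here (1/3, 2/3)); numpy is assumed available:

import numpy as np

# A has a zero entry, but every entry of A**2 is positive, so A is regular.
# Its unique fixed probability vector is approximated by rows of high powers of A.
A = np.array([[0.0, 1.0],
              [0.5, 0.5]])

print((np.linalg.matrix_power(A, 2) > 0).all())   # True: A is a regular stochastic matrix

v = np.linalg.matrix_power(A, 50)[0]              # any row of A^n tends to the fixed point
print(v)                                          # approximately [1/3, 2/3]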
Example: physical systems
If the state space contains the masses, velocities and accelerations of particles subject to Newton's
laws of mechanics, the system is Markovian (but not random!).
Example:
speech recognition. Context can be important for identifying words. Context can be modelled as a
probability distribution for the next word given the most recent k words. This can be written as a
Markov chain whose state is a vector of k consecutive words.
Epidemics.
Suppose each infected individual has some chance of contacting each susceptible individual in
each time interval, before becoming removed (recovered or hospitalized). Then, the number of
infected and susceptible individuals may be modelled as a Markov chain.
Transition states
A state i is said to be a recurrent state if, given that the system is in this state at some step, it is
certain to return to that state at some later step.
The entry of the n-step transition probability matrix of the Markov chain is the probability that the
system changes from the state Ai to the state Aj in n steps.
Define Pij = P(Xn+1 = j | Xn = i). Let P = [Pij] denote the (possibly infinite) transition matrix of the
one-step transition probabilities. Then the two-step transition probabilities are given by
(P²)ij = ∑k P(Xn+1 = k | Xn = i) P(Xn+2 = j | Xn+1 = k) = ∑k Pik Pkj,
i.e. the (i, j) entry of the matrix P².
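A small numpy illustration of this relation, using the two-state matrix from the regular-matrix example above as the assumed one-step transition matrix:

import numpy as np

# Higher transition probabilities from powers of the one-step matrix P.
# (P @ P)[i, j] = sum_k P[i, k] * P[k, j], matching the relation above.
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])

P2 = P @ P
print(P2)                            # [[0.5, 0.5], [0.25, 0.75]]

Pn = np.linalg.matrix_power(P, 8)    # n-step transition probabilities
print(Pn)                            # each row approaches the fixed point [1/3, 2/3]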