Biostatistics-1
Problem 1
Protein intake (g/day)    No. of families
15 - 25                   30
25 - 35                   40
35 - 45                   100
45 - 55                   110
55 - 65                   80
65 - 75                   30
75 - 85                   10
Find out the mean, median, mode, standard deviation & coefficient of variation.
Solution:
Mean: $\bar{x} = \frac{\sum fx}{n}$
$= \frac{19000}{400}$
$\bar{x} = 47.5$ g/day
Median $= L + \left[\frac{n/2 - F}{f}\right] \times C$
L = 45, n = 400, F = 170, f = 110, C = 10
$= 45 + \left[\frac{400/2 - 170}{110}\right] \times 10$
$= 45 + \left[\frac{200 - 170}{110}\right] \times 10$
$= 45 + 300/110$
$= 47.7$
Median = 47.7 g/day
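Mode $= L_m + \frac{d_1 \, c}{d_1 + d_2}$
$L_m$ = lower limit of modal class,
$d_1$ = frequency in modal class - frequency in preceding class,
$d_2$ = frequency in modal class - frequency in succeeding class,
c = class width
$= 45 + \frac{(110 - 100) \times 10}{(110 - 100) + (110 - 80)}$
$= 45 + 100/40$
$= 47.5$
Mode = 47.5 g/day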
Standard deviation $= \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

Protein intake   No. of       Middle of class   fx      (x - x̄)   (x - x̄)²   f(x - x̄)²
(g/day)          families f   interval x
15 - 25          30           20                600     -27.5     756.25     22687.5
25 - 35          40           30                1200    -17.5     306.25     12250
35 - 45          100          40                4000    -7.5      56.25      5625
45 - 55          110          50                5500    2.5       6.25       687.5
55 - 65          80           60                4800    12.5      156.25     12500
65 - 75          30           70                2100    22.5      506.25     15187.5
75 - 85          10           80                800     32.5      1056.25    10562.5
Total            400                            19000                        79500
Standard deviation:
S.D. $= \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$
$= \sqrt{79500/400}$
$= \sqrt{198.75}$
$= 14.1$ g/day
Coefficient of variation:
C.V. (%) $= \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{14.1}{47.5} \times 100 = 29.68\%$
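The grouped-data calculations in Problem 1 can be checked with a short script. This is a minimal sketch, not part of the original solution: the function name `grouped_stats` is hypothetical, and it assumes equal-width classes with a single modal class, as in this problem.

```python
import math

# Class intervals (lower, upper) and frequencies from Problem 1.
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)]
freqs = [30, 40, 100, 110, 80, 30, 10]


def grouped_stats(classes, freqs):
    """Return mean, median, mode, SD and CV (%) for grouped data."""
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median: L + [(n/2 - F) / f] * C, using the first class whose
    # cumulative frequency reaches n/2.
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= n / 2:
            median = lo + (n / 2 - cum) / f * (hi - lo)
            break
        cum += f

    # Mode: Lm + d1 / (d1 + d2) * c, using the class with the highest frequency.
    i = freqs.index(max(freqs))
    lo, hi = classes[i]
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i < len(freqs) - 1 else 0)
    mode = lo + d1 / (d1 + d2) * (hi - lo)

    # SD = sqrt(sum f (x - mean)^2 / sum f), CV = SD / mean * 100.
    sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
    cv = sd / mean * 100
    return mean, median, mode, sd, cv


mean, median, mode, sd, cv = grouped_stats(classes, freqs)
print(f"mean={mean:.1f} median={median:.1f} mode={mode:.1f} sd={sd:.1f} cv={cv:.2f}%")
```

Rounded to the same precision, the script reproduces the hand-calculated values above (47.5, 47.7, 47.5, 14.1 and 29.68%).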
Problem 2
Solution:
a. Since it is a normal distribution, the mean, median and mode coincide. 50% of the
newborns can therefore be expected to have a weight above 2.8 kg.
b. Low birth weight lies more than 2 SD below the mean. 5% of the newborns fall
outside the 95% limits of mean ± 2 SD (below -2 SD plus above +2 SD), half in each
tail. So 2.5% of the newborns can be expected to be low birth weight.
The 95% confidence interval means that we can be 95% confident that the mean birth
weight in that block lies between 2.7412 kg and 2.8588 kg.
Standard Error of proportion (SEP) $= \sqrt{\frac{pq}{n}} = \sqrt{\frac{33 \times 67}{100}} = \sqrt{\frac{2211}{100}}$
$= \sqrt{22.11} = 4.70$
The 95% confidence interval is p ± 1.96 × SEP = 33 ± 9.212. It means that we can be
95% confident that the proportion of low birth weight in that block lies between
23.788% and 42.212%.
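The standard error and 95% interval for the proportion can be reproduced as below. This is a small sketch with a hypothetical helper name (`proportion_ci`); it takes p = 33% and n = 100 from the SEP calculation and assumes the usual 1.96 multiplier for 95% confidence.

```python
import math


def proportion_ci(p_percent, n, z=1.96):
    """Return (SEP, lower limit, upper limit), all in percent."""
    q_percent = 100 - p_percent
    sep = math.sqrt(p_percent * q_percent / n)  # sqrt(pq / n)
    return sep, p_percent - z * sep, p_percent + z * sep


sep, lower, upper = proportion_ci(33, 100)
print(f"SEP = {sep:.2f}%, 95% CI = {lower:.2f}% to {upper:.2f}%")
```

The script keeps the unrounded SEP, so its limits differ from the hand calculation (which rounds SEP to 4.70) in the third decimal place.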
Problem 3, 4
3. The mean systolic BP of 100 newly recruited employees was 100 mm Hg with a
standard deviation of 10. Find out what would be the number of employees with
systolic BP between 100 and 120 mm Hg.
Solution:
The table value for Z = 1 (Z table for area under the standard normal curve from 0 to Z)
is 0.3413. Since we need the number of students with Hb ≥ 14 g%, the area under the
curve beyond Z = 1 is 0.5 - 0.3413 = 0.1587.
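The Z-table values used here can be computed from the error function in Python's standard library. The helper name `area_0_to_z` is hypothetical. The last lines apply the same function to the figures stated in Problem 3 (mean 100, SD 10, range 100 to 120 mm Hg, 100 employees) as an illustration only.

```python
import math


def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (the Z-table value)."""
    return 0.5 * math.erf(z / math.sqrt(2))


area = area_0_to_z(1.0)   # table value for Z = 1
tail = 0.5 - area         # area beyond Z = 1
print(f"area 0 to 1 = {area:.4f}, area beyond 1 = {tail:.4f}")

# Illustration with the Problem 3 figures: Z = (120 - 100) / 10.
z = (120 - 100) / 10
expected = 100 * area_0_to_z(z)
print(f"Z = {z:.0f}, expected number of employees = {expected:.1f}")
```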
Solution:
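Null hypothesis: There is no significant difference between the sample Hb level and
the population Hb level, i.e. $\bar{x} = \mu$.
Standard error: S.E. $= \frac{SD}{\sqrt{n}} = 0.24$
Critical ratio: $Z = \frac{|\bar{x} - \mu|}{S.E.} = \frac{|10.89 - 11.4|}{0.24} = 0.51/0.24 = 2.125$
The table value of Z at the 5% significance level is 1.96.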
Our observed value is 2.125, which is greater than the table value. Therefore the
probability of observing a value this large or larger by chance is < 5%. Hence the null
hypothesis is rejected (P < 0.05).
Inference:
There is a significant difference between the sample Hb level and the population Hb level.
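The critical ratio and the claim P < 0.05 can be checked numerically. This is a sketch using the values quoted in the solution (sample mean 10.89, population mean 11.4, S.E. 0.24); the function names are hypothetical, and the two-sided P value comes from the standard library's `math.erfc`.

```python
import math


def critical_ratio(sample_mean, population_mean, standard_error):
    """Z = |sample mean - population mean| / S.E."""
    return abs(sample_mean - population_mean) / standard_error


def two_sided_p(z):
    """Probability of a standard normal value at least as extreme as z (both tails)."""
    return math.erfc(z / math.sqrt(2))


z = critical_ratio(10.89, 11.4, 0.24)
p_value = two_sided_p(z)
reject_null = z > 1.96  # table value of Z at the 5% level
print(f"Z = {z:.3f}, P = {p_value:.3f}, reject null hypothesis: {reject_null}")
```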
Problem 6
Solution:
Given data: n = 246;
p = 45/246 = 0.183 = 18.3%;
q = 1 - p = 1 - 0.183 = 0.817 = 81.7%;
P = 20% = 0.2.
Null hypothesis: No difference between the population and sample prevalence, i.e. p = P
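SE of proportion $= \sqrt{\frac{pq}{n}} = \sqrt{\frac{18.3 \times 81.7}{246}} = 2.46$
$Z = \frac{|p - P|}{SE} = \frac{|18.3 - 20|}{2.46} = 0.69$
The observed Z value is less than the table value (0.69 < 1.96) at the 5% significance
level, so the null hypothesis is not rejected.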
Inference:
There is no statistically significant difference between the sample and the population.
Hence the data are consistent with the statement that 20% of school children have
otological problems.
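A sketch of the same one-sample test of a proportion, working in percentages as the solution does. The function name `one_sample_proportion_z` is hypothetical; the inputs (45 cases out of 246, population prevalence 20%) are taken from the given data.

```python
import math


def one_sample_proportion_z(cases, n, population_percent):
    """Return sample prevalence (%), its standard error (%), and the Z value."""
    p = 100 * cases / n                 # sample prevalence in percent
    se = math.sqrt(p * (100 - p) / n)   # sqrt(pq / n)
    z = abs(p - population_percent) / se
    return p, se, z


p, se, z = one_sample_proportion_z(45, 246, 20)
print(f"p = {p:.1f}%, SE = {se:.2f}, Z = {z:.2f}, significant at 5%: {z > 1.96}")
```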
Problem 7
7. In a nutritional survey, 100 children were given a special diet and another
comparable group a routine diet. The average weights of the two groups were 30 kg
and 27 kg, with standard deviations of 5 kg and 4 kg respectively. Can we say
statistically that the special diet was responsible for the difference in weight?
Give your comments.
Solution:
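S.E. of the difference between two means $= \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}} = \sqrt{\frac{5^2}{100} + \frac{4^2}{100}}$
$= \sqrt{0.25 + 0.16}$
$= 0.64$
$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{S.E.} = \frac{|30 - 27|}{0.64} = 4.69$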
The observed Z value is greater than the table value (4.69 > 1.96) at the 5% significance
level, so the null hypothesis is rejected.
Therefore there is a statistically significant difference in weight between the children
who had the special diet and those who had the routine diet.
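The comparison of the two group means can be sketched as follows. The function name `two_mean_z` is hypothetical, and the code assumes both groups have 100 children, as the standard-error calculation does.

```python
import math


def two_mean_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (S.E. of the difference between means, Z value)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se, abs(mean1 - mean2) / se


se, z = two_mean_z(30, 5, 100, 27, 4, 100)
print(f"S.E. = {se:.2f}, Z = {z:.2f}, significant at 5%: {z > 1.96}")
```

With the unrounded standard error the script gives Z of about 4.69, in line with the hand calculation.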
Problem 8
8. Out of 1200 persons examined for leprosy in a village, 42 cases were detected. In
another village, 24 cases were detected in a population of 800. Does the data indicate
that the difference in the proportion of leprosy cases is statistically significant?
Give your comments.
Solution:
Given data:
n1= 1200 n2= 800
p1 = 42/1200=0.035 p2= 24/800=0.03
q1= 1- p1 q2= 1- p2
q1= 0.965 q2= 0.97
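S.E. of the difference between two proportions $= \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.035 \times 0.965}{1200} + \frac{0.03 \times 0.97}{800}}$
$= \sqrt{0.0000645} = 0.008$
$Z = \frac{|p_1 - p_2|}{S.E.} = \frac{0.035 - 0.03}{0.008} = 0.625$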
The observed Z value is less than the table value (0.625 < 1.96) at the 5% significance
level, so the null hypothesis is not rejected.
Inference:
Therefore there is no statistically significant difference in the proportion of leprosy
cases between the two villages.
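The two-proportion comparison can be verified with a few lines. The function name `two_proportion_z` is hypothetical; it uses the unpooled standard error shown in the solution.

```python
import math


def two_proportion_z(cases1, n1, cases2, n2):
    """Return (p1, p2, S.E. of the difference, Z value) using the unpooled S.E."""
    p1, p2 = cases1 / n1, cases2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, se, abs(p1 - p2) / se


p1, p2, se, z = two_proportion_z(42, 1200, 24, 800)
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, S.E. = {se:.4f}, Z = {z:.2f}")
```

The script keeps the unrounded standard error, so Z comes out near 0.62 rather than exactly 0.625. The conclusion is the same.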
Problem 9
9. 300 polio vaccines were dispatched using ice packs made from deep freezer and
dry ice purchased from ice factories for maintaining cold chain. The No. of success
and failure as detected from the VVM are tabulated below.
            Success   Failure
Ice Pack    211       14
Dry Ice     65        10
Test whether there is any statistically significant difference between the two
using Chi- square method.
Solution:
Null hypothesis: There is no difference between ice packs made from a deep freezer and
dry ice purchased from ice factories.
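1. Observed values: O1 = 211; O2 = 14; O3 = 65; O4 = 10
2. Expected value in a cell $= \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
E1 $= \frac{276 \times 225}{300} = 207$;  E2 $= \frac{24 \times 225}{300} = 18$
E3 $= \frac{75 \times 276}{300} = 69$;  E4 $= \frac{75 \times 24}{300} = 6$
3. $\chi^2 = \sum_{i=1}^{4} \frac{(O - E)^2}{E}$
$= \frac{(211 - 207)^2}{207} + \frac{(14 - 18)^2}{18} + \frac{(65 - 69)^2}{69} + \frac{(10 - 6)^2}{6}$
$= \frac{16}{207} + \frac{16}{18} + \frac{16}{69} + \frac{16}{6}$
$= 3.865$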
Our value is $\chi^2 = 3.865$, which is greater than the table value of 3.84 (1 degree of
freedom, 5% significance level). Therefore the probability of observing a value of 3.865
or greater by chance is < 5%, and the null hypothesis is rejected (p < 0.05).
Inference: There is a statistically significant difference between ice packs made from a
deep freezer and dry ice from ice factories.
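The three steps of the chi-square calculation (observed values, expected values, sum of (O - E)²/E) can be sketched as below. The function name `chi_square` is hypothetical; no continuity correction is applied, which matches the hand calculation.

```python
def chi_square(table):
    """Return (expected values, chi-square statistic) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    expected = []
    chi2 = 0.0
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected value = row total x column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)
    return expected, chi2


observed = [[211, 14],   # Ice Pack: success, failure
            [65, 10]]    # Dry Ice: success, failure
expected, chi2 = chi_square(observed)
print(expected)
print(f"chi-square = {chi2:.3f}, exceeds 3.84: {chi2 > 3.84}")
```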
Problem 10
10. In a pilot study done on malnourished babies, 6 months old, the mean weight was 5 kg
with an SD of 0.5 kg. Calculate the sample size for the major study with an allowable
error of 1% and 5% risk.
Solution:
Mean = 5 kg, SD = 0.5 kg
$N = \frac{Z^2_{(1-\alpha/2)} \, S^2}{L^2}$
a. On assuming allowable error of 1%
11. To study the Diarrheal diseases in a community with a prevalence rate of 2%,
calculate the sample size with an allowable error of 5%. Demonstrate that decision
on the allowable error determines the sample size.
Solution
a. Diarrheal diseases prevalence rate p = 2%, q = 100 – p = 98%
Sample size $= \frac{Z^2_{(1-\alpha/2)} \, p \, q}{d^2}$, d = 20% of 2 = 0.4
$= \frac{(1.96)^2 \times 2 \times 98}{(0.4)^2}$
Sample size = 4705.96 ≈ 4706
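To show how the choice of allowable error drives the sample size, the formula can be evaluated for several values of d. This is a sketch under one stated assumption: the allowable error is read as a percentage of the prevalence p, as in "d = 20% of 2 = 0.4" above, and the 5% and 10% cases are treated the same way. The function name is hypothetical.

```python
import math


def sample_size_for_prevalence(p_percent, relative_error_percent, z=1.96):
    """n = z^2 * p * q / d^2, with d expressed as a percentage of p (assumption)."""
    q_percent = 100 - p_percent
    d = p_percent * relative_error_percent / 100  # allowable error in percentage points
    return z ** 2 * p_percent * q_percent / d ** 2


for rel_err in (5, 10, 20):
    n = sample_size_for_prevalence(2, rel_err)
    print(f"allowable error {rel_err}% of p: n = {math.ceil(n)}")
```

Halving the allowable error multiplies the required sample size by four, because d enters the formula squared.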
Solution:
13. In a dietary experiment, the mean weights of children were 3000 g and 3200 g before
and after the intervention, respectively. The combined SD was 500 g. Determine the
sample size needed for a study to ascertain the real difference due to the special diet
(with Zα at the 95% level, an allowable error of 10% and a power of the test of 90%).
Solution:
14. The time x, in years, that a doctor has spent at a hospital and the doctor's hourly
pay, y, for 5 doctors are listed in the table below. Calculate and interpret the
correlation coefficient r.
x y
5 25
3 20
4 21
10 35
15 38
Answer
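x      y      x²     y²     xy
5      25     25     625    125
3      20     9      400    60
4      21     16     441    84
10     35     100    1225   350
15     38     225    1444   570
∑x = 37, ∑y = 139, ∑x² = 375, ∑y² = 4135, ∑xy = 1189
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{5(1189) - (37)(139)}{\sqrt{(506)(1354)}} = \frac{802}{827.72} \approx 0.97$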
Interpret this result: There is a strong positive correlation between the number of
years a doctor has spent at the hospital and the doctor's hourly pay, since r is very
close to 1.
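The correlation coefficient for the five doctors can be computed directly from the raw data. This sketch uses the computational form of Pearson's r built from the column sums of the x, y, x², y², xy table; the function name `pearson_r` is hypothetical.

```python
import math

x = [5, 3, 4, 10, 15]      # years spent at the hospital
y = [25, 20, 21, 35, 38]   # hourly pay


def pearson_r(x, y):
    """Pearson correlation coefficient from the sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```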
Problem 15
15. The table below shows the height, x, in inches and the pulse rate, y, per minute,
for 9 people. Find the correlation coefficient and interpret your result.
x 68 72 65 70 62 75 78 64 68
y 90 85 88 100 105 98 70 65 72
You may use the facts that (double check this for practice)
∑x = 622,
∑y = 773,
∑x² = 43206,
∑y² = 68007,
∑xy = 53336.
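Substituting these sums with n = 9:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{9(53336) - (622)(773)}{\sqrt{[9(43206) - 622^2][9(68007) - 773^2]}} = \frac{-782}{\sqrt{(1970)(14534)}} = \frac{-782}{5350.89}$
Now, divide to get r ≈ -0.15.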
Interpret this result: There is a weak negative correlation between height and pulse
rate, since r is closer to 0 than it is to -1.
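The "double check this for practice" step can be done in a few lines: recompute the five sums from the table, compare them with the stated facts, and then evaluate r from the sums. This is a sketch; the variable names are illustrative only. Evaluated this way, the listed sums give r of about -0.15, which is still a weak negative correlation.

```python
import math

x = [68, 72, 65, 70, 62, 75, 78, 64, 68]     # height in inches
y = [90, 85, 88, 100, 105, 98, 70, 65, 72]   # pulse rate per minute
n = len(x)

# Recompute the sums and compare them with the values quoted in the problem.
sums = {
    "sum_x": sum(x),
    "sum_y": sum(y),
    "sum_x2": sum(a * a for a in x),
    "sum_y2": sum(b * b for b in y),
    "sum_xy": sum(a * b for a, b in zip(x, y)),
}
stated = {"sum_x": 622, "sum_y": 773, "sum_x2": 43206, "sum_y2": 68007, "sum_xy": 53336}
for name, value in sums.items():
    print(f"{name} = {value} (matches stated value: {value == stated[name]})")

# Pearson's r from the sums.
numerator = n * sums["sum_xy"] - sums["sum_x"] * sums["sum_y"]
denominator = math.sqrt(
    (n * sums["sum_x2"] - sums["sum_x"] ** 2) * (n * sums["sum_y2"] - sums["sum_y"] ** 2)
)
r = numerator / denominator
print(f"r = {numerator} / {denominator:.2f} = {r:.2f}")
```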