Biostatistics-1

DSS

Uploaded by

Murali dharan
Department of Community Medicine

BGS Medical College and Hospital (BGS - MCH)


BGS Vijnatham Campus, Nagaruru,
Bengaluru North, &, PU College Campus,
Bengaluru, Karnataka 562123

Problem 1

1. In a dietary survey conducted as a part of nutritional assessment in a semi-urban area
of Tamil Nadu, the protein intake (g/day) of 400 families is as given below.

Protein intake (g/day)    No. of families
15 - 25                   30
25 - 35                   40
35 - 45                   100
45 - 55                   110
55 - 65                   80
65 - 75                   30
75 - 85                   10

Find the mean, median, mode, standard deviation and coefficient of variation.

Solution:

Mean (x̄) = ∑fx / n, where x is the midpoint of each class and n = ∑f
= 19000 / 400
= 47.5 g/day

Median of grouped data:

Median = L + [(n/2 − F) / f] × C

L = lower limit of the median class
n = total no. of observations
F = cumulative frequency up to the class preceding the median class
f = frequency of the median class
C = class interval

Protein intake (g/day)    No. of families (f)    Cumulative frequency
15 - 25                   30                     30
25 - 35                   40                     70
35 - 45                   100                    170
45 - 55                   110                    280
55 - 65                   80                     360
65 - 75                   30                     390
75 - 85                   10                     400

Median class = 45 - 55 (the first class whose cumulative frequency reaches n/2 = 200)

L = 45, n = 400, F = 170, f = 110, C = 10

Median = L + [(n/2 − F) / f] × C
= 45 + [(400/2 − 170) / 110] × 10
= 45 + [(200 − 170) / 110] × 10
= 45 + 300/110
= 47.7
Median = 47.7 g/day

Mode of grouped data:

Mode = Lm + [d1 / (d1 + d2)] × C

Lm = lower limit of modal class,
d1 = frequency in modal class – frequency in preceding class,
d2 = frequency in modal class – frequency in succeeding class,
C = class interval of modal class

Mode = 45 + [(110 − 100) / ((110 − 100) + (110 − 80))] × 10
= 45 + 100/40
= 47.5
Mode = 47.5 g/day

Standard deviation = √[∑f(x − x̄)² / ∑f]

Protein intake   No. of         Midpoint of
(g/day)          families (f)   class (x)     fx       (x − x̄)   (x − x̄)²   f(x − x̄)²
15 - 25          30             20            600      −27.5     756.25     22687.5
25 - 35          40             30            1200     −17.5     306.25     12250
35 - 45          100            40            4000     −7.5      56.25      5625
45 - 55          110            50            5500     2.5       6.25       687.5
55 - 65          80             60            4800     12.5      156.25     12500
65 - 75          30             70            2100     22.5      506.25     15187.5
75 - 85          10             80            800      32.5      1056.25    10562.5
Total            400                          19000              2843.75    79500

S.D = √[∑f(x − x̄)² / ∑f]
= √(79500/400)
= √198.75
= 14.1 g/day

Variance = (S.D)² = 198.75

Coefficient of variation
C.V (%) = (S.D / Mean) × 100 = (14.1/47.5) × 100 = 29.68%
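The grouped-data measures in Problem 1 can be cross-checked with a short script. This is a minimal sketch in Python; the function name `grouped_summary` is an illustrative choice, and it divides by n (not n − 1) to match the worked solution.

```python
import math

# Class limits and frequencies from Problem 1.
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)]
freqs = [30, 40, 100, 110, 80, 30, 10]

def grouped_summary(classes, freqs):
    """Return (mean, median, mode, sd, cv%) for grouped data."""
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median: first class whose cumulative frequency reaches n/2.
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= n / 2:
            median = lo + (n / 2 - cum) / f * (hi - lo)
            break
        cum += f

    # Mode: class with the highest frequency; a missing neighbour counts as 0.
    i = freqs.index(max(freqs))
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i < len(freqs) - 1 else 0)
    lo, hi = classes[i]
    mode = lo + d1 / (d1 + d2) * (hi - lo)

    # SD with divisor n, as in the worked solution.
    sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
    return mean, median, mode, sd, sd / mean * 100

mean, median, mode, sd, cv = grouped_summary(classes, freqs)
print(round(mean, 1), round(median, 1), round(mode, 1), round(sd, 1), round(cv, 2))
```

Working from the class limits rather than hard-coded midpoints keeps the same function usable for other frequency tables with equal or unequal class widths.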
Problem 2

2. A random sample of 100 newborns selected from a block were weighed:
Mean – 2.8 kg, SD – 300 g.
Assuming that the distribution is normal,
a) What proportion of the newborns can be expected to have weight above 2.8 kg?
b) What proportion can be expected to be low birth weight?
c) Calculate the 95% confidence interval. Write in one sentence what the above 95%
confidence interval means.
d) In the same sample of 100 children, 33% were found by clinical assessment to suffer
from low birth weight. Compute the 95% C.I. estimate of low birth weight in
the population.

Solution:
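a. Since the distribution is normal, the mean, median and mode coincide, so 50% of the
newborns can be expected to have weight above 2.8 kg.

b. This solution takes low birth weight to mean a weight more than 2 SD below the mean,
i.e. below 2.8 − 2(0.3) = 2.2 kg. About 5% of the newborns fall outside mean ± 2 SD
(the 95% limits), half in each tail. So 2.5% of the newborns can be expected to be
low birth weight. If the usual cut-off of 2.5 kg is used instead, Z = (2.5 − 2.8)/0.3 = −1
and the expected proportion is 0.5 − 0.3413 = 0.1587, i.e. about 15.9%.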

c. 95% confidence interval = Mean ± 1.96 (SE of mean)

Standard error (SE) of mean = SD/√n
= 0.3/√100 = 0.3/10 = 0.03
95% confidence interval = 2.8 ± 1.96(0.03)
= 2.8 ± 0.0588
= 2.7412 to 2.8588

The above interval means that we can be 95% confident that the mean birth weight
in that block lies between 2.7412 and 2.8588 kg.

d. Given data: n = 100; p = 33%; q = (1 − p) = (1 − 0.33) = 0.67 = 67%

Standard error of proportion (SEP) = √(pq/n)
= √(33 × 67 / 100)
= √(2211/100)
= √22.11 = 4.70

95% confidence interval of estimate
p ± 1.96 (SEP) = 33 ± 1.96 (4.70) = 33 ± 9.212
= 23.788% to 42.212%

The above interval means that we can be 95% confident that the proportion of low birth
weight in that block lies between 23.788% and 42.212%.
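Both confidence intervals in Problem 2 follow the same pattern, estimate ± 1.96 × standard error. The sketch below is one way to write that in Python; the function names are illustrative, and the proportion is kept in percent as in the worked solution. Because the script does not round the standard error to 4.70, its limits for part (d) differ from the hand calculation in the third decimal place.

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """95% CI for a mean: mean ± z × SD/√n."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def ci_proportion(p_percent, n, z=1.96):
    """95% CI for a proportion given in percent: p ± z × √(pq/n)."""
    q_percent = 100 - p_percent
    se = math.sqrt(p_percent * q_percent / n)
    return p_percent - z * se, p_percent + z * se

print(ci_mean(2.8, 0.3, 100))      # birth weight, kg
print(ci_proportion(33, 100))      # low birth weight, %
```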
Problem 3, 4

3. The mean systolic BP of 100 newly recruited employees was 100 mm Hg with a
standard deviation of 10 mm Hg. Find the number of employees with
systolic BP between 100 and 120 mm Hg.

Solution:

Given data: Mean = 100 mm Hg; SD = 10 mm Hg

[Figure: normal curve of systolic BP, axis marked from 60 to 140 mm Hg in steps of 10]

Relative deviate, Z = |X − μ| / σ = |120 − 100| / 10 = 2

Table value, P for Z = 2 is 0.4772, i.e. for 10000 people, 4772 will have systolic BP between
100 and 120 mm Hg.
Therefore for 100 employees, (4772/10000) × 100 = 47.72 ≈ 48.
48 employees will have systolic BP between 100 and 120 mm Hg.

4. The mean Hb of 500 students in a school is 12 g% with a standard deviation of 2 g%.
Find out:

a) What would be the number of students with Hb between 10 and 12 g%?
b) What would be the number of students with Hb of 14 g% or more?

Solution:

Given data: Mean Hb (μ) = 12 g%; SD (σ) = 2 g%

a. No. of students with Hb between 10 and 12 g%

Z = |X − μ| / σ = |10 − 12| / 2 = 1

[Figure: normal curve of Hb, axis marked 8, 10, 12, 14, 16 g%]

The value for Z = 1 in the Z table (area under the standard normal curve from 0 to Z) is 0.3413,
i.e. for 10000 students, 3413 will lie in this interval.
For 500 students = (500 × 3413)/10000 = 170.65 ≈ 171 students.

b. With Hb levels of 14 g% and more

Z = |X − μ| / σ = |14 − 12| / 2 = 1

The table value for Z = 1 (area under the standard normal curve from 0 to Z) is 0.3413.
Since we need the no. of students with Hb ≥ 14 g%, the area of the curve beyond Z = 1 is
0.5 − 0.3413 = 0.1587.

For every 10,000 students, 1587 will have Hb ≥ 14 g%.
Therefore for 500 students: (500 × 1587)/10,000 = 79.35 ≈ 79.

79 students will have Hb of 14 g% and above.
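The table look-ups in Problems 3 and 4 can be reproduced without a printed Z table, since the standard normal cumulative distribution can be written with the error function. The sketch below uses only Python's standard `math.erf`; the helper names `normal_cdf` and `expected_count` are illustrative.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_count(n, mean, sd, low=None, high=None):
    """Expected number out of n with a value between low and high.

    low=None means no lower bound; high=None means no upper bound.
    """
    area_low = 0.0 if low is None else normal_cdf((low - mean) / sd)
    area_high = 1.0 if high is None else normal_cdf((high - mean) / sd)
    return n * (area_high - area_low)

bp_100_120 = expected_count(100, 100, 10, low=100, high=120)   # Problem 3
hb_10_12 = expected_count(500, 12, 2, low=10, high=12)         # Problem 4a
hb_14_plus = expected_count(500, 12, 2, low=14)                # Problem 4b
print(round(bp_100_120), round(hb_10_12), round(hb_14_plus))
```

The unrounded counts differ slightly from the hand calculation because the table values 0.4772 and 0.3413 are themselves rounded to four decimals.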


Problem 5

5. The mean Hb level of a sample of 81 women selected at random from a village
was 10.89 g/dl with a standard deviation of 2.2 g/dl. The mean Hb level of the
population was 11.4 g/dl. Find whether there is a significant difference between
the sample Hb level and the population Hb level.

Solution:

Data given: n = 81;
Mean (x̄) = 10.89 g/dl; SD (S) = 2.2 g/dl;
Population mean (µ) = 11.4 g/dl

Null hypothesis: There is no significant difference between the sample Hb level and
the population Hb level, i.e. x̄ = µ

Standard error of mean (SE) = SD/√n = 2.2/√81 = 2.2/9 = 0.24

Critical ratio: Z = |x̄ − µ| / S.E. = |10.89 − 11.4| / 0.24 = 0.51/0.24 = 2.125

Comparison with table value for Z-distribution

The table value of Z at the 5% level is 1.96.

Our observed value is 2.125, which is greater than the table value. Therefore the
probability of observing this value or a greater one by chance is < 5%. Hence the null
hypothesis is rejected (P < 0.05).

Inference:
There is a significant difference between the sample Hb level and the population Hb level.
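The critical ratio in Problem 5 is a one-sample Z test of a mean. A minimal sketch in Python follows; the function name is illustrative. It keeps the standard error unrounded (2.2/9 ≈ 0.2444), so it gives Z ≈ 2.09 rather than the 2.125 obtained with SE rounded to 0.24; the conclusion at the 5% level is the same.

```python
import math

def z_one_sample_mean(sample_mean, pop_mean, sd, n):
    """Critical ratio |x̄ − µ| / (SD/√n)."""
    se = sd / math.sqrt(n)
    return abs(sample_mean - pop_mean) / se

z = z_one_sample_mean(10.89, 11.4, 2.2, 81)
reject_null = z > 1.96   # 1.96 is the two-sided table value at the 5% level
print(round(z, 3), reject_null)
```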
Problem 6

6. In an otological examination of school children, 45 out of 246 were found to
have an ear problem. Does this conform with the statement that 20% of school
children have otological problems?

Solution:
Given data: n = 246;
p = 45/246 = 0.183 = 18.3%;
q = 1 − p = 1 − 0.183 = 0.817 = 81.7%
P = 20% = 0.2

Null hypothesis: No difference between the population and sample prevalence, i.e. p = P

SE of proportion = √(pq/n) = √(18.3 × 81.7 / 246) = 2.46

Critical ratio (z) = |Difference in proportions| / S.E. of proportion = |20 − 18.3| / 2.46 = 0.69

The observed z value is less than the table value (0.69 < 1.96) at the 5% significance level, i.e. the
probability of observing a value ≥ 0.69 by chance is > 5%. So, the null hypothesis is not
rejected.

Inference:
There is no statistically significant difference between the sample and population. Hence, the data
conform with the statement that 20% of school children have otological problems.
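The same test for a single proportion can be sketched as below. The function name is illustrative, and the standard error is computed from the sample proportion p, as in the worked solution (an alternative convention, not used here, computes it from the hypothesised P).

```python
import math

def z_one_sample_proportion(cases, n, claimed_p):
    """Critical ratio |P − p| / √(pq/n), with p the sample proportion."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return abs(claimed_p - p) / se

z = z_one_sample_proportion(45, 246, 0.20)
reject_null = z > 1.96
print(round(z, 2), reject_null)
```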
Problem 7

7. In a nutritional survey, 100 children were given a special diet and another
comparable group of 100 children a routine diet. The average weights of the two groups
were 30 kg and 27 kg, with standard deviations of 5 kg and 4 kg respectively. Can we say
statistically that the special diet was responsible for the difference in weight? Give your comments.

Solution:

Given data: x̄1 = 30 kg; SD1 = 5 kg; n1 = 100
x̄2 = 27 kg; SD2 = 4 kg; n2 = 100

Null hypothesis: There is no difference in weight between the children given the special diet
and those given the routine diet, i.e. x̄1 = x̄2

SE of difference b/w 2 means
= √(SD1²/n1 + SD2²/n2)
= √(5²/100 + 4²/100)
= √(0.25 + 0.16)
= √0.41
= 0.64

Critical ratio (z) = |x̄1 − x̄2| / S.E. = (30 − 27) / 0.64 = 4.69

The Z value observed is more than the table value (4.69 > 1.96) at the 5% significance level.
So, the null hypothesis is rejected.
Therefore there is a statistically significant difference in weight between the children who
had the special diet and those who had the routine diet.
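For two independent groups the standard error combines both variances. The sketch below (function name illustrative) assumes, as the solution does, that both groups have 100 children.

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Critical ratio |x̄1 − x̄2| / √(SD1²/n1 + SD2²/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / se

z = z_two_means(30, 5, 100, 27, 4, 100)
print(round(z, 2), z > 1.96)
```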
Problem 8
8. Out of 1200 persons examined for leprosy in a village, 42 cases were detected. In another
village, 24 cases were detected from a population of 800. Does the data indicate that
the difference in the proportion of leprosy cases is statistically significant?
Give your comments.

Solution:

Given data:
n1 = 1200                   n2 = 800
p1 = 42/1200 = 0.035        p2 = 24/800 = 0.03
q1 = 1 − p1 = 0.965         q2 = 1 − p2 = 0.97

Null hypothesis: There is no difference in the proportion of leprosy cases
between the two villages.

SE of difference between 2 proportions, S.E. = √(p1q1/n1 + p2q2/n2)
= √(0.035 × 0.965 / 1200 + 0.03 × 0.97 / 800)
= √0.0000645
= 0.008

Critical ratio (z) = |p1 − p2| / S.E. = (0.035 − 0.03) / 0.008 = 0.625

The Z value observed is less than the table value (0.625 < 1.96) at the 5% significance level. So,
the null hypothesis is not rejected.

Inference:
Therefore there is no statistically significant difference in the proportion of leprosy
cases between the two villages.
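The comparison of two proportions can be sketched in the same way. The function name is illustrative, and the standard error is the unpooled form used in the worked solution. Keeping the standard error unrounded gives Z ≈ 0.62 instead of 0.625, with the same conclusion.

```python
import math

def z_two_proportions(cases1, n1, cases2, n2):
    """Critical ratio |p1 − p2| / √(p1q1/n1 + p2q2/n2)."""
    p1, p2 = cases1 / n1, cases2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) / se

z = z_two_proportions(42, 1200, 24, 800)
print(round(z, 2), z > 1.96)
```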
Problem 9

9. 300 polio vaccines were dispatched using ice packs made from a deep freezer and
dry ice purchased from ice factories for maintaining the cold chain. The numbers of successes
and failures as detected from the VVM are tabulated below.

            Success    Failure
Ice Pack    211        14
Dry Ice     65         10

Test whether there is any statistically significant difference between the two
using the Chi-square method.

Solution:
Null hypothesis: There is no difference b/w ice packs made from a deep freezer and dry
ice from ice factories.

1. Observed values: O1 = 211; O2 = 14; O3 = 65; O4 = 10
Row totals: 225 (ice pack), 75 (dry ice); column totals: 276 (success), 24 (failure); grand total: 300

2. Expected value in a cell = (Row total × Column total) / Grand total

E1 = (276 × 225)/300 = 207        E2 = (24 × 225)/300 = 18
E3 = (75 × 276)/300 = 69          E4 = (75 × 24)/300 = 6

3. χ² = ∑ (O − E)²/E, summed over the 4 cells
= (211 − 207)²/207 + (14 − 18)²/18 + (65 − 69)²/69 + (10 − 6)²/6
= 16/207 + 16/18 + 16/69 + 16/6
= 0.077 + 0.889 + 0.232 + 2.667
= 3.865

4. Degrees of freedom = (c − 1)(r − 1) = (2 − 1)(2 − 1) = 1

5. Table value at d.f. (1) is 3.84 at the 5% significance level.

Our value is χ² = 3.865, which is greater than the table value of 3.84. Therefore the
probability of observing this value of 3.865 or greater by chance is < 5%.
Therefore the null hypothesis is rejected (p < 0.05).
Inference: There is a statistically significant difference between ice packs made from a
deep freezer and dry ice from ice factories.
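The expected values and the χ² statistic can be generated from the observed table alone. This is a minimal sketch in plain Python (no continuity correction, matching the hand calculation); the function name `chi_square` is illustrative.

```python
def chi_square(table):
    """Return (chi2, degrees of freedom, expected table) for an r × c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Rows: ice pack, dry ice; columns: success, failure.
chi2, df, expected = chi_square([[211, 14], [65, 10]])
print(round(chi2, 3), df, expected)
```

The value is only just above the 3.84 cut-off, so the conclusion is sensitive to details such as whether a continuity correction is applied.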
Problem 10

10. In a pilot study done on malnourished babies, 6 months old, the mean weight was 5 kg
with an SD of 0.5 kg. Calculate the sample size for the major study with an allowable error
of 1% and 5% risk.

Solution:
Mean = 5 kg, SD = 0.5 kg

N = Z²(1 − α/2) × S² / L²

where S is the SD, L is the allowable error and Z(1 − α/2) is the normal deviate at 5% risk.

a. Assuming an allowable error of 1%

L = 1% of mean = 0.01 × 5 = 0.05
Z²(1 − α/2) = 3.84
N = 3.84 × (0.5)² / (0.05)²
= 0.96/0.0025 = 384

b. Assuming an allowable error of 5%

L = 5% of mean = 0.05 × 5 = 0.25
N = 3.84 × (0.5)² / (0.25)²
= 0.96/0.0625
= 15.36 ≈ 16
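The calculation for both allowable errors can be sketched as below. The function name is illustrative, Z² is taken as 3.84 to match the worked solution (1.96² is 3.8416), and the result is rounded up to a whole subject.

```python
import math

def sample_size_mean(sd, allowable_error, z_squared=3.84):
    """N = Z² × S² / L², rounded up to a whole number."""
    n = z_squared * sd ** 2 / allowable_error ** 2
    # round() first, so floating-point noise cannot push an exact value up by one
    return math.ceil(round(n, 6))

mean, sd = 5.0, 0.5
for error_pct in (1, 5):
    L = error_pct / 100 * mean     # allowable error as a fraction of the mean
    print(error_pct, sample_size_mean(sd, L))
```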
Problem 11

11. To study the Diarrheal diseases in a community with a prevalence rate of 2%,
calculate the sample size with an allowable error of 5%. Demonstrate that decision
on the allowable error determines the sample size.

Solution
a. Diarrheal diseases prevalence rate p = 2%, q = 100 – p = 98%

Allowable error = 5%, so d = 5% of prevalence
d = (5 × 2)/100 = 0.1%

Sample size for the study = Z²(1 − α/2) × p × q / d²
α = 0.05 significance (95% CI), Z at α = 0.05 is 1.96
Sample size = (1.96)² × 2 × 98 / (0.1)²
= 752.95/0.01
= 75295
Sample size = 75295.

If allowable error is 10%, d = 10% of 2 = 0.2
Sample size = (1.96)² × 2 × 98 / (0.2)²
= 752.95/0.04
Sample size = 18824.

If allowable error is 15%, d = 15% of 2 = 0.3
Sample size = (1.96)² × 2 × 98 / (0.3)²
= 752.95/0.09
Sample size = 8366.

If allowable error is 20%, d = 20% of 2 = 0.4
Sample size = (1.96)² × 2 × 98 / (0.4)²
= 752.95/0.16
Sample size = 4705.9 ≈ 4706

Thus the required sample size increases as the allowable error decreases.
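The dependence of sample size on allowable error is easy to tabulate. The sketch below (function name illustrative) takes the allowable error as a percentage of the prevalence, as in the solution, and rounds each result to the nearest whole number.

```python
def sample_size_prevalence(p_percent, relative_error_pct, z=1.96):
    """n = Z² × p × q / d², with p, q and d all in percent."""
    q_percent = 100 - p_percent
    d = relative_error_pct / 100 * p_percent   # e.g. 5% of 2% = 0.1%
    return z ** 2 * p_percent * q_percent / d ** 2

sizes = {e: round(sample_size_prevalence(2, e)) for e in (5, 10, 15, 20)}
for error_pct, n in sizes.items():
    print(f"allowable error {error_pct}% -> n = {n}")
```

Because d enters the formula squared, halving the allowable error multiplies the required sample size by four.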


Problem 12
12. An existing procedure has an 80% success rate. A new procedure is claimed to have a
90% success rate. Find the sample size needed to ascertain the validity of the claim (α
and β errors are 5% and 10%).

Solution:

Given data: p1 = 80%, p2 = 90%,
q1 = 20%, q2 = 10%

Sample size n = 2 P (1 − P) (Zα + Zβ)² / (p1 − p2)²

P = (p1 + p2)/2
= (0.80 + 0.90)/2
= 0.85
α = 5%, β = 10%
Zα = 1.96, Zβ = 1.29

n = 2 (0.85)(0.15)(1.96 + 1.29)² / (0.80 − 0.90)²
= 269.34

n = 270 for each arm
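A minimal sketch of this two-proportion sample size formula follows. The function name is illustrative, and the default Zβ = 1.29 is the value used in the worked solution (1.28 is the more usual rounding of the 10% deviate).

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.29):
    """n per arm = 2 P (1 − P) (Zα + Zβ)² / (p1 − p2)², with P = (p1 + p2)/2."""
    p_bar = (p1 + p2) / 2
    return 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2

n = sample_size_two_proportions(0.80, 0.90)
print(round(n, 2), math.ceil(n))   # unrounded value, then subjects per arm
```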


Problem 13

13. In a dietary experiment, the mean weights of children were 3000 g and
3200 g before and after the intervention respectively. The combined SD was 500 g. Determine the
sample size to conduct a study to ascertain the real difference due to the special diet
(with Zα at the 95% level, an allowable error of 10% and power of the test 90%).

Solution:

Mean weight before intervention = X1 = 3000 g
Mean weight after intervention = X2 = 3200 g
SD = σ = 500 g
α = 5%
β = 10%
d = difference = 3200 – 3000 = 200
Sample size = 2 σ² (Zα + Zβ)² / d²
= 2 (500)(500) (1.96 + 1.29)² / 200²
= 132.03

Sample size = 133
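The corresponding formula for a difference between two means can be sketched the same way. The function name is illustrative and the default Zβ = 1.29 again follows the worked solution.

```python
import math

def sample_size_two_means(sd, difference, z_alpha=1.96, z_beta=1.29):
    """n = 2 σ² (Zα + Zβ)² / d²."""
    return 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / difference ** 2

n = sample_size_two_means(500, 3200 - 3000)
print(round(n, 2), math.ceil(n))
```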


Problem 14

14. The time x in years that a doctor spent at a hospital and his hourly pay, y, for 5
doctors are listed in the table below. Calculate and interpret the correlation coefficient r.

x y

5 25

3 20

4 21

10 35
15 38

Answer

x          y          x²          y²           xy
5          25         25          625          125
3          20         9           400          60
4          21         16          441          84
10         35         100         1225         350
15         38         225         1444         570
∑x = 37    ∑y = 139   ∑x² = 375   ∑y² = 4135   ∑xy = 1189

Calculate the numerator:

n∑(xy) – (∑x)(∑y) = 5(1189) – (37 × 139) = 5945 – 5143 = 802

Calculate the denominator:

√{[n∑x² – (∑x)²][n∑y² – (∑y)²]} = √[(5 × 375 – 37²)(5 × 4135 – 139²)] = √(506 × 1354) ≈ 827.72

Now, divide to get r ≈ 802/827.72 ≈ 0.97.

Interpret this result: There is a strong positive correlation between the number of
years a doctor has spent at the hospital and the doctor's hourly pay, since r is very close to
1.
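The numerator and denominator of r can be computed directly from the raw pairs. This is a minimal sketch of the computational formula for Pearson's r; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Return (numerator, denominator, r) of the computational formula for r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator, denominator, numerator / denominator

years = [5, 3, 4, 10, 15]
pay = [25, 20, 21, 35, 38]
num, den, r = pearson_r(years, pay)
print(num, round(den, 2), round(r, 2))
```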
Problem 15

15. The table below shows the height, x, in inches and the pulse rate, y, per minute,
for 9 people. Find the correlation coefficient and interpret your result.

x 68 72 65 70 62 75 78 64 68
y 90 85 88 100 105 98 70 65 72

You may use the facts that (double check this for practice)

∑x = 622,
∑y = 773
∑x² = 43206
∑y² = 68007
∑ xy = 53336.

Calculate the numerator:

n∑(xy) – (∑x)(∑y) = 9 · 53336 − 622 · 773 = 480024 − 480806 = −782

Calculate the denominator:

√{[n∑x² – (∑x)²][n∑y² – (∑y)²]} = √[(9 · 43206 − 622²)(9 · 68007 − 773²)] = √(1970 × 14534) ≈ 5350.89

Now, divide to get r ≈ −782/5350.89 ≈ −0.15.

Interpret this result: There is a weak negative correlation between height and
pulse rate, since r is closer to 0 than it is to −1.
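Since the problem invites a double check of the five sums, the sketch below recomputes them from the nine listed pairs and then evaluates r from the sums alone. Variable names are illustrative. On this data it returns r of about −0.15.

```python
import math

height = [68, 72, 65, 70, 62, 75, 78, 64, 68]
pulse = [90, 85, 88, 100, 105, 98, 70, 65, 72]

# The five sums quoted in the problem.
n = len(height)
sums = {
    "x": sum(height),
    "y": sum(pulse),
    "x2": sum(x * x for x in height),
    "y2": sum(y * y for y in pulse),
    "xy": sum(x * y for x, y in zip(height, pulse)),
}

# r from the sums only, using the same computational formula.
numerator = n * sums["xy"] - sums["x"] * sums["y"]
denominator = math.sqrt(
    (n * sums["x2"] - sums["x"] ** 2) * (n * sums["y2"] - sums["y"] ** 2)
)
r = numerator / denominator
print(sums, numerator, round(denominator, 2), round(r, 2))
```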
