Statistics DietCook
INTRODUCTION TO STATISTICS
Introduction
Much of the information we receive through print and electronic media, in classrooms and seminars,
and at functions such as workshops appears in the form of numbers. Results
from polls, weather forecasts, inflation rates, patient medical records, incidence and
prevalence rates, mortality rates and neonatal mortality rates, among others, are often
expressed as percentages, ratios or probabilities.
Concept of Statistics
Statistics is a mathematical discipline that is applied in our day-to-day activities in the
current global age. It is applied in industry, the health sector, business, the social
environment and other areas. As health professionals, we cannot leave out the application of statistics in the
day-to-day discharge of our duties. In general, one can say that statistics
is the methodology for collecting, analyzing, interpreting and drawing conclusions from
information.
Statistics can be defined as a mathematical body of science that deals with the collection,
organization, analysis, interpretation and presentation of data in a meaningful way.
For health care professionals, the effort to improve the quality of the services they
provide to patients never ends. The goal becomes more vital as health care budgets shrink
and the demands placed on health care systems push them to the breaking point.
The application of statistics to biological and medical data promises to have a tremendous
impact on the provision of health care and prevention of disease. The accurate
interpretation of biostatistical data can serve as the foundation for efforts to improve
public health and the quality of patient care. As with many burgeoning technologies,
however, there is much uncertainty among nursing professionals about the role of
biostatistics in health care.
INTRODUCTORY BIOSTATISTICS
Statistical methods can be used to find answers to questions such as:
What kind and how much data need to be collected?
How should we organize and summarize the data?
How can we analyze the data and draw conclusions from it?
How can we assess the strength of the conclusions and evaluate their uncertainty?
That is, statistics provides methods for
Design: Planning and carrying out research studies.
Description: Summarizing and exploring data.
Inference: Making predictions and generalizing about phenomena represented by
the data.
Example (Statistics in practice)
Consider the following problems:
Agricultural problem: Is a new grain seed or fertilizer more productive?
Medical problem: What is the right dosage of a drug for a treatment?
Branches of Statistics
It is important for every student to know the different branches of statistics in order to
understand the subject from a more holistic point of view. Statistics
can be divided into two distinct branches, namely descriptive statistics and inferential
statistics. Both branches are employed in the scientific analysis of data, and
both are equally important in understanding statistics and undertaking statistical tasks.
Descriptive Statistics
This is the branch of statistics that involves the collection and summarization of data in
order to understand and compare the characteristics measured by the data.
Here, data are organized in the form of frequency distribution tables, pie charts and bar
charts.
Summary statistics such as the mean, mode, median and standard deviation are computed to
bring more meaning to the data.
It is usually the first part of a statistical analysis, and different areas of study require
different kinds of descriptive analysis. For example, a nursing assistant
studying the OPD attendance for a particular month needs the daily attendance for that
hospital, from which he or she can find the average attendance by age group,
gender and other parameters using the mean, mode, percentages and other measures.
Inferential Statistics
1. As a healthcare professional, it makes your work reliable by backing it with data and numerical
reasoning. It provides you with the knowledge of reasoning and methods of
organising and interpreting data so that the ordinary person can understand them.
2. It helps solve the problem of variation in individual members or characteristics in
a population of study, for example height, weight, haemoglobin level and temperature.
Such measurements hardly give the same result when taken at different times,
hence the need to use statistics to obtain a mean value for that individual.
3. To the Ministry of Health, statistics is vital for planning, budgeting, forecasting
and implementing key health policies. Some distinct uses of statistics include:
i. To test the efficacy of a drug
ii. To predict bed availability and occupancy
iii. To calculate incidence and prevalence rates
iv. To help us understand medical literature
4. Helps in the analysis and presentation of data - Tools like SPSS, Minitab, R and
Microsoft Excel are used to analyze data and present them in tables, charts, etc. to
give them a meaningful interpretation.
5. Resource allocation - Guides managers as to where to allocate more resources.
9. Population size: This is the total number of elements in a population. It tells us
how many items there are in a population. For example, the total number of
patients in the emergency ward at Korle Bu Teaching Hospital.
10. Sample: A sample is a subset of a population. It is a fraction of an entire
population. For a sample to be accurate and acceptable, it has to be representative
and free from bias. An example of a sample is the Muslim students in Nalerigu
NMTC, who are a subset of all the students in the school.
11. Representative Sample: This is a sample drawn from a population of interest that
has the same characteristics as the population.
12. Biased Sample: This is a sample in which certain characteristics are more likely to be
included than other characteristics in the population, e.g. selecting
only Muslim students to solicit their views in order to determine a day of
worship for the school. Another example is a study on the meals
served to students by the College of Nursing and Midwifery School Nalerigu in which the
researcher chooses only the diploma students as the target population.
Conclusions generated from this study cannot be representative of the entire
student body, which makes it a biased study.
13. Sample Size: The total number of elements in a sample.
14. Parameter: This is a single summary number that describes an
entire population. It is a descriptive measure of a population, such as the mean,
median, mode or a percentage. For example, the population of
students in Nalerigu NTC is distributed as 60% females and 40% males.
15. Statistic: This is a single summary number that describes a
sample. It is a descriptive measure of the sample, such as the mean,
median or mode. For example, the mean age of students in RNAC E class is 24
years.
16. Elementary Units: Each individual element in a population is known as an
elementary unit. Some examples are each student in a school, each patient in a
hospital and each doctor in a hospital.
Variables
A variable is any characteristic of a person, object or phenomenon that can take on
different values. It can be manipulated and can change from time to time. Examples are age,
gender, weight, height, time, eye colour, region, occupation, etc.
Types of Variables
There are different ways variables can be described according to the ways they can be
studied, measured, and presented. Variables can be classified as quantitative or
qualitative.
1. Quantitative Variables: These are variables that are numerical and can be
quantified. For example, the variable age is numerical, and people can be ranked
in order according to the value of their ages. Other examples of quantitative
variables are heights, weights and body temperatures. Quantitative variables can
be further classified into two groups:
a. Discrete Variables - A discrete variable is one that assumes only specific
values with no possibility of any values between them. In other words, a
discrete variable can take only non-negative whole numbers.
Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be
countable. Examples of discrete variables are the number of patients in a ward,
number of children in a family, number of beds in a hospital, number of
students in a class, ages of tutors in a school in completed years, etc.
b. Continuous Variables - A variable that can assume all values between any
two specific values; a variable obtained by measuring. Temperature, for
example, is a continuous variable, since the variable can assume all values
between any two given temperatures. Other examples are, height, weight,
time, etc.
2. Qualitative Variables (Categorical): These are variables that can be placed into
distinct categories according to some characteristic or attribute. For example, if
subjects are classified according to gender (male or female), then the variable
gender is qualitative. Other examples of qualitative variables are religious
preference and geographic location.
a. Nominal Variables – These are observations that cannot be organized in any
logical sequence. Examples are gender, occupation, place of residence, hair
colour, etc.
b. Ordinal Variables – These are observations that can be organized in a logical
sequence. Examples are standard of living, socio-economic status,
classification of anaemia, etc.
The classification of variables can be summarized as follows;
ACTIVITY 1
Table 1: Indicate whether each of the following is a qualitative or quantitative variable
b. Gender
c. Hair colour
e. Taste
f. Weight of a patient
g. Marital status
h. Occupation
i. Nutrition status
l. Educational background
n. Stage of disease
p. Cause of death
q. Nationality
CHAPTER TWO
PRIMARY DATA
Primary data are first-hand data collected by the researcher and used directly for their
intended purpose. This type of data has not undergone any refining or transforming process.
It is collected and used for various purposes such as describing demographic characteristics, product
popularity, human rights awareness, health education, prediction of rates such as maternal
and neonatal mortality, calculation of OPD attendance, etc.
i. Observational method
ii. Surveys
iii. Experiments
Observational Method
1. It is sometimes the only method of data collection available for some specific
types of information.
2. It helps us collect original data as and when they occur, without
depending on reports by others. Information passing through different people
loses its authenticity and originality at some stage and might not be a fair
account of what actually transpired.
3. Observation can also capture the whole event as it occurs in its natural
environment.
4. Some subjects seem to accept an observational intrusion better than questioning.
This is because it is less demanding and does not necessarily take up their time.
1. The observer or researcher must be at the scene of the event when it
happens, but it is often impossible to predict where and when the event will
occur.
2. It is a slow and expensive method of data collection.
3. The research world is better suited to subjective assessment and
recording of data than to observation and participation, which makes this data
collection method somewhat inefficient.
Surveys
This is a method of primary data collection in which the researcher obtains data from the
population of interest on a subject matter using a list of questions. It may be conducted
using questionnaires, phone calls, emails and personal interviews. A survey conducted on
an entire population of interest is referred to as a census and alternatively a survey
conducted on a sample is referred to as a sample survey.
Experiments
An experiment is a data collection method in which the researcher changes some variables
and observes their effect on other variables. The variables that are manipulated are referred
to as independent variables, while the variables that change as a result of the manipulation are
dependent variables. An experiment can be conducted in a laboratory or in a field setting.
For example, Tobinco Pharmaceuticals Limited is testing the effect of a drug's strength on
the number of bacteria in the body. The company decides to test the drug at strengths of
10mg, 20mg and 40mg. In this instance, the drug strength is the independent variable
while the number of bacteria is the dependent variable. The drug administered is the
treatment, while 10mg, 20mg and 40mg are the levels of the treatment.
The experimental method of data collection can be adopted in different fields such as medical
research, agriculture, sociology and psychology, among others.
SECONDARY DATA
Secondary data are second-hand data. They are data that have already been gathered or
published by other sources for other purposes. They are relatively faster to collect and less
expensive than primary data. Secondary data sources include journals, magazines,
newspapers, patient folders, billboards, recipe books, menus, data available from or published
by government agencies and internet sources, among others.
SAMPLING
Statistical sampling is a process of selecting a smaller group of respondents (sample)
from the larger group (population) with the resulting sample representing the entire
population. A good sample must exhibit the characteristics of the population of interest.
Importance of Sampling
Qualities of a Good Sample
Representativeness: A good sample must possess all the qualities that describe
the entire population. For example, if we want to select students to travel for an
industrial tour or affiliation, we need to select students from the various programs.
If we select only students from a particular program or
department, then that sample is not representative of the population.
Large: A good sample must be large enough before it can be used to generalize
to the population. We cannot pick just any number we wish and then draw
conclusions about a population. The larger the sample size, the better the accuracy
of the results. For example, would it be proper for someone to use the bad
behaviour of three students to draw conclusions about the entire class?
Relevance: The sample drawn from the population must be useful for the
parameter under consideration. For example, if a researcher wants to find out whether a
patient has malaria, the relevant sample to take is the patient’s blood
and not the stool.
Types of Sampling
There are two types of sampling in statistics and research: probability
sampling and non-probability sampling.
Probability Sampling
This is a sampling technique in which every member of the population has a known, non-zero chance
of being selected, so the sampling is unbiased. In simple random sampling, every member
of the population has an equal chance of being selected.
In this section, we shall discuss extensively the four types of probability sampling:
Simple Random Sampling
Stratified Sampling
Systematic Sampling
In this method, the observable units of the population are arranged in some order and a
random starting point is selected; then every kth member of the population is selected
for the sample. That is, the first individual or item is
chosen at random from the population of interest, and the rest are obtained
by selecting every kth individual or item thereafter.
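The procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the text does not say how k is chosen, so the sketch assumes k = N // n (population size divided by the desired sample size), and the names `systematic_sample` and `patients` are hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start among the first k units, then take every k-th unit.

    Assumption (not from the text): the interval is k = N // n.
    """
    n_items = len(population)
    if not 0 < sample_size <= n_items:
        raise ValueError("sample_size must be between 1 and the population size")
    k = n_items // sample_size      # sampling interval
    rng = random.Random(seed)       # seed only makes the example repeatable
    start = rng.randrange(k)        # random starting position in 0 .. k-1
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical ordered list of 100 patient numbers.
patients = list(range(1, 101))
sample = systematic_sample(patients, 10, seed=1)
print(sample)  # ten patient numbers spaced exactly 10 apart
```

Only the starting point is random; every later selection is fixed by the interval, which is why the units must be arranged in order first.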
Cluster Sampling
This method is often employed when sampling is done over a large geographical area.
The area is sub-divided into smaller areas called primary units. At this stage
several units are selected and sampling is concentrated in these units.
In cluster sampling, the population is divided into several clusters, a
random sample of clusters is selected, and finally members of the selected
clusters are sampled for the study. This is most effective when you have a large
geographical area divided into units such as electoral areas, enumeration areas, districts, regions, etc.
This sampling technique tends to be less efficient than simple random
sampling and stratified sampling and often requires a larger overall sample size
than those methods.
Non-Probability Sampling
Convenience Sampling
Quota Sampling
In this sampling technique, the researcher requires the sample to contain a certain
number of items with a given characteristic. In quota sampling, you select
participants or elements non-randomly according to some fixed quota. This method is
mostly used when you need information about a subject. For example, if you want to
gather information about the culture of the people in the North East Region, you just
need to speak to the chiefs and a few elders to get the needed
information for your study.
Snowball Sampling
In snowball sampling, the researcher begins by identifying someone who meets the
criteria for inclusion in his or her study. This person then directs the
researcher to others who also meet the criteria. Although this method hardly
leads to representative samples, there are times when it may be the best method
available. Snowball sampling is especially useful when you are trying
to reach populations that are inaccessible or hard to find. For instance, if you are
conducting a study on widows in a rural community, you are not likely to be able to
locate all the widows, but if you locate one, that person can direct you
to the next, and this continues until you reach your desired sample size.
ACTIVITY 2
1. What is the difference between primary and secondary data sources? Give two
examples of each relevant to the field of nutrition or dietetics. ……………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2. What sampling method is suitable when preparing a menu for the 100 cholera
patients in a hospital? State why you choose a particular method. ………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
CHAPTER THREE
Recall that:
$10^0 = 1$ (anything raised to the power 0 equals 1)
$10^1 = 10$ (anything raised to the power 1 is the value itself)
$10^2 = 10 \times 10 = 100$
$10^3 = 10 \times 10 \times 10 = 1{,}000$
So the fraction (numerator/denominator) can be multiplied by 1, 10, 100, 1000, and so
on. This multiplier varies by measure and will be addressed in each section.
Ratios
A ratio compares things. A ratio is a relationship or comparison between two values that is
expressed in the form of a common fraction such as x/y or x : y.
In a ratio, the values of x and y are completely independent. The order in which the ratio
is given is very important: the ‘thing’ that is named first in the comparison must be
written first in the ratio. For example, if the ratio of children to adults is 5 : 7, then
the ratio of adults to children is 7 : 5. Interpreting ratios is important because it gives
more meaning to your work and throws more light on the values
obtained.
Doctor to patient ratio
Teacher to student ratio
Nurse to patient ratio
Male to female ratio
After the numerator is divided by the denominator, the result is often expressed as the
result “to one” or written as the result “:1.”
Example
During the first quarter of 2013, the Baptist Medical Centre recorded 1,068 cases of
malaria. 893 cases were females and 175 were males. Calculate and interpret the
male-to-female ratio for malaria.
Solution
Males : Females
$= X : Y$
$= 175 : 893$
$= \frac{175}{893}$
$= \frac{1}{5.1} = 1 : 5.1 \approx 1 : 5$
Interpretation
This means that for every male malaria case recorded, there were about five female
malaria cases recorded in the first quarter of 2013 at BMC.
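The "to one" form of a ratio can be checked with a few lines of Python. This is a minimal sketch using the figures from the example; the function name `ratio_to_one` is hypothetical.

```python
def ratio_to_one(x, y):
    """Express the ratio x : y as 1 : r, where r = y / x rounded to 1 d.p."""
    if x <= 0:
        raise ValueError("x must be positive to express the ratio as 1 : r")
    return round(y / x, 1)

males, females = 175, 893          # figures from the malaria example
r = ratio_to_one(males, females)
print(f"male-to-female ratio = 1 : {r}")  # prints: male-to-female ratio = 1 : 5.1
```

Swapping the arguments gives the female-to-male ratio instead, which is why the order in which a ratio is stated matters.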
Trial
During the second quarter of 2013, Savelugu hospital recorded 45 meningitis
cases in adults, of which 25 were males and the rest were females. Calculate and
interpret the male-to-female ratio for meningitis.
Proportions
A proportion is a relationship or comparison of x/y in which x is included in y. Proportion
is therefore about sharing something in a given ratio. Proportion simply means comparing
one ‘thing’ to all the ‘things’ combined, where that one thing is itself counted among
them: $\frac{x}{x + y}$.
Examples of proportion;
In a proportion, you first find the total number of parts in the ratio.
For example, the ratio x : y : z gives (x + y + z) parts. Therefore, to find one part proportionally:
Proportion of x $= \frac{x}{x + y + z}$
Similarly, proportion of y $= \frac{y}{x + y + z}$
For a proportion, $10^n$ is usually 100 (that is, n = 2), so the result is often expressed as a percentage.
Example
During the first quarter of 2019, the Baptist Medical Centre recorded 1,068 cases of
malaria. 893 cases were females and 175 were males. Calculate and interpret
the proportion of malaria cases that are males.
Solution
$X : Y$
$= 175 : 1068$
$= \frac{175}{1068} \times 100 = 16.39\%$
This means that in every 100 malaria cases registered, about 16 were male cases.
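The same calculation can be sketched in Python. The function name `proportion_percent` is hypothetical, and the multiplier is fixed at 100 as in the worked solution.

```python
def proportion_percent(part, whole):
    """Return part / whole expressed as a percentage (multiplier 10^2 = 100)."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("part must lie between 0 and whole, and whole must be positive")
    return part / whole * 100

male_cases, all_cases = 175, 1068   # figures from the example above
p = proportion_percent(male_cases, all_cases)
print(f"{p:.2f}%")  # prints: 16.39%
```

Unlike a ratio, the numerator here is always part of the denominator, so the result can never exceed 100%.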
Rates
A rate is a third type of frequency measure. It is often a proportion with an added
dimension: it measures the occurrence of an event in a population over time. Rates are
sometimes used as performance improvement measures.
Some examples of rates are; Caesarean section rates, OPD attendance rates, birth rates,
death rates, admission rates and so on.
In the case of rates, X and Y are the two quantities being compared, and the added
dimension is the multiplier $10^n$, which is a constant.
Example
Solution
Number of CS = 53
Therefore CS rate $= \frac{53}{259} \times 100 = 20.5\%$ (1 d.p.) $\approx 21\%$
This means that for every 100 deliveries undertaken in the first quarter at BMC,
about 21 were C-sections.
Or: for every 100 women who delivered in the first quarter of 2016 at BMC, about 21
underwent CS.
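A rate with a selectable multiplier $10^n$ can be sketched as follows. The function name `rate` and its parameter `n` are hypothetical; the figures 53 and 259 are those used in the solution above.

```python
def rate(events, population, n=2):
    """Return events / population multiplied by 10**n (n=2 gives a rate per 100)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 10 ** n

cs_cases, deliveries = 53, 259
print(round(rate(cs_cases, deliveries), 1))        # per 100 deliveries: 20.5
print(round(rate(cs_cases, deliveries, n=3), 1))   # per 1,000 deliveries: 204.6
```

Changing n only changes the scale on which the rate is reported, not the underlying comparison.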
ACTIVITY 3
Write the answers to the following questions in the spaces provided. Make sure to answer
questions individually in order to enhance understanding.
Age group (yrs.)   Total OPD cases   Total admissions   Total deaths
Under 1yr.         3                 1                  0
1 - 4Yrs           4                 1                  0
5 - 9Yrs           4                 2                  1
10 - 14Yrs         5                 2                  1
15 - 17Yrs         7                 3                  1
18 - 19Yrs         8                 3                  0
20 - 24Yrs         11                3                  1
25 - 34Yrs         8                 2                  0
35 - 49Yrs         4                 1                  0
50 and above       6                 2                  1
Total              60                20                 5
Use the data in Table 2 to determine the following:
4. The proportion of deaths that occurred in the 5-9yr age group? …………………..
5. If 15yrs and above are considered as adults and those below children, what
proportion of:
a. Children were seen at the OPD …………………………………..
b. Adults were admitted and ………………………………..
c. Children who died ……………………………………
CHAPTER FOUR
Types of Fractions
Basically, there are three (3) main types of fractions. These are;
Proper Fraction: As the name implies, this is a fraction whose numerator is
smaller than its denominator. Some
examples are $\frac{1}{4}, \frac{5}{7}, \frac{7}{10}, \frac{3}{5}$, etc.
Improper Fraction: An improper fraction, on the other hand, is a fraction whose
numerator is larger than its denominator.
Some examples are $\frac{14}{4}, \frac{5}{3}, \frac{7}{2}, \frac{13}{5}$, etc.
Mixed Fraction: A mixed fraction is a fraction that has a whole number attached
to a fraction, for example $2\frac{1}{4}, 3\frac{5}{7}, 1\frac{7}{10}, 3\frac{3}{5}$, etc. Before simplifying this kind of
fraction, you MUST always remember to convert it into an improper fraction.
Like Fractions: Two or more fractions are said to be like if they have the same
denominator. For example, $\frac{3}{7}, \frac{4}{7}$ and $\frac{5}{7}$ are like fractions.
Unlike Fractions: Unlike fractions have different denominators, for example $\frac{2}{6}$ and $\frac{3}{8}$.
fraction has smaller numbers in it. Therefore, a fraction is in its simplest form or lowest
terms if it cannot be reduced any further. A simpler form of any fraction can be found by
dividing the numerator and denominator by the same number. The number must be a
factor of both the numerator and the denominator. For example, for the fraction
$\frac{8}{6}$, the simplest form is obtained by dividing both the numerator and denominator by 2,
which is a common factor, giving $\frac{4}{3}$.
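Reducing a fraction to its lowest terms can be sketched in Python. This illustration divides by the greatest common factor in a single step, which is one possible way to apply the rule above; the function name `simplify` is hypothetical.

```python
from math import gcd

def simplify(numerator, denominator):
    """Divide numerator and denominator by their greatest common factor."""
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    common = gcd(numerator, denominator)
    return numerator // common, denominator // common

print(simplify(8, 6))    # (4, 3), as in the example above
print(simplify(16, 20))  # (4, 5)
```

Dividing repeatedly by any common factor, such as 2, reaches the same result; using the greatest common factor simply gets there in one step.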
Example 1: Convert $2\frac{1}{4}$ into an improper fraction
Then add the result to the numerator: 8 + 1 = 9
Finally, rewrite the result in fractional form, maintaining the denominator: $= \frac{9}{4}$
In the example above, $\frac{9}{4}$ = 2 remainder 1
Hence $\frac{9}{4} = 2\frac{1}{4}$
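Both conversions can be sketched in Python. The function names are hypothetical. The first multiplies the whole number by the denominator and adds the numerator; the second uses division with remainder.

```python
def mixed_to_improper(whole, numerator, denominator):
    """Convert e.g. 2 1/4 to 9/4: (whole * denominator + numerator) over denominator."""
    return whole * denominator + numerator, denominator

def improper_to_mixed(numerator, denominator):
    """Convert e.g. 9/4 to 2 1/4: the quotient is the whole part, the remainder the new numerator."""
    whole, remainder = divmod(numerator, denominator)
    return whole, remainder, denominator

print(mixed_to_improper(2, 1, 4))  # (9, 4)
print(improper_to_mixed(9, 4))     # (2, 1, 4)
```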
Like denominators
To add or subtract fractions with like denominators, you add or subtract the numerators and
maintain the denominator.
Examples
1. Simplify $\frac{2}{5} + \frac{1}{5}$
$= \frac{2 + 1}{5} = \frac{3}{5}$
2. Simplify $\frac{20}{20} - \frac{4}{20}$
$= \frac{20 - 4}{20} = \frac{16}{20} = \frac{4}{5}$
Unlike denominators
First, in adding unlike denominators, you have to change both fractions so they
are equivalent or equal, and they both have the same denominator. To do this,
you must first find the Lowest/Least Common Multiple (L.C.M) of the fractions.
A MULTIPLE of a number is a number which can be divided exactly by that
number.
E.g., multiples of 10 are 10, 20, 30, 40 and so on.
The first multiple of every number is the number itself, so a multiple of a number cannot
be less than the number.
Examples
Simplify the following: (i) $\frac{2}{5} + \frac{1}{2} + \frac{4}{3}$ (ii) $\frac{9}{4} - \frac{1}{2}$
Solution
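One way to work these two examples with the L.C.M. method is sketched below in Python. It is an illustration rather than the handout's own working: the helper names `lcm` and `add_fractions` are hypothetical, and a subtraction is written as adding a fraction with a negative numerator.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    """Lowest common multiple of two positive whole numbers."""
    return a * b // gcd(a, b)

def add_fractions(terms):
    """Add (numerator, denominator) pairs over their lowest common denominator."""
    common = reduce(lcm, (d for _, d in terms))
    # Rewrite each fraction as an equivalent fraction over the common denominator.
    total = sum(n * (common // d) for n, d in terms)
    return Fraction(total, common)   # Fraction reduces the answer to lowest terms

first = add_fractions([(2, 5), (1, 2), (4, 3)])   # 12/30 + 15/30 + 40/30
second = add_fractions([(9, 4), (-1, 2)])         # 9/4 - 2/4
print(first, second)  # 67/30 7/4
```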
Exercise
Simplify the following: (i) $\frac{1}{7} + \frac{3}{5}$ (ii) $\frac{10}{3} - \frac{4}{9}$
Multiplication and Division of Fractions
Fractions can be multiplied and simplified into simpler fractions. In multiplying fractions,
the numerators are multiplied together and the denominators are multiplied together. Also, if
numbers diagonal from each other share a common factor, they can be reduced as far as
they can go before multiplying.
Examples
1. Multiply $\frac{2}{5} \times \frac{1}{3} = \frac{2 \times 1}{5 \times 3} = \frac{2}{15}$
2. Multiply $\frac{15}{8} \times \frac{4}{25} = \frac{3}{2} \times \frac{1}{5} = \frac{3 \times 1}{2 \times 5} = \frac{3}{10}$
In example 2, the fractions were simplified diagonally using the common factors.
When dividing fractions, the basic rule is reciprocation. The first
fraction is maintained and the second fraction is turned upside down (i.e. the numerator
becomes the denominator and vice versa). The operation is then changed from
division to multiplication, and the result is simplified.
Examples
1. Simplify $\frac{5}{6} \div \frac{3}{24}$
2. Simplify $2\frac{1}{5} \div \frac{3}{10}$
Solution
1. $\frac{5}{6} \times \frac{24}{3} = \frac{5}{1} \times \frac{4}{3} = \frac{5 \times 4}{1 \times 3} = \frac{20}{3}$
2. $\frac{11}{5} \times \frac{10}{3} = \frac{11}{1} \times \frac{2}{3} = \frac{11 \times 2}{1 \times 3} = \frac{22}{3}$
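The multiplication and division rules can be checked with Python's standard `fractions` module. This is a small sketch; the helper names `multiply` and `divide` are hypothetical, and `divide` spells out the "turn the second fraction upside down" rule rather than using the built-in `/` operator.

```python
from fractions import Fraction

def multiply(a, b):
    """Numerator times numerator over denominator times denominator."""
    return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)

def divide(a, b):
    """Keep the first fraction, invert the second, then multiply."""
    reciprocal = Fraction(b.denominator, b.numerator)
    return multiply(a, reciprocal)

print(multiply(Fraction(2, 5), Fraction(1, 3)))     # 2/15
print(multiply(Fraction(15, 8), Fraction(4, 25)))   # 3/10
print(divide(Fraction(5, 6), Fraction(3, 24)))      # 20/3
print(divide(Fraction(11, 5), Fraction(3, 10)))     # 22/3 (2 1/5 written as 11/5)
```

`Fraction` always stores its value in lowest terms, so the diagonal cancelling done by hand happens automatically here.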
ORDER OF OPERATIONS (BODMAS)
When working problems involving fractions which have more than one of the following
signs; ‘of’, x, +, - and /, we use what is known as BODMAS. The following steps should
Step 1: Deal with anything in Bracket first (if there is any) i.e.’ ()’
Illustrative examples
1 1 1
2 + of 3
2 3
( 5
1. 1 − 1 ÷ 2 2.
3 4 8
3 2
)
2 4
×
3 1 3
3. 2 + ÷ 8
3 2
1
5 ( )
3 5
Percentages
A percentage is a fraction whose denominator is 100. The term ‘per cent’ means out of a
hundred. In mathematics, percentages are used to describe parts of a whole – the whole
being made up of equal parts. The percentage symbol % is used commonly to show that
the number is a percentage.
Understanding percentages is a key skill that can save you time and money and
make you more useful in the area of health.
Examples
Therefore, this means that in every 100 malaria cases recorded in 2013, there are 32
female malaria cases recorded.
Exercise
Dr Mubarik has to create the holiday schedule for the maternity unit at his hospital. He
knows that 20% of the staff will not be available because they are on holiday. Of the
remaining staff members who will be available, only 10% are certified to work in the
maternity unit. How many of the remaining staff will be available but not certified to
work in the maternity unit if the total staff is 300?
Examples
Exercise
The population of malaria cases recorded in Tamale Teaching Hospital in the year 2000
was 500,000. Over the decade the population of malaria cases grew by 8%. What is the
population of malaria cases in the year 2010?
CONVERSIONS
1. Converting from percentage to decimal
To convert from percentage to decimal, you divide by 100, and remove the "%"
sign.
The easiest way to divide by 100 is to move the decimal point 2 places to the left.
Examples
i. Convert 75% to decimal
$= \frac{75}{100} = 0.75$
ii. Convert 1.25% to decimal
$= \frac{1.25}{100} = 0.0125$
2. Converting from Decimal to percentage
To convert from decimal to percentage, you multiply the quantity by 100, i.e. you
move the decimal point two places to the right. Always remember to add the ‘%’
sign after conversion.
Examples
i. Convert 0.14 to percentage
= 0.14 × 100 = 14%
ii. Convert 0.4 to percentage
= 0.4 × 100 = 40%
3. Converting from fraction to decimal
The easiest way to convert a fraction to a decimal is to divide the top number by the
bottom number (that is, divide the numerator by the denominator).
Example
ii. Convert $\frac{1}{4}$ to decimal
= 1 ÷ 4 = 0.25
4. Converting from decimal to fraction
Converting from decimal to fraction is a bit more involved and needs a more
technical approach.
Example
i. Convert 0.64 to fraction
Firstly, write down the decimal “over” 1: $\frac{0.64}{1}$
Then multiply the numerator and denominator by 10 for every digit after the decimal point
(i.e. by 10 for 1 digit, by 100 for 2 digits, etc.): $\frac{0.64}{1} \times \frac{100}{100}$
$= \frac{64}{100} = \frac{16}{25}$
ii. Convert 1.8 to fraction
$= \frac{1.8}{1} \times \frac{10}{10}$
$= \frac{18}{10} = \frac{9}{5}$
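The decimal-to-fraction steps can be sketched in Python. To count the digits after the decimal point reliably, this sketch assumes the decimal is supplied as text such as "0.64"; the function name `decimal_to_fraction` is hypothetical.

```python
from fractions import Fraction

def decimal_to_fraction(text):
    """Convert a decimal written as text (e.g. "0.64") to a fraction in lowest terms."""
    whole, _, digits = text.partition(".")
    scale = 10 ** len(digits)           # 10 for 1 digit, 100 for 2 digits, and so on
    numerator = int(whole + digits)     # same as multiplying the decimal by scale
    return Fraction(numerator, scale)   # Fraction simplifies, e.g. 64/100 to 16/25

print(decimal_to_fraction("0.64"))  # 16/25
print(decimal_to_fraction("1.8"))   # 9/5
```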
5. Converting from fraction to percentage
The easiest way to convert a fraction to a percentage is to divide the top number
by the bottom number. Then multiply the result by 100, and add the "%" sign.
Examples
i. Convert $\frac{2}{5}$ to percentage
= 2 ÷ 5 = 0.4
= 0.4 × 100 = 40%
ii. Convert $\frac{4}{9}$ to percentage
= 4 ÷ 9 ≈ 0.44
= 0.44 × 100 = 44%
6. Converting from percentage to fraction
To convert a percentage to a fraction, first convert to a decimal (divide by 100), then
use the steps for converting a decimal to a fraction (as above).
Examples
i. Convert 45% to fraction
$= \frac{45}{100}$; simplifying this $= \frac{9}{20}$
ii. Convert 26% to fraction
$= \frac{26}{100}$; dividing by 2 $= \frac{13}{50}$
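The percentage conversions in this section can be sketched together in Python. The function names are hypothetical, and exact `Fraction` arithmetic is used instead of floating-point numbers to avoid tiny rounding errors in the results.

```python
from fractions import Fraction

def percent_to_fraction(percent):
    """45 -> 9/20: divide by 100 and reduce to lowest terms."""
    return Fraction(str(percent)) / 100

def percent_to_decimal(percent):
    """75 -> 0.75: divide by 100."""
    return float(percent_to_fraction(percent))

def to_percent(value):
    """Fraction(2, 5) or "0.14" -> percentage: multiply by 100."""
    return Fraction(value) * 100

print(percent_to_fraction(45), percent_to_fraction(26))      # 9/20 13/50
print(percent_to_decimal(75), percent_to_decimal("1.25"))    # 0.75 0.0125
print(to_percent(Fraction(2, 5)), to_percent("0.14"))        # 40 14
print(round(float(to_percent(Fraction(4, 9))), 2))           # 44.44
```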
ACTIVITY 4 A
write your answers in the spaces provided
ACTIVITY 4 B
write your answers in the spaces provided
1 Malaria 39257
2 ARI 10068
3 Anaemia 9662
5 Cataract 7157
9 Hypertension 4780
10 Pneumonia 4455
1. Express the anaemia cases as a fraction, in its lowest terms, of the total disease burden
in the health facility. ……………………………………….
2. Convert the fraction in (1) to a percentage ………………………………………..
CHAPTER FIVE
PICTORIAL AND GRAPHICAL REPRESENTATION OF DATA
Introduction
When we handle large amounts of data, individually or collectively, tables and charts can be
used to organize and summarise the data effectively and efficiently for ease of
understanding.
These statistical tools help us understand relationships, trends and distributions in the data,
which helps give a clear picture of the current situation.
Tables
A table is a set of data arranged in rows and columns. Most quantitative information can
be organized and displayed in a table. Tables are useful for demonstrating patterns,
exceptions, differences, and other relationships. Mostly, tables serve as the basis for
preparing more visual displays of data, such as graphs and charts, where some of the
detail may be lost.
Tables designed to present information to others should be free from ambiguity. This
means that they should be as simple as possible. Two or three small tables, each focusing
on a different aspect of the data, are easier to understand than a single large table that
contains many details or variables.
SCENARIO (A)
In June 2020, the total OPD morbidity reported by Baptist Health Centre was 250. Ninety
(90) of these were malaria, 70 were diarrhoeal diseases, 30 were typhoid fever, 50 were
intestinal worms and 10 were URTIs. Out of the total malaria cases, 40 were males and
50 were females. 45 females and 25 males made up the diarrhoeal diseases and 12 males
and 18 females made up the typhoid disease burden. The intestinal worms comprised 24
females and 26 males and only one case was male among the URTIs cases reported.
SCENARIO (B)
Cases               Frequency
                    Male     Female   Total
Malaria             40       50       90
Diarrhoea           25       45       70
Typhoid fever       12       18       30
Intestinal Worms    26       24       50
URTIs               1        9        10
Total               104      146      250
Source: BHC 2013
Which scenario is easier to understand? It can be seen that scenario (B) is easier to
comprehend. A table should also be self-explanatory, so that even if it is taken out of
its original context it still conveys the information the reader needs to understand it.
i. Use a precise and concise title that describes the what, where, and when of
the data in the table.
ii. Precede the title with a table number (for example, Table 2.1).
iii. Label each row and each column clearly and concisely and include the units
of measurement for the data (for example, years, mm Hg, mg/dl, rate per
100,000).
iv. Show totals for rows and columns. If you show percent (%), also give their
total (always 100).
vi. Note the source of the data in a footnote if the data are not original.
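As a rough illustration of guideline iv, the sketch below rebuilds the row and column totals of the Scenario (B) table and adds a percentage column whose total is 100. The counts are those given in Scenario (B); the variable names and the printed layout are illustrative choices only.

```python
# Counts from Scenario (B): (male, female) for each condition.
cases = {
    "Malaria": (40, 50),
    "Diarrhoea": (25, 45),
    "Typhoid fever": (12, 18),
    "Intestinal Worms": (26, 24),
    "URTIs": (1, 9),
}

# Row totals (per condition) and column totals (per gender).
row_totals = {name: male + female for name, (male, female) in cases.items()}
male_total = sum(male for male, _ in cases.values())
female_total = sum(female for _, female in cases.values())
grand_total = male_total + female_total

# Percentage share of each condition; the shares should add up to 100.
percentages = {name: 100 * total / grand_total for name, total in row_totals.items()}

def row(label, male, female, total, percent):
    """Format one table row with fixed-width columns."""
    return (label.ljust(18) + str(male).rjust(6) + str(female).rjust(8)
            + str(total).rjust(7) + format(percent, ".1f").rjust(8))

print("Cases".ljust(18) + "Male".rjust(6) + "Female".rjust(8) + "Total".rjust(7) + "%".rjust(8))
for name, (male, female) in cases.items():
    print(row(name, male, female, row_totals[name], percentages[name]))
print(row("Total", male_total, female_total, grand_total, sum(percentages.values())))
```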
Types of Variables
One variable table: This is the most common type of table in descriptive
statistics which displays a frequency distribution with only one variable.
In this type of frequency distribution table, the first column shows the values or
categories of the variable represented by the data, such as age, gender, etc.
The second column shows the frequency or number of persons or events that fall
into each category. Often, a third column lists the percentage of persons or events
in each category.
Table 1: Age distribution of OPD attendance in March 2014
Two and three variables table
Table 2 shows the number of OPD attendance by both age and gender of the patient.
Age             Gender
                Male     Female   Total
15 - 19         2        8        10
20 - 24         12       18       30
25 - 29         10       2        12
30 - 34         3        3        6
35 and above    4        3        7
Total           31       34       65
Source: Gambaga Health Centre, 2020
Charts
A chart is a graphical representation of data in which the data is represented by symbols.
Symbols such as bars in a bar chart, lines in a line graph or slices in a pie chart are used
often.
Charts are often used to ease understanding of large quantities of data and the
relationships between parts of the data. They can usually be read more quickly than the
raw data and are used in a wide variety of fields. Charts are useful for displaying
information about frequency; they are statistical snapshots that help us see patterns,
trends, similarities and differences in the data. There are various types of charts,
including bar charts, pie charts, line charts, histograms, flow charts and area charts.
Types of Charts
BAR CHARTS OR BAR GRAPHS
Explanation of the term-bar chart (graph): A bar chart or a bar graph is a graph that uses
vertical or horizontal bars to represent the frequencies of the categories in a data set.
Note
Example 1: A sample of 300 college students was asked to indicate their favorite soft
drink. The survey results are shown in Table 1-6. Display the information using a bar
chart.
Table: Frequency Distribution for the Example 1
Solution: Observe that these are categorical or qualitative data. The vertical bar chart for
this information is shown in Fig. 2. The number at the top of each category represents the
number of values (frequency) for that specific group (soft drink).
A horizontal bar chart for the same soft drink information is shown in Fig. 2
Fig. 2: Vertical bar chart for Example 1
HISTOGRAMS
Explanation of the term-histogram: A histogram is a graphical display of a frequency
or a relative frequency distribution that uses classes and vertical bars (rectangles) of
various heights to represent the frequencies.
It is the graphical representation of a frequency distribution. A histogram consists of a
set of rectangles, each having its base on the horizontal axis (x-axis), centred at the
class mid-value (class midpoint), with a width equal to the class size. The vertical axis
corresponds to the frequency of the class.
Class boundaries can also be used as the horizontal axis instead of the class midpoints as
shown in the diagram below. The vertical axis is labelled ‘frequency’ and the horizontal
axis is labelled with a description of the data.
Note
Example: Display the data in Example 1-3 with a histogram using seven classes.
Solution: A histogram with seven classes for the data is shown in Fig. 4
The histogram shows the frequency count for each class, with each class having a width
of 10.
Fig. 1-8: Histogram for data in Example 1-3
FREQUENCY POLYGONS
Explanation of the term-frequency polygon: A frequency polygon is a graph that
displays the data using lines to connect points plotted for the frequencies. The frequencies
represent the heights of the vertical bars in the histograms.
Note: A frequency polygon provides an estimate of the shape of the distribution of the
population.
Example 1-7: Display a frequency polygon for the data in Example 1-3.
Solution: The display given in Fig. 1-9 shows the frequency polygon superimposed on
the histogram for Example 1-6.
Pie Charts or pie graphs
A pie graph or pie chart is a circle that is divided into slices according to the percentage
of the data values in each category.
A pie chart allows us to observe the proportions of sectors relative to the entire data set. It
can be used to display either qualitative or quantitative data. However, categorical or
qualitative data readily lend themselves to this type of graphical display because of the
inherent categories in the data set.
Steps in constructing a pie chart
Example
The table below gives the distribution of women who visited the nutrition unit of a
hospital in some selected five communities in Ghana in the Northern and North East
Region in 2020.
Solution
Savulugu $= \frac{240}{720} \times 360^{\circ} = 120^{\circ}$
[Pie chart with one slice per community: Nalerigu, Gambaga, Walewale, Gushegu and Savulugu; slice labels 8%, 13%, 33%, 25% and 21%]
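The sector calculation used for Savulugu can be sketched in Python as follows. Only the Savulugu figures (240 women out of a total of 720) are taken from the worked solution; the function names are illustrative, and the same two functions would be applied to each of the other communities in turn.

```python
def sector_angle(frequency, total):
    """Angle of a pie-chart sector in degrees: (frequency / total) x 360."""
    return 360 * frequency / total

def sector_percentage(frequency, total):
    """Share of the whole circle taken by a sector, as a percentage."""
    return 100 * frequency / total

savulugu_angle = sector_angle(240, 720)
savulugu_percent = sector_percentage(240, 720)

print(savulugu_angle)           # 120.0
print(round(savulugu_percent))  # 33
```

The angles of all the sectors should add up to 360 degrees, and the percentages to 100.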
Line Graph
A line graph, also known as a line chart, is a type of chart used to visualize the value of
something over time. It consists of a horizontal x-axis and a vertical y-axis. A line graph
is best for illustrating a trend over time. More than one line may be plotted on the same
axes for comparison.
In statistics, we commonly use this type of graph to show a long series of data and to
compare several series. It is the method of choice for plotting rates. Line graphs are best
suited for displaying the amount of change or difference in a continuous variable, which
is usually shown on the vertical axis.
Figure 2 Trend of discharges by month in Goaso Hospital, 2018 [line graph; vertical axis: Frequency, horizontal axis: Month]
Figure 4 Institutional Maternal Deaths in Cape Coast and UCC hosp, 2000-2020 [line graph; horizontal axis: Time (Years)]
ACTIVITY 5 A
Students are expected to do this activity in their personal graph books, and submit to the
tutor for assessment.
1. At Chereponi Hospital in the month of June 2020, the following data was
gathered with regards to the number of children admitted at the nutrition ward and
their towns. Use the table below to construct a bar chart.
2. The table below shows the distribution of health staff in Tamale Teaching
Hospital in 2020.
FREQUENCY DISTRIBUTIONS
Explanation of the term-frequency distribution: A frequency distribution is an
organization of raw data in tabular form, using classes (or intervals) and frequencies.
The types of frequency distributions that will be considered in this section are categorical,
ungrouped, and grouped frequency distributions.
Explanation of the term-frequency count: The frequency or the frequency count for a
data value is the number of times the value occurs in the data set.
Example: The blood types of 25 blood donors are given below. Summarize the data
using a frequency distribution.
Solution: We will represent the blood types as classes and the number of occurrences for
each blood type as frequencies. The frequency table (distribution) in Table 1-1
summarizes the data.
QUANTITATIVE FREQUENCY DISTRIBUTIONS-UNGROUPED
Explanation of the term-ungrouped frequency distribution: An ungrouped frequency
distribution simply lists the data values with the corresponding number of times or
frequency count with which each value occurs.
Example 1-2: The following data represent the number of defectives observed each day
over a 25-day period for a manufacturing process. Summarize the information with a
frequency distribution.
Solution: The frequency distribution for the number of defects is shown in Table 1-2.
Table 1-2: Frequency Table for Example 1-2
Example: The weights of 30 female students majoring in Physical Education on a college
campus are given below. Summarize the information with a frequency distribution using
seven classes.
143 151 136 127 132 132 126 138 119 104
113 90 126 123 121 133 104 99 112 129
107 139 122 137 112 121 140 134 133 123
Solution: A grouped frequency distribution for the data using seven classes is presented
in Table 1-5. Observe, for instance, that the upper limit value for the first class and the
lower limit value for the second class have the same value, 95. The value of 95 cannot be
included in both classes, so the convention that will be used here is that the upper limit of
each class is not included in the interval of values; only the lower limit value is included
in the interval.
Thus, the value of 95 is included only in the interval of values for the second class.
Note: The class width for this frequency distribution is 10. It is obtained by subtracting
the lower class limit for any class from the lower class limit for the next class. For the
third class, the class width = 115 - 105 = 10.
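The grouping convention described above (lower limit included, upper limit excluded, class width 10) can be sketched in Python using the 30 weights from the example. The starting limit of 85 is an assumption inferred from the statement that the first class ends at 95 and the class width is 10; the variable names are illustrative.

```python
weights = [
    143, 151, 136, 127, 132, 132, 126, 138, 119, 104,
    113, 90, 126, 123, 121, 133, 104, 99, 112, 129,
    107, 139, 122, 137, 112, 121, 140, 134, 133, 123,
]

first_lower_limit = 85   # assumed: the first class is 85-95
class_width = 10
number_of_classes = 7

frequencies = [0] * number_of_classes
for weight in weights:
    # Integer division places a value equal to a shared limit (e.g. 95)
    # in the higher class, so only the lower limit is included.
    index = (weight - first_lower_limit) // class_width
    frequencies[index] += 1

for i, frequency in enumerate(frequencies):
    lower = first_lower_limit + i * class_width
    print(f"{lower}-{lower + class_width}: {frequency}")
```

The seven frequencies add up to 30, the number of students in the data set.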
MEASURES OF CENTRAL TENDENCY OR LOCATION
A measure of central tendency is a single value that describes a set of data by identifying
the central position within that data set. As such, these measures are also called measures
of central location.
A measure of central location is the single value that best represents a characteristic, such
as age or height, of a group of persons. Such measures are also classed as summary
statistics. The term central tendency dates from the late 1920s.
The mean
The mode
The median
Mean
The mean is commonly called the “average.” In formulas, the mean of a sample is usually
represented as $\bar{x}$ (read “x-bar”).
Mean $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$
Examples
Mean $\bar{x} = \frac{\sum x}{n} = \frac{29+31+24+29+30+25}{6} = \frac{168}{6} = 28$ days
2. The ages of five children seen at consulting room one at BMC are 4, 10, 24, x and
16. If their average age is 13 years, find the value of x .
Solution
$\bar{x} = \frac{\sum x}{n}$
$\Rightarrow \frac{4+10+24+x+16}{5} = 13$
$\Rightarrow \frac{54+x}{5} = 13$
$\Rightarrow 54 + x = 65$
$\Rightarrow x = 11$ years
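Both mean examples can be reproduced with a few lines of Python. This is a minimal sketch; the function names `mean` and `missing_value` are illustrative and are not part of the course text.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def missing_value(known_values, target_mean, n):
    """Solve (sum of known values + x) / n = target_mean for x."""
    return target_mean * n - sum(known_values)

print(mean([29, 31, 24, 29, 30, 25]))         # 28.0
print(missing_value([4, 10, 24, 16], 13, 5))  # 11
```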
TRIAL
Median
Median means middle, and the median is the middle of a set of data that has been put into
rank order.
Specifically, it is the value that divides a set of data into two halves, with one half of the
observations being larger than the median value, and the other half smaller. It can also be
the arithmetic mean of the two middle values. The symbol for the median is MD.
If n is odd, then the median is the middle item, i.e. the $\frac{1}{2}(n+1)$th item or observation.
If n is even, then the median is the arithmetic mean of the two middle items, i.e. the
$\frac{1}{2}n$th and $(\frac{1}{2}n+1)$th items, or the $\frac{1}{2}n$th item and the next item after it.
I. 2,1,2,3,5
II. 3,7,2,9,8,11,12
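Solution
I. In rank order the data are 1, 2, 2, 3, 5.
Median = $\frac{1}{2}(5+1)$th item = 3rd item = 2
II. In rank order the data are 2, 3, 7, 8, 9, 11, 12.
Median = $\frac{1}{2}(7+1)$th item = 4th item = 8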
I. 15, 7, 13, 9, 10, 11
II. 9,3,1,5,8,5,7,4,8,4,7,6
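Solution
I. In rank order the data are 7, 9, 10, 11, 13, 15.
Median = $\frac{1}{2}(6)$th item and the next, i.e. the 3rd and 4th items
$= \frac{10+11}{2} = 10.5$
II. In rank order the data are 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9.
Median = $\frac{1}{2}(12)$th item and the next, i.e. the 6th and 7th items
$= \frac{5+6}{2} = 5.5$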
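The odd and even rules for the median can be sketched as one short Python function, applied here to the four data sets in the examples above. The function name is illustrative only.

```python
def median(values):
    """Middle value of the ranked data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # put the data into rank order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # the (n + 1)/2-th item, counting from 1
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 1, 2, 3, 5]))                         # 2
print(median([3, 7, 2, 9, 8, 11, 12]))                 # 8
print(median([15, 7, 13, 9, 10, 11]))                  # 10.5
print(median([9, 3, 1, 5, 8, 5, 7, 4, 8, 4, 7, 6]))    # 5.5
```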
TRIALS
1. The data below gives the number of times 20 pregnant women visited the Public
Health Unit of BMC in February 2020: 5,2,5,3,1,6,2,2,3,4,2,1,2,2,4,3,2,2,2,3.
a) What is the mean number of visits
b) What is the median?
2. The following data shows the ages of males admitted at the Male’s surgical ward
One at Margaret Marquart Catholic hospital in the Volta Region;
41,26,27,64,72,65,85,20,41. Calculate
a) The mean age of the males admitted
b) The median age
c) If new admissions are registered in the ages 32,72,99,44,57,71, what is the
new mean age of the male’s surgical ward one.
Mode
The third measure of average is called the mode. The modal value of a set of numbers is
the number which occurs most frequently. It is the observation that occurs most in a
given data. Mode may or may not exist.
If we find that every value occurs only once, the distribution has no mode. If we find that
two or more values are tied as the most common, the distribution has more than one
mode.
The mode for grouped data is the modal class. The modal class is the class with the
largest frequency.
Examples
Find the mode of the following set of numbers;
a) 3,2,2,4,3,5,4,2,4,2,3
b) 15,14,19,15,14,15,16,14,15,18,14,15,14,19,17
c) 1,2,3,4,5,6,7,8,
Solution
a) 2 occurs most frequently (four times), hence the mode is 2
b) 14 and 15 have the same maximum frequency, hence 14 and 15 are the modes
c) Here, all the observations occurred once, therefore the data has no mode.
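The three cases in this example (one mode, more than one mode, no mode) can be checked with a small Python sketch. It uses `collections.Counter` from the standard library; the function name `modes` and the choice to return an empty list when every value occurs once are illustrative decisions that follow the rule stated in this section.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs only once
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([3, 2, 2, 4, 3, 5, 4, 2, 4, 2, 3]))                                # [2]
print(modes([15, 14, 19, 15, 14, 15, 16, 14, 15, 18, 14, 15, 14, 19, 17]))    # [14, 15]
print(modes([1, 2, 3, 4, 5, 6, 7, 8]))                                         # []
```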
The mode is the value with the largest frequency. The mode can be read directly from
the frequency table. Remember that the mode is the value, not the frequency.
Example
The frequency table below shows the number of children in each of 30 families. What
is the mode of the data?
Number of children    0    1    2    3    4    5
Number of families    3    9    11   4    2    1
Solution
For grouped data, the modal class is the class with the largest frequency.
The value of the mode within the modal class of grouped data can be found from a
histogram of the distribution, as shown in the diagram below.
Using the highest rectangle (modal class), find the intersection of the lines AC and BD.
Finding the mean, median and mode from frequency distribution tables
Class Interval: It is a symbol defining a class such as 35 – 39 in the table above.
The numbers 35 and 39 are called class limits with smaller number called the
lower class limit and the larger upper class limit.
Class boundaries: They separate one class from the other. They are the smallest
and largest values an item in that class can have. The smallest value is the lower
class boundary and the larger value called the upper class boundary. For a
grouped frequency distribution, each class boundary is usually half-way between
the upper limit of one class and the lower limit of the next class. i.e. ½( upper
class limit of one class + lower class limit of the next class)
Alternatively, to find the class boundaries of a grouped data, follow the steps
below:
a) First find the difference between the upper class limit of a particular class
interval and the lower class limit of the next class and divide it by 2.
From the table below, the difference between the upper class limit of a
particular class interval and lower class limit of the next class i.e. (40-39 =
1) and half of the difference is 0.5.
b) Subtract the 0.5 obtained from the lower class limits of each class and add
the 0.5 to the upper class limit of each class to obtain the class boundaries.
I.e. for the class 35- 39, 34.5 is the lower class boundary and 39.5 is the
upper class boundary as shown in the table below.
Note that you do not always subtract 0.5 from the lower class limits and add 0.5 to the
upper class limits. For the frequency table below, following the steps above, we subtract
40 (i.e. half of 80) from the lower class limits and add 40 to the upper class limits.
In most cases, however, you subtract 0.5 from the lower class limits and add 0.5 to the
upper class limits.
The class size or width is the difference between the lower and upper boundaries of a
class interval, i.e. class size or width = upper class boundary – lower class boundary.
From the table above, the class size of the class interval 35-39 is (39.5 – 34.5) = 5.
For the class interval 45-49, the class size is (49.5 – 44.5) = 5.
The class mid-value or midpoint is the middle value of a class. It is obtained by adding
the lower and the upper class limits and dividing the result by 2; equivalently, it is the
average of the lower and the upper class boundaries, i.e.
class mid-value = ½(lower class limit + upper class limit).
From the table above, the class mid-value of the class interval 35-39 is ½(35+39) = 37.
The class mid- value (denoted by x) is often used in calculating the mean and standard
deviation of a grouped frequency distribution.
EXAMPLES
1. The frequency table below gives the exact age distribution of women that visited
BMC for nutritional attendance to their children suffering from severe wasting in
June 2020.
Age (yrs) 17 18 19 20 21
No of girls 4 10 8 5 2
a) Calculate the mean age of the women
b) Find the median age
c) What is the modal age of the women?
Solution
Median = $\frac{1}{2}(29+1)$th item
= 15th item, which corresponds to 19 years
No of surgeries    0    1    2    3    4    5    6    7
No of doctors      15   9    6    19   5    3    2    1
Find the;
Solution
No of surgeries (x)    No of doctors (f)    fx
0                      15                   0
1                      9                    9
2                      6                    12
3                      19                   57
4                      5                    20
5                      3                    15
6                      2                    12
7                      1                    7
                       ∑f = 60              ∑fx = 132

Mean number of surgeries $= \frac{\sum fx}{\sum f} = \frac{132}{60} = 2.2 \approx 2$

Median $= \frac{1}{2}(\sum f)$th item and the next
$= \frac{1}{2}(60)$th item and the next
= 30th and 31st items, which correspond to 2 and 3
$\Rightarrow \frac{2+3}{2} = 2.5 \approx 3$

Number of doctors that performed more than 2 surgeries = 19 + 5 + 3 + 2 + 1 = 30
% of doctors that performed more than 2 surgeries $= \frac{30}{60} \times 100 = 50\%$
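The calculations for this frequency table can be reproduced with the Python sketch below. The data are the surgeries and doctors from the example; the helper `item_at` and the other names are illustrative, and the median step assumes that the total frequency is even, as it is here.

```python
surgeries = [0, 1, 2, 3, 4, 5, 6, 7]     # x
doctors = [15, 9, 6, 19, 5, 3, 2, 1]     # f

total_f = sum(doctors)
total_fx = sum(x * f for x, f in zip(surgeries, doctors))
mean_surgeries = total_fx / total_f

def item_at(position):
    """Value of the position-th item (counting from 1) in the ordered data."""
    cumulative = 0
    for x, f in zip(surgeries, doctors):
        cumulative += f
        if cumulative >= position:
            return x

# Even total frequency: average the (n/2)-th item and the next item.
median_surgeries = (item_at(total_f // 2) + item_at(total_f // 2 + 1)) / 2

more_than_two = sum(f for x, f in zip(surgeries, doctors) if x > 2)
percent_more_than_two = 100 * more_than_two / total_f

print(total_f, total_fx)          # 60 132
print(mean_surgeries)             # 2.2
print(median_surgeries)           # 2.5
print(percent_more_than_two)      # 50.0
```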
3. The management of BMC wanted to know the ages of patients who died in the
hospital within the first quarter of 2013 so that they can put in place age specific
measures to reverse the trend in 2014. Records on 30 deaths were therefore
reviewed and the data obtained were as follows;
1 2 4 25 15 10 12 2 4 2
5 7 6 8 12 14 14 17 19 1
2 2 21 23 2 4 5 1 3 11
a) Group the data using a frequency table beginning with class interval 1-5, 6-10
and ending with class interval 21-25.
b) Determine the mean age of the distribution
c) What is the median age group?
d) What is the median age?
e) What is the modal age group?
f) What is the modal age?
Solution
a)
Age group    Mid-value (x)    Frequency (f)    fx           Cumulative frequency
1-5          3                15               45           15
6-10         8                4                32           19
11-15        13               6                78           25
16-20        18               2                36           27
21-25        23               3                69           30
Total                         ∑f = 30          ∑fx = 260
b) Mean $= \frac{\sum fx}{\sum f} = \frac{260}{30} = 8.7 \approx 9$ years
c) Median age group $= \frac{1}{2}(\sum f)$th item and the next
$= \frac{1}{2}(30)$th item and the next
= 15th and 16th items, which fall in the 1-5 and 6-10 age groups
$\Rightarrow \frac{3+8}{2} = 5.5$, therefore the median is within the 6-10 age group
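d) Using the formula: Median $= L_1 + \left[\frac{\frac{n}{2} - F}{f_m}\right] \times w$
Where L1 = lower class boundary of the median class
n = total frequency
F = cumulative frequency of the class before the median class
fm = frequency of the median class
w = class width
Median $= 5.5 + \left[\frac{\frac{30}{2} - 15}{4}\right] \times 5$
$= 5.5 + (0) \times 5 = 5.5$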
f) Modal age $= L_1 + \left[\frac{\Delta_1}{\Delta_1 + \Delta_2}\right] \times w$
Where L1 = lower class boundary of the modal class
∆1 = difference between the frequency of the modal class and the class
before the modal class
∆2 = difference between the frequency of the modal class and the class
after the modal class
w = class width
Mode $= 0.5 + \left[\frac{15}{15+11}\right] \times 5$
$= 0.5 + (0.58) \times 5 = 3.4$
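The grouped mean, median and mode formulas used in this solution can be sketched in Python as follows. The mid-values, frequencies and class boundaries are those of the worked example; the function and parameter names are illustrative and are not taken from the course text.

```python
def grouped_median(lower_boundary, n, cumulative_before, class_frequency, width):
    """L1 + ((n/2 - F) / fm) x w for the median class."""
    return lower_boundary + (n / 2 - cumulative_before) / class_frequency * width

def grouped_mode(lower_boundary, modal_freq, freq_before, freq_after, width):
    """L1 + (d1 / (d1 + d2)) x w for the modal class."""
    d1 = modal_freq - freq_before
    d2 = modal_freq - freq_after
    return lower_boundary + d1 / (d1 + d2) * width

mid_values = [3, 8, 13, 18, 23]      # class mid-values (x)
frequencies = [15, 4, 6, 2, 3]       # class frequencies (f)

n = sum(frequencies)
grouped_mean = sum(x * f for x, f in zip(mid_values, frequencies)) / n

# Median class 6-10: L1 = 5.5, F = 15, fm = 4, w = 5.
median_age = grouped_median(5.5, n, 15, 4, 5)

# Modal class 1-5: L1 = 0.5, no class before it (frequency 0), next class has 4.
modal_age = grouped_mode(0.5, 15, 0, 4, 5)

print(round(grouped_mean, 1))   # 8.7
print(median_age)               # 5.5
print(round(modal_age, 1))      # 3.4
```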
ACTIVITY 5 B
1. The table below gives the weights of some neonates admitted at the pediatrics’
ward at the Volta Regional Hospital Ho.
Weight (kg)           2    3    4    5    6     7    8    9
Number of children    14   21   42   83   118   12   7    3
65 84 91 58 43
57 33 53 29 40
37 14 18 92 13
22 41 27 51 63
73 33 76 80 86
72 19 51 67 27
61 39 23 22 45
19 35 39 72 47
a) Using the class interval 11-20, 21-30 … 91-100 construct a cumulative frequency
table for the data.
b) Calculate the mean age of the patients. ……………………………………….
c) i) Find the median age group ………………………………………………..
ii) Calculate the median age …………………………………………………
d) i) Find the modal age group …………………………………………………
ii) Find the modal age …………………………………………………….
A cumulative frequency table can be made from a frequency table. The cumulative
frequency for any class is the sum of the frequencies of that class and all the lower
classes.
Cumulative frequency tables are usually written in a vertical column. The last cumulative
frequency in the table should be equal to the total frequency.
See the figures below. More accurate values are obtained from the cumulative frequency
curve than from the cumulative frequency polygon. In an examination, however, draw the
diagram stated in the question.
A cumulative frequency diagram can be used to estimate values from the data. Always
draw lines on your graph to show how you obtained your answers.
MEASURES OF DISPERSION AND VARIABILITY
In statistics, dispersion (also called variability, scatter, or spread) denotes how stretched
or squeezed a distribution (theoretical or that underlying a statistical sample) is. Common
examples of measures of statistical dispersion are the variance, standard
deviation and interquartile range.
Dispersion is contrasted with location or central tendency, and together they are the most
used properties of distributions. Some common measures of dispersions are range,
quartiles, percentiles, variance, Mean deviation and standard deviations.
Range
The simplest of our methods for measuring dispersion is range. Range is the difference
between the largest value and the smallest value in the data set. While being simple to
compute, the range is often unreliable as a measure of dispersion since it is based on only
two values in the set.
Example
The following are the marks scored by ten students in a mid-semester exam at the
College of Nursing and Midwifery, Nalerigu: 12, 11, 12, 18, 7, 14, 10, 17, 15 and 13. What
is the range of the data set?
Solution
Range = 18 – 7= 11
The inter- quartile range and semi- interquartile range are slightly better measures of
dispersion than the range. They are not affected by extreme values because they are based
on the ‘middle- half’ of the data, i.e. between the upper and lower quartiles.
The interquartile range = upper quartile – lower quartile, i.e. Q3 – Q1
The semi-interquartile range is half the interquartile range, i.e. ½(Q3 – Q1)
Example:
A distribution has a lower quartile of 147 and an upper quartile of 166. Calculate the
interquartile range and hence the semi-interquartile range.
Solution
Interquartile range = 166 – 147 = 19
Semi-interquartile range = ½(19) = 9.5
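The range, interquartile range and semi-interquartile range from the two examples above can be computed with a few lines of Python. The marks and quartile values come from the examples; the variable names are illustrative.

```python
marks = [12, 11, 12, 18, 7, 14, 10, 17, 15, 13]

# Range: largest value minus smallest value.
data_range = max(marks) - min(marks)

# Interquartile and semi-interquartile range from the given quartiles.
lower_quartile, upper_quartile = 147, 166
interquartile_range = upper_quartile - lower_quartile
semi_interquartile_range = interquartile_range / 2

print(data_range)                  # 11
print(interquartile_range)         # 19
print(semi_interquartile_range)    # 9.5
```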
Exercise:
156,149,129,140,143,135,137,145,146,158,147,136,143,143,138.
Quartiles
In descriptive statistics, the quartiles of a ranked set of data values are the three points
that divide the data set into four equal groups, each group comprising a quarter of the
data. A quartile is a type of quantile.
The quartiles of a grouped data divide the set of data into four (4) groups, separated by
Q1, Q2, and Q3. The QUARTILES of a set of numbers are found in the same way as the
median. Also the same method is applied when finding the quartiles from a frequency
distribution table.
For e.g.: To find the quartiles of a set of “n” numbers: x1, x2, x3…xn
The quartiles are usually found from the cumulative frequency curve.
The First Quartile (Q1) also known as the lower quartile is defined as the middle number
between the smallest number and the median of the data set.
$Q_1 = \frac{1}{4}(n+1)$th item
The Second Quartile (Q2) which is also the median splits the data into two equal halves.
The Third Quartile (Q3) also known as the upper quartile is the middle value between the
median and the highest value of the data set.
$Q_3 = \frac{3}{4}(n+1)$th item
The interquartile range (IQR) is the difference between the upper and lower quartile. IQR
= Q3 – Q1
Alternatively, the quartiles can also be found by following the procedure.
Example
The following are the ages of patients seen in a particular room at BMC for a particular day;
42,43,36,39,41,40,49,47,7,6,15. Find the;
a) Lower quartile
b) Upper quartile
c) Interquartile range
Solution
Arranged in ascending order, the data are 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49.
n = 11
a) $Q_1 = \frac{1}{4}(n+1)$th item
$= \frac{1}{4}(11+1)$th item
= 3rd item, which corresponds to 15
b) $Q_3 = \frac{3}{4}(n+1)$th item
$= \frac{3}{4}(11+1)$th item
= 9th item, which corresponds to 43
c) IQR = Q3 – Q1
= 43 – 15 = 28
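The quartile positions ¼(n+1) and ¾(n+1) can be sketched in Python as below, using the eleven ages from the example. The function name is illustrative, and this sketch assumes that both positions are whole numbers (as they are for n = 11); it raises an error otherwise rather than guessing how to interpolate.

```python
def quartiles(values):
    """Return (Q1, Q3) using the (n + 1)/4 and 3(n + 1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1_position = (n + 1) / 4
    q3_position = 3 * (n + 1) / 4
    if q1_position != int(q1_position) or q3_position != int(q3_position):
        raise ValueError("positions are not whole numbers; interpolation is needed")
    # Positions count from 1, list indexes from 0.
    return ordered[int(q1_position) - 1], ordered[int(q3_position) - 1]

ages = [42, 43, 36, 39, 41, 40, 49, 47, 7, 6, 15]
q1, q3 = quartiles(ages)

print(q1)        # 15
print(q3)        # 43
print(q3 - q1)   # 28
```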
TRIAL
The following set of data are the ages of students who failed Introductory Statistics quiz
one; 18, 20, 23,20,23,27,24,23,29. Find the;
a) Mean age of the students
b) Median age of the students
c) Lower quartile of the data
d) Upper quartile of the data
e) Interquartile range
Percentiles
In estimating percentiles, you first start by calculating the ordinal rank and then taking
the value from the ordered list that corresponds to that rank. But before you calculate the
ordinal rank, rank your data in ascending order of magnitude. The ordinal rank n is
calculated using this formula;
$n = \left[\frac{P}{100} \times N\right]$
Where P = percentile value
N = number of values in the ordered data set
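The ordinal rank formula can be sketched in Python as follows. The text does not say what to do when the rank is not a whole number; this sketch assumes it is rounded up to the next whole number (the common "nearest rank" convention), which is an assumption rather than something stated in the course text. The data are the eleven patient ages from the quartile example, and the function name is illustrative.

```python
import math

def percentile(values, p):
    """Value at ordinal rank ceil(P/100 x N) in the ordered data (assumed rounding up)."""
    ordered = sorted(values)                      # rank the data in ascending order
    rank = math.ceil(p * len(ordered) / 100)      # ordinal rank, counting from 1
    rank = max(rank, 1)                           # guard against a rank of 0
    return ordered[rank - 1]

ages = [42, 43, 36, 39, 41, 40, 49, 47, 7, 6, 15]

print(percentile(ages, 25))    # 15
print(percentile(ages, 50))    # 40
print(percentile(ages, 100))   # 49
```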
TRIALS
1. The following data are the ages of Malaria death cases recorded at Savulugu
Hospital in April 2015; 30, 15, 50, 40 and 35. Find the 30th, 40th, 50th and 100th
percentile.
2. The marks scored by ten students in a GCOM quiz are as follows; 15, 10, 3, 6, 20,
8, 7, 8, 16 and 13. Find the 25th, 50th and 75th percentile.
Finding quartiles and percentiles from a cumulative frequency curve
The cumulative frequency curve can also be used to estimate the quartiles and
percentiles.
The lower quartile (Q1) is the mark corresponding to ¼ ∑f on the cumulative frequency
axis.
The upper quartile (Q3) is the mark corresponding to ¾ ∑f on the cumulative frequency
axis.
The nth percentile is the mark corresponding to $\frac{n}{100} \times \sum f$ on the cumulative frequency
axis.
For example, the 60th percentile is the mark corresponding to $\frac{60}{100} \times \sum f$ on the
cumulative frequency axis.
Standard Deviation and Variance
Standard deviation is the measure of spread most commonly used in statistical practice
when the mean is used as the measure of central tendency. Thus, it measures spread
around the mean. Standard deviation is influenced by outliers: a single extreme value can
contribute largely to the result. In that sense, the standard deviation is a good indicator
of the presence of outliers. This makes the standard deviation most useful as a measure of
spread for symmetrical distributions with no outliers.
Generally, the more widely spread the values are, the larger the standard deviation is. It is
represented as σ.
The standard deviation is the positive square root of the variance. For instance, if the
variance of a data set is 1.69, then the SD is $\sqrt{1.69} = 1.3$.
The variance is the square of the standard deviation. For example, if the SD of a data set
is 1.2, then the variance is $1.2^2 = 1.44$. Thus, to find the variance, we use the formulae
below for SD but without the square root sign.
It is used to measure only the spread or dispersion of data around the mean.
It is sensitive to outliers.
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$
Example
The ages of patients scheduled to undergo cleft surgery at Korle-Bu teaching hospital
are as follows; 35, 56, 19, 6, 43, 45 and 57. Find the standard deviation and variance
of the data.
SOLUTION
Mean $= \frac{\sum x}{n} = \frac{35+56+19+6+43+45+57}{7} = 37.29 \approx 37$

x        $x - \bar{x}$    $(x - \bar{x})^2$
35       -2               4
56       19               361
19       -18              324
6        -31              961
43       6                36
45       8                64
57       20               400
Total                     2150

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{2150}{7-1}} = 18.93$
NB: Students should find the variance by squaring the standard deviation
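The sample standard deviation and variance for this example can be checked with the Python sketch below. Note that the worked solution rounds the mean to 37 before taking deviations, whereas this sketch keeps the unrounded mean; both give 18.93 to two decimal places. The variable names are illustrative.

```python
import math

ages = [35, 56, 19, 6, 43, 45, 57]

n = len(ages)
mean_age = sum(ages) / n

# Sum of squared deviations from the (unrounded) mean.
sum_squared_deviations = sum((x - mean_age) ** 2 for x in ages)

variance = sum_squared_deviations / (n - 1)    # divide by n - 1 for a sample
standard_deviation = math.sqrt(variance)       # positive square root of the variance

print(round(mean_age, 2))             # 37.29
print(round(standard_deviation, 2))   # 18.93
print(round(variance, 2))             # 358.24
```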
$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n-1}}$
The variance combines all the values in a data set to produce a measure of spread.
Variance = $\sigma^2$
Example
The data below gives the ages in range of patients seen in a consulting room by a
particular doctor.
Ages                  12 - 17    18 - 23    24 - 29    30 - 35
Number of patients    3          6          4          2
Calculate the;
Solution
a) Mean $= \frac{\sum fx}{\sum f} = \frac{337.5}{15} = 22.5$
b) $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n-1}} = \sqrt{\frac{480}{14}} = \sqrt{34.3} = 5.86$
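The grouped mean and standard deviation in this example can be reproduced with the following Python sketch. The class limits and frequencies come from the example; each class is represented by its mid-value, and the variable names are illustrative.

```python
import math

class_limits = [(12, 17), (18, 23), (24, 29), (30, 35)]
frequencies = [3, 6, 4, 2]

# Class mid-values: 14.5, 20.5, 26.5, 32.5.
mid_values = [(lower + upper) / 2 for lower, upper in class_limits]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, mid_values))
grouped_mean = sum_fx / n

# Sum of f(x - mean)^2 over the classes.
sum_f_squared_dev = sum(f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, mid_values))
grouped_sd = math.sqrt(sum_f_squared_dev / (n - 1))

print(sum_fx)                  # 337.5
print(grouped_mean)            # 22.5
print(sum_f_squared_dev)       # 480.0
print(round(grouped_sd, 2))    # 5.86
```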
ACTIVITY 5 C
No of surgeries    0    1    2    3    4    5    6    7
No of doctors      15   9    6    19   5    3    2    1
Calculate the;
a) Standard deviation for the data. ………………………………….
b) Variance. ………………………………………………..
2. The following are the ages of patients scheduled to undergo hernia repairs surgery
at BMC between August and October 2017.
65 84 91 58 43
57 33 53 29 40
37 14 18 92 13
22 41 27 51 63
73 33 76 80 86
72 19 51 67 27
61 39 23 22 45
19 35 39 72 47
a) Using the class interval 11-20, 21-30 … 91-100 construct a cumulative frequency
table for the data.
b) Cumulative frequency curve
c) Using the ogive, estimate the;
i) Lower quartile ……………………………………….
ii) Median ……………………………………………….
iii) Upper quartile ………………………………………..
iv) 40th percentile …………………………………………
v) 75th percentile …………………………………………
CHAPTER SIX
INFERENTIAL STATISTICS
Inferential statistics are used to test hypotheses about a population using data from a
sample. Statistical inference is the process of using data analysis to infer properties of an
underlying probability distribution. Inferential statistical analysis infers properties of a
population, for example by testing hypotheses and deriving estimates. It is assumed that
the observed data set is sampled from a larger population.
Given a hypothesis about a population, for which we wish to draw inferences, statistical
inference consists of first selecting a statistical model of the process that generates the
data and deducing propositions from the model. The conclusion of a statistical inference
is a statistical proposition. Some common forms of statistical proposition are the Point
and Interval estimation.
Point estimate is a single value estimate of a parameter. For example a sample mean
is a point estimate of a population mean. An interval estimate on the other hand gives
you a range of values where the parameter is expected to lie or fall in. A typical
example of the interval estimate is a confidence interval.
Normal distribution
T-test
Chi square test
Correlation and Regression
Normal Distribution
The normal distribution is the most important probability distribution in statistics because
it fits many natural phenomena. For example heights, blood pressures, IQ scores follow
the normal distribution. It is also known as the Gaussian distribution and the bell curve.
The normal distribution is a probability function that describes how the values of a
variable are distributed. It is a symmetric distribution where most of the observations
cluster around the central peak and the probabilities for values further away from the
mean taper off equally in both directions.
It is a very simple concept and even made easier by the fact that distributions are very
easy to visualize. Indeed in most cases a single glimpse of a graphical representation of a
distribution will tell you more about it than several minutes of staring at a bare list of
numbers.
The normal distribution has the following properties:
1. It is bell shaped.
2. It is symmetrical about its mean: 50% of the area under the curve lies to the left
of its centre and 50% lies to the right of its centre.
3. The total area under the curve sums to 1, that is, the area of the distribution on
each side of the mean is 0.5.
4. It is unimodal.
5. The standard normal distribution has a mean of zero and a standard deviation of 1.
6. About 68% of all data falls within ± 1 SD of the mean.
7. About 95% of all data falls within ± 2 SD of the mean.
8. About 99.7% of all data falls within ± 3 SD of the mean.
9. The standard normal curve is symmetrical.
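The area-based properties in the list above can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` for the standard normal distribution; the helper name `area_within` is introduced here only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the standard normal curve within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Half of the total area lies on each side of the mean
print(f"Area to the left of the mean: {standard_normal.cdf(0):.3f}")

for k in (1, 2, 3):
    print(f"Area within ±{k} SD of the mean: {area_within(k):.4f}")
```

The printed areas are approximately 0.6827, 0.9545 and 0.9973, which are the figures usually rounded to 68%, 95% and 99.7%.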
Empirical Rule
The empirical rule tells you what percentage of normally distributed data falls within a
certain number of standard deviations of the mean:
About 68% of the data falls within one standard deviation of the mean.
About 95% of the data falls within two standard deviations of the mean.
About 99.7% of the data falls within three standard deviations of the mean.
The standard deviation controls the spread of the distribution. A smaller standard
deviation indicates that the data is tightly clustered around the mean; the normal
curve will be taller and narrower. A larger standard deviation indicates that the data is
spread out around the mean; the curve will be flatter and wider. For example, if the mean
weight of a population is 70 kg with a standard deviation of 3.4 kg, then about 68% of the
weights fall between 66.6 and 73.4 kg (70 ± 3.4, 1 SD), about 95% fall between
63.2 and 76.8 kg (2 SDs) and about 99.7% fall between 59.8 and 80.2 kg (3 SDs).
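The arithmetic in the weight example can be reproduced with a short Python sketch. The function name `empirical_rule_intervals` is illustrative only; the mean and standard deviation are the ones given in the text.

```python
def empirical_rule_intervals(mean, sd):
    """Return {coverage label: (lower, upper)} for 1, 2 and 3 SDs from the mean."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (round(mean - k * sd, 1), round(mean + k * sd, 1))
        for k, label in coverage.items()
    }


# Mean weight 70 kg, standard deviation 3.4 kg (figures from the example above)
intervals = empirical_rule_intervals(mean=70, sd=3.4)
for label, (lower, upper) in intervals.items():
    print(f"About {label} of weights fall between {lower} and {upper} kg")
```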
Skewness
ACTIVITY 6
3. ………………………………………………………………………………….
CHAPTER SEVEN
2. Bed Complement
Total number of hospital beds for admission of cases
3. Patient Days
Number of inpatients × number of days in the period
5. Crude death rate
Crude death rate = (Total deaths in a year ÷ Total population in the same year) × 1000
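As a minimal sketch, the crude death rate formula can be written as a small Python function. The function name and the figures used (120 deaths in a population of 15,000) are hypothetical and serve only to show the calculation.

```python
def crude_death_rate(total_deaths, total_population, per=1000):
    """Deaths in a year per `per` people in the population of the same year."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return total_deaths / total_population * per


# Hypothetical figures, for illustration only
rate = crude_death_rate(total_deaths=120, total_population=15_000)
print(f"Crude death rate: {rate:.1f} deaths per 1000 population")
```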
Bed Turnover Rate (BTR) is the average number of inpatients per hospital bed in a
given period.
BTR = (Discharges + Deaths) ÷ Bed complement
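The bed turnover rate calculation can be sketched in Python as follows. The function name is illustrative; the figures are read from the Male Ward row of Table 10 (102 discharges, 7 deaths, 28 beds), on the assumption that the flattened columns of that table are in that order.

```python
def bed_turnover_rate(discharges, deaths, bed_complement):
    """(Discharges + Deaths) divided by the bed complement."""
    if bed_complement <= 0:
        raise ValueError("bed_complement must be positive")
    return (discharges + deaths) / bed_complement


# Male Ward figures as read from Table 10 (column order assumed)
btr = bed_turnover_rate(discharges=102, deaths=7, bed_complement=28)
print(f"Bed turnover rate: {btr:.2f} patients per bed")
```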
Total live births in the same period
Example
Table 10: Inpatient statistics for OLG hospital, March, 2012
Ward        No. of Beds   Admissions   Discharges   Deaths   Patient days   Available bed days
Male Ward   28            117          102          7        921            840
d. Turn-over per bed
e. % bed occupancy
Solution
ACTIVITY 7
Write your answers in the spaces provided.
Table 11: Inpatient statistics for Lawra hospital, 2010-2013
2012 42 185 167 4 543 1123
2. In 2010, the male and female annual catchment populations for Presbyterian clinic
were 5,000 and 6,000 respectively. If the clinic recorded 250 and 300 deaths in males and
females respectively within the same year, calculate the:
a) Sex-specific death rate (SSDR) for males. ………………………………..
b) Sex-specific death rate (SSDR) for females ………………………………..
c) Crude death rate (CDR) of the population ………………………………..
3.
Fig. 12: Basic statistics for the 1st 10 days in Our Lady of Grace Hospital- June, 2011
Day   No. of patients in ward   Total beds in ward   Discharges   Deaths
1     5                         7                    2            0
2     3                         7                    2            1
3     4                         7                    1            1
4     6                         7                    2            0
5     7                         7                    3            0
6     6                         7                    1            0
7     5                         7                    1            1
8     4                         7                    2            1
9     3                         7                    2            0
10    2                         7                    1            0
Use the data in the table to calculate the following for the 10-day period.
1. Bed complement. ………………………………..
2. Patient days ………………………………..
3. Available bed days ………………………………..
4. Average length of stay ………………………………..
5. Turn-over interval ………………………………..
6. % bed occupancy rate ………………………………..
7. Inpatient death rate ………………………………..
4
In 2001, a total of 15,555 CSM deaths occurred among males and 4,753 CSM deaths
occurred among females. The estimated 2001 populations for males and females were
139,813,000 and 144,984,000 respectively.
Q1. Calculate the CSM-related death rates for males and females
………………………………..
Q2. What type(s) of mortality rates did you calculate in Q1?
………………………………..
Q3. Calculate the ratio of CSM mortality rates for males compared to females
………………………………..
Q4. Interpret the ratio you calculated in Q3 as if you were presenting information to a
policymaker. ………………………………..
5
Between 2000 and 2009, a total of 143,497 cases of diphtheria were reported. During the
same decade, 11,228 deaths were attributed to diphtheria. Calculate the death-to-case
ratio. ………………………………..
6
In an epidemic of cholera traced to green onions from a restaurant, 555 cases were
identified. Three (3) of the patients died as a result of their infections. Calculate the case-
fatality rate. ………………………………..