Statistics DietCook

Simplified notes on introductory statistics for dietitian assistants

Uploaded by

Tia Vitus
Copyright
© All Rights Reserved

CHAPTER ONE

INTRODUCTION TO STATISTICS
Introduction
Much of the information we receive through print and electronic media, in classrooms and
seminars, and at other functions such as workshops appears in the form of numbers. Results
from polls, weather forecasts, inflation rates, patient medical records, incidence and
prevalence rates, mortality rates and neonatal mortality rates, among others, are often
expressed as percentages, ratios or probabilities.

Concept of Statistics
Statistics is a mathematical discipline that is applied in our day-to-day activities in the
current global age. It can be applied in industry, the health sector, business, the social
environment and other areas. As health professionals, we cannot leave statistics out of the
day-to-day discharge of our duties. In general, one can say that statistics is the
methodology for collecting, analyzing, interpreting and drawing conclusions from
information.

Statistics can be defined as a mathematical body of science that deals with the collection,
organization, analysis, interpretation and presentation of data in a meaningful way.

For health care professionals, the effort to improve the quality of the services they
provide to patients never ends. The goal becomes more vital as health care budgets shrink
and the demands placed on health care systems push them to the breaking point.

The application of statistics to biological and medical data promises to have a tremendous
impact on the provision of health care and prevention of disease. The accurate
interpretation of biostatistical data can serve as the foundation for efforts to improve
public health and the quality of patient care. As with many burgeoning technologies,
however, there is much uncertainty among nursing professionals about the role of
biostatistics in health care.

INTRODUCTORY BIOSTATISTICS
Statistical methods can be used to find answers to questions like:
• What kind of data, and how much, needs to be collected?
• How should we organize and summarize the data?
• How can we analyze the data and draw conclusions from it?
• How can we assess the strength of the conclusions and evaluate their uncertainty?
That is, statistics provides methods for
• Design: Planning and carrying out research studies.
• Description: Summarizing and exploring data.
• Inference: Making predictions and generalizing about phenomena represented by
the data.
Example (Statistics in practice)
Consider the following problems:
• Agricultural problem: Is a new grain seed or fertilizer more productive?
• Medical problem: What is the right dosage of a drug for a given treatment?

Branches of Statistics
It is very important for every student to know the different branches of statistics, to
enable them to understand statistics correctly and from a more holistic point of view.
Statistics can be divided into two distinct branches, namely descriptive statistics and
inferential statistics. Both branches are employed in the scientific analysis of data and
both are equally important in understanding statistics and undertaking statistical tasks.

• Descriptive Statistics

This is the branch of statistics that involves the collection and summarization of data in
order to understand and compare the characteristics measured by the data.
Here, data are organized in the form of frequency distribution tables, pie charts and bar
charts.
Summary statistics such as the mean, mode, median and standard deviation are computed to
bring out more meaning from the data.
It is usually the first part of a statistical analysis, and different areas of study require
different kinds of descriptive analysis. For example, a nursing assistant studying the OPD
attendance for a particular month needs the daily attendance for that hospital, from which
he or she can find the average attendance by age group, gender and other parameters using
the mean, mode, percentages and other measures.
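
As a rough illustration of the summary statistics mentioned above, the sketch below computes the mean, median and mode of a week of daily OPD attendance. The attendance figures are hypothetical values made up for this example, and Python's standard `statistics` module is only one of several tools that could be used.

```python
from statistics import mean, median, mode

# Hypothetical daily OPD attendance for one week (illustrative values only).
daily_attendance = [42, 38, 45, 38, 50, 41, 38]

average = mean(daily_attendance)      # arithmetic mean of the seven days
middle = median(daily_attendance)     # middle value once the days are sorted
most_common = mode(daily_attendance)  # attendance figure that occurs most often

print(f"Mean attendance:   {average:.1f}")
print(f"Median attendance: {middle}")
print(f"Modal attendance:  {most_common}")
```

The same three calls could be applied to attendance grouped by age group or gender, one list per group.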

• Inferential Statistics

This is the branch of statistics that is concerned with drawing conclusions, or testing
hypotheses, about the characteristics of a population based on a sample drawn from that
population. As the name suggests, it involves drawing the right conclusions from the
statistical analysis that has already been performed using descriptive statistics.
Most predictions of the future and generalizations about a population made by studying a
sample come under the purview of inferential statistics. This means that after the
descriptive statistics are done, the results obtained are interpreted and generalised to
the entire population of the study. While drawing conclusions, one needs to be very
careful in selecting the sample for the study to avoid biased conclusions; a
representative sample is therefore necessary.

Importance of Statistics in Healthcare


Statistics is very important in many fields of human activity, as it provides the basis for
good decision making and better assessment of chances and policies, and in the long run
supports sound inference and judgement. In healthcare delivery, the key uses of statistics
are the following.

1. As a healthcare professional, it makes your work reliable by grounding it in data and
numerical reasoning. It provides you with the knowledge and methods for
organising and interpreting data so that an ordinary person can understand it.
2. It helps solve the problem of variation among individual members or characteristics in
a population of study, for example height, weight, haemoglobin level, temperature,
etc. Such measurements hardly ever give the same result when taken at different times,
hence the need to use statistics to obtain a mean value for that individual.
3. To the Ministry of Health, statistics is vital for planning, budgeting, forecasting
and implementation of key health policies. Some distinct uses of statistics include;
i. To test the efficacy of a drug
ii. To predict bed availability and occupancy
iii. To calculate incidence and prevalence rates
iv. To help us understand medical literature

4. Analysis and Presentation of Data - Tools like SPSS, Minitab, R and
Microsoft Excel are used to analyze data and present it in tables, charts, etc. to
give it a meaningful interpretation.
5. Resource Allocation - It guides managers on where to allocate more resources.

Definition of Basic Concepts and Terminologies


1. Data: Data are raw facts or figures that are not yet processed. Conclusions are
drawn from data.
2. Data Set: A collection of data values.
3. Data Value or Datum: A value in a data set.
4. Information: Information is data that has been processed, organized and
interpreted so that meaning emerges from it. In short, information is processed
data.
5. Frequency: This can be defined as the number of times an event occurs.
6. Population: This can be defined as the entire pool from which a sample is drawn.
It can also be said to be the complete group of units being investigated. Some
examples are;
• All the students of College of Nursing and Midwifery Nalerigu,
• All the patients in the hospital,
• All the people in the East Mamprusi District,
• All the children under 5 years in Nalerigu, and so on.
7. Finite Population: This is a type of population whose members can be
counted. It is also called a countable population. It is a population with a fixed
number of elementary units. For example, the number of students in the health
promotion class, the number of patients in the female ward at the Baptist
Medical Centre, the number of nurses in Tamale Teaching Hospital, etc.
8. Infinite Population: This is a type of population whose members cannot, in
practice, be counted. It is an uncountable population. Examples are the number of
germs in the body of a sick child, the number of plasmodium parasites inside a
malaria patient, etc.

9. Population Size: This is the total number of elements in a population. It tells us
how many items there are in a population. For example, the total number of
patients in the emergency ward at Korle Bu Teaching Hospital.
10. Sample: A sample is a subset of a population. It is a fraction of an entire
population. For a sample to be accurate and acceptable, it has to be representative
and free of bias. An example of a sample is the Muslim students in Nalerigu
NMTC, taken from all the students of the college.
11. Representative Sample: This is a sample drawn from a population of interest that
has the same characteristics as the population.
12. Biased Sample: This is a sample in which certain characteristics are more likely to be
included than other characteristics in the population. E.g. selecting
only Muslim students to solicit their views in order to determine a day of
worship for the school. Another example is a study on the meals
served to students by the College of Nursing and Midwifery Nalerigu in which the
researcher chooses only the diploma students as the target population.
Conclusions generated from this study cannot be representative of the entire
student body, which makes it a biased study.
13. Sample Size: The total number of elements in a sample.
14. Parameter: This is a single value, a summary number that describes an
entire population. It is a descriptive measure of a population. It uses the mean,
median, mode, percentages, etc. For example, the student population of
Nalerigu NTC is 60% female and 40% male.
15. Statistic: This is a single value, a summary number that describes a
sample. It is a descriptive measure of the sample. It also uses the mean,
median, mode, etc. For example, the mean age of students in RNAC E class is 24
years.
16. Elementary Units: Each individual element in a population is known as an
elementary unit. Some examples are; each student in a school; each patient in a
hospital; each doctor in a hospital.
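
The difference between a parameter and a statistic can be sketched with a small, made-up class list. The ages below are hypothetical, and the fixed seed is only there so that the sketch gives the same sample each time it is run.

```python
import random
from statistics import mean

# Hypothetical ages of all 10 students in a small class (the population).
population_ages = [22, 24, 23, 25, 26, 24, 22, 27, 23, 24]

# Parameter: a summary number computed from the entire population.
population_mean = mean(population_ages)

# Statistic: the same summary computed from a sample of the population.
rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample_ages = rng.sample(population_ages, 4)  # sample size of 4
sample_mean = mean(sample_ages)

print("Population size:", len(population_ages))
print("Parameter (population mean age):", population_mean)
print("Sample size:", len(sample_ages))
print("Statistic (sample mean age):", sample_mean)
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics tries to account for.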

Variables
A variable is any characteristic of a person, object or phenomenon that can take on
different values. It can be manipulated and can change from time to time. Examples are age,
gender, weight, height, time, eye colour, region, occupation, etc.

Types of Variables
There are different ways variables can be described, according to the ways they can be
studied, measured and presented. Variables can be classified as quantitative or
qualitative.

1. Quantitative Variables: These are variables that are numerical and can be
quantified. For example, the variable age is numerical, and people can be ranked
in order according to the value of their ages. Other examples of quantitative
variables are heights, weights and body temperatures. Quantitative variables can
be further classified into two groups:
a. Discrete Variables - A discrete variable is one that assumes only specific
values with no possibility of any values between them. In other words, a
discrete variable can take only non-negative whole numbers or integers.
Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be
countable. Examples of discrete variables are the number of patients in a ward,
number of children in a family, number of beds in a hospital, number of
students in a class, ages of tutors in completed years, etc.
b. Continuous Variables - A continuous variable is one that can assume all values
between any two specific values; it is a variable obtained by measuring.
Temperature, for example, is a continuous variable, since it can assume all values
between any two given temperatures. Other examples are height, weight,
time, etc.
2. Qualitative Variables (Categorical): These are variables that can be placed into
distinct categories according to some characteristic or attribute. For example, if
subjects are classified according to gender (male or female), then the variable
gender is qualitative. Other examples of qualitative variables are religious
preference and geographic location.

a. Nominal Variables – These are observations that cannot be organized in any
logical sequence. Examples are gender, occupation, place of residence, hair
colour, etc.
b. Ordinal Variables – These are observations that can be organized in a logical
sequence. Examples are standard of living, socio-economic status,
classification of anaemia, etc.
The classification of variables can be summarized as follows;

Variables
   Quantitative: Discrete, Continuous
   Qualitative (Categorical): Nominal, Ordinal

ACTIVITY 1
Table 1: Indicate whether each of the following is a qualitative or quantitative variable

Variable                                  Variable type
a. Average daily temperature              ……………………
b. Gender                                 ……………………
c. Hair colour                            ……………………
d. Height measured in number of feet      ……………………
e. Taste                                  ……………………
f. Weight of a patient                    ……………………
g. Marital status                         ……………………
h. Occupation                             ……………………
i. Nutrition status                       ……………………
j. Distance between two towns             ……………………
k. Patient blood group                    ……………………
l. Educational background                 ……………………
m. Patients seen at the OPD/day           ……………………
n. Stage of disease                       ……………………
o. Number of days it snowed               ……………………
p. Cause of death                         ……………………
q. Nationality                            ……………………
r. Number of cancer cases who died        ……………………

CHAPTER TWO

SOURCE OF DATA AND SAMPLING TECHNIQUES


Generally, in the practice of statistics, data can be classified into two main categories
depending on where they originate. These are primary data sources and
secondary data sources. Data obtained from a primary source are referred to as
primary data, while data obtained from a secondary source are referred to as secondary
data.

PRIMARY DATA
Primary data are first-hand data collected by the researcher and used directly for the
purpose for which they were intended. This type of data has not undergone any refining or
transforming process. It is collected and used for various purposes such as describing
demographic characteristics, product popularity, human rights awareness and health
education, predicting rates such as maternal and neonatal mortality, calculating OPD
attendance, etc.

There are three main methods of obtaining primary data namely;

i. Observational method
ii. Surveys
iii. Experiments

Observational Method

Observation is a data collection method by which you gather knowledge of the
phenomenon being researched by observing it as and when it
occurs. You should aim to focus your observations on human behaviour, the use of the
phenomenon and human interactions related to the phenomenon.

Advantages of Observational Method

1. It is sometimes the only method of data collection available for some specific
types of information.
2. It helps us collect original data as and when they occur, without necessarily
depending on reports by others. Information that passes through different people
loses its authenticity and originality at some stage and might not be a fair
account of what actually transpired.
3. Observation alone can capture the whole event as it occurs in its natural
environment.
4. Some subjects seem to accept an observational intrusion better than questioning.
This is because it is less demanding and would not necessarily waste their time.

Disadvantages of Observational Method

1. The observer or researcher must always be at the scene of the event when it
happens, but naturally it is impossible to predict where and when the event will
occur.
2. It is a slow and expensive method of data collection.
3. The research world is more often suited to, or designed for, subjective assessment and
recording of data than observation and participation, which makes this data
collection method somewhat inefficient.

Surveys

This is a method of primary data collection in which the researcher obtains data from the
population of interest on a subject matter using a list of questions. It may be conducted
using questionnaires, phone calls, emails and personal interviews. A survey conducted on
an entire population of interest is referred to as a census, while a survey
conducted on a sample is referred to as a sample survey.

A questionnaire is a structured document containing a printed list of questions which is
used by researchers to solicit the views of respondents on a subject matter. In a
questionnaire, respondents are either asked to choose from responses provided
by the researcher or are given the opportunity to write their opinions in a space provided
in the questionnaire. Questions where respondents choose from
the possible responses are referred to as closed-ended questions; questions where the
respondent records his or her views verbatim are referred to as open-ended questions.

Experiments

An experiment is a data collection method in which the researcher changes some variables
and observes their effect on other variables. The variables that you manipulate are referred
to as independent variables, while the variables that change as a result of the manipulation
are dependent variables. An experiment can be conducted in a laboratory or in a field setting.
For example, Tobinco Pharmaceuticals Limited is testing the effect of a drug's strength on
the number of bacteria in the body. The company decides to test the drug at strengths of
10mg, 20mg and 40mg. In this instance, the drug strength is the independent variable
while the number of bacteria is the dependent variable. The drug administered is the
treatment, while 10mg, 20mg and 40mg are the levels of the treatment.

Experimental method of data collection can be adopted in different fields like medical
research, agriculture, sociology, psychology among others.

SECONDARY DATA
Secondary data are second-hand data. They are data that have already been gathered or
published by other sources for other purposes. They are relatively faster to collect and less
expensive than primary data. Secondary data sources include journals, magazines,
newspapers, patient folders, billboards, recipe books, menus, data available from or
published by government agencies, and internet sources, among others.

SAMPLING
Statistical sampling is a process of selecting a smaller group of respondents (sample)
from the larger group (population) with the resulting sample representing the entire
population. A good sample must exhibit the characteristics of the population of interest.

Importance of Sampling

1. Cost: It saves cost and uses limited resources
2. Time: It saves time and effort
3. Accessibility: The sample is more easily accessible than the whole
population

Qualities of a Good Sample

• Representativeness: A good sample must possess all the qualities that describe
the entire population. For example, if we want to select students to travel for an
industrial tour or affiliation, we need to select students from the various programs
to attend the trip. If we select only students from a particular program or
department, then that sample is said not to be representative of the population.
• Large: A good sample must be large enough before it can be used to generalize
to the population. We cannot select just any number we wish and draw
conclusions about a population. The larger the sample size, the better the accuracy
of the results. For example, would it be proper for someone to use the bad
behaviour of three students to draw conclusions about the entire class?
• Relevance: The sample drawn from the population must be useful for the
parameter under consideration. For example, if a researcher wants to find out if a
patient has malaria, the relevant sample to take is the patient’s blood
and not the stool.

Types of Sampling
There are two types of sampling in statistics and research. These are; probability
sampling and non-probability sampling.

Probability Sampling

This is a sampling technique in which every member of the population has a known, non-zero
chance of being selected; in the simplest designs every member has an equal chance.
Because selection is left to chance, sampling is unbiased.
In this section, we shall discuss the four types of probability sampling;

• Simple random sampling
• Stratified sampling
• Systematic sampling
• Cluster sampling

Simple Random Sampling

Simple random sampling is a probabilistic sampling technique which employs the
balloting method or the lottery method, in which every individual or item in the
population has an equal chance of being selected. In sampling, we use n to represent the
sample size and N to represent the frame or population size.
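
The lottery idea can be sketched in a few lines of Python. The frame of patient numbers, the values of N and n, and the seed are all assumptions chosen for illustration; `random.sample` stands in for drawing ballots without replacement.

```python
import random

N = 50  # population (frame) size, hypothetical
n = 5   # sample size, hypothetical

# Sampling frame: hypothetical patient numbers 1..N.
frame = list(range(1, N + 1))

rng = random.Random(2024)      # seeded only so the sketch is repeatable
sample = rng.sample(frame, n)  # draws n distinct units, each equally likely

print("Selected patient numbers:", sorted(sample))
```

In a real study the seed would be left out (or recorded) so that the draw is genuinely random.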

Stratified Sampling

This is a sampling technique in which the population N is divided into subgroups or
smaller populations known as strata, and then simple random sampling is used to select the
sample in each stratum or subgroup. This sampling technique is often the most efficient
probability sampling technique because it caters for heterogeneous populations and
hence provides better precision in estimating the population parameter.
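
A minimal sketch of stratified sampling follows. The strata (programmes and student labels) are hypothetical, and proportional allocation, where each stratum contributes in proportion to its size, is an assumption of this sketch; the notes above do not say how the sample should be shared among strata.

```python
import random

# Hypothetical strata: students grouped by programme.
strata = {
    "Nursing": [f"N{i}" for i in range(1, 61)],    # 60 students
    "Midwifery": [f"M{i}" for i in range(1, 31)],  # 30 students
    "Nutrition": [f"D{i}" for i in range(1, 11)],  # 10 students
}

N = sum(len(members) for members in strata.values())  # population size: 100
n = 20                                                # overall sample size

rng = random.Random(7)  # seeded only so the sketch is repeatable
sample = {}
for name, members in strata.items():
    # Proportional allocation (an assumption of this sketch).
    n_stratum = round(n * len(members) / N)
    # Simple random sampling within the stratum.
    sample[name] = rng.sample(members, n_stratum)

for name, chosen in sample.items():
    print(name, len(chosen), "selected")
```

With these made-up sizes the 20 places are shared 12, 6 and 2, so every programme is represented.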

Systematic Sampling

In this method, observable units of the population are arranged in some order and a
random starting point is selected, then every kth member of the population is selected
for the sample. To obtain a systematic sample, the first individual or item is
chosen at random from the population of interest, and the rest are obtained
by selecting every kth individual or item thereafter. The interval k is usually
taken as N/n, the population size divided by the sample size.
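
The sketch below shows one common way of carrying out systematic sampling. It assumes the interval is k = N/n and that the random start falls within the first k units of the ordered frame; both are assumptions of this example, as are the frame and the numbers used.

```python
import random

N = 100     # population size, hypothetical
n = 10      # desired sample size, hypothetical
k = N // n  # sampling interval: every kth unit is taken

# Ordered sampling frame: hypothetical folder numbers 1..N.
frame = list(range(1, N + 1))

rng = random.Random(3)    # seeded only so the sketch is repeatable
start = rng.randrange(k)  # random starting position among the first k units
sample = frame[start::k]  # the starting unit, then every kth unit after it

print("Interval k =", k)
print("Selected folder numbers:", sample)
```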

Cluster Sampling

This method is often employed when sampling is done over a large geographical area.
The area is sub-divided into smaller areas called primary units. At this stage
several units are selected and sampling is concentrated in these units.

In cluster sampling, the population is divided into several clusters, a random
sample of clusters is selected, and finally members of the selected
clusters are sampled for the study. This is most effective when you have a large
geographical area divided into units such as electoral areas, enumeration areas,
Districts, Regions, etc. This sampling technique tends to be less precise than simple
random sampling and stratified sampling, and often requires a larger overall sample size
than those two methods.
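
The two stages described above can be sketched as follows. The communities and household labels are hypothetical, and the choice of two clusters with two households each is arbitrary.

```python
import random

# Hypothetical clusters: households listed by community.
clusters = {
    "Community A": ["A1", "A2", "A3", "A4"],
    "Community B": ["B1", "B2", "B3", "B4", "B5"],
    "Community C": ["C1", "C2", "C3"],
    "Community D": ["D1", "D2", "D3", "D4"],
}

rng = random.Random(11)  # seeded only so the sketch is repeatable

# Stage 1: randomly select clusters.
chosen_clusters = rng.sample(sorted(clusters), 2)

# Stage 2: randomly select households within each selected cluster.
sample = {name: rng.sample(clusters[name], 2) for name in chosen_clusters}

for name, households in sample.items():
    print(name, "->", households)
```

Only the selected communities need to be visited, which is where the saving in travel and cost comes from.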

Non-Probability Sampling

Non-probability sampling is one in which individuals or members of the sample are
selected without regard to their probability of occurrence. In these methods of
sampling, members of the population do not have equal chances of being selected. In
this section, we shall discuss three types of non-probability sampling;

• Convenience or accidental sampling
• Quota sampling
• Snowball sampling

Convenience Sampling

The most common type of non-probability sampling technique is convenience
sampling. Under this type of sampling, members or items are selected into the sample
based on the convenience of the researcher. A typical example of convenience
sampling is sitting in your consulting room and selecting patients as they
come in to participate in your survey. Here, only the patients who visited your
consulting room on that day are selected to participate in your study, rather than
everyone in the eligible population.

Quota Sampling

In this sampling technique, the researcher requires the sample to contain a certain
number of items with a given characteristic. In quota sampling, you select
participants or elements non-randomly according to some fixed quota. This method is
mostly used when you need information about a subject. For example, if you want to
gather information about the culture of the people in the North East Region, you just
need to speak to the chiefs and a few elders and you would get the needed
information for your study.

Snowball Sampling

In snowball sampling, the researcher begins by identifying someone who meets the
criteria for inclusion in his or her study. This person then leads or directs the
researcher to others who also meet the criteria. Although this method would hardly
lead to representative samples, there are times when it may be the best method
available. Snowball sampling is very useful when you are trying
to reach populations that are inaccessible or hard to find. For instance, if you are
conducting a study on widows in a rural community, you are not likely to be able to
locate all widows, but if you locate one person, that person can give you
directions to the next, and this continues until you reach your desired sample size.

ACTIVITY 2

1. What is the difference between primary and secondary data sources? Give two
examples of each relevant to the field of nutrition or dietetics. ……………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2. What sampling method is suitable when preparing a menu for the 100 cholera
patients in a hospital? State why you choose a particular method. ………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………

CHAPTER THREE

RATIOS, PROPORTIONS AND RATES AND THEIR INTERPRETATIONS
Ratios, proportions and rates are common mathematical tools that help us to
compare, divide and interpret basic statistical cases in health statistics. Ratios and
proportions together help us compare cases such as diseases, amounts, populations and
others. All three frequency measures have the same basic form:

$$\frac{\text{numerator}}{\text{denominator}} \times 10^{n}$$

Recall that:
$10^{0} = 1$ (anything raised to the 0 power equals 1)
$10^{1} = 10$ (anything raised to the 1st power is the value itself)
$10^{2} = 10 \times 10 = 100$
$10^{3} = 10 \times 10 \times 10 = 1,000$
So the fraction (numerator/denominator) can be multiplied by 1, 10, 100, 1000, and so
on. This multiplier varies by measure and will be addressed in each section.

Ratios
A ratio compares things. A ratio is a relationship or comparison between two values that is
expressed in the form of a common fraction like x/y or x : y.

In a ratio, the values of x and y are completely independent. The order in which the ratio
is given is very important: the ‘thing’ that is named first in the comparison must be
written first in the ratio. For example, if the ratio of child to adult is 5 : 7, then
the ratio of adult to child is 7 : 5. Interpreting ratios is very important because it gives
more meaning to your work and throws more light on the values
obtained.

Some examples of ratios are;

• Child to adult ratio
• Doctor to patient ratio
• Teacher to student ratio
• Nurse to patient ratio
• Male to female ratio

Method for calculating a ratio

$$\frac{\text{Number or rate of events, items, persons, etc. in one group}}{\text{Number or rate of events, items, persons, etc. in another group}} \times 10^{n}$$

After the numerator is divided by the denominator, the result is often expressed as the
result “to one” or written as the result “:1.”

Example
During the first quarter of 2013, the Baptist Medical Centre recorded 1,068 cases of
malaria. 893 cases were females and 175 were males. Calculate and interpret the male-to-
female ratio for malaria.
Solution

First define your variables;

Let the number of males be x

And the number of females be y

Then set up your ratio;

Males : Females

$x : y = 175 : 893$

$= \frac{175}{893}$

Reduce the fraction so that either x or y equals 1

$= \frac{1}{5.1} = 1 : 5.1$

Interpretation

This means that for every one male malaria case recorded, there were about five female
malaria cases recorded in the first quarter of 2013 at BMC.
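
The arithmetic of the malaria example above can be checked with a short sketch. The variable names are chosen for this illustration only; the counts are the ones given in the example.

```python
males = 175    # male malaria cases from the example
females = 893  # female malaria cases from the example

# Divide both sides of 175 : 893 by the male count so the ratio reads "1 : something".
females_per_male = females / males

print(f"Male-to-female ratio = 1 : {females_per_male:.1f}")
```

Dividing by the smaller side is simply a convenient way of reducing the ratio so that one side equals 1.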

Trial

In Savelugu hospital during their second quarter in 2013, they recorded 45 meningitis
cases in adults out of which 25 were males and the rest were females. Calculate and
interpret the male to female ratio for meningitis.

Proportions
A proportion is a relationship or comparison of the form x/y in which x is included in y.
Proportion is therefore about sharing something in a given ratio. It simply means comparing
one ‘thing’ to all the ‘things’ combined, in the sense that the one thing is among all the
other things: $\frac{x}{x+y}$.

Examples of proportion;

• Proportion of malaria cases among all the other cases
• Proportion of dental abscess cases among all the other abscess cases
• Proportion of female students among students in Nalerigu NTC

In proportion, you first find the total parts for the ratio;

For example the ratio x : y gives (x+y) parts

Also, the ratio x : y : z gives (x+y+z) parts. Therefore to find one part proportionally;

Proportion of x = $\frac{x}{x+y+z}$

Similarly, proportion of y = $\frac{y}{x+y+z}$

Method for calculating a proportion

$$\frac{\text{Number of persons or events with a particular characteristic}}{\text{Total number of persons or events, of which the numerator is a subset}} \times 10^{n}$$

For a proportion, $10^{n}$ is usually 100 (or n = 2), and the result is often expressed as a percentage

Example

During the first quarter of 2019, the Baptist Medical Centre recorded 1,068 cases of
malaria. 893 cases were females and 175 were males. Calculate and interpret
the proportion of malaria cases that are males.

Solution

First define your variables;

Let the number of males be x

And the total number of cases be y

Then set up your proportion;

Males : All malaria cases

$x : y = 175 : 1068$

$= \frac{175}{1068} \times 100 = 16.39\%$

This means that in every 100 malaria cases registered, about 16 were male cases.
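
The same calculation can be sketched in Python. The variable names are chosen for this illustration; the counts come from the example, and the multiplier 100 corresponds to n = 2.

```python
males = 175         # male cases (the numerator, a subset of the total)
total_cases = 1068  # all cases, male and female (the denominator)

# Proportion expressed as a percentage, i.e. multiplied by 10**2.
proportion_pct = males / total_cases * 100

print(f"Proportion of cases that are male = {proportion_pct:.2f}%")
```

Unlike the ratio, the numerator here is part of the denominator, so the result can never exceed 100%.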

Rates
A rate is a third type of frequency measure. It is often a proportion, with an added
dimension: it measures the occurrence of an event in a population over time and is
sometimes used as performance improvement measures.

The basic formula for finding rate is;

$$\frac{\text{Number of cases or events occurring during a given period}}{\text{Total number of cases or population at risk}} \times 10^{n}$$

Some examples of rates are; Caesarean section rates, OPD attendance rates, birth rates,
death rates, admission rates and so on.

In the case of rates, X and Y are the two quantities being compared, and the added
dimension in this case is the $10^{n}$, which is a constant.

Example

In the second quarter of 2016, 53 Caesarean sections (C-sections) were performed in
BMC Nalerigu and during the same time period, 259 women delivered. What is the C-
section rate for BMC in the second quarter of 2016?

Solution

Number of CS = 53

Total deliveries = 259

Therefore CS rate $= \frac{53}{259} \times 100 = 20.5\%$ (1 d.p.)

This means that for every 100 deliveries undertaken in the second quarter of 2016 at BMC,
there were about 20 C-sections done.

Or: for every 100 women who delivered in the second quarter of 2016 at BMC, about 20
of them underwent CS.
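
A small helper can express the general rate formula and reproduce the C-section example. The function name `rate` and its parameter names are hypothetical, chosen only for this sketch; the default n = 2 gives a rate per 100.

```python
def rate(events, population_at_risk, n=2):
    """Events per 10**n of the population at risk over the period studied."""
    return events / population_at_risk * 10 ** n

# C-section example: 53 C-sections among 259 deliveries in the same quarter.
c_section_rate = rate(53, 259)

print(f"C-section rate = {c_section_rate:.1f} per 100 deliveries")
```

Passing n=3 instead would express the same result per 1,000 deliveries, which is how measures such as mortality are often reported.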

ACTIVITY 3

Write the answers to the following questions in the spaces provided. Make sure to answer
questions individually in order to enhance understanding.

Table 2

Age group (yrs.)   Total OPD cases   Total admissions   Total deaths
Under 1yr.         3                 1                  0
1 - 4Yrs           4                 1                  0
5 - 9Yrs           4                 2                  1
10 - 14Yrs         5                 2                  1
15 - 17Yrs         7                 3                  1
18 - 19Yrs         8                 3                  0
20 - 24Yrs         11                3                  1
25 - 34Yrs         8                 2                  0
35 - 49Yrs         4                 1                  0
50 and above       6                 2                  1
Total              60                20                 5

Use the data in Table 2 to determine the following:

1. The admission-to-OPD case ratio …………………………

2. The death-to-admission ratio ……………………………

3. The proportion of admissions that occurred in the 18-19yr age group ………………………….

4. The proportion of deaths that occurred in the 5-9yr age group? …………………..

5. If those aged 15yrs and above are considered adults and those below 15yrs children,
what proportion of:

a. Children were seen at the OPD …………………………………..
b. Adults were admitted ………………………………..
c. Children died ……………………………………

CHAPTER FOUR

FRACTIONS, PERCENTAGES AND CONVERSIONS


Fractions
In mathematics, a fraction means part of a whole. Fraction simply means ‘part of
something’. It is expressed mathematically as $\frac{x}{y}$, where $x$ is called the numerator
and $y$ the denominator. Examples are $\frac{1}{4}$, $\frac{5}{3}$, $\frac{7}{10}$, $2\frac{1}{3}$, etc.
The numerator is the number of parts chosen and the denominator is the total number of parts.

Types of Fractions
Basically, there are three (3) main types of fractions. These are:

• Proper Fraction: As the name implies, this is a fraction that has its numerator
smaller than the denominator. Some examples are
$\frac{1}{4}$, $\frac{5}{7}$, $\frac{7}{10}$, $\frac{3}{5}$, etc.
• Improper Fraction: On the other hand, an improper fraction is a fraction that has
its numerator larger than the denominator. Some examples are
$\frac{14}{4}$, $\frac{5}{3}$, $\frac{7}{2}$, $\frac{13}{5}$, etc.
• Mixed Fraction: A mixed fraction is a fraction that has a whole number attached
to a fraction, for example $2\frac{1}{4}$, $3\frac{5}{7}$, $1\frac{7}{10}$, $3\frac{3}{5}$, etc.
You MUST always remember to convert mixed fractions into improper fractions
before working with them.

Fractions are also described as like or unlike according to their denominators:

• Like Fractions: Two or more fractions are said to be like if they have the same
denominator. For example, $\frac{3}{7}$, $\frac{4}{7}$ and $\frac{5}{7}$ are like fractions.
• Unlike Fractions: Unlike fractions have different denominators. For example,
$\frac{2}{6}$ and $\frac{3}{8}$ are unlike fractions.

Simplest Form or Lowest Term


Cancelling common factors is also called simplifying a fraction. The equivalent
fraction has smaller numbers in it. Therefore, a fraction is in its simplest form or lowest
terms if it cannot be reduced any further. A simpler form of any fraction can be found by
dividing the numerator and denominator by the same number. The number must be a
factor of both the numerator and denominator. For example, the simplest form of the
fraction $\frac{8}{6}$ is obtained by dividing both the numerator and denominator by 2,
which is a common factor, thereby giving $\frac{4}{3}$.
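As a small illustration, a fraction can be reduced to its lowest terms in Python by dividing the numerator and denominator by their greatest common factor. The helper name `simplify` is an assumption for this sketch.

```python
from math import gcd


def simplify(numerator: int, denominator: int) -> tuple[int, int]:
    """Divide numerator and denominator by their greatest common factor."""
    if denominator == 0:
        raise ValueError("denominator cannot be zero")
    common = gcd(numerator, denominator)
    return numerator // common, denominator // common


print(simplify(8, 6))    # the example from the text
```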

Mixed Numbers and Improper Fractions


We can change improper fractions to mixed numbers and vice versa.

Steps in Converting Mixed Fractions

Example 1: Convert $2\frac{1}{4}$ into an improper fraction

Multiply the whole number by the denominator of the fraction: $2 \times 4 = 8$

Then add the result to the numerator: $8 + 1 = 9$

Finally, rewrite the result in fractional form, maintaining the denominator: $\frac{9}{4}$

The reverse process of changing an improper fraction to a mixed fraction is relatively
straightforward.

In the example above, $9 \div 4 = 2$ remainder 1

Hence $\frac{9}{4} = 2\frac{1}{4}$
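The two conversions above can be sketched in Python. Both function names are illustrative assumptions; the second uses whole-number division with remainder, exactly as in the worked example.

```python
def mixed_to_improper(whole: int, numerator: int, denominator: int) -> tuple[int, int]:
    """Whole number times denominator, plus numerator, over the same denominator."""
    return whole * denominator + numerator, denominator


def improper_to_mixed(numerator: int, denominator: int) -> tuple[int, int, int]:
    """Quotient is the whole number; the remainder stays over the denominator."""
    whole, remainder = divmod(numerator, denominator)
    return whole, remainder, denominator


print(mixed_to_improper(2, 1, 4))
print(improper_to_mixed(9, 4))
```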

Addition and Subtraction of Fractions


When adding or subtracting fractions, the most important things to consider are the
denominators of the fractions that are being added or subtracted.

• Like denominators
To add or subtract fractions with like denominators, you add or subtract the
numerators and maintain the common denominator.

Examples

1. Simplify $\frac{2}{5} + \frac{1}{5}$

$= \frac{2 + 1}{5} = \frac{3}{5}$

2. Simplify $\frac{20}{20} - \frac{4}{20}$

$= \frac{20 - 4}{20} = \frac{16}{20} = \frac{4}{5}$

• Unlike denominators
To add or subtract fractions with unlike denominators, you first have to change the
fractions into equivalent fractions that all have the same denominator. To do this,
you must first find the Lowest/Least Common Multiple (L.C.M) of the denominators.

A MULTIPLE of a number is a number which can be divided exactly by that
number.
E.g., multiples of 10 are 10, 20, 30, 40 and so on.

Facts about Multiples

• Each number is a multiple of itself

• Every number is a multiple of 1

• The first multiple of every number is the number itself, so a multiple of a number
cannot be less than the number

• Multiples of any number are infinite

Examples

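Simplify the following: (i) $\frac{2}{5} + \frac{1}{2} + \frac{4}{3}$ (ii) $\frac{9}{4} - \frac{1}{2}$

Solution

(i) The L.C.M of 5, 2 and 3 is 30.
You then divide the L.C.M by each of the denominators, multiplying the
result by the corresponding numerator.
Finally, you perform your operation (+ or −):
$\frac{2}{5} + \frac{1}{2} + \frac{4}{3} = \frac{12 + 15 + 40}{30} = \frac{67}{30}$

(ii) $\frac{9}{4} - \frac{1}{2}$
The L.C.M of 4 and 2 is 4:
$\frac{9}{4} - \frac{1}{2} = \frac{9 - 2}{4} = \frac{7}{4}$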
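The L.C.M method can be sketched in Python as below. The function name `add_with_lcm` and the convention of passing a subtraction as a negative numerator are assumptions made for this illustration; `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def add_with_lcm(terms: list[tuple[int, int]]) -> Fraction:
    """Add fractions given as (numerator, denominator) pairs using the L.C.M.

    A subtraction is written as a negative numerator, e.g. (-1, 2).
    """
    common = lcm(*(denominator for _, denominator in terms))
    # Divide the L.C.M by each denominator and multiply by the numerator.
    top = sum(numerator * (common // denominator) for numerator, denominator in terms)
    return Fraction(top, common)


first = add_with_lcm([(2, 5), (1, 2), (4, 3)])    # example (i)
second = add_with_lcm([(9, 4), (-1, 2)])          # example (ii)
print(first, second)
```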

Exercise

Simplify the following: (i) $\frac{1}{7} + \frac{3}{5}$ (ii) $\frac{10}{3} - \frac{4}{9}$

Multiplication and Division of Fractions
Fractions can be multiplied and the result simplified. In multiplying fractions,
the numerators are multiplied together and the denominators are multiplied together. Also,
if a numerator and the denominator diagonally opposite it have a common factor, they can
be reduced as low as they can go before multiplying.

Examples

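1. Multiply $\frac{2}{5} \times \frac{1}{3}$

$= \frac{2 \times 1}{5 \times 3} = \frac{2}{15}$

2. Multiply $\frac{15}{8} \times \frac{4}{25}$

$= \frac{3}{2} \times \frac{1}{5} = \frac{3 \times 1}{2 \times 5} = \frac{3}{10}$

In example 2, the fractions were simplified diagonally using the common factors.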

When dividing fractions, the basic rule is reciprocation. In dividing fractions, the first
fraction is maintained and the second fraction is turned upside down (i.e. the numerator
becomes the denominator and vice versa). After that, the operation is then changed from
division to multiplication and finally simplified.

Examples

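1. Simplify $\frac{5}{6} \div \frac{3}{24}$

2. Simplify $2\frac{1}{5} \div \frac{3}{10}$

Solution

1. $\frac{5}{6} \times \frac{24}{3} = \frac{5}{1} \times \frac{4}{3} = \frac{5 \times 4}{1 \times 3} = \frac{20}{3}$

2. $\frac{11}{5} \times \frac{10}{3} = \frac{11}{1} \times \frac{2}{3} = \frac{11 \times 2}{1 \times 3} = \frac{22}{3}$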
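The multiplication and division rules can be checked with Python's standard `fractions` module. This is a sketch only; the helper name `divide` is an assumption, and it applies the reciprocal rule described above.

```python
from fractions import Fraction


def divide(first: Fraction, second: Fraction) -> Fraction:
    """Keep the first fraction, turn the second upside down, then multiply."""
    reciprocal = Fraction(second.denominator, second.numerator)
    return first * reciprocal


product = Fraction(15, 8) * Fraction(4, 25)          # multiplication example 2
quotient_1 = divide(Fraction(5, 6), Fraction(3, 24))
quotient_2 = divide(Fraction(11, 5), Fraction(3, 10))  # 2 1/5 written as 11/5

print(product, quotient_1, quotient_2)
```

`Fraction` always stores its result in lowest terms, so the diagonal cancelling is done automatically.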

ORDER OF OPERATIONS (BODMAS)
When working problems involving fractions which have more than one of the following
signs: ‘of’, x, +, - and /, we use what is known as BODMAS. The following steps should
be followed in the order shown:

Step 1: Deal with B = anything in Brackets first (if there are any) i.e. ‘( )’

Step 2: Deal with O = ‘of’ (if there is any) i.e. ‘x’

Step 3: Deal with D = division (if there is any) i.e. ‘/’

Step 4: Deal with M = multiplication (if there is any) i.e. ‘x’

Step 5: Deal with A = addition (if there is any) i.e. ‘+’

Step 6: Deal with S = subtraction (if there is any) i.e. ‘-’
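The order of the steps can be traced on a small expression. The expression below, $\frac{1}{2} + \frac{1}{3}$ of $\frac{3}{4} \div \frac{1}{8}$, is a hypothetical example chosen for this sketch and is not one of the exercises in these notes.

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)
three_quarters, eighth = Fraction(3, 4), Fraction(1, 8)

# Step 2 (O): 'of' means multiply, and it is done before the division.
of_part = third * three_quarters
# Step 3 (D): divide the result by 1/8.
divided = of_part / eighth
# Step 5 (A): the addition comes last.
result = half + divided

print(of_part, divided, result)
```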

Illustrative examples

Evaluate the following

1 1 1
2 + of 3
2 3
( 5
1. 1 − 1 ÷ 2 2.
3 4 8
3 2
)
2 4
×
3 1 3
3. 2 + ÷ 8
3 2
1
5 ( )
3 5

Percentages
A percentage is a fraction whose denominator is 100. The term ‘per cent’ means out of a
hundred. In mathematics, percentages are used to describe parts of a whole – the whole
being made up of equal parts. The percentage symbol % is used commonly to show that
the number is a percentage.

Understanding percentages is a key skill that can save you time and money and
make you more useful in the area of health.

Examples

1. Find 20% of 250 students.

$= \frac{20}{100} \times 250 = 50$ students

2. In June 2013, the total number of malaria cases recorded at BMC was 1,250, out
of which 850 were males and the rest females. What percentage of malaria cases were
females?
Solution
Number of females = 1250 – 850 = 400
% of female cases $= \frac{400}{1250} \times 100 = 32\%$

Therefore, this means that in every 100 malaria cases recorded in June 2013, there were 32
female malaria cases recorded.
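Both kinds of percentage question can be sketched in Python. The function names `percent_of` and `as_percentage` are illustrative assumptions.

```python
def percent_of(percent: float, quantity: float) -> float:
    """Find a given percentage of a quantity."""
    return percent * quantity / 100


def as_percentage(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return part * 100 / whole


students = percent_of(20, 250)                 # example 1
female_cases = 1250 - 850                      # example 2
female_share = as_percentage(female_cases, 1250)

print(students, female_cases, female_share)
```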

Exercise

Dr Mubarik has to create the holiday schedule for the maternity unit at his hospital. He
knows that 20% of the staff will not be available because they are on holiday. Of the
remaining staff members who will be available, only 10% are certified to work in the
maternity unit. How many of the remaining staff will be available but not certified to
work in the maternity unit if the total staff is 300?

Percentage Increase and Decrease


To increase or decrease a quantity by a percentage, work out the actual
increase or decrease and then add it to, or subtract it from, the original quantity to give
the new value.

Examples

1. Increase 200 by 10%
Solution
Increase $= \frac{10}{100} \times 200 = 20$
The new value = 200 + 20 = 220
2. Decrease 120 by 20%
Solution
Decrease $= \frac{20}{100} \times 120 = 24$
The new value = 120 – 24 = 96
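A minimal Python sketch of the same two steps (work out the change, then add or subtract it) is shown below. The function names are assumptions made for illustration.

```python
def increase_by(quantity: float, percent: float) -> float:
    """Work out the increase, then add it to the original quantity."""
    change = quantity * percent / 100
    return quantity + change


def decrease_by(quantity: float, percent: float) -> float:
    """Work out the decrease, then subtract it from the original quantity."""
    change = quantity * percent / 100
    return quantity - change


print(increase_by(200, 10))
print(decrease_by(120, 20))
```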

Exercise

The population of malaria cases recorded in Tamale Teaching Hospital in the year 2000
was 500,000. Over the decade the population of malaria cases grew by 8%. What is the
population of malaria cases in the year 2010?

CONVERSIONS
1. Converting from percentage to decimal
To convert from percentage to decimal, you divide by 100, and remove the "%"
sign.
The easiest way to divide by 100 is to move the decimal point 2 places to the left.
Examples
i. Convert 75% to decimal
$= \frac{75}{100} = 0.75$
ii. Convert 1.25% to decimal
$= \frac{1.25}{100} = 0.0125$
2. Converting from decimal to percentage
To convert from decimal to percentage, you multiply the quantity by 100, i.e. you
move the decimal point two places to your right. Always remember to add the ‘%’
sign after conversion.
Examples
i. Convert 0.14 to percentage
$= 0.14 \times 100 = 14\%$
ii. Convert 2.4 to percentage
$= 2.4 \times 100 = 240\%$
3. Converting from fraction to decimal

The easiest way to convert a fraction to a decimal is to divide the top number by the
bottom number (divide the numerator by the denominator in mathematical language).

Example

i. Convert $\frac{1}{4}$ to decimal

$= 1 \div 4 = 0.25$
4. Converting from decimal to fraction
Converting from decimal to fraction is a bit more involved and needs a more
technical approach.

Example
i. Convert 0.64 to fraction
Firstly, you write down the decimal “over” 1: $\frac{0.64}{1}$
You then multiply the numerator and denominator by 10 for every digit after the
decimal point (i.e. by 10 for 1 digit, by 100 for 2 digits, etc.): $\frac{0.64}{1} \times \frac{100}{100}$
Finally, you simplify: $\frac{64}{100} = \frac{16}{25}$
ii. Convert 1.8 to fraction
$= \frac{1.8}{1} \times \frac{10}{10} = \frac{18}{10} = \frac{9}{5}$
5. Converting from fraction to percentage

The easiest way to convert a fraction to a percentage is to divide the top number
by the bottom number. Then multiply the result by 100, and add the "%" sign.

Examples
i. Convert $\frac{2}{5}$ to percentage
$= 2 \div 5 = 0.4$
$= 0.4 \times 100 = 40\%$
ii. Convert $\frac{4}{9}$ to percentage
$= 4 \div 9 \approx 0.44$
$= 0.44 \times 100 = 44\%$

6. Converting from percentage to fraction

To convert a percentage to a fraction, first convert to a decimal (divide by 100), then
use the steps for converting decimals to fractions (as above).

Examples
i. Convert 45% to fraction
$= \frac{45}{100}$; simplifying this $= \frac{9}{20}$
ii. Convert 26% to fraction
$= \frac{26}{100}$; dividing by 2 $= \frac{13}{50}$
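The six conversions can be collected into one short Python sketch. All function names are assumptions for illustration; the decimal is passed as text (for example "0.64") so that `Fraction` can read its digits exactly.

```python
from fractions import Fraction


def percent_to_decimal(percent: float) -> float:
    return percent / 100


def decimal_to_percent(decimal: float) -> float:
    return decimal * 100


def fraction_to_decimal(fraction: Fraction) -> float:
    return fraction.numerator / fraction.denominator


def decimal_to_fraction(decimal_text: str) -> Fraction:
    return Fraction(decimal_text)        # already in lowest terms


def fraction_to_percent(fraction: Fraction) -> float:
    return float(fraction * 100)


def percent_to_fraction(percent: int) -> Fraction:
    return Fraction(percent, 100)        # already in lowest terms


print(percent_to_decimal(75), fraction_to_decimal(Fraction(1, 4)))
print(decimal_to_fraction("0.64"), decimal_to_fraction("1.8"))
print(percent_to_fraction(45), percent_to_fraction(26))
```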

ACTIVITY 4 A
Write your answers in the spaces provided.

Convert the following:

i. 0.14 to percentage …………………………….
ii. 56% to decimal ……………………………….
iii. $3\frac{1}{4}$ to percentage ……………………………..
iv. 19% to fraction ………………………………
v. $\frac{8}{9}$ to decimal …………………………………..
vi. 0.44 to fraction ………………………………

ACTIVITY 4 B

Write your answers in the spaces provided.

Table 5: Top 10 Morbidity in War Memorial Hospital, 2011

No.   Disease                       No. of cases

1     Malaria                       39257

2     ARI                           10068

3     Anaemia                       9662

4     Acute Eye infection           9012

5     Cataract                      7157

6     Skin diseases and ulcers      6032

7     Rheumatism and joint pain     5129

8     Diarrhoea Diseases            5124

9     Hypertension                  4780

10    Pneumonia                     4455

Use Table 5 to answer the questions that follow:

1. Express the anaemia cases as a fraction, in its lowest terms, of the total disease burden
in the health facility. ……………………………………….
2. Convert the fraction in (1) to a percentage ………………………………………..

CHAPTER FIVE
PICTORIAL AND GRAPHICAL REPRESENTATION OF DATA
Introduction

When we handle large amounts of data, tables and charts can be used to organize and
summarise the data effectively and efficiently so that they are easier to understand.

These statistical tools help us to understand relationships, trends and distributions of data
which will help give a clear picture of the current situation.

Tables
A table is a set of data arranged in rows and columns. Most quantitative information can
be organized and displayed in a table. Tables are useful for demonstrating patterns,
exceptions, differences, and other relationships. Mostly, tables serve as the basis for
preparing more visual displays of data, such as graphs and charts, where some of the
detail may be lost.

Tables designed to present information to others should be free from ambiguity. This
means that they should be as simple as possible. Two or three small tables, each focusing
on a different aspect of the data, are easier to understand than a single large table that
contains many details or variables.

SCENARIO (A)

In June 2020, the total OPD morbidity reported by Baptist Health Centre was 250. Ninety
(90) of these were malaria, 70 were diarrhoeal diseases, 30 were typhoid fever, 50 were
intestinal worms and 10 were URTIs. Out of the total malaria cases, 40 were males and
50 were females. 45 females and 25 males made up the diarrhoeal diseases and 12 males
and 18 females made up the typhoid disease burden. The intestinal worms comprised 24
females and 26 males and only one case was male among the URTIs cases reported.

SCENARIO (B)

Cases                Frequency
                     Male    Female    Total
Malaria                40        50       90
Diarrhoea              25        45       70
Typhoid fever          12        18       30
Intestinal Worms       26        24       50
URTIs                   1         9       10
Total                 104       146      250
Source: BHC 2013

Which scenario is easier to understand? Scenario (B) is clearly easier to
comprehend. A table should also be self-explanatory, so that even if it is taken
out of its original context it still conveys the information necessary for the
reader to understand it.

• To create a table that is self-explanatory, follow the guidelines below:

i. Use a precise and concise title that describes the what, where, and when of
the data in the table.

ii. Precede the title with a table number (for example, Table 2.1).

iii. Label each row and each column clearly and concisely and include the units
of measurement for the data (for example, years, mm Hg, mg/dl, rate per
100,000).

iv. Show totals for rows and columns. If you show percentages (%), also give their
total (always 100).

v. Explain any codes, abbreviations, or symbols in a footnote (for example,
Syphilis P&S = primary and secondary syphilis).

vi. Note the source of the data in a footnote if the data are not original.

Types of Tables
• One-variable table: This is the most common type of table in descriptive
statistics. It displays a frequency distribution with only one variable.
In this type of frequency distribution table, the first column shows the values or
categories of the variable represented by the data, such as age, gender, etc.
The second column shows the frequency, or number of persons or events, that fall
into each category. Often, a third column lists the percentage of persons or events
in each category.
Table 1: Age distribution of OPD attendance in March 2014

Age (years)     Frequency    Percentage

15 - 19             10          15.38
20 - 24             30          46.15
25 - 29             12          18.46
30 - 34              6           9.23
35 and above         7          10.77
Total               65         100

Source: Gambaga Health Centre, 202
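The percentage column of a one-variable table can be reproduced from the frequencies. The sketch below uses the figures from Table 1; the variable names are assumptions for illustration.

```python
# Frequencies from Table 1 (age group -> number of OPD attendances).
frequencies = {
    "15 - 19": 10,
    "20 - 24": 30,
    "25 - 29": 12,
    "30 - 34": 6,
    "35 and above": 7,
}

total = sum(frequencies.values())
percentages = {
    group: round(count * 100 / total, 2) for group, count in frequencies.items()
}

for group, count in frequencies.items():
    print(f"{group:<14}{count:>5}{percentages[group]:>8.2f}")
print(f"{'Total':<14}{total:>5}")
```

Because each percentage is rounded to two decimal places, the rounded values may add up to slightly less or more than 100.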

• Two- and three-variable tables

Data can also be cross-tabulated to show counts by a second variable.

Table 2 shows the number of OPD attendances by both age and gender of the patient.

A two-variable table with cross-tabulated data is also known as a contingency table.

Table 2: Age and Gender distribution of OPD attendance in March 2014

Age             Gender
                Male    Female    Total
15 - 19            2         8       10
20 - 24           12        18       30
25 - 29           10         2       12
30 - 34            3         3        6
35 and above       4         3        7
Total             31        34       65
Source: Gambaga Health Centre, 2020

Charts
A chart is a graphical representation of data in which the data is represented by symbols.
Symbols such as bars in a bar chart, lines in a line graph or slices in a pie chart are used
often.

Charts are often used to ease understanding of large quantities of data and the
relationships between parts of these data. They can usually be read more quickly than the
raw data and are used in a wide variety of fields. Charts are useful for displaying
information about frequency and are statistical snapshots that help us see patterns, trends,
similarities and differences in the data. There are various types of charts, including
bar charts, pie charts, line charts, histograms, flow charts, area charts, etc.

Types of Charts
BAR CHARTS OR BAR GRAPHS
Explanation of the term bar chart (graph): A bar chart or a bar graph is a graph that uses
vertical or horizontal bars to represent the frequencies of the categories in a data set.

Example 1: A sample of 300 college students was asked to indicate their favorite soft
drink. The survey results are shown in Table 1-6. Display the information using a bar
chart.
Table: Frequency Distribution for Example 1

Solution: Observe that these are categorical or qualitative data. The vertical bar chart for
this information is shown in Fig. 2. The number at the top of each bar represents the
number of values (frequency) for that specific group (soft drink).

Fig. 2: Vertical bar chart for Example 1

A horizontal bar chart for the same soft drink information is shown in Fig. 3.

Fig. 3: Horizontal bar chart for Example 1

HISTOGRAMS
Explanation of the term histogram: A histogram is a graphical display of a frequency
or a relative frequency distribution that uses classes and vertical bars (rectangles) of
various heights to represent the frequencies.
It is the graphical representation of a frequency distribution. A histogram consists of a
set of rectangles with their bases on a horizontal axis (x-axis), centred at the class
mid-values (class mid-points) and with widths equal to the class sizes. The vertical axis
corresponds to the frequency of the class.
Class boundaries can also be used on the horizontal axis instead of the class midpoints as
shown in the diagram below. The vertical axis is labelled ‘frequency’ and the horizontal
axis is labelled with a description of the data.

Example: Display the data in Example 1-3 with a histogram using seven classes.

Solution: A histogram with seven classes for the data is shown in Fig. 4.
The histogram shows the frequency count for each class, with each class having a width
of 10.

Fig. 1-8: Histogram for data in Example 1-3

FREQUENCY POLYGONS
Explanation of the term-frequency polygon: A frequency polygon is a graph that
displays the data using lines to connect points plotted for the frequencies. The frequencies
represent the heights of the vertical bars in the histograms.
Note: A frequency polygon provides an estimate of the shape of the distribution of the
population.
Example 1-7: Display a frequency polygon for the data in Example 1-3.
Solution: The display given in Fig. 1-9 shows the frequency polygon superimposed on
the histogram for Example 1-6.

Pie Charts or pie graphs

A pie graph or pie chart is a circle that is divided into slices according to the percentage
of the data values in each category.
A pie chart allows us to observe the proportions of sectors relative to the entire data set. It
can be used to display either qualitative or quantitative data. However, categorical or
qualitative data readily lend themselves to this type of graphical display because of the
inherent categories in the data set.
Steps in constructing a pie chart

1. Find the total of the category values.

2. Calculate the angle of each category:
$\text{Angle of category} = \frac{\text{Category value}}{\sum \text{of category values}} \times 360^\circ$

3. Draw the sectors on a circle diagram using the angles. To draw the diagram:
i. Draw a circle using a pair of compasses.
ii. Draw a radius as a starting line.
iii. Use a protractor to draw the angle of each sector. Make sure the protractor
is on the centre of the circle. Measure the angle very carefully.
iv. Label each sector carefully, including the size of its angle.

Example

The table below gives the distribution of women who visited the nutrition unit of a
hospital from five selected communities in the Northern and North East Regions of
Ghana in 2020.

Community     Number of Women

Nalerigu            60
Gambaga             90
Walewale           180
Gushegu            150
Savulugu           240
(i) Illustrate the data on a pie chart
(ii) What percentage of the total number of women attending ANC comes from
Gambaga?

Solution

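(i) Total number of women = 60 + 90 + 180 + 150 + 240 = 720

Angle of a sector $= \frac{\text{Number of women in the community}}{720} \times 360^\circ$

Nalerigu $= \frac{60}{720} \times 360^\circ = 30^\circ$

Gambaga $= \frac{90}{720} \times 360^\circ = 45^\circ$

Walewale $= \frac{180}{720} \times 360^\circ = 90^\circ$

Gushegu $= \frac{150}{720} \times 360^\circ = 75^\circ$

Savulugu $= \frac{240}{720} \times 360^\circ = 120^\circ$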

Figure 1 Pie chart on the ANC attendance (sector shares: Nalerigu 8%, Gambaga 13%,
Walewale 25%, Gushegu 21%, Savulugu 33%)

(ii) Percentage of women from Gambaga

$= \frac{\text{Number of women in Gambaga}}{\text{Total number of women}} \times 100$

$= \frac{90}{720} \times 100 = 12.5\% \cong 13\%$
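The sector angles and percentage shares for this example can be computed as in the sketch below. The variable names are assumptions for illustration; the figures are those of the table in the example.

```python
# Number of women per community, from the example table.
women = {
    "Nalerigu": 60,
    "Gambaga": 90,
    "Walewale": 180,
    "Gushegu": 150,
    "Savulugu": 240,
}

total = sum(women.values())
angles = {community: count * 360 / total for community, count in women.items()}
shares = {community: count * 100 / total for community, count in women.items()}

for community in women:
    print(f"{community:<10}{angles[community]:>7.1f} degrees{shares[community]:>7.1f} %")
```

A useful check before drawing is that the angles add up to 360 degrees.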

Line Graph
A line graph, also known as a line chart, is a type of chart used to visualize the value of
something over time. It consists of a horizontal x-axis and a vertical y-axis. A line graph
is best for illustrating a trend over time. More than one line may be plotted on the same
axes for comparison.

In statistics, we commonly use this type of graph to show a long series of data and to
compare several series. It is the method of choice for plotting rates. Line graphs are best
suited for displaying the amount of change or difference in a continuous variable, which
is usually shown on the vertical axis.

Figure 2 Trend of discharges by month in Goaso Hospital, 2018 (line graph; horizontal
axis: Month, vertical axis: Frequency)

Figure 4 Institutional Maternal Deaths in Cape Coast and UCC hosp, 2000-2020 (line
graph with one line each for CC hospital and UCC hospital; horizontal axis: Time (Years),
vertical axis: Deaths)

ACTIVITY 5 A

Students are expected to do this activity in their personal graph books, and submit to the
tutor for assessment.

1. At Chereponi Hospital in the month of June 2020, the following data were
gathered on the number of children admitted to the nutrition ward and
their towns. Use the table below to construct a bar chart.

Town           Number of Admissions

Jawani                 2
Gambaga                4
Za-ari                 5
Nakpanduri             6
Nalerigu              10
Nagbo                  1

2. The table below shows the distribution of health staff in Tamale Teaching
Hospital in 2020.

Category               No. of Staff

Registered Nurses          345
Enrolled Nurses            539
Psychiatric Nurses         128
Dieticians                  45
Administration             155
Doctors                     70
Use the information above to;
(i) Construct a pie chart for the data
(ii) If the total number of dieticians and registered nurses is reduced by 35%
and 15% respectively and the rest remained unchanged, what percentage
of the total number of staff are dieticians and registered nurses?

FREQUENCY DISTRIBUTIONS
Explanation of the term-frequency distribution: A frequency distribution is an
organization of raw data in tabular form, using classes (or intervals) and frequencies.
The types of frequency distributions that will be considered in this section are categorical,
ungrouped, and grouped frequency distributions.

Explanation of the term-frequency count: The frequency or the frequency count for a
data value is the number of times the value occurs in the data set.

CATEGORICAL OR QUALITATIVE FREQUENCY DISTRIBUTIONS


Explanation of the term-categorical frequency distributions: Categorical frequency
distributions represent data that can be placed in specific categories, such as gender, hair
color, or religious affiliation.

Example: The blood types of 25 blood donors are given below. Summarize the data
using a frequency distribution.

Solution: We will represent the blood types as classes and the number of occurrences for
each blood type as frequencies. The frequency table (distribution) in Table 1-1
summarizes the data.

QUANTITATIVE FREQUENCY DISTRIBUTIONS-UNGROUPED
Explanation of the term-ungrouped frequency distribution: An ungrouped frequency
distribution simply lists the data values with the corresponding number of times or
frequency count with which each value occurs.

Example 1-2: The following data represent the number of defectives observed each day
over a 25-day period for a manufacturing process. Summarize the information with a
frequency distribution.

Solution: The frequency distribution for the number of defects is shown in Table 1-2.

Table 1-2: Frequency Table for Example 1-2

Explanation of the term-cumulative frequency: The cumulative frequency for a
specific value in a frequency table is the sum of the frequencies for all values at or below
the given value.

Explanation of the term-cumulative relative frequency: The cumulative relative
frequency for a specific value in a frequency table is the sum of the relative frequencies
for all values at or below the given value.
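These two definitions can be illustrated with a short Python sketch. The values and frequencies below are hypothetical figures made up for this illustration; they are not the data of Table 1-2.

```python
from itertools import accumulate

# Hypothetical ungrouped frequency table: value -> frequency.
values = [0, 1, 2, 3, 4]
frequencies = [4, 6, 8, 5, 2]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))                 # running totals
relative = [f / total for f in frequencies]                # share of each value
cumulative_relative = [c / total for c in cumulative]      # running share

for row in zip(values, frequencies, cumulative, cumulative_relative):
    print(row)
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency always equals 1.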

QUANTITATIVE FREQUENCY DISTRIBUTIONS-GROUPED


Here we will discuss the idea of grouped frequency distributions.

Explanation of the term-grouped frequency distribution: A grouped frequency
distribution is obtained by constructing classes (or intervals) for the data, and then listing
the corresponding number of values (frequency count) in each interval.

Example: The weights of 30 female students majoring in Physical Education on a college
campus are given below. Summarize the information with a frequency distribution using
seven classes.
143 151 136 127 132 132 126 138 119 104
113 90 126 123 121 133 104 99 112 129
107 139 122 137 112 121 140 134 133 123

Solution: A grouped frequency distribution for the data using seven classes is presented
in Table 1-5. Observe, for instance, that the upper limit value for the first class and the
lower limit value for the second class have the same value, 95. The value of 95 cannot be
included in both classes, so the convention that will be used here is that the upper limit of
each class is not included in the interval of values; only the lower limit value is included
in the interval.
Thus, the value of 95 is included only in the interval of values for the second class.

Table 1-5: Grouped Frequency Distribution for Example 1-3

Note: The class width for this frequency distribution is 10. It is obtained by subtracting
the lower class limit for any class from the lower class limit for the next class. For the
third class, the class width = 115 − 105 = 10.
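The grouping convention described above (lower limit included, upper limit excluded) can be sketched in Python using the 30 weights from the example. The function name `grouped_frequencies` is an assumption, and the starting value of 85 is inferred from the text (first class ending at 95, class width 10).

```python
weights = [
    143, 151, 136, 127, 132, 132, 126, 138, 119, 104,
    113, 90, 126, 123, 121, 133, 104, 99, 112, 129,
    107, 139, 122, 137, 112, 121, 140, 134, 133, 123,
]


def grouped_frequencies(data, start, width, n_classes):
    """Count values in classes [lower, upper): lower limit in, upper limit out."""
    counts = [0] * n_classes
    for value in data:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


table = grouped_frequencies(weights, start=85, width=10, n_classes=7)
for lower, upper, count in table:
    print(f"{lower} to under {upper}: {count}")
```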

MEASURES OF CENTRAL TENDENCY OR LOCATION
A measure of central tendency is a single value that describes a set of data by identifying
the central position within that data set. As such, these measures are also called measures
of central location.

A measure of central location is the single value that best represents a characteristic such
as age or height of a group of persons. They are also classed as summary statistics. The
term central tendency dates from the late 1920s.

Basically there are three main measures of central tendency, namely:

• The mean
• The mode
• The median

Mean
The mean is commonly called the “average.” In formulas, the mean of a sample is usually
represented as $\bar{x}$ (“x-bar”).

Finding the mean for a set of numbers

The mean of a set of numbers $x_1, x_2, x_3, \dots, x_n$ is calculated as:

Mean $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$

Examples

1. In an outbreak of Cholera, 6 persons became ill with clinical symptoms after
exposure. The incubation periods for the affected persons ($x_i$) were 29, 31, 24, 29,
30, and 25 days. Calculate the mean incubation period.
Solution

Mean $\bar{x} = \frac{\sum x}{n} = \frac{29 + 31 + 24 + 29 + 30 + 25}{6} = \frac{168}{6} = 28$ days

2. The ages of five children seen at consulting room one at BMC are 4, 10, 24, $x$ and
16. If their average age is 13 years, find the value of $x$.
Solution

$\bar{x} = \frac{\sum x}{n}$

$\frac{4 + 10 + 24 + x + 16}{5} = 13$

$\frac{54 + x}{5} = 13$

$54 + x = 65$

$x = 11$ years
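Both worked examples can be checked in Python. `statistics.mean` is part of the standard library; the helper `missing_value` is an assumed name that rearranges the mean formula to find an unknown observation.

```python
from statistics import mean

# Example 1: mean incubation period in days.
incubation_days = [29, 31, 24, 29, 30, 25]
mean_incubation = mean(incubation_days)


def missing_value(known_values, target_mean, n):
    """Total needed for the target mean (mean x n) minus the known total."""
    return target_mean * n - sum(known_values)


# Example 2: four known ages, five children, average age 13 years.
missing_age = missing_value([4, 10, 24, 16], target_mean=13, n=5)

print(mean_incubation, missing_age)
```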

TRIAL

1. Calculate the mean parity of the following sets of parity data:


A: 0, 3, 0, 7, 2, 1, 0, 1, 5, 2, 4, 2, 8, 1, 3, 0, 1, 2, 1
B: 5,5,6,7,8,3,4 5,5,6,7,8,3,4
C: 1, 0, 1, 5, 2, 4, 2, 8, 1, 3, 0, 1, 2, 1
D: 5, 5, 6, 5,5,6,7,8,3,4 5,5,6,7,8,3,4
2. The ages (in years) of five doctors in TTH are 48, y, 52, 50 and (2y-5). If their
average age is 47 years, how old is the oldest doctor.
3. The average age of 20 senior nurses at BMC is 35 years and the average age of
another group of 30 senior nurses at TTH is 40 years. Find the average age of
both BMC and TTH senior nurses combined.

Median
Median means middle, and the median is the middle of a set of data that has been put into
rank order.

Specifically, it is the value that divides a set of data into two halves, with one half of the
observations being larger than the median value, and the other half smaller. It can also be
the arithmetic mean of the two middle values. The symbol for the median is MD.

Finding the median of a set of numbers

To find the median of a set of $n$ numbers $x_1, x_2, x_3, \dots, x_n$:

• Arrange the numbers in order of magnitude.

• If $n$ is odd, then the median is the middle item, i.e. the $\frac{1}{2}(n+1)$th item or observation.

• If $n$ is even, then the median is the arithmetic mean of the two middle items, i.e.
the $\frac{1}{2}n$th and $(\frac{1}{2}n + 1)$th items (the $\frac{1}{2}n$th item and the next item after it).

Examples with an odd number of observations

Find the median of the following sets of numbers

I. 2, 1, 2, 3, 5

II. 3, 7, 2, 9, 8, 11, 12

Solution

I. Rearranging the numbers in ascending order we have 1, 2, 2, 3, 5

$n = 5$ (which is odd)

Therefore the median = $\frac{1}{2}(n+1)$th item

= $\frac{1}{2}(5+1)$th item

= 3rd item, which is 2

II. Rearranging the numbers in ascending order we have 2, 3, 7, 8, 9, 11, 12

$n = 7$ (which is odd)

Therefore the median = $\frac{1}{2}(n+1)$th item

= $\frac{1}{2}(7+1)$th item

= 4th item, which is 8

Examples with an even number of observations

Find the median of the following sets of numbers

I. 15, 7, 13, 9, 10, 11

II. 9, 3, 1, 5, 8, 5, 7, 4, 8, 4, 7, 6

Solution

I. Rearranging the numbers in ascending order we have 7, 9, 10, 11, 13, 15

$n = 6$ (which is even)

Therefore the median = mean of the $\frac{1}{2}n$th item and the next

= mean of the $\frac{1}{2}(6)$th item and the next

= mean of the 3rd and 4th items

$= \frac{10 + 11}{2} = 10.5$

II. Rearranging the numbers in ascending order we have 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9

$n = 12$ (which is even)

Therefore the median = mean of the $\frac{1}{2}n$th item and the next

= mean of the $\frac{1}{2}(12)$th item and the next

= mean of the 6th and 7th items, which correspond to 5 and 6

$= \frac{5 + 6}{2} = 5.5$
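The odd and even rules can be written as one short Python function. This is a sketch under the rules stated above; the function name `median` is an assumption (Python's standard library also offers `statistics.median`).

```python
def median(values):
    """Middle item if n is odd, mean of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    middle = n // 2
    if n % 2 == 1:
        # Position (n + 1)/2 in counting from 1 is index n // 2 counting from 0.
        return ordered[middle]
    # Positions n/2 and n/2 + 1 are indices middle - 1 and middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 1, 2, 3, 5]))
print(median([15, 7, 13, 9, 10, 11]))
```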

TRIALS

1. The data below gives the number of times 20 pregnant women visited the Public
Health Unit of BMC in February 2020: 5,2,5,3,1,6,2,2,3,4,2,1,2,2,4,3,2,2,2,3.
a) What is the mean number of visits
b) What is the median?
2. The following data shows the ages of males admitted at the Male’s surgical ward
One at Margaret Marquart Catholic hospital in the Volta Region;
41,26,27,64,72,65,85,20,41. Calculate
a) The mean age of the males admitted
b) The median age
c) If new admissions are registered in the ages 32,72,99,44,57,71, what is the
new mean age of the male’s surgical ward one.

Mode
The third measure of average is called the mode. The modal value of a set of numbers is
the number which occurs most frequently. It is the observation that occurs most in a
given data. Mode may or may not exist.

If we find that every value occurs only once, the distribution has no mode. If we find that
two or more values are tied as the most common, the distribution has more than one
mode.

The mode for grouped data is the modal class. The modal class is the class with the
largest frequency.

Examples
Find the mode of the following set of numbers;

a) 3,2,2,4,3,5,4,2,4,2,3
b) 15,14,19,15,14,15,16,14,15,18,14,15,14,19,17
c) 1,2,3,4,5,6,7,8,

Solution

a) The mode is 2, since it is the most frequent occurring number

b) 14 and 15 have the same maximum frequency, hence 14 and 15 are the modes
c) Here, all the observations occurred once, therefore the data has no mode.
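The three cases above (one mode, more than one mode, no mode) can be sketched in Python with a frequency count. The function name `modes` and the choice to return an empty list when there is no mode are assumptions for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        raise ValueError("no data")
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs only once
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 2, 2, 4, 3, 5, 4, 2, 4, 2, 3]))
print(modes([15, 14, 19, 15, 14, 15, 16, 14, 15, 18, 14, 15, 14, 19, 17]))
print(modes([1, 2, 3, 4, 5, 6, 7, 8]))
```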

Mode from an Ungrouped Frequency Table

The mode is the value with the largest frequency. The mode can be read directly from
the frequency table. Remember that the mode is the value, not the frequency.

Example

The frequency table below shows the number of children in each of 30 families. What
is the mode of the data?

Number of children    0   1   2    3   4   5
Number of families    3   9   11   4   2   1

Solution

The largest frequency is 11

11 families each had 2 children

Therefore, the mode of the data is 2 children

The modal class of a grouped data

For a grouped data, the modal class is the class with the largest frequency

The mode from a histogram

The value of the mode within the modal class of a grouped data can be found from a
histogram of the distribution as shown in the diagram below

Using the tallest rectangle (the modal class), find the intersection of the lines AC and BD.
The value on the horizontal axis directly below this point of intersection is the mode.

Finding the mean, median and mode from frequency distribution tables
• Class interval: A symbol defining a class, such as 35 – 39 in the table below.
The numbers 35 and 39 are called class limits, with the smaller number called the
lower class limit and the larger the upper class limit.
• Class boundaries: They separate one class from the other. They are the smallest
and largest values an item in that class can have. The smaller value is the lower
class boundary and the larger value is the upper class boundary. For a
grouped frequency distribution, each class boundary is usually half-way between
the upper limit of one class and the lower limit of the next class, i.e. ½(upper
class limit of one class + lower class limit of the next class).
Alternatively, to find the class boundaries of grouped data, follow the steps
below:
a) First find the difference between the upper class limit of a particular class
interval and the lower class limit of the next class and divide it by 2.
From the table below, this difference is (40 – 39 = 1) and half of the
difference is 0.5.
b) Subtract the 0.5 obtained from the lower class limit of each class and add
the 0.5 to the upper class limit of each class to obtain the class boundaries.
I.e. for the class 35 – 39, 34.5 is the lower class boundary and 39.5 is the
upper class boundary as shown in the table below.

Marks Frequency Class boundaries

35- 39 3 34.5- 39.5

40- 44 2 39.5- 44.5

45- 49 4 44.5 – 49.5

Note that you do not always subtract 0.5 from the lower class limits and add
0.5 to the upper class limits. For the frequency table below, following the steps
above, we subtract 40 (i.e. half of 80) from the lower class limits and add 40 to
the upper class limits.

Marks Frequency Class boundaries

980- 1000 10 940- 1040

1080 – 1100 15 1040- 1140

1180 – 1200 24 1140 - 1240

In most cases you subtract 0.5 from the lower class limits and add 0.5 to the upper class
limits.
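
As a rough check of the two-step procedure above, the following Python sketch computes class boundaries from class limits. The function name class_boundaries is illustrative, and the sketch assumes at least two classes with the same gap between every pair of consecutive classes.

```python
def class_boundaries(limits):
    """limits: list of (lower limit, upper limit) pairs in ascending order."""
    # Step a: half the gap between one class's upper limit and the next lower limit.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    # Step b: subtract it from each lower limit and add it to each upper limit.
    return [(low - half_gap, high + half_gap) for low, high in limits]


print(class_boundaries([(35, 39), (40, 44), (45, 49)]))
# [(34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
print(class_boundaries([(980, 1000), (1080, 1100), (1180, 1200)]))
# [(940.0, 1040.0), (1040.0, 1140.0), (1140.0, 1240.0)]
```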

Class Size or Width

The class size or width is the difference between the lower and upper boundaries of a
class interval. I.e. class size or width = upper class boundary – lower class boundary

The symbol for class size or width is ‘c’ or ‘w’

From the table above, the class size of the class interval 35-39 is (39.5 – 34.5) = 5.

For the class interval 45- 49, the class size is (49.5 – 44.5) = 5.

Class mid- Value Or Midpoint

The class mid-value or midpoint is the middle value of a class. It is obtained by adding
the lower and the upper class limits and dividing the result by 2, or it is the average of the
lower and the upper class boundaries. i.e.

Class mid – value or midpoint

= ½(lower class limit + upper class limit)

= ½(lower class boundary + upper class boundary)

From the table above, the class mid- value of the class interval 35- 39 is ½(35+39) = 37

The class mid- value (denoted by x) is often used in calculating the mean and standard
deviation of a grouped frequency distribution.

EXAMPLES

1. The frequency table below gives the exact age distribution of women that visited
BMC for nutritional attendance to their children suffering from severe wasting in
June 2020.

Age (yrs)       17   18   19   20   21
No. of women    4    10   8    5    2
a) Calculate the mean age of the women
b) Find the median age
c) What is the modal age of the women?

Solution

Age (x)   No. of women (f)   fx
17        4                  68
18        10                 180
19        8                  152
20        5                  100
21        2                  42
          ∑f = 29            ∑fx = 542

a) Mean = ∑fx / ∑f = 542/29 = 18.69 ≈ 19 years

b) Median = ½(∑f + 1)th item
= ½(29 + 1)th item
= 15th item, which corresponds to 19 years

c) Mode = 18 years (it has the highest frequency, 10)
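
One way to verify the results above is to expand the frequency table back into individual observations and use Python's standard statistics module. The helper name expand is illustrative and not part of these notes.

```python
from statistics import mean, median, multimode


def expand(frequency_table):
    """Turn (value, frequency) pairs into a flat list of observations."""
    return [value for value, freq in frequency_table for _ in range(freq)]


ages = expand([(17, 4), (18, 10), (19, 8), (20, 5), (21, 2)])
print(len(ages))             # 29 observations, i.e. the total frequency
print(round(mean(ages), 2))  # 18.69
print(median(ages))          # 19
print(multimode(ages))       # [18]
```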

2. The table gives the number of surgeries performed by 60 surgeons at TTH in
December 2020.

No. of surgeries   0    1   2   3    4   5   6   7
No. of doctors     15   9   6   19   5   3   2   1

Find the;

a) Average number of surgeries performed by the doctors in December 2020.


b) Median number of surgeries
c) Percentage of doctors that performed more than 2 surgeries.

Solution

No. of surgeries (x)   No. of doctors (f)   fx
0                      15                   0
1                      9                    9
2                      6                    12
3                      19                   57
4                      5                    20
5                      3                    15
6                      2                    12
7                      1                    7
                       ∑f = 60              ∑fx = 132

a) Mean number of surgeries = ∑fx / ∑f = 132/60 = 2.2 ≈ 2

b) Median = ½(∑f)th item and the next
= ½(60)th item and the next
= 30th and 31st items, which correspond to 2 and 3
= (2 + 3)/2 = 2.5 ≈ 3

c) Number of doctors that performed more than 2 surgeries = 19 + 5 + 3 + 2 + 1

= 30

% of doctors that performed more than 2 surgeries = (30/60) X 100 = 50%

3. The management of BMC wanted to know the ages of patients who died in the
hospital within the first quarter of 2013 so that they can put in place age specific
measures to reverse the trend in 2014. Records on 30 deaths were therefore
reviewed and the data obtained were as follows;

1 2 4 25 15 10 12 2 4 2
5 7 6 8 12 14 14 17 19 1
2 2 21 23 2 4 5 1 3 11

a) Group the data using a frequency table beginning with class interval 1-5, 6-10
and ending with class interval 21-25.
b) Determine the mean age of the distribution
c) What is the median age group?
d) What is the median age?
e) What is the modal age group?
f) What is the modal age?

Solution
a)

Age group   Midpoint (x)   Frequency (f)   fx           Cumulative frequency (F)
1-5         3              15              45           15
6-10        8              4               32           19
11-15       13             6               78           25
16-20       18             2               36           27
21-25       23             3               69           30
Total                      ∑f = 30         ∑fx = 260

b) Mean = ∑fx / ∑f = 260/30 = 8.7 ≈ 9 years
c) Median age group = group containing the ½(∑f)th item and the next
= ½(30)th item and the next
= 15th and 16th items, which fall in the 1-5 and 6-10 age groups respectively
=> (3 + 8)/2 = 5.5, which is the boundary between the two groups; the 6-10 age group
is taken as the median age group

d) Using the formula: Median = L1 + [(n/2 − F)/fm] X w

Where L1 = lower class boundary of the median class

n = total frequency

F = cumulative frequency of the class before the median class

fm = frequency of the median class

w = class width

L1 = 5.5, n = 30, F = 15, fm = 4 and w = 5

Median = 5.5 + [(30/2 − 15)/4] X 5

= 5.5 + (0) X 5 = 5.5
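
The grouped-median formula can be written as a small Python function. The function and parameter names are illustrative; the arguments correspond to L1, n, F, fm and w as defined above.

```python
def grouped_median(lower_boundary, n, cum_freq_before, class_freq, width):
    """Median = L1 + [(n/2 - F) / fm] x w for a grouped frequency distribution."""
    return lower_boundary + ((n / 2 - cum_freq_before) / class_freq) * width


# Values used in the solution: L1 = 5.5, n = 30, F = 15, fm = 4, w = 5
print(grouped_median(5.5, 30, 15, 4, 5))   # 5.5
```

Because the 15th item is the last item of the 1-5 group, treating 1-5 as the median class instead (L1 = 0.5, F = 0, fm = 15) gives the same value: grouped_median(0.5, 30, 0, 15, 5) is also 5.5.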

e) Modal age group = 1-5

f) Modal age = L1 + [∆1/(∆1 + ∆2)] X w

Where L1 = lower class boundary of the modal class

∆1 = difference between the frequency of the modal class and that of the class
before the modal class

∆2 = difference between the frequency of the modal class and that of the class
after the modal class

w = class width

L1 = 0.5, ∆1 = (15 − 0 = 15), ∆2 = (15 − 4 = 11), w = 5

Mode = 0.5 + [15/(15 + 11)] X 5

= 0.5 + (0.58) X 5 = 3.4
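
The modal-age formula can be sketched in Python in the same way. The names are illustrative; the frequency before the first class is taken as 0, as in the solution above, and the sketch does not handle the case where ∆1 + ∆2 is zero.

```python
def grouped_mode(lower_boundary, freq_modal, freq_before, freq_after, width):
    """Mode = L1 + [d1 / (d1 + d2)] x w for a grouped frequency distribution."""
    d1 = freq_modal - freq_before   # modal class vs the class before it
    d2 = freq_modal - freq_after    # modal class vs the class after it
    return lower_boundary + d1 / (d1 + d2) * width


# Values used in the solution: L1 = 0.5, frequencies 15 (modal), 0 (before), 4 (after), w = 5
print(round(grouped_mode(0.5, 15, 0, 4, 5), 1))   # 3.4
```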

ACTIVITY 5 B

Calculate and write your answers in the spaces provided.

1. The table below gives the weights of some neonates admitted at the paediatric
ward at the Volta Regional Hospital, Ho.

Weight (kg)          2    3    4    5    6     7    8   9
Number of children   14   21   42   83   118   12   7   3

a) Calculate the mean weight of the distribution, correct to the nearest
kilogram. ………………………………….
b) Calculate the median weight of the children. ……………………………….
c) Find the modal weight ……………………………………………………
2. The following are the ages of patients scheduled to undergo hernia repairs surgery
at BMC between August and October 2019

65 84 91 58 43
57 33 53 29 40
37 14 18 92 13
22 41 27 51 63
73 33 76 80 86
72 19 51 67 27
61 39 23 22 45
19 35 39 72 47

a) Using the class interval 11-20, 21-30 … 91-100 construct a cumulative frequency
table for the data.
b) Calculate the mean age of the patients. ……………………………………….
c) i) Find the median age group ………………………………………………..
ii) Calculate the median age …………………………………………………
d) i) Find the modal age group …………………………………………………
ii) Find the modal age …………………………………………………….

CUMULATIVE FREQUENCY TABLES

A cumulative frequency table can be made from a frequency table. The cumulative
frequency for any class is the sum of the frequencies of that class and all the lower
classes.

Cumulative frequency tables are usually written in vertical columns. The last cumulative
frequency in the table should be equal to the total frequency.
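
A minimal Python sketch of building the cumulative frequency column is shown here, using the class frequencies from the worked example on ages at death (15, 4, 6, 2, 3). The function name cumulative_frequencies is illustrative.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running total of the class frequencies, lowest class first."""
    return list(accumulate(frequencies))


frequencies = [15, 4, 6, 2, 3]          # classes 1-5, 6-10, 11-15, 16-20, 21-25
cumulative = cumulative_frequencies(frequencies)
print(cumulative)                        # [15, 19, 25, 27, 30]
print(cumulative[-1] == sum(frequencies))  # True: last value equals the total frequency
```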

Cumulative Frequency Diagram


A cumulative frequency diagram can be drawn from a cumulative frequency table. On
the graph, the upper class boundaries are always along the horizontal axis and the
cumulative frequencies along the vertical axis. Joining consecutive points with
straight lines gives a cumulative frequency polygon; joining them with a smooth curve
of best fit gives a cumulative frequency curve (or ogive).

See the figures below. More accurate values are obtained from the cumulative frequency
curve than from the cumulative frequency polygon. But in an examination draw the
diagram stated in the question.

A cumulative frequency diagram can be used to estimate values from the data. Always
draw lines on your graph to show how you obtained your answers.

MEASURES OF DISPERSION AND VARIABILITY

In statistics, dispersion (also called variability, scatter, or spread) denotes how stretched
or squeezed a distribution (theoretical or that underlying a statistical sample) is. Common
examples of measures of statistical dispersion are the variance, standard
deviation and interquartile range.

Dispersion is contrasted with location or central tendency, and together they are the most
used properties of distributions. Some common measures of dispersion are the range,
quartiles, percentiles, variance, mean deviation and standard deviation.

Range

The simplest of our methods for measuring dispersion is range. Range is the difference
between the largest value and the smallest value in the data set. While being simple to
compute, the range is often unreliable as a measure of dispersion since it is based on only
two values in the set.

Example

The following data are the marks scored by ten students in a mid-semester exam at the
College of Nursing and Midwifery, Nalerigu: 12,11,12,18,7,14,10,17,15 and 13. What is
the range of the data set?

Solution

Range = 18 – 7= 11

Therefore the range for the data is 11

The inter – quartile range and semi- interquartile range

The inter- quartile range and semi- interquartile range are slightly better measures of
dispersion than the range. They are not affected by extreme values because they are based
on the ‘middle- half’ of the data, i.e. between the upper and lower quartiles.

The inter-quartile range = upper quartile – lower quartile, i.e. Q3 – Q1

The semi-interquartile range is half the inter-quartile range, i.e. ½(Q3 – Q1)

Example:

A distribution has a lower quartile of 147 and an upper quartile of 166. Calculate the
inter-quartile range and hence the semi-interquartile range.

Solution

The inter- quartile range

= upper quartile – lower quartile

= 166 – 147 = 19

The semi- interquartile range = 19/2 =9.5

Exercise:

The heights of 15 boys, to the nearest cm, are given below.

156,149,129,140,143,135,137,145,146,158,147,136,143,143,138.

For these data, find;


a) the quartiles
b) the inter- quartile range
c) The semi- interquartile range.

Quartiles

In descriptive statistics, the quartiles of a ranked set of data values are the three points
that divide the data set into four equal groups, each group comprising a quarter of the
data. A quartile is a type of quantile.

The quartiles of a data set divide the data into four (4) groups, separated by
Q1, Q2 and Q3. The quartiles of a set of numbers are found in the same way as the
median. The same method is also applied when finding the quartiles from a frequency
distribution table.

For example, to find the quartiles of a set of “n” numbers x1, x2, x3 … xn:

• Arrange the numbers in order of magnitude

• The 1st quartile, Q1, is the ¼(n+1)th item or observation

• The 3rd quartile, Q3, is the ¾(n+1)th item or observation

Note that when n is large, Q1 is the ¼(n)th item or observation and Q3 is the
¾(n)th item or observation.

The quartiles are usually found from the cumulative frequency curve.

The First Quartile (Q1), also known as the lower quartile, is defined as the middle number
between the smallest number and the median of the data set.

Q1 = ¼(n + 1)th item

The Second Quartile (Q2), which is also the median, splits the data into two equal halves.
The Third Quartile (Q3), also known as the upper quartile, is the middle value between the
median and the highest value of the data set.

Q3 = ¾(n + 1)th item

The interquartile range (IQR) is the difference between the upper and lower quartiles.
IQR = Q3 – Q1
Alternatively, the quartiles can also be found by following the procedure below.

I.e. finding the data values corresponding to Q1, Q2 and Q3:

• Step 1: Arrange the data in order from lowest to highest
• Step 2: Find the median of the data values. This is the value for Q2.
• Step 3: Find the median of the data values that fall below Q2. This is the value for
Q1.
• Step 4: Find the median of the data values that fall above Q2. This is the value for
Q3.

Example

The following are the ages of patients seen in a particular room at BMC for a particular day;
42,43,36,39,41,40,49,47,7,6,15. Find the;
a) Lower quartile
b) Upper quartile
c) Interquartile range

Solution

Rearranging the data in ascending order: 6,7,15,36,39,40,41,42,43,47,49

n = 11

a) Q1 = ¼(n + 1)th item
= ¼(11 + 1)th item
= 3rd item, which corresponds to 15

b) Q3 = ¾(n + 1)th item
= ¾(11 + 1)th item
= 9th item, which corresponds to 43

c) IQR = Q3 – Q1
= 43 – 15 = 28
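
The ¼(n+1) and ¾(n+1) position rule can be sketched in Python as follows. The function names are illustrative. These notes do not say what to do when the position is not a whole number, so the linear interpolation used here is an assumption, and the sketch expects at least three values.

```python
def item_at(ordered, position):
    """Value at a 1-based position in an ordered list.
    Assumption: fractional positions are linearly interpolated."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Return (Q1, Q3, IQR) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch needs at least three values")
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


print(quartiles([42, 43, 36, 39, 41, 40, 49, 47, 7, 6, 15]))   # (15, 43, 28)
```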

TRIAL

The following set of data are the ages of students who failed Introductory Statistics quiz
one; 18, 20, 23,20,23,27,24,23,29. Find the;
a) Mean age of the students
b) Median age of the students
c) Lower quartile of the data
d) Upper quartile of the data
e) Interquartile range

Percentiles

In estimating percentiles, first rank your data in ascending order of magnitude. Then
calculate the ordinal rank and take the value from the ordered list that corresponds to
that rank. The ordinal rank n is calculated using this formula:

n = (P/100) x N

Where P = percentile value

N = total number of observations

• If n is not a whole number, round it up to the nearest whole number and count
your values from left to right until you reach the value it corresponds to.
• If n is a whole number, then the percentile is the average of the nth value and the next
value.
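
The ordinal-rank method can be sketched in Python as shown here. The function name percentile and the small data list [2, 4, 6, 8] are illustrative only. The notes do not cover the case where the rank equals N (for example the 100th percentile, where there is no next value) or the 0th percentile; returning the largest or smallest value in those cases is an assumption.

```python
import math


def percentile(values, p):
    """Percentile by the ordinal-rank method: rank n = (P/100) x N."""
    ordered = sorted(values)            # rank the data in ascending order
    total = len(ordered)
    rank = p * total / 100
    if rank != int(rank):               # not a whole number: round up
        return ordered[math.ceil(rank) - 1]
    rank = int(rank)
    if rank >= total:                   # assumption: no next value exists
        return ordered[-1]
    if rank == 0:                       # assumption: 0th percentile is the smallest value
        return ordered[0]
    return (ordered[rank - 1] + ordered[rank]) / 2   # average of nth value and the next


ages = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
print(percentile(ages, 25))            # rank 2.75 rounds up to 3, so the 3rd value: 15
print(percentile([2, 4, 6, 8], 50))    # rank 2 is whole, so (4 + 6) / 2 = 5.0
```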

TRIALS

1. The following data are the ages of malaria death cases recorded at Savulugu
Hospital in April 2015: 30, 15, 50, 40 and 35. Find the 30th, 40th, 50th and 100th
percentiles.
2. The marks scored by ten students in a GCOM quiz are as follows: 15, 10, 3, 6, 20,
8, 7, 8, 16 and 13. Find the 25th, 50th and 75th percentiles.
Finding quartiles and percentiles from a cumulative frequency curve

The cumulative frequency curve can also be used to estimate the quartiles and
percentiles.

The lower quartile (Q1) is the mark corresponding to ¼∑f on the cumulative frequency
axis.

The upper quartile (Q3) is the mark corresponding to ¾∑f on the cumulative frequency
axis.

The nth percentile is the mark corresponding to (n/100) x ∑f on the cumulative frequency
axis.

For example, the 60th percentile is the mark corresponding to (60/100) x ∑f on the
cumulative frequency axis.
Standard Deviation and Variance

Standard deviation is the measure of spread most commonly used in statistical practice
when the mean is used as the measure of central tendency. Thus, it measures spread around
the mean. Standard deviation is also influenced by outliers: one extreme value could
contribute largely to the result. In that sense, the standard deviation is a good
indicator of the presence of outliers. This makes standard deviation a very useful measure
of spread for symmetrical distributions with no outliers.

Generally, the more widely spread the values are, the larger the standard deviation is. It is
represented as σ (for a population) or s (for a sample).

The standard deviation is the positive square root of the variance. For instance, if the
variance of a data set is 1.69, then the SD is √1.69 = 1.3

The variance is the square of the standard deviation. For example, if the SD of a data set is
1.2, then the variance is 1.2² = 1.44. Thus to find the variance, we use the formulae below
for SD but without the square root sign.

Properties of Standard deviation

• It is used to measure only the spread or dispersion of data around the mean.

• Standard deviation is NEVER negative

• It is sensitive to outliers

Finding Standard Deviation for a Set of Numbers

• Finding standard deviation for a set of numbers:

s = √[ ∑(x − x̄)² / (n − 1) ]
Example

The ages of patients scheduled to undergo cleft surgery at Korle-Bu teaching hospital
are as follows; 35, 56, 19, 6, 43, 45 and 57. Find the standard deviation and variance
of the data.

SOLUTION

Mean = ∑x / n = (35 + 56 + 19 + 6 + 43 + 45 + 57)/7 = 261/7 = 37.29 ≈ 37

x      (x − x̄)   (x − x̄)²
35     -2         4
56     19         361
19     -18        324
6      -31        961
43     6          36
45     8          64
57     20         400
                  ∑(x − x̄)² = 2150

s = √[ ∑(x − x̄)² / (n − 1) ]

= √[ 2150 / (7 − 1) ]

= √358.33

= 18.93

NB: Students should find the variance by squaring the standard deviation
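
The result can be checked with Python's standard statistics module, whose stdev and variance functions use the same n − 1 divisor. Note that the code works with the unrounded mean (37.29 rather than 37), so the variance comes out as about 358.24 instead of 2150/6 ≈ 358.33; the standard deviation still rounds to 18.93.

```python
from statistics import stdev, variance

ages = [35, 56, 19, 6, 43, 45, 57]

sd = stdev(ages)        # sample standard deviation (divides by n - 1)
var = variance(ages)    # sample variance, the square of the standard deviation

print(round(sd, 2))     # 18.93
print(round(var, 2))    # 358.24
```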

• Finding standard deviation from a frequency distribution table:

s = √[ ∑f(x − x̄)² / (n − 1) ]

The variance combines all the values in a data set to produce a measure of spread.

Variance = σ²
Example

The data below give the age ranges of patients seen in a consulting room by a
particular doctor.

Ages                 12 - 17   18 - 23   24 - 29   30 - 35
Number of patients   3         6         4         2

Calculate the;

a) Mean of the data


b) Standard deviation of the data

Solution

Ages      Frequency (f)   Midpoint (x)   fx             (x − x̄)   (x − x̄)²   f(x − x̄)²
12 - 17   3               14.5           43.5           -8         64          192
18 - 23   6               20.5           123            -2         4           24
24 - 29   4               26.5           106            4          16          64
30 - 35   2               32.5           65             10         100         200
Total     ∑f = 15                        ∑fx = 337.5                           480

a) Mean = ∑fx / ∑f = 337.5/15 = 22.5

b) σ = √[ ∑f(x − x̄)² / (n − 1) ]

= √(480/14)

= √34.3

= 5.86
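
The same calculation for a grouped frequency table can be sketched in Python. The function name grouped_mean_sd is illustrative; it uses class midpoints and the n − 1 divisor, as in the solution above.

```python
from math import sqrt


def grouped_mean_sd(classes):
    """classes: list of (lower limit, upper limit, frequency) tuples."""
    midpoints = [((low + high) / 2, freq) for low, high, freq in classes]
    n = sum(freq for _, freq in midpoints)
    mean = sum(x * freq for x, freq in midpoints) / n
    squared_deviations = sum(freq * (x - mean) ** 2 for x, freq in midpoints)
    return mean, sqrt(squared_deviations / (n - 1))


mean_age, sd_age = grouped_mean_sd([(12, 17, 3), (18, 23, 6), (24, 29, 4), (30, 35, 2)])
print(mean_age)            # 22.5
print(round(sd_age, 2))    # 5.86
```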

ACTIVITY 5 C

Compute and write your answers in the spaces provided

1. The table gives the number of surgeries performed by 60 surgeons at TTH in
December 2015.

No. of surgeries   0    1   2   3    4   5   6   7
No. of doctors     15   9   6   19   5   3   2   1

Calculate the;
a) Standard deviation for the data. ………………………………….
b) Variance. ………………………………………………..

2. The following are the ages of patients scheduled to undergo hernia repairs surgery
at BMC between August and October 2017.

65 84 91 58 43
57 33 53 29 40
37 14 18 92 13
22 41 27 51 63
73 33 76 80 86
72 19 51 67 27
61 39 23 22 45
19 35 39 72 47

a) Using the class interval 11-20, 21-30 … 91-100 construct a cumulative frequency
table for the data.
b) Draw the cumulative frequency curve (ogive) for the data.
c) Using the ogive, estimate the;
i) Lower quartile ……………………………………….
ii) Median ……………………………………………….
iii) Upper quartile ………………………………………..
iv) 40th percentile …………………………………………
v) 75th percentile …………………………………………

CHAPTER SIX

INFERENTIAL STATISTICS
Inferential statistics are used to test hypotheses about a population using data from a
sample. Statistical inference is the process of using data analysis to infer properties of
an underlying probability distribution. Inferential statistical analysis infers properties
of a population, for example by testing hypotheses and deriving estimates. It is assumed
that the observed data set is sampled from a larger population.

Given a hypothesis about a population, for which we wish to draw inferences, statistical
inference consists of first selecting a statistical model of the process that generates the
data and then deducing propositions from the model. The conclusion of a statistical inference
is a statistical proposition. Some common forms of statistical proposition are point
estimates and interval estimates.

Point estimate is a single value estimate of a parameter. For example a sample mean
is a point estimate of a population mean. An interval estimate on the other hand gives
you a range of values where the parameter is expected to lie or fall in. A typical
example of the interval estimate is a confidence interval.

Some common tools used in inferential statistics are:

• Normal distribution
• T-test
• Chi-square test
• Correlation and Regression

Normal Distribution
The normal distribution is the most important probability distribution in statistics because
it fits many natural phenomena. For example, heights, blood pressures and IQ scores
approximately follow the normal distribution. It is also known as the Gaussian distribution
and the bell curve.

The normal distribution is a probability function that describes how the values of a
variable are distributed. It is a symmetric distribution where most of the observations
cluster around the central peak and the probabilities for values further away from the
mean taper off equally in both directions.

It is a very simple concept and even made easier by the fact that distributions are very
easy to visualize. Indeed in most cases a single glimpse of a graphical representation of a
distribution will tell you more about it than several minutes of staring at a bare list of
numbers.

Properties of the Normal Distribution

1. It is bell shaped
2. It is symmetrical about its mean: 50% of the area under the curve lies to the left
of its centre and 50% to the right of its centre.
3. The total area under the curve sums to 1, that is, the area of the distribution on
each side of the mean is 0.5.
4. It is unimodal
5. The standard normal distribution has a mean of zero and a standard deviation of 1.
6. About 68% of all data falls within ± 1 SD of the mean.
7. About 95% of all data falls within ± 2 SD of the mean.
8. About 99.7% of all data falls within ± 3 SD of the mean.
9. The standard normal curve is symmetrical.
Empirical Rule
The empirical rule tells you what percentage of your data falls within a certain number of
standard deviations from the mean;

• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.

The standard deviation controls the spread of the distribution. A smaller standard
deviation indicates that the data is tightly clustered around the mean; the normal
distribution will be taller. A larger standard deviation indicates that the data is spread out
around the mean; the distribution will be flatter and wider. For example, if the weights
of a population are normally distributed with a mean of 70 kg and a standard deviation of
3.4 kg, then about 68% of the weights fall between 66.6 and 73.4 kg (within 1 SD, 70 ± 3.4),
about 95% of the weights fall between 63.2 and 76.8 kg (within 2 SDs), and about 99.7% of
the weights fall between 59.8 and 80.2 kg (within 3 SDs).
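
The ranges in the weight example can be reproduced with a short Python sketch. The function name empirical_rule_ranges is illustrative, and rounding to one decimal place is an assumption made only to match the figures quoted above.

```python
def empirical_rule_ranges(mean, sd):
    """Approximate ranges holding 68%, 95% and 99.7% of normally distributed data."""
    ranges = {}
    for k, percent in ((1, 68), (2, 95), (3, 99.7)):
        # mean ± k standard deviations, rounded to 1 decimal place (assumption)
        ranges[percent] = (round(mean - k * sd, 1), round(mean + k * sd, 1))
    return ranges


for percent, (low, high) in empirical_rule_ranges(70, 3.4).items():
    print(f"about {percent}% of weights lie between {low} and {high} kg")
```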

Skewness

In probability theory and statistics, skewness is a measure of the asymmetry of the
probability distribution of a real-valued random variable about its mean. The skewness
value can be positive or negative. Positively skewed data have a longer tail on the right
side of the probability density function, while negatively skewed data have a longer tail
on the left side of the probability density function.

[Figure: Negative Skewness and Positive Skewness]

ACTIVITY 6

1. What is inferential statistics? …………………………………………………………


…………………………………………………………………………………………
…………………………………………………………………………………………
2. In your own words explain normal distribution. ……………………………………..
…………………………………………………………………………………………
…………………………………………………………………………………………
3. State two major properties of normal distribution
1. …………………………………………………………………………………..
2. ………………………………………………………………………………….

3. ………………………………………………………………………………….

CHAPTER SEVEN

HOSPITAL MANAGEMENT STATISTICS


Data are collected on a daily basis at the hospital on both inpatients and outpatients.
These data are used to monitor the number of patients treated daily, weekly, monthly and
even yearly. These statistics give health care decision-makers the information they need
to plan and monitor inpatient and outpatient revenue streams. For these reasons, health
information management (HIM) professionals must be well versed in data collection and
reporting methods. Because most of the data collection and reporting are now automated,
it is important to understand the underlying concepts from which these data are
generated.

Some of the important statistics are:


1. OPD per Capita
Number of outpatient (OPD) visits per person per year
= (Number of outpatient visits in a year) / (Total population of catchment area)

2. Bed Complement
Total number of hospital beds for admission of cases

3. Patient Days
Number of inpatients x days in the period

4. Available bed days
Bed complement x days in the period
5. Crude death rate
= (Total deaths in a year x 1000) / (Total population in the same year)

6. Sex-specific death rate
The proportion of deaths in a particular sex group (male or female)
= (Deaths in a particular sex in a specified period x 1000) / (Total population of same sex in the same period)

7. Age-specific death rate
The proportion of deaths in a particular age group
= (Deaths in a specific age group in a year x 1000) / (Total population of that age group in the same year)

8. Cause-specific death rate
The proportion of deaths due to a specific cause in the population
= (Deaths from a stated cause in a year x 1000) / (Total population in the same year)

9. Inpatient death rate
The proportion of inpatients who die in the hospital
= (Total inpatient deaths in a specified period x 100) / (Total discharges + deaths in the same period)

10. Turn-over interval
The average length of time (in days) that elapses between the discharge of one
inpatient and the admission of the next inpatient to the same bed
= (Available bed days – Patient days) / (Discharges + Deaths)

11. Turnover per bed / Bed turnover rate
Bed Turnover Rate (BTR) is the average number of inpatients treated per
hospital bed in a given period.
= (Discharges + Deaths) / (Bed complement)

12. Average length of stay
The average length of stay (ALOS) is a measure of the average duration of inpatient
hospital admissions (mean number of days from admission to discharge).
= (Patient days) / (Discharges + Deaths)

13. Hospital admission rate
The hospital admission rate is the average number of hospital admissions per 1000
population per year
= (Total admissions x 1000) / (Total population of catchment area)

14. Percentage bed occupancy rate
% Bed Occupancy Rate (BOR) measures the percentage of beds occupied by clients
in a given period
= (Patient days x 100) / (Available bed days)

15. Average Daily Occupancy
The average number of inpatients who occupy hospital beds per day
= (Patient days) / (Number of days in the period)

16. Maternal Mortality Ratio (MMR)
MMR is the number of women who die as a result of childbearing in a given year per
100,000 live births in that year.
= (Number of deaths in women as a result of childbearing x 100,000) / (Total live births in the same period)

17. Infant Mortality Rate
Number of deaths in children under 1 year of age per 1000 live births
Considered a good indicator of health status in any given area
Can be broken up into Neonatal Mortality Rate (first 4 weeks or 28 days) and Post-neonatal
Mortality Rate
= (Number of deaths in children under 1 year of age x 1000) / (Total live births in the same period)

18. Case-fatality rate
The proportion of persons with a particular condition (case) who die from that
condition. It is a measure of the severity of the condition.
= (Number of deaths from a specific condition x 100) / (Number of cases of that condition)

Example
Table 10: Inpatient statistics for OLG hospital, March, 2012

Ward            No. of Beds   Admissions   Discharges   Deaths   Patient days   Available bed days
Male Ward       28            117          102          7        921            840
Maternity       17            154          154          1        315            510
Female ward     37            174          150          6        634            1110
Children ward   33            187          178          3        854            990
Total           115           632          584          17       2724           3450

Use the data in table 10 to answer the questions that follow


1. Calculate:
a. % death rate
b. Average length of stay
c. Turn-over interval
d. Turn-over per bed
e. % bed occupancy

Solution

a. % death rate = (Total deaths X 100)/(Discharges + Deaths)
= (17 X 100)/(584 + 17)
= 1700/601
= 2.8%

b. Average length of stay = (Patient days)/(Discharges + Deaths)
= 2724/(584 + 17)
= 2724/601
= 4.5 days

c. Turn-over interval = (Available bed days – Patient days)/(Discharges + Deaths)
= (3450 – 2724)/(584 + 17)
= 726/601
= 1.2 days

d. Turn-over per bed = (Discharges + Deaths)/(Bed complement)
= (584 + 17)/115
= 601/115
= 5.2

e. % bed occupancy = (Patient days X 100)/(Available bed days)
= (2724 X 100)/3450
= 272400/3450
= 78.96%
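
The five indicators in this solution can be computed together with a short Python sketch using the totals from Table 10. The function name and dictionary keys are illustrative and not part of any hospital information system.

```python
def inpatient_indicators(beds, discharges, deaths, patient_days, available_bed_days):
    """Hospital indicators from the totals for a period, using the formulas above."""
    separations = discharges + deaths   # Discharges + Deaths
    return {
        "death_rate_pct": deaths * 100 / separations,
        "average_length_of_stay": patient_days / separations,
        "turnover_interval": (available_bed_days - patient_days) / separations,
        "turnover_per_bed": separations / beds,
        "bed_occupancy_pct": patient_days * 100 / available_bed_days,
    }


# Totals from Table 10
indicators = inpatient_indicators(beds=115, discharges=584, deaths=17,
                                  patient_days=2724, available_bed_days=3450)
for name, value in indicators.items():
    print(name, round(value, 2))
```

Rounded to two decimal places this gives 2.83, 4.53, 1.21, 5.23 and 78.96, which agree with the hand calculations once those are rounded to one decimal place.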

ACTIVITY 7
Write your answers in the spaces provided.
Table 11: Inpatient statistics for Lawra hospital, 2010-2013

Year    No. of Beds   Admissions   Discharges   Deaths   Patient days   Available days
2010    23            112          102          8        789            843
2011    15            136          131          2        416            514
2012    42            185          167          4        543            1123
2013    25            169          148          7        765            864
Total   105           602          548          21       2513           3344

Use the data in Table 11 to answer the questions that follow


1. Calculate:
a. % death rate. …………………………………
b. Average length of stay. ……………………………………
c. Turn-over interval. …………………………………………….
d. Turn-over per bed …………………………………………….
e. % bed occupancy ……………………………………………

2. In 2010, the male and female annual catchment populations for Presbyterian clinic
were 5,000 and 6,000 respectively. If the clinic recorded 250 and 300 deaths in males and
females respectively, within the same period of the year, Calculate the:
a) Sex-specific death rate (SSDR) for males. ………………………………..
b) Sex-specific death rate (SSDR) for females ………………………………..
c) Crude death rate (CDR) of the population ………………………………..

3.
Fig. 12: Basic statistics for the 1st 10 days in Our Lady of Grace Hospital, June, 2011

Day   No. of patients in ward   Total beds in ward   Discharges   Deaths
1     5                         7                    2            0
2     3                         7                    2            1
3     4                         7                    1            1
4     6                         7                    2            0
5     7                         7                    3            0
6     6                         7                    1            0
7     5                         7                    1            1
8     4                         7                    2            1
9     3                         7                    2            0
10    2                         7                    1            0
Use the data in the table to calculate the following for the 10 day period.
1. Bed complement. ………………………………..
2. Patient days ………………………………..
3. Available bed days ………………………………..
4. Average length of stay………………………………..
5. Turn-over interval………………………………..
6. % bed occupancy rate ………………………………..
7. Inpatient death rate ………………………………..

4.
In 2001, a total of 15,555 CSM deaths occurred among males and 4,753 CSM deaths
occurred among females. The estimated 2001 populations for males and females were
139,813,000 and 144,984,000 respectively.
Q1. Calculate the CSM-related death rates for males and females
………………………………..
Q2. What type(s) of mortality rates did you calculate in Q1?
………………………………..
Q3. Calculate the ratio of CSM mortality rates for males compared to females
………………………………..
Q4. Interpret the ratio you calculated in Q3 as if you were presenting information to a
policymaker. ………………………………..

5
Between 2000 and 2009, a total of 143,497 cases of diphtheria were reported. During the
same decade, 11,228 deaths were attributed to diphtheria. Calculate the death-to-case
ratio. ………………………………..

6
In an epidemic of cholera traced to green onions from a restaurant, 555 cases were
identified. Three (3) of the patients died as a result of their infections. Calculate the case-
fatality rate. ………………………………..
