
SPH 2 Lecture - 1 Introduction and Data


Uploaded by

Lij Fire

Biostatistics

1
Introduction

• What is statistics?
• Statistics: A field of study concerned with:
– collection, organization, analysis, summarization and
interpretation of numerical data, &
– the drawing of inferences about a body of data when only a
small part of the data is observed.

• Statistics helps us use numbers to communicate ideas

2
• Biostatistics: The application of statistical methods to
the fields of biological and medical sciences.

• Concerned with interpretation of biological data &
the communication of information derived from these
data

• Has a central role in medical investigations

3
• The numbers must be presented in such a way that
valid interpretations are possible

• Statistics are everywhere – just look at any


newspaper or the current medical and public health
literature.

4
Importance of Biostatistics

• Provide methods of organizing health related information

• Assessment of the health status of a community, individual, etc.

• Health program evaluation (immunization, maternal and child health
programs, nutritional intervention)

• Resource allocation (3M+IT)

• Magnitude of association

• Strong vs weak association between exposure and outcome

5
Importance of Biostatistics
• Assessing risk factors for the occurrence of disease
– Cause & effect relationship

• Evaluation of a new vaccine or drug(hypothesis testing)


– What can be concluded if the proportion of people free from the
disease is greater among the vaccinated than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?

• Drawing of inferences(inferential statistics)


– Information from sample to population

6
What does biostatistics cover?

Biostatistical thinking contributes to every step in a research:
Research Planning
Design
Execution (Data collection)
Data Processing
Data Analysis
Presentation
Interpretation
Publication

The best way to learn about biostatistics is to follow the flow of a
research from inception to the final publication.
7
Research Design

• We can not study all subjects (all pregnant women, or


all people) living in a given geographical area

– Sampling technique

– Inclusion/exclusion criteria

– Sample size calculation

– Study design

– Method of data collection

– Etc
8
Analysis

• The analysis part is the major part of learning about
biostatistics
– There are dozens of different methods of analysis,
which makes it difficult to choose the correct method
for a particular case
– It is necessary to consider the philosophy that underlies
all methods of analysis:
• Use data from a sample to draw inference about a wider
population

9
Interpretation

• Interpretation of results of statistical analysis is


simpler when the study has a clearer aim

• If the study has been well designed and correctly


analyzed the interpretation of results can be fairly
simple

10
Types of Statistics

Descriptive statistics:

• Ways of organizing and summarizing data


• Helps to identify the general features and trends in a set of
data and extracting useful information

• Also very important in conveying the final results of a


study

• Example: tables, graphs, numerical summary measures

11
Types of Statistics……
Inferential statistics:
• Methods used for drawing conclusions about a population
based on the information obtained from a sample of
observations drawn from that population
» Principles of probability
» Estimation
» confidence interval
» comparison of two or more means or proportions
» hypothesis testing, etc.

12
DATA

13
Objectives
At the end of this session, the students will be able to:

o Define and identify the types of data

o Define and identify the type of variables

o Understand the data collection techniques in


quantitative and qualitative methods

14
• Data are numbers which can be measurements
or can be obtained by counting
• The raw material for statistics
• Can be obtained from:
– Routinely kept records, literature
– Surveys
– Counting
– Experiments
– Reports
– Observation
– Etc
15
Types of Data

1. Primary data: collected from the items or individual


respondents directly by the researcher for the purpose
of a study.

2. Secondary data: data which had already been collected and
statistically treated by certain people or organizations, and
which are then used by other people for another purpose

16
Variable
Variable: A characteristic which takes different
values in different persons, places, or things.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age, sex)
and takes any value.
• There may be one variable in a study or many.
• E.g., A study of treatment outcome of TB

17
• Variables can be broadly classified into:
– Categorical (or Qualitative) or

– Quantitative (or numerical variables).

18
• Categorical variable: A variable or characteristic
which can not be measured in quantitative form
but can only be sorted by name or categories

• Not able to be measured as we measure height or


weight

• The notion of magnitude is absent or implicit.

19
• Quantitative variable: A variable that can be
measured (or counted) and expressed
numerically.

• Height, wt, # of children, etc.

• Has the notion of magnitude.

20
Quantitative variable is divided into two:
1. Discrete: It can only have a limited number of discrete
values (usually whole numbers).
– E.g., the number of episodes of diarrhoea a child has had in a
year. You can’t have 12.5 episodes of diarrhoea
• Characterized by gaps or interruptions in the values
(integers).
• Both the order and magnitude of the values matter.
• The values aren’t just labels, but are actual measurable
quantities.

21
2. Continuous variable: It can have an infinite
number of possible values in any given interval.
• Both the magnitude and the order of the values
matter
• Does not possess the gaps or interruptions
• Weight is continuous since it can take on any
number of values (e.g., 34.575 Kg).

22
SUMMARY

Variable

Types of variables:
• Qualitative or categorical
  – Nominal (not ordered), e.g. ethnic group
  – Ordinal (ordered), e.g. response to treatment
• Quantitative measurement
  – Discrete (count data), e.g. # of admissions
  – Continuous (real-valued), e.g. height

Measurement scales
23
Scales of measurement

• Not all measurements are the same.


• Measuring weight = eg. 40kg
• Measuring the status of a patient on scale =
“improved”, “stable”, “not improved”.
• There are four types of scales of measurement.

24
1. Nominal scale:
• The simplest type of data, in which the values fall
into unordered categories or classes
• Consists of “naming” observations or classifying
them into various mutually exclusive and
collectively exhaustive categories
• Uses names, labels, or symbols to assign each
measurement.
– Examples: Blood type, sex, race, marital status, etc.

25
Example of nominal scale:

Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other

• The numbers have NO meaning
• They are labels only

26
• If nominal data can take on only two possible
values, they are called dichotomous or binary.
• So sex is not just nominal, it is dichotomous
(male or female).
• Yes/no questions
– E.g., cured from TB at 6 months of Rx

27
2. Ordinal scale:
• Assigns each measurement to one of a limited
number of categories that are ranked in terms of
order.
• Although non-numerical, can be considered to have
a natural ordering
– Examples: Patient status, cancer stages,
social class, etc.

28
Example of ordinal scale:

• Pain level:
1. None
2. Mild
3. Moderate
4. Severe

• The numbers have LIMITED meaning: 4>3>2>1 is all we know
apart from their utility as labels

29
3. Interval scale:

- Measured on a continuum, and differences between any two
numbers on the scale are of known size.
Example: Temp. in °F on 4 consecutive days
Days:       A    B    C    D
Temp. °F:   50   55   60   65

- It has no true zero point. “0” is arbitrarily chosen and doesn’t
reflect the absence of temperature.

30
4. Ratio scale:

- Measurement begins at a true zero point and the scale
has equal intervals.

- Examples: Height, age, weight, BP, etc.

• Note on meaningfulness of “ratio”:
– Someone who weighs 80 kg is two times as heavy as
someone else who weighs 40 kg. This is true even if
weight had been measured in other units.
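The difference between the interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the kg-to-lb factor is an approximate value added here, and the temperatures are days A and D from the interval-scale example. The weight ratio stays the same when the unit changes, while the temperature ratio does not.

```python
# Ratio scale: the ratio of two weights does not depend on the unit.
KG_TO_LB = 2.20462  # approximate conversion factor, assumed for illustration

heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)


# Interval scale: the ratio of two temperatures changes with the unit,
# because the zero points of the Fahrenheit and Celsius scales are arbitrary.
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


day_a_f, day_d_f = 50.0, 65.0  # days A and D from the interval-scale example
ratio_f = day_d_f / day_a_f
ratio_c = fahrenheit_to_celsius(day_d_f) / fahrenheit_to_celsius(day_a_f)

print("weight ratio in kg:", ratio_kg, "in lb:", round(ratio_lb, 6))
print("temperature ratio in F:", ratio_f, "in C:", round(ratio_c, 3))
```

Differences, on the other hand, are meaningful on both scales: a 5 °F step is the same size anywhere on the Fahrenheit scale.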
31
[Figure: degree of precision in measuring increases from the nominal scale through the ordinal and interval scales to the ratio scale]
32
Data collection

• Data collection is an important aspect of any
type of research study.

• Inaccurate data collection can impact the


results of a study and ultimately lead to invalid
results

33
The Quantitative data collection methods-

- Rely on random sampling and structured data


collection instruments that fit diverse
experiences into predetermined response
categories.

- They produce results that are easy to


summarize, compare, and generalize.

34
• Quantitative research is concerned with testing
hypotheses derived from theory and/or being able
to estimate the size of a phenomenon of interest.

• If the intent is to generalize from the research


participants to a larger population, the researcher
will employ probability sampling to select
participants.

35
• Typical quantitative data gathering strategies include:
1. Experiments/clinical trials.

2. Observing and recording well-defined events (e.g., counting


the number of patients waiting in emergency at specified times
of the day).

3. Obtaining relevant data from management information


systems.

4. Administering surveys with closed-ended questions (e.g., face-


to face and telephone interviews, questionnaires etc).

36
1. Interviews

A researcher asks a standard set of questions.

A. Face-to-face interviews have the distinct advantage of enabling
the researcher to establish rapport with potential participants and
therefore gain their cooperation. These interviews yield the highest
response rates in survey research.

They also allow the researcher to clarify ambiguous answers and,
when appropriate, seek follow-up information.

Disadvantages: they are time consuming and expensive when large
samples are involved.
37
B. Computer Assisted Personal Interviewing (CAPI):

is a form of personal interviewing, but instead of


completing a questionnaire, the interviewer brings along
a laptop or hand-held computer to enter the information
directly into the database.

• can be expensive to set up and requires that interviewers


have computer and typing skills.

38
C. Questionnaires

• Paper-and-pencil questionnaires can be sent to a large
number of people and save the researcher time and
money.

• Majority of the people who receive questionnaires


don't return them and those who do might not be
representative of the originally selected sample

39
D.Telephone interviews - are less time consuming
and less expensive and the researcher has ready
access to anyone on the planet who has a
telephone.

• Disadvantages are that the response rate is not as


high as the face-to- face interview

40
E.Web based questionnaires :
A new and inevitably growing methodology is the use of
Internet based research.
This would mean receiving an e-mail on which you would
click on an address that would take you to a secure
web-site to fill in a questionnaire.
This type of research is often quicker and less detailed.
Some disadvantages of this method include the exclusion
of people who do not have a computer or are unable to
access a computer.
41
• Questionnaires often make use of checklists and rating scales.
These devices help simplify and quantify people's behaviors
and attitudes.
• A checklist is a list of behaviors, characteristics, or other
entities that the researcher is looking for. Either the researcher
or the survey participant simply checks whether each item on the
list is observed, present or true, or vice versa.
• A rating scale is more useful when a behavior needs to be
evaluated on a continuum.
• Rating scales are also known as Likert scales.

42
Qualitative Data Collection Method:

• Data collection methods include unstructured
interviews, direct observation, case studies,
field notes, diaries, or historical documents that
can be observed, written, taped, or filmed.

• Participants often are interviewed and observed
in their natural settings.
43
Common Qualitative Methods

• Observations
• In-depth interviews
• Focus group discussions

44
1. Observation
• Observation is a technique which involves
systematically selecting, watching, and
recording behaviors and characteristics of
living things, objects, or phenomena.
• Observational techniques are methods by
which an individual or individuals gather first-hand
data on behaviors being studied.

45
Observational cont…

• Through participation and observation, data
are collected in an unstructured manner as field
notes written in a journal.

• Observations are usually complementary to
other data collection techniques

46
Observational cont…

• They can give additional, more accurate information
on behavior than interviews and questionnaires.

• They provide evaluators with an opportunity to


collect data on a wide range of behaviors, to
capture a great variety of interactions, and to
openly explore the evaluation topic.
47
Cont…
• It can be undertaken in two different ways: participant
observation and non-participant observation.

• Participant observation: the technique involves direct
observation through involvement with subjects in their
natural setting, participating in their lifestyle activities.

• The key element in participant observation is


involvement with subjects in their environment and
development of trusting relationship.
48
Cont…
• Non-participant observation: the observer watches the
situation, openly or concealed, but doesn’t participate.

• Observational approaches also allow the evaluator to
learn about things the participants or staff may be
unaware of, or that they are unwilling or unable to
discuss in an interview or focus group.

49
Advantages:
• Provide direct information about behaviors of
individuals or groups

• Permits the evaluator to enter into and understand


situations.

• Provide good opportunity for identifying


unanticipated outcomes

• Exist in natural, unstructured, and flexible setting


50
Disadvantages
• Expensive and time consuming
• Needs well qualified, highly trained observer
• May affect behavior of participant
• Selective perception of observers may distort
data that is called observer bias
• Investigator has little control over situation
• Ethical issues concerning confidentiality or
privacy may arise

51
2.In-depth Interviews

• An in-depth interview is a dialogue between a


skilled interviewer and an interviewee.

• In-depth interviews are characterized by


extensive probing and open-ended questions.

• Such interviews are best conducted face to face,


although in some situations telephone
interviewing can be successful.
52
In-depth interview cont…
• In-depth interviews also encourage capturing of
respondents’ perceptions in their own words, a
very desirable strategy in qualitative data
collection.
• This allows the evaluator to present the
meaningfulness of the experience from the
respondent’s perspective.
• In-depth interviews are conducted with
individuals or with a small group of individuals.

53
Cont…

• Audio and video recordings may be used to


complement interviews.

• Study subjects might be sensitive to the


presence of audio and video equipment and
researcher should ask for the subject’s
permission.

54
Advantages:
• Permits face-to-face contact with respondents

• Provide opportunity to explore topics in depth

• Usually yields rich data, details, new insight

• Allow interviewers to be flexible in


administering interview to particular
individuals or circumstances.
55
Disadvantages:
• Expensive and time consuming
• Need highly qualified and trained interviewer
• Interviewee may distort information through
recall error, selective perception, or desire to
please interviewer.

• Flexibility can result in inconsistency across
interviews.
56
3. Focus Group Discussion
• Focus group discussion is a method used to
collect information from a group through
guided discussion of a study topic.
• Focus groups are a gathering of 8 to 12 people
who share some characteristics relevant to the
evaluation.
• Focus groups conducted by experts take place
in a focus group facility that includes
recording apparatus (audio and/or visual) and
an attached room with a one-way mirror for
observation
57
Cont…
• Focus group participants are typically asked
to reflect on the questions asked by the
moderator.
• Participants are permitted to hear each
other’s responses and to make additional
comments beyond their own original
responses as they hear what other people
have to say.
• It is not necessary for the group to reach
any kind of consensus, nor to disagree.

58
Cont…
• As a rule, the focus group session should
not last longer than 1 1/2 to 2 hours.
• The participants are usually a relatively
homogeneous group of people.
• Respondents’ social class, level of
expertise, age, cultural background, and sex
should therefore always be considered.
• Although focus groups and in-depth
interviews share many characteristics, they
should not be used interchangeably.
59
Methods of Data Organization and
Presentation

• Tables

• Graphs

• Numerical summaries

60
Frequency Distributions (Tables)
1.Ordered array: A simple arrangement of individual observations in the
order of magnitude.
• Very difficult with large sample size

12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67

61
2.Frequency distribution: A table which has a
list of each of the possible values that the data
can assume along with the number of times
each value occurs.
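As a minimal sketch of these two ideas, the 24 observations from the ordered-array example can be sorted and then tallied. The variable names are illustrative only; `Counter` is from the Python standard library.

```python
from collections import Counter

# The 24 observations shown in the ordered-array example, in arbitrary order.
observations = [12, 19, 27, 36, 42, 59,
                15, 22, 31, 39, 43, 61,
                17, 23, 31, 41, 44, 65,
                18, 26, 34, 41, 54, 67]

# 1. Ordered array: the observations arranged in order of magnitude.
ordered_array = sorted(observations)

# 2. Frequency distribution: each observed value with the number of times it occurs.
frequency = Counter(ordered_array)

print(ordered_array)
for value in sorted(frequency):
    print(value, frequency[value])
```

With many distinct values, as here, a table of single values is long; grouping the values into class intervals gives a more compact table.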

62
Simple Frequency Distribution
• Primary and secondary cases of syphilis morbidity
by age, 1989

Age group (years)   Number   Percent
0-14                   230       0.5
15-19                 4378      10.0
20-24                10405      23.6
25-29                 9610      21.8
30-34                 8648      19.6
35-44                 6901      15.7
45-54                 2631       6.0
>54                   1278       2.9
Total                44081     100
63
Tables can also be used to present three or more variables.

Variable        Frequency (n)   Percent
Sex
  Male
  Female
Age (yrs)
  15-19
  20-24
  25-29
Religion
  Christian
  Muslim
Occupation
  Student
  Farmer
  Merchant
64
Guidelines for constructing tables
• Keep them simple,
• Limit the number of variables to three or fewer,
• All tables should be self-explanatory,
• Include clear title telling what, when and where,
• Clearly label the rows and columns,
• State clearly the unit of measurement used,
• Explain codes and abbreviations in the foot-note,
• Show totals,
• If data is not original, indicate the source in foot-note.
65
Diagrammatic Representation

• Pictorial representations of numerical data

66
Importance of diagrammatic representation:

1. Diagrams have greater attraction than


mere figures.
2. They give quick overall impression of the
data.
3. They have greater memorizing value than
mere figures.
4. They facilitate comparison
5. Used to understand patterns and trends
67
• Well designed graphs can be powerful means of
communicating a great deal of information

• When graphs are poorly designed, they not only


ineffectively convey message, but they are often
misleading.

68
Specific types of graphs include:

Nominal, ordinal data:
• Bar graph
• Pie chart

Quantitative data:
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph

• Others

69
1. Bar charts (or graphs)
• Categories are listed on the horizontal axis (X-axis)
• Frequencies or relative frequencies are
represented on the Y-axis (ordinate)
• The height of each bar is proportional to the
frequency or relative frequency of observations
in that category

70
Bar chart for the type of ICU for 25 patients

71
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together (leave space
between bars)
• The different bars should be separated by
equal distances
• All the bars should rest on the same line
called the base
• Label both axes clearly

72
Example: Construct a bar chart for the following data.

73
74
2. Sub-divided bar chart
• If there are different quantities forming the
sub-divisions of the totals, simple bars may
be sub-divided in the ratio of the various
sub-divisions to exhibit the relationship of
the parts to the whole.
• The order in which the components are
shown in a “bar” is followed in all bars used
in the diagram.
– Example: Stacked and 100% Component bar
charts

75
Example: Plasmodium species distribution for
confirmed malaria cases, Zeway, 2003

76
3. Multiple bar graph
• Bar charts can be used to represent the
relationships among more than two variables.
• The following figure shows the relationship
between children’s reports of breathlessness
and cigarette smoking by themselves and
their parents.

77
We can see from the graph quickly that the prevalence of the symptoms
increases both with the child’s smoking and with that of their parents.

78
4. Pie chart
• Shows the relative frequency for each category by
dividing a circle into sectors, the angles of which
are proportional to the relative frequency.
• Used for a single categorical variable
• Use percentage distributions

79
Steps to construct a pie-chart
• Construct a frequency table

• Change the frequency into percentage (P)

• Change the percentages into degrees, where:

degrees = (P / 100) × 360° = P × 3.6°

• Draw a circle and divide it accordingly

80
Example: Distribution of deaths for females, in England
and Wales, 1989.

Cause of death           No. of deaths
Circulatory system       100 000
Neoplasm                  70 000
Respiratory system        30 000
Injury and poisoning       6 000
Digestive system          10 000
Others                    20 000
Total                    236 000
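The pie-chart steps can be applied to this table as a short calculation. The sketch below is illustrative only (the variable names are made up here); it converts each frequency to a percentage of the total and then to a sector angle out of 360°.

```python
# Frequencies from the table of deaths for females, England and Wales, 1989.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

# Step 2: change each frequency into a percentage of the total.
percent = {cause: 100 * count / total for cause, count in deaths.items()}

# Step 3: change each percentage into degrees of the circle.
angle = {cause: p / 100 * 360 for cause, p in percent.items()}

for cause in deaths:
    print(f"{cause:22s} {percent[cause]:5.1f}%  {angle[cause]:6.1f} degrees")
```

The angles should add up to 360°, which is a quick check on the arithmetic before drawing the circle.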

81
82
5. Histogram
• Histograms are frequency distributions with
continuous class intervals that have been turned into
graphs.
• To construct a histogram, we draw the interval
boundaries on a horizontal line and the frequencies on
a vertical line.
• Non-overlapping intervals that cover all of the data
values must be used.

83
• Bars are drawn over the intervals in such a way
that the areas of the bars are all proportional in
the same way to their interval frequencies.

• The area of each bar is proportional to the


frequency of observations in the interval

84
Example: Distribution of the age of women at the time of marriage

Age group   15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number         11      36      28      13       7       3       2
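To make the "area proportional to frequency" rule concrete, here is a rough sketch that computes the bar edges and heights for the age-at-marriage data. It assumes ages are recorded in whole years, so that the class boundaries are placed half a year either side of the class limits (14.5 to 19.5, and so on); that boundary convention is an assumption of this sketch, not something stated in the table.

```python
# (lower limit, upper limit, frequency) for each age group in the example.
classes = [
    (15, 19, 11), (20, 24, 36), (25, 29, 28), (30, 34, 13),
    (35, 39, 7), (40, 44, 3), (45, 49, 2),
]

bars = []
for lower, upper, freq in classes:
    # Assumed convention: boundaries half a unit beyond the class limits,
    # which makes the intervals continuous and non-overlapping.
    left, right = lower - 0.5, upper + 0.5
    width = right - left
    # Height chosen so that area (height * width) equals the frequency.
    height = freq / width
    bars.append((left, right, height))

for left, right, height in bars:
    print(f"{left:4.1f} to {right:4.1f}: height {height:.2f}")
```

Because every class here has the same width, the heights are simply the frequencies divided by 5; dividing by the width matters when class widths differ.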

85
Histogram for the ages of 2087 mothers with <5 children,
Adami Tulu, 2003

86
Two problems with histograms

1. They are somewhat difficult to construct


2. The actual values within the respective
groups are lost and difficult to reconstruct

• The other graphic display (stem-and-leaf
plot) overcomes these problems

87
6. Stem-and-Leaf Plot
• A quick way to organize data to give visual impression
similar to a histogram while retaining much more detail
on the data.
• Similar to histogram and serves the same purpose and
reveals the presence or absence of symmetry
• Are most effective with relatively small data sets
• Are not suitable for reports and other communications,
but help researchers to understand the nature of their data

88
Example

• 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36,
66, 72, 41
2 2 8 9
3 4 6
4 1 3 7 9
5 1
6 1 6
7 2 7
8 2

89
Steps to construct Stem-and-Leaf Plots

1. Separate each data point into a stem and leaf


components
• Stem = consists of one or more of the initial digits
of the measurement
• Leaf = consists of the rightmost digit
The stem of the number 483, for example, is 48 and the
leaf is 3.
2. Write the smallest stem in the data set in the
upper left-hand corner of the plot

90
Steps to construct Stem-and-Leaf Plots

3. Write the second stem (first stem +1) below the first
stem
4. Continue with the remaining stems until you reach the
largest stem in the data set
5. Draw a vertical bar to the right of the column of stems
6. For each number in the data set, find the appropriate
stem and write the leaf to the right of the vertical bar
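The steps above can be sketched in a few lines of code. This is only an illustration: the function name `stem_and_leaf` is made up here, it assumes non-negative whole numbers, and it takes the rightmost digit as the leaf. It is applied to the 15 values of the first example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} using the rightmost digit as the leaf."""
    plot = defaultdict(list)
    for value in sorted(values):
        # Step 1: split each data point into stem and leaf, e.g. 483 -> (48, 3).
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    # Steps 2-4: list every stem from the smallest to the largest,
    # including stems that have no leaves.
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}


data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]
plot = stem_and_leaf(data)

# Steps 5-6: a vertical bar to the right of each stem, followed by its leaves.
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the values first means the leaves in each row come out in increasing order, which makes the median and quartiles easier to read off the plot.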

91
Example: 3031, 3101, 3265, 3260, 3245, 3200, 3248,
3323, 3314, 3484, 3541, 3649 (BWT in g)

Stem Leaf Number


30 31 1
31 01 1
32 65 60 45 00 48 5
33 23 14 2
34 84 1
35 41 1
36 49 1
92
7. Frequency polygon
• A frequency distribution can be portrayed
graphically in yet another way by means of a
frequency polygon.
• To draw a frequency polygon we connect the midpoints
of the tops of the cells of the histogram with
straight lines.
• The total area under the frequency polygon is equal
to the area under the histogram
• Useful when comparing two or more frequency
distributions by drawing them on the same diagram
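As a small illustration, the points of a frequency polygon can be computed for the age-at-marriage data used in the histogram example. The sketch assumes equal class widths and closes the polygon with a zero-frequency class at each end, which is the usual way of making its area match the histogram; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each age group in the histogram example.
classes = [
    (15, 19, 11), (20, 24, 36), (25, 29, 28), (30, 34, 13),
    (35, 39, 7), (40, 44, 3), (45, 49, 2),
]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [freq for _, _, freq in classes]
width = classes[1][0] - classes[0][0]  # equal class width, 5 years here

# Close the polygon on the horizontal axis one class width beyond each end.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

# Area under the polygon (trapezoid rule) versus area of the histogram bars.
polygon_area = sum(
    (y0 + y1) / 2 * (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])
)
histogram_area = sum(freq * width for freq in frequencies)

print(points)
print(polygon_area, histogram_area)
```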

93
Frequency polygon for the ages of 2087 mothers with <5 children,
Adami Tulu, 2003

94
It can be also drawn without erecting rectangles by joining the top
midpoints of the intervals representing the frequency of the classes as
follows:

95
8. Ogive Curve (The Cumulative Frequency
Polygon)
• Sometimes it may be necessary to know the number of
items whose values are more or less than a certain
amount.
• We may, for example, be interested to know the no. of
patients whose weight is <50 Kg or >60 Kg.
• To get this information it is necessary to change the form
of the frequency distribution from a ‘simple’ to a
‘cumulative’ distribution.
• An ogive curve turns a cumulative frequency distribution
into a graph.
• Are much more common than frequency polygons
96
Cumulative Frequency and Cum. Rel. Freq. of Age
of 25 ICU Patients

Age Interval   Frequency   Relative Frequency (%)   Cumulative frequency   Cumulative Rel. Freq. (%)
10-19              3               12                       3                       12
20-29              1                4                       4                       16
30-39              3               12                       7                       28
40-49              0                0                       7                       28
50-59              6               24                      13                       52
60-69              1                4                      14                       56
70-79              9               36                      23                       92
80-89              2                8                      25                      100
Total             25              100
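The cumulative columns of this table can be reproduced from the simple frequencies alone. The following is a minimal sketch with illustrative variable names; `accumulate` is from the Python standard library.

```python
from itertools import accumulate

age_intervals = ["10-19", "20-29", "30-39", "40-49",
                 "50-59", "60-69", "70-79", "80-89"]
frequency = [3, 1, 3, 0, 6, 1, 9, 2]  # ages of the 25 ICU patients

n = sum(frequency)
relative = [100 * f / n for f in frequency]          # relative frequency (%)
cumulative = list(accumulate(frequency))             # running total of frequencies
cumulative_relative = [100 * c / n for c in cumulative]

for row in zip(age_intervals, frequency, relative, cumulative, cumulative_relative):
    print(*row)
```

Plotting the cumulative values against the upper end of each age interval and joining the points gives the ogive.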
97
Cumulative frequency of 25 ICU patients

98
99
9. Box and Whisker Plot
• It is another way to display information when
the objective is to illustrate certain locations
(skewness) in the distribution .
• Can be used to display a set of discrete or
continuous observations using a single vertical
axis – only certain summaries of the data are
shown
• First the percentiles (or quartiles) of the data
set must be defined

100
• A box is drawn with the top of the box at the
third quartile (75%) and the bottom at the
first quartile (25%).
• The location of the mid-point (50%) of the
distribution is indicated with a horizontal line
in the box.
• Finally, straight lines, or whiskers, are drawn
from the centre of the top of the box to the
largest observation and from the centre of the
bottom of the box to the smallest observation.

101
• Position of a percentile = p(n+1)th observation, where p = the
required percentile expressed as a proportion
• Arrange the numbers in ascending order
A. 1st quartile = 0.25 (n+1)th
B. 2nd quartile = 0.5 (n+1)th
C. 3rd quartile = 0.75 (n+1)th
D. 20th percentile = 0.2 (n+1)th
E. 15th percentile = 0.15 (n+1)th

102
The pth percentile is a value that is ≥ p% of the
observations and ≤ the remaining (100−p)%.
• The pth percentile is:

– The observation corresponding to the p(n+1)th position if p(n+1)
is an integer
– The average of the kth and (k+1)th observations if p(n+1)
is not an integer, where k is the largest integer less
than p(n+1).
• If p(n+1) = 3.6, take the average of the 3rd and 4th observations

103
• Given a sample of size n = 60, find the 10th
percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
= Average of 6th and 7th
– 10% of the observations are less than or equal to this
value and 90% of them are greater than or equal to the
value
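The p(n+1) rule can be written out as a short function. This is a sketch of the rule as stated on these slides, not of any library's percentile routine (statistical packages use several different conventions); the function name `percentile` and the small tolerance used to recognise a whole-number position are choices made here. The sample is simply the numbers 1 to 60, so the kth ordered observation equals k.

```python
import math


def percentile(values, p):
    """pth percentile by the p(n+1) rule; p is a proportion, e.g. 0.10."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)
    nearest = round(position)
    if abs(position - nearest) < 1e-9:
        # p(n+1) is an integer: take the observation at that position.
        if not 1 <= nearest <= n:
            raise ValueError("position falls outside the data")
        return ordered[nearest - 1]
    # Otherwise average the kth and (k+1)th observations,
    # where k is the largest integer less than p(n+1).
    k = math.floor(position)
    if not 1 <= k < n:
        raise ValueError("position falls outside the data")
    return (ordered[k - 1] + ordered[k]) / 2


sample = list(range(1, 61))       # n = 60, illustrative data
print(percentile(sample, 0.10))   # position 6.1: average of the 6th and 7th values
```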

104
How can the lower quartile, median and upper quartile be used
to judge the symmetry of a distribution?

1. If the distribution is symmetric, then the upper and


lower quartiles should be approximately equally spaced
from the median.

2. If the upper quartile is farther from the median than the


lower quartile, then the distribution is positively skewed.

3. If the lower quartile is farther from the median than the


upper quartile, then the distribution is negatively skewed.
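These three rules amount to comparing two distances. A minimal sketch follows; the function name and the `tolerance` parameter (how close the two distances must be to count as "approximately equal") are illustrative assumptions, not part of the slides.

```python
def skew_from_quartiles(q1, median, q3, tolerance=0.0):
    """Judge symmetry from the spacing of the quartiles around the median."""
    upper_gap = q3 - median   # distance from the median to the upper quartile
    lower_gap = median - q1   # distance from the median to the lower quartile
    if abs(upper_gap - lower_gap) <= tolerance:
        return "approximately symmetric"
    if upper_gap > lower_gap:
        return "positively skewed"
    return "negatively skewed"


print(skew_from_quartiles(10, 20, 30))
print(skew_from_quartiles(10, 12, 30))
print(skew_from_quartiles(8, 15, 20))
```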

105
106
Box plots are useful for comparing two or
more groups of observations

107
Outlying values
• The lines coming out of the box are called the
“whiskers”.
• The ends of the “whiskers” are called “adjacent
values” (the largest and smallest non-outlying
values).

• Upper “adjacent value” = The largest value that is


less than or equal to P75 + 1.5*(P75 – P25).

• Lower “adjacent value” = The smallest value that


is greater than or equal to P25 – 1.5*(P75 – P25).

108
• The box plot is then completed:
– Draw a vertical bar from the upper quartile to the
largest non-outlying value in the sample
– Draw a vertical bar from the lower quartile to the
smallest non-outlying value in the sample
– Outliers are displayed as dots (or small circles) and
are defined by:
Values greater than 75th percentile + 1.5*IQR
Values smaller than 25th percentile − 1.5*IQR
– Any values that are outside the IQR but are not
outliers are marked by the whiskers on the plot.
– IQR = P75 – P25
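Putting the pieces together, the summaries needed to draw the box plot can be computed as sketched below. The data values are made up for illustration, the helper names are hypothetical, and the quartiles are found with the p(n+1) position rule described earlier in these slides.

```python
import math


def percentile(values, p):
    """pth percentile by the p(n+1) rule; p is a proportion."""
    ordered = sorted(values)
    position = p * (len(ordered) + 1)
    nearest = round(position)
    if abs(position - nearest) < 1e-9:
        return ordered[nearest - 1]
    k = math.floor(position)
    return (ordered[k - 1] + ordered[k]) / 2


def box_plot_summary(values):
    q1, median, q3 = (percentile(values, p) for p in (0.25, 0.50, 0.75))
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_adjacent": min(inside),   # end of the lower whisker
        "upper_adjacent": max(inside),   # end of the upper whisker
        "outliers": sorted(v for v in values if v < lower_limit or v > upper_limit),
    }


# Hypothetical sample of 11 observations.
values = [2, 5, 8, 10, 12, 15, 15, 18, 20, 20, 60]
summary = box_plot_summary(values)
for name, value in summary.items():
    print(name, value)
```

The whiskers stop at the adjacent values, and anything listed under `outliers` would be drawn as a separate dot.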
109
• Number of cigarettes smoked per day was
measured just before each subject attempted to
quit smoking

110
10. Scatter plot
• Most studies in medicine involve measuring more
than one characteristic, and graphs displaying the
relationship between two characteristics are
common in literature.
• When both the variables are qualitative then we
can use a multiple bar graph.
• When one of the characteristics is qualitative and
the other is quantitative, the data can be displayed
in box and whisker plots.

111
• For two quantitative variables we use bivariate
plots (also called scatter plots or scatter
diagrams).

• In the study on percentage saturation of bile,


information was collected on the age of each
patient to see whether a relationship existed
between the two measures.
• A scatter diagram is constructed by drawing X- and Y-axes.
• Each point or dot represents a pair of values
measured for a single study subject
112
• The graph suggests the possibility of a positive
relationship between age and percentage
saturation of bile in women.
113
11. Line graph
• Useful for assessing the trend of a particular situation over time.
• Helps for monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the horizontal
axis, and
• Values of the quantity being studied are marked on the vertical axis.
• Values for each category are connected by a continuous line.
• Sometimes two or more graphs are drawn on the same graph taking
the same scale so that the plotted graphs are comparable.

114
No. of microscopically confirmed malaria cases by species and
month at Zeway malaria control unit, 2003

115
A line graph can also be used to depict the relationship between two
continuous variables, like a scatter diagram.

• The following graph shows the level of zidovudine (AZT)
in the blood of AIDS patients at several times after
administration of the drug, for patients with normal fat
absorption and with fat malabsorption.

116
117
Thank you

118
