
ASSESSING DATA QUALITY (VALIDITY AND RELIABILITY)

Measurement is an essential process for obtaining data, and the data are gathered with a measurement tool constructed by the researcher during the course of a research study. Researchers usually construct the measurement tool themselves, and they do not simply assume that their measures work; instead, they collect data to show that the instrument is valid and reliable for a quantitative research study. If the tools turn out not to be valid and reliable, the researcher stops using them and tries to construct alternative means of measurement. To demonstrate data quality, researchers have two distinct criteria for evaluating their quantitative measures: the reliability and the validity of the instrument.

Reliability of measuring instruments


Reliability is the degree to which a research method produces stable and consistent results. A specific measure is considered reliable if its application to the same object produces the same results a number of times. The reliability of an instrument can be assessed in various ways, depending on the nature of the instrument and on the aspect of reliability of interest. Reliability concerns the consistency of a measure: across time (test-retest reliability, to assess stability), across items (the split-half test, to assess internal consistency or homogeneity), and across different researchers (inter-rater / inter-observer reliability, to assess equivalence).

Types of reliability
1. External reliability: the extent to which a measure is consistent from one use to another
   a. Test-retest reliability: measures the stability of a test over time
   b. Inter-rater reliability: the degree to which different raters give consistent estimates of the same behavior
2. Internal reliability: the extent to which a measure is consistent within itself
   a. Split-half test: measures the extent to which all parts of the test contribute equally to what is being measured

Test-retest reliability (stability over time): this test focuses on an instrument's susceptibility to extraneous factors over time, such as subject fatigue or changing environmental conditions. Stability is assessed by administering the instrument twice to the same sample on two separate occasions (Time 1 and Time 2, usually 7 days apart). The researcher administers the measure on the two occasions and then compares the scores using a correlation coefficient. Theoretically, the reliability coefficient ranges from -1.00 through .00 to +1.00. In practice, a perfect reliability coefficient is difficult to obtain, so most researchers accept a test-retest correlation of 0.70 or greater, depending on the type of instrument and the area of research. Good test-retest reliability ensures that the measurements obtained on the two occasions are stable over time.

For example, if a scale weighed a person at 60 kg one minute and 60.01 kg the next, we would consider it a reliable instrument. The less an instrument varies across repeated measures, the higher its reliability. Any good measuring instrument should produce roughly the same scores on repeated use, and a measure that produces highly inconsistent scores over time cannot be a good measure of the construct and is not reliable.

Test-retest reliability is used when the attribute is fairly stable in nature (e.g., self-esteem, which usually does not fluctuate). The method is a relatively easy approach and can be used with interview schedules, questionnaires, and observational and physiological measures.

Procedure for conducting the test-retest method

1. Select the subjects from the target population (10% of the total sample, other than the actual study sample)
2. Time 1: administer the measuring instrument to a group of subjects
3. Time 2: re-administer the same measure to the same group of subjects (usually after 7 days; the interval may vary depending on the attribute)
4. Compare the scores from the two occasions using Karl Pearson's correlation coefficient (a worked sketch follows the example below)
5. Interpret the results: perfect reliability (+1.00), acceptable reliability (+.70 and above), questionable reliability (+.69 and below), or no reliability (.00)
Example of test-retest reliability for a self-esteem instrument

Subject Number    Time 1    Time 2
1                 55        57
2                 49        46
3                 78        74
4                 37        35
5                 44        46
6                 50        56
7                 58        55
8                 62        66
9                 48        50
10                67        63

r = .95 (highly acceptable reliability)
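
The correlation in step 4 can be computed by hand or with any statistics package. As an illustration, the following minimal sketch in plain Python (no external libraries; the function name pearson_r is ours, for illustration only) reproduces the r = .95 reported for the table above.

```python
# A minimal sketch of step 4: Karl Pearson's correlation coefficient,
# implemented in plain Python and applied to the ten pairs of
# self-esteem scores from the table above.

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

time1 = [55, 49, 78, 37, 44, 50, 58, 62, 48, 67]
time2 = [57, 46, 74, 35, 46, 56, 55, 66, 50, 63]

print(round(pearson_r(time1, time2), 2))  # 0.95 -> acceptable (>= .70)
```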
Drawbacks of the test-retest method
a. Many traits (attitudes, behavior, knowledge, physical conditions, etc.) change over time independently of the instrument, which affects measures of stability.
b. The observer's or researcher's coding on the second administration may be influenced by memory of the first administration, regardless of the behavior actually observed.
c. Subjects may change as a result of the first administration.
d. Subjects and researchers may be less careful when using the same instrument the second time.
e. Subjects may become bored on the second occasion.

Inter-rater / inter-observer reliability (for equivalence): estimated by having two or more trained observers watch a single event simultaneously and independently record the data. This establishes equivalence, or consistency of judgment, and is used primarily with observational measures.
Inter-rater reliability is computed as an index of equivalence or agreement between the raters or judges, using a correlation coefficient to demonstrate the strength of the relationship between one observer's ratings and another's.
Another procedure is to compute reliability as the proportion of agreements between the observers. If the observers agree on every observation, the value is 1 (100%); if they disagree on every observation, it is 0 (0%). The formula for measuring reliability is:

reliability = number of agreements / (number of agreements + number of disagreements)
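
As a sketch of how this formula is applied, the following Python snippet computes the proportion of agreement for two hypothetical raters who each coded the presence (1) or absence (0) of a behavior across ten observation intervals; the codings and the function name are invented for illustration.

```python
# Illustrative sketch of the agreement formula: agreements divided by
# agreements plus disagreements, for two raters' paired codings.

def proportion_agreement(rater_a, rater_b):
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)  # total pairs = agreements + disagreements

rater1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

print(proportion_agreement(rater1, rater2))  # 0.8, i.e. 80% agreement
```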
Drawbacks of inter-rater / inter-observer reliability
a. The observers may tend to overestimate or underestimate what they observe.
b. The agreement formula may overestimate observer agreement; when observers code only the presence or absence of a behavior, they can agree 50% of the time by chance alone.

Split-half test/technique (internal consistency or homogeneity across items) – Internal consistency is the consistency or homogeneity of results: the extent to which all of the test's items or subparts contribute equally to what is being measured. In simpler terms, it is the degree to which the subparts of an instrument yield the same results within the same test. One of the oldest, cheapest, and easiest methods for assessing internal consistency is the split-half technique, in which researchers measure reliability by including two versions of the same instrument within the same test.
In split-half reliability, the items of the instrument are split into two parts or groups, and both parts are given to one group of subjects at the same time. The scores from the two parts of the test are then correlated (commonly summarized with Cronbach's alpha formula) to test the reliability.

Procedure for the split-half test/technique

1. Select the subjects from the target population (10% of the total sample, other than the actual study sample)
2. Randomly divide the items or questions of the instrument into two parts or groups, either by odd/even number or by first half / second half:
   a. Based on first half and second half: divide the instrument into two equal parts.
   b. Based on odd and even numbers: one part with the odd items (1, 3, 5, 7, 9, and so on) as the first half, and the other with the even items (2, 4, 6, 8, 10, and so on) as the second half.
3. Administer both parts simultaneously: give the first-half and second-half test items to one group of subjects at the same time
4. Compare the scores on the first-half and second-half items using Cronbach's alpha formula to measure the internal consistency of the instrument (see the sketch after this list)
5. Score the results and interpret them: perfect reliability (+1.00), acceptable reliability (+.70 and above), questionable reliability (+.69 and below), or no reliability (.00)
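
As an illustration of step 4, the following Python sketch computes Cronbach's alpha from its standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores); the item scores are hypothetical, for the example only.

```python
# A minimal sketch of Cronbach's alpha from its standard formula,
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
# The item scores below are hypothetical (four items, five subjects).

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    k = len(item_scores)        # number of items
    n = len(item_scores[0])     # number of subjects
    totals = [sum(item[s] for item in item_scores) for s in range(n)]
    item_var_sum = sum(sample_variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / sample_variance(totals))

items = [                 # rows = items, columns = subjects
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
    [3, 4, 2, 5, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.91 -> acceptable (>= .70)
```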

The split-half technique is an easy, economical, and widely used reliability test because it requires only a single administration, and it is among the best means of assessing an important source of measurement error in psychological instruments. The technique is most commonly used for multiple-choice tests, though it can be applied to other types. Multiple-choice tests contain distinct subtests or subparts that measure related concepts. In the split-half test, the internal consistency of the subparts is typically assessed; if subpart scores are summed into an overall score, the scale's overall internal consistency can be assessed as well. One drawback is that the technique works only for a large set of questions measuring the same construct.

Validity of measuring instruments


Validity is the extent to which the scores from a measure represent the variable they are intended to represent. But how do researchers make this judgment? We have already considered one factor: reliability. When a measure has good test-retest reliability and internal consistency, researchers can be more confident that its scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable yet have no validity whatsoever. As an example, imagine someone who believes that people's index-finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people's index fingers. Though this measure would have good test-retest reliability, it has no validity: the fact that one person's index finger is a centimeter longer than another's indicates nothing about which of them has higher self-esteem. An instrument can, therefore, be reliable without being valid.

The second important criterion for evaluating a quantitative instrument is therefore its validity: the degree to which an instrument measures what it is supposed to measure. In other words, validity concerns the appropriateness, completeness, and usefulness of a research instrument for measuring the intended attribute. For example, a thermometer is supposed to measure only body temperature and cannot be considered a valid instrument if it measures an attribute other than temperature. Similarly, if a researcher-constructed instrument is meant to measure pain but includes items on anxiety, it cannot be considered valid. Hence, a valid instrument measures only what it is supposed to measure.

Measures of validity / types of validity / aspects of validity


1. Face validity
2. Content validity
3. Criterion validity
4. Construct validity

Face validity – the overall appearance of an instrument with regard to its appropriateness for measuring a specific attribute. Although it is not considered primary evidence, it is helpful for a measure to have face validity when other types of validity have been demonstrated. For example, most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities, so a questionnaire that included these kinds of items would have good face validity.
Content validity – the extent to which a measuring instrument provides adequate coverage of the specific content. In other words, it concerns the degree to which an instrument contains an appropriate and representative sample of items for the construct being measured. The content validity of an instrument is primarily based on judgment, and there are no completely objective methods to ensure adequate content coverage. However, in recent years it has become common to use a panel of experts to evaluate a new instrument for adequacy and appropriateness. The panel typically consists of at least three members, excluding the language expert. Content validity is relevant for both cognitive measures and affective measures (feelings, emotions, and other psychological traits).

Criterion-related validity – involves determining the relationship between an instrument and an external criterion. In other words, it is the extent to which scores on a measure correlate with other variables (known as criteria). The instrument is said to be valid if its scores correlate highly with scores on the criterion.
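
As a hypothetical illustration, the snippet below correlates scores from a new instrument with scores on an established criterion measure using Python's standard-library statistics.correlation (available from Python 3.10); the instrument names and data are invented for the example.

```python
# Hypothetical illustration of criterion-related validity: scores on a
# new pain scale are correlated with scores on an established criterion
# measure for the same eight subjects.
from statistics import correlation  # Pearson's r (Python 3.10+)

new_scale = [12, 18, 25, 9, 30, 22, 15, 27]   # researcher's new instrument
criterion = [14, 20, 24, 10, 31, 19, 16, 28]  # established criterion measure

print(round(correlation(new_scale, criterion), 2))  # 0.97 -> supports validity
```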

Construct validity – the most complex and abstract type of validity. A measure is said to possess construct validity to the degree that it conforms to predicted correlations with other theoretical propositions.
