
RELIABILITY AND VALIDITY ANALYSIS

Dr. Jeevan Jyoti
Dept. of Commerce
University of Jammu
Reliability
A measure is said to have high reliability if it produces similar results under consistent conditions. Some measures are more reliable than others, because the tools we use to measure are better and there is more consistency in what we are measuring. For example, measurements of height and weight are often extremely reliable, whereas measurements of blood pressure and processing speed tend to be less reliable.
Types of Reliability
There are four general classes of reliability estimates,
each of which estimates reliability in a different way.
These are:
1. Inter-Rater or Inter-Observer Reliability

2. Test-Retest Reliability

3. Parallel-Forms Reliability

4. Internal Consistency Reliability


Inter-Rater Reliability
Inter-rater reliability (also called inter-rater agreement or concordance) is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges.
Test-Retest Reliability
Test-retest reliability (sometimes called retest reliability) measures test consistency, that is, the reliability of a test measured over time. In other words, if the same test is given twice to the same people at different times, does it produce the same scores? It is measured with a test-retest correlation.

• Example
Various questions for a personality test are tried out with a class of students over several years. This helps the researcher determine which questions and combinations have the best reliability.
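As a minimal illustration, the test-retest correlation is simply the Pearson correlation between the two administrations. The scores below are hypothetical:

```python
import numpy as np

# Hypothetical scores for the same six respondents at two points in time
time1 = np.array([12, 15, 11, 18, 14, 16])
time2 = np.array([13, 14, 12, 17, 15, 16])

# Test-retest reliability: Pearson correlation between the administrations
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest correlation: {r:.2f}")
```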
Parallel-Forms Reliability
It is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. It evaluates different questions and question sets that seek to assess the same construct.
The creation of parallel forms begins with the generation of a large pool of items representing a single content domain or universe. At a minimum, the size of this item pool should be more than twice the desired or planned size of a single test form.
• Examples
An experimenter develops a large set of questions, splits them into two forms, and administers each form to a randomly selected half of a target sample.
In the development of national tests, two different tests are used simultaneously in trials. The test that gives the most consistent results is used, whilst the other (provided it is sufficiently consistent) is kept as a backup.
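A rough sketch of the parallel-forms check, assuming a matrix of item responses (rows are respondents, columns are items); the data and the random split are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical responses: 50 respondents x 10 items measuring one construct
trait = rng.normal(size=(50, 1))                      # latent construct
items = trait + rng.normal(scale=0.8, size=(50, 10))  # items loading on it

# Randomly split the item pool into two parallel forms of five items each
order = rng.permutation(items.shape[1])
form_a = items[:, order[:5]].sum(axis=1)  # total score on form A
form_b = items[:, order[5:]].sum(axis=1)  # total score on form B

# Parallel-forms reliability: correlation between the two form totals
r = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-forms correlation: {r:.2f}")
```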
Internal Consistency Reliability
Average inter-item correlation compares correlations between all pairs of questions that test the same construct by calculating the mean of all paired correlations.
Average item-total correlation computes a total score across all items, correlates each item with that total, and then averages these item-total correlations.
Split-half correlation divides the items that measure the same construct into two sets, administers both to the same group of people, and then calculates the correlation between the two total scores.
Cronbach's alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability. Note that a "high" value for alpha does not imply that the measure is unidimensional.
Internal Consistency Reliability: Average Inter-Item Correlation

For a test with six items (I1–I6), the correlations between all pairs of items are:

        I1    I2    I3    I4    I5    I6
  I1   1.00
  I2    .89  1.00
  I3    .91   .92  1.00
  I4    .88   .93   .95  1.00
  I5    .84   .86   .92   .85  1.00
  I6    .88   .91   .95   .87   .85  1.00

Average inter-item correlation ≈ .90
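A small sketch that reproduces the average from the matrix above (the correlation values are taken directly from the slide):

```python
import numpy as np

# Inter-item correlation matrix for items I1..I6 (from the slide)
R = np.array([
    [1.00, 0.89, 0.91, 0.88, 0.84, 0.88],
    [0.89, 1.00, 0.92, 0.93, 0.86, 0.91],
    [0.91, 0.92, 1.00, 0.95, 0.92, 0.95],
    [0.88, 0.93, 0.95, 1.00, 0.85, 0.87],
    [0.84, 0.86, 0.92, 0.85, 1.00, 0.85],
    [0.88, 0.91, 0.95, 0.87, 0.85, 1.00],
])

# Mean of the 15 distinct off-diagonal correlations
pairs = R[np.triu_indices_from(R, k=1)]
print(f"Average inter-item correlation: {pairs.mean():.3f}")  # 0.894, i.e. ~.90
```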
Internal Consistency Reliability: Average Item-Total Correlation

Each item is correlated with the total score across all six items:

          I1    I2    I3    I4    I5    I6
  I1     1.00
  I2      .89  1.00
  I3      .91   .92  1.00
  I4      .88   .93   .95  1.00
  I5      .84   .86   .92   .85  1.00
  I6      .88   .91   .95   .87   .85  1.00
  Total   .84   .88   .86   .87   .83   .82

Average item-total correlation = .85
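With raw item responses, the item-total correlations (the Total row above) would be computed along these lines. The data here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
trait = rng.normal(size=(100, 1))
X = trait + rng.normal(scale=0.5, size=(100, 6))  # 100 respondents x 6 items

total = X.sum(axis=1)  # total score across all items

# Correlate each item with the total score, then average
item_total = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(6)])
print("Item-total correlations:", np.round(item_total, 2))
print(f"Average item-total correlation: {item_total.mean():.2f}")
```

A common refinement correlates each item with the total of the remaining items (the corrected item-total correlation), which avoids inflating the result by including the item in its own total.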
Internal Consistency Reliability: Split-Half Correlation

The six items are split into two halves, each half is scored as a subtotal, and the two subtotals are correlated:

  Half 1: Item 1, Item 3, Item 4
  Half 2: Item 2, Item 5, Item 6

  Correlation between the two half-test scores: .87
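A sketch of the same split on hypothetical raw data:

```python
import numpy as np

rng = np.random.default_rng(2)
trait = rng.normal(size=(100, 1))
X = trait + rng.normal(scale=0.5, size=(100, 6))  # items I1..I6

# The split used on the slide: items 1, 3, 4 versus items 2, 5, 6
half1 = X[:, [0, 2, 3]].sum(axis=1)
half2 = X[:, [1, 4, 5]].sum(axis=1)

r = np.corrcoef(half1, half2)[0, 1]
print(f"Split-half correlation: {r:.2f}")
```

In practice the split-half correlation understates the reliability of the full-length test; the Spearman-Brown correction, 2r / (1 + r), is often applied to estimate it.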
Internal Consistency Reliability: Cronbach's Alpha (α)

Cronbach's alpha behaves like the average of all possible split-half correlations. For the six-item test, different splits (e.g. items 1, 3, 4 versus items 2, 5, 6) give different split-half correlations:

  SH1  .87
  SH2  .85
  SH3  .91
  SH4  .83
  SH5  .86
  ...
  SHn  .85

  α = .85
Composite Reliability

  CR = (Σλᵢ)² / [ (Σλᵢ)² + Σεᵢ ]

whereby λ (lambda) is the standardized factor loading for item i and ε is the respective error variance for item i.
Online calculator website:
http://www.thestatisticalmind.com/composite-reliability/
1. Enter the standardized loading for the first item.
2. Click "add item" and continue to enter the standardized loading for each item.
3. To remove any item, click "delete".
4. To reset the form, click "reset".
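The same calculation in code, as a minimal sketch (the loadings are made up; the error variance for a standardized item is taken as 1 − λ², a common convention when error variances are not reported separately):

```python
import numpy as np

def composite_reliability(loadings) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    error_variances = 1 - lam**2  # per-item error variance (assumed 1 - lambda^2)
    numerator = lam.sum() ** 2
    return numerator / (numerator + error_variances.sum())

# Hypothetical standardized loadings for a five-item construct
print(f"CR = {composite_reliability([0.72, 0.68, 0.75, 0.81, 0.70]):.3f}")
```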
Validity
• Validity is concerned with how well the concept is defined by the measure.
• It shows the extent to which a measure or set of measures correctly represents the concept of the study.
• It is the degree to which the measure is free of systematic (non-random) error.
• Predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure. For example, the predictive validity of a cognitive test for job performance is the correlation between test scores and a criterion such as supervisor performance ratings.
• Content/face validity: this form of validity is established by evaluating the relevance of the test items, individually and as a whole. The items should constitute a representative sample of the variable to be measured. It can be assessed through a review of the relevant literature and discussion with subject experts.
Construct Validity
It defines how well a test or experiment measures up to its claims. A test designed to measure depression must measure only that particular construct, not closely related constructs such as anxiety or stress. It can be assessed by establishing:
• Convergent validity
• Discriminant validity
Construct Validity
• Convergent validity tests whether variables used to measure a construct that are expected to be related are, in fact, related. It is established through:
  • Factor loadings > .50
  • AVE > .50

  AVE = Σλᵢ² / (Σλᵢ² + Σεᵢ)
      = sum of squared standardized regression weights / (sum of squared standardized regression weights + sum of error variances)

• Discriminant validity (also referred to as divergent validity) tests whether constructs that should have no relationship do, in fact, have no relationship. It is established when the AVE of each construct is greater than the squared correlation between the constructs:

  AVE > r²
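A sketch of the AVE calculation and both checks (the loadings and the inter-construct correlation are hypothetical; ε = 1 − λ² as before):

```python
import numpy as np

def ave(loadings) -> float:
    """AVE = sum of squared loadings / (sum of squared loadings + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    squared = (lam**2).sum()
    error_variances = (1 - lam**2).sum()  # assumed per-item error variance
    return squared / (squared + error_variances)

# Hypothetical standardized loadings for two constructs A and B
ave_a = ave([0.72, 0.68, 0.75, 0.81])
ave_b = ave([0.74, 0.70, 0.78])
r_ab = 0.48  # hypothetical correlation between A and B

print(f"AVE(A) = {ave_a:.3f}, AVE(B) = {ave_b:.3f}")
print("Convergent validity (AVE > .50):", ave_a > 0.50 and ave_b > 0.50)
print("Discriminant validity (AVE > r^2):", min(ave_a, ave_b) > r_ab**2)
```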
Discriminant Validity and Correlation Analysis

Constructs            Ability         Motivation      Opportunity     Emotional Exh.  Cynicism        Inefficacy
Ability               0.529
Motivation             .483** (.233)  0.544
Opportunity            .314** (.099)   .382** (.145)  0.523
Emotional Exhaustion  -.203** (.041)  -.236** (.056)  -.270** (.073)  0.489
Cynicism              -.316** (.099)  -.288** (.083)  -.308** (.094)   .624** (.389)  0.529
Inefficacy            -.277** (.076)  -.296** (.087)  -.349** (.122)   .631** (.398)   .663** (.439) 0.533

Values on the diagonal represent AVE; values in parentheses are squared correlations. ** p < .01
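To make the table's logic concrete, this small check (using the AVE values and squared correlations exactly as printed above) verifies that every construct's AVE exceeds its squared correlations with the other constructs:

```python
# AVE values from the diagonal of the table
ave = {"Ability": 0.529, "Motivation": 0.544, "Opportunity": 0.523,
       "Emotional Exhaustion": 0.489, "Cynicism": 0.529, "Inefficacy": 0.533}

# Squared correlations (the parenthesised values) for each construct pair
r2 = {("Ability", "Motivation"): .233, ("Ability", "Opportunity"): .099,
      ("Motivation", "Opportunity"): .145, ("Ability", "Emotional Exhaustion"): .041,
      ("Motivation", "Emotional Exhaustion"): .056,
      ("Opportunity", "Emotional Exhaustion"): .073, ("Ability", "Cynicism"): .099,
      ("Motivation", "Cynicism"): .083, ("Opportunity", "Cynicism"): .094,
      ("Emotional Exhaustion", "Cynicism"): .389, ("Ability", "Inefficacy"): .076,
      ("Motivation", "Inefficacy"): .087, ("Opportunity", "Inefficacy"): .122,
      ("Emotional Exhaustion", "Inefficacy"): .398, ("Cynicism", "Inefficacy"): .439}

# Discriminant validity holds when each construct's AVE exceeds every squared
# correlation involving that construct
ok = all(ave[a] > sq and ave[b] > sq for (a, b), sq in r2.items())
print("Discriminant validity holds for all construct pairs:", ok)  # True
```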
