Kyu Edu 2301 WK3

CHAPTER 3

TEST CONSTRUCTION

Objectives
At the end of this topic, you should be able to:
i. Define basic terms
ii. Explain the importance of Validity and reliability of tests
iii. Discuss different ways of ascertaining Validity and reliability of tests

3.0 Introduction
Test construction is the process of building a test. For a test to be deemed good, the test's
reliability and validity must be determined. This chapter discusses test validity and reliability.

3.1 Reliability of Tests


Reliability is the degree to which an assessment tool produces stable and consistent results.
Reliability is, therefore, the extent to which a test yields the same results on repeated
administrations under the same conditions.

3.2 Types of Reliability
1. Test-retest reliability is the degree to which scores are consistent over time. Test-retest
reliability is a measure of reliability obtained by administering the same test twice over a
period of time to a group of individuals. The scores from Time 1 and Time 2 can then be
correlated in order to evaluate the test for stability over time. Example: A test designed to
assess student learning in psychology could be given to a group of students twice, with the
second administration perhaps coming a week after the first. The obtained correlation
coefficient would indicate the stability of the scores.
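As a brief illustration (not part of the original notes), the test-retest correlation described above could be computed in Python as follows; the two score lists are made-up values for ten students.

import numpy as np

# Hypothetical scores for the same 10 students on two administrations
# of the same psychology test, one week apart (made-up data).
time1 = np.array([78, 85, 62, 90, 71, 66, 84, 59, 73, 88])
time2 = np.array([80, 83, 65, 92, 69, 70, 81, 61, 75, 85])

# Pearson correlation between Time 1 and Time 2 = test-retest reliability.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {r_test_retest:.2f}")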
2. Parallel forms reliability / Equivalent-Forms or Alternate-Forms Reliability:
Two tests are constructed that are identical in every way except for the actual items included.
This approach is used when it is likely that test takers will recall responses made during the first
session and when alternate forms are available. The scores on the two forms are correlated, and
the obtained coefficient is called the coefficient of equivalence. The main problem is the difficulty
of constructing two forms that are essentially equivalent.
In other words, it is a measure of reliability obtained by administering different versions of an
assessment tool (both versions must contain items that probe the same construct, skill, knowledge
base, etc.) to the same group of individuals. The scores from the two versions can then be
correlated in order to evaluate the consistency of results across alternate versions.
Example: If you wanted to evaluate the reliability of a critical thinking assessment, you might
create a large set of items that all pertain to critical thinking and then randomly split the questions
into two sets, which would represent the parallel forms.
Both test-retest and parallel-forms reliability require two administrations of the test.
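For illustration only, the random split described in the critical-thinking example might look like the Python sketch below; the item labels and the seed are hypothetical.

import random

# Hypothetical pool of 40 critical-thinking item IDs (made-up labels).
item_pool = [f"CT{i:02d}" for i in range(1, 41)]

# Randomly split the pool into two parallel forms of 20 items each.
random.seed(42)  # arbitrary seed, for a reproducible split
random.shuffle(item_pool)
form_a, form_b = item_pool[:20], item_pool[20:]
print("Form A items:", form_a)
print("Form B items:", form_b)

# Each examinee would then sit both forms; correlating the Form A and
# Form B total scores (as in the test-retest sketch above) gives the
# parallel-forms reliability coefficient.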

3. Inter-rater reliability is a measure of reliability used to assess the degree to which
different judges or raters agree in their assessment decisions. Inter-rater reliability is useful
because human observers will not necessarily interpret answers the same way; raters may
disagree as to how well certain responses or material demonstrate knowledge of the construct or
skill being assessed.
Example: Inter-rater reliability might be employed when different judges are evaluating the
degree to which art portfolios meet certain standards. Inter-rater reliability is especially useful
when judgments can be considered relatively subjective. Thus, the use of this type of reliability
would probably be more likely when evaluating artwork as opposed to math problems.
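The Python sketch below, using made-up ratings, shows two common ways of quantifying inter-rater agreement: simple percent agreement, and Cohen's kappa, which corrects that agreement for chance.

from collections import Counter

# Hypothetical ratings of 10 art portfolios by two judges (made-up data).
rater1 = ["pass", "pass", "fail", "borderline", "pass",
          "fail", "pass", "borderline", "fail", "pass"]
rater2 = ["pass", "borderline", "fail", "borderline", "pass",
          "fail", "pass", "pass", "fail", "pass"]

n = len(rater1)
# Simple percent agreement: proportion of portfolios rated identically.
observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Cohen's kappa corrects the observed agreement for chance agreement.
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2
kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")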
4. Internal consistency reliability examines how all items on the test relate to one another.
It is a measure of reliability used to evaluate the degree to which different test items
that probe the same construct produce similar results.
A. Average inter-item correlation is a subtype of internal consistency reliability. It is
obtained by taking all of the items on a test that probe the same construct (e.g., reading
comprehension), determining the correlation coefficient for each pair of items, and finally taking
the average of all of these correlation coefficients. This final step yields the average inter-item
correlation.
B. Split-half reliability is another subtype of internal consistency reliability. To obtain it,
all items of a test that are intended to probe the same area of knowledge (e.g., World War II) are
split in half to form two "sets" of items. The entire test is administered to a group of individuals,
the total score for each "set" is computed, and the split-half reliability is obtained by determining
the correlation between the two total "set" scores. It requires only one administration and is
especially appropriate when the test is very long. The most commonly used method of splitting
the test in two is the odd-even strategy. Since longer tests tend to be more reliable, and since
split-half reliability represents the reliability of a test only half as long as the actual test, a
correction must be applied to the coefficient: the Spearman-Brown prophecy formula, which
estimates the full-test reliability as 2r / (1 + r), where r is the correlation between the two
half-test scores.
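As a hedged illustration of both subtypes, the Python sketch below uses a small made-up matrix of 0/1 item scores to compute the average inter-item correlation and an odd-even split-half coefficient with the Spearman-Brown correction.

import numpy as np

# Made-up item scores: 8 students x 10 items, scored 0 (wrong) / 1 (right).
items = np.array([
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
])

# A. Average inter-item correlation: mean of all pairwise item correlations.
item_corr = np.corrcoef(items.T)
avg_inter_item = item_corr[np.triu_indices_from(item_corr, k=1)].mean()

# B. Split-half reliability using the odd-even strategy.
odd_total = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_total = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
r_half = np.corrcoef(odd_total, even_total)[0, 1]

# Spearman-Brown correction: estimated reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"Average inter-item correlation: {avg_inter_item:.2f}")
print(f"Half-test correlation: {r_half:.2f}, Spearman-Brown corrected: {r_full:.2f}")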

3.3 Validity
Validity refers to how well a test measures what it is purported to measure, that is, the extent to
which the test actually assesses what it is supposed to assess.

Why is it necessary?
While reliability is necessary, it alone is not sufficient: a test can be reliable without being valid.
For example, if your scale is off by 5 lbs, it reads your weight every day with a consistent excess
of 5 lbs. The scale is reliable because it reports the same weight every day, but it is not valid
because it adds 5 lbs to your true weight. It is not a valid measure of your weight.

3.4 Types of Validity


1. Content Validity:
When we want to find out whether the entire content of the behavior/construct/area is
represented in the test, we compare the test tasks with the content of the behavior. This is a
logical method, not an empirical one. For example, if we want to test knowledge of American
geography, it is not fair to have most of the questions limited to the geography of New England.

2. Face Validity:
Face validity refers to the degree to which a test appears to measure what it purports to
measure, that is, whether the measure appears to assess the intended construct under study.
Stakeholders can easily assess face validity. Although this is not a very "scientific" type of
validity, it may be an essential component in enlisting the motivation of stakeholders. If the
stakeholders do not believe the measure is an accurate assessment of the ability, they may
become disengaged from the task.
Example: If a measure of art appreciation is created, all of the items should relate to the
different components and types of art. If the questions concern only historical time periods, with
no reference to any artistic movement, stakeholders may not be motivated to give their best effort
or invest in this measure because they do not believe it is a true assessment of art appreciation.
3. Construct Validity:
Construct validity is the degree to which a test measures an intended hypothetical construct. It is
used to ensure that the measure actually measures what it is intended to measure (i.e. the
construct) and not other variables. Using a panel of "experts" familiar with the construct is one
way this type of validity can be assessed. The experts examine the items and decide what each
specific item is intended to measure. Students can also be involved in this process to obtain their
feedback.
Example: A women’s studies program may design a cumulative assessment of learning
throughout the major. If the questions are written with complicated wording and phrasing, the
test can inadvertently become a test of reading comprehension rather than a test of women's
studies. It is important that the measure actually assesses the intended construct rather than an
extraneous factor.

4. Criterion-Related Validity
When you expect a future performance based on scores obtained currently by the measure, you
correlate the current scores with the later performance. The later performance is called the
criterion and the current score is the predictor. This is an empirical check on the value of the test,
a criterion-oriented or predictive validation. Criterion-related validity is used to predict future or
current performance: it correlates test results with another criterion of interest.
Example: Suppose a physics program designed a measure to assess cumulative student learning
throughout the major. The new measure could be correlated with a standardized measure of
ability in this discipline, such as an ETS field test or the GRE subject test. The higher the
correlation between the established measure and the new measure, the more faith stakeholders
can have in the new assessment tool.
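A minimal Python sketch, with made-up scores, of how such a criterion-related validity coefficient could be estimated by correlating the new departmental measure with an established criterion:

import numpy as np

# Made-up scores for 8 students: the new departmental physics measure
# and an established standardized subject test taken as the criterion.
new_measure = np.array([55, 72, 63, 90, 48, 81, 69, 77])
criterion = np.array([540, 690, 610, 760, 500, 720, 650, 700])

# The correlation between the two is the criterion-related validity
# coefficient; the closer to 1, the stronger the validity evidence.
r_validity = np.corrcoef(new_measure, criterion)[0, 1]
print(f"Criterion-related validity coefficient: {r_validity:.2f}")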

5. Formative Validity
When applied to outcomes assessment, formative validity concerns how well a measure provides
information that can help improve the program under study. Example: When designing a rubric
for history, one could assess students' knowledge across the discipline. If the measure can provide
information that students are lacking knowledge in a certain area, for instance the Civil Rights
Movement, then that assessment tool is providing meaningful information that can be used to
improve the course or program requirements.
6. Concurrent Validity:
Concurrent validity is the degree to which the scores on a test are related to the scores on
another, already established, test administered at the same time, or to some other valid criterion
available at the same time. For example, when a new, simpler test is to replace an old,
cumbersome one that is considered useful, measurements are obtained on both at the same time
and correlated. Logically, predictive and concurrent validation are the same; the term concurrent
validation simply indicates that no time elapses between the measures.
7. Sampling Validity (similar to content validity):
Ensures that the measure covers the broad range of areas within the concept under study. Not
everything can be covered, so items need to be sampled from all of the domains. This may need
to be completed using a panel of "experts" to ensure that the content area is adequately sampled.
Additionally, a panel can help limit “expert” bias (i.e. a test reflecting what an individual
personally feels are the most important or relevant areas).
Example: When designing an assessment of learning in the theatre department, it would not be
sufficient to only cover issues related to acting. Other areas of theatre such as lighting, sound,
functions of stage managers should all be included. The assessment should reflect the content area
in its entirety.

3.5 Ways to improve validity


1. Make sure your goals and objectives are clearly defined and operationalized. Expectations of
students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have the test
reviewed by faculty at other schools to obtain feedback from an outside party who is less
invested in the instrument.
3. Get students involved; have the students look over the assessment for troublesome wording, or
other difficulties.
4. If possible, compare your measure with other measures, or data that may be available.

3.6 Review Questions


1. Define test reliability
2. Explain 5 forms of reliability
3. What is test validity?
4. Discuss the following forms of test validity
a. construct validity
b. content validity
c. criterion related validity
5. How can the validity of a test be enhanced?
References for Further Reading
Lal, J. P. (2006). Educational Measurement and Evaluation. Anmol Publications Pvt Ltd.
Orodho, J. A. (2005). Techniques in Writing Research Proposal and Reports. Nairobi: Kanezja
H. P. Enterprises.
Burger, W. F. (2004). Essentials of Mathematics for Elementary Teachers (6th ed.). Wiley.
Swarupa, R. T. P. (2006). Educational Measurement and Evaluation. Discovery Publishing
House.

McKeachie, W. J., & Svinicki, M. D. (2006). Assessing, testing, and evaluating: Grading is not
the most important function. In McKeachie's Teaching tips: Strategies, research, and theory
for college and university teachers (12th ed., pp. 74-86). Boston: Houghton Mifflin Company.

McMillan, J. H. (2001). Classroom assessment: Principles and practice for effective instruction.
Boston: Allyn and Bacon.

Piontek, M. (2008). Best practices for designing and grading exams. CRLT Occasional Paper No.
24. Ann Arbor, MI. Center for Research on Learning and Teaching. Available:
/sites/default/files/resources/occasional.php
