Week 7 - Test Development

The test development process involves several steps: 1) test conceptualization, which involves answering preliminary questions about what the test will measure, who will use it, and how it will be administered; 2) test construction, including writing test items, developing scales, and constructing the test; 3) test tryout, which involves piloting the test on a sample of testtakers; 4) item analysis, in which item difficulty, reliability, validity, and discrimination are evaluated; and 5) test revision, based on the results of the tryout and item analysis, to improve the test before full implementation.

Test Development Process

Test Development
➢ The process of developing a test occurs in five stages: test conceptualization, test construction, test tryout, item analysis, and test revision.

Test Conceptualization
➢ "There ought to be a test designed to measure [fill in the blank] in [such and such] way."

SOME PRELIMINARY QUESTIONS
• What is the test designed to measure?
• What is the objective of the test?
• Is there a need for this test?
• Who will use this test?
• Who will take this test?
• What content will the test cover?
• How will the test be administered?
• What is the ideal format of the test?
• Should more than one form of the test be developed?
• What special training will be required of test users for administering or interpreting the test?
• What types of responses will be required of test takers?
• Who benefits from an administration of this test?
• Is there any potential for harm as the result of an administration of this test?
• How will meaning be attributed to scores on this test?

Norm-Referenced vs Criterion-Referenced Tests: Item Development Issues
➢ Norm-Referenced Test
➢ Criterion-Referenced Test

PILOT WORK
➢ Pilot Work, Pilot Study, and Pilot Research

Test Construction

SCALING
➢ Scaling
➢ scale values
➢ L.L. Thurstone
→ absolute scaling

TYPES OF SCALES:
➢ Age-Based Scale
➢ Grade-Based Scale
➢ Stanine Scale
➢ Unidimensional Scale
➢ Multidimensional Scale

SCALING METHODS:
➢ Rating Scale
→ Summative Rating Scale
→ Likert Scale (see the sketch after this list)
➢ Method of Paired Comparisons
➢ Sorting Task
→ Comparative Scale
→ Categorical Scale
➢ Guttman Scale
→ Scalogram Analysis
➢ Method of Equal-Appearing Intervals
→ Objective
→ Advantage
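A minimal sketch of how a summative (Likert) rating scale might be scored: item ratings are summed, with negatively worded items reverse-scored first. The item names, ratings, and the choice of reverse-scored item are hypothetical.

```python
# Summative (Likert) scoring sketch: each item is rated on a 1-5 agreement
# scale, negatively worded items are reverse-scored, and the scale score is
# the sum of the item ratings.

def score_likert(responses, reverse_items=(), points=5):
    """Sum 1..points ratings, reverse-scoring the items named in reverse_items."""
    total = 0
    for item, rating in responses.items():
        if item in reverse_items:
            rating = (points + 1) - rating  # e.g., 1 -> 5 and 5 -> 1 on a 5-point scale
        total += rating
    return total

# Hypothetical 4-item scale in which item "q3" is negatively worded.
responses = {"q1": 4, "q2": 5, "q3": 2, "q4": 3}
print(score_likert(responses, reverse_items={"q3"}))  # 4 + 5 + 4 + 3 = 16
```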


WRITING ITEMS
➢ Questions for the test developer:
→ What range of content should the items cover?
→ Which of the many different types of item formats should be employed?
→ How many items should be written in total and for each content area?
➢ Item Pool

Item Format
➢ Selected-Response Format
→ Types of Selected-Response Item Format:
• Multiple-Choice Format
• Matching Item
• Binary-Choice Format
  → True-False Item
  → Other Varieties of Binary-Choice Format
  → Disadvantage
➢ Constructed-Response Format
→ Types of Constructed-Response Item Format:
• Completion Item
  → Disadvantage
• Short-Answer Item
• Essay Item
  → Drawback

Writing Items for Computer Administration
➢ Item Bank
→ Advantages
➢ Item Branching (see the sketch after this list)
→ E.g., If a respondent answers an item in a way that suggests he or she is depressed, the computer might automatically probe for depression-related symptoms and behavior.
➢ Computerized Adaptive Testing (CAT)
→ Advantages:
→ Floor Effect
→ Ceiling Effect
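To make the branching idea concrete, here is a minimal sketch of item branching in a computer-administered test, following the depression example above. The item texts, the 1-5 rating scale, and the trigger rule are all hypothetical.

```python
# Item-branching sketch: a response suggesting depressed mood makes the
# program branch to depression-related probe items; otherwise the standard
# sequence continues. All item texts and the trigger rule are hypothetical.

STANDARD_ITEMS = ["I sleep well at night.", "I enjoy my usual activities."]
DEPRESSION_PROBES = [
    "I have lost interest in things I used to enjoy.",
    "My appetite or sleep has changed noticeably.",
]

def administer(get_rating):
    """Run the item sequence; get_rating(item) returns a 1-5 agreement rating."""
    record = []
    for item in STANDARD_ITEMS:
        rating = get_rating(item)
        record.append((item, rating))
        if rating <= 2:  # hypothetical trigger: low agreement suggests depressed mood
            record.extend((probe, get_rating(probe)) for probe in DEPRESSION_PROBES)
    return record

# Scripted respondent: answers "2" to the second standard item, triggering the probes.
scripted = {"I enjoy my usual activities.": 2}
print(administer(lambda item: scripted.get(item, 4)))
```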

SCORING ITEMS
➢ Cumulative Model
➢ Class Scoring or Category Scoring
➢ Ipsative Scoring

Test Tryout

➢ phantom factors

WHAT IS A GOOD ITEM?
➢ Statistics computed in item analysis include indices of:
→ the item's difficulty
→ the item's reliability
→ the item's validity
→ item discrimination

Item Analysis
➢ quantitative methods
➢ qualitative methods

ITEM-DIFFICULTY INDEX

Item's Difficulty
➢ p = the proportion of the total number of testtakers who answered the item correctly; a subscript identifies the item (p1 for item 1, p2 for item 2, and so on)
➢ Item-Endorsement Index (the analogous statistic for tests in which items have no correct answer, such as personality tests)
→ If 50 of the 100 examinees answered item 2 correctly, then: p2 = 50/100 = .50
→ If 75 of the 100 examinees answered item 3 correctly, then: p3 = 75/100 = .75; we could say that item 3 was easier than item 2.
➢ The average item difficulty of a test is the sum of the item-difficulty indices divided by the number of items:

average p = Σp / n

➢ The optimal ("ideal") item difficulty lies midway between the proportion of correct responses obtainable by chance (guessing) and 1.00:

Ideal p = (chance success proportion + 1.00) / 2

→ For a true-false item (chance success proportion = .50): ideal p = 0.75
→ For a five-option multiple-choice item (chance success proportion = .20): ideal p = 0.60
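A small numeric sketch of these difficulty statistics; the correct-answer counts are hypothetical, apart from the p2 and p3 examples above.

```python
# Item-difficulty sketch: number of examinees (out of 100) answering each
# of five hypothetical items correctly.

n_examinees = 100
correct_counts = {"item1": 80, "item2": 50, "item3": 75, "item4": 90, "item5": 30}

# Item-difficulty index: p = proportion answering the item correctly.
p = {item: c / n_examinees for item, c in correct_counts.items()}
print(p["item2"], p["item3"])      # 0.5  0.75 (item 3 is easier than item 2)

# Average p for the test: sum of the p values divided by the number of items.
average_p = sum(p.values()) / len(p)
print(round(average_p, 2))         # 0.65

# Ideal item difficulty: midway between chance success and 1.00.
def ideal_p(chance):
    return (chance + 1.00) / 2

print(ideal_p(0.50))  # 0.75 for a true-false item
print(ideal_p(0.20))  # 0.60 for a five-option multiple-choice item
```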



ITEM-RELIABILITY INDEX

Item-Reliability Index
➢ The index is the product of two statistics:
[1] The item-score standard deviation (s)
[2] The correlation (r) between the item score and the total test score

→ The item-score standard deviation of item 1 (s1) can be calculated from the item-difficulty index of item 1 (p1):

s1 = √(p1(1 − p1))

➢ Factor Analysis

ITEM-VALIDITY INDEX

Item-Validity Index
➢ The index is the product of two statistics:
[1] The item-score standard deviation (s1)
[2] The correlation (r1C) between the item score and the criterion score

Item-validity index = s1 r1C
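A minimal numeric sketch of both indexes; p1 and the two correlations are hypothetical values.

```python
import math

# Item-reliability and item-validity index sketch for a single item.

p1 = 0.5                       # item-difficulty index for item 1 (hypothetical)
s1 = math.sqrt(p1 * (1 - p1))  # item-score standard deviation: s1 = sqrt(p1(1 - p1))

r1_total = 0.60      # hypothetical correlation of item 1 with the total test score
r1_criterion = 0.40  # hypothetical correlation of item 1 with the criterion score

item_reliability_index = s1 * r1_total      # s1 * r (internal consistency)
item_validity_index = s1 * r1_criterion     # s1 * r1C

print(round(s1, 2), round(item_reliability_index, 2), round(item_validity_index, 2))
# 0.5 0.3 0.2
```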


ITEM-DISCRIMINATION INDEX

Item-Discrimination Index
→ E.g., A multiple-choice item on an achievement test is a good item if most of the high scorers answer correctly and most of the low scorers answer incorrectly.
→ An item on an achievement test is not doing its job if it is answered correctly by respondents who least understand the subject matter.
→ An item on a test purporting to measure a particular personality trait is not doing its job if responses indicate that people who score very low on the test as a whole (indicating absence or low levels of the trait in question) tend to score very high on the item (indicating that they are very high on the trait in question, contrary to what the test as a whole indicates).

➢ d = the item-discrimination index:

d = (U − L) / n

where U is the number of testtakers in the upper-scoring group who answered the item correctly, L is the number of testtakers in the lower-scoring group who answered the item correctly, and n is the number of testtakers in each group.

→ d = +1.00 → all members of the upper group and none of the lower group answered the item correctly
→ d = 0 → equal proportions of the upper and lower groups answered the item correctly
→ d = −1.00 → none of the upper group and all of the lower group answered the item correctly

ANALYSIS OF ITEM ALTERNATIVES.
(∙ marks the keyed alternative; U = upper-scoring group, L = lower-scoring group)

                 A   B   C   D   E
Item 1 (∙A)  U  24   3   2   0   3
             L  10   5   6   6   5
Item 2 (∙E)  U   2  13   3   2  12
             L   6   7   5   7   7
Item 3 (∙C)  U   0   0  32   0   0
             L   3   2  22   2   3
Item 4 (∙B)  U   5  15   0   5   7
             L   4   5   4   4  14
Item 5 (∙D)  U  14   0   0   5  13
             L   7   0   0  16   9
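A short sketch computing d from the keyed-alternative counts in the tables above, assuming, as the row totals for item 1 suggest, 32 examinees per group.

```python
# Item-discrimination sketch: d = (U - L) / n, where U and L are the numbers
# of correct (keyed) answers in the upper- and lower-scoring groups and n is
# the size of each group (here taken as 32, per the row sums for item 1).

def discrimination(u_correct, l_correct, n_per_group):
    """Return the item-discrimination index d."""
    return (u_correct - l_correct) / n_per_group

# Item 1, keyed alternative A: 24 upper-group vs 10 lower-group correct.
print(discrimination(24, 10, 32))   # 0.4375 -> item favors high scorers, as it should
# Item 5, keyed alternative D: more low scorers than high scorers chose it.
print(discrimination(5, 16, 32))    # -0.34375 -> a negative d flags a problem item
```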


ITEM-CHARACTERISTIC CURVES

Item-Characteristic Curves
➢ For Discriminability Level:
➢ For Difficulty Level:
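The notes do not specify a model for drawing an item-characteristic curve; the following is a minimal sketch under the common two-parameter logistic (2PL) IRT assumption, where the b parameter sets the difficulty level (curve location) and the a parameter sets the discriminability level (curve slope).

```python
import math

# Item-characteristic curve sketch under an assumed 2PL model (the notes
# name ICCs but do not give a model). b shifts the curve along the ability
# axis (difficulty); a controls the steepness of the curve (discrimination).

def icc(theta, a, b):
    """P(correct response | ability theta) under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=1.5, b=0.0), 2))
# 0.05, 0.18, 0.5, 0.82, 0.95: the probability of a correct response rises
# with ability, and a larger a makes the rise steeper around theta = b.
```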


OTHER CONSIDERATIONS IN ITEM ANALYSIS

GUESSING.

ITEM FAIRNESS.

SPEED TESTS.

QUALITATIVE ITEM ANALYSIS

Qualitative Methods
➢ Qualitative Item Analysis
→ One cautionary note:
➢ "Think Aloud" Test Administration
➢ Expert Panels
→ Sensitivity Review

Test Revision

TEST REVISION AS A STAGE IN NEW TEST DEVELOPMENT

TEST REVISION IN THE LIFE CYCLE OF AN EXISTING TEST
➢ An existing test should be kept in its present form as long as it remains "useful," but it should be revised "when significant changes in the domain represented, or new conditions of test use and interpretation, make the test inappropriate for its intended use."
➢ Cross-Validation
→ validity shrinkage
➢ Co-Validation
→ co-norming
THE USE OF IRT IN BUILDING AND REVISING TESTS

[1] Evaluating the properties of existing tests and guiding test revision.

[2] Determining measurement equivalence across testtaker populations.
→ Differential Item Functioning (DIF)
→ DIF Analysis
→ DIF Items

[3] Developing item banks.
