Test Construction and Administration
Fourth Stage
Group: B
Test Construction
Introduction: Testing is the most widely used of all assessment methods, and it has been the
centre of discussion and debate among educators for years. We may use the evidence it yields
to make statements about student competence or to make decisions about the next aspect of
teaching for particular students. Testing consists of four primary steps: test construction, test
administration, scoring, and analysing the test. Each of these steps can result in various test
forms and elicit a variety of valuable outcomes.
Types of Test
Broad Categories
Formative – short-term assessment, such as classroom assessment techniques (CATs)
Summative – long-term assessment, such as comprehensive final exams
• Aptitude Test: An aptitude test is an exam used to determine an individual's skill or propensity
to succeed in a given activity.
• Proficiency Test: checks learner levels against general standards. Proficiency tests provide a
broad picture of knowledge and ability.
• Achievement Test: achievement or progress tests measure students’ progress through their
syllabus. These tests contain only items that the students have been taught in class. There are
two types of progress tests: short-term and long-term.
Short-term progress tests check how well students have understood or learned the material
covered in specific units or chapters. They enable the teacher to decide if remedial or
consolidation work is required.
Long-term progress tests check the learners’ progress over the course. They enable the students
to judge how well they have progressed. Administratively, they are often the sole basis of
decisions to promote students to a higher level.
The primary goal of assessment via examination is to accurately measure student achievement
of desired knowledge and competencies, which are generally articulated through learning
objectives [3].
b) Validity
Validity indicates how well an assessment measures what it is supposed to, or claims to, measure.
Approaches to Validity
i. Face Validity: is the extent to which a test is subjectively viewed as covering the concept
it purports to measure. It refers to the transparency or relevance of a test as it appears to
test participants.
ii. Content Validity: Content validity assesses whether a test is representative of all aspects
of the construct.
iii. Construct Validity: evaluates whether a measurement tool represents the thing we are
interested in measuring. It’s central to establishing the overall validity of a method.
iv. Criterion Validity: evaluates how closely your test results correspond to the results of a
different, established test of the same construct (the criterion).
c) Reliability
Reliability tells you how consistently a method measures something. You should get the same
results when you apply the same method to the same sample under the same conditions. If
not, the method of measurement may be unreliable. There are four main types of reliability.
Each can be estimated by comparing different sets of results produced by the same method.
i. Test-retest reliability: measures the consistency of results when you repeat the same test
on the same sample at a different point in time. You use it when you measure something
that you expect to stay constant in your sample.
ii. Interrater reliability (also called interobserver reliability): measures the degree of
agreement between different people observing or assessing the same thing. You use it when
researchers collect data by assigning ratings, scores, or categories to one or more variables.
iii. Parallel forms reliability: measures the correlation between two equivalent versions of a
test. You use it when you have two different assessment tools or sets of questions designed
to measure the same thing.
iv. Internal consistency: assesses the correlation between multiple items in a test intended to
measure the same construct.
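Two of these reliability types lend themselves to a simple numeric illustration. The Python sketch below (all score data are hypothetical, invented purely for illustration) shows how test-retest reliability might be estimated as a Pearson correlation between two administrations, and internal consistency as Cronbach's alpha.

```python
# Minimal sketch of two common reliability estimates.
# All score data are hypothetical, for illustration only.
import math

def pearson_r(x, y):
    """Pearson correlation between two score lists (test-retest reliability)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cronbach_alpha(item_columns):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(item_columns)                      # number of items
    n = len(item_columns[0])                   # number of students
    totals = [sum(col[i] for col in item_columns) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(col) for col in item_columns) / var(totals))

# Hypothetical scores for five students on the same test, two weeks apart:
time1 = [70, 65, 80, 90, 55]
time2 = [72, 63, 78, 88, 58]
print(f"Test-retest reliability: r = {pearson_r(time1, time2):.2f}")

# Hypothetical per-item scores (3 items x 5 students) from one administration:
items = [[3, 2, 4, 5, 2], [4, 2, 4, 5, 3], [3, 3, 5, 4, 2]]
print(f"Internal consistency: alpha = {cronbach_alpha(items):.2f}")
```

Values close to 1 indicate high reliability; in practice a dedicated statistics package would be used rather than hand-rolled code like this.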
Test Blueprint
• Weight to Content: Indicates the various aspects of the content to be tested and the
weight to be given to each
• Weight to Form of Questions: Indicates the forms of questions to be included in the
test and their weight
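To make these weights concrete, the sketch below shows how blueprint weights for content and question form can be converted into item counts for a fixed-length test. The content areas, forms, weights, and test length are hypothetical illustrations, not values taken from this handout.

```python
# Hypothetical blueprint: convert content/form weights into item counts.
total_items = 50

content_weights = {            # "weight to content"
    "Anatomy": 0.30,
    "Pharmacology": 0.40,
    "Nursing ethics": 0.30,
}
form_weights = {               # "weight to form of questions"
    "Multiple choice": 0.70,
    "Short answer": 0.30,
}

for area, cw in content_weights.items():
    for form, fw in form_weights.items():
        # Rounding each cell independently can make the grand total drift
        # slightly from total_items; real blueprints adjust cells by hand.
        n_items = round(total_items * cw * fw)
        print(f"{area:15s} | {form:15s} | {n_items} items")
```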
Domains of Learning/Levels of Assessment
[Example of a blueprint for the cognitive domain]
• Creating: create a drink using three fruits that would be considered highly healthy
• Processing speed: some question types are more easily processed and can be answered more
quickly. This can affect the timing of the test and the distribution of students’ effort across
different knowledge domains.
All test items should:
• Assess the achievement of learning outcomes for the unit and/or course
• Measure essential concepts and their relationship to that unit and course
• Align with your teaching and learning activities and the emphasis placed on concepts
and tasks
• Measure the appropriate level of knowledge
• Vary in levels of difficulty (factual recall and demonstration of knowledge, application
and analysis, and evaluation and creation)
Categories of Test Items: There are two general categories for test items
1. Objective Items – students select the correct response from several alternatives or supply a
word or short phrase as an answer. These items are easier to create for lower-order Bloom’s
levels (recall and comprehension), although they can still be designed to test higher-order
thinking (apply and analyse). Objective test items include:
a) Multiple choice: provide an excellent pre-assessment indicator of student knowledge and a
source for a post-test discussion.
Use Multiple Choice Questions to Assess:
• Information recall
• Application
• Evaluation
• Understanding concepts
Advantages
• Easy to score
• Increase reliability
• May lower test anxiety
• Require little instruction
• Manageable for beginning learners who can't yet produce much
• Can cover many content areas on a single exam
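The "easy to score" advantage can be illustrated in a few lines of code: objective items are marked against a fixed key, which is what machine grading does at scale. The key and responses below are hypothetical.

```python
# Score a candidate's MCQ responses against a fixed answer key (hypothetical data).
key       = ["b", "d", "a", "c", "b"]
responses = ["b", "d", "c", "c", "b"]

score = sum(given == correct for given, correct in zip(responses, key))
print(f"Score: {score}/{len(key)}")  # prints "Score: 4/5"
```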
Disadvantages
• Often test literacy skills: “if the student reads the question carefully, the answer
is easy to recognise even if the student knows little about the subject.”
• Provide unprepared students with the opportunity to guess, and with correct
guesses, they get credit for things they don’t know
• Expose students to misinformation that can influence subsequent thinking about
the content
• Take time and skill to construct (especially good questions).
Tips for developing MCQs
• Avoid using the same correct answer option for each question
• Keep answer options to a minimum. Too many become confusing
• Keep question text clear and to the point
• Try to keep all answers the same length
• Avoid using “all of the above” - too obvious
• Avoid using “none of the above.”
• Keep distractors plausible
• Randomise answer options - this prevents candidates from memorising the letters
• When using numbers, keep answer options in a logical order
• Avoid using double negatives
• Keep repeated words/phrases such as “Did you know” in the question stem only.
State the stem positively; negative phrasing such as “does not” is easily misread.
Poor example:
A nurse is assessing a client who has pneumonia. Which of these assessment findings
indicates that the client does not need to be suctioned?
a) Diminished breath sounds
b) Absence of adventitious breath sounds
c) Inability to cough up sputum
d) Wheezing following bronchodilator therapy
Good example:
Which of these assessment findings, if identified in a client who has pneumonia, indicates
that the client needs to be suctioned?
a) Absence of adventitious breath sounds
b) Respiratory rate of 18 breaths per minute
c) Inability to cough up sputum
d) Wheezing before bronchodilator therapy
Make the distractors mutually exclusive.
Poor example:
How long does a biennial plant generally live?
a) It dies after the second year
b) It lives for many years
c) It lives for more than one year
d) It needs to be replanted every two years
Good example:
How long does a biennial plant generally live?
a) One year
b) Two years
c) Several years
Make distractors approximately equal in length. Students often select the longest
option as the correct answer.
b) True-false: true and false questions consist of a statement and two answer options. More
often than not, the answer options used are 'True' and 'False'. You can, however, use other
options, such as 'Yes' and 'No', or 'I Agree' and 'I Disagree'.
Also known as: TF, binary choice questions, objective
Use True and False Questions to Assess:
• Recognizing facts
• Reflection of materials learned
• Knowledge check
Question Usage Ideas:
• Statement Analysis
• Feedback
• Item Analysis
• Pre-tests
• Surveys
Advantages of True and False Questions:
• Can customize to use 'Yes' and 'No' or 'I Disagree' and 'I Agree'
• Easy to grade on paper
• Automatically graded online
• Can be answered quickly by Test takers
• Large range of content can be tested
• Questions are easy to create
Disadvantages of True and False Questions:
• Writing statements that are unambiguously true or false takes time
• There's a 50% chance of candidates getting each question correct by guessing
(quantified in the sketch after this list)
• Hard to determine who knows the material and who doesn't
• Can be “too easy”
• Candidates can simply pick an answer without any comprehension of the question
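The guessing risk above can be quantified: under pure guessing, the number of correct answers on an n-item true/false test follows a Binomial(n, 0.5) distribution. The sketch below uses a hypothetical test length and pass mark.

```python
# Probability that a pure guesser reaches the pass mark on a true/false test.
from math import comb

def p_pass_by_guessing(n_items, pass_mark, p=0.5):
    """P(number correct >= pass_mark) when every answer is a coin flip."""
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(pass_mark, n_items + 1))

print(f"{p_pass_by_guessing(10, 5):.2f}")   # ~0.62 on a short 10-item test
print(f"{p_pass_by_guessing(50, 30):.2f}")  # ~0.10 on a longer 50-item test
```

Longer tests with higher pass marks therefore sharply reduce the chance of passing by guesswork alone.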
Tips for developing true/false questions
• Keep question text to a minimum
• Add more 'false' questions than 'true'. Candidates tend to choose 'true' more than they
do 'false'.
• Use your own wording
• Avoid using double negatives
• Use only one fact/statement per each question
• Keep the statement either all true or all false - no in between
• Be clear with your wording
• Keep both true and false statements the same length
c) Matching: students pair each clue in one list with its match in another list.
Tips for developing matching questions:
• Keep matches (right side) plausible
• Shuffle matches and clues
• Consider limiting the number of items to 10 or fewer
2. Subjective Items – students construct an original answer. These types of items are easier to
use for higher-order Bloom’s levels (apply, analyse, synthesise, create, evaluate). Subjective
test items include essay questions, which are best used to assess:
• Comprehension of material learned
• Writing skills
• Evaluation
• Analysis
• User's ability to organise facts and ideas
• Vocabulary
• Problem Solving
Question Usage Ideas:
• Gain Feedback
• Gather information
• Comparison of two items
• Discussion
a) Essay: There are two major categories of essay items: short response (also referred to
as restricted or brief) and extended response.
i) Short Response
• Usually require students to respond to an open-ended prompt using a few words to a
few sentences.
• Short response items are more focused and constrained than extended response
questions. For example, a short response might ask a student to “write an example”,
“list three reasons”, or “compare and contrast two techniques”.
Advantages
• Quick and easy to grade
• Quick and easy to write
Disadvantages
• Encourage students to memorise terms and details so that their understanding of
the content remains superficial
• It can be challenging to develop a key that can accommodate a variety of responses.
b) Performance Testing
• An assessment of individual performance in a systematic way.
• It requires an examinee to perform a task or activity rather than simply answer questions
about it.
• The purpose is to ensure greater fidelity to what is being tested. It can be individual or group.
Some performance tests are simulations.
• An example is the Objective Structured Clinical Examination (OSCE).
Advantages
• It can be used to assess from multiple perspectives
• Direct observation of student ability
• Can be scored holistically or analytically
• Active student engagement
• Authentic assessment of ability
• Assess transfer of skills and integration of content
• Encourages time on academics outside of class
• Provide a dimension of depth not available in the classroom
• Promote student creativity
• Can be summative or formative
• Place faculty more in a mentor role than as a judge
• Provide an avenue for student self-assessment and reflection
• Can be embedded within courses
• Most valid way of assessing skill development
Disadvantages
• It can be very time consuming
• It can be costly
• Relies heavily on student initiative and drive
• It relies heavily on specific skill sets of students
• Ratings and results can be subjective
• It can be intimidating to students
• Requires careful design and training of raters
• Sample of behaviour or performance may not be typical, especially if observers are present
In summary, objective and subjective test items are both suitable for measuring most learning
outcomes and are often used in combination. Both types can be used to test comprehension,
application of concepts, problem-solving, and the ability to think critically. However, certain
types of test items are better suited than others to measuring particular learning outcomes.
Learning outcomes that require a student to ‘demonstrate’ may be better measured by a
performance test item. In contrast, work requiring the student to ‘evaluate’ may be better
measured by an essay or short-answer test item.
TEST ADMINISTRATION
(I) Introduction: Test administration procedures are developed for an exam programme to help
reduce measurement error and increase the likelihood of fair, valid, and reliable assessment.
Consistent, standardised administration of the exam allows examinees' scores to be compared
directly, even though the examinees may have taken their tests on different dates, at different
sites, and with different proctors.
(IV) Methods of Test Administration
(i) Paper-and-Pencil Tests (PPT)
• PPTs refer to a general group of assessment tools in which candidates read questions
and respond in writing.
• They are one of the most common and systematic ways of gathering information about
learners’ behaviour and performance.
• They assess levels of knowledge, ability, or skill qualifications.
• Because many candidates can be assessed simultaneously with a paper-and-pencil test,
such tests are an efficient method of assessment.
Advantages
• Economical in terms of time and money
• Provide an opportunity to obtain detailed feedback for both teachers and learners
because the responses are recorded.
• A large number can be tested at the same time
• Allows one to test the students under uniform conditions because examination time can
be strictly controlled.
• Cover a wider area of the syllabus than performance and oral tests.
• All students answer the same question paper; hence, the results can be compared
effectively.
Disadvantages
• The results may be influenced by external factors like sickness, stress etc.
• High cost associated with the process.
• Not eco-friendly - a lot of paper is needlessly used in the traditional evaluation
process.
Computer-based testing (CBT) methods include:
• Computerised Adaptive Testing (CAT)
• Computerised Classification Testing (CCT)
• Computerised Simulations and Multimedia
• In Computerised Classification Testing (CCT), the number of items administered differs
across candidates: more items are given to candidates whose knowledge or ability is close
to the passing point, and fewer to those who clearly pass or fail.
A CCT requires several components:
• An item bank calibrated with a psychometric model selected by the test designer
• A starting point
• An item selection algorithm
• A termination criterion and scoring procedure
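A minimal sketch of how these components might fit together is given below, assuming a Rasch (one-parameter logistic) item model and a simple stepwise ability update. The item bank, starting point, step size, and fixed-length termination rule are all illustrative assumptions, not features of any particular CAT/CCT system.

```python
# Minimal adaptive-testing loop: item bank, starting point, item selection,
# and termination criterion. All numeric values are hypothetical.
import math
import random

ITEM_BANK = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # Rasch item difficulties

def p_correct(theta, b):
    """Rasch model: probability that ability theta answers item b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_cat(answer_fn, start_theta=0.0, max_items=5, step=0.6):
    theta = start_theta                       # starting point
    remaining = list(ITEM_BANK)               # calibrated item bank
    for _ in range(max_items):                # termination criterion: fixed length
        # Item selection: under the Rasch model, information is highest
        # where item difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        correct = answer_fn(b)
        theta += step if correct else -step   # crude stepwise ability update
    return theta

# Simulate a candidate whose true ability is 1.0:
random.seed(0)
estimate = run_cat(lambda b: random.random() < p_correct(1.0, b))
print(f"Estimated ability: {estimate:.2f}")
```

Real systems replace the stepwise update with maximum-likelihood or Bayesian scoring, but the four components listed above are visible even in this toy loop.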
Advantages of CCT/CMT: Promote retentiveness and mastery skills
Disadvantage of CCT/CMT: Not suitable for a large number of students
Disadvantages of computerised simulations and multimedia:
• Mistakes may be made in the programming or rules of the simulation or model.
• The cost of a simulation model can be high.
• The cost of running several different simulations may be high.
• Time may be needed to make sense of the results.
• People’s reactions to the model or simulation might not be realistic or reliable.
(ii) Oral Testing: An oral test is a test that is answered orally (verbally). The teacher or oral
test assessor verbally asks the student a question, and the student answers it in spoken words.
Advantages
• The oral test provides direct contact between the examiner and the examinee.
• More than one examiner can assess more than one candidate simultaneously
• Provides an opportunity to evaluate the strong and weak areas of each learner.
• Provides an opportunity to question the candidate about how she arrived at the answer.
• Provides the examiner with an opportunity to clarify the question in case the candidate
has not understood.
Disadvantages
• It depends heavily on the examiner's experience and ability to retain an accurate
impression of the required standard.
• Lacks standardisation; hence the results of the test cannot be compared across candidates
• Expensive because an examiner cannot examine more than fifteen students in a day.
• Lacks objectivity; the examiner's judgement can be affected by factors external to
the test.
• Lacks a precise definition of the criteria for the award of a satisfactory rating
In conclusion, test construction and administration are critical components of effective
testing and assessment, serving to evaluate and improve learning performance and the quality
of educational outcomes.
Group Activity
Participants (in groups) will be asked to discuss with their group members the type of CBT
that can be adopted for the NMCN Professional examination, covering:
• Strategies of implementation
• Perceived challenges
• Suggestions on the way forward
Then, a group report should be submitted, or group presentations can be conducted.
References
1. Downing, S. M. (2010). In International Encyclopedia of Education (3rd ed.).
2. McAllister, D., & Guidice, R. M. (2012). This is only a test: A machine-graded
improvement to the multiple-choice and true-false examination. Teaching in Higher
Education, 17(2), 193-207.
3. Ahmad, R. G., & Hamed, O. (2014). Impact of adopting a newly developed blueprinting
method and relating it to item analysis on students' performance. Medical Teacher,
36(Suppl. 1), S55-S61.
4. Clark, D. (2010). Bloom's taxonomy of learning domains: The three types of learning.
5. Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching and
assessing: A revision of Bloom's taxonomy of educational objectives. New York: Longman.
6. Armstrong, P. (n.d.). Bloom's taxonomy. Center for Teaching, Vanderbilt University.
7. Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956).
Taxonomy of educational objectives: The classification of educational goals. Handbook I:
Cognitive domain. New York: David McKay Company.
8. Harrow, A. J. (1972). A taxonomy of the psychomotor domain. New York: David McKay Co.