Norm and Criterion-Referenced Tests
Testing must serve the wide array of educational situations that exist in today’s schools. Testing can rank students against each other or some other sociocultural norm, or it can be based on performance criteria that focus on assessing certain understandings or skill sets. Ideally, a combination of both testing types exists in a way that is valid, reliable, and fair. However, given that many classrooms contain students with different socioeconomic and cultural backgrounds, testing becomes quite a challenge. Therefore, in order to assure that all students receive the most appropriate feedback, a variety of testing techniques is needed so that proper decisions and actions can be made that best serve each student.
Virtually all students have taken some kind of standardized test by the time they enter
high school or college. Moreover, many standardized tests (i.e., high-stakes tests) are used as a
condition of graduation, acceptance, or financial aid. Because these tests are used as a way to
rank or compare students, they are often referred to as norm-referenced tests (NRTs) (Kubiszyn & Borich, 2007). NRTs are commonly used when stakeholders are interested in the central tendency of the results of a group of students, as when descriptive statistics are used to find the mean, median, and mode of a particular data set. When using tests to diagnose or to gauge the aptitude of a student, inferences are made based on how students compare with each other or with some other sample based on a social norm. Since results are “objective” (test items are usually scored as right or wrong) and since many tests can be administered at once, NRTs are typically more appropriate for making decisions that are not instructional in nature.
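Because NRT interpretation leans on the central tendency of a group’s results, a brief illustration may help. The sketch below, using invented scores for a hypothetical class, applies Python’s standard statistics module to find the mean, median, and mode of a data set.

```python
# Illustrative only: invented scores for a hypothetical class of eleven test-takers.
from statistics import mean, median, mode

scores = [62, 71, 74, 74, 78, 81, 85, 85, 85, 90, 93]

print(f"mean:   {mean(scores):.1f}")   # arithmetic average of the scores
print(f"median: {median(scores)}")     # middle score once the list is ordered
print(f"mode:   {mode(scores)}")       # most frequently occurring score
```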
In addition to NRTs being used externally to rank students (e.g., the SAT and ACT), teachers often use NRTs to test students in the classroom. Multiple-choice, true-false,
matching, and essay questions are common testing types that fall under this same category. Test
results are gathered, averaged, and ranked in order for teachers to make their best inference as to how well a student has understood the material, acquired the necessary skills, or developed the intended dispositions based on the goals and objectives of the classroom. Subsequently, instructional decisions are often made based on these results, either by reviewing material that students continue to struggle with or by moving on to new material in the curriculum. Having framed NRTs first as external instruments, such as the ACT, and then as internal instruments used by teachers in their classrooms, one can see a noticeable difference in why they are used in each circumstance. The former are used to make decisions regarding achievement, while the latter inform decisions regarding instruction. This distinction is important when turning to a second type of test, one based on criteria.
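The distinction can be made concrete with a short sketch (all numbers invented). A norm-referenced reading reports where a student falls relative to classmates, while a criterion-referenced reading asks only whether an assumed mastery cutoff was met.

```python
# Illustrative contrast: one score read in a norm-referenced way and in a
# criterion-referenced way (all values are invented for the sketch).
scores = [62, 71, 74, 74, 78, 81, 85, 85, 85, 90, 93]   # hypothetical class results
student_score = 81
mastery_cutoff = 80   # assumed criterion; real cutoffs come from curricular aims

# Norm-referenced reading: where does the student stand relative to the group?
below = sum(1 for s in scores if s < student_score)
percentile_rank = 100 * below / len(scores)
print(f"norm-referenced: scored above {percentile_rank:.0f}% of the class")

# Criterion-referenced reading: did the student meet the criterion, regardless of peers?
verdict = "meets" if student_score >= mastery_cutoff else "does not meet"
print(f"criterion-referenced: {verdict} the mastery cutoff of {mastery_cutoff}")
```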
Instead of ranking students against some norm, another testing method bases students’ performance on whether they meet certain criteria. Kubiszyn and Borich (2007) define a criterion-referenced test (CRT) as a test that “tells us about a student’s level of proficiency in or mastery of some skill or set of skills” (p. 66). Wiggins and McTighe (2005) also put forth the notion of promoting the six facets of understanding (i.e., explanation, interpretation, application, perspective, empathy, and self-knowledge) when testing students regarding what they know and the dispositions they possess. In other words, CRTs can provide teachers with greater insight into
performance criteria. Rubrics are often used in order to qualitatively assess performances and
products. Arter and McTighe (2001) distinguish between a holistic and analytical trait rubric
when they state “A holistic rubric gives a single score or rating for an entire product or
performance based on an overall impression of a student’s work” and “an analytical trait rubric
divides a product or performance into essential traits or dimensions so that they can be judged separately.”
Communicating these “essential traits” to students provides the basis for what constitutes a “good” or “bad” performance or product, and is essential in setting expectations between teacher and student. Indeed, CRTs are specifically suited for assessing understandings, knowledge, skills, and dispositions against explicit performance criteria, as the sketch below illustrates.
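One way to picture the difference Arter and McTighe describe is a small scoring sketch. The trait names and the four-point scale below are assumptions made for illustration, not taken from their rubrics: a holistic rubric yields one overall rating, while an analytic trait rubric yields a rating per essential trait.

```python
# Illustrative sketch of holistic vs. analytic-trait rubric scoring.
# Trait names and the 1-4 scale are assumed for the example.
holistic_score = 3   # a single overall rating of the whole performance

analytic_scores = {  # one rating per essential trait of the same performance
    "ideas and content": 4,
    "organization": 3,
    "use of evidence": 2,
    "conventions": 3,
}

# The analytic profile shows where the work is strong or weak;
# the holistic score summarizes an overall impression.
for trait, rating in analytic_scores.items():
    print(f"{trait:<17} {rating}/4")
print(f"{'holistic':<17} {holistic_score}/4")
```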
Whether a test is norm- or criterion-referenced, its reliability, validity, and absence-of-bias (Popham, 2008, p. 73) drive the level of predictability an instrument has in making proper decisions. Reliability concerns the consistency of a test’s results. Different versions of the ACT, for example, are expected to contain test items that measure the same
content. Similarly, the same ACT should yield similar results (i.e., a high correlation coefficient)
if students retake the exam without being exposed to a learning intervention in the interim. The
validity of a test pertains to the three Cs: “content, criterion, and construct” (Popham, 2008, p.
53). Content validity addresses how test items represent concepts that are covered in the
curriculum. Criterion validity in NRTs deals with how accurate the testing items are in
predicting future behavior (e.g., ACT and SAT scores and subsequent academic success or
failure). Criterion validity in CRTs deals with rubric traits and how valid they are in terms of a
student’s future performance. The final C, construct validity, has to do with how a student’s
performance over time is gauged in terms of meeting criteria that are aligned with the curriculum.
Finally, absence-of-bias centers on whether test items present information fairly; that is, items should not lean towards a certain group of people based on socioeconomic status, race, ethnicity, or gender.
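The “high correlation coefficient” mentioned above for a retake of the same exam can also be sketched. Assuming two administrations of the same test to the same students (invented scores below), the Pearson correlation between the two sets of results gives a rough indication of test-retest reliability.

```python
# Illustrative test-retest reliability check: correlate two administrations of the
# same exam for the same students (scores are invented for the sketch).
from statistics import correlation   # Pearson correlation; requires Python 3.10+

first_attempt  = [62, 71, 74, 78, 81, 85, 90, 93]
second_attempt = [65, 70, 76, 77, 83, 84, 91, 95]

r = correlation(first_attempt, second_attempt)
print(f"test-retest correlation: r = {r:.2f}")   # values near 1.0 suggest consistent results
```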
NRTs and CRTs should not be considered dichotomous; rather, they are two different but complementary approaches to assessing students. Ranking and comparing students has a purpose
when the goal is to measure achievement and to predict future academic success. Conversely,
testing understandings, knowledge, skills, and dispositions through performance and product criteria serves a vital role in making inferences that influence instructional decisions and adjustments to students’ learning tactics. In order for tests to be valid, reliable, and absent of bias, test designers should conduct a variety of reviews to assure that tests measure curricular aims, are reliable both within and across versions of an exam, and do not discriminate against minority groups
based on age, race, gender, socioeconomic status, or sexual orientation. Tests are the link
between the written and taught curriculum, between the ideal and the reality of what schools are for all their stakeholders. Thus, in order to continue developing tests and improving the feedback they provide, a collaborative effort is needed to bring together a community of educators, students, and other stakeholders.
References
Arter, J., & McTighe, J. (2001). Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Thousand Oaks, CA: Corwin Press.
Popham, W. (2008). Classroom assessment: What teachers need to know. New York: Pearson.