Udsm Muce Ep300 Basic Concepts Emr
Permission is granted under a Creative Commons Attribution licence to replicate, copy, distribute,
transmit or adapt this work freely, provided that attribution is provided as illustrated in the citation
below. To view a copy of this licence, visit http://creativecommons.org/licenses/by/3.0 or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California, 94305, USA.
Citation:
Mkwawa University College of Education (MUCE). (2013). Educational Measurement and Evaluation.
Iringa: MUCE.
MUCE welcomes feedback on these materials and would like to hear from anybody who has used them
as is or used and adapted them or who would be interested to work with MUCE more generally.
PRINCIPAL ADDRESSES
DEPUTY PRINCIPAL (ADMINISTRATION)
P.O. BOX 2513
IRINGA
Tel: 026-2701191
Fax: 026-2702751
E-mail: dpadministration.muce@udsm.ac.tz
Course: Educational Measurement and Evaluation
Introduction
This course aims at introducing you to basic concepts of educational
measurement, monitoring, assessment and evaluation. Furthermore, it seeks
to equip you with basic knowledge and skills that are important for
developing tools for measurement, assessment, and monitoring of educational
attainments and institutional performance.
Learning outcomes
Course Outline
Module 1: Basic Concepts in Measurement and Evaluation
1.1 Measurement, Evaluation, Testing
1.2 Assessment and Monitoring
1.3 Purpose of Evaluation
1.4 Instructional objectives and evaluation
1.5 Taxonomies of Educational Objectives
3.2 Test administration and marking
3.3 Summarizing test results
3.4 Item analysis: level of difficulty and discrimination
3.5 Reporting test performance
Course Evaluation:
Coursework 40%
Final Examination 60%
References
Core readings
1. Airasian, P.W. (2001). Classroom assessment: Concepts and applications (4th ed.). NY: McGraw Hill.
2. Cohen, R.J. & Swerdlik, M.E. (2005). Psychological testing and assessment. NY: McGraw Hill.
3. Ingule, F.O., Rono, R.C. & Ndambuki, P.W. (1996). Introduction to educational psychology. Nairobi: East African Educational Publishers.
4. Ebel, R.L. & Frisbie, D.A. (1991). Essentials of educational measurement. New York: Prentice Hall.
5. Gronlund, N.E. & Linn, R.L. (1990). Measurement and evaluation in teaching. N.Y.: Macmillan.
6. Sax, G. (1997). Principles of educational and psychological measurement and evaluation. London: Wadsworth Publishing Company.
7. Omari, I.M. (1995). Conceptualizing quality in primary education. Papers in Education and Development, 16, 25-48.
8. Gronlund, N.E. (1985). Stating objectives. New York: Macmillan.
Module 1: Basic Concepts in Measurement and
Evaluation
Overview
Testing, measurement and evaluation are important activities in the teaching
and learning process. Without them, it would be very difficult for teachers to
track and follow up their students' achievement. Equally important are the
concepts of monitoring and assessment; these two additional concepts
complement the roles played by the other three.
In this module, the five concepts are defined and clarified so that their
similarities and differences are clear when they are used in educational
settings.
Learning outcomes
By the end of this module you should be able to:
define the basic concepts: test, measurement, assessment, monitoring and
evaluation.
distinguish various types of evaluation in education.
explain the purposes of assessment and evaluation in teaching and
learning.
state basic principles of evaluation.
use the taxonomies of educational objectives in stating instructional
objectives and designing assessment tasks.
Case study 1.1
One day, tutors at Mkanyageni Teachers College were arguing about which concept
was more appropriate to use when addressing different areas pertaining to their
students’ achievement. One of them said that “If I want to determine how good my
students are in my subject, then measurement would be appropriate to use”. Another
one argued that “For me, if I want to determine my students’ achievement, I would
use tests”. The last one in this conversation claimed that “For me, assessment is more
appropriate for that purpose”. It was not easy for these tutors to reach an agreement
on who was right in this issue.
Having gone through Case Study 1.1, attempt Activity 1.1. For more support in
attempting the activity, go to Learning Resource 1.1.
Activity 1.1
Attempt the following question:
Among the three tutors in Case Study 1.1, which one was using the
appropriate concept that would help him/her get comprehensive data about
his/her students' achievement? Why?
Resource 1.1
What is a test?
What is measurement?
What is evaluation?
Evaluation may be summarised as:

Evaluation = Measurement (e.g. testing) and/or Non-measurement (e.g. informal observation), plus Value judgments (e.g. good learning progress)
Assessment versus Evaluation
The terms assessment and evaluation are related and often used
interchangeably – yet they differ when used in an educational or
training context.
Assessment is the systematic, continuous process of monitoring the
various pieces of learning in order to evaluate student achievement and
instructional effectiveness. It includes tests, homework assignments,
class projects, class presentations, class participation and teacher
observation.
In short, assessment means those activities that are designed to
measure learner achievement brought about as a result of an
instructional programme of some sort. Figure 1.2 shows the
characteristics of effective assessment.
Evaluation refers to a broader process that involves examining many
components of a whole and making instructional decisions.
The evaluation process focuses upon determining the attainment of
previously-established priorities and goals.
Evaluation helps document the effectiveness of a course or
programme, identifies weaknesses and strengths, and spots areas in
need of revision.
In short, evaluation refers to a series of activities that are designed to
measure the effectiveness of the instructional system or a component
thereof.
The two processes are closely related – the results of student
assessment constitute one of the most important sets of data in the
evaluation of any course or curriculum.
Figure 1.2: Effective assessment is: congruent with instructional objectives, student-centered, valid, comprehensive, relevant and salient.
Activity 1.2
Answer the following questions in not more than one page and then send your
answers through the following email address………
1. Why is it important for you to be able to identify the differences and
similarities among the following concepts: test, measurement, assessment,
monitoring and evaluation?
Learning Unit 1.2: Purposes and Types of Evaluation
Introduction
Evaluation, as an umbrella term among the five concepts, serves different
purposes. Firstly, evaluation can be used to determine the effectiveness of
courses and educational programmes. Secondly, evaluation helps to provide a
basis for improving courses or programmes. In order to gain some insight
into the concept of evaluation as it is used in education, go through Case
Study 2.
Activity 1.3
From the Case Study 2, attempt to classify the types of tests that are
conducted at the mentioned secondary school.
Resource 1.2
Purposes of evaluation
to determine the effectiveness of courses and educational programmes.
to provide a basis for improving courses/programmes.
1. Placement Evaluation
2. Formative evaluation
Tests used for formative evaluation are most frequently teacher-made.
Observational techniques are also useful in monitoring pupils' progress and
identifying successes and challenges in learning.
3. Diagnostic Evaluation
4. Summative Evaluation
Summative evaluation comes at the end of a course (or unit) of instruction. It
aims at determining how well students have attained the instructional
objectives and at providing information to grade students and/or to evaluate
teacher effectiveness. Its general purpose is grading or certification of
student achievement, or providing information for judging the
appropriateness of the course objectives and the effectiveness of instruction.
Purposes of Assessment
Assessment shapes what students learn and how they learn. It may serve the
following functions:
Introduction
Objectives
1. Understands the multiplication of fractions.
2. Fold papers.
3. Derive rule for multiplication.
4. Have students solve problems on page 25.
“What is this?” a friend asks, pointing to the lesson plan about folding papers
in the book.
“Oh, I have them fold papers into thirds, we write the fractions on the board,
and then I have them take a half of the third, write it again, and they see that
one-half of one-third is one-sixth and then fold the papers into some other
fractions. They see a pattern and we derive the rule for multiplying fractions.
Then I have them practice, and on their tests they have to solve problems like
those.”
Activity 1.4
Look at Mr. Chakubanga’s objective # 1. Is the objective acceptably written?
Resource 1.3
The action verb is the most important element of an objective and can never
be omitted. The action verb states precisely what the student will do
following instruction.
Example
After teaching the topic on the elements of weather we might expect students
to be able to:
list the elements of weather.
identify the elements of weather.
distinguish among the elements of weather.
Remember
Don’t state objectives in terms of:
teacher performance (e.g. teach pupils elements of weather)
learning process (e.g. pupil learns elements of weather)
course content (e.g. student studies elements of weather).
two objectives in one statement (e.g. student lists and explains elements of weather).
Introduction
Dear student, in Learning Unit 1.3 you learned about instructional objectives
and their importance in the teaching and learning process. In the present unit,
you are going to learn about what is called the Taxonomy of Educational
Objectives (TEO). This is a useful guide for developing a comprehensive list of
instructional objectives. It attempts to identify and classify all possible
educational outcomes. The taxonomy is classified into three major domains:
Cognitive domain (Bloom et al., 1956)
Affective domain (Krathwohl, 1964)
Psychomotor domain (Simpson, 1972).
Resource 1.4
1. Knowledge: This is defined as remembering previously learned
material. It represents the lowest level of learning outcomes.
Sample verbs: write, list, label, name, state, define.
Example: List the six levels of Bloom's taxonomy of the
cognitive domain.
4. Analysis: This level refers to the ability to break down material into its
component parts so that its organisational structure is understood. It
may also refer to breaking down complex information into simpler parts.
Sample verbs: analyse, categorise, compare, distinguish,
contrast, separate.
Example: Students will compare and contrast the knowledge
and comprehension levels of cognitive domain.
Major categories in the affective domain
The affective domain (Krathwohl, Bloom, & Masia, 1973) includes aspects
such as feelings, values, appreciation, enthusiasm, motivation, and attitudes.
The five major categories are listed from the simplest behaviour to the most
complex.
1. Receiving: This refers to being aware of, or attending to, something in
the environment, or students' willingness to attend to particular stimuli
or phenomena. It represents the lowest level of learning outcomes in the
affective domain.
Sample verbs: asks, chooses, follows, gives, replies, uses.
Example: Listen to others with respect.
Major categories in the psychomotor domain
The domain includes physical movement, co-ordination and use of motor
skills. Development of the skills requires practice and is measured in terms of
speed, precision, distance, procedures, or techniques in execution. It contains
seven categories listed in order from simplest behaviour to the most complex.
2. Set: Readiness to act. It includes mental, physical and emotional sets.
The three sets are dispositions that predetermine a person's response to
different situations (sometimes called mindsets). This level requires the
learner to demonstrate an awareness or knowledge of the behaviours
needed to carry out the skill.
Sample verbs: demonstrate, show, displays, explains, reacts,
shows, moves, etc.
Example: Shows desire to type efficiently.
5. Complex overt response: This level involves the ability to perform the
complete psychomotor skill correctly. In this case, proficiency is
indicated by a quick, accurate and highly coordinated performance,
requiring a minimum of energy. This category includes performing
without hesitation and automatic performance.
Sample verbs: carry out, operate, perform, assembles, builds,
manipulates, measures, mends, etc.
Example: Operates a machine quickly and accurately.
6. Adaptation: In this case skills are well developed and a student can
modify motor skills to fit a new situation or special requirements.
Sample verbs: adapt, changes, modify, rearranges, reorganises,
revises, alters, etc.
Example: Modifies instructions to meet the needs of learners.
Activity 1.5
Dear student, attempt the following questions to see whether you have
understood the Taxonomy of Educational Objectives.
(i) A student volunteers her answer by raising her hand in class. Which
domain is best illustrated by this student's behaviour?
(ii) You are a physical education teacher. You want your students to think
that health and fitness are important and should be part of their
lifestyle. In which domain of the Taxonomy of Educational Objectives
would your goal be best classified?
Overview
We know that tests shape what students learn and how they learn. The
construction of good tests requires specific skills and experience, which are
neither easy to acquire nor widely available. This module serves as an
introduction to principles of test construction. It offers you knowledge and
skills in planning the tests and use of tables of specifications in constructing
test items.
Learning outcomes
construct tests that measure the extent to which students have achieved
the learning objectives of the course.
design test items that evaluate the appropriate level of learning
outcomes.
discriminate among testing methods and choose appropriate measures.
develop meaningful formative communications with students.
evaluate classroom tests for reliability and validity.
Resource 2.1
A good test reflects the goals of the course. It is congruent with the skills that
you want students to develop and with the content you emphasize in the
class. A test that covers a much broader range of material than that covered in
the class will be regarded as unfair by your students, even if you tell them
that they are responsible for material that has not been discussed in class.
The table of specifications is a two-way chart which relates the instructional
objectives to the course content and specifies the relative emphasis to be
given to each. Its purpose is to provide assurance that the test will
measure a representative sample of the learning outcomes and of the subject
matter topics to be measured.
Preparing a list of instructional objectives: This describes the type of
performance the pupils are expected to demonstrate.
Outlining the instructional content. The amount of detail to include in the
content outline is somewhat arbitrary, but it should be detailed enough
to ensure adequate sampling during test construction and proper
interpretation of results.
Preparing the two-way chart by listing the major content areas down the
left side of the table and the instructional objectives across the top.
You need to determine what proportion of the test items should be devoted to
each objective and each content area. Here a number of factors should be
considered:
Activity 2.1
Select a topic from your subject of specialization and prepare a Table of
specification that will guide you to assess the achievement of your students
on the topic.
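The proportional allocation behind a table of specifications can be sketched in a few lines of code. The content areas, objectives, weights and test length below are invented placeholders, not taken from this module:

```python
# Sketch of allocating items in a table of specifications.
# Content areas, objectives and weights are hypothetical examples.
content_weights = {"Weather elements": 0.40, "Climate": 0.35, "Instruments": 0.25}
objective_weights = {"Knowledge": 0.50, "Comprehension": 0.30, "Application": 0.20}
total_items = 40

# Items per cell = total items x content weight x objective weight, rounded.
table = {
    area: {obj: round(total_items * aw * ow) for obj, ow in objective_weights.items()}
    for area, aw in content_weights.items()
}

for area, row in table.items():
    print(f"{area:<18} {row}  row total: {sum(row.values())}")
```

Because rounding can make the cells drift from the intended total, it is worth checking that the cell counts still sum to the planned number of items before writing the test.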
Resource 2.2
Validity
Meaning of validity
Validity refers to the appropriateness of the interpretations made from test
scores and other evaluation results, with regard to a particular use. For
example, if a test is to be used to describe pupil achievement, we would like
to be able to interpret the scores as a relevant and representative sample of
the achievement domain being measured; if it is to be used to predict future
performance, the scores should provide an accurate estimate of that future
performance. Validity refers to the degree to which test scores serve their
intended use. A valid assessment procedure is one which actually tests what
it sets out to test, that is, one which accurately measures the behaviour
described by the learning outcomes under scrutiny.
Obviously, no one would deliberately construct an assessment item to test
trivia or irrelevant material, but it is surprising just how often non-valid test
items are in fact used. As we will see later in the review of assessment
methods, validity-related problems are a common weakness of many of the
more widely-used methods. For example, a simple science question given to
14-year-old schoolchildren ('Name the products of the combustion of carbon
in an adequate supply of oxygen') produced a much higher number of correct
answers when the word 'combustion' was replaced by 'burning'. The original
question had problems of validity in that it was, to some extent, testing
language and vocabulary skills rather than the basic science involved.
Test validation
This refers to the procedures used to establish the validity of a test. Validity is
viewed as a unitary concept: instead of talking of various types of validity, we
talk of various kinds of evidence. Validity is established by showing
evidence for it.
Content-related evidence is concerned with how well the sample of test items
represents the domain (area) to be measured. It can be gathered by comparing
the test items to the test specifications describing the task domain under
consideration.
Test too short – a test is only a sample of the many questions that might be
asked. Thus a test which is too short fails to provide a representative
sample of performance.
Reliability
Meaning of reliability
The reliability of an assessment procedure is a measure of the consistency
with which the question, test or examination produces the same results under
different but comparable conditions.
A reliable test should give similar results even though different testers
administer it, different people score it, different forms of the test are given,
and the same person takes the test at two or more different times. It is
obviously important to have reasonably reliable assessment procedures when
a large number of individual markers assess the same question (e.g. in
national school examinations, or with many postgraduates marking lab
work). A student answer which receives a score of 75 per cent from one
marker and 35 per cent from another, for example, reveals a patently
unreliable assessment procedure.
Note
Reliability refers to the results obtained with an evaluation instrument and
not to the instrument itself. Any particular instrument may have a number
of different reliabilities, depending on the group involved and the situation
in which it is used.
Reliability is a necessary but not a sufficient condition for validity. A test
that yields highly consistent results may still be measuring the wrong thing
or may be used in inappropriate ways.
Reliability is primarily statistical. The logical analysis of a test will provide
little evidence concerning the reliability of scores; computations must be
made to establish reliability.
The test-retest method is essentially a measure of examinee
reliability: an indication of how consistently examinees perform
on the same set of tasks.
To estimate reliability by means of the test-retest method, the
same test is administered twice to the same group of pupils with a
given time interval between the two administrations. The resulting
test scores are correlated, and this correlation coefficient provides a
measure of stability; that is, it indicates how stable the results are
over the given period of time.
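As a sketch, the stability coefficient is simply the correlation between the two sets of scores. The scores below are invented for illustration:

```python
# Test-retest sketch: correlate scores of the same pupils on two
# administrations of the same test (made-up scores for illustration).
def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

first_administration = [55, 62, 70, 48, 80, 66]
second_administration = [58, 60, 73, 50, 78, 69]

stability = pearson_r(first_administration, second_administration)
print(f"Stability (test-retest) coefficient: {stability:.2f}")
```

A coefficient close to 1 indicates that pupils keep roughly the same relative positions across the two administrations.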
(i) K-R 20:

    r = [k / (k - 1)] × [1 - (Σpq) / s²]

where k = number of items in the test, p = proportion of pupils answering an
item correctly, q = 1 - p, and s² = variance of the scores on the test.

(ii) K-R 21:

    r = [k / (k - 1)] × [1 - X̄(k - X̄) / (k·s²)]

where X̄ = mean of the test scores.
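The two Kuder-Richardson formulas can be checked with a short script. The response matrix below is invented (rows = pupils, columns = items, 1 = correct):

```python
# Sketch of KR-20 and KR-21 on a small made-up set of item responses.
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
k = len(responses[0])                     # number of items
scores = [sum(row) for row in responses]  # total score per pupil
n = len(scores)
mean = sum(scores) / n
var = sum((x - mean) ** 2 for x in scores) / n  # s² (population variance)

# KR-20: r = k/(k-1) * (1 - sum(pq)/s²)
sum_pq = 0.0
for j in range(k):
    p = sum(row[j] for row in responses) / n  # proportion correct on item j
    sum_pq += p * (1 - p)
kr20 = (k / (k - 1)) * (1 - sum_pq / var)

# KR-21: r = k/(k-1) * (1 - mean(k - mean)/(k*s²))
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")
```

KR-21 only needs the mean, variance and number of items, but it assumes all items are of equal difficulty, so it typically gives a slightly lower estimate than KR-20, as it does here.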
1. Length of test: The longer the test, the higher its reliability will be.
This is because a longer test provides a more adequate sample of the
behaviour being measured, and the scores are apt to be less distorted by
chance factors such as guessing.
2. Spread of scores: Other things being equal, the larger the spread of
scores, the higher the estimate of reliability. Since reliability
coefficients reflect the extent to which individuals tend to stay in the
same relative position in the group from one testing to another, anything
that reduces the possibility of shifting positions in the group also
contributes to larger reliability coefficients.
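A small invented example illustrates the point: correlating two forms of a test over the full group, then over only a narrow middle band of scorers, shows the coefficient dropping when the spread is restricted.

```python
# Illustration (made-up scores): restricting the spread of scores
# lowers the correlation between two forms of a test.
def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

form_a = [30, 42, 48, 55, 61, 67, 74, 88]
form_b = [35, 40, 50, 52, 65, 63, 78, 85]

full = pearson_r(form_a, form_b)
# Keep only the middle of the distribution (scores 48-67 on form A).
pairs = [(a, b) for a, b in zip(form_a, form_b) if 48 <= a <= 67]
narrow = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
print(f"full spread r = {full:.2f}, restricted spread r = {narrow:.2f}")
```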
3. Difficulty of test: Tests that are too easy or too difficult for the group
taking them will tend to produce scores of low reliability. This is
because both easy and difficult tests result in a restricted spread of
scores.
4. Objectivity: The objectivity of a test refers to the degree to which
equally competent scorers obtain the same results. Most standardised
tests of aptitude are high in objectivity: the test items are of the
objective type and the resulting scores are not influenced by the
scorer's judgement or opinion.
Practicability
Fairness
Usefulness to students
In principle, both selection and supply items can be used to test a wide range
of learning objectives. In practice, most people find it easier to construct
selection items to test recall and comprehension and supply items to test
higher-level learning objectives. Selection items that require students to do
such things as classify statements as fact or opinion go beyond rote learning,
and focused essay questions can easily stay at the recall level.
Multiple choice items
True/false items
Matching items
Multiple-choice items offer the most versatility of all item types. Their uses
include testing for factual recall as well as measuring level of understanding
and the application of concepts. Multiple-choice items may be used to test all
levels of thinking, although higher level items are more difficult to compose.
As a result, the ease of constructing items that only require lower levels of
thinking often leads to tests that address only these levels.
The term stem refers to the part of the item that asks the question. The terms
responses, choices, options, and alternatives refer to the parts of the item that
are used to answer the question.
Rules of Response Writing
Advantages
Limitations
True/false items may not give a true estimate of the students' knowledge
since half can be correct by chance.
They are extremely poor for diagnosing student strengths and
weaknesses.
They are generally considered to be "tricky" by students.
They tend to be either very easy or very difficult.
They do not distinguish well between varying degrees of learning.
Suggestions for Constructing True/False Items
Keep the language simple and clear. Avoid ambiguous and trick items.
Use a relatively large number of items (75 or more when the entire test is
T/F).
Be aware that extremely long or complicated statements will test reading
skill more than content knowledge.
Avoid the use of negatives, especially double negatives.
Make sure that the statements used are entirely true or entirely false.
Use certain key words sparingly since they may provide clues to the
correct answers. The words all, always, never, every, none, and only usually
indicate a false statement, while the words generally, sometimes, usually,
maybe, and often are often used in true items.
Use precise terms, such as 50% of the time, rather than less precise terms,
such as several, seldom, and frequently.
Use more false than true items, but not more than 15% more. (False items
tend to discriminate more than true items.)
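The chance-score problem noted above can be quantified with a binomial model (p = 0.5 of guessing each item correctly). The item counts and cut-off below are illustrative:

```python
# Probability of reaching at least a given score on a true/false test
# by pure guessing, modelled as a binomial with p = 0.5 per item.
from math import comb

def p_at_least(items, correct):
    # Sum binomial tail: P(X >= correct) for X ~ Binomial(items, 0.5).
    return sum(comb(items, r) for r in range(correct, items + 1)) / 2 ** items

print(f"10 items: P(>=60% correct by guessing) = {p_at_least(10, 6):.3f}")
print(f"75 items: P(>=60% correct by guessing) = {p_at_least(75, 45):.3f}")
```

On a 10-item test a guesser reaches 60% fairly often, while on a 75-item test the probability falls sharply, which supports the suggestion above to use a relatively large number of items.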
Matching Items
Advantages
Disadvantages
Use only homogeneous material in a set of matching items (e.g., dates and
places should not be in the same set) to reduce the possibility of guessing
the correct answers.
Place the more involved expressions in the stem and keep the responses
short and simple.
Supply directions that clearly state the basis for the matching, indicating
whether or not a response can be used more than once, and stating
where the answer should be placed.
Make sure that there are never multiple correct responses for one stem
(although a response may be used as the correct answer for more than
one stem).
Avoid giving grammatical clues to the correct response.
Arrange items in the response column in some logical order—
alphabetical, numerical, chronological—so that students can find them
easily.
Avoid breaking a set of items (stems and responses) over two pages.
Use no more than 15 items in one set.
Provide more responses than stems to make process-of-elimination
guessing less effective.
Use capital letters for the response signs rather than lower-case letters.
Completion Items
On the whole, they offer little advantage over other item types unless the
need for specific recall is essential.
Advantages
Disadvantages
Suggestions for Constructing Completion Items
Providing clear and concise cues about the expected response in the
statement.
When possible, providing explicit directions as to what amount of
variation will be accepted in the answers.
Avoiding using a long quote with multiple blanks to complete.
Requiring only one word or phrase in each blank.
Facilitating scoring by having the students write their responses on lines
arranged in a column to the left of the items.
Asking students to fill in only important terms or expressions.
Avoiding providing grammatical clues to the correct answer by using
a/an, etc., instead of specific modifiers.
These are suitable for measuring a wide variety of relatively simple learning
outcomes. They can be used for measuring knowledge of terminology,
knowledge of specific facts, knowledge of principles, knowledge of
procedure or method, and simple interpretations of data.
Limitations
Word the item so that the required answer is both brief and specific.
Do not take statements directly from textbooks to use as a basis for short-
answer items.
A direct question is generally more desirable than an incomplete
statement.
If the answer is to be expressed in numerical units, indicate the type of
answer wanted.
Blanks for answers should be equal in length and in a column to the right
of the question.
When completion items are used, do not include too many blanks.
Essay Items
The distinctive feature of essay questions is the freedom of response. Pupils are
free to select, relate and present ideas in their own words. Although this
freedom enhances the value of essay questions as a measure of complex
achievement, it introduces scoring difficulties that make them inefficient as a
measure of factual knowledge. Essay items are primarily used to measure
those learning outcomes that cannot be measured by objective test items.
Many instructors consider essay questions the ideal form of testing, since
essays seem to require more effort from the student than other types of
questions.
Essay responses allow us to see our students' thought processes that lead to
the answers. We may be testing at some higher level of Bloom's taxonomy of
thinking—perhaps within the level of synthesis—but discover in a student's
answer that he/she lacked the knowledge required to begin synthesis.
Limitations
Read a few papers before you actually start grading in order to get an
idea of the range of quality.
Some instructors select "range finder" papers—middle range A, B, C and
D papers to which they refer for comparison.
Stop grading when you get too tired or bored. When you start again,
read over the last couple of papers you graded to make sure you were
fair.
Conceal the student's name while you grade the response. If you know
the identity of the student, your overall impressions of that student's
work will inevitably influence the scoring of the test.
If there is more than one essay question on the test, grade each essay
separately rather than grading a student's entire test at once. Otherwise,
a brilliant performance on the first question may overshadow weaker
answers in other questions (or vice-versa).
Remain open to legitimate interpretations of the questions different from
your own. If students misinterpret the intent of your question, or if your
standards are unrealistically high or low, you should alter your model
response in light of this information.
Restrict the use of essay questions to those learning outcomes that cannot
be satisfactorily measured by objective items.
Formulate questions that will call forth the behaviour specified in the
learning outcomes.
Phrase each question so that the pupil's task is clearly indicated.
Indicate an appropriate time limit for each question.
Avoid the use of optional questions.
Note possible test questions throughout the term; writing a few after
each class is ideal.
Avoid trick questions; if the majority of students get a question wrong,
it was likely a poor question.
Keep questions as brief as possible to eliminate the need for speed
reading and writing.
Use a variety of question types.
Group similar types of questions so students don't constantly shift
response patterns.
If using a series of questions in which answering successfully depends
on knowing the correct answer to a previous item, grading should take
early errors into account and students should be informed of this.
Arrange items in order of difficulty to avoid discouraging students at
the beginning of the exam.
Weight points according to question type, amount of learning assessed
and time students should spend answering.
Avoid patterns in the response key.
Strive to include items which test the higher levels of thinking and
learning.
Minimize such qualifying words as ‘always’ and ‘never’.
Use the positive statement whenever possible.
Don't give too many clues to answers in preceding or subsequent items.