Module in Ed 310: Assessment in Learning I

Module 5: DEVELOPMENT OF VARIED ASSESSMENT TOOLS

LEARNING OUTCOME
• Construct paper-and-pencil tests in accordance with the guidelines in test construction.

INTRODUCTION
5.1 Types of Objective Tests
We are concerned with developing objective tests for assessing the attainment of educational
objectives based on Bloom's taxonomy in this chapter. For this purpose, we restrict our attention to the
following types of paper-and-pencil test: (a) true-false items, (b) multiple-choice type test, (c)
matching items, (d) enumeration, (e) completion and (f) essays.
Development of paper-and-pencil tests requires careful planning and expertise in terms of actual
test construction. The more seasoned teachers can produce true-false items that can test even higher
order thinking skills and not just rote memory learning. Essays are easier to construct than the other
types of tests, but the difficulty of grading essay examinations often discourages teachers from using
this particular form of examination in actual practice.
5.2 Planning a Test and Construction of Table of Specifications (TOS)
The important steps in planning for a test are:
• Identifying test objectives/lesson outcomes
• Deciding on the type of objective test to be prepared
• Preparing a Table of Specifications (TOS)
• Constructing the draft items
• Try-out and validation
Identifying Test Objectives. An objective test, if it is to be comprehensive, must cover the
various levels of Bloom's Taxonomy. Each objective consists of a statement of what is to be achieved,
preferably by the students.
Example. We want to construct a test on the topic: “Subject-Verb Agreement in English” for a
Grade V class. The following are typical objectives.
Knowledge / Remembering. The students must be able to identify the subject and the verb in a
given sentence.
Comprehension/Understanding. The students must be able to determine the appropriate form
of a verb to be used given the subject of a sentence.
Application/Applying. The students must be able to write sentences observing rules on subject-
verb agreement.
Analysis/Analyzing. The students must be able to break down a given sentence into its subject
and predicate.
Evaluation/Evaluating. The students must be able to evaluate whether or not a given sentence
observes the rules on subject-verb agreement.
Synthesis/Creating. The students must be able to formulate rules to be followed regarding
subject-verb agreement.
Deciding on the type of objective test. The test objectives guide the kind of objective tests that
will be designed and constructed by the teacher. For instance, for the first four (4) levels, we may want
to construct a multiple-choice type of test while for application and judgment, we may opt to give an
essay test or a modified essay test.
Preparing a table of specifications (TOS). A Table of Specifications or TOS is a test map that
guides the teacher in constructing a test. The TOS ensures that there is a balance between items that test
lower-level thinking skills and those that test higher-level thinking skills (or, alternatively, a balance
between easy and difficult items) in the test. The simplest TOS consists of four columns: (a) level of
objective to be tested, (b) statement of objective, (c) item numbers where such an objective is being
tested, and (d) number of items and percentage out of the total for that particular objective. A prototype
table is shown below:
Table of Specifications Prototype

LEVEL             OBJECTIVE                                   ITEM NUMBERS     NO.     %
1. Knowledge      Identify subject and verb                   1,3,5,7,9        5       14.29%
2. Comprehension  Form appropriate verb forms                 2,4,6,8,10       5       14.29%
3. Application    Write sentences observing rules             11,13,15,17,19   5       14.29%
                  on subject-verb agreement
4. Analysis       Determine subject and predicate             12,15,18,21,23   5       14.29%
5. Evaluation     Evaluate whether or not a sentence          13,16,19,22,24   5       14.29%
                  observes rules on subject-verb agreement
6. Synthesis      Formulate rules on subject-verb agreement   Part II          10 pts  28.57%
TOTAL                                                                          35      100%
In the table of specifications we see that there are five items that deal with knowledge and these
items are items 1, 3, 5, 7, 9. Similarly, from the same table we see that five items represent analysis,
namely: 12, 15, 18, 21, 23. The first five levels of Bloom's taxonomy are equally represented in the test
while synthesis (tested through an essay) is weighted equivalent to ten (10) points, or double the weight
given to any of the first five levels. The Table of Specifications guides the teacher in formulating the test.
As we can see, the TOS also ensures that each of the objectives in the hierarchy of educational
objectives is well represented in the test. As such, the resulting test that will be constructed by the teacher
will be more or less comprehensive. Without the Table of Specifications, the tendency for the test maker
is to focus too much on facts and concepts at the knowledge level.
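To make the arithmetic behind the last two columns concrete, here is a minimal sketch in Python (the objective labels and point counts simply mirror the prototype above) that computes each objective's percentage share of the total:

```python
# A minimal sketch of the last two TOS columns: points per objective and
# their percentage of the test total. The labels and counts are illustrative.
tos = {
    "Knowledge": 5,
    "Comprehension": 5,
    "Application": 5,
    "Analysis": 5,
    "Evaluation": 5,
    "Synthesis (essay)": 10,
}

total_points = sum(tos.values())  # 35 in this example

for objective, points in tos.items():
    share = points / total_points * 100
    print(f"{objective:20s} {points:3d} {share:6.2f}%")

print(f"{'TOTAL':20s} {total_points:3d} {100.0:6.2f}%")
```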
Constructing the test items. The actual construction of the test items follows the TOS. As a
general rule, it is advised that the actual number of items to be constructed in the draft should be double
the desired number of items. For instance, if there are five (5) knowledge level items to be included in
the final test form, then at least ten (10) knowledge level items should be included in the draft. The
subsequent test try-out and item analysis will most likely eliminate many of the constructed items in the
draft (either they are too difficult, too easy or non-discriminatory); hence, it will be necessary to
construct more items than will actually be included in the final test form.
Most often, however, the try-out is not done due to lack of time.
Item analysis and try-out. The test draft is tried out on a group of pupils or students. The purpose
of this try-out is to determine: (a) item characteristics through item analysis, and (b) characteristics
of the test itself: validity, reliability, and practicality.
5.3 Constructing a True-False Test
Binomial-choice or alternate-response tests are tests that have only two (2) options such as true or
false, right or wrong, yes or no, good or better, check or cross out, and so on. A student who knows
nothing of the content of the examination would have a 50% chance of getting the correct answer by sheer
guesswork. Although correction-for-guessing formulas exist, it is best that the teacher ensure that a
true-false item is able to discriminate properly between those who know and those who are just
guessing. A modified true-false test can offset the effect of guessing by requiring students to explain
their answer and to disregard a correct answer if the explanation is incorrect. Here are some rules of
thumb in constructing true-false items.
Rule 1. Do not give a hint (inadvertently) in the body of the question.
Example. The Philippines gained its independence in 1898 and therefore celebrated its centennial
year in 2000._______
Obviously the answer is FALSE because 100 years from 1898 is not 2000 but 1998.
Rule 2. Avoid using the words “always”, “never”, “often” and other words that tend to be either
always true or always false.
Example: Christmas always falls on Sunday because it is Sabbath day. ________
Statements that use the word "always" are almost always false. A test-wise student can easily
guess his way through a test like this and get a high score even if he does not know anything about the
subject matter being tested.
Rule 3. Avoid long sentences as these tend to be "true". Keep sentences short.

Example: Tests need to be valid, reliable, and useful, although it would require a great amount of
time and effort to ensure that tests possess these characteristics._______
Notice that the statement is true. However, we are also not sure which part of the sentence is deemed
true by the student. It is just fortunate that in this case, all parts of the sentence are true and hence the
entire sentence is true. The following example illustrates what can go wrong in long sentences:
Example: Tests need to be valid, reliable and useful since it takes very little time, money and
effort to construct tests with these characteristics.____________
The first part of the sentence is true but the second part is debatable and may, in fact, be false.
Thus a "true" response is arguably correct and so is a "false" response.
Rule 4. Avoid trick statements with some minor misleading word or spelling anomaly, misplaced
phrases, etc. A test-wise student who does not know the subject matter may detect this strategy and thus get
the answer correct.
The Raven was written by Edgar Allen Poe.
"Allen" is misspelled (it should be Allan) and the answer would be false!
This is an example of a tricky but utterly useless item.
Rule 5. Avoid quoting verbatim from reference materials or textbooks. This practice sends the wrong
signal to the students that it is necessary to memorize the textbook word for word; thus, the acquisition of
higher-level thinking skills is not given due importance.
Rule 6. Avoid specific determiners or give-away qualifiers. Students quickly learn that strongly worded
statements are more likely to be false than true, for example, statements with "never", "no", "all" or
"always". Moderately worded statements, on the other hand, are more likely to be true than false;
statements that use "many", "often", "sometimes", "generally", "frequently" or "some" should likewise
usually be avoided. E.g., "Executives usually suffer from hyperacidity." The statement tends to be correct;
the word "usually" leads to the answer.
Rule 7. With true or false questions, avoid a grossly disproportionate number of either true or false
statements or even patterns in the occurrence of true and false statements.

    1. T    6. F          1. T    6. F
    2. F    7. F          2. F    7. T
    3. F    8. F    or    3. T    8. F
    4. F    9. F          4. F    9. T
    5. F    10. F         5. T    10. F

For ease of correction, teachers sometimes create a pattern of true or false answers. A student may
sense the pattern and arrive at the correct answer not because he/she really knows it but because
he/she senses the pattern.
5.4. Multiple Choice Tests
A generalization of the true-false test, the multiple choice type of test offers the student
more than two (2) options per item to choose from. Each item in a multiple choice test consists of
two parts: (a) the stem, and (b) the options. In the set of options, there is a "correct" or "best" option
while all the others are considered "distracters". The distracters are chosen in such a way that they are
attractive to those who do not know the answer or are guessing but, at the same time, have no appeal to
those who actually know the answer. It is this feature of multiple choice type tests that allows the teacher
to test higher order thinking skills even if the options are clearly stated. As in true-false items, there are
certain rules of thumb to be followed in constructing multiple choice tests.
Guidelines in constructing Multiple Choice Items
1) Do not use unfamiliar words, terms and phrases. The ability of the item to discriminate, or its level of
difficulty, should stem from the subject matter rather than from the wording of the question.
Example: What would be the system reliability of a computer system whose slave and
peripherals are connected in parallel circuits and each one has a known time to failure probability
of 0.05?

A student completely unfamiliar with the terms "slave" and "peripherals" may not be able
to answer correctly even if he knew the subject matter of reliability.
2) Do not use modifiers that are vague and whose meanings can differ from one person to the next,
such as: much, often, usually, etc.
Example:
Much of the process of photosynthesis takes place in the
a. bark
b. leaf
c. stem
The qualifier "much" is vague and could have been replaced by more specific qualifiers like
"90% of the photosynthetic process" or some similar phrase that would be more precise.
3) Avoid complex or awkward word arrangements. Also, avoid use of negatives in the stem as this may
cause unnecessary comprehension difficulties.
Example:
(Poor) As President of the Republic of the Philippines, Corazon Cojuanco Aquino would stand next to
which President of the Philippine Republic subsequent to the 1986 EDSA Revolution?
(Better) Who was the President of the Philippines after Corazon C. Aquino?
4) Do not use negatives or double negatives as such statements tend to be confusing. It is best to use
simpler sentences rather than sentences that would require expertise in grammatical construction.
Example:
(Poor) Which of the following will not cause inflation in the Philippine economy?
(Better) Which of the following will cause inflation in the Philippine economy?
(Poor) What does the statement "Development patterns acquired during the formative years are NOT
unchangeable" imply?
A. B. C. D.
Better: What does the statement “Development patterns acquired during the formative years are
changeable” imply?
A. B. C. D.
5) Each item stem should be as short as possible; otherwise you risk testing more for reading and
comprehension skills.
6) Distracters should be equally plausible and attractive.
Example:
The short story "May Day's Eve" was written by which Filipino author?
a. Jose Garcia Villa    b. Nick Joaquin
c. Genoveva Pedrosa Matute    d. Robert Frost    e. Edgar Allan Poe
If the distracters had all been Filipino authors, the value of the item would be greatly increased. In
this particular instance, only the first three carry the burden of the entire item since the last two can be
essentially disregarded by the students.

7) All multiple choice options should be grammatically consistent with the stem.
Example:
As compared to the autos of the 1960s, autos in the 1980s __________.
A. travelling slower C. to use less fuel
B. bigger interiors D. contain more safety measures
8) The length, explicitness, or degree of technicality of the alternatives should not be the determinant of the
correctness of the answer. The following is an example that violates this rule:
Example:

If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles are congruent
b. similar
c. equiangular and therefore , must also be congruent
d. equilateral if they are equiangular
The correct choice, "b", may be obvious from its length and explicitness alone. The other choices are
long and tend to explain why they must be the correct choices, forcing the students to think that they
are, in fact, not the correct answers!
9) Avoid stems that reveal the answer to another item.
Example:
a. Who will most strongly disagree with the progressivist who claims that the child should be
taught only that which interests him, and if he is not interested, that we should wait until the
child gets interested?
A. Essentialist    C. Progressivist
B. Empiricist      D. Rationalist

b. Which group will most strongly focus its teaching on the interest of the child?
A. Progressivist    C. Perennialist
B. Essentialist     D. Reconstructionist
One may arrive at the correct answer to item b by looking at item a, which gives away the answer.
10) Avoid alternatives that are synonymous with others or those that include or overlap others.

Example:
What causes ice to transform from solid state to liquid state?
a. Change in temperature
b. Changes in pressure
c. Change in the chemical composition
d. Change in heat levels
Options a and d are essentially the same. A student who spots these identical
choices would right away narrow down the field of choices to a, b, and c, so the last
distracter plays no significant role in increasing the value of the item.
Worse, if both a and d were keyed as correct, the item would have two answers, which is not allowed.
11) Avoid presenting sequenced items in the same order as in the text.
12) Avoid use of assumed qualifiers that many examinees may not be aware of.
13) Avoid use of unnecessary words or phrases which are not relevant to the problem at hand (unless
such discriminating ability is the primary intent of the evaluation). The item's value is particularly
damaged if the unnecessary material is designed to distract or mislead. Such items test the student's
reading comprehension rather than knowledge of the subject matter.
Example:

The side opposite the thirty-degree angle in a right triangle is equal to half the length of the
hypotenuse. If the sine of a 30-degree angle is 0.5 and the hypotenuse is 5, what is the length of the side
opposite the 30-degree angle?
a. 2.5    b. 3.5    c. 5.5    d. 1.5
The sine of the 30-degree angle is really quite unnecessary since the first sentence already gives
the method for finding the length of the side opposite the thirty-degree angle. This is a case of a
teacher who wants to make sure that no student in his class gets the wrong answer!
14) Avoid use of non-relevant sources of difficulty such as requiring a complex calculation when
only knowledge of a principle is being tested.
Note that in the previous example, knowledge of the sine of the 30-degree angle would have led
some students to use the sine formula for the calculation even if a simpler approach would have sufficed.
15) Pack the question in the stem. Here is an example of an item whose stem contains no question. Avoid
it by all means.
Example:
The Roman Empire _________________.
a. had no central government c. had no definite territory
b. had no heroes d. had no common religion
16) Use the “None of the Above” option only when the keyed answer is totally correct. When choice of
the “best” response is intended, “none of the above” is not appropriate, since the implication has
already been made that the correct response may be partially accurate.
17) Note that use of "all of the above" may allow credit for partial knowledge. In a multiple-option
item (allowing only one option choice), if a student only knew that two (2) options were correct, he
could then deduce the correctness of "all of the above". This assumes you are allowing only one correct
choice.
18) Better still, use "none of the above" and "all of the above" sparingly; it is best not to use them at all.
19) Having compound response choices may purposefully increase the difficulty of an item.
The difficulty of a multiple choice item may be controlled by varying the homogeneity or degree of
similarity of the responses. The more homogeneous the options, the more difficult the item, because they
all look like the correct answer:
Example:
(Less Homogeneous)
Thailand is located in:
a. Southeast Asia c. Eastern Europe e. South America
b. East Africa d. Central America
(More Homogeneous)
Thailand is located next to:
a. Laos and Kampuchea c. China and Malaya e. India and Malaya
b. India and China d. Laos and China

5.5 Matching Type

The matching type items may be considered modified multiple choice type items where the
choices progressively reduce as one successfully matches the items on the left with the items on the
right.
Guidelines in Constructing Matching Type of Test
Here are some guidelines to observe in the formulation of good matching type of test.
1. Match homogeneous, not heterogeneous, items. The items to match must be homogeneous. If
you want your students to match authors with their literary works, one column must contain authors
and the second column must contain literary works. Don't insert, for instance, a nationality among
the names of authors. That will not be a good item since it is obviously out of place.
Example of homogeneous items. The items are all about Filipino heroes, nothing more.
Match items in Column A with the items in Column B.
Perfect Matching Type
Example: Match the items in column A with the items in column B.
        A                                           B
_____ 1. First President of the Republic           a. Magellan
_____ 2. National Hero                              b. Mabini
_____ 3. Discovered the Philippines                 c. Rizal
_____ 4. Brain of the Katipunan                     d. Lapu-Lapu
_____ 5. The great painter                          e. Aguinaldo
_____ 6. Defended Limasawa Island                   f. Juan Luna
                                                    g. Antonio Luna

2. The stems (longer in construction than the options) must be in the first column while the
options (usually shorter) must be in the second column.
3. The options must be more in number than the stems to prevent the student from arriving at the
answer by mere process of elimination.
4. To help the examinee find the answer more easily, arrange the options alphabetically or
chronologically.
Mental Exercise
Analyze the matching type of test below. Is this perfect (an answer may not be repeated)
matching type of test written in accordance with the guidelines given?
Exercise – Matching Type Test
Column A                     Column B
1. Poly                      A. sides
2. Triangle                  B. Eight-sided polygon
3. Pentagon                  C. Ten-sided polygon
4. Square                    D. Closed plane figure
5. Decagon                   E. Irving
6. Hexagon                   F. James
7. Isosceles Triangle        G. Melville
8. Octagon                   H. Mark Twain (Clemens)
9. Gons                      I. Wharton
10. Circle                   J. Many

Matching type items, unfortunately, often test lower order thinking skills (knowledge level) and are
unable to test higher order thinking skills such as application and judgment skills.
Another type of matching test is the imperfect type.
Below is an example of an imperfect matching type of test. It is called imperfect because an answer may
be repeated and so, like an unfaithful spouse, an option may pair with more than one item.
In Column A are works and writings in American literature and in Column B are their authors. Write
the letter of the author who wrote the work on the blank provided before each item. In some
cases, an answer may be repeated.
Column A Column B
1. The Alhambra A. Cooper
2. The Pioneers B. Dana
3. The Guardian Angel C. Emerson
4. Two Years Before the Mast D. Holmes
5. Moby Dick E. Irving
6. The World in a Man of War F. James
7. The Last of the Mohicans G. Melville
8. The American Scholar H. Mark Twain (Clemens)
9. The Autocrat of the Breakfast Table I. Wharton
10. Tom Sawyer

If you intend to make use of this imperfect matching test, make sure you indicate so in the
“Direction” to caution the students who usually think that an answer may not be repeated.
5.6 Supply Type or Constructed-Response Test
Another useful device for testing lower order thinking skills is the supply type of test. Like the
multiple choice test, the items in this kind of test consist of a stem and a blank where the students would
write the correct answer.
Example: The study of life and living organisms is called ___________________.
Supply type tests depend heavily on the way the stems are constructed. These tests allow for one and
only one answer and, hence, often test only the students' knowledge.
5.6.1 Completion Type of Test
It is, however, possible to construct supply type tests that will test higher order thinking, as the
following example shows:
Example: Write an appropriate synonym for each of the following. Each blank corresponds to a letter:
Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _
The appropriate synonym for the first is CHANGE with six (6) letters while the appropriate synonym
for the second is GROW with four (4) letters. Notice that these questions require not only mere recall of
words but also understanding of these words.
Guidelines in the Formulation of a Completion Type of Test
The following guidelines can help you formulate a completion type of test, the fill-in-the-blank type.
1. Avoid over-mutilated sentences like the item below. Give enough clues to the student.
The ______ produced by the ______ is used by the green ____ to change the _____ and ____ into
______. This process is called ______.
2. Avoid open-ended items. There should be only one acceptable answer. The item below is open-ended
and hence not a good test item.
Ernest Hemingway wrote _________.
3. The blank should be at the end or near the end of the sentence. The question must first be asked before
an answer is expected. Like the matching type of test, the stem (where the question is packed) must
come first.
4. Ask questions on more significant matters, not on trivia.
Jose Rizal was born on June _____, 1861.
There are more significant things to ask about than specific birthdates.
5. The length of the blanks must not suggest the answer, so it is better to make the blanks uniform in size.
A part of speech that names persons, places or things is _______.
A word used to connect clauses or sentences or to coordinate words in the same clause is called _______.

5.6.2 Essays
Essays, classified as non-objective tests, allow for the assessment of higher order thinking skills.
Such tests require students to organize their thoughts on a subject matter in coherent sentences in order to
inform an audience. In essay tests, students are required to write one or more paragraphs on a specific
topic.
Essay questions can be used to measure attainment of a variety of objectives, such as the following:

1. Comparing
-Describe the similarities and differences between…
-Compare the following methods for …
2. Relating cause-and-effect
-What are the major causes of …
-What would be the most likely effects of …
3. Justifying
-Which of the following alternatives would you favor and why?
- Explain why you agree or disagree with the following statement
4. Summarizing
-State the points included in …
-Briefly summarize the contents of …
5. Generalizing
-Formulate several valid generalizations from the following data.
-State a set of principles that can explain the following events.
6. Inferring
-In the light of the facts presented, what is most likely to happen when…
-How would Senator X be most likely to react to the bomb explosion after the bar examination
last September?
7. Classifying
-Group the following items according to …
-What do the following items have in common?
8. Applying
-Using the principles of ___ as guide, describe how you would solve the following problem
situation.
-Describe a situation that illustrates the principle of _____.
9. Analyzing
-Describe the reasoning errors in the following paragraphs
-List and describe the main characters of …
10. Evaluating
-Describe the strengths and weaknesses of the following…
-Using the criteria developed in class, write an evaluation of …
11. Creating
-Make up a story describing what would happen if …
-Design a plan to prove that…
-Write a well-organized report that shows…

5.6.2.1 Types of Essay


Restricted Essay
It is also referred to as a short focused response. Examples are asking students to "write an example",
"list three reasons", or "compare and contrast two techniques".
Sample Short Response Question
(10th Grade Reading)
How are the scrub jay and the mockingbird different? Support your answer with details and information
from the article.

Non-restricted / Extended Essay


Extended responses can be much longer and more complex than short responses, but students are
encouraged to remain focused and organized.
Sample Extended Response Question
(5th Grade Science)
Robert is designing a demonstration to display at his school's science fair. He will show how changing the
position of a fulcrum on a lever changes the amount of force needed to lift an object. To do this, Robert will use
a piece of wood for a lever and a block of wood to act as a fulcrum. He will move the fulcrum to different places
on the lever to see how its placement affects the force needed to lift an object.

Part A. Identify at least two other actions that would make Robert’s demonstration better.

Part B. Explain why each action would improve the demonstration.


Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.
Source: https://fcit.usf.edu/assessment/constructed/constructb.html
The following are rules of thumb which facilitate the scoring of essays:
Rule 1: Phrase the direction in such a way that the students are guided on the key concepts to be
included. Specify how the students should respond.

Example:
Using details and information from the article (Hundred Islands), summarize the main points of
the article. For a complete and correct response, consider these points:
• Its history (10 pts)
• Its interesting features (10 pts)
• Why it is a landmark (5 pts)
Non-example
Using details and information from the article (Hundred Islands), summarize the main
points of the article.
Source: https://fcit.usf.edu/assessment/constructed/constructb.html

Rule 2: Inform the students of the criteria to be used for grading their essays. This rule allows the
students to focus on relevant and substantive materials rather than on peripheral and unnecessary
facts and bits of information.
Example: Write an essay on the topic: “Plant Photosynthesis” using the keywords indicated. You will be
graded according to the following criteria: (a) coherence, (b) accuracy of the statements, (c) use
of keywords, (d) clarity and (e) extra points for innovative presentation of ideas.
Rule 3: Put a time limit on the essay test.
Rule 4: Decide on your essay grading system prior to getting the essays of your students.
Rule 5: Evaluate all the students' answers to one question before proceeding to the next question.
Scoring or grading essay tests question by question, rather than student by student, makes it
possible to maintain a more uniform standard for judging the answers to each question. This
procedure also helps offset the halo effect in grading. When all of the answers on one paper are
read together, the grader's impression of the paper as a whole is apt to influence the grades he
assigns to the individual answers. Grading question by question, of course, prevents the formation of this
overall impression of a student's paper. Each answer is more apt to be judged on its own merits when
it is read and compared with other answers to the same question than when it is read and
compared with other answers by the same student.
Rule 6: Evaluate answers to essay questions without knowing the identity of the writer. This is another
attempt to control personal bias during scoring. Answers to essay questions should be evaluated in
terms of what is written, not in terms of what is known about the writers from other contacts with
them. The best way to prevent our prior knowledge from influencing our judgement is to
evaluate each answer without knowing the identity of the writer. This can be done by having the
students write their names on the back of the paper or by using code numbers in place of names.
Rule 7: Whenever possible, have two or more persons grade each answer. The best way to check on the
reliability of the scoring of essay answers is to obtain two or more independent judgments.
Although this may not be a feasible practice for routine classroom testing, it might be done
periodically with a fellow teacher (one who is equally competent in the area). Obtaining two or
more independent ratings becomes especially vital where the results are to be used for important
and irreversible decisions, such as in the selection of students for further training or for special
awards. Here the pooled ratings of several competent persons may be needed to attain a level of
reliability that is commensurate with the significance of the decision being made.
Some teachers use cumulative criteria, i.e., adding the weights given to each criterion,
as the basis for grading, while others use the reverse. In the latter method, each student begins with a
score of 100. Points are then deducted every time the teacher encounters a mistake or when a
criterion is missed by the student in his essay.
Rule 8: Do not provide optional questions. It is difficult to construct questions of equal difficulty, and
so the teacher cannot make a valid comparison of the students' achievement.
Rule 9: Provide information about the value/weight of the question and how it will be scored.
Rule 10: Emphasize higher level thinking skills.
Example: Scientists have found that oceans can influence the temperature of nearby landmasses.
Coastal landmasses tend to have more moderate temperatures in summer and winter than inland
landmasses at the same latitude. Considering the influence of ocean temperatures, explain why
inland temperatures vary in summer and winter to a greater degree than coastal temperatures.
Non-example:
List three coastal landmasses.

EXERCISES
EXERCISE 1
A. Give a non-example of each of the following rules of thumb in the construction of a true-false test.
Improve on the non-examples for them to become good examples of test items.
1. Avoid giving hints in the body of the questions.
2. Avoid using the words “always”, “never” and other such adverbs which tend to be always true or
always false.
3. Avoid long sentences which tend to be true. Keep sentences short.
4. Avoid a systematic pattern for true and false statements.
5. Avoid ambiguous sentences which can be interpreted as true and at the same time false.
B. Give non-examples of each of the following rules of thumb in the construction of multiple choice
tests. Improve on the non-examples for them to become good examples of test items.
1. Phrase the stem to allow for only one correct or best answer.
2. Avoid giving away the answer in the stem.
3. Choose distracters appropriately.
4. Choose distracters so that they are all equally plausible and attractive.
5. Phrase questions so that they will test higher order thinking skills.
6. Do not ask subjective questions or opinions for which there are no right or wrong answers.
Exercise II
A. Construct a 10-item matching type test to assess this competency:
Identify the parts and other components of a computer system.
B. Construct a 10-item supply test to assess this competency: Identify farm tools according to use (Grade 7-
8 Curriculum Guide: Agriculture and Fishery).
C. Justify each rule used in constructing an essay type of test.
D. Construct a 10-item data sufficiency test.
E. In a 100-item test, what types of objective tests will you include? Justify your answer.
F. In the sample essay "Plant Photosynthesis" given in this section, why would you give a zero (0) score to
the student writing this essay? Justify your answer.
Example: Write an essay on the topic "Plant Photosynthesis" using the following keywords and phrases:
chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, stomata.

Plant Photosynthesis

Nature has its own way of ensuring the balance between food producers and consumers. Plants
are considered producers of food for animals. Plants produce food for animals through a process called
photosynthesis. It is a complex process that combines
various natural elements on earth into the final product which animals can consume in order to survive.
Naturally, we all need to protect plants so that we will continue to have food on our table. We should
discourage burning of grasses, cutting of trees and illegal logging. If the leaves of plants are destroyed,
they cannot perform photosynthesis and animals will also perish.

G. Give an example of a supply type of test that will measure higher order thinking skills (beyond mere
recall of facts and information.)
H. In what sense is a matching type test a variant of a multiple choice type of test? Justify your answer.
I. In what sense is a supply type of test considered a variant of a multiple choice type of test? (Hint: In the
supply type, the choices are not explicitly given.) Does this make the supply type of test more difficult
than the closed multiple choice type of test? How?
J. Choose learning competencies from the K to 12 Curriculum Guide. Construct aligned paper-and-pencil
tests observing guidelines in test construction.

REFLECTIONS
Make reflections on the different assessment tools.
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________

ENRICHMENT

Compilation of different literature, research, articles and notes about Module 5 and its relevant topics.

Module 6: ITEM ANALYSIS AND VALIDATION
LEARNING OUTCOMES
• Explain the meaning of item analysis, item validity, reliability, item difficulty, discrimination index.
• Determine the validity and reliability of given test items.
INTRODUCTION
The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and validation
in order to ensure that the final version of the test will be functional. First, the teacher tries out the
draft test on a group of students of similar characteristics as the intended test takers (try-out phase). From
the try-out group, each item will be analyzed in terms of its ability to discriminate between those who
know and those who do not know, and also in terms of its level of difficulty (item analysis phase). The item
analysis will provide information that will allow the teacher to decide whether to revise or replace an item
(item revision phase). Then, finally, the final draft of the test is subjected to validation if the intent is to
make use of the test as a standard test for the particular unit or grading period. We shall be concerned with
these concepts in this chapter.

6.1. Item Analysis

Item analysis is a statistical technique which is used for selecting and rejecting the items of a test
on the basis of their difficulty value and discriminating power. (https://www.slideshare.net)

It is a process which examines student responses to individual test items (questions) in order to assess the
quality of those items and of the test as a whole. (https://www.washington.edu)

Objectives of Item Analysis
• To select appropriate items for the final draft.
• To obtain information about the item difficulty (I.D.) of all items.
• To provide the discrimination index (D.I.) to differentiate between capable and less capable examinees
for the items.
• To identify modifications to be made in some of the items.
• To prepare the final draft properly (easy to difficult items).
Steps of Item Analysis
• Arrange the scores in descending order.
• Separate the test papers into two subgroups.
• Take the papers with the top 27% of the scores and the papers with the bottom 27% of the scores.
• Count the number of right answers to the item in the highest group (RH) and the number of right
answers in the lowest group (RL).
• Count the non-responding (NR) examinees.
A minimal code sketch of these steps is shown below.
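The following is a minimal Python sketch of the steps listed above; the score data are hypothetical, and each record holds a student's total score together with whether that student answered the item being analyzed correctly:

```python
# A minimal sketch of the upper/lower 27% item-analysis procedure described
# above. All data below are hypothetical.
students = [
    {"score": 45, "item_correct": True},
    {"score": 42, "item_correct": True},
    {"score": 40, "item_correct": False},
    {"score": 38, "item_correct": True},
    {"score": 35, "item_correct": False},
    {"score": 30, "item_correct": True},
    {"score": 28, "item_correct": False},
    {"score": 25, "item_correct": False},
    {"score": 22, "item_correct": False},
    {"score": 20, "item_correct": False},
]

# Steps 1-2: arrange the scores in descending order.
students.sort(key=lambda s: s["score"], reverse=True)

# Step 3: take the top 27% and the bottom 27% of the papers.
k = max(1, round(0.27 * len(students)))
upper, lower = students[:k], students[-k:]

# Step 4: count the right answers in each group (RH and RL).
rh = sum(s["item_correct"] for s in upper)
rl = sum(s["item_correct"] for s in lower)
print(f"Group size: {k}, RH = {rh}, RL = {rl}")
```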
There are two important characteristics of an item that will be of interest to the teacher. These are: (a)
item difficulty, and (b) discrimination index. We shall learn how to measure these characteristics and
apply our knowledge in making a decision about the item in question.

a. Item Difficulty

is defined as the number of students who are able to answer the item correctly divided by the total
number of students. "The difficulty value of an item is defined as the proportion or percentage of the
examinees who have answered the item correctly." - J.P. Guilford

Thus:

Item difficulty = number of students with correct answer / total number of students

The item difficulty is usually expressed in percentage.

Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly
while 75 answered it correctly?

Here, the total number of students is 100, hence the item difficulty index is 75/100 or 75%.

One problem with this type of difficulty index is that it may not actually indicate whether the item is
difficult or easy. A student who does not know the subject matter will naturally be unable to answer the
item correctly even if the question is easy. How do we decide on the basis of this index whether the item
is too difficult or too easy? The following arbitrary rule is often used in the literature:

Range of Difficulty Index    Interpretation           Action

0 – 0.20                     Very difficult           Revise or discard
0.21 – 0.80                  Moderately difficult     Retain
0.81 – 1.00                  Very easy                Revise or discard
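As an illustration, a minimal Python sketch that computes the difficulty index and applies the interpretation table above might look like this (the function names are only illustrative):

```python
# A minimal sketch computing the difficulty index and applying the
# interpretation table above.
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def interpret_difficulty(p):
    if p <= 0.20:
        return "Very difficult - revise or discard"
    if p <= 0.80:
        return "Moderately difficult - retain"
    return "Very easy - revise or discard"

p = difficulty_index(75, 100)          # the example above: 75 of 100 correct
print(p, interpret_difficulty(p))      # 0.75 Moderately difficult - retain
```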

b. Discrimination index

is the measure that will tell us whether an item can discriminate between two groups of students:
the upper group and the lower group.

“is that ability of an item on the basis in which the discrimination is made between superiors and
inferiors.” -Blood and Budd (1972)


Types of Discrimination Index

1. Zero discrimination or no discrimination
The item is answered correctly by all the examinees (everyone knows the answer), or the item is not
answered correctly by any of the examinees.
2. Positive discrimination
The item is correctly answered by the upper group and not by the lower group.
3. Negative discrimination
The item is correctly answered by the lower group and not by the upper group.
Difficult items tend to discriminate between those who know and those who do not know the
answer. Conversely, easy items cannot discriminate between those two groups of students. We are
therefore interested in deriving a measure that will tell us whether an item can discriminate between
these two groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to the upper
25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the upper
25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate
properly between these two groups. Thus:
Index of discrimination = DU − DL
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty
index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the class had a
difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20; thus, index of discrimination = 0.60 − 0.20 = 0.40.
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU
= 1 and DL = 0). When the index of discrimination is equal to -1, this means that all of the lower
25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a sense,
such an index discriminates correctly between the two groups but the item itself is highly questionable.
Why should the bright ones get the wrong answer and the poor ones get the right answer? On the other
hand, if the index of discrimination is 1.0, this means that all of the lower 25% failed to get the correct
answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and is the
ideal item that should be included in the test. From these discussions, let us agree to discard or revise all items
that have a negative discrimination index, for although they discriminate correctly between the upper and lower
25% of the class, the content of the item itself may be highly dubious.
As in the case of the index of difficulty, we have the following rule of thumb:

Index Range       Interpretation                                 Action

-1.0 – -0.50      Can discriminate but item is questionable      Discard
-0.51 – 0.45      Non-discriminating                             Revise
0.46 – 1.00       Discriminating item                            Include

Example: Consider a multiple choice type of test for which the following data were obtained:

Item options:      A      B*     C      D
Total              0      40     20     20
Upper 25%          0      15     5      0
Lower 25%          0      5      10     5
The correct answer is B. Let us compute the difficulty index and the index of discrimination:
Difficulty index = no. of students getting the correct response / total
= 40/100 = 40%, within the range of a "good item"

The discrimination index can be similarly computed:

DU = no. of students in the upper 25% with the correct response / no. of students in the upper 25%
= 15/20 = 0.75 or 75%
DL = no. of students in the lower 25% with the correct response / no. of students in the lower 25%
= 5/20 = 0.25 or 25%
Index of discrimination = DU − DL = 0.75 − 0.25 = 0.50
Thus, the item has "good discriminating power".
It is also instructive to note that distracter A is not effective since it was never selected by
the students. Distracters C and D appear to have good appeal as distracters.
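The same computation can be expressed as a short Python sketch; all numbers below are taken from the example table, and the variable names are only illustrative:

```python
# A minimal sketch reproducing the computation above from the option counts
# of the upper and lower 25% groups (all numbers are from the example table).
upper = {"A": 0, "B": 15, "C": 5, "D": 0}   # responses of the upper 25% (20 students)
lower = {"A": 0, "B": 5, "C": 10, "D": 5}   # responses of the lower 25% (20 students)
key = "B"
total_students = 100
total_correct = 40

difficulty = total_correct / total_students          # 0.40
du = upper[key] / sum(upper.values())                # 15/20 = 0.75
dl = lower[key] / sum(lower.values())                # 5/20 = 0.25
discrimination = du - dl                             # 0.50

# Distracter analysis: an option never chosen (like A) is not doing its job.
unused = [opt for opt in upper if opt != key and upper[opt] + lower[opt] == 0]
print(difficulty, discrimination, unused)            # 0.4 0.5 ['A']
```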

Basic Item Analysis Statistics


The Michigan State University Measurement and Evaluation Department reports a number of item
statistics which aid in evaluating the effectiveness of an item. The first of these is the index of difficulty,
which MSU (http://www.msu.edu/dept) defines as the proportion of the total group who got the item
wrong. "Thus a high index indicates a difficult item, and a low index indicates an easy item. Some item
analysts prefer an index of difficulty which is the proportion of the total group who got an item right.
This index may be obtained by marking the PROPORTION RIGHT option on the item header sheet.
Whichever index is selected is shown as the INDEX OF DIFFICULTY on the item analysis print-out. For
classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20
nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.
The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group
who got an item right and the proportion of the lower group who got the item right. This index is
dependent upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index
of difficulty of 50, that is, when 100% of the upper group and none of the lower group answer the item
correctly. For items of less than or greater than 50 difficulty, the index of discrimination has a maximum
value of less than 100. The Interpreting the Index of Discrimination document contains a more detailed
discussion of the index of discrimination." (http://www.msu.edu/dept)
More Sophisticated Discrimination Index
Item discrimination refers to the ability of an item to differentiate among students on the basis of
how well they know the material being tested. Various hand-calculation procedures have traditionally
been used to compare item responses to total test scores using high and low scoring groups of students.
Computerized analysis provides a more accurate assessment of the discrimination power of items because
it takes into account the responses of all students rather than just the high and low scoring groups.
The item discrimination index provided by ScorePak® is a Pearson product-moment correlation
between student responses to a particular item and total scores on all other items on the test. This index
reflects the extent to which the item measures the same thing as the rest of the items.
Because the discrimination index reflects the degree to which an item and the test as a whole are
measuring a unitary ability or attribute, values of the coefficient will tend to be lower for tests
measuring a wide range of content areas than for more homogeneous tests. Item discrimination indices
must always be interpreted in the context of the type of test which is being analyzed. Items with low
discrimination indices are often ambiguously worded and should be examined. Items with negative
indices should be examined to determine why a negative value was obtained. For example, a negative
value may indicate that the item was mis-keyed, so that students who knew the material tended to choose
an unkeyed, but correct, response option.
Tests with high internal consistency consist of items with mostly positive relationships with the total test
score. In practice, values of the discrimination index will seldom exceed .50 because of the differing shapes
of item and total score distributions. ScorePak® classifies item discrimination as "good" if the index is
above .30; "fair" if it is between .10 and .30; and "poor" if it is below .10.
A good item is one that has good discriminating ability and a sufficient level of difficulty (not too
difficult nor too easy). In the two tables presented for the levels of difficulty and discrimination there is a
small area of intersection where the two indices coincide (between 0.56 and 0.67), which represents the
good items in a test. (Source: Office of Educational Assessment, University of Washington, USA,
http://www.washington.edu/oea/services/scanning_scoring/scoring/item_analysis.html)
At the end of the Item Analysis report, test items are listed according to their degree of difficulty
(easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview
of the test and can be used to identify items which are not performing well and which can perhaps be
improved or discarded.

SUMMARY

The item-analysis procedure for norm-referenced tests provides the following information:

1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative
Benefits derived from Item Analysis
1. It provides useful information for class discussion of the test.
2. It provides data which help students to improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the future.
Index of Difficulty
P = (RU + RL) / T × 100
where:
RU – the number in the upper group who answered the item correctly
RL – the number in the lower group who answered the item correctly
T – the total number who tried the item
Index of Item Discriminating Power
D = (RU − RL) / (½T)
where:
P = percentage who answered the item correctly (index of difficulty)
R = number who answered the item correctly
T = total number who tried the item
P = 8/20 × 100 = 40%
The smaller the percentage figure, the more difficult the item.
Estimate the item discriminating power using the formula below:
D = (RU − RL) / (½T) = (6 − 2) / 10 = 0.40
The discriminating power of an item is reported as a decimal fraction; maximum discriminating
power is indicated by an index of 1.00.
Maximum discrimination is usually found at the 50 percent level of difficulty.
0.0 - 0.20 = Very difficult
0.21- 0.80 = Moderately difficult
0.81 – 1.00 = Very easy
6.2 Validation

After performing the item analysis and revising the items which need revision, the next step is to validate
the instrument. The purpose of validation is to determine the characteristics of the whole test itself,
namely, the validity and reliability of the test. Validation is the process of collecting and analyzing
evidence to support the meaningfulness and usefulness of the test.
Validity. Validity is the extent to which a test measures what it purports to measure, or it refers to the
appropriateness, correctness, meaningfulness and usefulness of the specific decisions a teacher makes
based on the test results. These two definitions of validity differ in the sense that the first definition refers
to the test itself while the second refers to the decisions made by the teacher on the basis of the test. A test
is valid when it is aligned to the learning outcome. A teacher who conducts test validation might want to
gather different kinds of evidence.

Three Main Types of evidence that may be collected:


1. Content-related evidence of validity
2. Criterion-related evidence of validity
3. Construct-related evidence of validity

1.Content-related evidence of validity


refers to the content and format of the instrument. How appropriate is the content? How
comprehensive? Does it logically get at the variable? How adequately does the sample of items or
questions represent the content to be assessed?

2. Criterion-related evidence of validity

refers to the relationship between scores obtained using the instrument and scores obtained using
one or more other tests (often called the criterion). How strong is this relationship? How well do such scores
estimate present performance or predict future performance of a certain type?

3. Construct-related evidence of validity

refers to the nature of the psychological construct or characteristics being measured by the test.
How well does a measure of the construct explain differences in the behavior of the individuals or their
performance on a certain task?

The usual procedure for determining content validity may be described as follows:

• The teacher writes out the objectives of the test based on the table of specifications and then gives these,
together with the test, to at least two (2) experts along with a description of the intended test takers.
• The experts look at the objectives, read over the items in the test, and place a check mark in front of each
item that they feel does not measure one or more objectives. They also place a check mark in front of each
objective not assessed by any item in the test.
• The teacher then rewrites any item so checked and resubmits it to the experts, and/or writes new items to
cover those objectives not heretofore covered by the existing test.
• This continues until the experts approve of all items and also agree that all of the
objectives are sufficiently covered by the test.
In order to obtain evidence of criterion-related validity,

The teacher usually compares scores on the test in question with the scores on some other
independent criterion test which presumably has already high validity. For example, if a test is designed
to measure mathematics ability of students and it correlates highly with a standardized mathematics
achievement test (external criterion), then we say we have high criterion-related evidence of validity.
This type of criterion-related validity is called its concurrent validity.

Another type of criterion-related validity is Predictive Validity, wherein the test scores in the instrument
are correlated with scores on a later performance (criterion measure) of the students. For example, the
mathematics ability test constructed by the teacher may be correlated with the students' later performance
in a Division-wide mathematics achievement test.

Gronlund suggested using the so-called expectancy table. For example, suppose that a mathematics
achievement test is constructed and the scores are categorized as high, average and low. The criterion
measure used is the final average grade of the students in high school: Very Good, Good and Needs
Improvement.

1. Apart from the use of the correlation coefficient in measuring criterion-related validity, Gronlund
suggested the use of an expectancy table.
2. This table is easy to construct and consists of the test (predictor) categories listed on the left-hand side
and the criterion categories listed horizontally along the top of the chart.
3. The two-way table lists down the number of students falling under each of the possible pairs (test,
grade), as shown.

Grade Point Average

Test Score      Very Good     Good     Needs Improvement
High               20          10             5
Average            10          25             5
Low                 1          10            14

The expectancy table shows that there were 20 students who got high test scores and were subsequently rated
Very Good in terms of their final grades, and 14 students who obtained low test scores and were later graded as
needing improvement. The evidence for this particular test tends to indicate that students getting high scores
on it would later be rated Very Good; students getting average scores would later be rated Good; and students
getting low scores on the test would later be graded as needing improvement.
We will not be able to discuss the measurement of construct-related validity in this module since the
methods to be used require sophisticated statistical techniques falling in the category of factor analysis.

6.3. Reliability

Reliability refers to the consistency of the scores obtained – how consistent they are for each individual
from one administration of an instrument to another and from one set of items to another. We already
gave the formulas for computing the reliability of a test; for internal consistency, for instance, we could
use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
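As an illustration only, the following Python sketch computes KR-20 from a matrix of dichotomously scored item responses; the small response matrix is hypothetical and the formula follows the usual KR-20 definition.

# Illustrative sketch of the Kuder-Richardson formula 20 (KR-20) for internal consistency.
# 'data' is a hypothetical matrix of item responses: 1 = correct, 0 = wrong (rows = students).

def kr20(data):
    k = len(data[0])                        # number of items
    totals = [sum(row) for row in data]     # total score per student
    n = len(totals)
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n       # variance of total scores
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in data) / n                   # proportion answering item i correctly
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

data = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
]
print(round(kr20(data), 2))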

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes.
As reliability improves, validity may improve (or it may not). However, if an instrument is shown
scientifically to be valid, then it is almost certain that it is also reliable.

The following table is a standard followed almost universally in educational testing and measurement.

Reliability        Interpretation
.90 and above      Excellent reliability; at the level of the best standardized tests
.80 – .90          Very good for a classroom test
.70 – .80          Good for a classroom test; in the range of most classroom tests. There are probably a few
                   items which could be improved.
.60 – .70          Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to
                   determine grades. There are probably some items which could be improved.
.50 – .60          Suggests need for revision of the test, unless it is quite short (ten or fewer items). The
                   test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below       Questionable reliability. This test should not contribute heavily to the course grade, and
                   it needs revision.

Exercises

A. Find the index of difficulty in each of the following situations:

1. N = 60, number of wrong answers : upper 25% = 2 lower 25% = 6


2. N = 80, number of wrong answers : upper 25% = 2 lower 25% = 9
3. N = 30, number of wrong answers : upper 25% = 1 lower 25% = 6
4. N = 50, number of wrong answers : upper 25% = 3 lower 25% = 8
5. N = 70, number of wrong answers : upper 25% = 4 lower 25% = 10

B. Which of the items in Exercise A are found to be the most difficult?

C. Compute the discrimination index for each of the items in Exercise A.

D. Answer the following questions:

1. A teacher constructed a test which would measure the students' ability to apply previous
knowledge to certain situations. In particular, the evidence that a student is able to apply previous
knowledge is that he or she can:
 Draw correct conclusions that are based on the information given;
 Identify one or more logical implications that follow from a given point of view;
 State whether two ideas are identical, just similar, unrelated or contradictory.
Write test items using the multiple-choice type of test that would cover these concerns of the teacher.
Show your test to an expert and ask him to judge whether the items indeed cover these concerns.

2. What is an expectancy table? Describe the process of constructing an expectancy table. When
do we use an expectancy table?

3. Enumerate the three types of validity evidence. Which of these types of validity is the
most difficult to measure? Why?

4. What is the relationship between validity and reliability? Can a test be reliable and yet
not valid? Illustrate.

5. Discuss the different measures of reliability. Justify the use of each measure in the
context of measuring reliability.

REFLECTIONS
Make reflections on the validity and reliability of test items.
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________

ENRICHMENT

Compilations of different literature, researches, articles and notes about Module 6 and its relevant topics.


Module 7: PERFORMANCE-BASED TESTS


LEARNING OUTCOMES
 Develop performance-based tests to assess selected learning competencies from the K to 12 curriculum guide
 Construct appropriate scoring rubrics for given students’ products/performances

INTRODUCTION
Over the past few years, there has been a general dissatisfaction over the results of traditional
standardized objective tests. Concerted efforts have, therefore, been expended to find alternative assessment
mechanisms for measuring educational outcomes and processes, including the more complex processes in
education. For example, multiple-choice tests have been criticized because they, purportedly, are unable to
measure complex problem-solving skills, are hopeless in measuring processes that occur in daily classroom
activities, cannot gauge the processes involved in accomplishing a task, and examine superficial learning of
material rather than learners' application skills. Educators have therefore focused their attention on finding
alternative assessment methods that would hopefully address these difficulties; performance-based assessment is
one such alternative to the traditional methods of objective assessment that has been proposed.

Performance-based assessment procedures rest on the belief that the best way to gauge a student or pupil's
competency in a certain task is through observation in situ or on site. Such a belief appears consistent with
the constructivist philosophy in education often taught in courses on Philosophy of Education. A performance-
based test is designed to assess students on what they know, what they are able to do, and the learning
strategies they employ in the process of demonstrating it.

Many people have noted serious limitations of performance-based tests: their vulnerability to subjectivity in
scoring, and the difficulty of creating or providing the real or close-to-real task environment for assessment
purposes. However, the concern for subjectivity may be addressed simply by automating the test. The second
issue is obviously a bigger problem, and there is no guarantee that ideas from one domain will apply to another.

7.1 Performance-Based Tests


There are many testing procedures that are classified as performance tests, with a generally agreed-upon
definition that these tests are assessment procedures that require students to perform a certain task or
activity or, perhaps, solve complex problems. For example, Bryant suggested assessing portfolios of a student's
work over time, students' demonstrations, hands-on execution of experiments by students, and a student's work in
simulated environments. Such an approach falls under the category of portfolio assessment (i.e. keeping records
of all tasks successfully and skilfully performed by a student). According to Mehrens, performance testing is
not new. In fact, various types of performance-based tests were used even before the introduction of multiple-
choice testing. For instance, the following are considered performance testing procedures: performance tasks,
rubric scoring guides and exemplars of performance.

7.2 Performance Tasks


In performance tasks, students are required to draw on the knowledge and skills they possess and to
reflect upon them for use in the particular task at hand. Not only are the students expected to obtain knowledge
from a specific subject or subject matter but they are in fact required to draw knowledge and skills from other
disciplines in order to fully realize the key ideas needed in doing the task. Normally, the tasks require
students to work on projects that yield a definite output or product, or perhaps, to follow a process which
tests their approach to solving a problem. In many instances, the tasks require a combination of the two
approaches. Of course, the essential idea in performance tasks is that students or pupils learn optimally by
actually doing the task (Learning by Doing), which is a constructivist philosophy.

As in any other test, the tasks need to be consistent with the intended outcomes of the curriculum and the
objectives of instruction; and must require students to manifest (a) what they know and (b) the process by which
they came to know it. In addition, performance-based tests require that tasks involve examining the processes as
well as the products of student learning.

7.3 Rubrics and Exemplars


Modern assessment methods tend to use rubrics to describe student performance. A rubric is a scoring
method that lists the criteria for a piece of work, or “ what counts” (for example, purpose, organization, details,
voice, and mechanics are often what count in a piece of writing); it also articulates gradations of quality for each
criterion, from excellent to poor. Perkins et al (1994) provide an example of rubrics scoring for student’s
inventions and lists the criteria and gradations of quality for verbal, written, or graphic reports on students
inventions. This is shown in the succeeding figure as a prototype of rubrics scoring. This rubrics lists the criteria
in the column on the left: the report must explain (1) the purposes of the invention (2) the features or parts of the
invention and how they help it serve its purposes, (3) the pros and cons of the design, and (4) how the design
connects to other things past, present, and future. The rubric could easily include criteria related to presentation
style and effectiveness, the mechanics of written pieces, and the quality of the invention itself. The four
columns to the right of the criteria describe varying degrees of quality, from excellent to poor.

There are many reasons for the seeming popularity of rubric scoring in the Philippine school system.
First, rubrics are very useful tools for both teaching and the evaluation of learning outcomes. Rubrics have the
potential to improve student performance, as well as monitor it, by clarifying teachers' expectations and by
actually guiding the students on how to satisfy these expectations.

Secondly, rubrics seem to allow students to acquire wisdom in judging and evaluating the quality of their
own work in relation to the quality of the work of other students. In several experiments involving the use of
rubrics, students progressively became more aware of the problems associated with their solution to a problem
and with the problems inherent in the solutions of other students. In other words, rubrics increase the students’
sense of responsibility and accountability.

Third, rubrics are quite efficient and tend to require less time for the teachers in evaluating student
performance. Teachers tend to find that by the time a piece has been self-and peer-assessed according to a
rubric, they have little left to say about it. When they do have something to say, they can often simply circle an
item in the rubric, rather than struggling to explain the flaw or strength they have noticed and figuring out what
to suggest in terms of improvements. Rubrics provide students with more informative feedback about their
strengths and areas in need of improvement.

Finally, it is easy to understand and construct a rubric scoring guide. Most of the items found in a rubric
scoring guide are self-explanatory and require no further help from outside experts.

Rubrics for an Invention Report

Quality levels: (3) Most acceptable, (2) Acceptable, (1) Less acceptable, (0) Not acceptable

Purposes
(3) The report explains the key purposes of the invention and points out less obvious ones as well.
(2) The report explains all of the key purposes of the invention.
(1) The report explains some of the purposes of the invention but misses key purposes.
(0) The report does not refer to the purposes of the invention.

Features
(3) The report details both key and hidden features of the invention and explains how they serve several purposes.
(2) The report details the key features of the invention and explains the purposes they serve.
(1) The report neglects some features of the invention or the purposes they serve.
(0) The report does not detail the features of the invention or the purposes they serve.

Critique
(3) The report discusses the strengths and weaknesses of the invention, and suggests ways in which it can be improved.
(2) The report discusses the strengths and weaknesses of the invention.
(1) The report discusses either the strengths or the weaknesses of the invention but not both.
(0) The report does not mention the strengths or the weaknesses of the invention.

Connections
(3) The report makes appropriate connections between the purposes and features of the invention and many different kinds of phenomena.
(2) The report makes appropriate connections between the purposes and features of the invention and one or two phenomena.
(1) The report makes unclear or inappropriate connections between the invention and other phenomena.
(0) The report makes no connections between the invention and other things.

SUB-TOTALS
Average: ________________

Figure 14. Prototype of Rubric Scoring

7.4. Creating Rubrics


In designing a rubric scoring guide, the students need to be actively involved in the process. The
following steps are suggested in actually creating a rubric:
1. Survey models – Show students examples of good and not-so-good work. Identify the characteristics that
make the good ones good and the bad ones bad.
2. Define criteria – From the discussions on the models, identify the qualities that define good work.
3. Agree on the levels of quality – Describe the best and worst levels of quality, then fill in the middle levels
based on your knowledge of common problems and the discussion of not-so-good work.
4. Practise on models – Using the agreed criteria and levels of quality, evaluate the models presented in
step 1 together with the students.
5. Use self- and peer- assessment – Give students their task. As they work, stop them occasionally for self-
and peer- assessment.
6. Revise – Always give students time to revise their work based on the feedback they get in Step 5.
7. Use teacher assessment – Use the same rubric students used to assess their work yourself.
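One simple way to see how a finished rubric is used for scoring is to treat it as a table of criteria and levels. The Python sketch below is only an illustration: the level descriptions are shortened placeholders based on Figure 14, and the ratings are hypothetical.

# Illustrative sketch: representing a rubric as a simple data structure and totalling a score.
# The criteria and level descriptions are shortened placeholders, not the full rubric text.

rubric = {
    "Purposes":    {3: "explains key and less obvious purposes", 2: "explains all key purposes",
                    1: "explains some purposes, misses key ones", 0: "does not refer to purposes"},
    "Features":    {3: "details key and hidden features", 2: "details key features",
                    1: "neglects some features", 0: "does not detail features"},
    "Critique":    {3: "strengths, weaknesses, and improvements", 2: "strengths and weaknesses",
                    1: "strengths or weaknesses only", 0: "mentions neither"},
    "Connections": {3: "connects to many phenomena", 2: "connects to one or two phenomena",
                    1: "unclear or inappropriate connections", 0: "no connections"},
}

def score_report(ratings):
    """ratings maps each criterion to the level (0-3) the rater selected."""
    total = sum(ratings[c] for c in rubric)
    return total, total / len(rubric)           # sub-total and average, as in Figure 14

ratings = {"Purposes": 3, "Features": 2, "Critique": 2, "Connections": 1}   # hypothetical ratings
print(score_report(ratings))                     # -> (8, 2.0)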

7.5. Writing and Selecting Effective Rubrics


Two main defining aspects of rubrics are the criteria that describe the qualities that you and students
should look for as evidence of students’ learning and the descriptions of the levels of performance.

7.5.1. Desired Characteristics of Criteria for Classroom Rubrics



Characteristics
The criteria are:                      Explanation

Appropriate                            Each criterion represents an aspect of a standard, curricular goal, or
                                       instructional goal or objective that students are intended to learn.
Definable                              Each criterion has a clear, agreed-upon meaning that both students and
                                       teachers understand.
Observable                             Each criterion describes a quality in the performance that can be
                                       perceived (seen or heard, usually) by someone other than the person
                                       performing.
Distinct from one another              Each criterion identifies a separate aspect of the learning outcomes the
                                       performance is intended to assess.
Complete                               All the criteria together describe the whole of the learning outcomes
                                       the performance is intended to assess.
Able to support descriptions along     Each criterion can be described over a range of performance levels.
a continuum of quality

Figure 15. Desired Characteristics of Criteria for Classroom Rubrics

Figure 16 shows a teacher-made rubric for oral reading fluency, and Figure 18 shows a rubric prepared by Ann
Tanona, a second grade teacher, to assess videotaped "Reading Rainbow-style" book talks (as lifted from Heide
Andrade, 2007, http: www.yahoo.com).

ORAL READING FLUENCY RUBRIC

Name __________________________          Date ____________________

Expression:  1 = No Expression;  2 = A little Expression;  3 = Some Expression;  4 = Lots of Expression

Phrasing:  1 = No Phrasing;  2 = A little Phrasing;  3 = Some Phrasing;  4 = Very good Phrasing

Speed:  1 = Way too slow or way too fast!;  2 = A little bit too slow or a little bit too fast;
        3 = Almost perfect but still needs practice...;  4 = Just Right!

Source: Used with permission from Katrina D. Kimmel, West Hills Primary School, Kittaning, PA.

Figure 16. Oral Reading Fluency Rubric


Characteristics
The descriptions of levels of performance are:      Explanation

Descriptive                                          Performance is described in terms of what is observed in the
                                                     work.
Clear                                                Both students and teachers understand what the descriptions
                                                     mean.
Cover the whole range of performance                 Performance is described from one extreme of the continuum
                                                     of quality to the other.
Distinguish among levels                             Performance descriptions are different enough from level to
                                                     level that work can be categorized unambiguously. It should
                                                     be possible to match examples of work to performance
                                                     descriptions at each level.
Center the target performance (acceptable,           The description of performance at the level expected by the
mastery, passing) at the appropriate level           standard, curriculum goal, or lesson objective is placed at
                                                     the intended level on the rubric.
Feature parallel descriptions from level to level    Performance descriptions at each level of the continuum for a
                                                     given standard describe different quality levels for the same
                                                     aspects of the work.

Figure 17. Desired Characteristics of Descriptions of Levels of Performance for Classroom Rubrics

Criteria and quality levels:

Did I get my audience's attention?  Creative beginning / Boring beginning / No beginning
Did I tell what kind of book?  Tells exactly what type of book it is / Not sure, not clear / Didn't mention it
Did I tell something about the main character?  Included facts about main character / Slid over character /
    Did not tell anything about character
Did I mention the setting?  Tells when and where story takes place / Not sure, not clear / Didn't mention setting
Did I tell one interesting part?  Made it sound interesting - I want to buy it! / Told part and skipped on to
    something else / Forgot to tell
Did I tell who might like this book?  Did tell / Skipped over it
How did I look?  Hair combed, neat, clean clothes, smiled, looked up, happy / Lazy look /
    Just-got-out-of-bed look, head down
How did I sound?  Clear, strong, cheerful voice / No expression in voice / Difficult to understand - 6-inch
    voice or screeching

Figure 18. Book Talk Rubric

7.6. Tips in Designing Rubrics


Perhaps the most difficult challenge is to use clear, precise and concise language. Terms like "creative",
"innovative" and other vague terms need to be avoided. If a rubric is to teach as well as evaluate, terms like
these must be defined for students. Instead of these words, try words that convey ideas and that can be
readily observed. Patricia Crosby and Pamela Heinz, both seventh grade teachers (from Andrade, 2007), solved
this problem in a rubric for oral presentations by actually listing ways in which students could meet the
criterion (Fig. 19). This approach provides valuable information to students on how to begin a talk and avoids
the need to define elusive terms like creative.

Criterion: Gains attention of audience.

Quality levels:
 Gives details or an amusing fact, a series of questions, a short demonstration, a colourful visual, or a personal reason why they picked this topic.
 Gives a two-sentence introduction, then starts speech.
 Gives a one-sentence introduction, then starts speech.
 Does not attempt to gain attention of audience, just starts speech.

Figure 19. Rubric for an Oral Presentation

Specifying the levels of quality can often be very challenging as well. Spending a lot of time with the
criteria helps but, in the end, what comes out is often subjective. There is a clever technique often used to
define the levels of quality. It essentially graduates the quality levels through the responses: "Yes," "Yes,
but," "No, but," and "No." For example, Figure 20 shows part of a rubric for evaluating a scrapbook that
documents a story.

Criterion: Gives enough details.

Quality levels:
 Yes, I put in enough details to give the reader a sense of time, place, and events.
 Yes, I put in some details, but some key details are missing.
 No, I didn't put in enough details, but I did include a few.
 No, I had almost no details.

Figure 20. Rubric for Evaluating a Scrapbook (lifted from Andrade, 2007)

Rubrics are scales that differentiate levels of student performance. They contain the criteria that must be
met by the student and the judgement process that will be used to rate how well the student has performed. An
exemplar is an example that delineates the desired characteristics of quality in ways students can understand.
These are important parts of the assessment process.

Well-designed rubrics include:

 performance dimensions that are critical to successful task completion;


 criteria that reflect all the important outcomes of the performance task;
 a rating scale that provides a usable, easily-interpreted score;
 criteria that reflect concrete references, in clear language understandable to students, parents, other
teachers, and others.

In summary, we can say that to design performance-based tests, we have to ensure that both processes and end-
results are tested. The test should be designed carefully enough to ensure that proper scoring rubrics can
be designed, so that the concerns about subjectivity in performance-based tests are addressed. Indeed, this needs
to be done anyway in order to automate the test, so that performance-based testing can be used widely.

7.7. Automating Performance-Based Test


Going by the complexity of the issues that need to be addressed in designing performance-based tests, it
is clear that automating the procedure is no easy task. The sets of tasks that comprise a performance-based test
have to be chosen carefully in order to tackle the design issues mentioned. Moreover, automating the
procedure imposes another stringent requirement on the design of the test. In this section, we summarize what
we need to keep in mind while designing an automated performance-based test.

We have seen that in order to automate a performance-based test, we need to identify a set of tasks
which all lead to the solution of a fairly complex problem. For the testing software to be able to determine
whether a student has completed any particular task, the end of the task should be accompanied by a definite,
detectable change in the system. Indeed, a similar condition applies to every aspect of the problem-solving
activity that we wish to test: a set of changes in the system can indicate that the student has the desired
competency.

Such tracking is used widely by computer game manufacturers, where evidence of a game player's
competency is tracked by the system, and the game player is taken to the next 'level' of the game.

In summary, the following should be kept in mind as we design a performance-based test.

 Each performance task/problem that is used in the test should be clearly defined in terms of performance
standards, not only for the end result but also for the strategies used in the various stages of the process.
 A test taker need not always end up accomplishing the task; hence it is important to identify important
milestones that the taker reaches while solving the problem.
 Having defined the possible strategies, the process and the milestones, the selection of tasks that comprise a
test should allow the design of good rubrics for scoring.
 Every aspect of the problem-solving activity that we wish to test has to lead to a set of changes in the
system, so that the testing software can collect evidence of the student's competency.
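As a purely illustrative sketch (nothing here is prescribed by the module), the following Python fragment shows one way testing software might record milestone completion as evidence of competency; the milestone names are hypothetical.

# Illustrative sketch: recording milestone completion while a student works through an automated
# performance task. Milestone names are hypothetical.

from datetime import datetime

class TaskTracker:
    def __init__(self, milestones):
        self.milestones = milestones                  # ordered list of expected milestones
        self.evidence = {}                            # milestone -> timestamp of the system change

    def record_change(self, milestone):
        """Called by the testing software whenever a definite change in the system is detected."""
        if milestone in self.milestones and milestone not in self.evidence:
            self.evidence[milestone] = datetime.now()

    def score(self):
        """Partial credit: proportion of milestones reached, even if the final task is unfinished."""
        return len(self.evidence) / len(self.milestones)

tracker = TaskTracker(["data_loaded", "formula_entered", "chart_created", "report_saved"])
tracker.record_change("data_loaded")
tracker.record_change("formula_entered")
print(tracker.score())    # 0.5 -> the student reached half of the milestones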

EXERCISES
A. Construct a checklist for a performance test which tests the students’ ability to perform the following:

1. using an inclined plane to illustrate the concept of a diluted free fall


2. using the low power objective and high power objective of a microscope
3. opening and using MS Word for word processing
4. using MS EXCEL to prepare a class record for a teacher
5. playing the major keys on a guitar
B. Construct a rubric scoring guide for the following:

1. an essay on the “ History of the Philippine Republic: 1898-1998”


2. poem reading: “The Raven” by Edgar Allan Poe
3. constructing three-dimensional geometric figures made of cardboard boxes
4. story telling: “May Day’s Eve” by Nick Joaquin
5. solving an algebraic verbal problem involving two linear equations in two unknowns
6. writing the alphabet in cursive form
7. interpreting a poem of Robert Frost:
8. writing an autobiography
9. research report
C. Differentiate between a performance test and the traditional assessment method of cognitive testing.

REFLECTIONS
Make reflections on the performance based test
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________

ENRICHMENT

Compilations of different literature, researches, articles and notes about Module 7 and its relevant topics.

Module 8: GRADING SYSTEMS and the GRADING SYSTEM


of the DEPARTMENT OF EDUCATION
LEARNING OUTCOMES
 Distinguish between norm-referenced and criterion-referenced grading, and between cumulative and averaging
grading systems
 Compute grades of students in various grade levels observing DepEd guidelines

INTRODUCTION
Assessment of student performance is essentially knowing how the student is progressing in a course
(and, incidentally, how a teacher is also performing with respect to the teaching process). The first step in
assessment is, of course, testing (either by some paper-and-pencil objective test or by some performance-based
testing procedure), followed by a decision on how to grade the performance of the student. Grading, therefore,
is the next step after testing. Over the course of several years, grading systems have evolved in
different school systems all over the world. In the American system, for instance, grades are expressed
in terms of letters, A, B, B+, B-, C, C-, D, or what is referred to as a seven-point system. In Philippine
colleges and universities, the letters are replaced with numerical values: 1, 1.25, 1.50, 1.75, 2.0, 2.25, 3.0
and 4.0, or an eight-point system. In basic education, grades are expressed as percentages (of
accomplishment) such as 80% or 75%. With the implementation of the K to 12 Basic Education
Curriculum, however, students' performance is expressed in terms of level of proficiency. Whatever system of
grading is adopted, it is clear that there is a need to convert raw score values into
the corresponding standard grading system. This Chapter is concerned with the underlying philosophy
and mechanics of converting raw score values into standard grading formats.

8.1 Norm-Referenced Grading


The most commonly used grading systems fall under the category of norm-referenced grading.
Norm-referenced grading refers to a grading system wherein a student's grade is placed in relation to the
performance of a group. Thus, in this system, a grade of 80 means that the student performed better than
or the same as 80% of the class (or group). At first glance, there appears to be no problem with this type of
grading system as it simply describes the performance of a student with reference to a particular group of
learners. The following example shows some of the difficulties associated with norm-referenced grading.

Example: Consider the following two sets of scores in an English 1 class for two sections of ten
students each:

A = {30, 40, 50, 55, 60, 65, 70, 75, 80, 85}
B = {60, 65, 70, 75, 80, 85, 90, 90, 95, 100}

In the first class, the student who got a raw score of 75 would get a grade of 80% while in the second
class, the same grade of 80% would correspond to a raw score of 90. Indeed, if the tests used for the two
classes are the same, it would be a rather "unfair" grading system. A wise student would opt to enrol in
class A since it is easier to get higher grades in that class than in the other class (class B).
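The equivalency problem in this example can be verified with a short computation. The Python sketch below is illustrative only; it assumes the grade is simply the percent of classmates scoring at or below the raw score, as described above.

# Illustrative sketch of the equivalency problem using the two score sets from the example above.

def norm_referenced_grade(raw, scores):
    """Percent of the class scoring at or below the raw score."""
    return 100 * sum(s <= raw for s in scores) / len(scores)

A = [30, 40, 50, 55, 60, 65, 70, 75, 80, 85]
B = [60, 65, 70, 75, 80, 85, 90, 90, 95, 100]

print(norm_referenced_grade(75, A))   # 80.0 -> a raw score of 75 earns a grade of 80 in class A
print(norm_referenced_grade(75, B))   # 40.0 -> the same raw score earns only 40 in class B
print(norm_referenced_grade(90, B))   # 80.0 -> in class B a raw score of 90 is needed for a grade of 80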

The previous example illustrates one difficulty with using a norm-referenced grading system. This
problem is called the problem of equivalency. Does a grade of 80 in one class represent the same
achievement level as a grade of 80 in another class of the same subject? This problem is similar to that of
trying to compare a valedictorian from one school with a valedictorian from some very popular university in
the urban area. Does one expect the same level of competence from these two valedictorians?

As we have seen, norm-referenced grading systems are based on a pre-established formula
regarding the percentages or ratio of students within a whole class who will be assigned each grade or
mark. It is therefore known in advance what percent of the students will pass or fail a given course. For
this reason, many opponents of norm-referenced grading aver that such a grading system does not advance
the cause of education and contradicts the principle of individual differences.

In norm-referenced grading, the students, while they may work individually, are actually in
competition to achieve a standard of performance that will classify them into the desired grade range. It
essentially promotes competition among students or pupils in the same class. A student or pupil who
happens to enrol in a class of gifted students in Mathematics will find that the norm-referenced grading
system is rather worrisome. For example, a teacher may establish a grading policy whereby the top 15
percent of students will receive a mark of excellent or outstanding, which in a class of 100 enrolled
students will be 15 persons. Such a grading policy is illustrated below:

1.0 (Excellent) = Top 15% of Class


1.50 (Good) = Next 15% of Class
2.0 (Average, Fair) = Next 45% of Class
3.0 (Poor, Pass) = Next 15% of Class
5.0 (Failure) = Bottom 10% of Class

The underlying assumption in norm-referenced grading is that the students have abilities (as
reflected in their raw scores) that obey the normal distribution. The objective is to find the best
performers in this group. Norm-referenced systems are most often used for screening selected student
populations in conditions where it is known that not all students can advance due to limitations such as
available places, jobs, or other controlling factors. For example, in the Philippine setting, since not all
high school students can actually advance to college or university level because of financial constraints,
the norm-referenced grading system can be applied.

Example: In a class of 100 students, the mean score on a test is 70 with a standard deviation of 5.
Construct a norm-referenced grading table that would have seven grade scales and such that students scoring
between plus or minus one standard deviation from the mean receive an average grade.

Raw Scores Grade Equivalent Percentage


Below 55 Fail 1%
55 – 60 Marginal Pass 4%
61 – 65 Pass 11%
66 – 75 Average 68%
76 – 80 Above Average 11%
81 – 85 Very Good 4%
Above 85 Excellent 1%
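The table above can be generated mechanically from the mean and standard deviation. The Python sketch below is only an illustration; the cut-offs are expressed as z-scores read from the table.

# Illustrative sketch: assigning norm-referenced grade categories from z-scores
# (mean = 70, standard deviation = 5, as in the example above). Labels follow the table.

def grade_equivalent(raw, mean=70.0, sd=5.0):
    z = (raw - mean) / sd
    if z < -3:   return "Fail"
    if z <= -2:  return "Marginal Pass"
    if z <= -1:  return "Pass"
    if z <= 1:   return "Average"
    if z <= 2:   return "Above Average"
    if z <= 3:   return "Very Good"
    return "Excellent"

for raw in (52, 58, 63, 70, 78, 83, 88):
    print(raw, grade_equivalent(raw))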

Only a few of the teachers who use norm-referenced grading apply it with complete consistency.
When a teacher is faced with a particularly bright class, most of the time, he does not penalize good
students for having the bad luck to enrol in a class with a cohort of other very capable students, even if the
grading system says he should fail a certain percentage of the class. On the other hand, it is also unlikely
that a teacher would reduce the mean grade for a class when he observes a large proportion of poorly
performing students just to save them from failure. A serious problem with norm-referenced grading is
that, no matter what the class level of knowledge and ability, and no matter how much they learn, a
predictable proportion of students will receive each grade. Since its essential purpose is to sort students
into categories based on relative performance, norm-referenced grading and evaluation is often used to
weed out students for limited places in selective educational programs.

Norm-referenced grading indeed promotes competition to the extent that students would rather not
help fellow students because by doing so, the mean of the class would be raised and consequently it would
be more difficult to get higher grades. Similarly, students would do everything (legal) to pull down the
scores of everyone else in order to lower the mean and thus assure him/her of higher grades on the curve.

A more subtle problem with norm-referenced grading is that a strict correspondence between the
evaluation methods used and the course instructional goals is not necessary to yield the required grade
distribution. The specific learning objectives of norm-referenced classes are often kept hidden, in part out
of concern that instruction not "give away" the test or the teacher's priorities, since this might tend to
skew the curve. Since norm-referenced grading is replete with problems, what alternatives have been devised
for grading the students?

8.2 Criterion-Referenced Grading


Criterion-referenced grading systems are based on a fixed criterion measure. There is a fixed target and the
students must achieve that target in order to obtain a passing grade in a course, regardless of how the other
students in the class perform. The scale does not change regardless of the quality, or lack thereof, of the
students. For example, in a class of 100 students using the table below, no one might get a grade of Excellent
if no one scores 98 or above (or 85 or above, depending on the criterion used). There are no fixed percentages
of students who are expected to get the various grades in the criterion-referenced grading system.

1.0 (Excellent) = 98-100 or 85-100
1.5 (Good) = 88-97 or 80-84
2.0 (Fair) = 75-87 or 70-79
3.0 (Poor/Pass) = 65-74 or 60-69
5.0 (Failure) = below 65 or below 60
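Because the cut-offs are fixed, a criterion-referenced grade can be assigned by a simple rule that ignores the rest of the class. The Python sketch below is illustrative only and uses the first scale listed above; the class scores are hypothetical.

# Illustrative sketch: a criterion-referenced grade does not depend on how classmates perform.
# Cut-offs follow the first scale listed above (98-100, 88-97, 75-87, 65-74, below 65).

def criterion_grade(score):
    if score >= 98: return 1.0   # Excellent
    if score >= 88: return 1.5   # Good
    if score >= 75: return 2.0   # Fair
    if score >= 65: return 3.0   # Poor / Pass
    return 5.0                   # Failure

class_scores = [99, 91, 84, 70, 63]                   # hypothetical scores
print([criterion_grade(s) for s in class_scores])     # [1.0, 1.5, 2.0, 3.0, 5.0]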

Criterion-referenced systems are often used in situations where the teachers agree on the meaning of
a "standard of performance" in a subject but the quality of the students is unknown or uneven; where the
work involves student collaboration or teamwork; and where there is no external driving factor such as the
need to systematically reduce a pool of eligible students.

Note that in a criterion-referenced grading system, students can help a fellow student in group work
without necessarily worrying about lowering their own grades in that course. This is because the criterion-
referenced grading system does not use the mean (of the class) as the basis for distributing grades among
the students. It is therefore an ideal system to use in collaborative group work. When students are
evaluated based on predefined criteria, they are freed to collaborate with one another and with the
instructor. With criterion-referenced grading, a rich learning environment is to everyone's advantage, so
students are rewarded for finding ways to help each other, and for contributing to class and small group
discussions.

Since the criterion measure used in criterion-referenced grading is a measure that ultimately rests with
the teacher, it is logical to ask: What prevents teachers who use criterion-referenced grading from setting
the performance criteria so low that everyone can pass with ease? There are a variety of measures used to
prevent this situation from ever happening in a grading system. First, the criterion should not be based on
only one teacher's opinion or standard; it should be collaboratively arrived at. A group of teachers teaching
the same subject must set the criterion together. Second, once set, the criterion must be open to public
scrutiny so that it does not become arbitrary and subject to the whims and caprices of the teacher.

8.3 Four Questions in Grading

Marinila D. Svinicki (2007) of the Center for Teaching Effectiveness of the University of Texas at
Austin poses four intriguing questions relative to grading. We present these questions in this section,
together with the corresponding opinions of Ms. Svinicki, for your own reflection:

1. Should grades reflect absolute achievement level or achievement relative to others in the same class?
2. Should grades reflect achievement only or non-academic components such as attitude, speed and
diligence?
3. Should grades report status achieved or amount of growth?
4. How can several grades on diverse skills combine to give a single mark?

8.4. What Should Go Into a Student’s Grade

The grading system an instructor selects reflects his or her educational philosophy. There are no right or
wrong systems, only systems which accomplish different objectives. The following are questions which an
instructor may want to answer when choosing what will go into a student's grade.

1. Should grades reflect absolute achievement level or achievement relative to others in the same class?

This is often referred to as the controversy between norm-referenced and criterion-referenced
grading. In norm-referenced grading systems, the letter grade a student receives is based on his or her
standing in a class. A certain percentage of those at the top receive A's, a specified percent of the next
highest grades receive B's, and so on. Thus an outside person, looking at the grades, can decide which
students in that group performed best under those circumstances. Such a system also shields relative standing
from circumstances beyond the students' control which might adversely affect grades, such as poor teaching,
bad tests or unexpected problems arising for the entire class. Presumably, these would affect all the students
equally, so all performance would drop but the relative standing would stay the same.

On the other hand, under such a system, an outside evaluator has little additional information about
what a student actually knows, since that will vary with the class. A student who has learned an average
amount in a class of geniuses will probably know more than a student who is average in a class of low
ability. Unless the instructor provides more information than just the grade, the external user of the grade is
poorly informed.

The system also assumes sufficient variability among student performances that the difference in
learning between them justifies giving different grades. This may be true in large beginning classes, but it
is a shaky assumption where the student population is homogeneous, such as in upper division classes.

The other most common grading system is the criterion-referenced system. In this case the
instructor sets a standard of performance against which the students' actual performance is measured. All
students achieving a given level receive the grade assigned to that level regardless of how many in the class
receive the same grade. An outside evaluator, looking at the grade, knows only that the student has
reached a certain level or set of objectives. The usefulness of that information to the outsider will depend on
how much information he or she has about those objectives. The grade, however, will always mean the same
thing and will not vary from class to class. A possible problem with this is that outside factors such as those
discussed under norm-referenced grading might influence the entire class and performance may drop. In such a
case all the students would receive lower grades unless the instructor made special allowances for the
circumstances.

A second problem is that criterion-referenced grading does not provide "selection" information.
There is no way to tell from the grading who the "best" students are, only that certain students have
achieved certain levels. Whether one views this as positive or negative will depend on one's individual
philosophy.

An advantage of this system is that the criteria for the various grades are known from the beginning.
This allows the student to take some responsibility for the level at which he or she is going to perform.
Although this might result in some students working below their potential, it usually inspires students to
work for a high grade. The instructor is then faced with the dilemma of a lot of students receiving high
grades. Some people view this as a problem.

A positive aspect of this foreknowledge is that much of the uncertainty which often accompanies
grading for students is eliminated. Since they can plot their own progress towards the desired grade, the
students have little uncertainty about where they stand.

2. Should grades reflect achievement only or non-academic components such as attitude, speed and
diligence?

It is a very common practice to incorporate such things as turning in assignments on time into the
overall grade in a course, primarily because the need to motivate students to get their work done is a real
problem for instructors. It may also be appropriate to the selection function of grading that such values as
timeliness and diligence be reflected in the grades. External users of the grades may be interpreting the mark
to include such factors as attitude and compliance in addition to competence in the material.

The primary problem with such inclusion is that it makes grades even more ambiguous than they
already are. It is very difficult to assess these nebulous traits accurately or consistently. Instructors must
use real caution when incorporating such value judgments into final grade assignment. Two steps instructors
should take are (1) to make students aware of this possibility well in advance of grade assignment and
(2) to make clear what behaviour is included in such qualities as prompt completion of work and neatness
or completeness.

3. Should grades report status achieved or amount of growth?

This is a particularly difficult question to answer. In many beginning classes, the background of
the students is so varied that some students can achieve the end objectives with little or no trouble while
others with weak backgrounds will work twice as hard and still achieve only half as much. The dilemma
results from the same problem as the previous question, that is, the feeling that we should be rewarding
or punishing effort or attitude as well as knowledge gained.

There are many problems with "growth" measures as a basis for grading, most of them related to
statistical artifacts. In some cases the ability to accurately measure entering and exiting levels is shaky
enough to argue against change as a basis for grading. Also, many courses are prerequisite to later courses
and, therefore, are intended to provide the foundation for those courses. "Growth" scores in this case
would be disastrous.

Nevertheless, there is much to be said in favour of "growth" as a component in grading. We would
like to encourage hard work and effort, and acknowledge the existence of different abilities. Unfortunately,
there is no easy answer to this question. Each instructor must review his or her own philosophy and
content to determine if such factors are valid components of the grade.

4. How can several grades on diverse skills combine to give a single mark?

The basic answer is that they can’t really. The results of instruction are so varied that single mark
is really a “Rube Goldberg” as far as indicating what a student has achieved. It would be most desirable to
be able to give multiple marks, one for each of the variety of skills which are learned. There are of course,
many problems with such a proposal. It would complicate an already complicated task. There might not
be enough evidence to reliably grade any one skill. The “Halo” effect of good performance in one area
could spill over into others. And finally, most outsider are looking for only one overall classification of
each person so they can choose the “best”. Our system require that we produce one mark. Therefore, it is
worth our while to see how that can be done even though currently the system does not lend itself to any
satisfactory answer.

8.5. Standardized Test Scoring

Test standardization is a process by which teacher- or researcher-made tests are validated and item-
analysed. After a thorough process of validation, test characteristics are established. These characteristics
include: test validity, test reliability, test difficulty level and other characteristics previously discussed.
Each standardized test uses its own mathematical scoring system derived by the publisher and
administrators, and these do not bear any relationship to academic grading systems. Standardized tests are
psychometric instruments whose scoring systems are developed by norming the test using national samples
of test-takers, centering the scoring formula to assure that the likely score distribution describes a normal
curve when graphed, and then using the resulting scoring system uniformly in a manner resembling a
criterion-referenced approach. If you are interested in understanding and interpreting the scoring system of
a specific standardized test, refer to the policies of the test's producers.

8.6 Cumulative and Averaging Systems of Grading


In the Philippines there are two types of grading systems used: the averaging and the cumulative
grading systems. In the averaging system, the grade of a student in a particular grading period equals
the average of the grades obtained in the prior grading periods and the current grading period. In the
cumulative grading system, the grade of the student in a grading period equals his current grading period
grade, which is assumed to carry the cumulative effects of the previous grading periods. In which grading
system would there be more fluctuations observed in the students' grades? How do these systems relate
with norm- or criterion-referenced grading?
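The difference between the two systems can be seen in a short computation. The Python sketch below is illustrative only; the quarterly grades are hypothetical.

# Illustrative sketch contrasting the two systems for one learner's quarterly grades.

quarterly = [78, 85, 90, 82]   # hypothetical quarterly grades

def averaging(grades):
    """Grade reported each period = average of the current and all prior periods."""
    return [round(sum(grades[: i + 1]) / (i + 1), 1) for i in range(len(grades))]

def cumulative(grades):
    """Grade reported each period = the current period's grade only."""
    return grades[:]

print("Averaging :", averaging(quarterly))    # [78.0, 81.5, 84.3, 83.8] -> fluctuations are dampened
print("Cumulative:", cumulative(quarterly))   # [78, 85, 90, 82]         -> grades move more freely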

8.7 Policy Guidelines on Classroom Assessment for the K to 12 Basic Education, DepEd
Order No. 8, s. 2015
Below are some of the highlights of the new K to 12 Grading System which was implemented starting SY
2015-2016. These are all lifted from DepEd Order No. 8, s. 2015.

Weights of the Components for the Different Grade Levels and Subjects
The student's grade is a function of three components: 1) written work, 2) performance tasks and
3) quarterly assessment. The percentages vary across clusters of subjects. Languages, Araling Panlipunan
(AP) and Edukasyon sa Pagpapakatao (EsP) belong to one cluster and have the same percentages for
written work, performance tasks and quarterly assessment. Science and Math are another cluster with the
same component percentages. Music, Arts, Physical Education and Health (MAPEH), together with EPP/TLE, make up
the third cluster with the same component percentages. Among the three components, performance tasks
are given the largest percentages. This means that the emphasis of assessment is on the application of the
concepts learned.

Table 4. Weight of the Components for Grades 1 to 10

Components               Languages / AP / EsP     Science / Math     MAPEH / EPP / TLE
Written work                     30%                    40%                 20%
Performance tasks                50%                    40%                 60%
Quarterly assessment             20%                    20%                 20%

Table 5 presents the weights of the components of the Senior High School subjects, which are grouped
into 1) core subjects, 2) all other subjects (applied and specialization) and work immersion of the academic
track, and 3) all other subjects (applied and specialization) and work immersion/research/exhibit/performance
of the TVL/Sports/Arts and Design track. An analysis of the figures reveals that, among the components,
performance tasks have the highest percentage contribution to the grade. This means that DepEd's grading
system consistently puts the most emphasis on the application of learned concepts and skills.

Table 5. Weight of the Components for SHS (Grades 11 to 12)

Core subjects: Written works 25%, Performance tasks 50%, Quarterly assessment 25%
All other subjects: Written works 25%, Performance tasks 45%, Quarterly assessment 30%
Work Immersion / Research / Business Enterprise Simulation / Exhibit / Performance (Academic Track):
Written works 35%, Performance tasks 40%, Quarterly assessment 25%
Work Immersion / Research / Exhibit / Performance (TVL / Sports / Arts and Design Track):
Written works 20%, Performance tasks 60%, Quarterly assessment 20%

8.8 Steps in Grade Computation


Based on the same DepEd Order (8, s. 2015), here are the steps to follow in computing grades.

Table 6. Steps in Computing Grades


Step 1. Get the total score for each component.

Written Work (Learner's Raw Score / Highest Possible Score):
Written Work 1: 18 / 20
Written Work 2: 22 / 25
Written Work 3: 20 / 20
Written Work 4: 17 / 20
Written Work 5: 23 / 25
Written Work 6: 26 / 30
Written Work 7: 19 / 20
TOTAL: 145 / 160

Performance Tasks (Learner's Raw Score / Highest Possible Score):
Performance Task 1: 12 / 15
Performance Task 2: 13 / 15
Performance Task 3: 19 / 25
Performance Task 4: 15 / 20
Performance Task 5: 16 / 20
Performance Task 6: 25 / 25
TOTAL: 100 / 120

Quarterly Assessment (Learner's Raw Score / Highest Possible Score): 40 / 50

Step 2. Divide the total raw score by the highest possible score, then multiply the quotient by 100%.

Percentage Score (PS) of Written Work = (145 / 160) x 100% = 90.63
Percentage Score (PS) of Performance Tasks = (100 / 120) x 100% = 83.33
Percentage Score (PS) of Quarterly Assessment = (40 / 50) x 100% = 80.00

Step 3. Convert the percentage scores to weighted scores. Multiply the percentage score by the weight of the
component indicated in Table 4 or 5.

Written Work for English Grade 4 is 30%: Weighted Score (WS) = 90.63 x 0.30 = 27.19
Performance Tasks for English Grade 4 is 50%: Weighted Score (WS) = 83.33 x 0.50 = 41.67
Quarterly Assessment for English Grade 4 is 20%: Weighted Score (WS) = 80.00 x 0.20 = 16.00

(The scores can be found in the sample class record in Table 6.)

Step 4. Add the weighted scores of each component. The result will be the Initial Grade.

Written Work = 27.19
Performance Tasks = 41.67
Quarterly Assessment = 16.00
TOTAL (Initial Grade) = 84.86

Step 5. Transmute the Initial Grade using the Transmutation Table in Appendix B.

The Initial Grade of 84.86 transmutes to a grade of 90. The Quarterly Grade in English for the First Quarter is
therefore 90. This is reflected in the Report Card.

For MAPEH, individual grades are given to each area, namely Music, Arts, Physical Education, and
Health. The quarterly grade for MAPEH is the average of the quarterly grades in the four areas.

Quarterly Grade (QG) for MAPEH = (QG for Music + QG for Arts + QG for PE + QG for Health) / 4
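The five steps above can be followed mechanically. The Python sketch below is only an illustration of Steps 1 to 4 for the English Grade 4 example; it rounds to two decimal places (half up) the way the manual computation does, and it stops at the Initial Grade because the transmutation table in Appendix B is not reproduced in this module.

# Illustrative sketch of Steps 1-4 above for the English Grade 4 example (weights from Table 4).
from decimal import Decimal, ROUND_HALF_UP

def r2(x):
    """Round to 2 decimal places, half up, as done in the manual computation."""
    return Decimal(str(x)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def initial_grade(components):
    """components: {name: (raw_total, highest_possible, weight)} following Steps 2-4."""
    total = Decimal("0")
    for raw, hps, weight in components.values():
        ps = r2(Decimal(raw) / Decimal(hps) * 100)   # Step 2: percentage score
        ws = r2(ps * Decimal(str(weight)))           # Step 3: weighted score
        total += ws                                  # Step 4: add the weighted scores
    return total

english_g4 = {
    "Written Work":         (145, 160, 0.30),
    "Performance Tasks":    (100, 120, 0.50),
    "Quarterly Assessment": (40,  50,  0.20),
}
print(initial_grade(english_g4))   # 84.86, which transmutes to a Quarterly Grade of 90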

8.9. GRADE COMPUTATION


What follows is a description of how grades are computed based on DepEd Order 8, s. 2015.

For Kindergarten

There are no numerical grades in Kindergarten. Descriptions of the learners' progress in the various
learning areas are presented using checklists and student portfolios. These are presented to the parents
at the end of each quarter for discussion. Additional guidelines on the Kindergarten program will be
issued.

For Grades 1-10

The average of the Quarterly Grades (QG) produces the Final Grade.

Final Grade by Learning Area = (1st Quarter Grade + 2nd Quarter Grade + 3rd Quarter Grade + 4th Quarter Grade) / 4

The General Average is computed by dividing the sum of all Final Grades by the total number of learning
areas. Each learning area has equal weight.

General Average = (Sum of Final Grades of all Learning Areas) / (Total Number of Learning Areas in a Grade Level)

The Final Grade in each learning area and the General Average are reported as whole numbers. Table 7
shows an example of the Final Grades of the different learning areas and the General Average of a Grade 4
student.

Table 7. Final Grades and General Average

Learning Area                              Quarter 1   Quarter 2   Quarter 3   Quarter 4   Final Grade
Filipino                                       80          89          86          84          85
English                                        89          90          92          87          90
Mathematics                                    82          85          83          83          83
Science                                        86          87          85          84          86
Araling Panlipunan                             90          92          91          89          91
Edukasyon sa Pagpapakatao                      89          93          90          88          90
Edukasyong Pantahanan at Pangkabuhayan         80          81          84          79          81
MAPEH                                          85          86          85          84          85
GENERAL AVERAGE                                                                                86
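The computation behind Table 7 can be expressed in a few lines. The Python sketch below is illustrative only; it assumes whole-number rounding with .5 rounding up, which reproduces the figures in the table.

# Illustrative sketch of the Grades 1-10 computation: the Final Grade per learning area is the
# average of the four Quarterly Grades, and the General Average is the mean of all Final Grades.
# Data are taken from Table 7.

quarter_grades = {
    "Filipino":                               [80, 89, 86, 84],
    "English":                                [89, 90, 92, 87],
    "Mathematics":                            [82, 85, 83, 83],
    "Science":                                [86, 87, 85, 84],
    "Araling Panlipunan":                     [90, 92, 91, 89],
    "Edukasyon sa Pagpapakatao":              [89, 93, 90, 88],
    "Edukasyong Pantahanan at Pangkabuhayan": [80, 81, 84, 79],
    "MAPEH":                                  [85, 86, 85, 84],
}

def round_half_up(x):
    """Whole-number rounding with .5 going up, as in the manual computation."""
    return int(x + 0.5)

final_grades = {area: round_half_up(sum(qs) / 4) for area, qs in quarter_grades.items()}
general_average = round_half_up(sum(final_grades.values()) / len(final_grades))

print(final_grades)       # e.g. English -> 90, Filipino -> 85
print(general_average)    # 86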

For Grades 11-12

The two quarters determine the Final Grade in a semester. Table 8 shows an example for Grade 11, second
semester, for the Accountancy, Business and Management (ABM) strand.

Table 8. Grade 11, 2nd Semester, ABM Strand

Subjects                                                  Quarter 3   Quarter 4   Second Semester Final Grade
Core Subjects
  Reading and Writing Skills                                  80          83               82
  Pagbasa at Pagsusuri ng Iba't Ibang Teksto
  Tungo sa Pananaliksik                                       86          85               86
  Statistics and Probability                                  82          87               85
  Physical Science                                            88          87               88
  Physical Education and Health                               90          88               89
Applied and Specialized Subjects
  Empowerment Technologies: ICT for Professional Tracks       80          83               82
  Business Math                                               87          86               87
  Organization and Management                                 85          81               83
  Fundamentals of Accountancy, Business and Management 1      84          81               83
General Average for the Semester                                                           85

8. 10 Reporting the Learner’s Progress


The summary of learner progress is shown quarterly to parents and guardians through a parent-teacher
conference, in which the report card is discussed. The grading scale, with its corresponding descriptors, is
shown in Table 9. Remarks are given at the end of the grade level.

Table 9. Descriptors

Descriptor                        Grading Scale       Remarks
OUTSTANDING                         90-100            PASSED
VERY SATISFACTORY                   85-89             PASSED
SATISFACTORY                        80-84             PASSED
FAIRLY SATISFACTORY                 75-79             PASSED
DID NOT MEET EXPECTATIONS           BELOW 75          FAILED
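The descriptor and remark for a given grade follow directly from the thresholds in Table 9. A minimal sketch:

```python
def descriptor(grade):
    """Map a whole-number grade to its descriptor and remark, as in Table 9."""
    if grade >= 90:
        return "Outstanding", "Passed"
    if grade >= 85:
        return "Very Satisfactory", "Passed"
    if grade >= 80:
        return "Satisfactory", "Passed"
    if grade >= 75:
        return "Fairly Satisfactory", "Passed"
    return "Did Not Meet Expectations", "Failed"

print(descriptor(90))  # ('Outstanding', 'Passed')
print(descriptor(71))  # ('Did Not Meet Expectations', 'Failed')
```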

Using the sample class record in Table 6, LEARNER A received an Initial Grade of 84.86 in English for
the First Quarter, which, when transmuted to a grade of 90, is equivalent to Outstanding. LEARNER B
received a transmuted grade of 88, which is equivalent to Very Satisfactory. LEARNER C received a
grade of 71, which means that the learner Did Not Meet Expectations in the First Quarter of Grade 4
English.

When a learner's raw scores are consistently below expectations in Written Work and Performance Tasks,
the learner's parents or guardians must be informed not later than the fifth week of that quarter. This will
enable them to help and guide their child to improve and prepare for the Quarterly Assessment. A learner
who receives a grade below 75 in any subject in a quarter must be given intervention through remediation
and extra lessons from the teacher/s of that subject.

8.11 Promotion and Retention at the end of the School Year


This is what DepEd Order 8, s. 2015 says.

A final grade of 75 or higher in all learning areas allows the student to be promoted to the next year level.
Table 10 specifies the guidelines to be followed for learner promotion and retention.

Table 10. Learner Promotion and Retention

                Requirements                                 Decision
For Grades      1. Final Grade of at least 75 in all         Promoted to the next grade level.
1 to 3             learning areas
Learners        2. Did not meet expectations in not          Must pass remedial classes for learning areas
                   more than two learning areas              with a failing mark to be promoted to the next
                                                             grade level. Otherwise, the learner is retained
                                                             in the same grade level.
                3. Did not meet expectations in three        Retained in the same grade level.
                   or more learning areas
For Grades      1. Final Grade of at least 75 in all         Promoted to the next grade level.
4 to 10            learning areas
Learners        2. Did not meet expectations in not          Must pass remedial classes for learning areas
                   more than two learning areas              with a failing mark to be promoted to the next
                                                             grade level. Otherwise, the learner is retained
                                                             in the same grade level.
                3. Did not meet expectations in three        Retained in the same grade level.
                   or more learning areas
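For Grades 1 to 10, the promotion decision in Table 10 depends only on how many learning areas have a Final Grade below 75. A minimal sketch of that rule (the outcome of remedial classes is left inside the returned message, since it cannot be decided from the final grades alone):

```python
def promotion_decision(final_grades):
    """Apply the Grades 1-10 rule of Table 10 to a dict of final grades per learning area."""
    failed = [area for area, grade in final_grades.items() if grade < 75]
    if not failed:
        return "Promoted to the next grade level"
    if len(failed) <= 2:
        return ("Must pass remedial classes in " + ", ".join(failed)
                + " to be promoted; otherwise retained in the same grade level")
    return "Retained in the same grade level"

print(promotion_decision({"English": 90, "Mathematics": 74, "Science": 83}))
# Must pass remedial classes in Mathematics to be promoted; otherwise retained ...
```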

8.12. Alternative Grading System


Pass-Fail Systems. Some colleges, universities, faculties, schools, and institutions in the Philippines use
pass-fail grading systems, especially when the student work to be evaluated is highly subjective (as in the
fine arts and music), when there are no generally accepted standard gradations (as with independent
studies), or when what matters is simply whether a critical standard has been met (as in some professional
examinations and the practicum).

Non-Graded Evaluation. While not yet practiced in Philippine schools and institutions, non-graded
evaluation does not assign numeric or letter grades as a matter of policy. This practice is usually based
on a belief that grades introduce an inappropriate and distracting element of competition into the learning
process, or that they are not as meaningful as measures of intellectual growth and development as
carefully crafted faculty evaluations are. Many faculty, schools, and institutions that follow a no-grade
policy will, if requested, produce grades or convert their student evaluations into formulae acceptable to
authorities who require traditional measures of performance.

The process of deciding on a grading system is a complex one. The problems faced by an instructor
who tries to design a system that will be accurate and fair are common to any manager attempting to
evaluate those for whom he or she is responsible. The problems of teachers and students with regard to
grading are almost identical to those of administrators and faculty with regard to evaluation for promotion
and tenure. The need for completeness and objectivity felt by teachers and administrators must be
balanced against the need for fairness and clarity felt by students and faculty in their respective situations.
The fact that the faculty member finds himself or herself in the position of both evaluator and evaluated
should help to make him or her more thoughtful about the needs of each position.

8.13. Exercises
1. Define norm-referenced grading. What are some of the issues that confront a teacher using a norm-
referenced grading system? Discuss.

2. The following final grades were obtained in a class of Grade VI pupils:

80, 81, 82, 83, 84, 82, 79, 77, 88, 83, 89, 90, 91, 90, 78, 79, 82, 91, 92, 90,
88, 85, 88, 87, 85, 88, 83, 82, 80, 79, 77, 76, 77, 78, 83, 89, 91, 90, 83, 88, 86, 83, 80
a. Using norm-referenced grading with a seven-point scale, determine the scores that would get a failing
mark. What is your general impression of this?
b. Using norm-referenced grading with an eight-point scale, determine the scores that would get a
failing mark. Compare this with the previous grading system above.
3. Define criterion-referenced grading. What are some of the issues that confront a teacher using a
criterion-referenced grading system? Discuss.

4. Using the data in problem 2, set a passing criterion of 78 and set equal intervals for all other grades
above the passing criterion. How does your result compare with that of norm-referenced grading? In
which grading system do you feel more comfortable?

5. In a class of 100 pupils, the mean score on a test was determined to be 82 with a standard deviation of 7.
Construct an 8-point grading scale using the standard normal curve in norm-referenced grading.

6. Discuss, in your own words, the four essential questions in grading provided by Svinicki. Do you agree
or disagree with Svinicki's points of view? Justify.

7. Would you use the norm-referenced grading system in your own class? Why or Why not?

8. When would a norm-referenced grading system be most appropriate to use? Similarly, when would a
criterion-referenced grading system be most appropriate to use?

9. Compute the grade of the student in:

a) Grade 9 English, with the following raw scores:
Written work – 80 out of 100
Performance task – 60 out of 100
Score in the quarterly assessment – 50 out of 100
b) Grade 11, Introduction to the Philosophy of the Human Person, a core subject in SHS, with the
following raw scores:
Written work – 30 out of 50
Performance task – 42 out of 60
Score in the quarterly assessment – 28 out of 40
c) Grade 3, Mother Tongue subject, with the following raw scores:
Written work – 80 out of
Performance task – 60 out of 100
Score in the quarterly assessment – 50 out of 100
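One way to set up the computations in item 9 is sketched below, assuming the usual sequence of converting raw scores to percentage scores and then to weighted scores before adding them, as discussed earlier in this module. The component weights are deliberately left as parameters, since they depend on the learning area and grade level; only the raw scores of item 9(a) are encoded, not a worked answer.

```python
def percentage_score(raw, highest_possible):
    # Percentage Score = (raw score / highest possible score) x 100
    return raw / highest_possible * 100

def initial_grade(components, weights):
    """Sum the weighted scores of the components.
    `components` maps a component name to (raw score, highest possible score);
    `weights` maps the same names to their weights (e.g. 0.30 for 30%),
    taken from the weight table for the learning area and grade level."""
    return sum(percentage_score(raw, hps) * weights[name]
               for name, (raw, hps) in components.items())

# Raw scores from item 9(a). Replace the weights with the correct ones for
# Grade 9 English before calling initial_grade, then transmute the result.
scores_9a = {
    "Written Work":         (80, 100),
    "Performance Tasks":    (60, 100),
    "Quarterly Assessment": (50, 100),
}
# initial_grade(scores_9a, weights={"Written Work": ...,
#                                   "Performance Tasks": ...,
#                                   "Quarterly Assessment": ...})
```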

REFLECTIONS
Reflect on the grading system in Philippine education.
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________
___________________________________________________________________________________________

ENRICHMENT

Compile different literature, research, articles, and notes about Module 8 and its relevant topics.
