Section: Improving a Classroom-Based Assessment Test

Pretest

Discuss and answer the following questions based on the given distracter analysis table.

Item | Group | A | B  | C | D | Difficulty Index (p) | Discrimination Index (D)
1    | Upper | 2 | 10 | – | – | 0.38                 | 0.35
1    | Lower | 2 | 0  | 2 | 6 |                      |
2    | Upper | 2 | 4  | 10| 4 | 0.45                 | 0.50
2    | Lower | 5 | 4  | 1 | 0 |                      |

1. In Item 1, what could a frequency of 10 in option B possibly suggest?
2. In Item 1, which alternatives need to be revised? Cite your reason for each one and suggest what could be done.
3. In Item 2, which alternatives need to be revised? Cite your reason for each one and suggest what could be done.
4. Draw a distracter analysis table giving hypothetical frequencies for four alternatives for a positively discriminating item. Class size is 30.

Learning Outcome

At the end of Section 3.3, students are expected to:
- Acquire procedures for improving a classroom-based assessment test.

Time Frame

Week 14

Materials Needed

Articles and journals
Writing materials

Suggested Activities

Below are descriptions of procedures done to review and improve items. On the space provided, write J if a judgmental approach is used and E if an empirically based approach is used.

1. The Math Coordinator of Grade VII classes examined the periodical tests prepared by the Math teachers to see if their items are aligned to the target outcomes for the first quarter.
2. The alternatives of the multiple-choice items of the Social Studies test were reviewed to discover if they have only one correct answer.
3. To determine if the items efficiently discriminate between the more able students and the less able ones, a Biology teacher obtained the discrimination index (D) of the items.
4. A Technology Education teacher was interested to see if the criterion-referenced test he has devised shows a difference in the items' post-test and pre-test p values.
5. An English teacher conducted a session with his students to find out if there are other acceptable responses in their literature test. He encouraged them to rationalize their answers.

Content Overview

By the time you reach Section 3.3, it is assumed that you already know how to plan a classroom test by specifying the purpose for constructing it and the instructional outcomes to be assessed, and by preparing a test blueprint to guide the construction process. The techniques and strategies for selecting and constructing different item formats to match the intended instructional outcomes make up the second phase of the test development process, which is the content of the preceding section. The process, however, is not complete without ensuring that the classroom instrument is valid for the purpose for which it is intended. This requires reviewing and improving the items, which is the third phase of the process. This section therefore provides practical and necessary ways of improving teacher-developed assessment tools. Popham (2011) suggests two approaches to undertaking item improvement: the judgmental approach and the empirical approach.

JUDGMENTAL ITEM-IMPROVEMENT

This approach basically makes use of human judgment in reviewing the items.
The judges are the teachers themselves, who know exactly what the test is for, the instructional outcomes to be assessed, and the level of item difficulty appropriate to their class; the teachers' peers or colleagues, who are familiar with the curriculum standards for the target grade level, the subject matter content, and the ability of the learners; and the students themselves, who can perceive difficulties based on their past experiences.

Teachers' Own Review

It is always advisable for teachers to take a second look at the assessment tool they have devised for a specific purpose. Presuming perfection right after its construction may lead to failure to detect shortcomings of the test or assessment task. Popham (2011, p. 253) gives five suggestions for teachers to follow in exercising judgment:

1. Adherence to item-specific guidelines and general item-writing commandments. The preceding section provided specific guidelines for writing the various forms of objective and non-objective constructed-response types and the selected-response types for measuring lower-level and higher-level thinking skills. Teachers should use these guidelines to check how well the items have been planned and written, particularly their alignment to the intended instructional outcomes.

2. Contribution to score-based inference. The teacher examines whether the expected scores generated by the test can contribute to making valid inferences about the learners. Can the scores reveal the amount of learning achieved or show what has been mastered? Can the scores support an inference about the student's capability to move on to the next instructional level? Or do the scores obtained make no difference at all in describing or differentiating various abilities?

3. Accuracy of content. This review should especially be considered when a test was developed some time ago. Changes due to new discoveries or developments can redefine the content of a summative test. If this happens, the items or the key to correction may have to be revisited.

4. Absence of content gaps. This review criterion is especially useful in strengthening the score-based inference capability of the test. If the current tool misses important content now prescribed by a new curriculum standard, the score will likely not give an accurate description of what is expected to be assessed. The teacher always ensures that the assessment tool matches what is currently required to be learned. This is a way to check the content validity of the test.

5. Fairness. The discussions on item-writing guidelines repeatedly warn against unintentionally helping uninformed students obtain higher scores through inadvertent grammatical clues, unattractive distracters, ambiguous problems, and messy test instructions. Sometimes unfairness can also arise from undue advantage enjoyed by a particular group, like those seated in front of the classroom or those coming from a particular socio-economic level. Getting rid of faulty and biased items and writing clear instructions definitely add to the fairness of the test.

Peer Review

Some schools encourage peer or collegial review of assessment instruments. Time is provided for this activity, and it has almost always yielded good results for improving tests and performance-based assessment tasks.
During these teacher dyad or triad sessions, those teaching the same subject area can openly review together the classroom tests and tasks they have devised against some consensual criteria. The suggestions given by test experts can be used collegially as the basis for a review checklist:

a. Do the items follow the specific and general guidelines in writing items, especially on:
   - being aligned to instructional objectives?
   - making the problem clear and unambiguous?
   - providing plausible options?
   - avoiding unintentional clues?
   - having only one correct answer?
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for students to follow?
e. Is the level of difficulty of the test appropriate to the level of the learners?
f. Is the test fair to all kinds of students?

Student Review

Engaging students in reviewing items has become a laudable practice for improving classroom tests. The judgment is based on the students' experience in taking the test and their impressions and reactions during the testing event. The process can be efficiently carried out through the use of a review questionnaire. Popham (2011) illustrates a sample questionnaire, shown in Table 1. It is better to conduct the review activity a day after taking the test so the students still remember the experience when they see a blank copy of the test.

Table 1. Item-Improvement Questionnaire for Students
1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which ones?

Another technique for eliciting student judgment for item improvement is going over the test with the students before the results are shown. Students usually enjoy this activity since they can get feedback on the answers they have written. As they tackle each item, they can be asked to give their answer, and if there is more than one possible correct answer, the teacher makes notations for item alterations. Having more than one correct answer signals ambiguity either in the stem or in the given options. The teacher may also take the chance to observe sources of confusion, especially when answers vary. During this session, it is important for the teacher to maintain an atmosphere that allows students to question and give suggestions. It also follows that after an item review session, the teacher should be willing to modify incorrectly keyed answers.

EMPIRICALLY-BASED PROCEDURES

Item improvement using empirically based methods is aimed at improving the quality of an item using students' responses to the test. Test developers refer to this technical process as item analysis, as it utilizes data obtained separately for each item. An item is considered good when its quality indices, i.e., the difficulty index and the discrimination index, meet certain characteristics. For a norm-referenced test, these two indices are related, since the level of difficulty of an item contributes to its discriminability. An item is good if it can discriminate between those who perform well in the test and those who do not.
However, an extremely easy item (one that can be answered correctly by more than 85% of the group) or an extremely difficult item (one that can be answered correctly by only 15%) is not expected to perform well as a "discriminator". The group will appear quite homogeneous on items of this kind. They are weak items since they do not contribute to score-based inference.

The difficulty index, however, takes on a different meaning when used in the context of criterion-referenced interpretation or testing for mastery. An item with a high difficulty index is not considered an "easy" and therefore weak item, but rather an item that displays the capability of the learners to perform the expected outcome. It therefore becomes evidence of mastery.

Particularly for objective tests, the responses are binary in form, i.e., right or wrong, translated into the numerical figures 1 and 0 for obtaining nominal data like frequency, percentage, and proportion. Useful data then are in the form of:

a. Total number of students answering the item (T)
b. Total number of students answering the item right (R)

Difficulty Index

An item's difficulty index is obtained by calculating the p value (p), which is the proportion of students answering the item correctly:

p = R / T

where
p = difficulty index
R = total number of students answering the item right
T = total number of students answering the item

Here are two illustrative samples:

Item 1: There were 45 students in the class who responded to Item 1, and 30 answered it correctly. Item 1 has a p value of 0.67. Sixty-seven percent (67%) got the item right while 33% missed it.

p = 30/45 = 0.67

Item 2: In the same class, only 10 responded correctly to Item 2. Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while 35 or 78% missed it.

p = 10/45 = 0.22

For a norm-referenced test: Between the two items, Item 2 appears to be a much more difficult item, since less than a fourth of the class was able to respond correctly.

For a criterion-referenced test: The class shows much better performance on Item 1 than on Item 2. It is still a long way for many to master Item 2.

The p value ranges from 0.0 to 1.0, from extremely difficult (no one got the item correct) to extremely easy (everyone got it correct). For binary-choice items, there is a 50% probability of getting the item correct simply by chance. For multiple-choice items with four alternatives, the chance of obtaining a correct answer by guessing is only 25%. This is an advantage of multiple-choice questions over binary-choice items: the probability of getting a high score by chance is reduced.
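To make the arithmetic concrete, here is a minimal Python sketch of the p value calculation; the helper name is ours, not from the module, and the figures reproduce the two illustrative items above.

```python
def difficulty_index(num_correct: int, num_examinees: int) -> float:
    """p = R / T: proportion of students answering the item correctly."""
    return num_correct / num_examinees

# Illustrative samples from the text (class of 45 examinees):
print(round(difficulty_index(30, 45), 2))  # Item 1 -> 0.67
print(round(difficulty_index(10, 45), 2))  # Item 2 -> 0.22
```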
Discrimination Index

As earlier mentioned, the power of an item to discriminate between informed and uninformed groups, or between more knowledgeable and less knowledgeable learners, is shown by the item discrimination index (D). This is an item statistic that can reveal useful information for improving an item. Basically, the item discrimination index shows the relationship between a student's performance on an item (i.e., right or wrong) and his/her total performance on the test, represented by the total score. Item-total correlation is usually part of a package for item analysis. High item-total correlations indicate that the items contribute well to the total score, so responding correctly to these items gives a better chance of obtaining a relatively high total score on the whole test or subtest.

For classroom tests, the discrimination index shows whether a difference exists between the performance of those who scored high and those who scored low on an item. As a general rule, the higher the discrimination index (D), the more marked the magnitude of the difference, and thus the more discriminating the item. The difference, however, can take different directions:

a. Positively discriminating item: the proportion of the high-scoring group answering correctly is greater than that of the low-scoring group.
b. Negatively discriminating item: the proportion of the high-scoring group answering correctly is less than that of the low-scoring group.
c. Non-discriminating item: the proportion of the high-scoring group answering correctly is equal to that of the low-scoring group.

Calculating the discrimination index therefore requires obtaining the difference between the proportion of the high-scoring group getting the item correct and the proportion of the low-scoring group getting the item correct, using this simple formula:

D = Ru/Tu - Rl/Tl

where
D = item discrimination index
Ru = number in the upper group getting the item correct
Tu = number in the upper group
Rl = number in the lower group getting the item correct
Tl = number in the lower group

Another calculation brings about the same result (Kubiszyn & Borich, 2010):

D = (Ru - Rl) / T

where
Ru = number in the upper group getting the item correct
Rl = number in the lower group getting the item correct
T = number in either group

As you can see, R/T is actually the p value of an item, so getting D amounts to getting the difference between the p value of the upper half and the p value of the lower half. The formula for the discrimination index (D) can therefore also be given as (Popham, 2011):

D = Pu - Pl

where
Pu = the p value for the upper group (Ru/Tu)
Pl = the p value for the lower group (Rl/Tl)

To obtain the proportions of the upper and lower groups responding to the item correctly, the teacher follows these steps:

a. Score the test papers using a key to correction to obtain the total scores of the students. The maximum score is the total number of objective items.
b. Order the test papers from highest to lowest score.
c. Split the test papers into halves: high group and low group.
   - For a class of 50 or fewer students, do a 50-50 split. Take the upper half as the HIGH group and the lower half as the LOW group.
   - For a big group of 100 or so, take the upper 25–27% and the lower 25–27%.
   - Maintain equal numbers of test papers in the upper and lower groups.
d. Obtain the p value for the upper group, P(upper) = Ru/Tu, and the p value for the lower group, P(lower) = Rl/Tl.
e. Get the discrimination index by taking the difference between the two p values.

For purposes of evaluating the discriminating power of items, Popham (2011) offers the guidelines proposed by Ebel & Frisbie (1991), shown in Table 2. These can guide teachers in selecting satisfactory items and deciding what to do to improve the rest.

Table 2. Guidelines for Evaluating the Discriminating Efficiency of Items

Discrimination Index (D) | Item Evaluation
0.40 and above           | Very good items
0.30–0.39                | Reasonably good items, but possibly subject to improvement
0.20–0.29                | Marginal items, usually needing improvement
0.19 and below           | Poor items, to be rejected or improved by revision
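As a rough illustration, the following Python sketch computes D from upper- and lower-group counts and applies the Table 2 guidelines. The function names and the sample frequencies are hypothetical, not taken from the module.

```python
def discrimination_index(upper_correct: int, upper_total: int,
                         lower_correct: int, lower_total: int) -> float:
    """D = Ru/Tu - Rl/Tl: difference between upper- and lower-group p values."""
    return upper_correct / upper_total - lower_correct / lower_total

def evaluate_item(d: float) -> str:
    """Ebel & Frisbie (1991) guidelines as summarized in Table 2."""
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Reasonably good item, but possibly subject to improvement"
    if d >= 0.20:
        return "Marginal item, usually needing improvement"
    return "Poor item, to be rejected or improved by revision"

# Hypothetical item: 14 of 20 upper-group and 4 of 20 lower-group students answered correctly.
d = discrimination_index(14, 20, 4, 20)
print(round(d, 2), "-", evaluate_item(d))  # 0.5 - Very good item
```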
Items with negative discrimination indices, even when large in magnitude, are immediate candidates for revision if not deletion. With multiple-choice items, a negative D is forensic evidence of errors in item writing. It suggests the possibility of:

- A wrong key: more knowledgeable students selected a distracter that is the correct answer but is not the keyed option.
- An unclear problem in the stem, leading to more than one correct answer.
- Ambiguous distracters, leading the more informed students to be divided in choosing the attractive options.
- An implausible keyed option, which more informed students will not choose.

As you can see, awareness of item-writing guidelines can provide cues on how to improve items bearing negative or non-significant discrimination indices.

Distracter Analysis

Another empirical procedure for discovering areas for item improvement uses an analysis of the distribution of responses across the distracters. Especially when the difficulty index and discrimination index of an item suggest that it is a candidate for revision, distracter analysis becomes a useful follow-up. It can detect differences in how the more able students respond to the distracters in a multiple-choice item compared with how the less able ones do. It can also provide an index of the plausibility of the alternatives, that is, whether they are functioning as good distracters. Distracters not chosen at all, especially by the uninformed students, need to be revised to increase their attractiveness.

To illustrate this process, consider the frequency distribution of the responses of the upper group and the lower group across the alternatives for two items. Separate counts are done for the members of the upper and lower groups who chose A, B, C, and D. The data are organized in a distracter analysis table.

Table 3. Distracter Analysis Table

Item | Group | A | B  | C | D | Difficulty Index (p) | Discrimination Index (D)
1    | Upper | 2 | 10 | – | – | 0.38                 | 0.35
1    | Lower | 2 | 0  | 2 | 6 |                      |
2    | Upper | 2 | 4  | 10| 4 | 0.45                 | 0.50
2    | Lower | 5 | 4  | 1 | 0 |                      |

- What kinds of items do you see based on their D?
- What does their respective D indicate? Cite the data supporting this.
- Which of the two items is more discriminating? Why?
- Which items need to be revised?

Sensitivity to Instruction Index

The techniques discussed earlier make use of responses obtained from a single administration of a test. The indices obtained for difficulty, discrimination, and option plausibility are helpful statistics for item improvement of norm-referenced or summative tests given after a period of instruction. Another empirical approach for reviewing test items is to infer how sensitive an item has been to instruction. This is referred to as the sensitivity to instruction index (Si), and it signifies a change in students' performance as a result of instruction. The information is useful for criterion-referenced tests, which aim at determining whether mastery learning has been attained after a designated or prescribed instructional period. The basic question being addressed is a directional one, i.e., is student performance better after instruction is given? In the context of item performance, Si will indicate whether the p value obtained for the item in the post-test is greater than the p value in the pre-test.

Consider an item where, in a class of 40, 80% answered it correctly in the post-test while only 10% did in the pre-test. Its p value for the post-test is 0.80 while for the pre-test it is 0.10, thus Si = 0.70, following this calculation:

Sensitivity to instruction (Si) = p(post) - p(pre)
Si = 0.80 - 0.10
Si = 0.70

Notice that the calculation for Si carries the same concept as the discrimination index, except that the difference in proportion is obtained between a post-test and a pre-test given to the same group.

Similar to the interpretation of D, the higher the Si value, the more sensitive the item is in showing change as a result of instruction. This item statistic gives additional information regarding the efficiency and validity of the item.

There could, however, be reasons why the Si of a test item does not register a meaningful difference between post-test and pre-test. Especially for a knowledge-level item, it is possible that it was not taken up at all during instruction, so the students did not have the chance to learn it, or that the students already knew it prior to instruction. The teacher should take note of these items when reviewing content coverage for the period.
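A minimal Python sketch of the Si calculation, reusing the worked figures above (the helper name is ours, not from the module):

```python
def sensitivity_to_instruction(p_posttest: float, p_pretest: float) -> float:
    """Si = p(post-test) - p(pre-test): gain in an item's p value after instruction."""
    return p_posttest - p_pretest

# Worked example from the text: 80% answered correctly after instruction, 10% before.
print(round(sensitivity_to_instruction(0.80, 0.10), 2))  # 0.7
```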
Reference

De Guzman, E. & Adamos, J. (2015). Assessment of Learning 1. Quezon City: Adriana Publishing Company.

Posttest

This component will test your ability to apply empirical procedures for item improvement.

1. A final test in Science was administered to a Grade IV class of 50. The teacher wants to further improve the items for next year's use. Calculate a quality index from the given data and indicate the possible revisions needed by some items.

[Item data table]

2. Below are additional data collected for the same items. Calculate another quality index and indicate what needs to be improved based on the obtained index.

3. A distracter analysis table is given for a test item administered to a class of 60. Obtain the necessary item statistics using the given data.

Group | A | B  | C | D | Difficulty Index (p) | Discrimination Index (D)
Upper | 2 | 18 | 5 | 0 | ?                    | ?
Lower | – | 1  | 2 | 0 |                      |

Write your evaluation on the following aspects of the item:
a. Difficulty of the item
b. Discriminating power of the item
c. Plausibility of the options
d. Ambiguity of the answer

Section: Interpretation of Assessment Results

Learning Outcome

Interpret a given set of assessment data for reporting purposes.

Assessment of student performance aims to know how the student is improving in a course, which also reflects the teacher's performance with respect to the teaching process. One of the
