EDP 3501K

Testing and Assessment in Education

Report on Item Analysis:

Multiple Choice Questions

Name (Matric Number):

Anis Suraiya binti Mat Naji (1412298)


Nur Syafiqah Syahirah binti Indra Gunawan (1412186)
Nur Zulaika binti Abu Shah (1418242)
Nurul Amani binti Hassan (1415832)
1.0 INTRODUCTION

1.0.1 Background

According to Terry Overton (2006), a "test is a method to determine a student's ability

to complete certain tasks or demonstrate mastery of a skill or knowledge of content" (p. 1). The
definition shows that a test is a way to examine someone's knowledge of something to determine
what he or she knows or has learned. Testing and assessment may seem the same in meaning,
but they carry different definitions. Assessment is the process of gathering information
to monitor progress and make educational decisions where necessary; its goal is to
make improvements. In an educational context, assessment is the process of describing,
collecting, recording, scoring, and interpreting information about learning.
Assessment and testing are important in education. Have you ever wondered why they are
important? Educators use ongoing assessments to determine their students' ability levels
in various academic areas and to guide their instruction. In the realm of special education, the
assessment process is absolutely essential. Parents, teachers, specialists and counsellors depend
on multiple assessments to identify a student's strengths, weaknesses and progress.
Hence, we conducted this item analysis to experience the process of making a test
or assessment. In this paper, we focus on the item analysis of multiple choice questions for
the Form 4 Mathematics subject and explore the problems that arise while constructing a test for
students. Item analysis is an important (probably the most important) tool for increasing test
effectiveness. To write effective items, it is necessary to examine whether they measure
the fact, idea, or concept for which they were intended. This is done by studying the
students' responses to each item (multiple choice question).
We chose Sekolah Menengah Kebangsaan Ismail Petra as our destination to distribute
the exam questions that we constructed based on the table of specification. With full co-
operation from the teachers, we were able to administer our test to two different classes, Form 4
Ibnu Sina and Form 4 Ibnu Hayyan. The total number of students tested was 28, all of them
from the upper class level.
1.0.2 Objectives
The main objective of this paper is to determine the quality of multiple choice questions
based on item analysis. From this main objective, we are able to investigate the performance
of items considered individually, either in relation to some external criterion or in relation to
the remaining items on the test.

1.0.3 Significance
Item analysis is important in constructing exam questions. There are many benefits
that we can extract from item analysis, especially for students, teachers and schools.
In terms of students, item analysis brings many benefits. It provides a basis for
the general improvement of classroom instruction. As we know, students in a classroom differ
in their ability to cope with a given subject, method of teaching and environment. Some
students have difficulty understanding a subject matter and sometimes take about two months
to adapt to the method of teaching used by their teacher. Hence, through item analysis we can
identify the problems a student faces during learning, because the results can be seen in the
exam questions.
Item analysis is useful in helping teachers determine which items to keep, modify, or
discard on a given test, and how to finalise the score for a student. If you improve the quality
of the items on a test, you improve the overall quality of the test, as well as its reliability
and validity. Moreover, item analysis data provide a basis for efficient class discussion of the
test results. We can present the data to students so that they know their strengths and
weaknesses. Item analysis data also provide a basis for remedial work. A remedial action is
intended to correct something that is wrong or to improve a bad situation; the instructor or
teacher can thus undertake immediate remedial work in improving the exam questions,
assessment and test. Furthermore, item analysis procedures provide a basis for increased skill
in test construction: the more experience a teacher gains in conducting item analysis, the better
their skill in constructing tests becomes.
The significance of item analysis for schools is that they are able to store all the
information about student learning and achievement in an orderly manner. The school's
marking records become more neatly arranged in computer software (usually
Microsoft Excel). In addition, item analysis creates an effective way for a school to identify
low, middle and high achievers. If the number of high achievers is large, the school may be
able to upgrade its title from Bestari to Cluster school, or from Cluster to High Performance
School.
1.0.4 Literature Review
Perhaps “item difficulty” should have been named “item easiness” because it indicates
the proportion of examinees that got the item right; also referred to as the p-value. The range
is from 0% to 100%, or more typically written as a proportion of 0.0 (none of the students
answered the item correctly) to 1.00 (all of the students answered the item correctly). The
higher the value, the easier the item. A high percentage indicates an easy item/question and
low percentage indicates a difficult item. In general, items should have values of difficulty no
less than 20% correct and no greater than 80%. Very difficult or very easy items contribute
little to the discriminating power of a test.
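As a brief illustration (a sketch with made-up responses, not our test's data, and a function name of our own), the p-value of each item can be computed directly from a 0/1 scoring matrix:

```python
# Sketch: computing item difficulty (the p-value) from scored responses.
# Each row is a student, each column an item; 1 = correct, 0 = incorrect.
# The scores below are illustrative only.

def item_difficulty(scores):
    """Return the proportion of examinees who answered each item correctly."""
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n_students for j in range(n_items)]

scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(item_difficulty(scores))  # [1.0, 0.5, 0.25] -> item 1 is the easiest
```

Items with p-values outside the 0.20-0.80 band would then be flagged for review.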
The suitable average level of difficulty proposed by experts for a four-option multiple
choice test is between 60% and 80%. An average difficulty within this range can still be
achieved even when the difficulty of individual items falls outside it. For example:
a) On a four-alternative multiple-choice item, the random guessing level is 1.00/4 =
0.25; therefore, the optimal difficulty level is 0.25 + (1.00 - 0.25) / 2 = 0.62.
b) On a true-false question, the guessing level is (1.00/2 = 0.50) and, therefore, the
optimal difficulty level is 0.50 + (1.00-0.50)/2 = 0.75.
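The two worked examples above follow the same formula, sketched here (the function name is ours, purely for illustration):

```python
# Optimal item difficulty: halfway between the random-guessing level
# and a perfect score, as in examples (a) and (b) above.

def optimal_difficulty(n_options):
    chance = 1.0 / n_options            # random-guessing level
    return chance + (1.0 - chance) / 2  # midpoint between chance and 1.0

print(optimal_difficulty(4))  # four-option MCQ -> 0.625 (about 0.62)
print(optimal_difficulty(2))  # true-false item -> 0.75
```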

Both examples show a normal difficulty level for their item type. p-values above 0.90 indicate
very easy items and should be carefully reviewed by the instructor.
Now suppose an item has a low difficulty value, less than 0.20. There may be
several possible causes arising before, during or after the test. p-values below 0.20
indicate very difficult items, which should be reviewed for possibly confusing language, removed
from subsequent exams, and/or identified as an area for re-instruction. Alternatively, the item
may simply be too challenging for every ability level in the class. If almost all of the students
get the item wrong, there is either a problem with the item or the students were not able to learn
the concept. However, if an instructor wants to identify the top percentage of students, such a
highly difficult item may be necessary.
Hence, what should we aim for? Experts suggest that the best approach is to aim for a
mix of difficulties. That is, a few very difficult, some difficult, some moderately difficult, and
a few easy.
The item discrimination is a measure of how well an item is able to distinguish, or
discriminate, between examinees who are knowledgeable and those who are not, or between
masters and non-masters. The most common way to compute item discrimination is the point-
biserial correlation, which shows the degree of relationship between scores on the item (correct
or incorrect) and total test scores.
For a highly discriminating item, the examinees who responded to it correctly also did
well on the test, while the examinees who responded to it incorrectly did poorly on the
test. The possible range of the discrimination index is -1.0 to 1.0; however, if an item has
discrimination below 0.0, it suggests a problem. When an item is discriminating negatively,
the most knowledgeable examinees are, on the whole, getting the item wrong and the least
knowledgeable examinees are getting it right. A negative discrimination index may
indicate that the item is measuring something other than what the rest of the test is measuring.
It may also result from several possible causes, such as a miskeyed, ambiguous, or misleading
item. In any case, items with negative discrimination values should be reviewed.
When interpreting the value of discrimination, it is important to be aware that there is a
relationship between an item's difficulty and its discrimination index. If an item has a very
high (or very low) p-value, the potential value of the discrimination will be much less than if
the item has a mid-range p-value. Theoretically, this makes sense: students who know the
content and who perform well on the test overall should be the ones answering the item
correctly. There is a problem if students are getting correct answers on a test without knowing
the content.
Hence, what should we aim for? It is recommended that item discrimination be at
least 0.20, and it is best to aim even higher. In all cases, action must be taken: items with
negative discrimination must be reviewed, and items with discrimination indices less than 0.20
must be revised or eliminated. Be certain that there is only one possible answer, that the
question is written clearly, and that your answer key is correct.
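A minimal sketch of the point-biserial correlation described above, using illustrative scores of our own (it is simply Pearson's r where one variable is dichotomous):

```python
import math

# Point-biserial correlation between one item (scored 0/1) and total
# test scores; the data below are illustrative, not from our test.

def point_biserial(item, totals):
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals)) / n
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in item) / n)
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    return cov / (sd_i * sd_t)

item = [1, 1, 1, 0, 0, 0]         # the three highest scorers got it right
totals = [20, 18, 17, 12, 11, 9]
print(round(point_biserial(item, totals), 2))  # 0.95 -> discriminates well
```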
2.0 METHOD

2.0.1 Preparation
We chose Sekolah Menengah Kebangsaan Ismail Petra, one of the cluster schools
located in Kelantan. The school consists of students from Form 1 until Form 6, and only
students with good achievement in UPSR or PT3 are chosen to enrol there.
After discussing with the team members, we decided to construct a multiple choice question
(MCQ) paper for the Form 4 Mathematics subject containing 20 questions.
Before constructing the table of specification, we contacted the Mathematics
teacher, Puan Normawati, who teaches the subject at that school, to find out how much time
she allocates for each topic in the Form 4 Mathematics syllabus. All of the topics in the table
of specification in Appendix B carry the same hours that she allocates in teaching the Form 4
students at Sekolah Menengah Kebangsaan Ismail Petra.
Appendix A contains the exam questions that we constructed. Based on the exam paper,
we set a moderate level of difficulty, expecting that more than ¾ of the 28
students would score at least 17 marks out of 20. Based on the table of specification in
Appendix B, the exam questions cover Standard Form, Quadratic Expressions and Equations,
The Straight Line and Trigonometry II.
Since the school is quite far from Kuantan, one of our group members went to the school
to distribute the question papers with the help of a teacher from the school. We chose to
distribute the questions to class number 1, Ibnu Sina, and class number 2, Ibnu Hayyan.
The combined enrolment of both classes exceeds the 28 students that we needed, so we
selected students randomly from the two classes. The examination was conducted in one
classroom, with tables and chairs arranged in 4 columns and 7 rows, under the supervision of
Puan Normawati and our group member, Anis Suraiya.
3.0 ITEM ANALYSIS

The item analysis was carried out in the following steps:

1. Score the papers.
2. Sort the papers by mark.
3. Divide the pile in half.
4. Using only the top half of the class, record the number of students who responded to each alternative. Repeat with the lower half group. (Record this for each question.)
5. Calculate the "Difference" in the number of students who answered the item correctly by subtracting the lower group from the upper group.
6. Calculate the "Discrimination" by dividing the Difference by the number of students who got the item right in the higher-mark group.
7. Calculate the total number of students who got the item right from both groups.
8. Calculate the "Difficulty" by dividing the number of students who got the item right by the total number of students, multiplied by 100.
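The steps above can be sketched in code. Note one assumption on our part: the sketch divides the Difference by the size of one half-group, the common textbook form of the discrimination index, which differs slightly from the divisor named in step 6.

```python
# Upper/lower-group item analysis for a single item, following the
# steps above. Discrimination here uses the common textbook divisor
# (the size of one half-group), an assumption on our part.

def analyse_item(upper_correct, lower_correct, group_size):
    difference = upper_correct - lower_correct
    discrimination = difference / group_size
    total_correct = upper_correct + lower_correct
    difficulty = total_correct / (2 * group_size) * 100  # percent correct
    return difference, discrimination, difficulty

# Item 1 from Table 3: 14 upper-group and 10 lower-group students correct.
diff, disc, p = analyse_item(14, 10, 14)
print(diff, round(disc, 2), round(p))  # 4 0.29 86
```

The difficulty of 86% matches the value recorded for item 1 in Table 3.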
3.0.1 Mark
The table below shows the marks and percentages of the students in the test. The total mark
for the test is 20.

NO. NAME MARKS PERCENTAGE

1 NURUL NATASHA BINTI MOHD ZULHILMI 20 100
2 WAN NUR ATIRAH BINTI CHE AB. GHANI 19 95
3 FATEHAH BINTI BURHANNUDIN 18 90
4 NURUL AMIRAH BINTI AMRI 18 90
5 FARAH NATASHA BINTI ZALIZAN 17 85
6 MUHAMMAD SYARIFF BIN AHAMAD KAMIL 17 85
7 NIK NORFAZLIN WAHYU BINTI NIK FAUZI 17 85
8 NUR AIFAA BINTI NOOR AZLAN 17 85
9 MOHD IQBAL ARIFF BIN MAZRI 16 80
10 MUHAMMAD ASLAM BIN MAT RAZANI 16 80
11 MUHAMMAD BAIHAQI BIN BAKAR 16 80
12 NIK WANI BINTI NIK MOHD RUZNI 16 80
13 NOOR BATRISYIA IZZATIE BINTI MOHD ASELI 16 80
14 NUR AMALIN MUSLIHA BINTI MOHAMMAD SHUKOR 16 80
15 NUR FADLIN IZZAH 16 80
16 NUR FATNIN FARINA BINTI ABDULLAH 16 80
17 NURUL INTAN MAISARAH BINTI HUSNI 16 80
18 TENGKU HANIS NABIHAH BINTI TENGKU MOKHTAR 16 80
19 AINA SYAKIRAH BINTI AHAMAD 15 75
20 NURUL NATASYA BINTI ZAHARI 15 75
21 SHARIFAH HANAN BINTI ABDULLAH 15 75
22 AMIRUL HAKIM BIN ABDULLAH 14 70
23 NUR AQILAH BINTI AZMAN 14 70
24 MUHAMMAD AFIQ FAHMI BIN KAMARUDDIN 13 65
25 CHE RUSYAIREEN FARISYA BINTI CHE RIZAL NARANI 12 60
26 NORAISYAH ELLYANA BINTI ANUAR 12 60
27 NUR ARISHA BINTI AHMAD FITRI 12 60
28 SITI ZUBAIDAH BINTI CHE ISA 11 55
Table 1
3.0.2 Analysis
Analysis

Mean 15.57143
Standard Error 0.406393
Median 16
Mode 16
Standard Deviation 2.150428
Sample Variance 4.624339
Kurtosis 0.076463
Skewness -0.35804
Range 9
Minimum 11
Maximum 20
Sum 436
Count 28

Table 2
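The figures in Table 2 can be reproduced from the marks in Table 1 with Python's standard statistics module, which, like Excel, uses the sample forms of standard deviation and variance:

```python
import statistics

# The 28 marks from Table 1.
marks = [20, 19, 18, 18, 17, 17, 17, 17,
         16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
         15, 15, 15, 14, 14, 13, 12, 12, 12, 11]

print(round(statistics.mean(marks), 5))      # 15.57143
print(statistics.median(marks))              # 16.0
print(statistics.mode(marks))                # 16
print(round(statistics.stdev(marks), 6))     # 2.150428 (sample std dev)
print(round(statistics.variance(marks), 6))  # 4.624339 (sample variance)
print(max(marks) - min(marks))               # 9 (range)
```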
Mean
The mean of the students' marks is 15.57, meaning the average mark is around 15.57. Most of
the students scored 16 marks, which corresponds to 80%.
Standard Deviation
As stated by Triola, “First, we should clearly understand that the standard deviation measures
the variation among values. Values close together will yield a small standard deviation,
whereas values spread farther apart will yield a larger standard deviation." (p. 82).
Since the standard deviation is 2.15, which is quite small, the students' marks are close
together; there is no big difference between the marks of the students.
Minimum and Maximum
The minimum mark among the students is 11, while the maximum is 20, which is the full mark.
Only one student obtained the full mark, and likewise only one student obtained the lowest mark.
3.0.3 Item Difficulty

Item | Upper group (A/B/C/D) | Lower group (A/B/C/D) | Difference | Discrimination | Total | Difficulty
  1  | 14 / 0 / 0 / 0  | 10 / 4 / 0 / 0  |  4 |   4 | 24 |  86
  2  |  1 / 0 / 13 / 0 |  3 / 0 / 11 / 0 |  2 |   7 | 24 |  86
  3  |  0 / 0 / 3 / 11 |  0 / 2 / 3 / 9  |  2 |   7 | 20 |  71
  4  |  0 / 0 / 0 / 14 |  2 / 0 / 0 / 11 |  3 |   5 | 25 |  89
  5  |  0 / 0 / 0 / 14 |  2 / 0 / 0 / 12 |  2 |   7 | 26 |  93
  6  |  7 / 0 / 1 / 6  |  5 / 2 / 2 / 5  |  1 |  14 | 11 |  39
  7  | 13 / 0 / 0 / 1  | 10 / 2 / 0 / 2  |  3 |   5 | 23 |  82
  8  |  1 / 0 / 1 / 12 |  0 / 0 / 2 / 12 |  0 |   0 | 24 |  86
  9  |  0 / 14 / 0 / 0 |  0 / 14 / 0 / 0 |  0 |   0 | 28 | 100
 10  |  2 / 0 / 6 / 6  |  4 / 2 / 5 / 3  |  3 |   5 |  9 |  32
 11  |  8 / 2 / 4 / 0  |  9 / 1 / 1 / 3  | -1 | -14 | 17 |  61
 12  | 13 / 0 / 0 / 1  | 14 / 0 / 0 / 0  | -1 | -14 | 27 |  96
 13  | 11 / 3 / 0 / 0  | 13 / 0 / 0 / 1  | -2 |  -7 | 24 |  86
 14  |  1 / 5 / 7 / 1  |  4 / 5 / 5 / 0  |  2 |   7 | 12 |  43
 15  |  0 / 1 / 0 / 13 |  0 / 1 / 1 / 12 |  1 |  14 | 25 |  89
 16  |  0 / 0 / 1 / 13 |  0 / 1 / 5 / 8  |  5 |   3 | 21 |  75
 17  | 13 / 0 / 1 / 0  | 11 / 1 / 2 / 0  |  2 |   7 | 24 |  86
 18  |  0 / 1 / 3 / 10 |  0 / 2 / 4 / 8  |  2 |   7 | 18 |  64
 19  |  0 / 1 / 13 / 0 |  0 / 2 / 7 / 3  |  6 |   2 | 20 |  71
 20  |  0 / 0 / 0 / 14 |  2 / 0 / 0 / 12 |  2 |   7 | 26 |  93
Table 3
The item difficulty is determined by the Difficulty value in Table 3: the lower the value,
the more difficult the question. Our test has three questions with a difficulty value of less
than 50, namely items 6, 10 and 14.
The distractors for item 6 are working well, because the lower-group students are
scattered across the alternatives (A-5; B-2; C-2; D-5), while almost half of the upper-group
students answered it correctly (A-7; B-0; C-1; D-6).
The distractors for item 10 are also working well, since the lower-group students are
scattered (A-4; B-2; C-5; D-3); 6 of the upper-group students answered it correctly, while
another 6 chose C and 2 chose A.
The distractors for item 14 are likewise working well, since the lower-group students
answered A-4; B-5; C-5; D-0, which suggests they were guessing. Half of the upper-group
students answered it correctly (C), while the rest were scattered (A-1; B-5; D-1).
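A simple check of whether each distractor is functioning (a helper of our own, not from any library) flags distractors that attract no lower-group students:

```python
# Flag non-functioning distractors: a working distractor should draw
# at least some lower-group students. Counts are taken from Table 3.

def dead_distractors(lower_counts, key):
    """Return the distractors that no lower-group student chose."""
    return [opt for opt, n in lower_counts.items() if opt != key and n == 0]

# Item 6 (key taken as A for illustration): every distractor was chosen.
print(dead_distractors({"A": 5, "B": 2, "C": 2, "D": 5}, "A"))  # []

# Item 14 (key C): no lower-group student chose D.
print(dead_distractors({"A": 4, "B": 5, "C": 5, "D": 0}, "C"))  # ['D']
```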

3.0.4 Item Discrimination


Item discrimination here means that the higher the discrimination value of an item, the
better the test item. The highest Total for our test is 28, for item number 9, which every
student answered correctly.
Three of our test items have negative discrimination values, which indicates that
something is wrong with the items themselves: items 11, 12 and 13. The negative values
suggest some ambiguity in these items led the upper-group students to choose a wrong
alternative while the weaker group guessed the correct alternative. These test items should
therefore be edited.

3.0.5 Item Distraction


A distractor analysis examines whether the incorrect alternatives attract responses,
particularly from the lower group; a distractor that nobody chooses contributes nothing to the
item. As shown in Section 3.0.3, the distractors for items 6, 10 and 14 are working well,
because the lower-group students' answers are scattered across the alternatives rather than
concentrated on the key.
4.0 DECISION
Basically, our test items are balanced in terms of difficulty. There is one question that is
very easy, which 100% of the students managed to answer correctly. Difficult questions, those
with a difficulty value below 50, make up only 15% of the test, while the rest fall within the
61-96 range of difficulty values.
However, some questions need editing, since those items have negative discrimination
values: items 11, 12 and 13.
Generally, our test items should be revised to obtain more accurate results in testing
students' ability.

5.0 CONCLUSION
In conclusion, we chose the items to be analysed based on the techniques that we
learned in this course. Even though we are merely beginners in constructing exam questions,
through online research and interviews with experts we managed to produce our own items
successfully. We hope that the techniques and skills we have gained will help us manage our
own evaluations when we become teachers.
6.0 REFERENCES

Professional Testing Inc. Step 9: Conduct the Item Analysis. Building a High Quality Exam
Program. Retrieved on 13 December 2016 from
http://www.proftesting.com/test_topics/steps_9.php

Overton, T. (2006). Assessing Learners with Special Needs (6th ed.). Cited in Testing,
Assessment, Measurement and Evaluation Definition. Retrieved on 11 December 2016
from http://www.slideshare.net/norazmi3003/testing-assessment-measurement-and-
evaluation-definition

Item Difficulty and Item Discrimination. Retrieved on 14 December 2016 from
http://www.omet.pitt.edu/docs/OMET%20Test%20and%20Item%20Analysis.pdf

Triola, M. F. (2004). Elementary Statistics (9th ed.). Pearson Education.
