Enjoy1 130512161848 Phpapp02
Assessment refers to the process of gathering, describing, or quantifying information about student
performance. It includes paper-and-pencil tests and extended responses (for example, essays). Performance
assessments are usually referred to as “authentic assessment” tasks (for example, presentation of research
work).
Evaluation refers to the process of examining the performance of the student. It also determines
whether or not the student has met the instructional objectives of the lesson.
A test is an instrument or systematic procedure designed to measure the quality, ability, skill, or knowledge
of students by giving a set of questions in a uniform manner. Since a test is a form of assessment, it also
answers the question “How does an individual student perform?”
Testing is a method used to measure the level of achievement or performance of the learners. It also
refers to the administration, scoring, and interpretation of an instrument (procedure) designed to elicit
information about performance in a sample of a particular area of behavior.
Types of Measurement
There are two ways of interpreting student performance in relation to classroom instruction. These
are the norm-referenced test and the criterion-referenced test.
A norm-referenced test is a test designed to measure the performance of a student compared with
other students. Each individual is compared with other examinees and assigned a score, usually expressed
as a percentile, a grade-equivalent score, or a stanine. The achievement of the student is reported for broad
skill areas, although some norm-referenced tests do report student achievement for individual skills.
The purpose is to rank each student with respect to the achievement of others in broad areas of
knowledge and to discriminate between high and low achievers.
A criterion-referenced test is a test designed to measure the performance of the student with respect to
some particular criterion or standard. Each individual is compared with a predetermined set of standards
for acceptable achievement. The performance of the other examinees is irrelevant. A student’s score
is usually expressed as a percentage, and student achievement is reported for individual skills.
The purpose is to determine whether each student has achieved specific skills or concepts, and to find out
how much students know before instruction begins and after it has finished. Other terms less often used
for criterion-referenced are objective-referenced, domain-referenced, content-referenced, and
universe-referenced.
Robert L. Linn and Norman E. Gronlund (1995) pointed out the common characteristics and
differences of norm-referenced tests and criterion-referenced tests.
Common Characteristics of Norm-Referenced Tests and Criterion-Referenced Tests
TYPES OF ASSESSMENT
There are four types of assessment in terms of their functional role in relation to classroom instruction.
These are the placement assessment, diagnostic assessment, formative assessment, and summative
assessment.
A. Placement Assessment is concerned with entry performance of the student. The purpose of the
placement assessment is to determine the prerequisite skills, degree of mastery of the course
objectives and the best mode of learning.
B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the
strengths and weaknesses of the students regarding the topics to be discussed. The purposes of
diagnostic assessment are:
1. To determine the level of competence of the students;
2. To identify the students who already have knowledge about the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.
C. Formative Assessment is a type of assessment used to monitor the learning progress of the
students during or after instruction. Purposes of formative assessment:
1. To provide immediate feedback to both student and teacher regarding the success and
failure of learning.
2. To identify the learning errors that are in need of correction.
3. To provide information to the teacher for modifying instruction and for improving
learning and instruction.
D. Summative Assessment is a type of assessment usually given at the end of a course or unit.
Purposes of summative assessment:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcome and used for assigning grades;
3. To provide information for judging appropriateness of the instructional objectives;
4. To determine the effectiveness of instruction.
MODE OF ASSESSMENT
A. Traditional Assessment
1. Assessment in which students typically select an answer or recall information to complete
the assessment. Tests may be standardized or teacher-made, and may be multiple-choice,
fill-in-the-blank, true-or-false, or matching type.
2. An indirect measure of assessment, since the test items are designed to represent competence by
extracting knowledge and skills from their real-life context.
3. Items on standardized instruments tend to test only the domain of knowledge and skill, to avoid
ambiguity for the test takers.
4. One-time measures that rely on a single correct answer to each item. There is limited
potential for traditional tests to measure higher-order thinking skills.
B. Performance Assessment
1. Assessment in which students are asked to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills (Jon Mueller).
2. A direct measure of student performance, because tasks are designed to incorporate the context,
problems, and solution strategies that students would use in real life.
3. Designed as ill-structured challenges, since the goal is to help students prepare for the complex
ambiguities in life.
4. Focuses on processes and rationales. There is no single correct answer; instead students are led to
craft polished, thorough, and justifiable responses, performances, and products.
5. Involves long-range projects, exhibits, and performances that are linked to the curriculum.
6. The teacher is an important collaborator in creating tasks, as well as in developing guidelines for
scoring and interpretation.
C. Portfolio Assessment
1. A portfolio is a collection of student work specifically selected to tell a particular story about the
student.
2. A portfolio is not a pile of student work that accumulates over a semester or a year.
3. A portfolio contains a purposefully selected subset of student work.
4. It measures the growth and development of students.
The specific statement of the aim of the instruction; it should express what the students should be
able to do or know as a result of taking the course. The objectives should indicate the cognitive,
affective, and psychomotor levels of performance.
Instruction:
It consists of all the elements of the curriculum designed to teach the subject, including the lesson plans,
study guides, and reading and homework assignments; the instruction should correspond directly to
the objectives.
Assessment:
The process of gathering, describing, or quantifying information about the performance of the learner;
the testing component of the subject. The weight given to different subject matter areas on the test
should match the objectives as well as the emphasis given to each subject area during
instruction.
Evaluation:
Examining the performance of the students and comparing and judging its quality. Determining
whether or not the learner has met the objective of the lesson and the extent of understanding.
INSTRUCTIONAL OBJECTIVES
Instructional objectives play a very important role in the instructional process and the evaluation
process. They serve as guides for teaching and learning, communicate the intent of the instruction to
others, and provide a guideline for assessing the learning of the students. Instructional objectives,
also known as behavioral objectives or learning objectives, are statements which clearly describe an
anticipated learning outcome.
4. Be sequentially appropriate
6. Be developmentally appropriate.
TABLE OF SPECIFICATION
A table of specification is a device for describing test items in terms of the content and the process
dimensions, that is, what a student is expected to know and what he or she is expected to do with
the knowledge. It is described by the combination of the content and the process in the table of
specification.
ITEM ANALYSIS
Item analysis refers to the process of examining the students’ responses to each item of the
test. According to Abubakar S. Asaad and Willam M. Hailaya (Measurement and Evaluation: Concepts
and Principles, Rex Bookstore, 2004 edition), an item has either desirable or undesirable
characteristics. An item that has desirable characteristics can be retained
for subsequent use, and one with undesirable characteristics is either revised or rejected.
a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness
Difficulty index (Df) refers to the proportion of the students in the upper and lower groups
who answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not
lower than 0.20 nor higher than 0.80. The average index of difficulty is from 0.30 or 0.40 to a maximum of 0.60.
Index Range      Difficulty Level
0.00 – 0.20      Very Difficult
0.21 – 0.40      Difficult
0.41 – 0.60      Moderately Difficult
0.61 – 0.80      Easy
0.81 – 1.00      Very Easy
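The difficulty index and its interpretation can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the cut-offs 0.20, 0.40, 0.60 and 0.80 are taken as the upper bounds of each level, which is an assumption wherever the listed ranges touch or overlap.

```python
def difficulty_index(correct_upper, correct_lower, n_upper, n_lower):
    """Proportion of upper- and lower-group students who answered the item correctly."""
    return (correct_upper + correct_lower) / (n_upper + n_lower)


def difficulty_level(df):
    """Map a difficulty index to a verbal level (cut-offs are an assumption)."""
    if not 0.0 <= df <= 1.0:
        raise ValueError("difficulty index must be between 0 and 1")
    if df <= 0.20:
        return "Very difficult"
    if df <= 0.40:
        return "Difficult"
    if df <= 0.60:
        return "Moderately difficult"
    if df <= 0.80:
        return "Easy"
    return "Very easy"


# Hypothetical item: 6 of 22 upper-group and 4 of 22 lower-group students got it right.
df = difficulty_index(6, 4, 22, 22)
print(round(df, 2), difficulty_level(df))
```

A low index means few students answered correctly, so the item is hard; a high index means the item is easy.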
Discrimination Index is the difference between
the proportion of high-performing students who got the item right and the proportion of low-
performing students who got the item right. The two groups are usually defined as the upper 27% and
the lower 27% of the students based on the total examination score. The discrimination index
is the degree to which the item discriminates between the high- and low-performing groups in relation
to scores on the total test. The index of discrimination is classified into positive discrimination, negative
discrimination, and zero discrimination. Positive discrimination occurs if the proportion of students who got
an item right in the upper-performing group is greater than the proportion in the low-performing
group. Negative discrimination occurs if the proportion of students who got an item right in the low-
performing group is greater than the proportion in the upper-performing group. Zero
discrimination occurs if the proportions of students who got an item right in the upper-performing group
and in the low-performing group are equal.
Maximum Discrimination is the sum of the proportions of the upper and lower groups who answered
the item correctly. The possible maximum discrimination will occur if half or less of the sum of the
upper and lower groups answered an item correctly.
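Notations:
PUG = proportion of the upper group who got the item right
PLG = proportion of the lower group who got the item right
Di = Discrimination Index
DM = Maximum Discrimination
DE = Discrimination Efficiency
Formulas:
Di = PUG - PLG
DM = PUG + PLG
DE = Di / DM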
Example: Eighty students took an examination in algebra. Six students in the upper group and 4 students
in the lower group got the correct answer for item number 6. Find the discrimination
efficiency.
Given:
27% of 80 = 21.6 or 22, which means that there are 22 students in the upper-performing group
and 22 students in the lower-performing group.
(insert)
This can be interpreted as on the average, the item is discriminating at 20% of the potential of its
difficulty.
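The computation for this example can be reproduced with a short Python sketch. The function name is hypothetical, and the efficiency is taken here as the discrimination index divided by the maximum discrimination, an assumption that agrees with the 20% figure stated above.

```python
def discrimination_stats(correct_upper, correct_lower, group_size):
    """Return (Di, DM, DE) for one item, given equal-sized upper and lower groups."""
    pug = correct_upper / group_size   # proportion correct in the upper group
    plg = correct_lower / group_size   # proportion correct in the lower group
    di = pug - plg                     # discrimination index
    dm = pug + plg                     # maximum discrimination
    de = di / dm if dm else 0.0        # discrimination efficiency (assumed Di / DM)
    return di, dm, de


# Item 6: 6 of 22 upper-group and 4 of 22 lower-group students answered correctly.
di, dm, de = discrimination_stats(6, 4, 22)
print(round(di, 2), round(dm, 2), round(de, 2))  # 0.09 0.45 0.2
```

Since Di is positive, the item discriminates in the right direction, but only weakly.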
Measures of Attractiveness
To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we
count the number of students who selected each incorrect option in both the upper and lower
groups. An incorrect option is said to be an effective distracter if more students in the lower
group chose that incorrect option than students in the upper group.
Steps of Item Analysis
1. Rank the scores of the student from highest score to lowest score.
2. Select 27% of the papers within the upper performing group and 27% of the papers within the
lower performing group.
3. Set aside the 46% of the papers because they will not be used for item analysis.
7.
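The first three steps above can be sketched as follows. The helper name and the dictionary layout are hypothetical, ties are broken by sort order, and the group size is rounded to the nearest whole paper (Python's `round` rounds exact halves to the even number), all of which are assumptions rather than rules stated in the text.

```python
def split_groups(scores, fraction=0.27):
    """Rank papers by total score and return (upper, middle, lower) groups.

    scores: dict mapping a student identifier to a total test score.
    """
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    k = round(len(ranked) * fraction)          # papers in each extreme group
    upper = ranked[:k]
    lower = ranked[-k:] if k else []
    middle = ranked[k:len(ranked) - k]         # set aside, not used in the analysis
    return upper, middle, lower


# Hypothetical class of 80 students with made-up scores 0..79.
scores = {f"student_{i}": i for i in range(80)}
upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))  # 22 36 22
```

With 80 papers the rounded groups leave 36 papers (45%) in the middle, which is close to the nominal 46%.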
Validity Test
Validity refers to the appropriateness of score-based inferences, or of decisions made based on the
students’ test results. It is the extent to which a test measures what it is supposed to measure.
Important Things to Remember about Validity
1. Validity refers to the decisions we make, and not to the test itself or to the
measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never totally absent or
absolutely perfect.
3. A validity estimate, called a validity coefficient, refers to a specific type of validity. It ranges
between 0 and 1.
4. Validity can never be finally determined; it is specific to each administration of the test.
TYPES OF VALIDITY
1. Content Validity - a type of validation that refers to the relationship between a test and the
instructional objectives; it establishes content so that the test measures what it is supposed to
measure. Things to remember about content validity:
a. The evidence of the content validity of your test is found in the table of specification.
b. This is the most important type of validity to you, as a classroom teacher.
c. There is no coefficient for content validity. It is determined judgmentally, not empirically.
2. Criterion-related Validity - a type of validation that refers to the extent to which scores from a
test relate to theoretically similar measures. It is a measure of how accurately a student’s current
test score can be used to estimate a score on a criterion measure, like performance in courses,
classes, or another measurement instrument. For example, classroom reading grades should indicate
a similar level of performance as standardized reading test scores.
A. Construct Validity - a type of validation that refers to a measure of the extent to which a
test measures a hypothetical and unobservable variable or quality such as intelligence,
math achievement, performance anxiety, etc. It is established through intensive study of
the test or instrument.
B. Predictive Validity - a type of validation that refers to a measure of the extent to which a
person’s current test result can be used to estimate accurately what that person’s
performance on another criterion, such as test scores, will be at a later time.
3. Concurrent Validity - a type of validation that requires the correlation of the predictor or
concurrent measure with the criterion measure. Using this, we can determine whether a test is
useful to us as a predictor or as a substitute (concurrent) measure. The higher the validity coefficient,
the better the validity evidence of the test. In establishing concurrent validity evidence, no
time interval is involved between the administration of the new test and the criterion or
established test.
Reliability refers to the consistency of measurement; that is, how consistent test results
or other assessment results are from one measurement to another. We can say that a
test is reliable when it yields practically the same scores when
administered twice to the same group of students, with a reliability index of
0.50 or above. The reliability of a test can be determined by means of the Pearson
product-moment correlation coefficient, the Spearman-Brown formula, and the Kuder-Richardson formula.
1. Length of a Test
2. Moderate item difficulty
3. Objective scoring
4. Heterogeneity of the student group
5. Limited time
3. Split-Half Method. Administer the test once and score two equivalent halves of the test. To
split the test into two equivalent halves, the usual procedure is to score the
even-numbered and odd-numbered items separately. This provides two scores for each student.
The two sets of scores are correlated and the result is adjusted using the Spearman-Brown formula.
This correlation coefficient provides a measure of internal consistency; it indicates the
degree to which consistent results are obtained from the two halves of the test.
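A minimal Python sketch of the split-half procedure follows. It assumes each student's answers are stored as a list of 1 (right) and 0 (wrong), correlates the odd-item and even-item half scores with a Pearson coefficient, and then applies the standard Spearman-Brown step-up, r_full = 2r / (1 + r). The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per student."""
    odd_half = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Made-up responses of four students to a four-item test.
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(responses), 2))  # 0.9
```

The step-up is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable.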
4. Kuder-Richardson Formula. Administer the test once, score the total test, and apply the Kuder-
Richardson formula. The Kuder-Richardson formula is applicable only in situations where students’
responses are scored dichotomously, and is therefore most useful with traditional test items that
are scored as right or wrong. KR-20 is an estimate of reliability that provides information about the
degree to which the items in the test measure the same characteristic. The simpler KR-21 form assumes
that all items are of equal difficulty. (A statistical procedure is used to estimate coefficient alpha;
a correlation coefficient is given.)
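For illustration, KR-20 can be computed from a table of right/wrong responses as sketched below. The sketch uses the usual textbook form, KR-20 = (k / (k - 1)) × (1 - Σpq / variance of total scores), and it assumes the population variance (dividing by N); some references divide by N - 1 instead. The function name and the data are hypothetical.

```python
def kr20(item_scores):
    """KR-20 reliability for dichotomously scored items.

    item_scores: one list of 0/1 item scores per student.
    """
    n_students = len(item_scores)
    k = len(item_scores[0])                      # number of items
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_students
    # Population variance of the total scores (an assumption, see lead-in).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_students  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Made-up responses of four students to a four-item test.
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(kr20(responses), 2))  # 0.87
```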
Statistics plays a very important role in describing the test scores of the students. Teachers should have a
background in statistical techniques in order to analyze and describe the results of
measurements obtained in their own classrooms, understand the statistics used in tests and research
reports, and interpret the types of scores used in testing.
Descriptive statistics is concerned with collecting, describing, and analyzing a set of data without
drawing conclusions or inferences about a larger group, presenting the data in terms of tables, graphs,
or a single number (for example, the average score of the class in a particular test).
Inferential statistics is concerned with the analysis of a subset of data leading to predictions or
inferences about the entire set of data or population.
We shall discuss different statistical techniques used in describing and analyzing test results.
Measures of Central Tendency. A measure of central tendency is a single value that is used to identify
the center of the data; it is thought of as the typical value in a set of scores. It tends to lie within
the center of the data when the scores are arranged from lowest to highest or vice versa. There are three
commonly used measures of central tendency: the mean, the median, and the mode.
The mean
The mean is the most common measure of center and is also known as the arithmetic average.
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{N}$
where
∑ = sum of the scores
X = individual score
N = number of scores
45
35
48
60
44
39
47
55
58
54
∑x = 485
N = 10
4. $\text{Mean} = \frac{\sum x}{N} = \frac{485}{10} = 48.5$
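The same computation can be checked with a short Python sketch (the function name is illustrative only):

```python
def mean(values):
    """Arithmetic average: sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(sum(scores), len(scores), mean(scores))  # 485 10 48.5
```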
Properties of mean
1. Easy to compute
2. It may not be an actual observation in the data set
3. It can be subjected to numerous mathematical computations.
4. Most widely used.
5. Each data value contributes to the mean value.
6. It is easily affected by extreme values.
7. Applied to interval-level data.
The median is a point that divides the scores in the distribution into two equal parts when the
scores are arranged according to magnitude, that is, from lowest score to highest score or highest score to
lowest score. If the number of scores is odd, the value of the median is the middle score. When
the number of scores is even, the median value is the average of the two middlemost scores.
Example 1: Find the median of the scores of ten students in an algebra quiz.
(x) scores of students in algebra
45
35
48
60
44
39
47
55
58
54
First, arrange the scores from lowest to highest. Since the number of cases is even, find the
average of the two middlemost scores.
35
39
44
45
47
48
54
55
58
60
$\text{Median} = \frac{47 + 48}{2} = 47.5$ is the median score.
50% of the scores in the distribution fall below 47.5.
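The odd and even cases of the median rule can be sketched in Python as follows (the function name is illustrative only):

```python
def median(values):
    """Middle score of the sorted data; average of the two middle scores if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(median(scores))  # 47.5
```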
Properties of Median
1. It is not affected by extreme values.
2. It is applied to ordinal level of data.
3. The middle most score in the distribution.
4. Most appropriate when there are extreme scores.
The Mode
The mode refers to the score or scores that occur most often in the distribution. There are three
classifications of mode: a) unimodal is a distribution that consists of only one mode, b) bimodal is a
distribution of scores that consists of two modes, c) multimodal is a score distribution that consists of more
than two modes.
Properties of Modes
1. It is the score/s that occurred most frequently.
2. Nominal average.
3. It can be used for qualitative and quantitative data.
4. Not affected by extreme values.
5. It may not exist.
Example 1: Find the mode of the scores of the students in algebra quiz: 34, 36, 45, 65, 34, 45, 55, 61, 34,
46.
Mode = 34, because it appeared three times. The distribution is called unimodal.
Example 2: Find the mode of the scores of students in algebra quiz: 34, 36, 45, 61, 34, 45, 55, 61, 34, 45.
Mode = 34 and 45, because both appeared three times. The distribution is called bimodal
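Both examples can be reproduced with a small Python sketch. The function name is hypothetical, and treating a data set in which every score occurs only once as having no mode is an assumption based on the property that the mode may not exist.

```python
from collections import Counter


def modes(values):
    """Return the most frequent score(s), or an empty list when no score repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:          # every score occurs once: treated as "no mode"
        return []
    return sorted(score for score, count in counts.items() if count == highest)


print(modes([34, 36, 45, 65, 34, 45, 55, 61, 34, 46]))  # [34] -> unimodal
print(modes([34, 36, 45, 61, 34, 45, 55, 61, 34, 45]))  # [34, 45] -> bimodal
```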
Measures of Variability
A measure of variability is a single value that is used to describe the spread of the scores in a
distribution, that is, above or below the measure of central tendency. There are three commonly used
measures of variability: the range, the quartile deviation, and the standard deviation.
The Range
Range is the difference between highest score and lowest score in the data set.
R = HS –LS
Properties of Range
Example: Given the scores of 10 students in Mathematics and Science, find the range of each subject.
Which subject has the greater variability?
Mathematics Science
35 35
33 40
45 25
55 47
62 55
34 35
54 45
36 57
47 39
40 52
Mathematics          Science
HS = 62              HS = 57
LS = 33              LS = 25
R = HS – LS          R = HS – LS
R = 62 – 33          R = 57 – 25
R = 29               R = 32
Based on the computed values of the range, the scores in Science have greater variability.
That is, the scores in Science are more scattered than the scores in Mathematics.
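The two ranges can be verified with a short Python sketch (the variable and function names are illustrative only):

```python
def score_range(values):
    """Range: highest score minus lowest score."""
    return max(values) - min(values)


mathematics = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]
print(score_range(mathematics), score_range(science))  # 29 32
```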
Quartile Deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1).
It is based on the middle 50% of the scores, instead of the range of the entire distribution. In symbols,
$QD = \frac{Q_3 - Q_1}{2}$
where
QD = quartile deviation
Q3 = third quartile value
Q1 = first quartile value
Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.
$QD = \frac{Q_3 - Q_1}{2} = \frac{50.25 - 25.45}{2} = 12.4$
The value of QD = 12.4 which indicates the distance we need to go above or below the median to include
approximately the middle 50% of the scores.
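As a quick check of the arithmetic, the formula can be written as a one-line Python function (the name is illustrative; the quartile values are taken as given rather than computed from raw scores):

```python
def quartile_deviation(q1, q3):
    """Half of the distance between the first and third quartiles."""
    return (q3 - q1) / 2


print(round(quartile_deviation(25.45, 50.25), 2))  # 12.4
```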
Standard Deviation
The standard deviation is the most important and useful measure of variation; it is the square root of
the variance. It is an average of the degree to which each of the scores in the distribution deviates
from the mean value. It is a more stable measure of variation than the range and the quartile deviation
because it involves all the scores in the distribution.
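A hedged illustration in Python, using the standard library's `statistics` module on a set of ten quiz scores. Whether to divide by N (population) or by N - 1 (sample) is not stated in the text, so both are shown as an assumption.

```python
from statistics import pstdev, stdev

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]

population_sd = pstdev(scores)   # divides the squared deviations by N
sample_sd = stdev(scores)        # divides the squared deviations by N - 1

print(round(population_sd, 2), round(sample_sd, 2))  # 7.76 8.18
```

A larger standard deviation means the scores are more spread out around the mean of 48.5.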
Rubrics
A rubric is a scoring scale and instructional tool used to assess the performance of students using a
task-specific set of criteria. It contains two essential parts: the criteria of the task and the levels of
performance for each criterion. It provides teachers an effective means of student-centered feedback and
evaluation of the work of the students. It also enables teachers to provide detailed and informative
evaluations of their performance.
Rubrics are very important, especially if you are measuring the performance of the students against a set of
standards or a pre-determined set of criteria. Through the use of scoring rubrics, teachers can
determine the strengths and weaknesses of the students; hence rubrics enable the students to develop their
skills.
1. Identify your standards, objectives, and goals for your students. A standard is a statement of what
the students should know or be able to perform. It should indicate that your students
should meet these standards. Know also the goals for instruction: what are the learning
outcomes?
2. Identify the characteristics of a good performance on that task, that is, the criteria. When the students
perform or present their work, it should indicate that they performed well in the task given to
them; hence they met that particular standard.
3. Identify the levels of performance for each criterion. There are no guidelines with regard to the
number of levels of performance; it varies according to the task and needs. A rubric can have as few as
two levels of performance or as many as the teacher can develop. In this way, the rater can sufficiently
discriminate the performance of the students on each criterion. Through these levels of performance,
the teacher or the rater can provide more detailed feedback about the performance of the
students. It is also easier for the teacher and students to identify the areas needing
improvement.
Types of Rubrics
1) Holistic Rubrics
A holistic rubric does not list separate levels of performance for each criterion. Rather, a holistic
rubric assigns a level of performance across multiple criteria as a whole; in other words,
you put all the components together.
Advantages: quick scoring; provides an overview of students’ achievement.
Disadvantages: does not provide detailed information about the students’ performance in specific
areas of content and skills. It may be difficult to provide one overall score.
2) Analytic Rubrics
In analytic rubrics the teacher or the rater identifies and assesses the components of a finished product.
The rubric breaks down the final product into component parts, and each part is scored independently. The
total score is the sum of all the ratings for all the parts that are to be assessed or evaluated.
In analytic scoring, it is very important for the rater to treat each part separately to avoid bias
towards the whole product.
Advantages: more detailed feedback; scoring is more consistent across students and graders.
3 – Excellent Researcher
Included 10-12 sources
No apparent historical inaccuracies
Can easily tell which source information was drawn from
All relevant information is included
2- Good Researcher
Included 5-9 sources
Few historical inaccuracies
Can tell with difficulty where information came from
Bibliography contains most relevant information
1 – Poor Researcher
Included 1 – 4 sources
Lots of historical inaccuracies
Cannot tell from which source information came
Bibliography contains very little information
Source: jonathan.mueller.faculty.noctrl.edu/toolbox/howstep4.htm
Performance-based assessment is a direct and systematic observation of the actual performances of the
students based on pre-determined performance criteria (Zimmaro, 2003, as cited by Gabuyo,
2011). It is an alternative form of assessing students that represents “a set of strategies for the
application of knowledge, skills, and work habits through the performance of tasks that are meaningful
and engaging to students” (Hibbard, 1996, and Brualdi, 1998, in her article “Implementing Performance
Assessment in the Classroom”).
3. Portfolio is a purposeful collection of student works that exhibits a student’s efforts, progress
and achievements in one or more areas.
1. Checklist Approach. Checklists are observation instruments that divide performance into elements
that are either present or absent. The teacher has to indicate only whether or not certain elements
are present in the performances.
2. Narrative/Anecdotal Approach is a continuous description of student behavior as it occurs,
recorded without judgment or interpretation. The teacher writes narrative reports of what
was done during each of the performances. From these reports, teachers can determine how
well their students met their standards.
3. Rating Scale Approach is a checklist that allows the evaluator to record information on a scale,
noting finer distinctions than just the presence or absence of a behavior. The teacher
indicates to what degree the standards were met. Usually, the teacher will use a numerical scale. For
instance, one teacher may rate each criterion on a scale of one to five, with one meaning “skill
barely present” and five meaning “skill extremely well executed.”
4. Memory Approach. The teacher observes the students performing the tasks without taking
any notes, then uses the information from memory to determine whether or not the
students were successful. This approach is not recommended for assessing the
performance of the students.
PORTFOLIO ASSESSMENT
Portfolio assessment is the systematic, longitudinal collection of student work created in
response to specific, known instructional objectives and evaluated in relation to the same criteria
(Ferenz, 2001). A student portfolio is a purposeful collection of student work that exhibits the
student’s efforts, progress, and achievements in one or more areas. The collection must include student
participation in selecting contents, the criteria for selection, the criteria for judging merit, and evidence of
student reflection (Paulson, Paulson, and Meyer, 1991, as cited by Ferenz, 2001, in her article “Using Student
Portfolio for Outcomes Assessment”).
Uses of Portfolios
1. Portfolios can provide formative and summative opportunities for monitoring progress toward reaching
identified outcomes.
2. Portfolios can communicate concrete information to students about what is expected in terms of
the content and quality of performance in specific curriculum areas.
3. A portfolio allows the students to document aspects of their learning that do not show
up well in traditional assessments.
4. Portfolios are useful to showcase periodic or end-of-the-year accomplishments of students such
as poetry, reflections on growth, samples of best works, etc.
5. Portfolios may also be used to facilitate communication between teachers and parents regarding
their child’s achievements and progress in a certain period of time.
6. Administrators may use portfolios for national competency testing, to grant high school
credit, and to evaluate educational programs.
7. Portfolios may be assembled for a combination of purposes such as instructional enhancement and
progress documentation. A teacher reviews students’ portfolios periodically and makes notes
for revising instruction for the next year.
According to Mueller (2010) there are seven steps in developing portfolios of students. Below are
the discussions of each step.
1. Purpose: What is the purpose(s) of the portfolio?
2. Audience: For what audience(s) will the portfolio be created?
3. Content: What samples of students work will be included?
4. Process: What processes (e.g., selection of work to be included, reflection on work, conferencing)
will be engaged in during the development of the portfolio?
5. Management: How will time and materials be managed in the development of the portfolio? The
6. Communication: How and when will the portfolio be shared with pertinent audiences?
7. Evaluation: If the portfolio is to be used for evaluation, when and how should it be evaluated?
Direction: Encircle the best answer that makes each statement true.
1. Teacher Adrian will construct an achievement test. Which of the following will he
accomplish first?
A. Construct relevant test items.
B. Prepare table of specification.
C. Determine the number of items to be constructed.
D. Identify the intended learning outcomes.
Rationalization: D- the first step in constructing test items is to identify the learning outcomes or
go back to the instructional objectives.
Direction: Column A describes events associated with U.S. presidents, inventors, and civil rights leaders.
Indicate which name in Column B matches each event by placing the appropriate letter to the left
of the number in Column A. Each name may be used only once.
Column A Column B
1. President of the 20th Century A. Lincoln
2. Invented the telephone B. Nixon
3. Delivered the Emancipation Proclamation C. Whitney
4. Recent president to resign from office D. Ford
5. Civil rights leader E. Bell
6. Invented the cotton gin F. King
7. Our first president G. Washington
8. Only president elected for more than two terms H. Roosevelt
Rationalization: D- the descriptions and options are not homogeneous; they consist of three
groups: names of presidents, names of inventors, and names of civil rights leaders. In constructing
a matching type of test, the descriptions and options must be homogeneous.
3. Using the data in situation A, how would you improve the options to avoid ambiguity?
A. Arrange the options in alphabetical order.
B. Add two more options to avoid guessing.
C. Write the complete names in the options.
D. Remove two options to have valid options.
Rationalization: C- to avoid ambiguity write the complete name of persons among the options.
Rationalization: B- statements II, III and IV are guidelines in constructing a matching type of
test, while statement I is a violation of the guidelines for constructing a matching type of test.
6. Which of the following statements are characteristics of imperfect type of matching test?
I. The minimum item is three.
II. The item has no possible answer.
III. More options than descriptions.
IV. Items not necessarily homogeneous.
A. I, II and IV
B. I, II and III
C. II, III and IV
D. II and IV only
7. Which statement best describes the limitation of true or false type of test?
A. Useful for outcomes with two possible alternatives.
B. Scoring is easy, objective and reliable.
C. Can measure complex outcomes.
D. Scores are more influenced by guessing.
Rationalization: D- scores are more influenced by guessing because there is a 50% probability of
guessing the correct answer. There are only two alternatives.
Rationalization: A- in constructing true or false test items, avoid verbal clues and specific
determiners, and avoid taking statements directly from the book. Also avoid terms denoting
indefinite degree or amount, placing items in a systematic order, using "always" or "never", and
using negatives in stating the item.
9. Which of the following test items can most effectively measure higher-order cognitive
learning objectives?
A. Objective test
B. Achievement test
C. Completion test
D. Extended essay test
Rationalization: D- the extended essay test is the best way to measure higher-order cognitive
learning objectives. It requires students to plan their own answers and express them in their own
words. Students have the freedom to express their individuality in the answers given and to present
more realistic answers.
Rationalization: B- a short-answer item is easy to write and can measure a broad range of
learning outcomes. It is not adaptable in measuring complex learning outcomes; it is tedious and
time consuming to score.
SITUATION B. The data in the table below are the results of tests administered in four
subjects to the group to which Ritz Glenn belongs. Using the said data, answer the questions
(11-15).
11. In which subject did Ritz Glenn perform best in relation to the performance of the
group?
A. English
B. Music
C. Mathematics
D. PE
Rationalization: A- Compute the z-score for each subject: z-score in English = 0.86, z-score in
Mathematics = 0.40, z-score in Music = -1.23, z-score in PE = 0.75. The highest z-score is in
English; hence Ritz performed best in English.
Rationalization: D- Ritz Glenn performed best in English, hence he is a linguistic type of learner.
13. In which subject did Ritz Glenn perform most poorly in relation to the group performance?
A. English C. Mathematics
B. Music D. PE
Rationalization: B- Ritz Glenn performed most poorly in Music, since its z-score of -1.23 is
the lowest among the four subjects.
14. In which subject are the scores most dispersed?
A. English C. Mathematics
B. Music D. PE
Rationalization: B- The scores in Music are the most dispersed. Compute the coefficient of
variation (CV) of each subject. The larger the value of CV, the more dispersed the scores are; the
smaller the value of CV, the less dispersed the scores are. CV of English = 4.12%, CV of Music =
6.63%, CV of Mathematics = 5.15%, CV of PE = 4.40%.
Rationalization: B- The mean and median are equal when the scores are normally distributed.
Rationalization: D- Mode is a measure of central tendency. The other two measures are the
mean and the median.
18. What type of measure of variation is easily affected by extreme scores?
A. Range C. Inter- quartile range
B. Mean D. Standard deviation
Rationalization: A- Range is a measure of variation that is easily affected by extreme scores
because a change in either the highest score or the lowest score also changes the value of the
range.
19. Which measure/s of central tendency is/are easily affected by extreme scores?
A. Median C. Mode
B. Mean D. Mean and Median
Rationalization: B- The mean is easily affected by extreme scores. When the lowest score becomes
lower, the mean value is pulled down, and when the highest score becomes higher, the mean
value is pulled up. Hence, a change in either the lowest or the highest score causes a change in
the mean value.
20. Adrian’s scores in Statistics quizzes are as follows: 96, 90, 85, 89, 65, 99, 84, 82. What is the
mean value?
A. 83.25 C. 85.25
B. 84.25 D. 86.25
Rationalization: D- The sum of all the scores is 690; dividing by 8 gives 86.25.
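The arithmetic in item 20 can be checked with a minimal sketch that uses Python's standard statistics module. The variable names are illustrative only.

```python
from statistics import mean

# Quiz scores taken from item 20.
scores = [96, 90, 85, 89, 65, 99, 84, 82]

total = sum(scores)        # sum of all eight scores
average = mean(scores)     # total divided by the number of scores

print(total, average)
```

The printed total and mean should agree with the sum of 690 and the mean of 86.25 given in the rationalization.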
21. Given the following scores: 88, 83, 89, 78, 89, 85, 85, 89, 75, 90, 95, and 95. Which
characteristic best describes the distribution?
A. Normally distributed C. Bimodal
B. Unimodal D. Multi-modal
Rationalization: B- The distribution is unimodal because there is only one mode, 89, which
appears three times.
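One way to check the number of modes is sketched below, using statistics.multimode from the Python standard library. The helper name describe_modality and its labels are assumptions made for illustration, not part of the reviewer.

```python
from statistics import multimode

def describe_modality(scores):
    """Return the sorted modes and a label based on how many modes there are.

    Note: if every score is unique, multimode returns all of them, so the
    label is only meaningful when at least one score repeats.
    """
    modes = sorted(multimode(scores))
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multi-modal"
    return modes, label

# Scores taken from item 21.
item_21 = [88, 83, 89, 78, 89, 85, 85, 89, 75, 90, 95, 95]
print(describe_modality(item_21))
```

Applied to the scores listed in item 36, the same helper reports two modes, 24 and 39.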
22. A type of error committed in grading the performance of the students by the rater who
avoids both extremes of the scale and tends to rate everyone as average.
A. generosity error C. logical error
B. severity error D. central tendency error
Rationalization: D- Central tendency error is committed when the rater tends to give
average scores by avoiding scores at the lower end or upper end of the scale.
23. What error is committed by the rater if he overrates the performance of the student/s?
A. generosity error C. logical error
B. severity error D. central tendency error
Rationalization: A- Generosity error is committed when the rater tends to use only the high end
of the scale, that is, the rater tends to give high grades to the performance of the
student/s.
24. What error is committed by the rater if the lower end of the scale is favored?
A. generosity error C. logical error
B. severity error D. central tendency error
Rationalization: B- Severity error is committed when the rater favors the lower end of the
scale, that is, the rater tends to give low grades to the performance of the student/s.
25. Which of the following assessment techniques best assesses the objective “plans and designs
an experiment to be performed”?
A. Paper and pencil test C. Checklist
B. Rating scale D. Essay
Rationalization: C- A checklist is the best assessment technique to assess the objective “plans
and designs an experiment to be performed”. A checklist is useful in assessing performance
where the activities are best assessed by observation rather than by testing.
28. Which of the following statements is NOT included in constructing table of specification?
A. Decide on the content areas to be included.
B. Decide on the number of test item per content.
C. Decide the skills to measure in each content.
D. Decide on the number of answer sheets needed.
Rationalization: D- Deciding on the number of answer sheets needed in the test is not included in
constructing a table of specification. Options A, B, and C are needed in constructing a table of
specification.
29. Teacher Gina is talking about “grading on the curve” in a teachers’ assembly. This means
that she’s referring to what type of grading system?
A. Cumulative method of grading
B. Norm-reference grading
C. Criterion-reference grading
D. Combination of B and C
30. The computed value of r= 0.95 in Mathematics and English. What does this imply?
A. Mathematics score is not related to English score.
B. English score is moderately related to Mathematics score.
C. Mathematics score is highly positively related to English score.
D. English score is not anyway related to Mathematics score.
Rationalization: C- The correlation coefficient r = 0.95 is a very high positive correlation, which
means that the Mathematics score is highly positively related to the English score.
31. Teacher Jean will conduct a test “to measure her student’s ability to organize thoughts
and present original ideas”. Which type of test is most appropriate?
A. Modified true-false test item C. Short answer test
B. Completion type of test D. Essay test
32. Teacher Hyacinth conducted a 25-item test in Algebra. Students’ scores were as follows:
20, 12, 13, 14, 15, 14, 20, 22, 20, 22, 23, 23, 24, 25, 25. Which measure/s of central tendency
does the score 20 represent?
A. Mode only C. Median and Mode
B. Mean D. Mean, Median and Mode
Rationalization: C- the score 20 represents both the median and the mode. The sum of the 15 scores
is 292, so the mean is 292 divided by 15, or about 19.47, not 20.
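The three measures for the scores in item 32 can be checked directly. The sketch below uses Python's standard statistics module, and the variable names are illustrative.

```python
from statistics import mean, median, mode

# Scores taken from item 32.
scores = [20, 12, 13, 14, 15, 14, 20, 22, 20, 22, 23, 23, 24, 25, 25]

item_mean = round(mean(scores), 2)   # arithmetic mean, rounded to 2 places
item_median = median(scores)         # middle value of the sorted scores
item_mode = mode(scores)             # most frequent score

print(item_mean, item_median, item_mode)
```

The median and the mode both come out as 20, while the mean is about 19.47.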
5: 15 as 4: _____. A. 12 B. 15 C. 16 D. 18
A. Completion type
B. Analogy
C. Solving problem
D. Short answer test
34. Teacher Jay constructed a matching type of test. In Column A, the descriptions are a
combination of current issues, government agencies, dates of events and government
officials. Which guideline in constructing a matching type of test was NOT FOLLOWED?
A. Arrange the descriptions in alphabetical order.
B. Make the descriptions equal in length.
C. Make the descriptions homogeneous.
D. Make the descriptions heterogeneous.
Rationalization: D- normal curve means that the mean value is equal to the median value or the
scores are normally distributed.
36. Which characteristic best describes the given score distribution? The scores are: 22, 23,
24, 24, 24, 25, 26, 26, 35, 36, 37, 38, 39, 39, 39, 40, 40, 45.
A. Multi-modal C. Normally distributed
B. Bimodal D. skewed to the left
Rationalization: B- the score distribution is bimodal because there are two modes. The modes
are 24 and 39.
Rationalization: A- when the value of the standard deviation is small, on the average the scores
are closer to the mean.
38. All of the given statements are best practices of preparing multiple-choice test items
EXCEPT:
A. Stem should be stated in positive form.
B. Use the stem that could serve as a short-answer item.
C. Underline words or phrases in the stem to give emphasis.
D. Shorten the stem so that options can be written longer.
Rationalization: A- “Lord’s test score is higher than 89% of the class” is an example of
norm-reference interpretation. Norm-reference interpretation means that the performance of the
learner is compared with the performance of others in the class.
41. Which of the following learning outcomes is the most difficult to assess objectively?
1. A concept 3. An appreciation
2. An application 4. None of the above
Rationalization: C- change the option “none of the above” with “an interpretation”. Avoid using
none of the above option when asking for the best answer.
42. What is the main advantage of using a table of specification (TOS) when constructing a
periodic test?
A. It increases the reliability of the test result.
B. It reduces the scoring time.
C. It makes test construction easier.
D. It improves the sampling of content areas.
Rationalization: D- using a table of specification in constructing test items can improve the
sampling of content for the entire test. You can assure your students that the test has content
validity when you use a table of specification in constructing the test.
Rationalization: A- the main purpose of testing in teaching is to evaluate the learning progress
of the students and assess whether the instruction was effective or not.
44. Instructional objectives are very important in test construction when they are stated in
terms of:
A. Teacher activities. C. General terms.
B. Learning activities. D. Student performance.
45. Which of the following statements is the main reason why negative words should be
avoided in constructing multiple-choice tests?
A. Increase the difficulty of the test item.
B. More difficult to construct options.
C. Might be overlooked.
D. Stems tend to be longer.
Rationalization: C- avoid using negative words in constructing multiple-choice tests because they
might be overlooked by the test takers. If a negative word can’t be avoided, set it in bold or type
it in capital letters to give emphasis.
47. Which of the following statements is an advantage of multiple-choice test items over
essay questions?
A. Provide assessment of more complex learning outcomes.
B. Place more emphasis on the low level of learning outcomes.
C. Provide more extensive sampling of the content area.
D. Requires more time in preparing the test items.
Rationalization: A- to increase the reliability of the test, the teacher should include a
sufficient number of test items.
49. All of the following best describe interpreting norm-reference scores EXCEPT:
A. Percentile rank C. Grade Equivalent scores
B. Standard scores D. raw scores
51. Teacher Adrian conducted an item analysis and found that more students from the lower
group got test item number 6 correct. This means that the test item_________.
A. Has a low reliability
B. Has a high validity
C. Has a positive discriminating power
D. Has a negative discriminating power
Rationalization: D- negative discrimination means that more students from the lower group got
an item correct than students from the upper group.
Rationalization: A- positively skewed distribution means most of the scores are low and below
the mean value.
53. Most of the students who took the examination got scores above the mean. What is the
graphical representation of the score distribution?
A. Skewed to the left C. Scores are normally distributed
B. Skewed to the right D. positively skewed
Rationalization: A- skewed to the left means that most of the scores are above the mean. Hence
the graphical representation of the scores is skewed to the left.
54. Which statement best describes a negatively skewed score distribution?
A. The value of mean and median are equal.
B. Most examinees got scores above the mean.
C. The value of mode corresponds to a low score.
D. The value of median is higher than the value of mode.
Rationalization: D- using the formula T score = 10z + 50 and solve for z by substitution. The
value of z = 3, which means that the distance above the mean is three times the standard
deviation.
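The conversion used in this rationalization can be sketched as follows. The question itself does not appear in this copy, so the T score of 80 used below is an assumption: it is simply the value implied by z = 3 under the stated formula T = 10z + 50. The function names are illustrative.

```python
def t_from_z(z):
    """Convert a z-score to a T score using T = 10z + 50."""
    return 10 * z + 50

def z_from_t(t_score):
    """Invert T = 10z + 50 to recover the z-score."""
    return (t_score - 50) / 10

# Assumed value: a T score of 80 is what z = 3 implies under the formula.
print(z_from_t(80))
```

A z-score of 3 places the score three standard deviations above the mean, which matches the interpretation in the rationalization.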
56. The distribution of a class with academically poor students is more likely______.
A. Normally distributed C. skewed to the right
B. Skewed to the left D. leptokurtic
Rationalization: C- skewed to the right means that most of the students got scores below the mean,
which means that the examinees performed very poorly, or that most of the scores are low.
57. Teacher Paul conducted an item analysis and found that a significantly greater number of
students from the upper group of the class got test item number 10 correct. This means that
the test item_____________.
A. Has a negative discriminating power
B. Has a positive discriminating power
C. Has low reliability
D. Has high validity
Rationalization: B- positive discriminating power means that more students in the upper group
got the item correctly.
58. Mary Anne obtained a NAT percentile rank of 93. This implies that________.
A. She surpassed in performance 7% of the group.
B. She got a score of 93.
C. She answered 93 items correctly.
D. She surpassed in performance 93% of her fellow examinees.
Rationalization: D- percentile rank of 93 means that 93% of the examinees got a score below an
indicated score. Thus, Mary Anne surpassed in performance 93% of those who took the
examination.
59. Which instructional objective below is the highest level of Bloom’s Taxonomy?
A. Define fraction
B. Explain the different rules of addition of fractions
C. Add fractions correctly
D. Determine the steps in solving fractions
Rationalization: C- “add fractions correctly” is an instructional objective under application.
Hence, it is the highest level of Bloom’s Taxonomy among the given options.
61. Which of the following statements best describes the incorrect options in item analysis?
A. Determining the percentage equivalent of the cut off score.
B. Determining the highest score
C. Determining the effectiveness of distracters
D. Determining the cut-off score
62. When points in a scatter diagram are spread evenly in all directions this means that:
A. The correlation between two variables is positive.
B. The correlation between two variables is low.
C. The correlation between variables is high.
D. There is no correlation between two variables.
63. Roel’s score in Science test is 89 which is equal to 95th percentile. What does this mean?
A. 95% of Roel’s classmates got scores lower than 89.
B. 95% of Roel’s classmates got scores higher than 89.
C. Roel’s score is less than 89% of his classmates.
D. Roel’s score is higher than 95% of his classmates.
Rationalization: A- the statement can be written as P95 = 89. This means that 95% of
those who took the exam in Science got scores lower than Roel’s score of 89.
Rationalization: D- the area of -1SD to +1SD of a normal distribution is 68.26%. That is, from
-1SD to mean is 34.13% and from mean to +1SD is also 34.13%. Thus, the sum is 68.26%.
66. Teacher Kristy gave a chapter test. In which competency did her students find the greatest
difficulty? In the item with a difficulty index of _____________.
A. 0.25 C.0.75
B. 0.15 D. 1.00
Rationalization: B- the difficulty index is 0.15. The level of difficulty is very difficult.
67. Fifty students took a 40-item test in English. Below are their scores. (Items 67 to 68)
Scores Number of Students
11-15 7
16-20 10
21-25 7
26-30 20
31-35 6
68. Based on the data in item number 67, what percent of the scores are lower than 21?
A. 14% C. 20%
B. 17% D. 34%
Rationalization: D- 34 percent of the scores are lower than 21: (17/50) x 100% = 34%.
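The percentage can be reproduced from the frequency table in item 67. The following is a minimal sketch; the dictionary layout and the helper name percent_below are assumptions made for illustration.

```python
# Class intervals (low, high) and frequencies taken from item 67.
freq = {(11, 15): 7, (16, 20): 10, (21, 25): 7, (26, 30): 20, (31, 35): 6}

def percent_below(freq, cutoff):
    """Percent of scores in class intervals that lie entirely below cutoff."""
    total = sum(freq.values())
    below = sum(n for (low, high), n in freq.items() if high < cutoff)
    return below * 100 / total

print(percent_below(freq, 21))
```

Only the intervals 11-15 and 16-20 lie below 21, so 7 + 10 = 17 of the 50 scores are counted.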
69. Teacher Lawrence gave a test in Mathematics. The facility of item No. 10 is 75%. The best
way to describe item No. 10 is _______.
A. very easy C. average item
B. easy item D. difficult item
Rationalization: B- 75% of the students got item number 10 correct. The level of difficulty is easy.
70. At the end of the school year, all third year students presented their portfolio in English
subject. Students, teachers, and other stakeholders were asked to view and give their
comments regarding what was viewed. Which authentic assessment was organized?
A. Exhibits C. Conference
B. Program D. Seminar
71. The point of departure of an inter-quartile range, which indicates the spread of the scores,
is_________.
A. Upper limit C. mean
B. Median D. range
Rationalization: B- median. The value of the inter-quartile range represents the dispersion of the
middle 50% of the scores from the median.
72. The admissions office of a certain university conducted a qualifying test for five batches of
examinees. The number of qualifiers and their mean scores are presented below. What is the
mean score of all the qualifiers?
Batch        Number of Qualifiers    Mean Score
Batch I      20                      94
Batch II     10                      85
Batch III    15                      92
Batch IV     25                      87
Batch V      10                      95
A. 90.44
B. 90.60
C. 5.66
D. 92.00
Rationalization: A- the mean score of all the qualifiers is 90.44 (7235 divided by 80).
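Because the batches differ in size, the overall mean in item 72 is a weighted mean: each batch mean is multiplied by its number of qualifiers before the total is divided by the total number of qualifiers. A minimal sketch, with illustrative variable names:

```python
# (number of qualifiers, mean score) for Batches I to V, from item 72.
batches = [(20, 94), (10, 85), (15, 92), (25, 87), (10, 95)]

total_qualifiers = sum(n for n, _ in batches)
total_points = sum(n * batch_mean for n, batch_mean in batches)

# Weighted mean: total points divided by total number of qualifiers.
overall_mean = total_points / total_qualifiers

print(total_points, total_qualifiers, round(overall_mean, 2))
```

Simply averaging the five batch means would ignore the batch sizes and give a different result, which is why the weighting matters here.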
73. Joseph’s score in Science is 1.5 standard deviations above the mean of his group, and his score
in Mathematics is 2 standard deviations above the mean. What does this mean?
A. He excels both in Science and in Mathematics.
B. He is better in Mathematics than in Science.
C. He is better in Science than in Mathematics.
D. He does not excel in both subjects.
Rationalization: B- a standard score indicates the number of standard deviation units a score lies
above or below the mean. Joseph’s score is 2 standard deviations above the mean in Mathematics
but only 1.5 in Science. Hence, Joseph performed better in Mathematics than in Science.
74. The criterion of success in Teacher Ofel’s objective is that “the students must be able to get
80% of the test items correctly”. Luis and 24 other students in the class answered only 20
out of 25 items correctly. This means that teacher Ofel________.
A. Attained her lesson objective because of her effective problem solving drills.
B. Did not attain her lesson objective because her students lack of attention.
C. Attained her lesson objective.
D. Did not attain her lesson objective as far as the 25 students are concerned.
Rationalization: C- each of the 25 students answered 20 out of 25 items correctly, which is exactly
80%. Hence, teacher Ofel attained her objective.
75. The grading system of Department of Education is averaging. What is the average final
grade of Andie in English for four grading periods?
90 88 93 95 ?
A. 91.75 C. 94.00
B. 92.25 D. 95.00
76. The grading method which gives weight to the present grade and the previous grade of
the student, such as (1/3)(Third grading grade) + (2/3)(Fourth grading grade) = Final Grade, is
called__________.
A. Averaging
B. Criterion reference
C. Norm reference
D. Cumulative
Rationalization: D- the method of grading that involves the present grade and the previous
grade to compute the final grade is the cumulative method.
77. To increase the difficulty of a multiple-choice test item, which of the following should be
done?
A. Make the stem short and clear
B. Make the options homogeneous
C. Make it grammatically correct
D. Make the options equal in length
Rationalization: B- to increase the difficulty of a multiple-choice test item, make the options
belong to the same class; that is, they must be homogeneous.
Item No. 10    A*    B     C     D
Upper 27%      16    3     10    1
Lower 27%      14    6     8     2
78. Based on the table, which group got more correct answers?
A. Lower group C. can’t be determined
B. Upper group D. either lower group or upper group
79. The table shows that the item analyzed has _____________.
A. Positive discriminating power
B. Negative discriminating power
C. High validity index
D. High reliability index
Rationalization: A- more students from the upper group than from the lower group got the item
correct. Hence it has a positive discriminating power.
80. Based on the table in situation C, which is the most effective distracter?
A. Option A. C. Option C
B. Option B D. option D
Rationalization: B- option B is the most effective distracter since more students from the lower
group chose this incorrect option.
B. Option B D. option D
Rationalization: C- option C should be revised because more students from the upper group
chose this incorrect option. Hence it is not effective.
Rationalization: C- (30/60) x 100% = 50%. The difficulty index is 50%. Therefore the level of
difficulty is moderately difficult.
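The item-analysis figures for Item No. 10 can be reproduced from the upper and lower group counts in the table. The sketch below assumes the commonly used definitions: difficulty index = proportion of both groups answering correctly, and discrimination index = proportion correct in the upper group minus proportion correct in the lower group. The variable names are illustrative.

```python
# Option counts for Item No. 10; A is the keyed (correct) answer.
upper = {"A": 16, "B": 3, "C": 10, "D": 1}
lower = {"A": 14, "B": 6, "C": 8, "D": 2}
key = "A"

n_upper = sum(upper.values())
n_lower = sum(lower.values())

# Difficulty index: proportion of all examinees in both groups who got it right.
difficulty = (upper[key] + lower[key]) / (n_upper + n_lower)

# Discrimination index: upper-group proportion minus lower-group proportion.
discrimination = upper[key] / n_upper - lower[key] / n_lower

# A distracter is treated as effective when more lower-group than
# upper-group students choose it.
distracters = {
    option: "effective" if lower[option] > upper[option] else "needs revision"
    for option in upper
    if option != key
}

print(difficulty, round(discrimination, 2), distracters)
```

Under these assumed definitions the difficulty index is 30/60 = 0.50, the discrimination index is small but positive, and option C is the only distracter flagged for revision.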
85. Teacher Ritz wrote of Michael: “When Michael came to class this morning, he seemed
very tired and slouched into his seat. He took no part in the class discussion and seemed
to have no interest in what was being discussed. This was very unusual, for he has been
eager to participate and often monopolizes the class discussion.” What Teacher Ritz wrote
is an example of a/an_______.
A. Anecdotal report C. personality report
B. Observation report D. incidence report
Rationalization: A- anecdotal reports are notes written by the teacher regarding incidents in the
classroom that might need special attention in the future.
87. Teacher Jerick Ivan wants to test his students’ synthesizing skills. Which of the following
has the highest diagnostic value?
A. Completion test
B. Performance test
C. Essay test
D. Multiple-choice test
Rationalization: C- essay test is the most appropriate tool to measure the synthesizing skills of
the students.
89. The discriminating index of item number 15 is 0.44. This means that__________.
A. More students from the upper group got the item correctly.
B. More students from the lower group got the item correctly.
C. Equal number of students got the correct answer from the upper and lower
group.
D. The test item is very easy.
Rationalization: A- the discriminating index is 0.44 which means that the item is very
discriminating and more students from the upper group got the item correctly.
90. The difficulty index of item 20 is 0.55 and the discrimination index is 0.33. What should
the teacher do with this item?
A. Reject the item C. revise the item
B. Retain the item D. make the item bonus
Rationalization: B- the difficulty level is moderately difficult and the discriminating level is
discriminating. Therefore, the item should be retained.
91. The discriminating index of item number 1 is -0.15. This means that_______.
A. More students from the upper group got the item correctly.
B. More students from the lower group got the item correctly.
C. Equal number of students got the correct answer from the upper and lower
group.
D. The test item is very difficult.
Rationalization: B- the discrimination index is -0.15 which is negative, hence more students from
the lower group got the item correctly.
92. The score distributions of set A and set B have equal means but different SDs. Set A
has an SD of 2.75 while Set B has an SD of 3.25. Which statement is TRUE of the score
distributions?
A. Majority of the scores in set B are clustered around the mean.
B. Majority of the scores in set A are clustered around the mean than in set B.
C. Scores in set A are more widely scattered.
D. The scores of set B have less variability than the scores in set A.
Rationalization: B- the SD of A = 2.75, the SD of B = 3.25 and the means are equal. Therefore, the
scores in set A are more clustered around the mean than those in set B. The smaller the value of
SD, the closer the scores are, on average, to the mean.
93. About what percent of the cases fall between -2SD and +2SD in the normal curve?
A. 99.72 C. 68.26
B. 95.44 D. 34.13
Rationalization: B- the area of the normal curve from -2SD to +2SD is approximately 95.44%.
Illustration, from mean to 1SD is 34.13%; from mean to -1SD is also 34.13%. From 1SD to 2SD is
13.59%; from -1SD to -2SD is 13.59%. Thus, the area under the normal curve is 34.13% + 34.13% +
13.59% + 13.59% = 95.44%.
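The areas quoted here come from rounded table values. As a cross-check, the proportion of a normal distribution within k standard deviations of the mean can be computed with the error function from Python's math module. This is a minimal sketch, and the function name area_within is an assumption made for illustration.

```python
from math import erf, sqrt

def area_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Print the area between -k SD and +k SD as a percentage.
    print(k, round(area_within(k) * 100, 2))
```

The computed percentages differ from the table-based figures only in the second decimal place, because the table values of 34.13% and 13.59% are themselves rounded.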
94. In research, analysis of variance utilizing the F-test is the appropriate significance test
for the difference between:
A. Frequency
B. Median
C. Two means only
D. Three or more means
Rationalization: D- ANOVA using F-test is a statistical tool that is used to test the significant
difference of three or more means.
95. Skewed score distribution means:
A. The scores are normally distributed.
B. The mean and the median are equal.
C. The mode, the mean, and the median are equal.
D. The scores are concentrated more at one end or the other end of the
distribution.
Rationalization: D- when the scores are concentrated at the left part of the distribution, it is
skewed to the right, while when the scores are concentrated at the right part of the distribution,
it is skewed to the left. Options A, B, and C deal with the normal distribution.
Rationalization: A- Gabby performed better in spelling than 88% of his classmates. This means
that the performance of Gabby was compared to the performance of his classmates.
Norm-reference means comparing the performance of a certain student with the performance of
other students.
97. What type of validity is needed when you test course objectives and scopes?
A. Construct C. concurrent
B. Criterion D. content
98. Teacher Anne gives an achievement test to her 30 students. The test consists of 25 items.
She wants to compare her students’ performance based on the test results. What is the
appropriate measure of position?
A. Percentage C. Z-score
B. Percentile rank D. standard nine
99. Teacher V gives a 100-item multiple-choice test. Three students make scores of 94, 89 and
75, respectively, while the other 27 students in the class make scores ranging from 33 to
67. The measure of central tendency which best describes this group of 30 students
is:
A. Mean and median C. Mode
B. Mean D. Median
Rationalization: B- Mean is the most appropriate to describe the performance of the entire group
because it utilizes all the scores of the students. Thus, using the mean you can best describe
their group performance.
100. If a teacher gets the difference between the highest score and the lowest score, he
obtains the_____________.
A. Range C. standard deviation
B. Standard deviation D. index difficulty
Rationalization: A- Range is the difference between the highest and the lowest. Using the
formula: Range= Highest Score – Lowest Score.