
QUALITIES OF A GOOD MEASURING INSTRUMENT

A test is considered a measuring instrument. In the classroom, a test is used
to measure achievement. It is given to students to determine how much they
have learned.

There are three qualities of a good measuring instrument.

USABILITY is the degree to which the test could be used without undue
expenditure of resources. Five factors determine the usability of a test.

1. Ease of Administration (Administrability). A test should be easy to
administer. Somebody other than the teacher who constructed the
test should be able to administer the instrument to students. This
attribute is especially crucial in basic education, where the class
adviser, not the subject teacher, does the testing. Thus, instructions
ought to be complete and clear so that students know exactly what
to do.

2. Ease of Scoring (Scorability). A test should be easy to score. Answer
keys and model answers should be complete and comprehensive. A
good answer key contains all acceptable answers. Model answers
must likewise include the criteria and scoring guide.

3. Ease of Interpretation and Application (Interpretability and
Applicability). Scores obtained from a test should send a
straightforward message about students’ performance,
particularly their strengths and weaknesses. A test designed so
that each lesson corresponds to one particular part of the test is
advantageous.

4. Low Cost. A test should be practical; it need not be expensive.
Supplies and materials (e.g., paper, ink) should be used
economically, without sacrificing the quality of the test.

5. Proper Mechanical Make-up. The physical appearance of a test
mirrors its quality. Essential features include layout, margins, font
style, font size, and print quality, among others. Mechanics, such as
punctuation, spelling, and grammar, should be considered as well.
PCK 135/JLSMBalagtey
VALIDITY is the degree to which a test measures what it intends to measure.
There are three types of validity.

1. Content-Related Validity pertains to the degree to which the test is
representative of the domain of interest. A valid test contains items
according to its design, particularly in terms of its objectives, coverage,
and cognitive levels. To check the content-related validity of a test,
compare its items against the table of specifications.

2. Construct-Related Validity pertains to the degree to which the totality
of evidence obtained is consistent with theoretical expectations. A test
should therefore encompass the theoretical attributes of the
characteristic being measured.

3. Criterion-Related Validity pertains to the relationship between
students’ scores on a test and another measure of the same trait.

   • Concurrent Evidence of Validity. The degree to which scores
     obtained on one test are related to the scores on another
     instrument administered at the same time, or to some other
     criterion available at the same time.

   • Predictive Evidence of Validity. The degree to which scores
     obtained on a test predict characteristics of individuals in a
     future situation.

Enhancing the Validity of a Test

1. Ask others to judge the clarity of what you are assessing.

2. Check to see if different ways of assessing the same thing give the same
result.

3. Sample a sufficient number of examples of what is being assessed.

4. Prepare a detailed table of specifications.

5. Ask others to judge the match between the assessment items and the
objective of the assessment.

6. Compare groups known to differ on what is being assessed.


7. Compare scores taken before to those taken after instruction.

8. Compare predicted consequences to actual consequences.
9. Compare scores in similar but different traits.

10. Provide adequate time to complete the assessment.

11. Ensure appropriate vocabulary, sentence structure, and item difficulty.

12. Ask easy questions first.

13. Use different methods to assess the same thing.

14. Use the assessment only for its intended purposes.

RELIABILITY is concerned with the consistency, stability, and dependability
of results. There are four types:

1. Stability (Test-Retest Method). Involves administering the same test
twice to the same group after a certain time interval has elapsed.

2. Equivalence (Parallel Forms Method). Two different but equivalent
instruments are administered to the same group of individuals. The
equivalent instruments may be administered with or without a time
interval.

3. Internal Consistency. Requires only a single administration of an
instrument. This may be tested using the following procedures:

   a. Split-Half Procedure. Involves scoring two halves of an
      instrument separately for each individual.

   b. Kuder-Richardson Approaches. Analyze items that are scored
      dichotomously as right (“1”) or wrong (“0”).

   c. Alpha Coefficient. Used for items that are not scored simply as
      right versus wrong.

4. Interrater Agreement. Two or more raters independently score the
same performances; reliability is indicated by the degree to which
their scores agree.
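The Kuder-Richardson approach above is commonly reported as KR-20, computed as KR20 = [k/(k-1)]·(1 - ∑pq/σ²), where k is the number of items, p is the proportion of students answering an item correctly, q = 1 - p, and σ² is the variance of the total scores. A minimal sketch in Python, using hypothetical data (the function name and the convention of population variance are my own choices, not part of this handout):

```python
def kr20(score_matrix):
    """Kuder-Richardson Formula 20 for dichotomously scored (1/0) items.

    score_matrix: one list per student, with a 1 (right) or 0 (wrong)
    entry per item. Uses the population variance of total scores.
    """
    n_students = len(score_matrix)
    k = len(score_matrix[0])  # number of items

    # Sum of p*q across items, where p = proportion answering correctly
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in score_matrix) / n_students
        sum_pq += p * (1 - p)

    # Variance of the students' total scores
    totals = [sum(row) for row in score_matrix]
    mean = sum(totals) / n_students
    variance = sum((t - mean) ** 2 for t in totals) / n_students

    return (k / (k - 1)) * (1 - sum_pq / variance)

# Hypothetical data: 4 students, 3 right/wrong items
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # values closer to 1 indicate more consistent items
```

Like the other internal-consistency procedures, this requires only a single administration of the instrument.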
Enhancing Reliability

1. Use a sufficient number of items or tasks.

2. Use independent raters or observers who provide similar scores to
the same performances.

3. Construct items and tasks that clearly differentiate students on what
is being assessed.

4. Make sure the assessment procedures and scoring are as objective as
possible.

5. Continue assessment until results are consistent.

6. Eliminate or reduce the influence of extraneous events or factors.

7. Use several shorter assessments rather than a few long ones.

MATCHING TWO SETS OF SCORES
Used for Assessing the Criterion-Related Validity and Reliability of Tests

A teacher is evaluating the validity of a 20-item Mathematics test he
constructed. He administered the test to 10 students. Then, he looked for
a Mathematics test of known validity and administered it to the same group
of students. Scores are recorded in the table below:

Student   Score in Teacher-Made Test (X)   Score in Standardized Test (Y)
A                      12                                 14
B                      15                                 17
C                      10                                 12
D                      15                                 16
E                      18                                 19
F                      20                                 18
G                       9                                 11
H                      13                                 15
I                      17                                 19
J                      16                                 18

First, graph the scores in a scatterplot. The first point is given for you.

NOTE: No need to connect the dots.

What is the direction of the dots? Are they sloping upwards to the right,
or downwards to the right?

A graph whose dots slope upwards to the right shows that there is a direct
relationship between the scores in the teacher-made test and the scores in
the standardized test. A direct relationship means that students who scored
high in the teacher-made test also scored high in the standardized test,
and those who scored low in the teacher-made test also scored low in the
standardized test.

On the other hand, a graph whose dots slope downwards to the right shows
that there is an inverse relationship between the scores in the
teacher-made test and the scores in the standardized test. An inverse
relationship means that students who scored high in the teacher-made test
scored low in the standardized test, and those who scored low in the
teacher-made test scored high in the standardized test.
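The direction described above can also be checked numerically: the sign of the covariance between X and Y tells you whether the dots slope upwards (positive, direct) or downwards (negative, inverse). A small illustrative sketch in Python, using the scores from the example (the variable names are my own):

```python
x = [12, 15, 10, 15, 18, 20, 9, 13, 17, 16]   # teacher-made test (X)
y = [14, 17, 12, 16, 19, 18, 11, 15, 19, 18]  # standardized test (Y)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: average product of deviations from the two means
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

direction = "direct" if cov > 0 else "inverse" if cov < 0 else "none"
print(direction)  # prints "direct" for this data
```

A positive covariance only gives the direction, not the strength; the correlation coefficient introduced below standardizes it to a fixed scale.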

Next, imagine a straight line drawn through the dots. Do the dots fall
exactly on a straight line? If so, we can say that there is a perfect
correlation between the teacher-made test and the standardized test. A
perfect relationship implies that the high X-high Y (low X-low Y) or high
X-low Y (low X-high Y) trend is true for all students.

A more precise description of the match between scores in the teacher-made
test and the standardized test employs a statistical tool called the
correlation coefficient. The correlation coefficient, denoted by the
lowercase letter r, is a numerical index that describes the relationship
between two sets of scores. An r value is composed of two elements: the
sign and the number. The sign of r indicates the direction of the graph. A
positive r value means that the dots are sloping upwards to the right, thus
a direct relationship. A negative r value means that the dots are sloping
downwards to the right, thus an inverse relationship.

The number denotes the magnitude of the relationship, i.e., how strong the
relationship is between scores in X and scores in Y. The magnitude ranges
from 0.00 to 1.00. The relationship is stronger as r approaches 1.00 and
weaker as it gets closer to 0.00. A perfect relationship has an r value
equal to 1.00.

The r value is obtained using an equation called the Pearson
Product-Moment Correlation Coefficient (PPMCC or Pearson’s r):

r_XY = [n ∑XY − (∑X)(∑Y)] / √{[n ∑X² − (∑X)²][n ∑Y² − (∑Y)²]}
where r_XY = correlation between X and Y
      n    = number of students
      ∑X   = sum of scores in X
      ∑Y   = sum of scores in Y
      ∑XY  = sum of the products of scores in X and Y
      ∑X²  = sum of squared scores in X
      ∑Y²  = sum of squared scores in Y
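The formula translates directly into Python. The sketch below (the function name pearson_r is my own) computes each sum from the raw score lists and applies the equation; completing the table by hand should yield the same sums:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson Product-Moment Correlation Coefficient,
    computed straight from the raw-score formula."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Scores from the example
x = [12, 15, 10, 15, 18, 20, 9, 13, 17, 16]   # teacher-made test (X)
y = [14, 17, 12, 16, 19, 18, 11, 15, 19, 18]  # standardized test (Y)
print(round(pearson_r(x, y), 2))
```

Run it to check your hand computation after you have completed the table and substituted the values yourself.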

To help us compute the r value between the students’ scores in the
teacher-made test and the standardized test, let us complete a computation
table. A column labeled X² has been added. Numbers under this column are
the squared scores in the teacher-made test. For Student A, whose score in
the teacher-made test is 12, the corresponding X² is equal to 144. There is
also an additional column labeled Y². Numbers under this column are the
squared scores in the standardized test. For Student A, whose score in the
standardized test is 14, the corresponding Y² is equal to 196. Another
additional column is labeled XY. Numbers under this column are the products
of the scores in the teacher-made test and the standardized test. For
Student A, whose score in the teacher-made test is 12 and score in the
standardized test is 14, the corresponding XY is equal to 168. Finally, to
complete the table, find the sum of the numbers in each column.

Student    X     X²     Y     Y²     XY
A         12    144    14    196    168
B         15           17
C         10           12
D         15           16
E         18           19
F         20           18
G          9           11
H         13           15
I         17           19
J         16           18
        ∑X =  ∑X² =  ∑Y =  ∑Y² =  ∑XY =
Now you are ready to substitute the values in the formula. Then simplify
using the GEMDAS rule (Grouping, Exponents, Multiplication or Division,
Addition or Subtraction). Do it in the box below.
What r value did you obtain?

A number does not mean anything unless we interpret it. The magnitude
(absolute value) of r may be described using the following equivalents:

1.00        Perfect
0.7 – 0.9   Strong
0.4 – 0.6   Moderate
0.1 – 0.3   Weak
0.00        None

For instance, an r value of -0.45 may be described as a moderate inverse
relationship. Another r value equal to +0.20 may be described as a weak
direct relationship.
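The two-part reading of r (sign for direction, magnitude for strength) can be captured in a small helper function. A sketch; the function name and the boundary handling are my own choices, since the table above leaves gaps between bands (e.g., 0.35), so here each band is extended up to the next cutoff:

```python
def describe_r(r):
    """Describe a correlation coefficient using the handout's bands.
    Boundary handling is an assumption: each band runs up to the
    next cutoff (e.g., 0.35 counts as weak)."""
    magnitude = abs(r)
    if magnitude >= 1.0:
        strength = "perfect"
    elif magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    elif magnitude > 0.0:
        strength = "weak"
    else:
        return "no relationship"

    direction = "direct" if r > 0 else "inverse"
    return f"{strength} {direction} relationship"

print(describe_r(-0.45))  # prints "moderate inverse relationship"
print(describe_r(0.20))   # prints "weak direct relationship"
```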

So, how do you describe the relationship between the scores in the teacher-
made test and standardized test in our example?

Next, we need to answer the question of whether the teacher-made test is
valid or reliable. For this, we use a reference value of 0.60. That is, a
test is considered valid/reliable only if the r value is greater than or
equal to 0.60.

Using this reference value, does the teacher-made test have concurrent
validity?
