Good Test Adopted 2
Practicality
✔ Practicality can be defined as the relationship between the resources available for the
test (human resources, material resources, time, etc.) and the resources required to
design, develop, and use the test (Bachman & Palmer, 1996:35-36).
Cost
▪ The test should not be too expensive to conduct.
▪ The cost for the test has to stay within the budget.
▪ Avoid conducting a test that requires an excessive budget.
What do you think if a teacher conducts an “Ulangan Harian” (daily quiz) for one class of 30
students at SMP (junior high school) level that costs IDR 500.000 per student?
Is it practical in terms of cost?
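The question above comes down to simple arithmetic. The sketch below treats practicality as the ratio of available to required resources, which is one way to read the "relationship" in the definition of practicality. The function name `practicality_ratio` and the available budget of IDR 1.500.000 are assumptions made for illustration only; the class size and cost per student come from the scenario.

```python
# Sketch: practicality as the ratio of available to required resources.
# The available budget is a hypothetical figure, not taken from the source.

def practicality_ratio(available: float, required: float) -> float:
    """Return available / required; a value below 1 suggests the test is not practical."""
    if required <= 0:
        raise ValueError("required resources must be positive")
    return available / required

students = 30
cost_per_student = 500_000                      # IDR, from the scenario
required_budget = students * cost_per_student   # total cost of the test
available_budget = 1_500_000                    # IDR, hypothetical class budget

ratio = practicality_ratio(available_budget, required_budget)
print(f"Required budget: IDR {required_budget:,}")
print(f"Practicality ratio: {ratio:.2f}")
```

Under these assumed figures the ratio is far below 1, so the test would not be practical in terms of cost.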
Time
▪ The test should stay within appropriate time constraints.
▪ The test should not be too long or too short.
What do you think if a teacher wants to conduct a language proficiency test that takes a
student ten hours to complete? Is that practical in terms of time?
Administration
▪ The test should not be too complicated or complex to conduct.
▪ The test should be quite simple to administer.
What do you think if a teacher in a remote area, whose students know nothing about
computers, conducts a test that requires the test-takers to know at least how to interact with
a computer in order to complete the test?
Is it practical in terms of administration?
Scoring / Evaluation
▪ The scoring/evaluation process should fit within the time allocated.
▪ A test should be accompanied by scoring rubrics, answer keys, and so on to make it easy to
score/evaluate.
What do you think if a teacher conducts a test that takes students a couple of minutes to
complete but takes the teacher several hours to score/evaluate?
Is it practical in terms of scoring/evaluation?
Validity
✔ The validity of a test is the extent to which it measures exactly what it is supposed to
measure (Hughes, 2003:26).
✔ A test must aim to provide a true measure of the particular skill it is intended to
measure, and should not measure external knowledge and other skills at the same
time (Heaton, 1990:159). For example, if a student is given a reading test about the
metamorphosis of a butterfly, a valid test will measure reading ability (such as identifying
general or specific information in the text), not his/her prior knowledge (biology) of the
metamorphosis of a butterfly. The test should make the student rely on his/her reading ability
to complete the test.
Brown (2004:22-27) proposed five ways to establish validity. They are:
1. Content Validity
2. Criterion Validity
3. Construct Validity
4. Consequential Validity
5. Face Validity
Content Validity
▪ The correspondence between the content of the test and the language skills, structures, etc.
that it is meant to measure has to be crystal clear.
▪ The test items should really represent the course objectives.
What do you think if a listening test requires students to read passages instead of
requiring them to listen attentively? Does the test have content validity?
Criterion Validity
▪ This kind of validity emphasizes the relationship between the test score and the outcome.
▪ The test score should really represent the criterion that the test is intended to measure.
▪ Criterion validity can be established in two ways.
1. Concurrent Validity
A test is said to have concurrent validity if its results are supported by other concurrent
performance beyond the assessment itself (Brown, 2004:24). For example, the validity of a
high score on the final examination of a foreign language course will be verified by actual
proficiency in the language.
2. Predictive Validity
Predictive validity concerns how well a test assesses and predicts a student’s likely future
success (Alderson et al., 1995:180-183). For example, TOEFL® or IELTS tests are
intended to indicate how well someone will be able to use his/her English
in the future.
Construct Validity
▪ Construct validity refers to the concepts or theories underlying the use of a certain
ability, including language ability.
▪ Construct validity shows that the test results really represent the same construct as
the student ability being measured (Djiwandono, 1996:96).
Consequential Validity
▪ Consequential validity refers to the social consequences of using a particular test for a
particular purpose.
▪ The use of a test is said to have consequential validity to the extent that society benefits from
that use of the test.
Face Validity
▪ A test is said to have face validity if it looks to other testers, teachers, moderators, and
students as if it measures what it is supposed to measure (Heaton, 1990:159).
▪ In a speaking test, for instance, face validity can be shown by making speaking activities the
main activities in the test. The test should focus on students' speaking activities, not anything else.
▪ A test can be judged to have face validity simply by looking at its items.
▪ Note that face validity can affect how students perform on the test (Brown, 2004:27; Heaton,
1988:160).
▪ To address this, the test constructor should consider the following:
a. Students will be more confident if they face a well-constructed, expected format with familiar
tasks.
b. Students will be less anxious if the test is clearly doable within the allotted time limit.
c. Students will be optimistic if the items are clear and uncomplicated (simple).
d. Students will find it easy to do the test if the directions are very clear.
e. Students will be less worried if the tasks are related to their course work (content validity).
f. Students will be at ease if the difficulty level presents a reasonable challenge.
Reliability
✔ Reliability refers to the consistency of the scores obtained (Gronlund, 1977:138).
✔ This means that if the test is administered to the same students on different occasions (with
no language practice taking place between these occasions), it produces (almost) the same
results.
✔ Reliability is less about the test itself than about the results of the test: the test results
should be consistent.
✔ Reliability falls into four kinds (Brown, 2004:21-22). They are:
1) Student-Related Reliability
2) Rater Reliability
3) Test Administration Reliability
4) Test Reliability
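The idea of giving the same test to the same students on two occasions can be made concrete with a correlation between the two sets of scores. The sketch below is a minimal illustration, not a method prescribed by the sources cited here: it uses Pearson's correlation coefficient as the measure of consistency, and the function name `pearson_r` and all scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of the same five students on two occasions
first_sitting = [70, 85, 60, 90, 75]
second_sitting = [72, 83, 62, 91, 74]

r = pearson_r(first_sitting, second_sitting)
print(f"Test-retest correlation: {r:.2f}")
```

A value close to 1 indicates that students keep roughly the same relative standing on both occasions, which is what consistent results look like in numbers.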
Student-Related Reliability
▪ This kind of reliability is affected by temporary illness, fatigue, a bad day, anxiety, and other
physical or psychological factors in the students. As a result, the score a student obtains
may not be his/her actual score.
Rater Reliability
▪ Rater reliability deals with the scoring process. Factors that can affect reliability include
human error, subjectivity, and bias in the scoring process.
▪ This kind of reliability falls into two categories. They are:
1. Inter-rater reliability
It is at risk when two or more scorers yield inconsistent scores for the same test, possibly
because of lack of attention to scoring criteria, inexperience, inattention, or even bias.
2. Intra-rater reliability
Inconsistency within a single scorer is a common occurrence for classroom teachers because of
unclear scoring criteria, fatigue, bias toward particular “good” or “bad” students, or simple
carelessness.
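Agreement between two scorers can be checked numerically. The sketch below is one possible illustration, not a procedure taken from the sources cited here: it computes Cohen's kappa, a statistic that corrects the raw proportion of agreement for agreement expected by chance. The function name `cohen_kappa` and the grades are hypothetical. The same function could be applied to one teacher's first and second marking of the same scripts to examine intra-rater consistency.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical grades to the same scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same, non-empty set of scripts")
    n = len(rater_a)
    # Proportion of scripts on which the raters gave the same grade
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's grade frequencies
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single grade only")
    return (observed - expected) / (1 - expected)

# Hypothetical grades given by two teachers to the same eight essays
teacher_1 = ["A", "B", "B", "C", "A", "B", "C", "C"]
teacher_2 = ["A", "B", "C", "C", "A", "B", "B", "C"]

kappa = cohen_kappa(teacher_1, teacher_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance; clear scoring rubrics are one way to push the value upward.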
Test Reliability
▪ Test reliability refers to the test itself, for example whether the test fits within the time
constraints.
▪ It means that the test should not be too long or too short.
▪ The test items should be crystal clear so that they do not give rise to ambiguity.
Authenticity
✔ Authenticity deals with the “real world”.
✔ Authenticity is the degree of correspondence of the characteristics of a given language test
task to the features of a target language task (Brown, 2004:28).
✔ Teachers should construct tests whose items are likely to be used or applied in the
real contexts of daily life.
✔ Brown (2004:28) also proposes considerations that can help to build authenticity
into a test. They are:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) to the learners.
4. Some thematic organization to items is provided, such as
through a story or episode.
5. Tasks represent, or closely approximate, real-world tasks.