
Article Assessment

The document outlines the definitions and differences between testing, assessment, and evaluation in educational contexts. It details various types of assessments, including formative and summative assessments, as well as different test types such as diagnostic, placement, and proficiency tests. Additionally, it discusses key principles of language assessment, emphasizing practicality, reliability, validity, authenticity, and washback.

Testing and Evaluation

By : Ana MARYBEL Cajahuaman Paucar


1. DEFINITION OF TESTING :

It is a technique for obtaining the information needed for evaluation purposes. Tests, quizzes, and measuring instruments are devices used to obtain such information.

A TEST is « a method of measuring a person’s ability or knowledge in a given area ».
2. DEFINITION OF ASSESSMENT :
• It is the process of collecting information or evidence of a learner’s learning progress and achievement over a period of time in order to improve teaching and learning.
• Assessment is typically used to describe processes that examine or measure student learning resulting from academic programs.
• Assessment is an ongoing process aimed at improving student learning.
• Assessment is not based on one test or one task, nor is it expressed by a mark or grade, but rather in a report form with scales or levels as well as descriptions and comments from the teacher.

2.1. TYPES OF ASSESSMENT:
1. Formative Assessment : used by teachers to check on the progress of their students, to see how far they have mastered what they should have learnt, and then to modify their future teaching plans in light of this information.
*Informal assessment is a part of formative assessment. It can take a number of forms : unplanned comments, verbal feedback to students, observing students perform a task or work in small groups, and so on.

2. Summative Assessment : used at the end of the term, semester, or year in order to measure what has been achieved both by groups and by individuals.
*Formal assessment is part of summative assessment, i.e. exercises or procedures which are systematic and give students and teachers an appraisal of students’ achievement.

Research Methodology : https://www.facebook.com/groups/689682314444236/

3. DEFINITION OF EVALUATION:
It is the process of making an overall judgment about one’s work or a whole school’s work.
Evaluation is typically a broader concept than assessment, as it focuses on the overall, or summative, experience.

When we ASSESS our students we are commonly interested in “how and how much our students have learnt”, but when we EVALUATE them we are concerned with “how the learning process is developing”.

4. WHAT ARE THE MAIN REASONS FOR TESTING ?

a. Achievement/Attainment tests: usually more formal, designed to show mastery of a particular syllabus (e.g. end-of-year tests, school-leaving exams, public tests), though similar to progress tests with regard to the syllabus. Rarely constructed by the classroom teacher for a particular class. Designed primarily to measure individual progress rather than as a means of motivating or reinforcing language.
b. Progress Tests: Most classroom tests take this form. They assess the progress students make in mastering material taught in the classroom and are often given to motivate students. They also enable students to assess the degree of success of teaching and learning and to identify areas of weakness and difficulty. Progress tests can also be diagnostic to some degree.
c. Diagnostic Tests : can include Progress, Achievement and Proficiency tests, enabling teachers to identify specific weaknesses/difficulties so that an appropriate remedial programme can be planned. Diagnostic Tests are primarily designed to assess students' knowledge and skills in particular areas before a course of study is begun. Reference back to class-work. Motivation. Remedial work.

d. Placement Tests : sort new students into teaching groups so that they are at approximately the same level as others when they start. Present standing. General ability rather than specific points of learning. Variety of tests necessary. Reference forward to future learning. Results of Placement Tests are needed quickly. Administrative load.
e. Proficiency Tests : measure students' achievements in relation to a specific task which they are later required to perform (e.g. follow a university course in the English medium; do a particular job). Reference forward to particular application of language acquired: future performance rather than past achievement. They rarely take into account the syllabus that students have followed. Definition of operational needs. Practical situations. Authentic strategies for coping. Common standard, e.g. a driving test, regardless of previous learning. Application of a common standard whether the syllabus is known or unknown.
f. Aptitude Tests: measure students' probable performance. Reference forward, but can be distinguished from proficiency tests. Aptitude tests assess proficiency in language for language use (e.g. will the student experience difficulty in identifying sounds or the grammatical structure of a new language?) while Proficiency tests measure adequacy of control in L2 for studying other things through the medium of that language.
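
The sorting step described for placement tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut-off scores, the level names, and the function names `place` and `group_by_level` are hypothetical and do not come from the notes above; a real institution would set its own bands.

```python
from bisect import bisect_right

# Hypothetical cut-off scores (out of 100) and level names, chosen only
# for illustration; they are not taken from the notes.
CUTOFFS = [40, 60, 80]
LEVELS = ["Beginner", "Elementary", "Intermediate", "Advanced"]


def place(score):
    """Return the teaching level for one placement-test score."""
    # bisect_right counts how many cut-offs the score has reached.
    return LEVELS[bisect_right(CUTOFFS, score)]


def group_by_level(scores):
    """Map each level to the sorted names of the students placed in it."""
    groups = {level: [] for level in LEVELS}
    for name, score in scores.items():
        groups[place(score)].append(name)
    return {level: sorted(names) for level, names in groups.items()}


if __name__ == "__main__":
    sample = {"Ana": 35, "Luis": 62, "Rosa": 88, "Juan": 59}
    for level, names in group_by_level(sample).items():
        print(level, names)
```

Because placement results are needed quickly, a fixed table of cut-offs like this keeps the administrative step mechanical; the judgment lies in choosing the bands.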

5. TYPES OF TESTS :

Proficiency tests
*They are designed to measure students’ ability in a language.
*The content is not based on the content or objectives of language courses that people taking the test may have followed.
*It is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.
*A proficiency test is a test which measures how much of a language a person knows or has learnt. It is not bound to any curriculum or syllabus, but is intended to check the learners’ language competence. An example of such tests is the American Test of English as a Foreign Language (hereafter TOEFL), which is used to measure learners’ general knowledge of English in order to allow them to enter higher educational establishments or to take up a job in the USA.
*A proficiency test is used to assess the general knowledge or skill commonly required for entry into a group of similar institutions. Because of the general nature of proficiency decisions, a proficiency test must be designed so that the general abilities and skills of students are reflected in a wide distribution of scores. Thus, proficiency decisions must be based on the best obtainable proficiency test scores as well as other information about students.

Achievement tests
*Achievement tests are “more formal”, whereas Hughes (1989:8) assumes that this type of test will fully involve teachers, for they will be responsible for preparing such tests and giving them to the learners.
*They are directly related to language courses, their purpose being to establish how successful individual students, or the courses themselves, have been in achieving objectives.
*An achievement test is given at the end of the course to check the acquisition of the material covered during the study year.
*There are 2 types :
Final achievement tests : administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. The content of these tests must be related to the courses with which they are concerned.
Progress achievement tests : intended to measure the progress that the students are making. They contribute to formative assessment.
*Achievement tests must be not only very specifically designed to measure the objectives of a given course but also flexible enough to help teachers readily respond to what they learn from the test about the students’ abilities, the students’ needs, and the students’ learning of the course objectives.
*Achievement tests are mainly given at definite times of the school year. Moreover, they could be extremely crucial for the students, for they are intended to make the students either pass or fail the test. Alderson (ibid.) mentions two usage types of achievement tests: formative and summative. The notion of a formative test denotes the idea that the teacher will be able, after evaluating the results of the test, to reconsider his/her teaching and syllabus design and even slow down the pace of studying to consolidate the material if necessary in future. Summative usage will deal precisely with the students’ success or failure. The teacher can immediately take up remedial activities to improve a situation.
*Students are tested to find out how much each person has learnt within the program. Achievement decisions are about the amount of learning that students have done. They are flexible to help teachers respond to what they learn from the test about students’ ability, students’ needs and students’ learning of the course objectives.

Diagnostic tests
*They are used to identify learners’ strengths and weaknesses.
*They are intended to ascertain what learning still needs to take place. A diagnostic test helps us evaluate our teaching, the syllabus, and the material used, in addition to locating difficulties and planning appropriate remedial teaching.
*Diagnostic testing often requires more detailed information about the very specific areas in which students have strengths and weaknesses.
*The purpose is to help students and their teachers to focus their efforts where they will be most effective.
*The most effective use of a diagnostic test is to report the performance level on each objective (as a percentage) to each student so that he or she can decide how and where to invest time and energy most profitably.
*They are designed to determine the degree to which the specific objectives of the course have been accomplished as well as to assess students’ strengths and weaknesses, so as to correct individual deficiencies before it’s too late. These tests aim at fostering achievement by promoting strengths and eliminating weaknesses of students. In other words, the purpose of this type of test is to diagnose students’ problems during the learning process to focus their efforts where they will be most effective.

Placement tests
*They are intended to provide information that will help to place students at the stage of the teaching programme most appropriate to their abilities.
*They are used to assign students to classes at different levels. Placement tests are designed to help decide what each student’s appropriate level will be within a specific program, skill area, or course.
*The purpose of such tests is to reveal which students have more of, or less of, a particular knowledge or skill so that students with similar levels of ability can be grouped together.
*The placement test could typically take the form of dictations, interviews, grammar tests, etc.
*A placement test is designed and given in order to use the information about the students’ knowledge for putting the students into groups according to their level of the language.
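
The per-objective report mentioned under diagnostic tests (performance on each objective, as a percentage, for each student) can be illustrated with a small Python sketch. The function name `objective_report`, the item identifiers, and the objective labels are hypothetical and serve only as an example; the notes do not prescribe any particular format.

```python
from collections import defaultdict


def objective_report(item_objectives, responses):
    """Return the percentage of items answered correctly per objective.

    item_objectives: maps item id -> objective name
    responses: maps item id -> True (correct) or False (incorrect);
               an unanswered item counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, objective in item_objectives.items():
        total[objective] += 1
        if responses.get(item, False):
            correct[objective] += 1
    return {obj: 100.0 * correct[obj] / total[obj] for obj in total}


if __name__ == "__main__":
    # Hypothetical six-item test covering three objectives.
    items = {
        "q1": "past tense", "q2": "past tense",
        "q3": "articles", "q4": "articles",
        "q5": "prepositions", "q6": "prepositions",
    }
    answers = {"q1": True, "q2": True, "q3": True,
               "q4": False, "q5": False, "q6": False}
    for objective, pct in objective_report(items, answers).items():
        print(f"{objective}: {pct:.0f}%")
```

A report of this kind points the student to the weakest objective first, which is the use of diagnostic results that the notes describe.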

6. PRINCIPLES OF LANGUAGE ASSESSMENT
There are five principles of language assessment: practicality, reliability, validity, authenticity, and washback.

6.1. PRACTICALITY

An effective test is practical. This means that it:
• is not excessively expensive.
A test that is prohibitively expensive is impractical.
• stays within appropriate time constraints.
A test of language proficiency that takes a student 10 hours to complete is impractical.
• is relatively easy to administer.
A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations.
• has a scoring/evaluation procedure that is specific and time efficient.
A test that can be scored only by computer is impractical if it takes place a thousand miles away from the nearest computer.
Furthermore, for a test to be practical:
• administrative details should be clearly established before the test,
• students should be able to complete the test reasonably within the set time frame,
• all materials and equipment should be ready,
• the cost of the test should be within budgeted limits,
• the scoring/evaluation system should be feasible in the teacher’s time frame.

Validity and reliability alone are not enough to build a test; the test should also be practical in terms of time, cost, and energy. As regards time and energy, tests should be efficient to make, take, and evaluate. Tests must also be affordable. A valid and reliable test is of little use if it cannot be administered in remote areas because it requires an expensive computer (Heaton, 1975: 158-159; Weir, 1990: 34-35; Brown, 2004: 19-20).

6.2. RELIABILITY
A reliable test is consistent and dependable. A number of sources of unreliability may be identified:
a. Student-related Reliability
A test may yield unreliable results because of factors beyond the control of the test taker, such as illness, fatigue, a “bad day”, or no sleep the night before.

b. Rater (scorer) Reliability
Rater reliability sometimes refers to the consistency of scoring by two or more scorers. Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test, possibly through lack of attention to scoring criteria, inexperience, or inattention. Intra-rater unreliability arises from unclear scoring criteria, fatigue, and bias toward particular “good” and “bad” students.
c. Test Administration Reliability
Unreliability may result from the conditions in which the test is administered. An example is a test of aural comprehension with a tape recorder: when the tape recorder played the items, the students sitting next to the windows could not hear the tape accurately because of the street noise outside the building.
d. Test Reliability
If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly.
Test and test administration reliability can be achieved by making sure that all students receive the same quality of input. Part of achieving test reliability depends on the physical context: making sure, for example, that every student has a cleanly photocopied test sheet, that sound amplification is clearly audible to everyone in the room, that video input is equally visible to all, and that lighting, temperature, and other classroom conditions are equal (and optimal) for all students.

Reliability refers to consistency and dependability. The same test delivered to the same student on different occasions must yield the same results. Factors affecting reliability are (Heaton, 1975: 155-156; Brown, 2004: 21-22):
1. student-related reliability: students' personal factors such as motivation, illness, or anxiety can keep them from their ‘real’ performance,
2. rater reliability: either intra-rater or inter-rater scoring can introduce subjectivity, error, or bias,
3. test administration reliability: when the same test is administered on different occasions, the results can differ,
4. test reliability: this concerns the duration of the test and the test instructions. If a test takes a long time to do, it may affect the test takers' performance through fatigue, confusion, or exhaustion. Some test takers do not perform well in a timed test. Test instructions must be clear for all test takers since they are affected by mental pressure.
Several methods are used to estimate the reliability of an assessment (Heaton, 1975: 156; Weir 1990: 32; Gronlund and Waugh, 2009: 59-64). They are:
1. test-retest/re-administer: the same test is administered again after a lapse of time. The two sets of scores are then correlated.
2. parallel form/equivalent-forms method: two equivalent tests are administered at the same time to the same test takers. The results of the tests are then correlated.
3. split-half method: a test is divided into two halves and corresponding scores are obtained; the extent to which they correlate with each other governs the reliability of the test as a whole.
4. test-retest with equivalent forms: a mixed method of test-retest and parallel form. Two equivalent tests are administered to the same test takers on different occasions.
5. intra-rater and inter-rater: when one person scores the same test at different times, it is intra-rater. Some hints to minimize unreliability are using a rubric, avoiding fatigue, giving scores on the same numbers, and asking students to write their names on the back of the test paper. When two people score the same test, it is inter-rater. The tests done by test takers are divided into two. A rubric must be developed and discussed first so that the raters share the same perception. The two scores, from either intra- or inter-rater scoring, are then correlated.
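
The methods above all end in the same step: two sets of scores are correlated. The Python sketch below shows one way this might be done. It assumes the Pearson correlation coefficient as the measure of correlation and, for the split-half method, adds the Spearman-Brown correction to estimate the reliability of the full-length test; neither is named in the notes, so both are assumptions of this example, as are the function names and the odd/even split of items.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


def split_half(item_scores):
    """Split-half reliability estimate with the Spearman-Brown correction.

    item_scores: one list per test taker, holding one score per item.
    Items in odd and even positions form the two halves (an assumption;
    other ways of splitting a test are possible).
    """
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    # Spearman-Brown: estimate full-test reliability from the half-test r.
    return 2 * r_half / (1 + r_half)


if __name__ == "__main__":
    # Test-retest: the same five students, two administrations.
    first = [12, 15, 18, 20, 25]
    second = [13, 14, 19, 21, 24]
    print("test-retest r =", round(pearson(first, second), 3))
```

The same `pearson` function can be applied to the two sets of marks in the intra-rater and inter-rater cases, since those also reduce to correlating two lists of scores for the same papers.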
6.3. VALIDITY
Validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful for the purpose of the assessment. It is the most complicated yet the most important principle. Validity can be measured using statistical correlation with other related measures.

According to Bynom (Forum, 2001), validity deals with what is tested and the degree to which a test measures what it is supposed to measure (Longman Dictionary, LTAL). For example, if we test the students' writing skills by giving them a composition test on Ways of Cooking, we cannot call such a test valid, for it can be argued that it tests not their ability to write, but their knowledge of cooking as a skill.

A. Content-related Validity
A test is said to have content validity when it actually samples the subject matter about which conclusions are to be drawn, and requires the test-taker to perform the behavior being measured. For example, speaking ability is tested using a speaking performance, not a pencil-and-paper test. Content validity can be identified when we can define the achievement being measured. It can be achieved by testing performance directly. For example, to test pronunciation the teacher should require the students to pronounce the target words orally.
Two questions are used in applying content validity to classroom tests:
1. Are classroom objectives identified and appropriately framed? The objective should include a performance verb and a specific linguistic target.
2. Are lesson objectives represented in the form of test specifications? A test should have a structure that follows logically from the lesson or unit being tested. It can be designed by dividing the objectives into sections, offering students a variety of item types, and giving appropriate weight to each section.

B. Criterion-related Validity
This is the extent to which the “criterion” of the test has actually been reached. It is best demonstrated through a comparison of the results of an assessment with the results of some other measure of the same criterion.
Criterion-related validity usually falls into two categories:
1. Concurrent Validity: the test results are supported by other concurrent performance beyond the assessment (e.g. a high score in an English final exam supported by actual proficiency in English).
2. Predictive Validity: the test is used to assess or predict the test-taker’s likelihood of future success (e.g. placement test, admission assessment).

C. Construct-related Validity
Construct validity asks, “Does the test actually tap into the theoretical construct as it has been defined?” An informal construct validation of the use of virtually every classroom test is both essential and feasible. For example, the scoring analysis of an interview includes pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. This is the theoretical construct of oral proficiency. Construct validity is a major issue in validating large-scale standardized tests of proficiency.

D. Consequential Validity
It includes all the consequences of a test, such as its accuracy in measuring the intended criteria, its impact on the test-takers' preparation, its effect on the learner, and the social consequences of the test's interpretation and use. One aspect of consequential validity which draws special attention is the effect of test preparation courses and manuals on performance.
E. Face Validity
Face validity is the extent to which students view the assessment as fair, relevant, and useful for improving learning. It means that students perceive the test to be valid. It will be perceived as valid if it samples the actual content of what the learners have achieved or expect to achieve. Nevertheless, the psychological state of the test-taker (confidence, anxiety) is an important aspect of their peak performance.
A test with high face validity has the following characteristics:
• A well-constructed, expected format with familiar tasks.
• Clearly doable within the allotted time.
• Clear and uncomplicated test items.
• Crystal clear directions.
• Tasks that relate to students' course work.
• A difficulty level that presents a reasonable challenge.

Another phrase associated with face validity is “biased for best”. Teachers can make a test which is “biased for best” by offering students appropriate review and preparation for the test, suggesting strategies that will be beneficial, or structuring the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.
The concept of face validity according to Heaton (1975: 153) and Brown (2004: 26) is that a test item looks right to other testers, teachers, moderators, and test-takers. In addition, it appears to measure the knowledge or abilities it claims to measure. Heaton argues that if a test is examined by other people, some absurdities and ambiguities can be discovered.
Face validity is important in maintaining test takers’ motivation and performance (Heaton, 1975: 153; Weir, 1990: 26). If a test does not have face validity, it may not be acceptable to students or teachers. If students do not accept the test as valid, they will show adverse reactions (poor study reaction, low motivation). In other words, they will not perform in a way which truly reflects their abilities.
Brown (2004: 27) states that face validity will likely be high if learners
encounter:
1. a well-constructed, expected format with familiar tasks,
2. a test that is clearly doable within the allotted time limit,
3. items that are clear and uncomplicated,
4. directions that are crystal clear,
5. tasks that relate to their course work (content validity), and
6. a difficulty level that presents a reasonable challenge.

To examine face validity, no statistical analysis is needed. Judgmental responses from experts, colleagues, or test takers may be involved. They can read all the items thoroughly or just glance at them, and then relate them to the ability that the test is meant to measure. If a speaking test consists of vocabulary items, it may not have face validity.
6.4. AUTHENTICITY

Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task. It also means a task that is likely to be encountered in the “real world”.
Authenticity can be achieved by:
• Using natural language.
• Contextualizing the test items.
• Giving meaningful (relevant, interesting) topics to the learners.
• Providing thematic organization for the items (e.g. through a story line or episode).
• Giving tasks which represent or closely approximate real-world tasks.

6.5. WASHBACK
In general terms, washback means the effect of testing on teaching and learning. In large-scale assessment, it refers to the effects that tests have on instruction in terms of how the students prepare for the test. In classroom assessment, washback means the beneficial information that washes back to the students in the form of useful diagnoses of strengths and weaknesses.
To enhance washback, teachers should comment generously and specifically on test performance, respond to as many details as possible, praise strengths, criticize weaknesses constructively, and give strategic hints to improve performance.
Teachers should make classroom tests serve as learning devices through which washback is achieved. Students’ incorrect responses can become windows of insight into further work. Their correct responses need to be praised, especially when they represent accomplishments in a student’s inter-language.
Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, inter-language, and strategic investment, among others.
One way to enhance washback is to comment generously and specifically on test performance. Washback implies that students have ready access to the teacher to discuss the feedback and evaluation he/she has given.

The effects of tests on teaching and learning are called washback. Teachers must be able to create classroom tests that serve as learning devices through which washback is achieved. Washback enhances intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment in the students. Instead of giving letter grades and numerical scores, which give no information about the students’ performance, giving generous and specific comments is a way to enhance washback (Brown 2004: 29).
Heaton (1975: 161-162) refers to this as the backwash effect, which has macro and micro aspects. In the macro aspect, tests affect society and the education system, for example the development of the curriculum. In the micro aspect, tests affect the individual student or teacher, for example by improving the teaching and learning process.
Washback can also be negative or positive (Saehu, 2012: 124-127). It is easy to find negative washback, such as narrowing language competencies down to those involved in tests and neglecting the rest. While language is a tool of communication, most students and teachers in a language class focus only on the language competencies in the test. On the other hand, a test can have positive washback if it encourages better teaching and learning. However, this is quite difficult to achieve. An example of positive washback is the National Matriculation English Test in China: after the test was administered, students’ proficiency in English for actual or authentic language use situations improved.
Washback can be strong or weak (Saehu, 2012: 122-123). An example of a strong effect is the national examination, whereas an example of a weak effect is the impact of a formative test. Let us compare and decide how most students and teachers react to those two kinds of test.

7. WAYS OF TESTING:

7.1. Direct and Indirect Testing:

Direct
*Hughes (1989:14) : the involvement of a skill that is supposed to be tested. This view means that when applying direct testing the teacher will be interested in testing a particular skill, e.g. if the aim of the test is to check listening comprehension, the students will be given a test that will check their listening skills, such as listening to the tape and doing the accompanying tasks. Such a type of test will not engage testing of other skills.
*Testing is direct when it requires the learner to perform precisely the skill that we wish to measure. If we want to know how well learners can write compositions, we get them to write compositions. The tasks and the texts used should be as authentic as possible.
*It is said that the advantage of direct testing is that it is intended to test certain abilities, and preparation for that usually involves persistent practice of certain skills. Nevertheless, the skills tested are removed from the authentic situation, which may later cause difficulties for the students in using them.

Indirect
*Indirect testing differs from direct testing in that it measures a skill through some other skill. It could mean the incorporation of various skills that are connected with each other, e.g. listening and speaking skills.
*Indirect testing, according to Hughes, tests the usage of the language in real-life situations.
*Indirect testing is more effective than direct testing, for it covers a broader part of the language. It denotes that the learners are not constrained to one particular skill and a relevant exercise. They are free to elaborate all four skills; what is checked is their ability to operate with those skills and apply them in various, even unpredictable situations. This is the true indicator of the learner’s real knowledge of the language.

7.2. Discrete point and Integrative Testing:


Discrete point
*A discrete point test is a language test that is meant to test a particular language item, e.g. tenses. The basis of this type of test is that we can test components of the language (grammar, vocabulary, pronunciation, and spelling) and language skills (listening, reading, speaking, and writing) separately.

Integrative
*The integrative test intends to check several language skills and language components together or simultaneously. Hughes (1989:15) stipulates that integrative tests display the learners’ knowledge of grammar, vocabulary, and spelling together, not as separate skills or items.

7.3. Norm referenced and Criterion referenced Testing:

They are not focused directly on the language items, but on the scores the students can get.
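
As a rough illustration of the two ways of reading a score, the Python sketch below computes a percentage-correct score (the criterion-referenced reading) and a percentile rank within a group (the norm-referenced reading). The function names, the sample scores, and the choice to define percentile rank as the share of the group scoring strictly below the student are assumptions made for this example; other definitions of percentile rank exist.

```python
def percentage_correct(raw_score, total_items):
    """Criterion-referenced reading: share of items answered correctly."""
    return 100.0 * raw_score / total_items


def percentile_rank(raw_score, group_scores):
    """Norm-referenced reading: percentage of the group scoring below.

    Uses the 'strictly below' definition, which is an assumption of this
    sketch rather than something stated in the notes.
    """
    below = sum(1 for s in group_scores if s < raw_score)
    return 100.0 * below / len(group_scores)


if __name__ == "__main__":
    # Hypothetical raw scores on a 30-item test for a group of five.
    group = [12, 15, 18, 20, 25]
    student = 20
    print("percentage correct:", round(percentage_correct(student, 30), 1))
    print("percentile rank:", percentile_rank(student, group))
```

The same raw score can look quite different under the two readings: the percentage depends only on the test content, while the percentile rank changes whenever the comparison group changes.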

Norm Referenced (proficiency & placement tests)
*Norm-referenced tests are standardized tests that are designed to compare and rank test takers in relation to one another. This type of test reports whether test takers performed better or worse than a hypothetical average student. It is designed to measure global language abilities, such as overall English language proficiency and academic listening ability, in which each student’s score is interpreted relative to the scores of all other students who took the test.
*A norm-referenced test measures the knowledge of the learner and compares it with the knowledge of other members of his/her group. The learner’s score is compared with the scores of the other students.
*In NRTs, testers interpret each student’s performance in relationship to the performances of other students in the norm group. In other words, NRTs examine the relationship of a given student’s performance to that of all other students in percentile terms. Scores are expressed with no reference to the actual number of test questions answered correctly. This means that teachers are mainly concerned with the student’s percentile score, which informs them about the proportion of students who scored above and below the student in question.

Criterion Referenced (achievement & diagnostic)
*Criterion-referenced tests are designed to measure students’ performance against a fixed set of criteria or learning standards, that is to say, written descriptions of what students are expected to know and be able to do at a specific stage of their education. CRTs provide information on whether students have attained a predetermined level of performance called “mastery”.
*The aim of testing is not to compare the results of the students. It is connected with the learners’ knowledge of the subject. As Hughes (1989:16) puts it, criterion-referenced tests check the actual language abilities of the students. They distinguish the weak and strong points of the students. The students either manage to pass the test or fail it.
*The primary focus in interpreting scores is on how much of the material each student has learnt in absolute terms. That is, teachers are concerned with how much of the material the students know (the percentage). They care about the percentage of questions the students answered correctly in connection with the material at hand, without reference to students’ positions. A high percentage score means that the test was very easy for students who knew the material being tested.

*Tests are used to measure general abilities such as
language proficiency in English. This type of tests has * CRTs are designed to provide precise information
subtests that are general in nature. For example, about each individual’s performance on well-defined
measuring listening comprehension, reading learning points. Subtests for a notional functional
comprehension and writing language course might consist of a short interview
where ratings are made of students’ ability to perform
*the purpose is to generate scores that spread the greetings, express opinions and so on.
students out along a continuum of general abilities. *the purpose is to assess the amount of skill learnt by
Thus, any existing difference between individuals can be each student. That is to say, the focus here is on a
distinguished since a student performance is compared student’s performance compared to the amount of
to others in the same group material known by that student, and not on scores’
distribution.
*the test is very long and contains a variety of different
types of question content. The content is diverse and * a CRT consists of numerous and short subtests in
students find difficulties to know exactly what will be which each objective in the course will have its own
tested because the test is made up of a few subtests on subtest. To save time and efforts, subtests will be
general language skills such as reading and listening collapsed together which makes it difficult for an
comprehension outsider to identify the subtests.

* students know the general format of the questions but * students can predict both the questions formats on
not the language points or content to be tested by those the test and the language points to be tested.
questions Teaching to such a test should help teachers and
students stay on track .Besides, the results should
provide a useful feedback on the effectiveness of
teaching and learning processes.
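
The difference between percentile (norm-referenced) and percentage (criterion-referenced) score interpretation can be illustrated with a small calculation. The sketch below is only an illustration: the group scores, the 50-item test length and the 80% mastery cutoff are invented for the example, and percentile rank is computed with one simple definition (the share of the norm group scoring strictly below the student); other definitions exist.

```python
from typing import List


def percentile_rank(score: float, group_scores: List[float]) -> float:
    """Norm-referenced view: percentage of the norm group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def percentage_correct(correct: int, total_items: int) -> float:
    """Criterion-referenced view: share of test items answered correctly."""
    return 100.0 * correct / total_items


def has_mastery(correct: int, total_items: int, cutoff: float = 80.0) -> bool:
    """Pass/fail decision against a fixed cutoff (80% is an assumed value)."""
    return percentage_correct(correct, total_items) >= cutoff


# Hypothetical raw scores (items correct out of 50) for a norm group of ten students.
group = [22, 25, 28, 30, 31, 33, 35, 38, 41, 45]
student_score = 35
total_items = 50

# NRT interpretation: where does the student stand relative to the group?
print("Percentile rank:", percentile_rank(student_score, group))

# CRT interpretation: how much of the material does the student know?
print("Percentage correct:", percentage_correct(student_score, total_items))
print("Mastery reached:", has_mastery(student_score, total_items))
```

The same raw score therefore yields two different statements: one about the student's position in the group, which changes if the group changes, and one about the amount of material mastered, which does not depend on the other students at all.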

Research Methodology : https://www.facebook.com/groups/689682314444236/
7.4. Objective and Subjective Testing:

Objective:
* An objective test is one that cannot be interpreted differently, because it is scored with
numerical values.
* When testing the students objectively, the teacher usually checks just the knowledge of the
topic.

Subjective:
* A subjective test is one that can possibly be interpreted differently.
* The subjective test involves the personal judgement of the examiner.
* Testing subjectively could involve the teacher’s ideas and judgements. This can be
encountered during a speaking test, where the student can make either a positive or a
negative impression on the teacher. Moreover, the teacher’s impression and his/her knowledge
of the student’s true abilities can seriously influence the assessment process. For example,
the student has failed the test; however, the teacher knows the true abilities of the student
and, therefore, s/he will assess the work of that student differently, taking all the factors
into account.

7.5. Communicative Testing:

*It involves the knowledge of grammar and how it can be applied in written and oral language;
the knowledge of when to speak and what to say in an appropriate situation; and knowledge of
verbal and non-verbal communication. All these types of knowledge should be successfully used
in a situation.
*Without a context, the communicative language test would not function. The context should be
as close to real life as possible. This is required in order to help the student feel that
s/he is in a natural environment.
*The student has to possess some communicative skills, that is, how to behave in a certain
situation, how to apply body language, etc.
*Communicative language testing involves the learner’s ability to operate with the language
s/he knows and apply it in the situation s/he is placed in. S/he should be capable of
behaving in a real-life situation with confidence and be ready to supply the information
required by that situation. Therefore, we can speak about communicative language testing as
testing the student’s ability to behave as he or she would in everyday life. We evaluate
their performance.

8. A SHORT HISTORY OF LANGUAGE TESTING:

Spolsky (1975) identifies three stages in the recent history of language testing: 1) the
pre-scientific, 2) the psychometric-structuralist, and 3) the
psycholinguistic-sociolinguistic.

8.1. The Pre-scientific Period:

Language testing has its roots in the pre-scientific stage, in which no special skill or
expertise in testing was required. This stage is characterized by a lack of concern for
statistical considerations or for such notions as objectivity and reliability (Heaton, 1988;
Weir, 1990; Farhady et al., 1994). In its simplest form, this trend assumes that one can and
must rely completely on the subjective judgment of an experienced teacher, who can
identify after a few minutes of conversation, or after reading a student’s essay, what
mark to give him/her in order to specify the related language ability.

The pre-scientific movement is characterized by translation tests developed exclusively
by classroom teachers. One problem that arises with these types of tests is that they
are relatively difficult to score objectively; thus, subjectivity becomes an important
factor in the scoring of such tests (Brown, 1996). It can be inferred from Hinofotis’s
(1981) article that the pre-scientific movement ended with the onset of the
psychometric-structuralist movement, but clearly such movements have no end in language
teaching and testing, because such teaching and testing practices are undoubtedly still
going on in many parts of the world, depending on the needs of specific academic contexts.

8.2. The Psychometric Structuralist Period :

With the onset of the psychometric-structuralist movement of language testing, language
tests became increasingly scientific, reliable, and precise. In this era, the testers and
psychologists, being responsible for the development of modern theories and techniques
of educational measurement, were trying to provide objective measures, using various
statistical techniques to assure reliability and certain kinds of validity. According to
Carroll (1972), psychometric-structuralist tests typically set out to measure the discrete
structural elements of language being taught in the audio-lingual and related teaching
methods of the time. The standard tests, constructed according to the discrete-point
approach, were easy to administer and score and were carefully constructed to be
objective, reliable and valid. Therefore, they were considered an improvement on the
testing practices of the pre-scientific movement (Brown, 1996).

In the psychometric-structuralist period, there was a remarkable congruence between the
American structuralist view of language and the psychological theories and practical needs
of testers. On the theoretical side, both agreed that language learning was chiefly
concerned with the systematic acquisition of a set of habits; on the practical side, testers
wanted, and structuralists knew how to deliver, long lists of small items which could be
sampled and tested objectively.

As a result of the coalescence of the two fields, discrete-point tests served the following
three objectives:

1) diagnosing learner strengths;

2) prescribing curricula aimed at particular skills;

3) developing scientific strategies to help learners overcome particular weaknesses.

The psychometric-structuralist movement was important because, for the first time,
language test development followed scientific principles. In addition, Brown (1996)
maintains that psychometric-structuralist tests could be easily handled by trained
linguists and language testers. As a result, statistical analyses were used for the first time.
Interestingly, psychometric-structuralist tests are still very much in evidence around the
world, but they have been supplemented by what Carroll (1972) called integrative tests.

8.3. The Integrative-Sociolinguistic Period :

With the attention of linguists inclined toward generativism and that of psychologists toward
cognition, language teachers adopted the cognitive-code learning approach for teaching a
second and/or foreign language. Language professionals began to believe that language is
more than the sum of the discrete elements being tested during the psychometric-structuralist
movement (Brown, 1996; Heaton, 1991; Oller, 1979).

The criticism came largely from Oller (1979), who argued that competence is a unified
set of interacting abilities that cannot be separated and tested adequately. The claim
was that communicative competence is so global that it requires the integration of all
linguistic abilities. Such a global nature cannot be captured in additive tests of grammar,
reading, vocabulary, and other discrete points of language. According to Oller (1983), if
discrete items take language skill apart, integrative tests put it back together; whereas
discrete items attempt to test knowledge of language a bit at a time, integrative tests
attempt to assess a learner’s capacity to use many bits all at the same time.

This movement certainly has its roots in the argument that language is creative.
Beginning with the work of sociolinguists like Hymes (1967), it was felt that the
development of communicative competence depended on more than simple grammatical
control of the language; communicative competence also hinges on knowledge of the
language appropriate for different situations.

Tests typical of this movement were the cloze test and dictation, both of which assess the
students’ ability to manipulate language within a context of extended text rather than in a
collection of discrete-point questions. The possibility of testing language in context led
to further arguments that linguistic and extralinguistic elements of language are
interrelated and relevant to human experience and operate in orchestration.
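
To make concrete what testing "within a context of extended text" can look like, the sketch below builds a simple cloze passage. It assumes the common fixed-ratio procedure, in which every n-th word is replaced by a blank after a short intact lead-in; the function name, the deletion rate and the lead-in length are illustrative choices, not something specified in this document.

```python
from typing import List, Tuple


def make_cloze(text: str, n: int = 7, lead_in: int = 5) -> Tuple[str, List[str]]:
    """Blank out every n-th word after an intact lead-in.

    Returns the gapped passage and the answer key. Words are split on
    whitespace, so punctuation attached to a deleted word is deleted with it.
    """
    words = text.split()
    answers: List[str] = []
    gapped: List[str] = []
    for i, word in enumerate(words):
        position = i - lead_in  # index counted from the end of the lead-in
        if position >= 0 and (position + 1) % n == 0:
            answers.append(word)
            gapped.append(f"({len(answers)}) ____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers


# Hypothetical sample passage, used only to show the output format.
sample = (
    "Language testing has changed a great deal over the years and "
    "teachers now pay more attention to how learners use language in context"
)
passage, key = make_cloze(sample, n=5, lead_in=3)
print(passage)
print(key)
```

To fill each gap the learner has to draw on grammar, vocabulary and the meaning of the surrounding text at once, which is why such tasks are described as integrative rather than discrete-point.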

Consequently, the broader views of language, language use, language teaching, and
language acquisition have broadened the scope of language testing, and this brought about
a challenge that was articulated by Canale (1984) as the shift in emphasis from language
form to language use. This shift of focus placed new demands on language teaching as well as
language testing.

Evaluation within a communicative approach must necessarily address, for example, new
content areas such as sociolinguistic appropriateness rules, new testing formats to
permit and encourage creative, open-ended language use, new test administration
procedures to emphasize interpersonal interaction in authentic situations, and new
scoring procedures of a manual and judgmental nature (Canale 1984, p. 79, cited in
Bachman, 1995).

For both theory and practice, the challenge is thus to develop tests that reflect current
views of language and language use, in that they are capable of measuring a wide range of
abilities generally associated with ‘communicative competence’ or ‘communicative
language ability’, and include tasks that themselves embody the essential features of
communicative language use (Bachman 1995, p. 296).
