CHAPTER 7 ASSESSMENT
For English teachers, assessment includes any means of checking what students can do with the language, and of checking what they cannot do with it. It may be carried out during or after a course, and it can be individual or whole-class. Assessment is the systematic collection, review and use of information about educational programs to improve student learning. It focuses on what students know, what they are able to do, and what values they have when they graduate. Assessment is concerned with the collective impact of a program on student learning, and with the quality of teaching as well as the quality of learning.
What and how we assess depend on our purpose, and we can distinguish different purposes for assessment as well as different kinds of assessment. We actually assess students for quite a range of different reasons:
* to motivate
* to give feedback
* to grade
A Placement
The purpose is to place a student on a suitable course: a placement test assigns the candidate a level so that he or she can be placed in a particular class, or sorts new students into teaching groups so that they are at approximately the same level as others when they start.
B Diagnosis
Diagnostic tests are primarily designed to assess students' knowledge and skills in particular areas before a course of study is begun. They refer back to class work, support motivation and guide remedial work: a diagnostic test identifies a student's linguistic strengths and weaknesses. For example, it might reveal that a student has trouble using articles.
This kind of test can draw on progress, achievement and proficiency tests, enabling teachers to identify specific weaknesses and difficulties so that an appropriate remedial programme can be planned. It concerns the past, and it may or may not refer to a known syllabus. A diagnostic test is intended to diagnose learning difficulties during instruction; its main aim is to determine the causes of those difficulties and then to formulate a plan for remedial action.
C SELECTION
A competitive assessment or competitive analysis is a business-planning tool that attempts to account for
the presence of competitors and their potential impact on business decisions.
A competitive analysis is an assessment of the competition in a certain market aimed at informing business
decisions. An assessment typically involves creating a list of competitors and creating a profile for each
competitor that includes information such as the types of products and services they sell, their market share,
marketing strategies, and notable strengths and weaknesses. The assessment may also include comparisons
between a business's specific products and services and the offerings of competitors.
The purpose of a competitive assessment is to help managers account for the presence of competitors when
making business decisions. Identifying the strengths and weaknesses of competitors can allow managers to
exploit weaknesses, emulate strengths, or avoid competing in areas where other companies are especially
strong. Failure to account for the presence of competitors can result in bad business decisions. For example,
if a certain neighborhood already has a well-established auto repair shop, it might not be wise to open a
similar shop in that area. On the other hand, a new shop that specializes in different or complementary
services might have a better chance of being successful.
D PROFICIENCY
Proficiency tests measure students' achievements in relation to a specific task which they are later required to perform (e.g. follow a university course in the English medium, or do a particular job). They refer forward to a particular application of the language acquired: future performance rather than past achievement. They rarely take into account the syllabus that students have followed. They rest on a definition of operational needs, practical situations and authentic strategies for coping, and they apply a common standard (like a driving test) regardless of previous learning, whether the syllabus is known or unknown. A proficiency test measures the current language level of the candidate. It is distinguished from an achievement test by the fact that its candidates may come from a range of different language backgrounds and may have acquired their foreign language in many different ways.
E PROGRESS
Most classroom tests take this form: they assess the progress students make in mastering material taught in the classroom, and they are often given to motivate students. They also enable teachers and students to assess the degree of success of teaching and learning and to identify areas of weakness and difficulty. Progress tests can be diagnostic to some degree. A test of this kind shows how well students have learnt the section of a course that has just been taught, and may be called a 'progress test' or an 'attainment test'. A summative test is a form of achievement test: it has a known syllabus and concerns the future (e.g. 'O' and 'A' level or university degree exams).
F Aptitude
A test showing how well a student is likely to learn a particular skill. A language aptitude test may contain, for example, subtests of memory, inductive ability and grammatical understanding. It has no past and concerns the future, with regard to language performance itself, e.g. the Modern Language Aptitude Test (University of York).
A ANXIETY
Many students become anxious just before a test that they know will be difficult, and most get
nervous when they have to give a prepared speech in front of their peers. Such temporary feelings
of anxiety are instances of state anxiety. However, some students are anxious a good part of the
time, even when the situation is not especially dangerous or threatening. For example, some
students get excessively nervous even before very easy exams, and others may be so anxious about
mathematics that they can't concentrate on the simplest math assignment. A learner who shows a
pattern of responding with anxiety even in nonthreatening situations has a case of trait anxiety, a
chronic condition that often interferes with maximal performance.
Teachers should:
* become aware of students' developmental levels and the pressure they may be placing on students prior to test administration.
* teach students successful test-taking strategies, including understanding test time limits, the importance of pacing, and the different types of test format (e.g. multiple choice, essay, fill in the blank).
* consider designing some classroom tests using the standardized test format during the school year.
* help students understand test ceilings and provide information on whether or not they will be penalized for incorrect responses. If points are deducted for incorrect responses, students should be advised to leave uncertain items blank (Sycamore & Corey, 1990).
* address test anxiety in class by exploring students' concerns and, if necessary, meet with the school counselor and parents of identified students to confront this issue.
PRINCIPLES OF ASSESSMENT
B Validity
Just as important as reliability is the question of validity. Does the assessed task actually assess what you want it to? Just because an exam question includes the instruction 'analyze and evaluate' does not mean that the skills of analysis and evaluation are going to be assessed. Does the test measure what it is intended to measure? A test is not valid, for example, if it is intended to test a student's level of reading comprehension in a foreign language but instead tests intelligence or background knowledge.
When a new test is constructed it should be assessed for validity in as many ways as possible. The
aspects of validity which are looked at will, of course, depend on the purpose for which the test has
been designed and will partly depend on the importance of the assessment.
In conclusion, validity is the quality on which specialists decide whether a test, or its items, assesses what the test constructors intend to be assessed.
Validity has been described as 'the agreement between a test score or measure and the quality it is believed to measure'. In other words, it concerns the gap between what a test actually measures and what it is intended to measure. A test can fail to be valid in two main ways:
(a) the design of the test is insufficient for the intended purpose, or
(b) the test is used in a context or fashion which was not intended in the design.
The validity of assessments can be enhanced when some or all of the factors below are applied:
* Evidence is gathered of transfer to new situations other than that used for assessment.
* The assessor can demonstrate how evidence of competency discriminates between unlike competencies and reinforces like competencies.
C Reliability
The reliability of a test is an estimate of the consistency of its marks; a reliable test is one where, for example, a student will get the same mark if he or she takes the test again, possibly with a different examiner. A test cannot be valid unless it is reliable. However, the converse is not true: it is perfectly possible to have a reliable test which is not valid. If a test is unreliable, then although the results for one use may happen to be valid, for another they may be invalid. Reliability is thus a measure of how much you can trust the results of a test.
Tests often have high reliability but at the expense of validity. In other words, you can get the
same result, time after time, but it does not tell you what you really want to know.
Reliability also refers to the consistency of the interpretation of evidence and of the assessment outcome. To make reliable assessments, assessors must be competent in terms of their own assessor competencies, and must either have the relevant technical competencies themselves or have access to a subject matter expert who can advise them on the relevant vocational competencies, at least to the level being assessed.
The criteria for the judgment of competence must be stated clearly and adhered to. Assessment practices in the assessment and training of persons with assessment responsibilities need to be monitored and reviewed to ensure consistency of judgement. As a minimum requirement, people assessing against the Assessment and Workplace Training Competency Standards must meet the qualifications for assessors as outlined in those Standards. Reliability is also supported by providing clear and careful instruction when an employee is requested to monitor his or her own or others' behavior (documentation).
D Washback
Any language test or piece of assessment must have positive washback (also called backwash), by which we mean that the effect of the test on teaching must be beneficial. This should be held in mind by test constructors; it is only too easy to construct a test which leads teaching in unhelpful directions. Washback should not be seen only in terms of bad effects but also in terms of possible good effects.
A RANGE
This is one of the most widely known and perhaps most widely used of all strategic assessment tools. It is well suited to quick assessments, particularly where a decision to go ahead with a project has in essence already been made. But it can be too simplistic, and in some cases dangerously misleading, when applied to more complex business contexts.
The SCORE checklist - an extension of SWOT - is designed to resolve these concerns: it has the simplicity of SWOT, but the versatility and flexibility of a full strategic tool:
Strengths / services / support: existing capabilities and resources, potential for synergies
Challenges / needed capabilities: 'weaknesses' indicate needed capabilities and resources
Options / opportunities and risks: opportunity is also risk, risk is also opportunity
Responses / returns / rewards: probable or emergent consequences of action or inaction
Effectiveness: efficient, reliable, elegant, appropriate, integrated
The CLASS system is proven: it has been developed, tested, and researched for over a decade in more than 3,000 classrooms. It addresses a school's most urgent needs: use it for accountability efforts, program planning, professional development, and research. It also targets "trouble spots": tabulated scores help identify areas in which improvement is needed.
1. Emotional Support: Social and emotional functioning in the classroom is an indicator of school readiness.
CLASS evaluates the dimensions of positive climate, negative climate, teacher sensitivity, and regard for
student perspectives.
2. Classroom Organization: Classrooms provide the most opportunities for learning when students are well
behaved, active and engaged. CLASS considers behavior management, productivity, and instructional
learning formats.
3. Instructional Support: Are teachers making the most of opportunities to effectively support cognitive and
language development through the curriculum? CLASS focuses on the roles of concept development, quality
of feedback, and language modeling.
Training is available: a two-day reliability training ensures accurate use of the CLASS system.
What happens in the classroom between teacher and student is critical to how well the child learns. This
must-have tool allows educators to assess the emotional and instructional environment and target efforts for
improved academic outcomes and a brighter future for young learners.
B ITEM ANALYSIS
Item analysis is not an end in itself; there is no point in it unless you use it to revise items and to help students on the basis of the information you get out of it. Item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils' responses to them. It is an important and necessary step in the preparation of a good multiple-choice test. Because of this, it is suggested that every classroom teacher who uses multiple-choice test data should know something of item analysis: how it is done and what it means.
For the teacher-made test, the important uses of item analysis are: determining whether an item functions as the teacher intended; giving feedback to students about their performance and providing a basis for class discussion; giving feedback about pupil difficulties; identifying areas for curriculum improvement; revising items; and improving item-writing skill. Item analysis usually provides two kinds of information on items:
1. Item facility, which helps us decide if the test items are at the right level for the target group.
2. Item discrimination, which allows us to see if the individual items are providing information on candidates' abilities consistent with that provided by the other items on the test.
Item facility expresses the proportion of the people taking the test who got a given item right (item difficulty is sometimes used to express similar information, in that case the proportion who got an item wrong). Where the test purpose is to make distinctions between candidates, to spread them out in terms of their performance on the test, the items should be neither too easy nor too difficult. If the items are too easy, then people with differing levels of ability or knowledge will all get them right, and the differences in ability or knowledge will not be revealed by the item. Similarly, if the items are too hard, then able and less able candidates alike will get them wrong, and the item will not help us in distinguishing between them. A quick classroom procedure for calculating both statistics by hand is as follows (a code sketch follows the steps):
1. Mark the papers and note each student's total score.
2. Sort the pile into rank order from top to bottom score (1 minute, 30 seconds tops).
3. For a normal class of 30 students, divide the class in half into a top group and a bottom group; if there is an odd number, put the middle paper aside.
4. Take the 'top' pile and count the number of students who chose each alternative.
5. Subtract the number of students in the lower group who got the question right from the number of top-group students who got it right.
6. Divide the difference by the number of students in the upper (or lower) group; this gives the item's discrimination.
7. Count the total number of students who got the item right.
8. Divide that total by the total number of students; this gives the item's facility.
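A minimal sketch of this procedure in Python (the response data are invented for illustration; 1 marks a correct answer, 0 an incorrect one):

    # Item analysis: facility and upper/lower-group discrimination.
    # `responses` is a hypothetical 0/1 matrix: one row per student,
    # one column per item (1 = correct).

    def item_analysis(responses):
        # Rank students by total score, top to bottom (step 2).
        ranked = sorted(responses, key=sum, reverse=True)
        if len(ranked) % 2 == 1:            # odd class size: put the
            del ranked[len(ranked) // 2]    # middle paper aside (step 3)
        half = len(ranked) // 2
        top, bottom = ranked[:half], ranked[half:]
        stats = []
        for i in range(len(responses[0])):
            right_top = sum(s[i] for s in top)
            right_bottom = sum(s[i] for s in bottom)
            discrimination = (right_top - right_bottom) / half        # steps 5-6
            facility = sum(s[i] for s in responses) / len(responses)  # steps 7-8
            stats.append((facility, discrimination))
        return stats

    # Example: five students, three items.
    responses = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
        [0, 0, 0],
    ]
    for n, (fac, disc) in enumerate(item_analysis(responses), start=1):
        print(f"Item {n}: facility={fac:.2f}, discrimination={disc:.2f}")

Items with discrimination near zero (or negative) are the ones worth revising first: the strongest students are doing no better on them than the weakest.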
Item analysis is one area where even a lot of otherwise very good classroom teachers fall down: they think they are doing a good job and doing good evaluation, but without doing item analysis they cannot really know. Part of being a professional is going beyond the illusion of doing a good job to finding out whether you really are. Yet it is something many teachers simply do not know how to do; they do it only indirectly, when students argue with them, waiting for complaints from students, students' parents and perhaps other teachers.
An item analysis involves many statistics that can provide useful information for improving the quality and accuracy of multiple-choice or true/false items (questions). Some of these statistics are:
Item difficulty: the percentage of students who correctly answered the item.
Item discrimination: the relationship between how well students did on the item and their total exam score.
Reliability coefficient: a measure of the amount of measurement error associated with an exam score.
Item-total statistics: measures of the relationship of individual exam items to the overall exam score.
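The passage does not say which reliability coefficient is meant; one common choice for tests scored 0/1 (an assumption here, not something the text specifies) is the Kuder-Richardson formula 20 (KR-20), sketched below:

    # KR-20 reliability coefficient for dichotomously (0/1) scored items.
    # Illustrative sketch: `responses` has one row per student,
    # one column per item.

    def kr20(responses):
        n_students = len(responses)
        n_items = len(responses[0])
        totals = [sum(row) for row in responses]
        mean = sum(totals) / n_students
        variance = sum((t - mean) ** 2 for t in totals) / n_students
        # p = proportion correct on each item, q = 1 - p
        pq_sum = 0.0
        for i in range(n_items):
            p = sum(row[i] for row in responses) / n_students
            pq_sum += p * (1 - p)
        return (n_items / (n_items - 1)) * (1 - pq_sum / variance)

    responses = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
        [0, 0, 0],
    ]
    print(f"KR-20 = {kr20(responses):.2f}")

Values near 1 indicate that score differences are mostly consistent signal; values near 0 indicate that they are mostly measurement error.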
C. Types of Item
The questions, exercises and tasks appearing on a test are called items. The kinds of items found on tests are:
1. Selection items, also called choice items, which include true-false items, multiple-choice items, and matching exercises.
2. Completion items, which present an incomplete sentence; the examinee is required to supply a word or short phrase that best completes the sentence.
3. Short-answer items, in which the student usually is not free to give expression to creative and imaginative thoughts.
4. Essay items, which permit the testing of a student's ability to organize ideas and thoughts and allow for creative verbal expression.
C CORRELATION
Correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables; it is a statistical technique that can show whether, and how strongly, pairs of variables are related. In teaching English we can say that it is a measure of the extent to which two sets of measurements agree on the way they rank or classify a certain population. It comes into the measurement of both validity and reliability.
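As a concrete illustration (the scores below are invented, and Pearson's product-moment coefficient is just one common choice), here is how the agreement between two markers' scores might be computed:

    # Pearson correlation between two sets of measurements, e.g. the same
    # scripts marked independently by two markers (hypothetical data).

    def pearson(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    marker_a = [12, 15, 9, 18, 11]   # hypothetical scores from marker A
    marker_b = [13, 14, 10, 17, 12]  # hypothetical scores from marker B
    print(f"inter-marker correlation: {pearson(marker_a, marker_b):.2f}")

A value close to +1 means the two sets of marks rank the candidates in much the same way (good inter-marker reliability); a value near 0 means they barely agree.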
A standard correction for random guessing on multiple-choice examinations was examined retrospectively in
an oral and maxillofacial pathology course for second-year dental students. The correction was a weighting
formula for points awarded for correct answers, incorrect answers, and unanswered questions such that the
expected value of the increase in test score due to guessing was zero. We compared uncorrected and
corrected scores on examinations using a multiple-choice format with scores on examinations composed of
short-answer questions. The short-answer format eliminated or at least greatly reduced the potential for
guessing the correct answer.
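The passage does not give the exact weighting used in that course; the standard formula-scoring correction it alludes to usually takes the following form (a sketch under that assumption):

    # Standard correction for random guessing (formula scoring).
    # corrected = right - wrong / (k - 1), with k options per item;
    # unanswered questions score zero. Blind guessing then has zero
    # expected gain: +1 with probability 1/k, -1/(k-1) otherwise.

    def corrected_score(right, wrong, k):
        return right - wrong / (k - 1)

    # Example: four-option items; 40 right, 12 wrong, 8 left blank.
    print(corrected_score(40, 12, 4))   # 36.0

If a candidate blind-guessed on 16 four-option items, they would expect 4 right and 12 wrong; the deduction of 12/3 = 4 points exactly cancels the 4 points gained by luck.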
If we view objectivity and subjectivity of evaluation along a continuum, we can represent various assessment
and scoring methods along its length.
Test items that can be evaluated objectively have one right answer (or one correct response pattern, in the
case of more complex item formats). Scorers do not need to exercise judgment in marking responses correct
or incorrect. They generally mark a test by following an answer key. In some cases, objective tests are scored
by scanning machines and computers. Objective tests are often constructed with selected-response item
formats, such as multiple-choice, matching, and true-false. An advantage to including selected-response
items in objectively scored tests is that the range of possible answers is limited to the options provided by the test writer; the test taker cannot supply alternative, acceptable responses.
These two terms, subjective and objective, tend to be used very loosely. By objective I mean capable of being marked with 100% reliability; anything which does not approach this is called subjective, though obviously there are widely differing degrees of subjectivity. Traditional English language testing, in both the mother tongue (MT) and the foreign language (FL), was of course highly subjective, with a preponderance of essay-type items. Objections began to be raised in the 1920s and 1930s, when research revealed very great unreliability in this kind of MT testing when comparing the results of different markers, and those of a single marker on separate occasions.
When the pendulum did begin to swing, it swung with a vengeance. Essay-type tests were rejected in many quarters, not only on grounds of unreliability, but because the creative-writing type of task customarily set was considered invalid. This development was linked to the dominance of behaviourist attitudes to FL learning and the related views of structural linguistics on language: FL learning tended to be atomized into the learning of a long series of linguistic items of pronunciation, grammar and lexis.
A more general problem commonly occurs: a passage which is highly relevant to the comprehension needs of a given group of students may be unusable for multiple-choice testing because it does not lend itself to the production of the right number of items, each with the right number of distractors.
The 1970s saw a return swing of the pendulum towards subjective testing. Among the manifold problems of objective testing, some of which we have touched on, the most serious of all were perhaps backwash effects and low validity in terms of the actual abilities we wish to teach and measure. Attempts were therefore made to rehabilitate subjective testing by making it more valid and reliable. One attempt at increased reliability which should be dismissed as a false trail is error counting. Its drawbacks as a way of assessing continuous-writing ability may seem so obvious as to make it hardly worth mentioning.
It gives very low inter-marker reliability. Teachers vary in what they consider acceptable, and in their assiduity in noting errors. Often it is impossible to say with certainty how many errors a defective passage contains, since several are compounded together. Again, one would presumably regard the same error occurring four times as less serious than four different errors, other things being equal; but any attempt to allow for this is fraught with difficulty.
What methods, then, have proved effective in judging students' performance in the necessarily subjective areas of continuous writing and spoken interaction? Actually, a considerable amount of research has been done and a firm consensus of opinion has been built up. First, if the work can be marked by two markers, the pooling of their results will obviously increase reliability.
To operate in this way we need a scale with a fairly small number of grades; five, six or seven seem workable numbers. Teachers often assume that they can make much finer distinctions than this. In addition to a set of descriptions of the grades, we need, for each point on the scale, an example of a script which the markers can agree typifies that grade.
Subjective testing of this kind, then, can be reliable and reasonably economical. Still more important, it can be valid and have a healthy backwash effect: emphasis on practice in writing and speaking will result from the inclusion of direct tests of these activities.
We now face perhaps the most difficult question regarding assessment: exactly what is it about the learner's knowledge and command of English that we wish to assess? Broadly speaking, we may distinguish two contrasted approaches to this question, with the rather forbidding names of discrete-point and integrative assessment.
DISCRETE-POINT ASSESSMENT
Merely taking an area such as grammar for separate assessment does not constitute discrete-point assessment. You could, for example, set a piece of continuous writing and give it separate general-impression ratings for grammar and vocabulary, using for each a scale from 'very good' to 'very weak', without testing specific points of grammar or vocabulary.
A way of minimizing this difficulty is to have several items testing a single linguistic point: suppose, for example, we had three items testing subject-verb concord in the present simple tense. Apart from the question of how to measure discrete points validly, there is the more fundamental question of why we should wish to do so.
INTEGRATIVE ASSESSMENT
It will be clear from the foregoing section that there is not an absolute distinction between discrete-point and integrative assessment. We have seen that complete texts may be used in discrete-point assessment. In functional discrete-point assessment in particular, the task the student is asked to perform may be very closely related to a real-life communication task.
SPEAKING
Speaking a second language is probably the most difficult skill to test in that it involves a combination of
skills that may have no correlation with each other, and which do not lend themselves to objective testing. In
addition, what can be understood is a function of the listener's background and ability as well as those of the
speaker. Another difficulty is separating the listening skill from the speaking skill. In spite of the difficulties in
testing speaking, it can be very beneficial in that it encourages the teaching of speaking in class. Reading
aloud, conversational exchanges, and tests using visual material as stimuli are common test items for testing
speaking.
Aspects of speaking that might be considered in the assessment scale are grammar, pronunciation, fluency, content, organization, and vocabulary. Even though methods of testing speaking are not perfect, they are worth the effort for their effects on teaching and classroom instruction. It is understandable that the testing of speaking has been so widely neglected: it is time-consuming and administratively awkward, and problems of validity and reliability have also loomed large. But unless speaking is tested, it will continue to be under-emphasized in teaching.
READING
The problems of multiple-choice testing of reading have already been discussed. Cloze testing of reading is a practical and effective alternative for certain purposes. To make a cloze reading test, you simply remove one word from the text at regular intervals, for example every 8th, 10th or 12th word. The frequency of omission must not be too high, or the test becomes too much of a puzzle.
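A minimal sketch of this deletion procedure in Python (the deletion interval and sample passage are arbitrary choices for illustration):

    # Build a cloze reading test by blanking every nth word of a passage.

    def make_cloze(text, n=8):
        words = text.split()
        answers = []
        for i in range(n - 1, len(words), n):   # indexes of every nth word
            answers.append(words[i])
            words[i] = "_____"
        return " ".join(words), answers

    passage = ("The frequency of omission must not be too high or the "
               "test becomes too much of a puzzle for the reader.")
    cloze_text, key = make_cloze(passage, n=8)
    print(cloze_text)
    print("answer key:", key)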
How to test
Cloze Scoring
Two scoring methods are in common use: exact-word scoring, where only the word originally deleted counts as correct, and acceptable-word scoring, where any word that fits the context grammatically and semantically is accepted.
CONCLUSION
For me, assessment should occur naturally through everyday classroom interactions. Teachers and students gather information, and subsequent analysis and interpretation allow them to adjust their teaching and learning accordingly. It is the responsibility of teachers and schools to choose formal assessment tools that will provide the most valid and reliable information on student learning. Assessment is done with the student, not to the student.
As is the case with teaching and learning, assessment is a collaboration between the teacher and the student, where both want to determine what the student knows and what might be learnt next. Therefore, a major role for the teacher is to manage the learning culture of the classroom in order to maximize students' motivation to engage keenly with assessment. If the student is not motivated to try, it is likely that the results will not really show what the student knows or can do; such a result will not help either the teacher or the student to plan next steps.
Teachers should always involve students in assessment decision-making. Whether informal or formal,
assessment should always involve the students in decision-making about as many aspects of the assessment
as possible. These include the timing, the design and the assessment criteria so that students are able to
properly see themselves as co-constructors of the assessment, with equal ownership of the results. Some
tools lend themselves to greater student involvement than others, depending on how they have been
designed. However, even where there is little opportunity for student input into the actual assessment
construction, students should be supported to see the assessment results as providing them with valuable
information about what they know and what they might choose to learn next.
Finally, I would like to say that assessment plays a crucial role in the education process: it determines much of the work students undertake (possibly all of it, in the case of the most strategic students), affects their approach to learning and, it can be argued, is an indication of which aspects of the course are valued most highly.