Language Testing: Summary of Chapters 1-6
1. WHY TESTING
In general, testing has the following purposes:
Students, teachers, administrators and parents want to ascertain the degree to which those goals have
been realized
Government and private sectors who employ the students are interested in having precise information
about students’ abilities.
Most importantly, through testing, accurate information is obtained based on which educational
decisions are made.
Tests can benefit students in the following ways:
Testing can create a positive attitude toward the class and motivate students to learn the subject
matter.
Testing can help students prepare themselves and thus learn the materials.
Testing can also benefit teachers:
Testing helps teachers to diagnose their efforts in teaching.
Testing can also help teachers gain insight into ways to improve the evaluation process.
A test is an instrument, often connoting the presentation of a set of questions to be answered, used to obtain a
measure of a characteristic of a person.
Note: What distinguishes a test from other types of measurement is that it is designed to obtain a
specific sample of behavior.
Note: It is important to point out that we never measure or evaluate people. We measure or evaluate
characteristics or properties of people.
5. WASHBACK
A facet of consequential validity is washback. Washback generally refers to the effects the tests have on
instruction in terms of how students prepare for the test.
Note: ‘Cram’ courses and ‘teaching to the test’ are examples of such washback.
In classroom assessment it refers to the information that washes back to students in the form of useful
diagnoses of strengths and weaknesses.
Harmful washback is said to occur when the test content and testing techniques are at variance with the
objectives of the course.
Beneficial washback is said to result when a testing procedure that encourages good teaching practice is
introduced.
6. TEST BIAS
A test or item can be considered to be biased if one particular section of the candidate population is
advantaged or disadvantaged by some feature of the test or item which is not relevant to what is being
measured.
Fairness can be defined as the degree to which a test treats every student the same, or the degree to
which it is impartial. Equitable treatment in terms of testing conditions, access to practice materials,
performance feedback, retest opportunities, and other features of test administration, including
providing reasonable accommodation for test takers with disabilities when appropriate, is an important
aspect of fairness under this perspective.
7. AUTHENTICITY
It is the degree of correspondence of the characteristics of a given language test task to the features of a target
language task.
The language in the test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful for the learner.
Some thematic organization to items is provided, such as through a story line or episode.
Tasks represent, or closely approximate, real-world tasks.
Chapter Two: Language Test Functions
Criterion-Referenced Test Qualities:

| Test Qualities         | Achievement                                                   | Diagnostic                                                     |
| Details of Information | Specific                                                      | Very specific                                                  |
| Focus                  | Terminal objectives of course or program                      | Enabling objectives of courses                                 |
| Purpose of Decision    | To determine the degree of learning for advancement or graduation | To inform students and teachers of objectives needing more work |
| When Administered      | End of courses                                                | Middle of courses                                              |
1.1.2. Knowledge tests
These tests are used when the medium of instruction is a language other than the examinees’ mother tongue.
Note: A key issue in testing proficiency is the difficulty that centers on the complexity of defining the
term ‘proficiency’ (the construct of language). This difficulty renders the construction of proficiency tests
difficult.
1. STRUCTURE OF AN ITEM
An item, the smallest unit of a test, consists of two parts: the stem and the response.
3. TYPES OF ITEMS
3.1. Receptive response items
Multiple-choice (MC) items are undoubtedly one of the most widely used types of items in objective tests.
MC items have the following advantages.
Because of the highly structured nature of these items, the test writer can get directly at many of the
specific skills and points of learning he wishes to measure. This in turn gives them a diagnostic function.
The test writer can include a large number of different tasks in the testing session. Thus they have
practicality.
Scoring can be done quickly and involves no judgments as to degrees of correctness. Thus they have
reliability.
However, these items are disadvantageous on the grounds they:
are passive, i.e. such items test only recognition knowledge but not language communication,
may have harmful washback,
expose students to errors,
are de-contextualized,
are one of the most difficult and time-consuming types of items to construct,
are simpler to answer than subjective tests,
encourage guessing.
There is a way to compensate for students’ guessing on tests. That is, there is a mathematical way to adjust
or correct for guessing. This statistical procedure, properly named the guessing correction formula, is:

Score = Right − Wrong / (n − 1)

where n is the number of options per item.

Example: In a test which consisted of 80 items with four options, a student answered 50 items
correctly and gave 30 wrong answers. After applying the guessing correction formula his score would
be
1) 45  2) 35  3) 40  4) 30

Score = Right − Wrong / (n − 1) = 50 − 30 / (4 − 1) = 50 − 10 = 40
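The guessing correction formula can be sketched in Python; the function name is my own, not from the text:

```python
def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Guessing correction: Score = Right - Wrong / (n - 1),
    where n is the number of options per item."""
    return right - wrong / (n_options - 1)

# The worked example above: 50 right, 30 wrong, four options per item.
print(corrected_score(50, 30, 4))  # 40.0
```

Note that a student who omits items loses nothing under this formula; only wrong answers are penalized.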
3.2.1. Self-assessment
Self-assessment is defined as any items wherein students are asked to rate their own knowledge, skills, or
performances. Thus, self-assessments provide the teacher with some idea of how the students view their own
language abilities and development.
Self-assessment offers several advantages:
speed,
direct involvement of students → increased motivation,
the encouragement of autonomy.
Its chief drawback is subjectivity.
There are at least two categories of self-assessment:
Direct assessment of a specific performance: a student typically monitors himself in either oral or
written production and renders some kind of evaluation of performance.
Indirect assessment of general competence: this type of assessment targets large slices of time with
a view to rendering an evaluation of general ability, as opposed to one specific, relatively time-
constrained performance.
3.2.2. Journal
Journals can range from language learning logs, to grammar discussions, to responses to readings, to
attitudes and feelings about oneself. One of the principal objectives in a student’s dialogue journal is to carry
on a conversation with the teacher. Through dialogue journals, teachers can become better acquainted with
their students, in terms of both their learning progress and their affective states, and thus become better
equipped to meet students’ individual needs.
Because journal writing is a dialogue between students and teacher, journals afford a unique
opportunity for a teacher to offer various kinds of feedback to learners.
Journals are too free in form to be assessed accurately.
Certain critics have expressed ethical concerns.
CHAPTER FOUR: BASIC STATISTICS IN
LANGUAGE TESTING
1. STATISTICS
Statistics involves collecting numerical information called data, analyzing them, and making
meaningful decisions on the basis of the outcome of the analyses. Statistics is of two types:
descriptive and inferential.
2. TYPES OF DATA
2.1. Nominal Data
As the name implies names an attribute or category and classifies the data according to presence
or absence of the attribute, e.g. ‘gender,’ ‘nationality,’ ‘native language,’ etc.
3. TABULATION OF DATA
Suppose that the following table shows the reading scores of students in an achievement test.
Student a b c d f g h i j k l
Score 93 95 92 95 100 96 92 96 92 95 92
3.1. Rank Order
The first step is to arrange the scores in order of size, usually from highest to lowest. If two
testees received the same score, each is assigned the average of the ranks they would otherwise occupy.
The next table shows the same scores. The remaining terms used in tabulation of data will be
presented according to this table.
3.4. Percentage
When relative frequency index is multiplied by 100, the result is called percentage.
4. DESCRIPTIVE STATISTICS
4.1. Measures of Central Tendency
4.1.1. Mode
The most easily obtained measure of central tendency is the mode. The mode is the score that
occurs most frequently in a set of scores, e.g. 88 is the mode in
80, 81, 81, 85, 88, 88, 88, 93, 94, 94
Note: When all of the scores in a group occur with the same frequency, it is customary to
say that the group of scores has ‘no mode,’ as in
83, 83, 83, 88, 88, 88, 90, 90, 90, 95, 95, 95
Note: When two adjacent scores have the same frequency, the mode is the average of the
two adjacent scores, so 86.5 is the mode in the following set
80, 82, 84, 85, 85, 88, 88, 90, 94
Note: When two non-adjacent scores have the same frequency, the distribution is bi-
modal.
82, 82, 85, 85, 85, 87, 88, 88, 88, 90, 94
4.1.2. Median
The median (Md) is the score at the 50th percentile in a group of scores, e.g. 85 is the median in
81, 81, 82, 84, 85, 86, 86, 88, 89
Note: If the data are an even number of scores, the median is the point halfway between
the central values when the scores are ranked, e.g. 85 in the following set
81, 81, 82, 84, 86, 86, 88, 90
4.1.3. Mean
Mean is probably the single most often reported indicator of central tendency. It is the same as the
arithmetic average:

X̄ = ΣX / N

Note: If we were to find the deviations of scores from the mean of the set, their sum would
be exactly zero.
Note: The limitation of the mean is that it is seriously sensitive to extreme scores.
Note: Range changes drastically with the magnitude of extreme scores (or outliers).
Set A: 3, 5, 5, 8, 9 (mean: 6)
Set B: 1, 4, 5, 10, 10 (mean: 6)
[Two dot plots on a scale of 1 to 11 illustrate the spread of each set.]
Therefore, we can say the scores in the second set on average deviate more from their mean
than do the scores in the first set. The index of this average deviation is the standard deviation:

SD = √( Σ(X − X̄)² / N )
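A minimal Python sketch (the function names are mine) reproduces the two sets above and confirms that the second set spreads more widely around the same mean:

```python
import math

def mean(scores):
    """Arithmetic average: sum of scores divided by their number."""
    return sum(scores) / len(scores)

def population_sd(scores):
    """Population standard deviation: sqrt of mean squared deviation."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

set_a = [3, 5, 5, 8, 9]
set_b = [1, 4, 5, 10, 10]
print(mean(set_a), mean(set_b))                      # 6.0 6.0
print(population_sd(set_a) < population_sd(set_b))   # True: set B deviates more
```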
4.2.3. Variance
To find variance, you simply stop short of the last step in calculating the standard deviation. You
do not need to bother with finding the square root.
Variance = Σ(X − X̄)² / N
You will frequently find the variance shown as S².
Example: The mean and SD of a set of scores are 45 and 5. A student who
obtained 55 has a percentile rank of ---------.
A score of 55 lies two SDs above the mean, so its percentile rank is about 98.
[Number line: 30 35 40 45 50 55 60]

Example: In a test the mean and standard deviation are 32 and 3. A student is ---------
probable to obtain a score higher than 29.
A score of 29 lies one SD below the mean, so about 84% of scores fall above it.
[Number line: 23 26 29 32 35 38 41]
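Both examples can be checked with the normal distribution in Python's standard library (a sketch, not part of the text):

```python
from statistics import NormalDist

# Percentile rank of 55 when mean = 45 and SD = 5 (z = +2):
print(round(NormalDist(45, 5).cdf(55) * 100, 1))     # 97.7

# Probability of scoring above 29 when mean = 32 and SD = 3 (z = -1):
print(round((1 - NormalDist(32, 3).cdf(29)) * 100))  # 84
```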
6. DERIVED SCORES
Raw scores are obtained simply by counting the number of right answers. Raw scores from two
different tests are not comparable. To solve this problem, we could convert the raw scores into
percentile or standard scores.
Percentile scores indicate how a given student’s score relates to the test scores of the
entire group of students.
Standardized scores are obtained by taking into account the mean and SD of any given
set of scores. Standard scores represent a student’s score in relation to how far the score
varies from the test mean in terms of standard deviation units.
6.1. z score
The ‘z score’ just tells you how many standard deviations above or below the mean any score or
observation might be:

z = (X − X̄) / SD

Example: In a set of scores where the mean and SD are 41 and 10, what is the z score of
a student who obtained 51?

z = (X − X̄) / SD = (51 − 41) / 10 = +1

6.2. T score
The formula for calculating the T score is:

T score = 10z + 50

Therefore, the T score of the student in the previous example would be 60.
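Both standardized scores can be sketched in a few lines of Python (function names are mine):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """z = (X - mean) / SD: distance from the mean in SD units."""
    return (x - mean) / sd

def t_score(z: float) -> float:
    """T = 10z + 50: rescales z to avoid negatives and decimals."""
    return 10 * z + 50

z = z_score(51, 41, 10)
print(z, t_score(z))  # 1.0 60.0
```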
8. CORRELATION
Correlation analysis refers to a family of statistical analyses that determines the degree of
relationship between two sets of numbers. The numerical value representing the degree to which
two variables are related (co-vary, or vary together) is called correlation coefficient. Correlation
is the go-togetherness of two sets of scores. Let’s take the following hypothetical set of scores
and then represent them on a scatter plot.
Positive correlation:
Students Test A Test B
Dean 2 3
Randy 3 5
Joey 4 7
Jeanne 5 9
Kimi 6 11
Shenan 7 13
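The Test A/Test B scores above are perfectly linear (each Test B score equals twice the Test A score minus one), so their correlation coefficient is +1. A minimal Pearson correlation sketch in Python (function name is mine):

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test_a = [2, 3, 4, 5, 6, 7]    # scores from the table above
test_b = [3, 5, 7, 9, 11, 13]
print(round(pearson_r(test_a, test_b), 4))  # 1.0 -- a perfect positive correlation
```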
Zero Correlation: [scatter plot with no discernible pattern]
Curvi-linear: [scatter plot following a curved pattern]
9. CORRELATIONAL FORMULAS
Correlational values are named after their strength:
Both ± 1 are considered perfect correlations.
– 0.4 ≤ r ≤ + 0.4 are considered weak correlations.
–0.8 ≥ r > –1 and 0.8 ≤ r < 1 are considered strong correlations.
Note: The sign (– or +) of the correlation coefficient doesn’t have any effect on the degree
of association, only on the direction of the association.
The Spearman rank-order correlation coefficient (rho), used with ranked data, takes the following
form:

ρ = 1 − 6ΣD² / (N(N² − 1))
Note: Correlation doesn’t show causality between two variables. It shows relative
positions in one variable are associated with relative positions in the other variable.
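The rho formula can be sketched in Python for the simple case of untied ranks (function names and the opposite-ranking example are mine, not from the text):

```python
def spearman_rho(xs, ys):
    """rho = 1 - 6*sum(D^2) / (N(N^2 - 1)), where D is the difference
    between paired ranks. Assumes no tied scores."""
    def rank(vals):
        order = sorted(vals, reverse=True)      # rank 1 = highest value
        return [order.index(v) + 1 for v in vals]
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two raters ranking five performances in exactly opposite order:
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```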
CHAPTER FIVE: TEST CONSTRUCTION
| Instructional objectives / Content | Number of items |
| Reported speech                    | 3               |
| Subjunctive                        | 2               |
| Dangling structure                 | 5               |
3. PREPARING ITEMS
Last year, incoming students ……… on the first day of school.
1) enrolled 2) will enroll 3) will enrolled 4) should enroll
Have you heard the planning committee’s ……… for solving the city’s traffic problems?
1) purpose 2) propose 3) design 4) theory
4. REVIEWING
It is highly recommended that the produced test be reviewed by an outsider in order to obtain his
subjective ideas about, and evaluation of, the test.
5. PRETESTING
Pretesting is defined as administering the newly developed test to a group of examinees with
characteristics similar to those of the target group. The purpose of pre-testing is to determine,
objectively, the characteristics of the individual items and the characteristics of the items altogether.
Example: In a test 20 testees answered an item correctly. If 50 students took the exam, what would be
item facility?
IF = ΣC / N = 20 / 50 = 0.4
Example: A test was given to 75 examinees: 50 answered correctly, 10 answered wrongly, and 15 left
the item blank. What is FV?
IF = ΣC / N = 50 / 60 ≈ 0.83
(The 15 blank responses are excluded from the count, leaving N = 60.)
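Item facility is a single division; a sketch in Python (function name is mine) covering both examples:

```python
def item_facility(correct: int, n_responses: int) -> float:
    """IF = number of correct responses / number of responses counted.
    Blank (omitted) responses are excluded from the denominator."""
    return correct / n_responses

print(item_facility(20, 50))             # 0.4
print(round(item_facility(50, 60), 2))   # 0.83 (75 examinees minus 15 blanks)
```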
Item Discrimination (ID)
Item discrimination refers to the extent to which a particular item discriminates more knowledgeable
examinees from less knowledgeable ones. To compute the item discrimination, the following formula
should be used:

ID = (ΣC_high − ΣC_low) / (½N)

ΣC_high = the number of correct responses to a particular item by the examinees in the high group
ΣC_low = the number of correct responses to a particular item by the examinees in the low group
½N = the total number of responses divided by 2

| Group  | Subjects | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
| Higher | Shenan   | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1  | 9     |
|        | Robert   | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1  | 8     |
|        | Millie   | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0  | 7     |
|        | Kimi     | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1  | 7     |
|        | Jeanne   | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1  | 5     |
| Lower  | Corky    | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1  | 4     |
|        | Dean     | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0  | 4     |
|        | Bill     | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0  | 4     |
|        | Randy    | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0  | 2     |
|        | Mitsuko  | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0  | 1     |
Example: If in a class with 50 students, 20 students in the high group and 10 students in the low group
answered an item correctly, then ID equals ---------

ID = (ΣC_high − ΣC_low) / (½N) = (20 − 10) / (½ × 50) = 10 / 25 = +0.4
Example: All the 30 testees in the high group and one-third of the students in the low group answered
item number one correctly. In case there were 100 items in the test, what are IF and ID?

ID = (ΣC_high − ΣC_low) / (½N) = (30 − 10) / (½ × 60) = 20 / 30 = +0.66
IF = ΣC / N = 40 / 60 = 0.66
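The ID formula, sketched in Python with both worked examples (function name is mine; the second result prints as 0.67 when rounded, matching the +0.66 above up to truncation):

```python
def item_discrimination(c_high: int, c_low: int, n_total: int) -> float:
    """ID = (C_high - C_low) / (N / 2), where N is the total number
    of examinees split into high and low halves."""
    return (c_high - c_low) / (n_total / 2)

print(item_discrimination(20, 10, 50))            # 0.4
print(round(item_discrimination(30, 10, 60), 2))  # 0.67
```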
6. VALIDATION
Through validation, which is the last step in test construction process, validity as a characteristic of a
test as a total unit is determined.
1) RELIABILITY
On a ‘reliable’ test, one’s score on its various administrations would not differ greatly. That is,
one’s score would be quite consistent. The notion of consistency of one’s score with respect to
one’s average score over repeated administrations is the central concept of reliability.
X = T + E

where X is the observed score, T is the true score, and E is the error score. When there is no
error, T = X; otherwise the observed score falls above or below the true score (T < X or T > X).
It’s important to keep in mind that we observe the X score – we never actually see the true (T) or
error (E) scores. According to CTS, reliability or unreliability is explained as follows. A measure
is considered reliable if it would give us the same result over and over again (assuming that what
we are measuring isn't changing!).
Now, we don’t speak of the reliability of a measure for an individual – reliability is a
characteristic of a measure that’s taken ‘across’ individuals. Therefore, we can say that the
performances of students on any test will tend to vary from one another, and they can vary
for a variety of reasons. Generally, the variance in a set of scores falls into two general
sources: (a) meaningful variance, related to the purposes of the test or subject matter area
being tested, and (b) error variance, due to other, extraneous sources. Meaningful variation,
which would be predictable, is called systematic variation and contributes to reliability. Error
variation, which may not be predictable, is called unsystematic variation. A list of issues which
are potential sources of error variance is provided in the following table.
The relationship between true, error and observed scores, which was stated by a simple equation,
has a parallel equation at the level of the variance of a measure. That is, across a set of scores, we
assume that:

Vt + Ve = Vx
Reliability is expressed as the ratio of the variance of true scores to the variance of observed
scores. Notationally, this relationship is presented as:
r = Vt / Vx
Example
If the standard deviation of a test were 15 and its reliability were estimated as 0.84,
then what would be the standard error of measurement?
1) 1.5  2) 4  3) 3.5  4) 6

SEM = SD × √(1 − r) = 15 × √(1 − 0.84) = 15 × 0.4 = 6

It can be inferred from the last example that there is a negative relationship between the
standard error of measurement and reliability. When there is no measurement error,
reliability equals +1.
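The SEM formula can be sketched in Python (function name is mine); the second line also illustrates the note above, since SEM drops to zero when reliability is perfect:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(15, 0.84), 2))  # 6.0 -- option 4 in the example
print(sem(15, 1.0))             # 0.0 -- no measurement error when r = +1
```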
Conceptually this statistic is used to determine a band around a student’s score within which that
student’s score would probably fall. Using a normal distribution, we have:
Example
If the SEM of a set of scores is 2.5, we can be sure that the true score of a student who
obtained 15 would fall, 68% of the time, between ---------
1) 12.5 and 17.5  2) 15 and 17.5  3) 10 and 20  4) 10.5 and 15
Since the item asks for 68% of the time, the band covering one SEM on either side of 15,
i.e. between 12.5 and 17.5, is the answer.
An individual’s response to a given test item does not depend upon how he responds to other
items that are of equal difficulty, i.e. the items comprising a test are independent. This is a
praised characteristic of multiple-choice items.
To estimate the reliability of the whole test from the reliability of one half, we apply the
Spearman-Brown prophecy formula:

r_total = 2(r_half) / (1 + r_half)
Example
The reliability of half of a grammar test is calculated to be 0.35. By applying the
Spearman-Brown prophecy formula, the total reliability would be ---------
1) 0.51  2) 0.63  3) 0.45  4) 0.38

r_total = 2(0.35) / (1 + 0.35) = 0.70 / 1.35 ≈ 0.51
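The prophecy formula in Python (a sketch; the function name is mine):

```python
def spearman_brown(r_half: float) -> float:
    """Full-test reliability from half-test reliability:
    r_total = 2 * r_half / (1 + r_half)."""
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.35), 2))  # 0.52 -- closest option above is 0.51
```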
Kuder-Richardson formula 21 (KR-21) is perhaps the most widely used method of internal
consistency estimates:

(KR-21) r = [K / (K − 1)] × [1 − X̄(K − X̄) / (K·V)]

where K is the number of items in the test, X̄ is the mean, and V is the variance.
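KR-21 can be sketched in Python; the function name and the numbers in the usage line are hypothetical, not taken from the text:

```python
def kr21(k: int, mean: float, variance: float) -> float:
    """KR-21 reliability: (K/(K-1)) * (1 - mean*(K - mean) / (K * variance)),
    where K is the number of items."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

# Hypothetical test: 50 items, mean score 30, score variance 25.
print(round(kr21(50, 30, 25), 2))  # 0.53
```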