Module 4 PT
PT NOTES
Steps
1) Planning the test.
2) Writing the items of the test.
3) Preliminary administration (trial run) of the test.
4) Checking the reliability and validity of the test.
5) Preparing the norms for the test.
6) Preparing the manual of the test and reproducing the test.
2. Writing the items of the test – This requires a great deal of creativity and depends on the imagination, expertise, and knowledge of the test constructor. Its requirements are:
- In-depth knowledge of the subject.
- Awareness of the aptitude and ability of the individuals to be tested.
- A large vocabulary, to avoid confusion in writing; words should be simple and descriptive enough for everybody to understand.
- Proper assembly and arrangement of the items, generally in ascending order of difficulty.
- Detailed instructions on the objective, the time limit, and the steps for recording answers.
- Help from experts to cross-check for subject and language errors.
3. Preliminary Administration – After modifying the items as per the experts' advice, the test is tried out on an experimental basis to prune out any inadequacy or weakness in the items. This highlights ambiguous items, irrelevant choices in multiple-choice questions, and items that are too difficult or too easy to answer. It also establishes the duration of the test and the number of items to be kept in the final test, and removes repetition and vagueness from the instructions. This is done in the following three stages:
a) The pre-try-out – It is administered informally to a small sample to spot glaring ambiguities in the items and instructions before the proper try-out.
b) The proper try-out – It is administered to approximately four hundred people, with the sample drawn from the same population as the intended test-takers. It is done to remove the poor or less significant items and to retain the good ones, and it includes two activities:
- Item analysis – The difficulty of the test should be moderate, with each item discriminating between high and low achievers. Item analysis is the process of judging the quality of an item.
- Post item analysis – The final test is framed by retaining good items that have a balanced level of difficulty and satisfactory discrimination. The blueprint guides the selection of the number of items, which are then arranged by difficulty, and the time limit is set.
c) The final try-out – It is administered to a large sample in order to estimate reliability and validity. It indicates how effective the test will be when the intended sample is subjected to it.
4. Reliability and Validity of the test – When the test is finally composed, it is administered again to a fresh sample in order to compute the reliability coefficient. Here, too, the sample should not be smaller than 100. Reliability, which shows the consistency of test scores, is calculated through the test-retest method, the split-half method, and the equivalent-form method. Validity refers to what the test measures and how well it measures it; a test that measures the trait it intends to measure can be said to be valid. Validity is estimated by correlating the test with some outside independent criterion.
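To make the reliability computations concrete, here is a minimal Python sketch, assuming a made-up respondents-by-items 0/1 score matrix; the data, variable names, and the simulated retest are illustrative only, not part of these notes.

```python
# Minimal sketch: two common reliability estimates on illustrative data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
scores = (rng.random((120, 20)) > 0.4).astype(int)  # 120 people x 20 items (made up)

# Split-half reliability: correlate odd- and even-item half scores,
# then apply the Spearman-Brown correction for full test length.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(odd_half, even_half)
split_half = 2 * r_half / (1 + r_half)

# Test-retest reliability: correlate total scores from two administrations.
# Here the "retest" is simulated by perturbing the first totals slightly.
test1 = scores.sum(axis=1)
test2 = test1 + rng.integers(-2, 3, size=test1.size)
retest, _ = pearsonr(test1, test2)

print(f"split-half (Spearman-Brown): {split_half:.2f}")
print(f"test-retest: {retest:.2f}")
```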
5. Norms of the final test – The test constructor also prepares norms for the test. Norms are defined as average performance scores, and they are prepared so that scores obtained on the test can be interpreted meaningfully. Raw scores on a test convey no meaning by themselves regarding the ability or trait being measured, but when they are compared with norms, a meaningful inference can be drawn immediately. The norms may be age norms, grade norms, etc., as discussed earlier; the same norms cannot be used for all tests.
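A small Python sketch of how norms let a raw score be interpreted, assuming a hypothetical norming sample; the scores and the `interpret` helper are illustrative names, not from these notes.

```python
# Minimal sketch: interpreting a raw score against norming-sample statistics.
import numpy as np
from scipy.stats import percentileofscore

norm_sample = np.array([12, 15, 18, 20, 21, 22, 22, 24, 27, 30])  # made-up raw scores

mean, sd = norm_sample.mean(), norm_sample.std(ddof=1)

def interpret(raw_score):
    """Convert a raw score into a z-score and a percentile rank
    relative to the norming sample."""
    z = (raw_score - mean) / sd
    pct = percentileofscore(norm_sample, raw_score)
    return z, pct

z, pct = interpret(24)
print(f"raw 24 -> z = {z:.2f}, percentile rank = {pct:.0f}")
```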
6. Preparation of manual and reproduction of the test – The manual is prepared as the last step, and the psychometric properties of the test, its norms, and references are reported in it. It details the process of administering the test, its duration, and the scoring technique, and it contains all instructions for the test.
Item Analysis
Item analysis is a method used to evaluate the quality of test items. It involves calculating
various indices to assess the difficulty, reliability, validity, and discrimination of each item.
Item Difficulty
Item difficulty is an index that measures how difficult an item is for test-takers. It is calculated as the proportion of test-takers who answered the item correctly:
p = R / N
where R is the number of test-takers who answered the item correctly and N is the total number of test-takers. The index, denoted p (or p_i for item i), can range from 0 to 1.
- The difficulty level should be higher than the chance probability of guessing the correct answer.
- The optimum difficulty level lies midway between the chance probability c of guessing the correct answer and 1.0, i.e. (c + 1)/2.
Example:
If 70% of test-takers answered an item correctly and the chance probability of guessing the correct answer is 0.50 (as on a two-option item), the optimum difficulty would be (0.50 + 1)/2 = 0.75, so the observed p = 0.70 is close to the optimum.
If there are 5 options, the guessing chance is c = 1/5 = 0.20, and the calculation would be as follows:
- Guessing chance: 0.20
- Optimum difficulty: (0.20 + 1)/2 = 0.60
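The difficulty index and the optimum-difficulty rule above can be computed directly; the following Python sketch uses a made-up response matrix (rows are test-takers, columns are items).

```python
# Minimal sketch: item difficulty p = R/N and the optimum level (c + 1)/2.
import numpy as np

# rows = test-takers, columns = items; 1 = correct, 0 = incorrect (made up)
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
])

p = responses.mean(axis=0)   # proportion correct per item (R / N)
c = 1 / 5                    # chance probability for 5-option items
optimum = (c + 1) / 2        # midpoint between chance and 1.0

print("item difficulties:", np.round(p, 2))
print(f"optimum difficulty for 5 options: {optimum:.2f}")
```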
Item Discrimination
Item discrimination is an index that measures how well an item differentiates between high-performing and low-performing test-takers, that is, between individuals with different levels of the construct being measured. A higher discrimination index indicates a better item.
Items can be selected or evaluated on the basis of their relationship to some external criterion: the items are compared against an external standard or measure to determine their effectiveness in discriminating between high and low scorers. However, this method requires that an external criterion be available.
When an external criterion is not available, item discrimination can be investigated against the total test score: the items are compared with the overall performance of the test-takers to determine their ability to differentiate between high and low scorers. However, this method is effective only when the original item pool measures a single attribute, and that attribute is a component of the external criterion or construct being assessed.
Considerations for Complex Attributes
For tests that measure complex attributes, item discrimination based on comparison with total scores may not be sufficient: it can reduce criterion coverage and lower the validity of the test. It is important to consider the specific attributes being measured and the relationship between the items and the external criterion.
Satisfactory Item
A satisfactory item is one that has the highest external validity and the lowest coefficient of internal consistency. Such an item demonstrates a strong relationship with the external criterion while contributing variance not already covered by the other items, which preserves criterion coverage.
A. Extreme Groups Method (Upper-Lower 27%)
- In a normally distributed set of scores, the upper and lower 27% of scorers serve as the optimum comparison groups.
- Identify the number of people in each of the two groups who answered the item correctly.
- Subtract the number of correct answers in the low-scoring group from the number in the high-scoring group.
- Divide the result by the number of people in each group to obtain a rough index of discriminative value, as in the sketch below.
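A minimal Python sketch of the upper-lower 27% procedure just described, using simulated 0/1 response data; all names and values are illustrative.

```python
# Minimal sketch: discrimination index D from upper/lower 27% groups.
import numpy as np

rng = np.random.default_rng(1)
responses = (rng.random((100, 10)) > 0.5).astype(int)  # 100 people x 10 items (made up)

totals = responses.sum(axis=1)
order = np.argsort(totals)
n = int(round(0.27 * len(totals)))     # size of each extreme group

low_group = responses[order[:n]]       # bottom 27% of total scores
high_group = responses[order[-n:]]     # top 27% of total scores

# D = (correct in high group - correct in low group) / group size
D = (high_group.sum(axis=0) - low_group.sum(axis=0)) / n
print("discrimination indices:", np.round(D, 2))
```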
B. Point Biserial Method
- The correlation is computed between a dichotomously scored item (correct/incorrect) and the total test score, a continuous variable representing overall test performance.
- A negative correlation indicates a poor item that does not effectively discriminate (see the sketch below).
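A short Python sketch of the point-biserial method using scipy's pointbiserialr on simulated data; correlating each item with the rest-score (total minus the item) is a common refinement, not something these notes prescribe.

```python
# Minimal sketch: point-biserial correlation between each item and the rest-score.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(2)
responses = (rng.random((100, 10)) > 0.5).astype(int)  # made-up 0/1 responses
totals = responses.sum(axis=1)

for i in range(responses.shape[1]):
    # Use the rest-score (total minus this item) so the item does not
    # correlate with itself through the total.
    rest = totals - responses[:, i]
    r, _ = pointbiserialr(responses[:, i], rest)
    flag = "  <- poor (negative)" if r < 0 else ""
    print(f"item {i}: r_pb = {r:.2f}{flag}")
```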
Criteria of Testing
- The criteria of testing refer to the standards or guidelines used to evaluate the quality and
effectiveness of a test.
- These criteria help determine whether a test is reliable, valid, and fair.
- Reliability: The consistency and stability of test scores over time and across different administrations.
- Validity: The extent to which the test measures what it intends to measure.
- Fairness: Ensuring that the test does not discriminate against any particular group of individuals based on factors such as race, gender, or socioeconomic status.
Test Manual
The purpose of the manual is to provide all relevant information about the test. It should include details about the test's norms, cultural applicability, reliability, and validity in different contexts.
Establishing Norms
Norms are established benchmarks or standards that are derived from the performance of a
representative sample of individuals. They are created with the purpose of the test in mind.
Types of Norms
1. Generic Norms:
- Prepared on a broad, representative sample and applicable to the general population.
2. Specific Norms:
- Prepared for particular groups, for example: tests for employment, urban college students, IT-sector managers, etc.
3. Customized Norms:
- Developed for a particular organization or setting; however, these norms have a limited sample size and context of use.
Cultural Applicability
Refers to the extent to which the test is relevant and applicable across different cultural
contexts. It is important to test the reliability and validity of the test in different cultural
contexts to ensure its effectiveness.
1. Conduct the test with a representative sample from different cultural backgrounds.
2. Analyse the test results to determine whether there are any cultural biases or differences in performance (see the sketch after this list).
3. Assess the reliability of the test by conducting test-retest studies to determine if the results
are consistent over time.
4. Assess the validity of the test by comparing the test results with other established measures
of the construct.
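As one possible way to carry out step 2 above (an illustrative screen, not a method prescribed by these notes), per-item difficulty can be compared across cultural groups, and items with unusually large gaps flagged for review; all data and thresholds in this sketch are made up.

```python
# Rough sketch: flag items whose difficulty gap between two cultural
# groups deviates markedly from the average gap (a crude bias screen).
import numpy as np

rng = np.random.default_rng(3)
group_a = (rng.random((80, 10)) > 0.45).astype(int)  # made-up responses, group A
group_b = (rng.random((80, 10)) > 0.55).astype(int)  # made-up responses, group B

p_a = group_a.mean(axis=0)   # item difficulty in group A
p_b = group_b.mean(axis=0)   # item difficulty in group B
gap = p_a - p_b

# Flag items whose gap departs from the overall mean gap by more than
# an arbitrary illustrative threshold of 0.15.
flagged = np.where(np.abs(gap - gap.mean()) > 0.15)[0]

print("difficulty gaps:", np.round(gap, 2))
print("items flagged for cultural review:", flagged)
```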
Establishing norms for a test is important to ensure that the test is fair and accurate. Norms
can be generic, specific, or customized based on the purpose of the test.
Sample Size
- refers to the number of individuals or units that are included in a study or experiment.
- A larger sample size generally provides more reliable and accurate results, as it reduces the
impact of random variation.
- The sample size should be determined based on the research question, the desired level of
precision, and the available resources.
- A sample size that is too small may lead to biased or inconclusive results, while a sample
size that is too large may be impractical or unnecessary.
Context of Use
- refers to the specific population or setting in which the study or experiment is conducted.
- It is important to consider the context of use when interpreting the results of a study, as
findings may not be generalizable to other populations or settings.
- The context of use may include factors such as demographics, geographic location, cultural
background, or specific conditions or characteristics of the population.
- Researchers should carefully define and describe the context of use to ensure that the
findings are applicable and relevant to the intended audience or target population.
Example:
- A study on the effectiveness of a new medication for treating a specific disease may have a
sample size of 500 patients from different hospitals across the country. The context of use
would include factors such as the age, gender, and medical history of the patients, as well as
the specific hospitals and healthcare systems involved. The findings of this study may be
applicable to similar patient populations in similar healthcare settings but may not be
generalizable to other populations or settings.