Reliability of Measures
The Concept Of Reliability
• The reliability of a measurement procedure is the stability or consistency of the measurement.
• A measurement procedure is said to have reliability if it produces identical (or nearly identical) results when it is used repeatedly to measure the same individual under the same conditions.
• For example, if we use an IQ test to measure a person's intelligence today, then use the same test for the same person under similar conditions next week, we should obtain nearly identical IQ scores.
• The inconsistency in a measurement comes from error.
• Observer error: the individual who makes the measurements can introduce simple human error into the measurement process.
• Environmental changes: small changes in the environment (such as time of day, temperature, weather conditions, and lighting) from one measurement to another can influence the measurements.
• Participant changes: the participant can change between measurements. As noted earlier, a person's degree of focus and attention can change quickly and can have a dramatic effect on measures of reaction time; for example, hunger at the time of testing can temporarily depress a person's score on an IQ test.
• In summary, any measurement procedure involves an element of error, and the amount of error determines the reliability of the measurements. When error is large, reliability is low; when error is small, reliability is high.

Reliability Types

1. Inter-Rater Reliability
• When measurements are obtained by direct observation of behaviors, it is common to use two or more separate observers who simultaneously record measurements. Inter-rater reliability is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
• For example, two psychologists may watch a group of preschool children and observe social behaviors. Each individual records (measures) what she observes, and the degree of agreement between the two observers is called inter-rater reliability.
• Inter-rater reliability can be measured by computing the correlation between the scores from the two observers or by computing a percentage of agreement between the two observers.

2. Test-Retest Reliability
• The reliability estimate obtained by comparing the scores from two successive measurements is commonly called test-retest reliability.
• A researcher may use exactly the same measurement procedure for the same group of individuals at two different times; the reliability of the test scores is estimated by repeating the identical test on a second occasion.
• The reliability coefficient in this case is simply the correlation between the scores obtained by the same persons on the two administrations of the test.
• Of course, sometimes poor test-retest correlations do not mean that a test is unreliable; they may instead indicate that the characteristic being measured has genuinely changed between the two testing occasions.

Limitations of Test-Retest Reliability
● Carry-over effect: this effect occurs when the first testing session influences scores from the second session. For example, test takers sometimes remember their answers from the first time they took the test. Carry-over problems are of concern only when the changes over time are random rather than systematic; a systematic change shifts everyone's score by roughly the same amount and leaves the relative ordering of scores, and thus the reliability estimate, largely intact.
● Practice effects: a type of carry-over effect. Some skills improve with practice, so when a test is given a second time, test takers score better because they sharpened their skills by taking the test the first time.
● The time interval between testing sessions must be selected and evaluated carefully. If the two administrations of the test are close in time, there is a relatively great risk of carry-over and practice effects.
● Changes in motivation level between the two sessions can also distort the estimate.
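Both estimates described so far reduce to simple computations. The sketch below is a minimal Python illustration, using made-up scores (the data, variable names, and helper functions are hypothetical, not from the original notes): percentage agreement for inter-rater reliability and a Pearson correlation as a test-retest coefficient.

```python
import numpy as np

def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which two raters recorded the same category."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(a == b))

def pearson_r(x, y):
    """Pearson correlation coefficient between two sets of scores."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x, y = x - x.mean(), y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

# Hypothetical data: two observers coding the same 10 behaviors (1 = occurred, 0 = did not)
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print("Inter-rater agreement:", percent_agreement(rater_a, rater_b))   # 0.9

# Hypothetical data: IQ scores for the same six people tested one week apart
test   = [102, 115, 98, 124, 109, 91]
retest = [100, 117, 99, 121, 110, 94]
print("Test-retest reliability (r):", round(pearson_r(test, retest), 3))
```

When the raters produce interval-level scores rather than categories, the inter-rater estimate can also be computed as the correlation between the two raters' scores using the same pearson_r helper.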
3. Parallel-Forms Reliability
• A researcher may use modified versions of the measurement instrument (such as alternative versions of an IQ test) to obtain two different measurements for the same group of participants.
• The same persons can be tested with one form on the first occasion and with a second, comparable form on the second occasion. The correlation between the scores obtained on the two forms represents the reliability coefficient of the test.
• When different versions of the instrument are used for the test and the retest, the reliability measure is often called parallel-forms reliability.
• In a counterbalanced design, both groups take both forms: group A takes form A first, and group B takes form B first. The results from the two forms are compared; if the scores are nearly identical, parallel-forms reliability is high.

Just think! You missed the midterm examination and have to take a makeup exam. Your classmates tell you that they found the midterm impossibly difficult. Your instructor tells you that you will be taking an alternate form, not a parallel form, of the original test. How do you feel about that?

Limitations
● Test scores may be affected by factors such as motivation, fatigue, or intervening events such as practice, learning, or therapy.
● The order of administration is usually counterbalanced to avoid practice effects.
● Developing alternate forms of tests can be time-consuming and expensive.

4. Internal Consistency Estimates of Reliability
Two different methods for internal consistency estimates of reliability:
1. Split-half reliability
2. KR-20 formula and Cronbach's alpha

1. Split-Half Reliability
• To measure the degree of consistency, researchers commonly split the set of items in half and compute a separate score for each half. The degree of agreement between the two half-scores is then evaluated, usually with a correlation. This general process results in a measure of split-half reliability.
• Only a single administration of a single form is required.
Steps:
1. Divide the test into two equivalent halves, using one of these approaches: random assignment of items, an odd-even split, or matching of items on content and difficulty.
2. Compute the Pearson r between the scores on the two halves of the test.

2. KR-20 Formula and Cronbach's Alpha
• A split-half estimate depends on how the items happen to be divided; Cronbach's alpha and the Kuder-Richardson formula are two statistical techniques for dealing with this problem.
• KR-20, first published in 1937, is a measure of internal consistency reliability for measures with dichotomous choices, e.g., right/wrong, true/false, correct/incorrect.
• It should not be used for questions where partial credit is possible or for scales such as the Likert scale. If you have a test with more than two answer possibilities (or opportunities for partial credit), use Cronbach's alpha instead.
• Cronbach's alpha refers to the degree of correlation among all the items on a scale. It is a measure of inter-item consistency, is calculated from a single administration of a single form of a test, and can be used to estimate the internal consistency of heterogeneous as well as homogeneous tests, which makes it the most widely preferred estimate.
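All three internal-consistency estimates can be computed from a single persons-by-items score matrix. The following is a minimal sketch in Python under stated assumptions: the data are made up, rows are respondents and columns are items, and the split-half value applies the standard Spearman-Brown correction for full test length (a common step that the notes above imply but do not name).

```python
import numpy as np

def split_half(scores):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown
    correction (assumed here) to estimate full-length reliability."""
    X = np.asarray(scores, dtype=float)
    odd, even = X[:, 0::2].sum(axis=1), X[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(scores):
    """KR-20 for dichotomous (0/1) items: k/(k-1) * (1 - sum(p*q) / total variance)."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)          # proportion of respondents passing each item
    q = 1 - p
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical data: 6 respondents x 4 right/wrong items
X = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 0, 1]])
print("Split-half (Spearman-Brown):", round(split_half(X), 3))
print("Cronbach's alpha:           ", round(cronbach_alpha(X), 3))
print("KR-20:                      ", round(kr20(X), 3))
```

Because the items here are dichotomous, KR-20 and alpha give very similar values; with Likert-type or partial-credit items, only Cronbach's alpha applies, as noted above.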
How Reliability Can Be Improved
• A sufficient alpha coefficient is roughly .70 to .90.
• Increase the number of items.
• Reliability estimates are also affected by sample size.
• Factor and item analysis: examine the correlation between each single item score and the total scale score; a low correlation indicates that the item is too hard, irrelevant, or measuring something different, and such items can be omitted to improve overall reliability.
• Correction for attenuation: low reliability reduces the chances of finding significant correlations between measures. If a test is unreliable, information obtained with it is of little or no value. Thus, we say that potential correlations are attenuated, or diminished, by measurement error; the standard correction formula is sketched below.
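As referenced in the correction-for-attenuation point above, the standard disattenuation formula divides the observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch with made-up numbers (the function name and values are illustrative only):

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for attenuation due to unreliability:
    r_corrected = r_observed / sqrt(rel_x * rel_y)."""
    return r_xy / sqrt(rel_x * rel_y)

# Hypothetical example: observed r = .30 between two tests with reliabilities .70 and .60
print(round(disattenuate(0.30, 0.70, 0.60), 3))  # about 0.46, the correlation the two
                                                 # measures could show if error-free
```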