Reliability
- Introduction
- Test-Retest
- Split-Half
- Covariance: Coefficient Alpha or Kuder-Richardson
- Alternate-form
- Generalized Reliability Coefficient
- References and More Reading
Introduction
A technique is reliable when, applied to the same person several times under the same conditions, it gives the same results. Some authors speak of the stability or consistency of a measure instead of its reliability.
When a test is applied to the same persons a few days apart and gives similar results, it is considered reliable. However, particularly with personality measurements, the reliability of a test must be distinguished from the stability of the constructs that the test measures. The construct measured can genuinely change over medium or long periods of time. Or it may stay the same or nearly so, while its measurement by different tests, or by the same test at different times, shows inconsistent results.
Reliability of personality measurements involves delicate issues that are not present, for instance, in physics when measuring distances, temperatures, time, speed, density, weights, etc. Unlike most other measurements, a personality test requires the person to reflect on himself or herself, and this can significantly affect the results. How the questions are formulated, the relation between the tester and the tested, how the instructions are given to the candidate, the time limits imposed, how much the person “feels good” about himself or herself on the day the test is taken, the tendency to answer positively to the questions, the tendency to give the answer felt to be most desirable for a targeted job, etc.: all these factors affect the results of a personality test.
Stability is one of the primary assets of personality measurements when they are used in organizational settings to anticipate and predict. Because the results of personality measurements are generally relatively stable, behaviors can be anticipated. Observations and studies have repeatedly confirmed the stability of personality measurements.
Reliability of a personality test needs to be distinguished from its validity, and the validity of a personality test has no meaning if the test cannot first show reliability. What could be said about “something” that is always changing, either because it is in its nature to always change or because no measurement can capture any consistency in this “something” over time?
Reliability must be demonstrated through thoroughly conducted studies. Stability results can easily be distorted by manipulating the data or by selecting a favorable sample of people. Different studies therefore need to be conducted using different techniques and statistics on different samples. Reliability is generally measured by calculating correlation coefficients between two sets of scores. The technique is considered reliable if the correlation coefficient is close to 1. More precisely, a reliability coefficient estimates the proportion of the total variance that is not attributable to measurement error. The reliability of a technique is usually assessed in four different ways: (1) test-retest, (2) split-half, (3) covariance, and (4) alternate-form.
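The link between reliability and error variance can be written compactly in the notation of classical test theory. The symbols below are assumptions introduced for illustration and do not come from the text above: X is the observed score, T the true score, E the measurement error (assumed uncorrelated with T), and r_{XX'} the reliability coefficient.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2

r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under these assumptions, a coefficient close to 1 means that almost none of the observed variance is error variance.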
Test-retest
The same technique is administered several times to the same persons under the same conditions. The time interval can be several hours, days, weeks, months or years.
The concept of test reliability over time can be restricted to the test itself rather than to the persons: one might not want to study the stability of a particular person’s characteristic but the capacity of the measuring tool to give consistent information over time. After some days or weeks, it is important to evaluate whether the instructions given the first time influenced how the test was taken the second time, or whether the test might be too sensitive to the ever-changing environment (and person).
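As a minimal sketch of the test-retest computation, the Python example below correlates the scores obtained by the same persons on two administrations of a test. The scores are invented for illustration, and the helper name `pearson_r` is an assumption, not something taken from a test manual or library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of five persons, same test, two sessions
first_session = [12, 18, 25, 30, 22]
second_session = [14, 17, 27, 29, 21]

test_retest_r = pearson_r(first_session, second_session)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A value close to 1 would suggest that the persons keep roughly the same rank order from one session to the next. It does not show by itself whether any instability comes from the test or from real change in the persons.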
Split-Half
The test is divided into two equivalent halves, which gives two distinct scores for each person, and a correlation coefficient is computed between the two. This reliability is sometimes called internal consistency (as is the covariance approach). Temporal stability does not enter into consideration here.
There are different ways to split the items: the first half of the items versus the second half, or odd-numbered versus even-numbered items. The correlation between the two scores constitutes a reliability index. Splitting the test into two halves shows whether the two halves in fact measure the same phenomenon. However, the reliability measured this way concerns only half of the test, and the shorter a test, the lower its reliability tends to be. The Spearman-Brown formula is commonly used to estimate the reliability of the full-length test from the half-test correlation.
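The odd/even split can be sketched in Python as follows. The answers are invented, and the function names are assumptions made for this example. The last step applies the Spearman-Brown formula, a standard correction that estimates the reliability of the full-length test from the correlation between its two halves.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per person.

    Returns (correlation between halves, Spearman-Brown estimate).
    """
    # Items 1, 3, 5, ... form one half; items 2, 4, 6, ... the other
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown: project the half-test correlation to full length
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical answers of five persons to six items (1-5 scale)
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test r: {r_half:.2f}, Spearman-Brown estimate: {r_full:.2f}")
```

A different split (for example first half versus second half) will generally give a different coefficient, which is one motivation for the covariance approach.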
Covariance
With the covariance technique, one evaluates the consistency of the answers to the items of the test. Internal consistency may be influenced by two factors: the sampling of the items and the heterogeneity of the groups of behaviors measured. The more homogeneous the groups of behaviors, the more consistent the items are with one another within the groups. Each item is examined by how much meaning it shares with the other items. A homogeneity coefficient is calculated. If the scores are dichotomous (pass/fail), the Kuder-Richardson index is calculated. If the scores are continuous (ratings on a scale), Cronbach's alpha is computed.
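The two homogeneity coefficients named above can be sketched in Python as follows. This is an illustration under stated assumptions: the data are invented, the function names are hypothetical, the Kuder-Richardson index is taken to be the KR-20 formula, and population variances are used throughout.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per person."""
    k = len(item_scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(binary_scores):
    """Kuder-Richardson formula 20 for items scored 0 (fail) or 1 (pass)."""
    n = len(binary_scores)
    k = len(binary_scores[0])
    # p is the proportion of persons passing each item; p * (1 - p) is its variance
    p = [sum(person[i] for person in binary_scores) / n for i in range(k)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(person) for person in binary_scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical ratings of five persons on six items (1-5 scale)
ratings = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]
# Hypothetical pass/fail results of five persons on four items
pass_fail = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(ratings)
kr = kr20(pass_fail)
print(f"Cronbach's alpha: {alpha:.2f}, KR-20: {kr:.2f}")
```

With items scored 0 or 1, the two functions return the same value, because KR-20 is the special case of alpha for dichotomous items.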
Alternate Form
Two similar versions of the same technique are built. The content and the difficulty of the two forms must be comparable. The two versions are then given to the same persons and a correlation coefficient is computed between the two series of measures. If the correlation between the two scores is high, one can generalize a person's results from one version of the test to the other. Parallel forms of the same test make it possible to observe or neutralize certain biases, such as practice effects or familiarization with the test.
In practice, this way of testing reliability is seldom used because it is difficult to build two truly equivalent forms of the same test.
Generalized reliability coefficient
The different results of reliability calculations can be integrated into a single result called the generalized reliability coefficient. This coefficient makes it possible to extract the part of the variance that is due to the test itself and is not related to other sources such as time, survey conditions, or residual inconsistency between the items. The generalized reliability coefficient can be used in criterion validity studies and in meta-analyses. In practice, this generalized coefficient is rarely used with tests in organizations.
References and More Reading
A lot of general, comprehensive information about psychological testing is available online:
Google Scholar,
Wikipedia.
General information on psychological testing:
- Michell J. (2005). Measurement in Psychology: A Critical History of a Methodological Concept. Cambridge University Press.
- Rust J., Golombok S. (2009). Modern Psychometrics: The Science of Psychological Assessment. Routledge.
- Anastasi A., Urbina S. (1997). Psychological Testing. Prentice Hall.
- Groth-Marnat G. (1999). Handbook of Psychological Assessment. Wiley.
More in this wiki:
Correlation coefficients,
Validity.