Validity and reliability in assessment

An assessment system that relies heavily on end-of-course examination data alone to make inferences about individuals, teachers, and schools is deeply problematic. One of the principal issues is a serious threat to validity: construct under-representation. Since only a small part of the taught curriculum can be assessed in a single examination, other equally important learning outcomes go untested (Brookhart et al., 2019). There are also issues of reliability: no single test or examination can ever be 100% reliable, so classifying learners (by grading them, for example) will always involve a certain amount of error.

Formative assessment can help address both construct under-representation, as a threat to validity, and the problem of reliability; instead of a single ‘snapshot’ of learning, we get a broader ‘photo album’ (Brookhart et al., 2019). In reliability terms, drawing on evidence gathered on many occasions ‘has the effect of lengthening the test’ (Wiliam, 2007: 1).
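
One standard way to make the idea of lengthening a test concrete is the Spearman-Brown prediction formula from classical test theory. The sources cited here do not name it, so the following is only an illustrative sketch. It assumes that the added evidence is parallel to the original test (the same construct, measured with comparable quality), which classroom evidence will only ever approximate. The function name and the starting reliability of 0.6 are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added material is parallel to the original test
    (an idealisation; real classroom evidence will vary in quality).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical figures: a single test with reliability 0.6,
# then two and four times as much comparable evidence.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.6, k), 3))
```

Under these assumptions, doubling the amount of evidence raises the predicted reliability from 0.6 to 0.75, and quadrupling it raises it to roughly 0.86. The gains diminish as evidence accumulates, and the error never disappears entirely.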

For example, during the normal course of teaching and learning, teachers gather evidence from a range of regular activities. This evidence covers a more varied, and more complete, set of learning goals, and it can be integrated into the normal day-to-day teaching and learning cycle. This is a particular strength for an effective assessment system, because it includes important information about learning which externally set tests cannot capture. In principle, a much wider variety of learning outcomes can be assessed (for example, through deeper assessment of creativity and problem solving in more “authentic” contexts), and not just those that can be easily tested.