Abstract
Test developers face two issues: (a) what to measure, and (b) how to measure it (Lindquist, 1936). For most large-scale testing programs, test blueprints are developed that specify content and cognitive demands in terms of “what to measure.” Regarding “how to measure,” one dilemma facing designers is the choice of item format. The issue is significant in a number of ways. First, interpretations vary according to item format. Second, for policymakers, the cost of scoring open-ended items can be enormous compared with multiple-choice items. Third, the consequences of using any given format may affect instruction in ways that foster or hinder the development of the cognitive skills being measured by tests, an effect related to systemic validity (Frederiksen & Collins, 1989). Everyone involved in these discussions points to the centrality of validity concerns. Whether our attention is to systemic validity, a unitary construct validity orientation (Messick, 1989), or a focus on consequential validity (see Mehrens, chap. 7, this volume; Messick, 1994), meaning and inference are our concerns.
| Original language | English (US) |
| --- | --- |
| Title of host publication | Large-Scale Assessment Programs for All Students |
| Subtitle of host publication | Validity, Technical Adequacy, and Implementation |
| Publisher | Taylor and Francis |
| Pages | 182-197 |
| Number of pages | 16 |
| ISBN (Electronic) | 1410605116, 9781135653897 |
| ISBN (Print) | 9780805837094 |
| DOIs | |
| State | Published - Jan 1 2012 |