The use of trained raters has a long tradition in educational research. A standard feature of studies employing raters is the use of indicators of agreement among raters, such as interrater reliability coefficients. Surprisingly, the validity of rating data has received relatively little attention, raising the undesirable prospect of ratings with satisfactory reliability but little validity. This article suggests two complementary frameworks for providing validity evidence for rating data. The first conceptualizes raters as data collection instruments that should be subject to traditional procedures for establishing validity evidence; the second evaluates studies employing raters from an experimental design perspective, permitting the internal validity of the study to be assessed and used as an indicator of the extent to which ratings are attributable to the training of the raters. Two studies employing raters are used to illustrate these ideas.
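To make the distinction concrete: an interrater agreement index of the kind referenced above can be high even when the ratings measure the wrong construct. The sketch below computes Cohen's kappa, one common chance-corrected agreement coefficient for two raters, from scratch; the rating data are hypothetical and the article itself does not prescribe this particular coefficient.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who each rated the same set of items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's marginal distribution.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical essay scores from two trained raters on four essays.
rater_1 = [1, 1, 0, 1]
rater_2 = [1, 0, 0, 1]
print(cohens_kappa(rater_1, rater_2))
```

A satisfactory kappa only establishes that the raters agree with each other; it says nothing about whether their shared scoring criteria capture the intended construct, which is the validity question the article takes up.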