Despite calls to improve self-assessment as a basis for self-directed learning, instructional designs that incorporate reflection in practice remain uncommon. Using data from a screen-based simulation for learning radiograph interpretation, we present validity evidence for a simple self-monitoring measure and examine how it can complement skill assessment.

Methods: Medical students learning ankle radiograph interpretation worked through an online learning set of 50 cases, classifying each as 'abnormal' (fractured) or 'normal' and indicating how certain they felt about their response ('Definitely' or 'Probably'). They received immediate feedback on each case. All students subsequently completed two 20-case post-tests: an immediate post-test (IPT) and a delayed post-test (DPT) administered 2 weeks later. We determined the degree to which certainty ('Definitely' versus 'Probably') correlated with accuracy of interpretation and how this relationship changed between the tests.

Results: Of 988 students approached, 115 completed both tests. Mean ± SD accuracy scores decreased from 59 ± 17% on the IPT to 53 ± 16% on the DPT (95% confidence interval [CI] for the difference: -10% to -2%). Mean self-assessed certainty did not decrease (rates of 'Definitely': IPT, 17.6%; DPT, 19.5%; 95% CI for the difference: -3.4% to +7.2%). Regression modelling showed that accuracy was positively associated with choosing 'Definitely' over 'Probably' (odds ratio [OR] 1.63, 95% CI 1.27-2.09) and revealed a statistically significant interaction between test timing and certainty (OR 0.72, 95% CI 0.52-0.99); thus, the accuracy of self-monitoring decayed over the retention interval, leaving students relatively overconfident in their abilities.
Conclusions: This study shows that, in medical students learning radiograph interpretation, the development of self-monitoring skill can be measured and should not be assumed to vary in parallel with the underlying clinical skill.