Mucus aggregation on the vocal folds, a common complaint among persons with voice disorders, has been visually rated on four parameters: type, pooling, thickness, and location. Rater training is used to improve the reliability and accuracy of these ratings. The goal of this study was to evaluate the effect of training on rater reliability, accuracy, and reaction time. Two raters scored mucus aggregation from 120 stroboscopic exams after a brief introductory session and again after a thorough training session. Reliability and accuracy were calculated as percent agreement. Two-tailed paired t-tests were used to assess differences in reaction time for ratings before and after training. Inter-rater reliability improved from 79% pre-training to 92% post-training. Intra-rater reliability improved from 77% to 91% for Rater 1 and from 80% to 88% for Rater 2 following training. Accuracy improved from 80% to 96% for Rater 1 and from 76% to 95% for Rater 2 from pre- to post-training. Reaction time decreased for both raters (p=0.025). These findings further our understanding of observer performance on judgments of laryngeal mucus and suggest that rater training increases reliability and accuracy while decreasing reaction time. Future studies should assess the relationship between these judgments and voice changes.
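
The two statistics named above, percent agreement and the paired t-test, can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function names, the category labels, and every number in the example are hypothetical and chosen only to make the arithmetic easy to follow. The sketch also assumes that accuracy means agreement with some reference rating, which the abstract does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two lists of ratings match exactly."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def paired_t_statistic(before, after):
    """t statistic of a paired t-test on the differences (before - after).

    Assumes the differences are not all identical; otherwise their
    standard deviation is zero and the statistic is undefined.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical thickness ratings from two raters on five exams (not study data).
rater_1 = ["thick", "thin", "thin", "thick", "thin"]
rater_2 = ["thick", "thin", "thick", "thick", "thin"]
agreement = percent_agreement(rater_1, rater_2)  # 4 of 5 match: 80.0

# Hypothetical reaction times in seconds before and after training (not study data).
rt_before = [4.0, 5.0, 6.0, 5.0]
rt_after = [3.0, 3.0, 5.0, 4.0]
t_value = paired_t_statistic(rt_before, rt_after)  # 1.25 / 0.25 = 5.0
```

The same `percent_agreement` function covers inter-rater reliability (two raters), intra-rater reliability (one rater on repeated exams), and, under the stated assumption, accuracy (one rater against a reference). Converting the t statistic to a two-tailed p-value requires the t distribution with n - 1 degrees of freedom, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.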