This paper argues that the sample sizes commonly used in clinical outcome research are too small to detect meaningful differences between treatments. Behavioral weight control is used to illustrate the problem. The sample sizes needed to detect differences of 5, 10, and 15 pounds between treatment conditions with statistical significance were computed from the attrition and the variability of treatment effects reported in the literature. The results show that sample sizes used in behavioral weight control studies are usually too small to detect any but the largest differences between conditions. With typical sample sizes, a 10-pound difference between conditions at the end of treatment and a 15-pound difference at follow-up (effect size of 1.2-1.3) would be required to assure statistical significance. Recommendations are made for (a) greater attention to sample size calculation in study design, (b) attempts to reduce between-subject variability, and (c) consideration of relaxing standard criteria for statistical significance in exploratory studies.
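
The kind of calculation described above can be sketched as follows. This is a minimal illustration using the standard normal-approximation formula for comparing two group means, which is not necessarily the method used in the paper. The function name `n_per_group`, the standard deviation of 10 pounds, the 20% attrition rate, the two-sided alpha of 0.05, and the power of 0.80 are all assumptions chosen for illustration and are not values taken from the literature reviewed.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, attrition=0.0):
    """Approximate subjects to enroll per group for a two-group comparison.

    delta     -- difference between conditions to detect (pounds)
    sd        -- between-subject standard deviation (pounds), assumed equal
                 in both groups
    alpha     -- two-sided significance level
    power     -- desired probability of detecting the difference
    attrition -- expected fraction of enrolled subjects lost before assessment
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Normal-approximation formula for the number of completers per group.
    completers = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    # Inflate enrollment so the expected number of completers is sufficient.
    return math.ceil(completers / (1 - attrition))


if __name__ == "__main__":
    HYPOTHETICAL_SD = 10.0         # pounds; illustrative only, not from the paper
    HYPOTHETICAL_ATTRITION = 0.20  # illustrative only, not from the paper
    for delta in (5, 10, 15):
        n = n_per_group(delta, HYPOTHETICAL_SD, attrition=HYPOTHETICAL_ATTRITION)
        print(f"difference of {delta} lb: enroll about {n} per group")
```

Because the required sample size scales with the square of `sd / delta`, halving the difference to be detected roughly quadruples the number of subjects needed, while reducing between-subject variability or relaxing `alpha` lowers it.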