The functional diversity of a community can influence ecosystem functioning and reflects assembly processes. The large number of disparate metrics used to quantify functional diversity reflects the range of attributes underlying this concept, generally summarized as functional richness, functional evenness, and functional divergence. In practice, however, we know very little about which attributes drive which ecosystem functions, because field-based tests are lacking. Here we evaluate eight leading functional diversity metrics (Rao's Q, FD, FDis, FEve, FDiv, convex hull volume, and species and functional group richness) that emphasize different attributes of functional diversity, plus 11 extensions of these metrics that incorporate heterogeneous species abundances and trait variation. We assess the relationships among these metrics and compare their performance in predicting three key ecosystem functions (aboveground biomass, belowground biomass, and light capture) within a long-term grassland biodiversity experiment. Many metrics were highly correlated, although unique information was captured by FEve, FDiv, and dendrogram-based measures (FD) that were adjusted by abundance. FD adjusted by abundance outperformed all other metrics in predicting both above- and belowground biomass, although several others also performed well (e.g., Rao's Q, FDis, FDiv). More generally, trait-based richness metrics and hybrid metrics incorporating multiple diversity attributes outperformed evenness metrics and single-attribute metrics, and these results held when combinations of metrics were explored. For light capture, species richness alone was the best predictor, suggesting that traits for canopy architecture would be necessary to improve predictions. Our study provides a comprehensive test linking different attributes of functional diversity with ecosystem function for a grassland system.
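
To make one of the named metrics concrete, the following minimal sketch computes Rao's Q as the abundance-weighted mean trait dissimilarity between two randomly drawn individuals, Q = sum over i and j of d_ij * p_i * p_j. It is an illustration under stated assumptions, not the procedure used in this study: the function names (`relative_abundances`, `euclidean`, `rao_q`) are hypothetical, the example community is invented, traits are assumed to be already standardized, Euclidean distance is assumed as the dissimilarity, and the sum runs over all ordered pairs (some formulations sum only over pairs with i < j, which gives half this value).

```python
from math import sqrt


def relative_abundances(abundances):
    """Convert raw abundances (e.g. biomass or cover) to proportions summing to 1."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [a / total for a in abundances]


def euclidean(u, v):
    """Euclidean distance between two trait vectors (an assumed dissimilarity)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rao_q(abundances, traits):
    """Rao's quadratic entropy over all ordered species pairs.

    abundances: one non-negative number per species.
    traits: one trait vector per species, in the same order.
    """
    if len(abundances) != len(traits):
        raise ValueError("abundances and traits must have the same length")
    p = relative_abundances(abundances)
    n = len(p)
    return sum(
        p[i] * p[j] * euclidean(traits[i], traits[j])
        for i in range(n)
        for j in range(n)
    )


# Hypothetical three-species community described by a single trait.
example_abundances = [1, 1, 2]
example_traits = [[0.0], [1.0], [2.0]]
example_q = rao_q(example_abundances, example_traits)
```

Because each pairwise distance is weighted by the product of relative abundances, Rao's Q responds both to how far apart species are in trait space and to how evenly abundance is spread among them, which is why it behaves as a hybrid of several diversity attributes rather than a pure richness or evenness measure.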