Reliability of the classification of proximal femur fractures: Does clinical experience matter?

The Science of Variation Group

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


Background: Radiographic fracture classification supports research on prognosis and treatment. AO/OTA classification into fracture type has been shown to be reliable, but further classification of fractures into subgroups reduces interobserver agreement and takes considerable practice and experience to master.

Questions/purposes: We assessed: (1) differences between more and less experienced trauma surgeons, with experience defined by hip fractures treated per year, years of experience, and the percentage of their time dedicated to trauma, (2) differences in interobserver agreement between classification into fracture type, group, and subgroup, and (3) differences in interobserver agreement when assessing fracture stability compared to classifying fractures into type, group, and subgroup.

Methods: This study used the Science of Variation Group to measure factors associated with variation in interobserver agreement on classification of proximal femur fractures according to the AO/OTA classification on radiographs. We selected 30 anteroposterior radiographs from 1061 patients aged 55 years or older with an isolated fracture of the proximal femur, with a spectrum of fracture types proportional to the full database. To measure interobserver agreement, Fleiss’ kappa was determined, and bootstrapping (resamples = 1000) was used to calculate the standard error, z statistic, and 95% confidence intervals. We compared the kappa values of more experienced surgeons with those of less experienced surgeons.

Results: There were no statistically significant differences in the kappa values at any classification level (type, group, subgroup) between more and less experienced surgeons. When all surgeons were combined into one group, interobserver reliability was greatest for classifying the fractures into type (kappa, 0.90; 95% CI, 0.83 to 0.97; p < 0.001), reflecting almost perfect agreement. When comparing the kappa values between classes (type, group, subgroup), we found statistically significant differences between each class. Substantial agreement was found for the clinically relevant groups stable/unstable trochanteric, displaced/non-displaced femoral neck, and femoral head fractures (kappa, 0.60; 95% CI, 0.53 to 0.67; p < 0.001).

Conclusions: This study adds to a growing body of evidence that relatively simple distinctions are more reliable and that this is independent of surgeon experience.
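
The agreement statistic named in the Methods, Fleiss’ kappa with bootstrapped standard error, z statistic, and 95% confidence interval, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. It assumes that the radiographs (not the raters) are resampled with replacement, that the confidence interval is a simple percentile interval, and that z is the point estimate divided by the bootstrap standard error; the abstract specifies none of these choices. The function names and the small ratings table are hypothetical and made up for illustration only.

```python
import random
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of category
    labels (one label per rater, same number of raters per subject)."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = set().union(*counts)
    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_subjects
    # Chance agreement: sum of squared overall category proportions.
    total = n_subjects * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)
    if 1.0 - p_e < 1e-12:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1.0 - p_e)


def bootstrap_kappa(ratings, resamples=1000, seed=0):
    """Bootstrap over subjects (assumption) to get SE, z, and a 95% percentile CI."""
    rng = random.Random(seed)
    n = len(ratings)
    point = fleiss_kappa(ratings)
    boots = []
    for _ in range(resamples):
        sample = [ratings[rng.randrange(n)] for _ in range(n)]
        k = fleiss_kappa(sample)
        if k == k:  # drop undefined (NaN) resamples
            boots.append(k)
    if len(boots) < 2:
        raise ValueError("too few valid bootstrap resamples")
    boots.sort()
    mean = sum(boots) / len(boots)
    se = (sum((b - mean) ** 2 for b in boots) / (len(boots) - 1)) ** 0.5
    ci_low = boots[int(0.025 * len(boots))]
    ci_high = boots[min(len(boots) - 1, int(0.975 * len(boots)))]
    z = point / se if se > 0 else float("inf")
    return {"kappa": point, "se": se, "z": z, "ci_low": ci_low, "ci_high": ci_high}


# Hypothetical data: 6 radiographs, each classified by 4 raters into a fracture type.
demo = [
    ["31-A", "31-A", "31-A", "31-A"],
    ["31-B", "31-B", "31-B", "31-A"],
    ["31-B", "31-B", "31-B", "31-B"],
    ["31-C", "31-C", "31-B", "31-C"],
    ["31-A", "31-A", "31-B", "31-A"],
    ["31-C", "31-C", "31-C", "31-C"],
]
result = bootstrap_kappa(demo, resamples=1000, seed=42)
print(result)
```

Comparing two groups of raters (for example, more versus less experienced surgeons) would require an additional step, such as bootstrapping the difference between the two groups' kappa values; the abstract does not state how that comparison was carried out.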

Original language: English (US)
Pages (from-to): 819-823
Number of pages: 5
Issue number: 4
State: Published - Apr 2018

Bibliographical note

Publisher Copyright:
© 2018 Elsevier Ltd


Keywords

  • AO/OTA classification
  • Interobserver agreement
  • Proximal femur fractures

