Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks

Jason D Kelly, Ashley Petersen, Thomas S. Lendvay, Timothy M. Kowalewski

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Purpose: Most prior surgical skill research analyzes holistic, task-level summary metrics to classify the skill shown in a performance. Recent advances in machine learning enable time-series classification at the sub-task level, so predictions can be made on segments of a task, which could improve task-level technical skill assessment.

Methods: A bidirectional long short-term memory (LSTM) network was applied to 8-s windows of multidimensional time-series data from the Basic Laparoscopic Urologic Skills dataset. The network was trained on expert and novice performances of four common surgical tasks. Stratified cross-validation with regularization was used to limit overfitting. Misclassified cases were resubmitted to crowds on Amazon Mechanical Turk for surgical technical skill assessment, and the new scores were analyzed for their level of agreement with the previous scores.

Results: Performance was best on the suturing task: compared against previously obtained crowd evaluations, the network predicted whether a performance was expert or novice with 96.88% accuracy (1 misclassification). Compared with expert surgeon ratings, the LSTM predictions yielded a Spearman coefficient of 0.89 for the suturing task. When crowds re-evaluated the misclassified performances, for all 5 misclassified cases from the peg transfer and suturing tasks the crowds agreed more with the LSTM model than with the previously obtained crowd scores.

Conclusion: The presented technique produces results comparable to the labels that would be obtained by crowd-sourcing the assessment of surgical tasks. However, these results raise questions about the reliability of crowd-sourced labels for videos of surgical tasks. The research community should scrutinize crowd labeling more closely, systematically examine its biases, and quantify label noise.
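
The data preparation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the sampling rate, the window overlap, the number of folds, or whether folds were formed per performance or per window, so the 30 Hz rate, the non-overlapping stride, the four folds, and the function names `segment_windows` and `stratified_folds` are all assumptions made for illustration. The LSTM itself is omitted.

```python
from collections import defaultdict
import random


def segment_windows(series, sample_rate_hz, window_s=8.0, stride_s=None):
    """Split one recording (a list of per-timestep feature vectors)
    into fixed-length windows.

    Trailing samples that do not fill a whole window are dropped.
    The stride defaults to the window length (no overlap), which is
    an assumption; the abstract does not specify overlap.
    """
    window_len = int(round(window_s * sample_rate_hz))
    stride = window_len if stride_s is None else int(round(stride_s * sample_rate_hz))
    if window_len <= 0 or stride <= 0:
        raise ValueError("window and stride must cover at least one sample")
    return [series[start:start + window_len]
            for start in range(0, len(series) - window_len + 1, stride)]


def stratified_folds(labels, n_folds, seed=0):
    """Assign each performance index to a fold so that every fold keeps
    roughly the same expert/novice ratio as the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin across folds to balance classes.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical recording: 20 s at an assumed 30 Hz, 3 features per timestep.
recording = [[float(t), 0.0, 0.0] for t in range(600)]
windows = segment_windows(recording, sample_rate_hz=30)

# Hypothetical label set: 8 expert and 8 novice performances.
labels = ["expert"] * 8 + ["novice"] * 8
folds = stratified_folds(labels, n_folds=4)
```

In this sketch the folds are formed over whole performances rather than over individual windows, so that windows from one recording never appear in both the training and the validation split. That is a design choice of the example, not a detail confirmed by the abstract.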

Original language: English (US)
Pages (from-to): 2079-2088
Number of pages: 10
Journal: International Journal of Computer Assisted Radiology and Surgery
Volume: 15
Issue number: 12
State: Published - Dec 2020

Bibliographical note

Publisher Copyright:
© 2020, CARS.

Keywords

  • Bidirectional LSTM
  • Crowd sourcing
  • Machine learning
  • Surgical skill
  • Surgical technical skill