TY - JOUR
T1 - Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks
AU - Kelly, Jason D.
AU - Petersen, Ashley
AU - Lendvay, Thomas S.
AU - Kowalewski, Timothy M.
N1 - Publisher Copyright: © 2020, CARS.
PY - 2020/12
Y1 - 2020/12
AB - Purpose: Most prior surgical skill research analyzes holistic, task-level summary metrics to classify the skill of a performance. Recent advances in machine learning allow time-series classification at the sub-task level, enabling predictions on segments of tasks, which could improve task-level technical skill assessment. Methods: A bidirectional long short-term memory (LSTM) network was used with 8-s windows of multidimensional time-series data from the Basic Laparoscopic Urologic Skills dataset. The network was trained on experts and novices from four common surgical tasks. Stratified cross-validation with regularization was used to avoid overfitting. The misclassified cases were re-submitted to crowds on Amazon Mechanical Turk for surgical technical skill assessment, to re-evaluate them and to analyze the level of agreement with previous scores. Results: Performance was best for the suturing task, with 96.88% accuracy at predicting whether a performance was expert or novice (1 misclassification) when compared to previously obtained crowd evaluations. When compared with expert surgeon ratings, the LSTM predictions yielded a Spearman coefficient of 0.89 for suturing tasks. When crowds re-evaluated misclassified performances, for all 5 misclassified cases from the peg transfer and suturing tasks, the crowds agreed more with our LSTM model than with the previously obtained crowd scores. Conclusion: The technique presented produces results comparable to crowd-sourced labels of surgical tasks. However, these results raise questions about the reliability of crowd-sourced labels for videos of surgical tasks. We, as a research community, should examine crowd labeling with greater scrutiny, systematically investigate biases, and quantify label noise.
KW - Bidirectional LSTM
KW - Crowd sourcing
KW - Machine learning
KW - Surgical skill
KW - Surgical technical skill
UR - http://www.scopus.com/inward/record.url?scp=85091768922&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091768922&partnerID=8YFLogxK
U2 - 10.1007/s11548-020-02269-x
DO - 10.1007/s11548-020-02269-x
M3 - Article
C2 - 33000365
AN - SCOPUS:85091768922
SN - 1861-6410
VL - 15
SP - 2079
EP - 2088
JO - International Journal of Computer Assisted Radiology and Surgery
JF - International Journal of Computer Assisted Radiology and Surgery
IS - 12
ER -