Development of a measure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detection of prostate cancer

Ethan Leng, Benjamin D Spilseth, Lin Zhang, Jin Jin, Joe Koopmeiners, Greg Metzger

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Purpose: Computer-aided detection/diagnosis (CAD) of prostate cancer (PCa) on multiparametric MRI (mpMRI) is an active area of research. In the literature, the performance of predictive models trained to detect PCa on mpMRI has typically been reported in terms of voxel-wise measures such as sensitivity and specificity and/or area under the receiver operating curve (AUC). However, it is unclear whether models that score higher by these measures are actually superior. Here, we propose a novel method for lesion identification as well as novel measures that assess the quality of the detected lesions.

Methods: A total of 46 axial MRI slices of interest from 34 patients and the associated histopathologic ground truths were used to develop and to characterize the proposed measures. The proposed lesion-wise score sℓ is based on the Jaccard similarity index with modifications that emphasize the overlap and colocalization of predicted lesions with ground truth lesions. Thresholding of sℓ allowed for the sensitivity and specificity of lesion detection to be assessed, while the proposed lesion-summary score sσ is a weighted average of sℓs that provides a single summary statistic of lesion detection performance. The proposed measures were used to compare the lesion detection performance of a predictive model vs that of a radiologist on the same data set. The measures were also used to evaluate the degree to which viewing the cancer prediction improved diagnostic accuracy.

Results: The lesion-wise score qualitatively reflected the goodness of predicted lesions over a wide range of values (sℓ = 0.1 to sℓ = 0.8) and was found to encompass a larger range of values than the Dice coefficient did over the same range of prediction qualities (0–0.9 vs 0–0.75). The lesion-summary score was shown to vary linearly with voxel-wise sensitivity and quadratically with voxel-wise specificity and correlated well with voxel-wise AUC (ρ = 0.68) and the Dice coefficient (ρ = 0.88). Radiologist performance was found to be significantly improved after viewing the model-generated cancer prediction maps as quantified by both sσ (P = 0.01) and DSC (P = 0.04), with improvements in both lesion detection sensitivity and specificity.

Conclusion: The proposed measures allow for the assessment of lesion detection performance, which is most relevant in a clinical setting and would not be possible to do with voxel-wise measures alone.
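For readers unfamiliar with the two overlap measures named in the abstract, the following is a minimal sketch of the unmodified Jaccard index and the Dice similarity coefficient (DSC) on binary lesion masks. It is not the authors' lesion-wise score: the abstract does not specify the modifications applied to the Jaccard index, so none are attempted here. The function names, the toy masks, and the convention of returning 0.0 for two empty masks are all assumptions made for illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a pixel labeled as lesion


def jaccard(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """Plain Jaccard index |A ∩ B| / |A ∪ B| (assumed 0.0 if both masks are empty)."""
    union = len(pred | truth)
    return len(pred & truth) / union if union else 0.0


def dice(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) (assumed 0.0 if both masks are empty)."""
    total = len(pred) + len(truth)
    return 2 * len(pred & truth) / total if total else 0.0


# Hypothetical toy masks: a 4x4 ground-truth lesion and a same-sized
# prediction shifted two columns to the right, so half of each overlaps.
truth = {(r, c) for r in range(4) for c in range(4)}
pred = {(r, c) for r in range(4) for c in range(2, 6)}

j = jaccard(pred, truth)  # 8 overlapping pixels out of 24 in the union
d = dice(pred, truth)     # 2 * 8 overlapping pixels out of 16 + 16

print(f"Jaccard = {j:.3f}, Dice = {d:.3f}")
```

For any pair of masks the two plain measures are tied by DSC = 2J / (1 + J), so Dice is never lower than Jaccard for the same prediction.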

Original language: English (US)
Pages (from-to): 2076-2088
Number of pages: 13
Journal: Medical Physics
Volume: 45
Issue number: 5
DOI: 10.1002/mp.12861
State: Published - May 1 2018

Fingerprint

Prostatic Neoplasms
Sensitivity and Specificity
Area Under Curve
Neoplasms
Research
Radiologists
Datasets

Keywords

  • CAD
  • MRI
  • computer-aided detection and diagnosis (CAD)
  • observer performance
  • performance assessment

Cite this

Development of a measure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detection of prostate cancer. / Leng, Ethan; Spilseth, Benjamin D; Zhang, Lin; Jin, Jin; Koopmeiners, Joe; Metzger, Greg.

In: Medical Physics, Vol. 45, No. 5, 01.05.2018, p. 2076-2088.

@article{1023b09a30114ac0922036b2e17f8cb3,
title = "Development of a measure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detection of prostate cancer",
abstract = "Purpose: Computer-aided detection/diagnosis (CAD) of prostate cancer (PCa) on multiparametric MRI (mpMRI) is an active area of research. In the literature, the performance of predictive models trained to detect PCa on mpMRI has typically been reported in terms of voxel-wise measures such as sensitivity and specificity and/or area under the receiver operating curve (AUC). However, it is unclear whether models that score higher by these measures are actually superior. Here, we propose a novel method for lesion identification as well as novel measures that assess the quality of the detected lesions. Methods: A total of 46 axial MRI slices of interest from 34 patients and the associated histopathologic ground truths were used to develop and to characterize the proposed measures. The proposed lesion-wise score sℓ is based on the Jaccard similarity index with modifications that emphasize the overlap and colocalization of predicted lesions with ground truth lesions. Thresholding of sℓ allowed for the sensitivity and specificity of lesion detection to be assessed, while the proposed lesion-summary score sσ is a weighted average of sℓs that provides a single summary statistic of lesion detection performance. The proposed measures were used to compare the lesion detection performance of a predictive model vs that of a radiologist on the same data set. The measures were also used to evaluate the degree to which viewing the cancer prediction improved diagnostic accuracy. Results: The lesion-wise score qualitatively reflected the goodness of predicted lesions over a wide range of values (sℓ = 0.1 to sℓ = 0.8) and was found to encompass a larger range of values than the Dice coefficient did over the same range of prediction qualities (0–0.9 vs 0–0.75). The lesion-summary score was shown to vary linearly with voxel-wise sensitivity and quadratically with voxel-wise specificity and correlated well with voxel-wise AUC (ρ = 0.68) and the Dice coefficient (ρ = 0.88). 
Radiologist performance was found to be significantly improved after viewing the model-generated cancer prediction maps as quantified by both sσ (P = 0.01) and DSC (P = 0.04), with improvements in both lesion detection sensitivity and specificity. Conclusion: The proposed measures allow for the assessment of lesion detection performance, which is most relevant in a clinical setting and would not be possible to do with voxel-wise measures alone.",
keywords = "CAD, MRI, computer-aided detection and diagnosis (CAD), observer performance, performance assessment",
author = "Ethan Leng and Spilseth, {Benjamin D} and Lin Zhang and Jin Jin and Joe Koopmeiners and Greg Metzger",
year = "2018",
month = "5",
day = "1",
doi = "10.1002/mp.12861",
language = "English (US)",
volume = "45",
pages = "2076--2088",
journal = "Medical Physics",
issn = "0094-2405",
publisher = "AAPM - American Association of Physicists in Medicine",
number = "5",

}

TY - JOUR

T1 - Development of a measure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detection of prostate cancer

AU - Leng, Ethan

AU - Spilseth, Benjamin D

AU - Zhang, Lin

AU - Jin, Jin

AU - Koopmeiners, Joe

AU - Metzger, Greg

PY - 2018/5/1

Y1 - 2018/5/1

AB - Purpose: Computer-aided detection/diagnosis (CAD) of prostate cancer (PCa) on multiparametric MRI (mpMRI) is an active area of research. In the literature, the performance of predictive models trained to detect PCa on mpMRI has typically been reported in terms of voxel-wise measures such as sensitivity and specificity and/or area under the receiver operating curve (AUC). However, it is unclear whether models that score higher by these measures are actually superior. Here, we propose a novel method for lesion identification as well as novel measures that assess the quality of the detected lesions. Methods: A total of 46 axial MRI slices of interest from 34 patients and the associated histopathologic ground truths were used to develop and to characterize the proposed measures. The proposed lesion-wise score sℓ is based on the Jaccard similarity index with modifications that emphasize the overlap and colocalization of predicted lesions with ground truth lesions. Thresholding of sℓ allowed for the sensitivity and specificity of lesion detection to be assessed, while the proposed lesion-summary score sσ is a weighted average of sℓs that provides a single summary statistic of lesion detection performance. The proposed measures were used to compare the lesion detection performance of a predictive model vs that of a radiologist on the same data set. The measures were also used to evaluate the degree to which viewing the cancer prediction improved diagnostic accuracy. Results: The lesion-wise score qualitatively reflected the goodness of predicted lesions over a wide range of values (sℓ = 0.1 to sℓ = 0.8) and was found to encompass a larger range of values than the Dice coefficient did over the same range of prediction qualities (0–0.9 vs 0–0.75). The lesion-summary score was shown to vary linearly with voxel-wise sensitivity and quadratically with voxel-wise specificity and correlated well with voxel-wise AUC (ρ = 0.68) and the Dice coefficient (ρ = 0.88). 
Radiologist performance was found to be significantly improved after viewing the model-generated cancer prediction maps as quantified by both sσ (P = 0.01) and DSC (P = 0.04), with improvements in both lesion detection sensitivity and specificity. Conclusion: The proposed measures allow for the assessment of lesion detection performance, which is most relevant in a clinical setting and would not be possible to do with voxel-wise measures alone.

KW - CAD

KW - MRI

KW - computer-aided detection and diagnosis (CAD)

KW - observer performance

KW - performance assessment

UR - http://www.scopus.com/inward/record.url?scp=85045852812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045852812&partnerID=8YFLogxK

U2 - 10.1002/mp.12861

DO - 10.1002/mp.12861

M3 - Article

VL - 45

SP - 2076

EP - 2088

JO - Medical Physics

JF - Medical Physics

SN - 0094-2405

IS - 5

ER -