2022
Reinke, Annika; Maier-Hein, Lena; Christodoulou, Evangelia; Glocker, Ben; Scholz, Patrick; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael Alexander; Wiesenfarth, Manuel; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, Ali Emre; Rädsch, Tim; Tizabi, Minu D.; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Bankhead, Peter; Benis, Arriel; Cardoso, M. Jorge; Cheplygina, Veronika; Cimini, Beth A; Collins, Gary S.; Farahani, Keyvan; Ginneken, Bram; Hamprecht, Fred A; Hashimoto, Daniel A.; Hoffman, Michael M.; Huisman, Merel; Jannin, Pierre; Kahn, Charles; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A.; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne; Mattson, Peter; Meijering, Erik; Menze, Bjoern; Moher, David; Moons, Karel G. M.; Müller, Henning; Nichyporuk, Brennan; Nickel, Felix; Petersen, Jens; Rajpoot, Nasir; Rieke, Nicola; Saez-Rodriguez, Julio; Sánchez, Clara I.; Shetty, Shravya; Smeden, Maarten; Sudre, Carole H.; Summers, Ronald M.; Taha, Abdel A.; Tsaftaris, Sotirios A.; Calster, Ben Van; Varoquaux, Gael; Jaeger, Paul F
Metrics Reloaded - A new recommendation framework for biomedical image analysis validation Journal Article
In: Medical Imaging with Deep Learning, 2022.
Tags: Classification, Instance Segmentation, Medical Imaging, Metrics, Object Detection, Validation
@article{Reinke2022,
title = {Metrics Reloaded - A new recommendation framework for biomedical image analysis validation},
author = {Annika Reinke and Lena Maier-Hein and Evangelia Christodoulou and Ben Glocker and Patrick Scholz and Fabian Isensee and Jens Kleesiek and Michal Kozubek and Mauricio Reyes and Michael Alexander Riegler and Manuel Wiesenfarth and Michael Baumgartner and Matthias Eisenmann and Doreen Heckmann-Nötzel and Ali Emre Kavur and Tim Rädsch and Minu D. Tizabi and Laura Acion and Michela Antonelli and Tal Arbel and Spyridon Bakas and Peter Bankhead and Arriel Benis and M. Jorge Cardoso and Veronika Cheplygina and Beth A Cimini and Gary S. Collins and Keyvan Farahani and Bram Ginneken and Fred A Hamprecht and Daniel A. Hashimoto and Michael M. Hoffman and Merel Huisman and Pierre Jannin and Charles Kahn and Alexandros Karargyris and Alan Karthikesalingam and Hannes Kenngott and Annette Kopp-Schneider and Anna Kreshuk and Tahsin Kurc and Bennett A. Landman and Geert Litjens and Amin Madani and Klaus Maier-Hein and Anne Martel and Peter Mattson and Erik Meijering and Bjoern Menze and David Moher and Karel G. M. Moons and Henning Müller and Brennan Nichyporuk and Felix Nickel and Jens Petersen and Nasir Rajpoot and Nicola Rieke and Julio Saez-Rodriguez and Clara I. Sánchez and Shravya Shetty and Maarten Smeden and Carole H. Sudre and Ronald M. Summers and Abdel A. Taha and Sotirios A. Tsaftaris and Ben Van Calster and Gael Varoquaux and Paul F Jaeger},
year = {2022},
date = {2022-01-01},
journal = {Medical Imaging with Deep Learning},
abstract = {Meaningful performance assessment of biomedical image analysis algorithms depends on objective and appropriate performance metrics. There are major shortcomings in the current state of the art. Yet, so far limited attention has been paid to practical pitfalls associated with using particular metrics for image analysis tasks. Therefore, a number of international initiatives have collaborated to offer researchers guidance and tools for selecting performance metrics in a problem-aware manner. In our proposed framework, the characteristics of the given biomedical problem are first captured in a problem fingerprint, which identifies properties related to domain interests, the target structure(s), the input datasets, and algorithm output. A problem category-specific mapping is applied in the second step to match fingerprints to metrics that reflect domain requirements. Based on input from experts from more than 60 institutions worldwide, we believe our metric recommendation framework to be useful to the MIDL community and to enhance the quality of biomedical image analysis algorithm validation.},
keywords = {Classification, Instance Segmentation, Medical Imaging, Metrics, Object Detection, Validation},
pubstate = {published},
tppubtype = {article}
}
2020
Klein, Geoff; Martel, Anne; Sahgal, Arjun; Whyne, Cari; Hardisty, Michael
Metastatic Vertebrae Segmentation for Use in a Clinical Pipeline Workshop
Computational Methods and Clinical Applications for Spine Imaging, vol. 11963, Lecture Notes in Computer Science, International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, Springer, 2020.
Tags: Medical Imaging
@workshop{Klein2020,
title = {Metastatic Vertebrae Segmentation for Use in a Clinical Pipeline},
author = {Geoff Klein and Anne Martel and Arjun Sahgal and Cari Whyne and Michael Hardisty},
doi = {10.1007/978-3-030-39752-4_2},
year = {2020},
date = {2020-02-01},
urldate = {2020-02-01},
booktitle = {Computational Methods and Clinical Applications for Spine Imaging},
volume = {11963},
pages = {15-28},
publisher = {Springer},
organization = {International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging},
series = {Lecture Notes in Computer Science},
abstract = {Vertebral metastases are common complications of primary cancers that alter bone architecture potentially leading to vertebral fracture and neurological compromise. Quantitative measures from vertebral body segmentation from Computed Tomography (CT) scans have been useful for assessing fracture risk predictions and vertebrae stability. Previous segmentation methods used to generate these metrics were slow and required manual intervention, limiting their utility. More accurate, robust and fast methods are needed for clinical assessments. This investigation proposes a 3D U-Net Convolutional Neural Network (CNN) to accurately segment individual trabecular centrum from metastatically compromised vertebrae of interest in CT imaging. Using different augmentation techniques achieved good performance (DSC = 0.904 ± 0.056) with the segmentation model remaining accurate with simulated lower image quality, and translation of the vertebrae within the image, especially compared to when no augmentations were used (DSC = 0.774 ± 0.188). Integration of this method into a clinical tool will allow accurate and robust quantitative assessment of mechanical stability, aiding clinical decision making to improve patient care.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {workshop}
}
Chen, Jianan; Amemiya, Yutaka; Kuling, Grey; Fashandi, Homa; Yerofeyeva, Yulia; Hussein, Heba; Slodkowska, Elzbieta; Ginty, Fiona; Seth, Arun; Yaffe, Martin; Martel, Anne L.
Texture heterogeneity of breast tumour in magnetic resonance imaging can be explained by differentially regulated genes Conference
Cancer Research, vol. 80, Abstracts: 2019 San Antonio Breast Cancer Symposium; December 10-14, 2019; San Antonio, Texas, American Association for Cancer Research, 2020.
Tags: Medical Imaging
@conference{Chen2020,
title = {Texture heterogeneity of breast tumour in magnetic resonance imaging can be explained by differentially regulated genes},
author = {Jianan Chen and Yutaka Amemiya and Grey Kuling and Homa Fashandi and Yulia Yerofeyeva and Heba Hussein and Elzbieta Slodkowska and Fiona Ginty and Arun Seth and Martin Yaffe and Anne L. Martel},
doi = {10.1158/1538-7445.SABCS19-P6-10-12},
year = {2020},
date = {2020-02-01},
urldate = {2020-02-01},
booktitle = {Cancer Research},
volume = {80},
pages = {P6-10-12},
publisher = {American Association for Cancer Research},
organization = {Abstracts: 2019 San Antonio Breast Cancer Symposium; December 10-14, 2019; San Antonio, Texas},
abstract = {Background: Magnetic resonance imaging (MRI) and molecular profiling of tumour tissues have become standard techniques to study breast cancer in recent years. However, despite the myriad imaging and genetic subtypes that have been identified, the underlying biological mechanisms of MRI features are seldom explained, and differentially regulated genes are rarely linked to the phenotypic appearance of tumours. In this study, we propose to fill this gap in knowledge by investigating the unbiased correlations between MRI phenotypes and differential gene expressions in breast cancer.
Methods: Patients diagnosed during 2002-15 with invasive breast cancer who went through surgery were retrospectively reviewed for magnetic resonance imaging (MRI) and genomics analysis. In total, we collected dynamic contrast-enhanced subtraction MRI and RNA sequencing results of surgical specimens from a cohort of 56 patients. Of these, 31 patients (aged 33 to 72 years) met our inclusion criteria. Tumour lesion segmentation was performed by a radiologist who has 10 years of experience. We extracted features that quantitatively describe tumour appearance from the segmented lesions using pyradiomics (v2.0.0). We then grouped the tumours into two imaging subtypes using an unsupervised clustering approach (SIMLR, v1.10.0). To probe the underlying biological mechanisms behind the difference in tumour appearance, we performed differential expression analysis (edgeR, v3.26.5) and pathway enrichment analysis (g:profiler) between the two imaging subtypes. Multiple testing correction was conducted with Benjamini-Hochberg correction using a false discovery rate of 0.05.
Results: We classified the breast tumours from our cohort into two imaging subtypes that have distinct levels of heterogeneity in texture (p=0.004). We found a list of genes that were significantly differentially expressed between the heterogenous (n=20) and homogenous (n=11) subtypes (Table 1), and their associated biological pathways. We found that the pathways controlling cell growth (p=0.022), cell migration and invasion (p=0.023), estrogen regulation (p=0.022) and DNA damage repair (p=0.015) mechanisms may have contributed to increased heterogeneity in tumour presentation when imaged with MRI.
Conclusion: The underlying biological mechanisms affecting breast MRI texture can be investigated by linking tumour appearance to gene expression profiling. Our results suggest that texture heterogeneity in breast MRI could be linked to a number of differentially expressed genes that may be further investigated as a biomarker of cancer risk assessment or recurrence. Further studies with a larger cohort will be conducted to validate and extend these results.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {conference}
}
Lin, Peter; Martel, Anne; Camilleri, Susan; Pop, Mihaela
Co-registered Cardiac ex vivo DT Images and Histological Images for Fibrosis Quantification Workshop
International Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2019) - MICCAI 2019, Springer, 2020.
Tags: Medical Imaging
@workshop{Lin2020,
title = {Co-registered Cardiac ex vivo DT Images and Histological Images for Fibrosis Quantification},
author = {Lin, Peter and Martel, Anne and Camilleri, Susan and Pop, Mihaela},
year = {2020},
date = {2020-01-23},
booktitle = {International Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2019) - MICCAI 2019},
pages = {3-11},
publisher = {Springer},
abstract = {Cardiac magnetic resonance (MR) imaging can detect infarct scar, a major cause of lethal arrhythmia and heart failure. Here, we describe a robust image processing pipeline developed to quantitatively analyze collagen density and features in a pig model of chronic fibrosis. Specifically, we use ex vivo diffusion tensor imaging (DTI) (0.6×0.6×1.2 mm resolution) to calculate fractional anisotropy (FA) maps in healthy tissue, infarct core (IC) and gray zone (GZ) (i.e., a mixture of viable myocytes and collagen fibrils bordering IC and healthy zones). The 3 zones were validated using collagen-sensitive histological slides co-registered with MR images. Our results showed a significant (p<0.05) reduction in the mean FA values of GZ (by 17%) and IC (by 44%) compared to healthy areas; however, we found that these differences do not depend on the location of occluded coronary artery (LAD vs LCX). This work validates the utility of DTI-MR imaging for fibrosis quantification, with histological validation.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {workshop}
}
2019
Balki, Indranil; Amirabadi, Afsaneh; Levman, Jacob; Martel, Anne L; Emersic, Ziga; Meden, Blaz; Garcia-Pedrero, Angel; Ramirez, Saul C; Kong, Dehan; Moody, Alan R; Tyrrell, Pascal N
Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review Journal Article
In: Canadian Association of Radiologists Journal, vol. 70, no. 4, pp. 344–353, 2019, ISSN: 0846-5371.
Tags: Machine learning, Medical Imaging, Radiology, Sample size
@article{Balki2019b,
title = {Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review},
author = {Indranil Balki and Afsaneh Amirabadi and Jacob Levman and Anne L Martel and Ziga Emersic and Blaz Meden and Angel Garcia-Pedrero and Saul C Ramirez and Dehan Kong and Alan R Moody and Pascal N Tyrrell},
url = {https://doi.org/10.1016/j.carj.2019.06.002 http://journals.sagepub.com/doi/10.1016/j.carj.2019.06.002},
doi = {10.1016/j.carj.2019.06.002},
issn = {0846-5371},
year = {2019},
date = {2019-11-01},
journal = {Canadian Association of Radiologists Journal},
volume = {70},
number = {4},
pages = {344–353},
publisher = {Elsevier Inc.},
abstract = {Purpose: The required training sample size for a particular machine learning (ML) model applied to medical imaging data is often unknown. The purpose of this study was to provide a descriptive review of current sample-size determination methodologies in ML applied to medical imaging and to propose recommendations for future work in the field. Methods: We conducted a systematic literature search of articles using Medline and Embase with keywords including “machine learning,” “image,” and “sample size.” The search included articles published between 1946 and 2018. Data regarding the ML task, sample size, and train-test pipeline were collected. Results: A total of 167 articles were identified, of which 22 were included for qualitative analysis. There were only 4 studies that discussed sample-size determination methodologies, and 18 that tested the effect of sample size on model performance as part of an exploratory analysis. The observed methods could be categorized as pre hoc model-based approaches, which relied on features of the algorithm, or post hoc curve-fitting approaches requiring empirical testing to model and extrapolate algorithm performance as a function of sample size. Between studies, we observed great variability in performance testing procedures used for curve-fitting, model assessment methods, and reporting of confidence in sample sizes. Conclusions: Our study highlights the scarcity of research in training set size determination methodologies applied to ML in medical imaging, emphasizes the need to standardize current reporting practices, and guides future work in development and streamlining of pre hoc and post hoc sample size approaches.},
keywords = {Machine learning, Medical Imaging, Radiology, Sample size},
pubstate = {published},
tppubtype = {article}
}
Chen, Jianan; Milot, Laurent; Cheung, Helen MC; Martel, Anne L
Unsupervised Clustering of Quantitative Imaging Phenotypes Using Autoencoder and Gaussian Mixture Model Conference
International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 11767, 2019.
Tags: Medical Imaging
@conference{Chen2019,
title = {Unsupervised Clustering of Quantitative Imaging Phenotypes Using Autoencoder and Gaussian Mixture Model},
author = {Jianan Chen and Laurent Milot and Helen MC Cheung and Anne L Martel},
year = {2019},
date = {2019-10-13},
urldate = {2019-10-13},
booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention},
volume = {11767},
pages = {575-582},
abstract = {Quantitative medical image computing (radiomics) has been widely applied to build prediction models from medical images. However, overfitting is a significant issue in conventional radiomics, where a large number of radiomic features are directly used to train and test models that predict genotypes or clinical outcomes. In order to tackle this problem, we propose an unsupervised learning pipeline composed of an autoencoder for representation learning of radiomic features and a Gaussian mixture model based on minimum message length criterion for clustering. By incorporating probabilistic modeling, disease heterogeneity has been taken into account. The performance of the proposed pipeline was evaluated on an institutional MRI cohort of 108 patients with colorectal cancer liver metastases. Our approach is capable of automatically selecting the optimal number of clusters and assigns patients into clusters (imaging subtypes) with significantly different survival rates. Our method outperforms other unsupervised clustering methods that have been used for radiomics analysis and has comparable performance to a state-of-the-art imaging biomarker.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {conference}
}
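Maier-Hein, Lena; Reinke, Annika; Kozubek, Michal; Martel, Anne L.; Arbel, Tal; Eisenmann, Matthias; Hanbury, Allan; Jannin, Pierre; Müller, Henning; Onogur, Sinan; Saez-Rodriguez, Julio; van Ginneken, Bram; Kopp-Schneider, Annette; Landman, Bennett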
BIAS: Transparent reporting of biomedical image analysis challenges Journal Article
In: arXiv preprint arXiv:1910.04071, 2019.
Tags: Medical Imaging
@article{Maier-Hein2019,
title = {BIAS: Transparent reporting of biomedical image analysis challenges},
author = {Lena Maier-Hein and Annika Reinke and Michal Kozubek and Anne L. Martel and Tal Arbel and Matthias Eisenmann and Allan Hanbury and Pierre Jannin and Henning Müller and Sinan Onogur and Julio Saez-Rodriguez and Bram van Ginneken and Annette Kopp-Schneider and Bennett Landman},
year = {2019},
date = {2019-10-09},
urldate = {2019-10-09},
journal = {arXiv preprint arXiv:1910.04071},
abstract = {The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when submitting a paper on a challenge for review. The purpose of the checklist is to standardize and facilitate the review process and raise interpretability and reproducibility of challenge results by making relevant information explicit.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {article}
}
Balki, Indranil; Amirabadi, Afsaneh; Levman, Jacob; Martel, Anne L; Emersic, Ziga; Meden, Blaz; Garcia-Pedrero, Angel; Ramirez, Saul C; Kong, Dehan; Moody, Alan R; Tyrrell, Pascal N
Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review Journal Article
In: Canadian Association of Radiologists Journal, vol. 70, no. 4, pp. 344-353, 2019.
Tags: Medical Imaging
@article{Balki2019,
title = {Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review},
author = {Indranil Balki and Afsaneh Amirabadi and Jacob Levman and Anne L Martel and Ziga Emersic and Blaz Meden and Angel Garcia-Pedrero and Saul C Ramirez and Dehan Kong and Alan R Moody and Pascal N Tyrrell},
year = {2019},
date = {2019-09-12},
urldate = {2019-09-12},
journal = {Canadian Association of Radiologists Journal},
volume = {70},
number = {4},
pages = {344-353},
abstract = {Purpose
The required training sample size for a particular machine learning (ML) model applied to medical imaging data is often unknown. The purpose of this study was to provide a descriptive review of current sample-size determination methodologies in ML applied to medical imaging and to propose recommendations for future work in the field.
Methods
We conducted a systematic literature search of articles using Medline and Embase with keywords including “machine learning,” “image,” and “sample size.” The search included articles published between 1946 and 2018. Data regarding the ML task, sample size, and train-test pipeline were collected.
Results
A total of 167 articles were identified, of which 22 were included for qualitative analysis. There were only 4 studies that discussed sample-size determination methodologies, and 18 that tested the effect of sample size on model performance as part of an exploratory analysis. The observed methods could be categorized as pre hoc model-based approaches, which relied on features of the algorithm, or post hoc curve-fitting approaches requiring empirical testing to model and extrapolate algorithm performance as a function of sample size. Between studies, we observed great variability in performance testing procedures used for curve-fitting, model assessment methods, and reporting of confidence in sample sizes.
Conclusions
Our study highlights the scarcity of research in training set size determination methodologies applied to ML in medical imaging, emphasizes the need to standardize current reporting practices, and guides future work in development and streamlining of pre hoc and post hoc sample size approaches.},
keywords = {Medical Imaging},
pubstate = {published},
tppubtype = {article}
}