2021
Ma, Jun; Chen, Jianan; Ng, Matthew; Huang, Rui; Li, Yu; Li, Chen; Yang, Xiaoping; Martel, Anne L
Loss odyssey in medical image segmentation Journal Article
In: Medical Image Analysis, vol. 71, pp. 102035, 2021, ISSN: 1361-8415.
@article{Ma2021,
title = {Loss odyssey in medical image segmentation},
author = {Jun Ma and Jianan Chen and Matthew Ng and Rui Huang and Yu Li and Chen Li and Xiaoping Yang and Anne L Martel},
url = {https://linkinghub.elsevier.com/retrieve/pii/S1361841521000815},
doi = {10.1016/j.media.2021.102035},
issn = {1361-8415},
year = {2021},
date = {2021-07-01},
journal = {Medical Image Analysis},
volume = {71},
pages = {102035},
keywords = {convolutional neural networks, loss function, segmentation},
pubstate = {published},
tppubtype = {article}
}
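The survey above catalogs segmentation losses such as the Dice loss. As a minimal illustrative sketch (the function name and the NumPy formulation are my own, not taken from the paper), the soft Dice loss on probability maps can be written as:

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss: 1 - (2 * intersection) / (|P| + |T|), with eps for numerical stability."""
    prob = np.asarray(prob, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = (prob * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (prob.sum() + target.sum() + eps)
```

A perfect prediction drives the loss to 0, while a fully disjoint prediction drives it toward 1; in practice such losses are implemented on framework tensors so that they remain differentiable for training.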
Reinke, Annika; Eisenmann, Matthias; Tizabi, Minu Dietlinde; Sudre, Carole H.; Rädsch, Tim; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Cardoso, M. Jorge; Cheplygina, Veronika; Farahani, Keyvan; Glocker, Ben; Heckmann-Nötzel, Doreen; Isensee, Fabian; Jannin, Pierre; Kahn, Charles; Kleesiek, Jens; Kurc, Tahsin; Kozubek, Michal; Landman, Bennett A.; Litjens, Geert; Maier-Hein, Klaus; Martel, Anne L; Menze, Bjoern; Müller, Henning; Petersen, Jens; Reyes, Mauricio; Rieke, Nicola; Stieltjes, Bram; Summers, Ronald M.; Tsaftaris, Sotirios A.; Ginneken, Bram; Kopp-Schneider, Annette; Jäger, Paul; Maier-Hein, Lena
Common limitations of performance metrics in biomedical image analysis Proceedings Article
In: MIDL 2021, 2021.
@inproceedings{Reinke2021,
title = {Common limitations of performance metrics in biomedical image analysis},
author = {Annika Reinke and Matthias Eisenmann and Minu Dietlinde Tizabi and Carole H. Sudre and Tim Rädsch and Michela Antonelli and Tal Arbel and Spyridon Bakas and M. Jorge Cardoso and Veronika Cheplygina and Keyvan Farahani and Ben Glocker and Doreen Heckmann-Nötzel and Fabian Isensee and Pierre Jannin and Charles Kahn and Jens Kleesiek and Tahsin Kurc and Michal Kozubek and Bennett A. Landman and Geert Litjens and Klaus Maier-Hein and Anne L Martel and Bjoern Menze and Henning Müller and Jens Petersen and Mauricio Reyes and Nicola Rieke and Bram Stieltjes and Ronald M. Summers and Sotirios A. Tsaftaris and Bram Ginneken and Annette Kopp-Schneider and Paul Jäger and Lena Maier-Hein},
url = {https://arxiv.org/abs/2104.05642},
year = {2021},
date = {2021-04-01},
urldate = {2021-04-01},
booktitle = {MIDL 2021},
abstract = {While the importance of automatic biomedical image analysis is increasing at an enormous pace, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are key for objective, transparent and comparative performance assessment, but little attention has been given to their pitfalls. Under the umbrella of the Helmholtz Imaging Platform (HIP), three international initiatives (the MICCAI Society's challenge working group, the Biomedical Image Analysis Challenges (BIAS) initiative, and the benchmarking working group of the MONAI framework) have now joined forces with the mission to generate best practice recommendations with respect to metrics in medical image analysis. Consensus building is achieved via a Delphi process, a popular tool for integrating opinions in large international consortia. The current document serves as a teaser for the results presentation and focuses on the pitfalls of the most commonly used metric in biomedical image analysis, the Dice Similarity Coefficient (DSC), in the categories of (1) mathematical properties/edge cases, (2) task/metric fit and (3) metric aggregation. Being compiled by a large group of experts from more than 30 institutes worldwide, we believe that our framework could be of general interest to the MIDL community and will improve the quality of biomedical image analysis algorithm validation.},
keywords = {Challenges, Metrics, segmentation, Validation},
pubstate = {published},
tppubtype = {inproceedings}
}
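One DSC pitfall the abstract above highlights, its behavior in mathematical edge cases such as empty masks, can be seen in a small sketch (illustrative code, not from the paper; returning 1.0 when both masks are empty is one common convention, not a standard):

```python
import numpy as np

def dsc(pred, gt):
    """Dice Similarity Coefficient: 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Edge case: both masks empty, so the ratio is 0/0 and undefined.
        # The convention chosen here scores an empty prediction of an empty
        # ground truth as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```

Note how unforgiving the metric is for small structures: against an empty ground truth, a single false-positive pixel drops the score from 1.0 to 0.0.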
2019
Fashandi, Homa; Kuling, Grey; Lu, YingLi; Wu, Hongbo; Martel, Anne L.
An investigation of the effect of fat suppression and dimensionality on the accuracy of breast MRI segmentation using U-nets Journal Article
In: Medical Physics, 2019, (This is the pre-peer reviewed version. The definitive version is available at: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.13375).
@article{Fashandi2019,
title = {An investigation of the effect of fat suppression and dimensionality on the accuracy of breast MRI segmentation using U-nets},
author = {Homa Fashandi and Grey Kuling and YingLi Lu and Hongbo Wu and Anne L. Martel},
url = {http://hdl.handle.net/1807/93313},
doi = {10.1002/mp.13375},
year = {2019},
date = {2019-01-04},
urldate = {2019-01-04},
journal = {Medical Physics},
abstract = {Purpose
Accurate segmentation of the breast is required for breast density estimation and the assessment of background parenchymal enhancement, both of which have been shown to be related to breast cancer risk. The MRI breast segmentation task is challenging, and recent work has demonstrated that convolutional neural networks perform well for this task. In this study, we have investigated the performance of several 2D U-Net and 3D U-Net configurations using both fat-suppressed and non-fat-suppressed images. We have also assessed the effect of changing the number and quality of the ground truth segmentations.
Materials and methods
We designed 8 studies to investigate the effect of input types and the dimensionality of the U-Net operations for breast MRI segmentation. Our training data contained 70 whole breast volumes of T1-weighted sequences without fat suppression (WOFS) and with fat suppression (FS). For each subject, we registered the WOFS and FS volumes together before manually segmenting the breast to generate ground truth. We compared 4 different input types to the U-Nets: WOFS, FS, MIXED (WOFS and FS images treated as separate samples) and MULTI (WOFS and FS images combined into a single multi-channel image). We trained 2D U-Nets and 3D U-Nets with this data, which resulted in our 8 studies (2D-WOFS, 3D-WOFS, 2D-FS, 3D-FS, 2D-MIXED, 3D-MIXED, 2D-MULTI, and 3D-MULTI). For each of these studies, we performed a systematic grid search to tune the hyperparameters of the U-Nets. A separate validation set with 15 whole breast volumes was used for hyperparameter tuning. We performed the Kruskal-Wallis test on the results of our hyperparameter tuning and did not find a statistically significant difference among the top 10 models of each study. For this reason, we chose the best model as the model with the highest mean Dice Similarity Coefficient (DSC) value on the validation set. The reported test results are the results of the top model of each study on our test set, which contained 19 whole breast volumes annotated by 3 readers and fused with the STAPLE algorithm. We also investigated the effect of the quality of the training annotations and the number of training samples for this task.
Results
The study with the highest average DSC was 3D-MULTI with 0.96 ± 0.02. The second highest average was 2D-WOFS (0.96 ± 0.03), and the third was 2D-MULTI (0.96 ± 0.03). We performed the Kruskal-Wallis 1-way ANOVA test with Dunn's multiple comparison tests using Bonferroni p-value correction on the results of the selected model of each study and found that 3D-MULTI, 2D-MULTI, 3D-WOFS, 2D-WOFS, 2D-FS, and 3D-FS were not statistically different in their distributions, which indicates that comparable results could be obtained in fat-suppressed and non-fat-suppressed volumes and that there is no significant difference between the 3D and 2D approaches. Our results also suggested that networks trained on single-sequence images, or on multiple sequences organized as multi-channel images, perform better than models trained on a mixture of volumes from different sequences. Our investigation of the size of the training set revealed that training a U-Net in this domain requires only a modest amount of training data, and results obtained with 49 and 70 training datasets were not significantly different.
Conclusions
To summarize, we investigated the use of 2D U-Nets and 3D U-Nets for breast volume segmentation in T1 fat-suppressed and non-fat-suppressed volumes. Although our highest score was obtained in the 3D-MULTI study, when we took advantage of information in both fat-suppressed and non-fat-suppressed volumes and their 3D structure, all of the methods we explored gave accurate segmentations with an average DSC of > 94%, demonstrating that the U-Net is a robust segmentation method for breast MRI volumes.},
note = {This is the pre-peer reviewed version. The definitive version is available at: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.13375},
keywords = {_breast_segmentation, Breast MRI, deep learning, segmentation},
pubstate = {published},
tppubtype = {article}
}