Augment like there's no tomorrow: Consistently performing neural networks for medical imaging

Deep neural networks have achieved impressive performance in a wide variety of medical imaging tasks. However, these models often fail on data not seen during training, such as data originating from a different medical centre. Recognising models that suffer from this fragility and designing robust models are the main obstacles to clinical adoption. Here, we present general methods to identify the causes of model generalisation failures and to circumvent them. First, we use $\textit{distribution-shifted datasets}$ to show that models trained with current state-of-the-art methods are highly fragile to variability encountered in clinical practice, and then develop a $\textit{strong augmentation}$ strategy to address this fragility. Distribution-shifted datasets allow us to discover this fragility, which can otherwise remain undetected even after validation against multiple external datasets. Strong augmentation allows us to train robust models that achieve consistent performance under shifts from the training data distribution. Importantly, we demonstrate that strong augmentation yields biomedical imaging models which retain high performance when applied to real-world clinical data. Our results pave the way for the development and evaluation of reliable and robust neural networks in clinical practice.
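
The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not specify which transforms, parameter ranges, or probabilities the authors use, so everything below is an assumption chosen for illustration: the function names (`strong_augment`, `shifted_dataset`), the four transforms, the 0.5 application probability, and the ranges are all hypothetical. Images are represented as nested lists of grayscale values in [0, 1] to keep the example dependency-free.

```python
import random


def _clip(value):
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def adjust_gamma(img, gamma):
    """Non-linear intensity change, e.g. to mimic a different scanner response."""
    return [[_clip(p) ** gamma for p in row] for row in img]


def adjust_contrast(img, factor):
    """Scale each pixel's distance from the image mean by `factor`."""
    n_pixels = sum(len(row) for row in img)
    mean = sum(sum(row) for row in img) / n_pixels
    return [[_clip(mean + factor * (p - mean)) for p in row] for row in img]


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def strong_augment(img, rng):
    """Hypothetical 'strong' augmentation used during training.

    Each transform is applied with probability 0.5 and draws its parameter
    from a deliberately wide range (ranges are illustrative assumptions).
    """
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = adjust_gamma(out, rng.uniform(0.5, 2.0))
    if rng.random() < 0.5:
        out = adjust_contrast(out, rng.uniform(0.5, 1.5))
    if rng.random() < 0.5:
        out = add_noise(out, rng.uniform(0.0, 0.1), rng)
    return out


def shifted_dataset(images, gamma):
    """Hypothetical distribution-shifted evaluation set.

    One fixed, known transform is applied to every image so that a drop in
    model performance can be attributed to that specific shift.
    """
    return [adjust_gamma(img, gamma) for img in images]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[0.0, 0.25], [0.5, 1.0]]
    print(strong_augment(image, rng))      # randomised training-time view
    print(shifted_dataset([image], 2.0))   # fixed shift for evaluation
```

The design difference is the point of the sketch: the training-time augmentation is random and broad, whereas the evaluation-time shift is fixed and controlled, so a model's score on the shifted set can be compared directly with its score on the unmodified set.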
