Are better AI algorithms for breast cancer detection also better at predicting risk? A paired case–control study

Our analysis suggests that algorithms that perform better for cancer detection are also likely to perform better for risk assessment. We found evidence that larger, “easier to detect” cancers at diagnosis are also likely to be assigned a higher probability of malignancy on the mammogram taken three years before diagnosis, i.e., the mammogram used for risk assessment.
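The comparison in this paragraph rests on two quantities per algorithm: its AUC when scoring the diagnostic mammograms (detection) and its AUC when scoring the same women's mammograms from about three years earlier (risk assessment). The following Python sketch shows one simple way to compute both, using the Mann–Whitney form of the AUC, and to check whether the two AUCs rank algorithms in the same order. It is an illustration under stated assumptions, not the authors' analysis code: the function names `auc` and `rank_agreement`, the pairwise-concordance measure, and all scores are hypothetical.

```python
from itertools import combinations


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the ROC AUC: the probability that a randomly
    chosen case scores higher than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def rank_agreement(aucs_detection, aucs_risk):
    """Fraction of algorithm pairs that detection AUC and risk AUC order the
    same way (1.0 means identical ordering; tied pairs count as discordant)."""
    pairs = list(combinations(sorted(aucs_detection), 2))
    concordant = sum(
        (aucs_detection[a] - aucs_detection[b]) * (aucs_risk[a] - aucs_risk[b]) > 0
        for a, b in pairs
    )
    return concordant / len(pairs)


# Hypothetical toy scores for two made-up algorithms "A" and "B":
# (case scores, control scores) at diagnosis and at the prior screen.
toy = {
    "A": {"detection": ([0.9, 0.8, 0.7], [0.2, 0.3, 0.75]),
          "risk": ([0.6, 0.5, 0.4], [0.3, 0.45, 0.55])},
    "B": {"detection": ([0.6, 0.5, 0.4], [0.3, 0.45, 0.55]),
          "risk": ([0.5, 0.35, 0.3], [0.3, 0.4, 0.45])},
}
det = {name: auc(*d["detection"]) for name, d in toy.items()}
risk = {name: auc(*d["risk"]) for name, d in toy.items()}
agreement = rank_agreement(det, risk)  # A beats B at both time points here
```

With only a handful of algorithms, a concordance fraction like this is descriptive rather than inferential; it simply makes explicit what "better at detection is also better at risk" means in terms of AUC ordering.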

There are several implications of our findings. Firstly, risk assessment algorithms for breast cancer might be improved by improving their performance for cancer detection. For example, analysis by cancer size subgroup showed that GMIC was broadly comparable with Mirai at detecting larger cancers (> 10 mm) and DCIS, but worse for smaller cancers, both at diagnosis and at risk assessment. To achieve comparable performance for risk assessment, one might therefore suggest further training or development of the GMIC algorithm focused on smaller cancers, with improvements in risk assessment likely to follow. Secondly, our results suggest that state-of-the-art algorithms for breast cancer detection could be considered for repurposing for risk assessment. In time, AI for mammography is likely to be implemented in national screening programs such as that of the UK. Such developments could then enable routine risk assessment to help drive new risk-based screening regimens. Thirdly, our findings help to explain why Mirai works well for risk assessment: it is finding early signs of cancer. These signs are likely to be most visible for cancers that are larger at screen detection, because such cancers are more likely to have been present at the previous screen than smaller cancers, which may have developed only in the interval between screens. Fourthly, our results suggest more generally that deep learning computer vision algorithms can discern intricate patterns in mammograms that radiologists do not currently act upon. Their ability to extract latent information from images alone, without any classical risk factors, suggests that developing more sophisticated diagnostic models should yield better results for risk assessment.
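The size-stratified comparison of GMIC and Mirai described in this paragraph amounts to computing the AUC separately within each cancer-size subgroup while keeping the same controls. Below is a minimal Python sketch, assuming a hypothetical record format of (score, tumour size in mm, DCIS flag) per case. The grouping rule follows the > 10 mm and DCIS categories mentioned in the text; the function names and data are invented for illustration.

```python
def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the ROC AUC (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def size_subgroup(size_mm, is_dcis):
    """Hypothetical grouping: DCIS, invasive > 10 mm, invasive <= 10 mm."""
    if is_dcis:
        return "DCIS"
    return ">10 mm" if size_mm > 10 else "<=10 mm"


def subgroup_aucs(cases, control_scores):
    """cases: iterable of (score, size_mm, is_dcis) tuples. Each subgroup of
    cases is compared against the same, unstratified set of controls."""
    groups = {}
    for score, size_mm, is_dcis in cases:
        groups.setdefault(size_subgroup(size_mm, is_dcis), []).append(score)
    return {group: auc(scores, control_scores) for group, scores in groups.items()}


# Hypothetical toy data: (score, tumour size in mm, DCIS flag) per case.
cases = [(0.9, 18, False), (0.8, 25, False), (0.4, 6, False),
         (0.35, 8, False), (0.7, None, True)]
controls = [0.2, 0.3, 0.5]
by_size = subgroup_aucs(cases, controls)
```

Running `subgroup_aucs` once on diagnosis-time scores and once on prior-screen scores would show whether a gap for small cancers appears at both time points, which is the pattern the text reports for GMIC.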

Strengths of our study include the paired design, in which the mammograms at detection and at the earlier screening round came from the same women. This design has not previously been used to assess the correlation between performance for detection and for risk assessment. It has also rarely been applied in other work on AI algorithms for breast cancer risk assessment, where most publications have focused on cancers diagnosed after a single screening visit used for risk assessment. Using paired data lets us test our hypothesis more reliably than indirect comparisons of performance for risk and detection in samples of different women. Another strength is that this study was an external validation of all the algorithms, with no training or fine-tuning, which helps to ensure a reliable evaluation.
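One practical consequence of a paired design is that uncertainty estimates should resample women, not individual mammograms, so that each woman's diagnostic and prior-screen images stay together. The Python sketch below illustrates this with a stratified percentile bootstrap for the difference between the detection AUC and the risk AUC of a single algorithm. This is a generic technique chosen for illustration, not necessarily the statistical method used in this study; the function names, tuple layout, and toy scores are hypothetical.

```python
import random


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the ROC AUC (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_difference(cases, controls):
    """Detection AUC (index 0: diagnostic mammogram) minus risk AUC
    (index 1: prior-screen mammogram) for one algorithm."""
    detection = auc([c[0] for c in cases], [k[0] for k in controls])
    risk = auc([c[1] for c in cases], [k[1] for k in controls])
    return detection - risk


def paired_bootstrap_auc_difference(cases, controls, n_boot=2000, alpha=0.05, seed=0):
    """cases, controls: one (score_at_diagnosis, score_prior) tuple per woman.
    Resampling whole tuples keeps each woman's two mammograms together, and
    resampling cases and controls separately keeps both groups non-empty.
    Returns the observed difference and a percentile bootstrap interval."""
    rng = random.Random(seed)
    observed = auc_difference(cases, controls)
    stats = sorted(
        auc_difference(rng.choices(cases, k=len(cases)),
                       rng.choices(controls, k=len(controls)))
        for _ in range(n_boot)
    )
    # Percentile interval: drop a fraction alpha/2 of resamples from each tail.
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return observed, (stats[lo_idx], stats[hi_idx])


# Hypothetical toy scores: (score at diagnosis, score at prior screen) per woman.
cases = [(0.9, 0.6), (0.8, 0.4), (0.6, 0.5)]
controls = [(0.3, 0.35), (0.5, 0.45), (0.2, 0.3)]
observed, interval = paired_bootstrap_auc_difference(cases, controls, n_boot=1000)
```

Because both AUCs in each resample are computed on the same set of women, correlation between a woman's two scores is preserved, which is what makes the paired comparison tighter than comparing AUCs from two unrelated samples.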

There are several limitations to our study. Firstly, although some algorithms produce heatmaps that can give insight into their inner workings, we applied the algorithms as a “black box”, so we do not know whether a higher risk score was due to a suspicious area in the region where the cancer was later found or to something else. For example, an alternative explanation for our findings is that the algorithms identify a field effect in the breast rather than a specific pattern associated with breast cancer. Secondly, ours is a retrospective, observational case–control study. The field largely lacks evaluation through prospective designs, and the retrospective nature of this work makes it susceptible to bias, including bias related to the decision to seek publication of results. Thirdly, the analysis is limited by when and where the screening mammograms were acquired (e.g., it was based on women attending the English screening program, and we do not know the race or ethnicity of those included). Fourthly, we were unable to compare directly with risk models from other domains, such as family history and polygenic risk scores, or with the additional risk factors that may be incorporated into Mirai. Fifthly, we were limited by the availability of code to run pre-defined algorithms for risk or detection; other algorithms may perform differently, and this is worth further investigation. Lastly, this study used only mammograms acquired on Hologic machines, which may limit the applicability of the findings to mammograms from other manufacturers.

In conclusion, this study used four open-source algorithms to evaluate whether the performance of an AI model for detection is associated with its performance for risk assessment. The analysis suggests that algorithms that excel at cancer detection also perform well for risk assessment. This correlation between the ability to detect cancer in mammograms and the ability to assess the risk of developing cancer suggests that risk assessment algorithms could be improved by focusing on their cancer detection capabilities. For instance, algorithms may need additional training on detecting smaller cancers to achieve better performance in risk assessment. More generally, current state-of-the-art detection algorithms might be repurposed for risk assessment. This could enable the AI technologies currently being trialled to aid mammographic cancer detection to play a vital role in future risk-based screening programs; for example, more frequent screening might be recommended for higher-risk women [21]. Finally, the paired mammograms in our study were about 3 years apart, in line with the standard breast cancer screening interval in the UK. The evidence reported is therefore most relevant to short-term breast cancer risk, perhaps because of the detection of indolent breast cancers not visible to the human eye. To better inform long-term screening strategies, it would be useful to develop breast cancer risk prediction models over longer time horizons. Such an extension could provide a more comprehensive understanding of breast cancer risk dynamics and help refine strategies for effective, personalized long-term screening.
