Optimising AI tools for mammography clinical practice

Lester Litchfield, Data & Science Manager, Volpara Health, explores how to build confidence in the use of AI for mammography clinical practice

Artificial Intelligence (AI) is well suited to manage repetitive processes and to identify patterns in large amounts of data. By applying AI to image interpretation tasks such as cancer detection, breast density assessment and quality improvement, mammography centres can deliver significant benefits to imaging professionals and patients. With the potential to enhance the quality of patient care and reduce workload, the role of AI is rapidly increasing.

One concern is that many AI tools for mammography are built on small, homogeneous datasets. This means the patients tend to come from similar ethnic backgrounds, and the images are generated by a limited range of imaging equipment.

The unfortunate result is that these AI tools often carry unrecognised bias or do not perform as expected in clinical practice.

To build trust that AI improves breast imaging, AI products must be appropriately designed and validated through rigorous processes to ensure performance and accuracy. While there are no widely accepted standards for the design or validation of AI today, there are quality methodologies being discussed by regulators and industry organisations around the world.

Volpara Health has taken steps to establish best practices for the creation of our AI, abiding by the following guiding principles:

  • Use large, diverse datasets for training and testing to ensure better generalisation
  • Participate in independent, third-party prospective validation studies
  • Make AI explainable

In a study published recently in Academic Radiology, researchers found that only 9 of 118 FDA-cleared AI algorithms were validated on datasets of more than 1,000 patients; the majority used fewer than 500 patients, most lacked any patient demographic information, and 17 reported no validation data at all.

With insufficient information on how AI tools are validated, it is very difficult to assess clinical efficacy, since their generalisability and freedom from bias cannot be judged. Is it any wonder that physicians – and patients – have concerns about the clinical utility of many AI tools? Volpara Health has instituted rigorous internal standards for development and validation to ensure the quality of our AI products.


Make training and testing more transparent

First, developers need to avoid bias, which means the training data must be broad enough to include all presentations of the disease as well as normal cases. In addition, the datasets used for testing need to represent the population at large to ensure generalisability.

This is where most AI tools for mammography fall short – the algorithm may perform well on a small dataset but fail when applied to a broader one.

To ensure a robust, generalisable and unbiased algorithm, Volpara Health utilises more than 70 datasets with patients and cases from facilities around the world, including New Zealand, Australia, the United States, China, Japan, Chile, the United Kingdom, and the Netherlands, to name a few. To validate the performance of our VolparaDensity software, we perform 36 tests to confirm that it works with every FDA-approved mammography system and that it can handle implants, pacemakers, image quality issues and other “noise” that clinicians see in the real world but that is generally not included in “clean” training datasets. In addition, we test how it performs on digital 2D mammography versus 3D tomosynthesis, how it scores the same woman when exams are taken on different mammography systems, and how its results compare with those of radiologists.
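As an illustration of the kind of consistency check described above, the sketch below compares density scores for the same women imaged on two different systems, reporting correlation and Bland-Altman-style limits of agreement. The function name, scores and sample size are hypothetical and are not taken from Volpara’s actual test suite.

```python
# Hypothetical sketch: checking that a density algorithm scores the same women
# consistently when they are imaged on two different mammography systems.
import numpy as np

def cross_system_agreement(scores_a, scores_b):
    """Compare volumetric density scores (%) for the same women on systems A and B."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    return {
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),    # linear agreement between systems
        "mean_diff": float(diff.mean()),                 # systematic offset between systems
        "limits_of_agreement": (                         # Bland-Altman style 95% limits
            float(diff.mean() - 1.96 * diff.std(ddof=1)),
            float(diff.mean() + 1.96 * diff.std(ddof=1)),
        ),
    }

# Example with made-up density scores for ten women imaged on both systems
system_a = [4.1, 6.8, 12.5, 7.9, 15.2, 5.6, 9.3, 22.1, 11.0, 8.4]
system_b = [4.4, 6.5, 13.1, 8.2, 14.8, 5.9, 9.0, 21.5, 11.6, 8.1]
print(cross_system_agreement(system_a, system_b))
```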

Prospective third-party validation

Validation testing is critical to ensure an AI tool performs as expected. For example, suppose an algorithm is trained on a dataset that includes images from a high-risk breast centre, which uses a 3D tomosynthesis system, and from a general imaging centre, which uses a 2D mammography system. Is the AI detecting cancers, or has it simply learned that the 3D images come from the high-risk clinic and are therefore more likely to contain suspicious findings?
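One way to probe for this kind of shortcut learning – not described in the article, but a common sanity check – is to compare the model’s ability to separate cancer from normal within each site against its ability to separate the sites themselves. The synthetic example below shows how a confounded score can look accurate when sites are pooled yet carry little disease signal within each site.

```python
# Synthetic sketch: probing for a site/modality shortcut. If a model's scores
# separate the "high-risk 3D site" from the "general 2D site" far better than
# they separate cancer from normal within each site, the model may be keying
# on acquisition source rather than pathology. All data below is made up.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
site_is_3d = rng.integers(0, 2, n)                           # 1 = high-risk 3D centre, 0 = general 2D centre
cancer = rng.binomial(1, np.where(site_is_3d, 0.15, 0.02))   # cancer is more common at the 3D site
# A confounded model: its score mostly reflects the site, with only a weak disease signal
model_score = 0.8 * site_is_3d + 0.05 * cancer + rng.normal(0, 0.1, n)

print("AUC for cancer (sites pooled):", roc_auc_score(cancer, model_score))
print("AUC for predicting the site: ", roc_auc_score(site_is_3d, model_score))
for s in (0, 1):
    mask = site_is_3d == s
    if cancer[mask].min() != cancer[mask].max():             # need both classes present
        print(f"AUC for cancer within site {s}:", roc_auc_score(cancer[mask], model_score[mask]))
```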

While AI developers should perform validation studies to document performance and accuracy, internal data is not enough to establish credibility in the clinic. Whether required by regulators or not, AI developers should encourage the completion of external, third-party validation studies. The accuracy of Volpara TruDensity has been validated by multiple independent studies, showing a high correlation with breast MRI – considered the ground truth for breast density. Volpara Health’s AI tools have been featured in over 200 peer-reviewed studies, as well as more than 200 additional scientific works including conference presentations and books, making them the most independently validated AI tools of their kind.

Making AI explainable

“Explainable AI” becomes increasingly important as AI tools are used in clinical practice. To trust an algorithm, users need to understand how it generated its results. For example, suppose a physician uses an AI tool to help guide screening decisions for a patient. She disagrees with the AI scoring the woman as low risk because she has seen similar cases in which patients developed cancer. Without understanding how the AI tool scored the patient’s risk, it is difficult to trust the recommendation, whether it is correct or not.

Poor explainability is an issue for many AI tools because their results emerge from layers of abstract transformations of the data. One approach to improving explainability uses game theory – specifically Shapley values – to estimate how much each pixel or region of an image contributes to a prediction.
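The sketch below illustrates the game-theoretic idea with a simple Monte-Carlo Shapley estimate over image regions. The stub model and region layout are invented for illustration; production tools use far more efficient estimators (for example, the SHAP library).

```python
# Minimal sketch of the game-theoretic idea behind pixel attribution:
# approximate Shapley values by measuring how a prediction changes as
# regions of the image are revealed in random order. The "model" here is a
# stub; in practice the prediction would come from the imaging AI itself.
import numpy as np

def shapley_region_attribution(predict, image, regions, baseline=0.0, n_samples=200, seed=0):
    """Monte-Carlo Shapley estimate of each region's contribution to predict(image)."""
    rng = np.random.default_rng(seed)
    contributions = np.zeros(len(regions))
    for _ in range(n_samples):
        order = rng.permutation(len(regions))
        masked = np.full_like(image, baseline, dtype=float)   # start from an all-baseline image
        prev = predict(masked)
        for idx in order:                                     # reveal regions one at a time
            masked[regions[idx]] = image[regions[idx]]
            curr = predict(masked)
            contributions[idx] += curr - prev                 # marginal gain credited to this region
            prev = curr
    return contributions / n_samples

# Toy example: a "model" that only responds to bright pixels in the top-left quadrant
image = np.zeros((4, 4))
image[:2, :2] = 1.0
regions = [np.s_[:2, :2], np.s_[:2, 2:], np.s_[2:, :2], np.s_[2:, 2:]]
predict = lambda img: float(img[:2, :2].sum())
print(shapley_region_attribution(predict, image, regions))   # only the first region gets credit
```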

Volpara Health has been working towards more interpretable algorithms for many years, based on our background in safety-critical software and the belief that if you can understand something, you can trust it more. For example, our TruDensity algorithm uses well-established medical physics, assisted by neural networks, to produce a density map. Having this map, together with measurements of volume and contact area, can help interpreting physicians understand why the density score may differ from their visual assessment.
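To make the relationship between a density map and a reported score concrete, here is an illustrative calculation of volumetric breast density from a per-pixel map of dense-tissue thickness. It is a simplified sketch with made-up numbers, not Volpara’s actual implementation.

```python
# Illustrative sketch (not Volpara's implementation): turning a per-pixel map of
# dense-tissue thickness into volumetric measurements. All values are made up.
import numpy as np

def volumetric_density(dense_thickness_mm, breast_thickness_mm, pixel_area_mm2, breast_mask):
    """Volumetric breast density from a dense-tissue thickness map (constant compressed thickness assumed)."""
    fibroglandular_vol = (dense_thickness_mm[breast_mask] * pixel_area_mm2).sum()   # mm^3 of dense tissue
    total_breast_vol = breast_mask.sum() * pixel_area_mm2 * breast_thickness_mm     # mm^3 of breast
    return {
        "fibroglandular_cm3": fibroglandular_vol / 1000.0,
        "breast_cm3": total_breast_vol / 1000.0,
        "percent_density": 100.0 * fibroglandular_vol / total_breast_vol,
    }

# Toy example: a 100x100-pixel breast region at 50 mm compressed thickness
mask = np.ones((100, 100), dtype=bool)       # every pixel lies inside the breast
dense = np.full((100, 100), 5.0)             # 5 mm of dense tissue under each pixel
print(volumetric_density(dense, breast_thickness_mm=50.0, pixel_area_mm2=0.01, breast_mask=mask))
```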

Closing the AI confidence gap

Ultimately, wider adoption of AI will depend on robust evidence of improved quality of patient care, increased efficiency, and cost-effectiveness, but trust and explainability are also key to getting clinicians engaged.

 

Please note: This is a commercial profile

© 2019. This work is licensed under CC-BY-NC-ND.

Contributor Details

Lester Litchfield
Machine Learning Engineer and Data Analyst
Volpara Health
