Blog: Teaching AI to Talk to Clinicians
When asked to explain how they arrived at a diagnosis, a good doctor can retrace their steps back to the moment the patient walked into the hospital. Sometimes they can go even further, back to the moment that brought the patient to the hospital in the first place. Diagnostics is a combination of test results, medical knowledge, and intuition (often referred to as clinical gestalt), and while your doctor may not always be right, the process of creating a diagnosis makes it possible to pinpoint the moment where the process went wrong.
However, defining this moment, and the diagnostic process in general, has presented a unique challenge for researchers developing machine learning for medicine. Developers rely on metrics like accuracy, sensitivity, and specificity to make sure that their models are performing correctly, only to encounter confusion from clinicians when they cannot explain how their model reached a particular conclusion. We saw this in 2018, when a report found that cancer doctors working with an early version of IBM Watson Oncology’s software were dissatisfied with Watson’s performance in the clinic and confused as to how it created cancer treatment plans.
Researchers at the University of Washington have begun to make strides in this area by developing machine learning that explains itself to doctors. Their program, called Prescience, analyzes both patient history and real-time medical data from anesthetized patients during surgery to track a patient’s risk of developing hypoxaemia (low arterial blood oxygen tension), a condition that is linked to cardiac arrest, post-surgery infection, and stroke. Prescience then explains how it computed that risk value by highlighting the clinical factors in a patient’s history (e.g., BMI or age) and in the real-time surgery data (e.g., pulse or blood pressure) that push the calculated risk up or down. This analysis is traditionally performed by an anesthesiologist based on their knowledge of the patient history and real-time data from standard surgery sensors, so Prescience is designed to act as an assistant in the operating room.
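For readers who like to see the idea in code, here is a minimal sketch of what "highlighting factors that raise or lower the risk" can look like. It is not Prescience's actual method, which this post does not describe in detail. It assumes a simple linear risk score, and every name and number in it (`WEIGHTS`, `BASELINE`, `BASELINE_LOG_ODDS`, the example patient) is hypothetical and invented purely for illustration.

```python
import math

# Hypothetical weights and "average patient" values, invented for illustration.
# They are NOT taken from Prescience or from any clinical study.
WEIGHTS = {"bmi": 0.05, "age": 0.01, "pulse": 0.02, "spo2": -0.30}
BASELINE = {"bmi": 27.0, "age": 50.0, "pulse": 75.0, "spo2": 98.0}
BASELINE_LOG_ODDS = -3.0  # assumed log-odds of risk for the average patient


def sigmoid(x):
    """Convert log-odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def explain(patient):
    """Return (risk, contributions) for one patient.

    Each contribution is how far that feature moves the log-odds away
    from the baseline: positive raises the risk, negative lowers it.
    """
    contributions = {
        name: WEIGHTS[name] * (patient[name] - BASELINE[name])
        for name in WEIGHTS
    }
    log_odds = BASELINE_LOG_ODDS + sum(contributions.values())
    return sigmoid(log_odds), contributions


# A made-up patient: higher BMI and pulse, lower blood oxygen saturation.
patient = {"bmi": 35.0, "age": 60.0, "pulse": 95.0, "spo2": 94.0}
risk, contributions = explain(patient)

print(f"estimated risk: {risk:.2f}")
# List the factors from most to least influential.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:>5} {direction} risk (log-odds {value:+.2f})")
```

The useful property here is that the contributions add up exactly to the gap between this patient's score and the baseline, so a clinician can see which factors account for the number on the screen. Real systems use more flexible models, where computing such contributions takes more work, but the output has the same shape: a risk value plus a signed list of reasons.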
How well can Prescience identify signs of hypoxaemia? It correctly predicted whether a patient would develop hypoxaemia about 81% of the time. If that sounds low, consider that the researchers also studied how well anesthesiologists predicted whether a patient would develop hypoxaemia on the same medical data used to train Prescience, both by themselves and with the assistance of Prescience. Anesthesiologists without Prescience were correct about 66% of the time, going up to 77% when assisted by Prescience. In other words, Prescience was better at predicting hypoxaemia than anesthesiologists were, even when the anesthesiologists were assisted by Prescience in their decisions.
Does that mean that Prescience has found information in a patient’s history that doctors may overlook? Interestingly, it seems to rely on the same risk factors as doctors do: blood oxygenation, blood pressure, and whether the hospital specialized in trauma care were the three most important variables in the prediction of hypoxaemia onset for both anesthesiologists and Prescience. However, other variables such as BMI, pulse, and breathing rate were influential for Prescience but not for anesthesiologists, suggesting that Prescience may use correlations unseen by anesthesiologists when predicting the risk value.
While it is unlikely that you will see Prescience in your local operating room in the near future, you will likely see other applications of machine learning incorporated into your medical care if you haven’t already. One of the many hurdles that developers have faced, and will continue to face, in expanding the use of machine learning in healthcare is explaining the decisions of a program to doctors and patients alike. Programs like Prescience demonstrate that the accuracy of prediction is only augmented by explanation, and that opening the black box of machine learning invites further discovery about programs and the people who use them.
Now, lean back in your chair, and count backwards from 10, 9, 8…