Predictive Algorithms Not Robust Enough for Clinical Research
Algorithms need to be built and trained on real-world inconsistencies and shared data sets to be generalizable
NEW HAVEN, CT — The quest for personalized medicine has emerged as a critical goal in the healthcare sector. But a new Yale-led study shows that the mathematical models currently available to predict how patients will respond to treatments have limited effectiveness.
In an analysis of clinical trials for multiple schizophrenia treatments, the researchers found that the mathematical algorithms were able to predict patient outcomes within the specific trials for which they were developed, but failed to work for patients participating in different trials. The findings were recently published in Science.
“This study really challenges the status quo of algorithm development and raises the bar for the future,” said Adam Chekroud, PhD, an adjunct assistant professor of psychiatry at Yale School of Medicine and corresponding author of the paper. “Right now, I would say we need to see algorithms working in at least two different settings before we can really get excited about it. I’m still optimistic, but as medical researchers, we have some serious things to figure out.”
Not-so-robust algorithms
Because clinical trials are expensive to run, most algorithms are developed and tested using data from a single trial. Researchers had hoped that these algorithms would still work when applied to patients with similar profiles receiving similar treatments. For the new study, Chekroud and colleagues set out to test whether that hope holds up.
They aggregated data from five clinical trials of schizophrenia treatments made available through the Yale Open Data Access (YODA) Project, which advocates for and supports responsible sharing of clinical research data. In most cases, they found, the algorithms effectively predicted patient outcomes for the clinical trial in which they were developed. However, they failed to effectively predict outcomes for schizophrenia patients being treated in different clinical trials. “The algorithms almost always worked first time around,” Chekroud said. “But when we tested them on patients from other trials, the predictive value was no greater than chance.”
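The release does not detail the paper’s modeling pipeline, but the evaluation it describes — training within one trial and testing on patients from another — corresponds to what is commonly called leave-one-group-out validation. Below is a minimal sketch in Python using scikit-learn; the arrays `X`, `y`, and `trial_id` are hypothetical stand-ins for pooled patient features, outcomes, and trial membership, not the actual YODA data.

```python
# Minimal sketch of within-trial vs. cross-trial evaluation.
# X, y, and trial_id are hypothetical stand-ins, not the study's data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_patients, n_features = 1500, 20
X = rng.normal(size=(n_patients, n_features))   # patient features
y = rng.integers(0, 2, size=n_patients)         # binary treatment outcome
trial_id = rng.integers(0, 5, size=n_patients)  # which of five trials

model = GradientBoostingClassifier(random_state=0)

# Within-trial check: cross-validation inside a single trial, the setting
# in which the study found the algorithms "almost always worked."
in_trial = trial_id == 0
within = cross_val_score(model, X[in_trial], y[in_trial], cv=5, scoring="roc_auc")
print(f"within-trial AUC: {within.mean():.2f}")

# Cross-trial check: train on four trials, predict the held-out fifth.
# In the study, performance in this setting dropped to chance (AUC ~ 0.5).
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=trial_id):
    model.fit(X[train_idx], y[train_idx])
    probs = model.predict_proba(X[test_idx])[:, 1]
    auc = roc_auc_score(y[test_idx], probs)
    print(f"held-out trial {trial_id[test_idx][0]}: AUC = {auc:.2f}")
```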
Where is the problem?
The problem, according to Chekroud, is that most of the mathematical algorithms used by medical researchers were designed for much bigger data sets. Clinical trials are expensive and time-consuming to conduct, so studies typically enroll fewer than 1,000 patients. Applying powerful AI tools to such small data sets, he said, can often result in “over-fitting,” in which a model learns response patterns that are idiosyncratic, specific to the initial trial data, and that disappear when new data are introduced.
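This failure mode is easy to reproduce in a toy setting. In the hypothetical sketch below (not drawn from the study), a flexible model is fit to a trial-sized sample whose features are pure noise: it scores almost perfectly on the data it was trained on, yet performs at chance on new patients, which is exactly the signature of a model that has memorized idiosyncrasies rather than learned a real signal.

```python
# Toy illustration of over-fitting on a trial-sized data set.
# The features are pure noise, so any in-sample "signal" is illusory.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n, p = 300, 100                               # few patients, many features

X_train = rng.normal(size=(n, p))
y_train = rng.integers(0, 2, size=n)          # outcomes unrelated to features
X_new = rng.normal(size=(n, p))               # "new data" from the same process
y_new = rng.integers(0, 2, size=n)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X_train, y_train)

print("training accuracy:", accuracy_score(y_train, model.predict(X_train)))  # near 1.0
print("new-data accuracy:", accuracy_score(y_new, model.predict(X_new)))      # near 0.5 (chance)
```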
“The reality is, we need to be thinking about developing algorithms in the same way we think about developing new drugs,” he said. “We need to see algorithms working in multiple different times or contexts before we can really believe them.”
Building algorithms on real-world variety
In the future, the researchers added, including environmental variables may improve how well algorithms analyze clinical trial data. For instance, does the patient abuse drugs, or have personal support from family or friends? These are the kinds of factors that can affect treatment outcomes.
Most clinical trials use precise criteria to improve chances for success, such as guidelines for which patients should be included (or excluded), careful measurement of outcomes, and limits on the number of doctors administering treatments. Real-world settings, meanwhile, have a much wider variety of patients and greater variation in the quality and consistency of treatment, the researchers say.
“In theory, clinical trials should be the easiest place for algorithms to work. But if algorithms can’t generalize from one clinical trial to another, it will be even more challenging to use them in clinical practice,” said co-author John Krystal, MD, the Robert L. McNeil, Jr., Professor of Translational Research and professor of psychiatry, neuroscience, and psychology at Yale School of Medicine.
Chekroud suggests that increased efforts to share data among researchers and the banking of additional data by large-scale healthcare providers might help increase the reliability and accuracy of AI-driven algorithms.

“Although the study dealt with schizophrenia trials, it raises difficult questions for personalized medicine more broadly, and its application in cardiovascular disease and cancer,” said Philip Corlett, PhD, an associate professor of psychiatry at Yale and co-author of the study.
- This press release was originally published on the Yale University website