NEW YORK – Researchers from GE Healthcare and Vanderbilt University Medical Center are eager to further establish the accuracy of an artificial intelligence model they've developed for predicting patients' immunotherapy responses.
Immune checkpoint inhibitors, which stimulate the immune system to attack cancer cells, have become a standard treatment for a range of tumors. In clinical trials, these immunotherapies have added months or years to some patients' lives and generally caused few adverse effects. However, when rare but serious side effects do occur, they can diminish patients' quality of life and force them to pause or discontinue treatment.
Jan Wolber, global digital product leader at GE Healthcare, and Travis Osterman, a medical oncologist at Vanderbilt University Medical Center, set out five years ago to develop a predictive AI algorithm that could identify the patients likely to respond to immunotherapies and those at risk of developing toxicities, specifically pneumonitis, hepatitis, and colitis.
They presented results of their efforts at the annual meeting of the Society for Immunotherapy of Cancer earlier this month. Their goal was to use the technology to advance a precision medicine approach for selecting therapies for cancer patients. "Historically, patients have been treated using guidelines where you assume that each patient is basically the same," Wolber said. "We really wanted to use artificial intelligence to personalize the predictions that we can make here."
Wolber and Osterman used a random forest model, a machine-learning algorithm that combines many decision trees, each of which begins with a single question and its possible outcomes, then branches further based on more questions and outcomes. Wolber said his team relied on random forest models because they are widely used and simple to apply.
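For readers unfamiliar with the technique, the sketch below shows what fitting a random forest classifier looks like in scikit-learn. It is an illustration of the general method on synthetic data, not the team's actual code; the patient count echoes the training set described below, but every feature and label here is invented.

```python
# Minimal sketch of the random forest technique (scikit-learn),
# on synthetic data; not the GE/Vanderbilt model itself.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(3000, 20))                          # 3,000 "patients", 20 structured features
y = (X[:, 0] + rng.normal(size=3000) > 0).astype(int)    # synthetic toxicity label

# An ensemble of decision trees: each tree asks a sequence of questions
# about the features, and the forest averages the trees' votes.
forest = RandomForestClassifier(n_estimators=500, random_state=42)
forest.fit(X, y)
print(f"training accuracy: {forest.score(X, y):.2f}")
```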
The training data set included about 3,000 patients who had received anti-PD-1, anti-PD-L1, or anti-CTLA-4 therapies at Vanderbilt, and each patient's record included up to thousands of data points collected throughout their treatment. "We focused primarily on structured data that's available and routinely collected as a part of their healthcare services" in the form of ICD-10 codes that physicians use to classify diagnoses, symptoms, and performed tests and procedures in patient health records, Osterman said. "The goal would be that these models would be able to be implemented in any clinical setting where you're providing oncologic care," without the need for specialized biomarkers or other invasive testing, he added.
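One common way to feed diagnosis codes to such a model is to expand each patient's list of ICD-10 codes into a binary indicator matrix. The sketch below shows that encoding with scikit-learn; the codes are illustrative examples, and this is a plausible preprocessing step, not a description of the team's actual pipeline.

```python
# Hypothetical encoding of per-patient ICD-10 code lists into binary features.
from sklearn.preprocessing import MultiLabelBinarizer

patients = [
    ["C34.90", "J96.01", "E11.9"],   # patient 1's recorded diagnosis codes
    ["C34.90", "K52.9"],             # patient 2
    ["C43.9", "E11.9", "R94.5"],     # patient 3
]

mlb = MultiLabelBinarizer()
X_codes = mlb.fit_transform(patients)  # one row per patient, one column per code
print(mlb.classes_)                    # column order of the binary matrix
print(X_codes)
```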
"We then took these structured data points and started to build predictive models with them," Wolber explained. "We went through quite a careful and rigorous method of nested cross validation … to avoid any overfitting of the data where the model might conform to this particular dataset but then performs poorly on future datasets."
Rather than giving a simple positive or negative result, the model produces probabilities that a patient will experience each of the three toxicities, pneumonitis, hepatitis, and colitis, along with a prediction of overall survival. "We really want to make sure the clinician is still in charge, but the information from the models is part of the information that they take into consideration," Wolber said. "It's decision support, rather than decision-making."
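In code terms, this amounts to reading per-class probabilities instead of hard labels. The standalone sketch below, again on invented data and assuming the scikit-learn API, shows the distinction.

```python
# Probabilities rather than verdicts: predict_proba returns a risk estimate
# per patient that a clinician can weigh, instead of a hard yes/no.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)                # synthetic toxicity label
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

for i, p in enumerate(forest.predict_proba(X[:3])[:, 1]):
    print(f"patient {i}: estimated toxicity probability {p:.0%}")
```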
The model required adjustment for each of the predictions it was designed to make, and according to Wolber, the input parameters needed to predict the three toxicities were "quite different."
Accurately predicting pneumonitis, for example, required certain input variables, such as oxygen saturation, while predicting patients' risks for hepatitis or colitis depended on other variables, such as blood albumin. "We really had to work with these toxicities, understand them, and then tune the models to the specific toxicities," Wolber noted.
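One plausible way to organize that kind of per-toxicity tuning, sketched below entirely with invented feature names, thresholds, and synthetic data, is to fit a separate model per endpoint on its own feature subset; nothing here reflects the team's actual variables beyond the examples named above.

```python
# Hypothetical per-toxicity setup: one forest per endpoint, each trained
# on its own clinically relevant inputs. All values are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "spo2": rng.uniform(88, 100, 500),       # oxygen saturation (%)
    "albumin": rng.uniform(2.5, 5.0, 500),   # blood albumin (g/dL)
    "age": rng.integers(30, 90, 500),
})
labels = {
    "pneumonitis": (df["spo2"] < 92).astype(int),     # invented label rule
    "hepatitis": (df["albumin"] < 3.2).astype(int),   # invented label rule
}
features = {
    "pneumonitis": ["spo2", "age"],      # pneumonitis model sees oxygen saturation
    "hepatitis": ["albumin", "age"],     # hepatitis model sees blood albumin
}

models = {}
for toxicity, cols in features.items():
    m = RandomForestClassifier(n_estimators=200, random_state=7)
    m.fit(df[cols], labels[toxicity])
    models[toxicity] = m
```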
Although many AI models incorporate results from molecular tests, Wolber and Osterman said they did not include that data in the inputs for their model. "Had we wanted to include genomic testing, we would have had to curtail the dataset quite a lot because it wasn't routinely done in all of the patients," Wolber explained. Or, he added, the researchers would have had to find a way within the rules of the algorithm to compensate for the absence of that data in a large proportion of the patients, making the model less robust.
If molecular biomarker tests become more routine for directing immunotherapy treatment, Wolber speculated that the machine-learning models could be retrained with that data using the same methodology to see whether the additional parameters might further improve the accuracy of the predictions.
The model ultimately performed with an accuracy of between 70 percent and 80 percent. While "not perfect," Wolber believes the model offers a better alternative than clinicians guessing at which immune checkpoint inhibitor to give or relying on other imperfect tests. For example, oncologists have bemoaned the limitations of PD-L1 expression testing in predicting the best responders to immunotherapy. Compared to that, "we are doing quite well with our 70 to 80 percent accuracy," Wolber added.
The model is also independent of tumor type because the researchers selected patients for the training data based on whether they had received a checkpoint inhibitor, not on the kind of cancer they had. Osterman highlighted the model's broad applicability across cancers as one of its unique aspects.
He also pointed out that the model underwent external validation. "Once we had developed these models with the data from Vanderbilt University Medical Center, we had the opportunity to take them to the health system [at the University Medicine Essen] in Germany," Wolber said.
There, the model was tested for the first time on data from more than 4,000 patients. "The model's performance was very similar in that dataset to the original performance at Vanderbilt, which gives us great confidence that our claim that these models are scalable is actually valid."
According to Osterman, there will likely be a version of the model that continues to be trained on new datasets such as the one from Germany, but he and Wolber are most interested in testing the model prospectively in a clinical trial with partners.
"We really hope that this [model] will be able to bring more drugs and better combinations of drugs to patients as we're able to select in a more precise way [which patients] would benefit from those treatments and who would be least likely to be harmed by them," Osterman said.
Wolber speculated that in a future prospective trial at Vanderbilt, for example, the model could be run "almost as a silent companion" and not used to actively guide therapy within the study. The predictions could then be unblinded after a year and compared against how patients actually did on the treatments they received, for example, whether they benefited or stopped therapy due to toxicities, enabling researchers to "look back once we've reached the endpoint to see how well the model has done," Wolber said.
Wolber said his team is also hoping to demonstrate the utility of the AI model among drugmakers by retrospectively applying it to data from previous trials. This should build drugmakers' confidence in the model and make them more willing to prospectively use it to stratify patients in a new drug trial, Wolber said.
There have been some comparable AI algorithms developed to predict efficacy and toxicity of immunotherapies in cancer patients. For example, in 2022, researchers from Korea published on a machine-learning algorithm trained on biomarkers to predict immunotherapy response. And in another study, Finnish researchers tested a model for predicting immune-related adverse events associated with immune checkpoint inhibitors.
Wolber noted that those other models were developed with information gathered from tests that are not routinely performed in patient care. In comparison, the model developed by Wolber and Osterman's team draws on routinely collected data from standard care.
"We've really aimed for clinical utility from the beginning," Osterman added. "And every decision that's been made along the line has been made toward eventually [building] a product that can actually be used to improve patient care."
In addition to packaging the algorithm as a service for pharmaceutical companies interested in testing the predictive model in drug trials, GE Healthcare is also hoping to offer it as a clinical decision support tool for physicians. "Of course, we need to look very carefully at the regulatory implications of such clinical decision support software," Wolber said. "We're just at the beginning of that process."