NEW YORK – A machine learning-based approach could enable the use of computationally intensive single-cell transcriptomic analysis to predict cancer patients' responses to treatment if researchers can obtain enough data to continue its development.
Sanju Sinha, the lead developer of the tool, dubbed PERsonalized Single-Cell Expression-Based Planning for Treatments In Oncology (PERCEPTION), is calling on cancer researchers to produce, curate, and openly share more tumor RNA sequencing data so the field can further train and refine machine learning precision medicine tools like the one his team is advancing.
Gene expression levels from bulk RNA-seq are an average of the expression levels in every cell in a sample. In comparison, single-cell RNA-seq gives a snapshot of gene expression within individual clones in a tumor and can potentially identify drug-resistant clones that may evade bulk sequencing-based approaches. However, single-cell RNA-seq is costly and not yet available as a clinical tool.
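The difference between the two readouts can be illustrated with a small Python sketch. The counts and clone labels below are invented for illustration and do not come from the study; the sketch only shows how averaging across every cell, as bulk RNA-seq effectively does, can dilute the signal from a small clone.

```python
# Hypothetical expression counts for one gene in ten cells from one sample.
# Nine cells belong to a drug-sensitive clone; one belongs to a resistant clone.
cells = [("sensitive", 1)] * 9 + [("resistant", 11)]

# Bulk-style readout: a single average over every cell in the sample.
bulk_level = sum(count for _, count in cells) / len(cells)

# Single-cell-style readout: group cells by clone, then average within each clone.
counts_by_clone = {}
for clone, count in cells:
    counts_by_clone.setdefault(clone, []).append(count)
clone_levels = {
    clone: sum(values) / len(values) for clone, values in counts_by_clone.items()
}

print(f"bulk average: {bulk_level}")
print(f"per-clone averages: {clone_levels}")
```

In this toy sample the bulk average sits close to the sensitive clone's level, while the per-clone view keeps the resistant clone's much higher level visible.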
Sinha said that artificial intelligence methods are critical for managing the vast quantity of data produced by single-cell sequencing and translating it for clinical purposes. For example, he estimated that 10,000 cells from a single patient would produce about 10 million data points in a single-cell transcriptomic analysis, far more than older computational methods can handle. But progress on machine learning-based tools like PERCEPTION is limited by the dearth of data currently available to train the models they rely on.
"We want to encourage clinicians and people who are in charge of [clinical single-cell RNA-seq] data to make it public," Sinha said, adding that large datasets will be needed to move PERCEPTION and other AI tools beyond the proof-of-concept stage.
In Nature Cancer last month, Sinha, an assistant professor in the Cancer Molecular Therapeutics Program at Sanford Burnham Prebys, and collaborators at the US National Cancer Institute and other institutions reported that PERCEPTION could predict multiple myeloma and breast cancer patients' responses to single and combination therapies. The tool also identified lung cancer patients who were resistant to five standard tyrosine kinase inhibitors and outperformed other predictors based on bulk transcriptomics.
PERCEPTION makes use of a type of machine learning called transfer learning in which a model is first trained on a single task and then fine-tuned for a new, similar task. In the first step, Sinha's group used large-scale bulk RNA-seq data to train the model to predict drug response in cell lines. Next, they tuned the model with single-cell RNA-seq and drug responses from the same cell lines.
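The general two-step pattern of transfer learning can be sketched in a few lines of Python. This is a minimal illustration on synthetic numbers, not PERCEPTION's actual code or architecture: the linear model, the data generator, the learning rate, and every name below are assumptions made for the example. A model is first trained on a large "bulk-like" dataset, and the resulting weights are then used as the starting point for further training on a much smaller "single-cell-like" dataset.

```python
import random


def predict(weights, features):
    """Linear prediction: weighted sum of the feature values."""
    return sum(w * x for w, x in zip(weights, features))


def train(weights, samples, learning_rate, epochs):
    """Stochastic gradient descent on squared error; returns an updated copy."""
    weights = list(weights)
    for _ in range(epochs):
        for features, target in samples:
            error = predict(weights, features) - target
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
    return weights


def mean_squared_error(weights, samples):
    return sum((predict(weights, f) - t) ** 2 for f, t in samples) / len(samples)


def make_samples(rng, true_weights, n, noise):
    """Synthetic (features, drug response) pairs; purely illustrative."""
    samples = []
    for _ in range(n):
        features = [rng.uniform(-1, 1) for _ in true_weights]
        target = predict(true_weights, features) + rng.gauss(0, noise)
        samples.append((features, target))
    return samples


rng = random.Random(0)

# Assumption: the two tasks are related but not identical, so their
# underlying weights are similar without being equal.
bulk_truth = [1.0, -2.0, 0.5]
single_cell_truth = [1.2, -1.8, 0.3]

bulk_samples = make_samples(rng, bulk_truth, 500, 0.1)            # large source set
single_cell_train = make_samples(rng, single_cell_truth, 20, 0.1)  # small target set
single_cell_test = make_samples(rng, single_cell_truth, 200, 0.1)

# Step 1: train from scratch on the large source dataset.
pretrained = train([0.0, 0.0, 0.0], bulk_samples, learning_rate=0.05, epochs=20)

# Step 2: fine-tune, starting from the pretrained weights, on the small target set.
fine_tuned = train(pretrained, single_cell_train, learning_rate=0.05, epochs=20)

mse_pretrained = mean_squared_error(pretrained, single_cell_test)
mse_fine_tuned = mean_squared_error(fine_tuned, single_cell_test)
print(f"error before fine-tuning: {mse_pretrained:.4f}")
print(f"error after fine-tuning:  {mse_fine_tuned:.4f}")
```

The point of the design is that the small target dataset only has to adjust weights that are already close to useful values, rather than learn them from nothing.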
Having demonstrated that PERCEPTION could predict drug response in cell lines, the researchers then applied the model to the largest clinical dataset they could find, which came from a clinical study conducted by researchers at the Tel Aviv Sourasky Medical Center and published in Nature Medicine in February 2021. In that trial, researchers tested a four-drug combination therapy regimen in patients with newly diagnosed multiple myeloma. Single-cell expression data, clonal composition, and treatment responses were available for 28 of the 41 patients in the study.
Sinha and colleagues used the multiple myeloma data from that study to improve their model and then tested it using two additional clinical trial cohorts. In particular, when they applied it to data from the FELINE breast cancer trial, PERCEPTION stratified patients who responded to Novartis' CDK4/6 inhibitor Kisqali (ribociclib) with a 0.776 area under the curve.
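For readers unfamiliar with the metric, the sketch below shows one common way an area under the curve is computed, assuming the reported figure refers to the receiver operating characteristic (ROC) curve, which the article does not specify. Under that assumption, the value is the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The scores and labels are invented for illustration and are not data from the FELINE trial.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of responder/non-responder pairs ranked correctly.

    labels use 1 for responders and 0 for non-responders; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted scores for six patients (higher = more likely to respond).
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]

example_auc = roc_auc(example_scores, example_labels)
print(f"AUC: {example_auc:.3f}")
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to ranking every responder above every non-responder.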
An important finding within Sinha's recently published study, he said, is that out of the many clonal cell populations within the tumor, the most resistant clones are the ones that will determine whether the patient responds or not. "That means if there is a patient who has cell populations that are perfectly responding, but one population is very resistant, that patient may not respond," Sinha said. But, he added, if a patient has multiple cell populations that respond only moderately and none that are resistant, the patient may have a moderate response to therapy.
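One simple way to express that idea in code is to score a patient by the least responsive clone rather than by the average across clones. The Python sketch below is an illustration of that reasoning only; the function name, the score scale (higher meaning more sensitive to the drug), and the example values are assumptions, not the published method or its data.

```python
def patient_response_score(clone_scores):
    """Score a patient by the most resistant clone (the lowest per-clone score)."""
    if not clone_scores:
        raise ValueError("at least one clone score is required")
    return min(clone_scores.values())


def average_score(clone_scores):
    """Averaging across clones, shown only for comparison."""
    return sum(clone_scores.values()) / len(clone_scores)


# Hypothetical per-clone predicted sensitivity, on an assumed 0-to-1 scale.
patient_a = {"clone_1": 0.95, "clone_2": 0.90, "clone_3": 0.10}  # one resistant clone
patient_b = {"clone_1": 0.55, "clone_2": 0.50, "clone_3": 0.60}  # all moderate

for name, clones in (("A", patient_a), ("B", patient_b)):
    print(
        f"patient {name}: most-resistant-clone score = "
        f"{patient_response_score(clones):.2f}, average = {average_score(clones):.2f}"
    )
```

With these made-up numbers, averaging would rank patient A above patient B, whereas scoring by the most resistant clone reverses the order, matching the scenario Sinha described.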
Sinha and his coauthors concluded their work shows that PERCEPTION could potentially be used to predict individual treatment response in cancer patients. Although the model was originally powered to interrogate responses to 133 US Food and Drug Administration-approved drugs, the researchers ultimately produced predictions for only 44 drugs, Sinha said, because there was not enough data from both responders and non-responders to build models for most of the drugs.
"Our work wouldn't have been possible if the individuals who were doing these studies hadn't put together all of this data and made it public," Sinha said. "Data accessibility is something that could considerably advance the field of AI in biomedicine."
And in the spirit of openness, Sinha and his colleagues have made PERCEPTION available for download in a GitHub repository and have provided instructions for using it to build predictive models.
Having demonstrated proof of concept with the current study, Sinha said he is planning a larger-scale retrospective validation study. "Our tool is not ready for testing in patients prospectively yet," Sinha said. "We want to test the method on a large number of patients, first, retrospectively." He and his team are working with collaborators locally in San Diego and internationally to obtain more data for that validation.
If a larger, retrospective validation is successful, his team hopes to then prospectively test PERCEPTION's treatment predictive capabilities. A second goal for Sinha and his team will be to use single-cell transcriptomic data to anticipate the emergence of resistance to therapy and look for treatments that can overcome it.