New Iteration of AACR's Project GENIE Focuses on Real-World Outcomes to Enhance Precision Oncology


NEW YORK – The analysis of cancer patients' genomic data has proven useful for elucidating the molecular underpinnings of common cancer types, distinguishing subtypes within a given cancer, and discovering some new treatments. But researchers have also found that genomic data must be combined with other kinds of information to yield the insights needed to move cancer research beyond interesting discoveries and into the realm of precision oncology.

The American Association for Cancer Research (AACR) initiative known as Project GENIE (Genomics Evidence Neoplasia Information Exchange), which was launched in 2015, is a registry of genomic and phenomic data from tens of thousands of cancer patients treated at 19 participating institutions in North America and Europe. The goal of the project is to use these data to improve cancer research and to aid treatment decision-making.

At the AACR's virtual annual meeting this week, Memorial Sloan Kettering Cancer Center medical oncologist Gregory Riely provided an update on the project and a look at the initiative's next iteration, which includes the participation of nine biopharmaceutical companies and a greater emphasis on real-world data to determine the true impact of treatments on patients. Riely also presented early data from this new version of the project, GENIE BPC (Biopharma Collaborative).

Project GENIE's seventh dataset was released in January, adding more than 9,000 records to the database. New datasets are released to the public every six months, and the public release of the eighth dataset is scheduled for July. The combined dataset now includes data from 79,720 tumors collected from patients at 19 cancer centers, making it one of the largest public cancer genomic datasets released to date, Riely said. The dataset includes information on more than 80 major cancer types, including data from more than 12,500 patients with lung cancer, nearly 11,000 patients with breast cancer, and nearly 8,000 patients with colorectal cancer.

Riely also noted that the breast cancer cases include tumors of multiple subtypes, and added, "Just as importantly, the 32nd most represented cancer is appendiceal carcinoma, and with those tumors, we see 400-plus cases. So, even with the relatively rare tumors, we have a significant number of cases." The dataset also includes samples from patients across a broad range of ages, from infants to the elderly.

The foundation of GENIE data is somatic tumor DNA data, Riely said, combined with phenomic data — tumor type, histology, patient demographics, and vital status. The data is made publicly available 12 months after sequencing. The project also undertakes sponsored research efforts, which involve expanded phenomics data, including detailed clinicopathology, prior treatment, and outcomes data. Typically, these efforts are undertaken with specific cohorts of patients, and the data is made public at the time of publication.

Now, the initiative is making a push to further its goals of advancing precision oncology and using data to aid in clinical decision-making. At the AACR's annual meeting in Atlanta in 2019, Charles Sawyers, chair of the AACR Project GENIE Steering Committee, noted that the project had ambitions to collect data on germline DNA, cell-free DNA, RNA sequences, and epigenetics. In his presentation this year, Riely said that time has arrived.

GENIE is now going beyond publicly available genomics and level-one phenomics and going to "responsive research," he said. Beyond adding germline DNA, cfDNA, RNA-seq, and epigenetic analysis to its current somatic DNA sequencing capabilities, it's also adding medication and treatment outcomes information to its phenomics data, "knowing that this data will be able to drive discoveries," Riely added.

The new iteration of the project, GENIE BPC, was officially launched in October 2019. The five-year, $36 million research collaboration with nine biopharma companies — Amgen, AstraZeneca, Bayer, Boehringer Ingelheim, Bristol-Myers Squibb, Genentech, Janssen, Merck, and Novartis — aims to obtain clinical and genomic data from about 50,000 de-identified patients treated at the institutions participating in GENIE.

To create a harmonized standard, the collaborators built a common data model for the initiative that captures patient demographics, smoking status, and vital status, and adds pan-cancer biomarker data such as PD-L1 or MSI status, as well as cancer-specific mutation data, treatment information, and patient outcomes, according to Riely.
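For readers who handle data of this kind, the Python sketch below shows roughly what one record in such a harmonized model might look like. It is a minimal illustration under stated assumptions: the class names, field names, and the convention of counting days from diagnosis are all hypothetical and are not the actual GENIE BPC schema, which the presentation did not detail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class VitalStatus(Enum):
    ALIVE = "alive"
    DECEASED = "deceased"


@dataclass
class Treatment:
    regimen: str                   # e.g. "carboplatin + pemetrexed"
    start_day: int                 # days from diagnosis (hypothetical convention)
    end_day: Optional[int] = None  # None if the regimen is ongoing


@dataclass
class PatientRecord:
    # Hypothetical fields loosely mirroring the categories described above.
    patient_id: str
    age_at_diagnosis: int
    sex: str
    smoking_status: str
    vital_status: VitalStatus
    cancer_type: str
    # Pan-cancer biomarkers; None means "not tested", distinct from negative.
    pdl1_positive: Optional[bool] = None
    msi_high: Optional[bool] = None
    # Cancer-specific mutation calls, e.g. "KRAS G12C".
    mutations: list[str] = field(default_factory=list)
    treatments: list[Treatment] = field(default_factory=list)

    def has_mutation(self, gene: str) -> bool:
        # Compare only the gene symbol, so "KRAS" matches "KRAS G12C".
        return any(m.split(" ", 1)[0] == gene for m in self.mutations)


# Usage with an invented patient:
record = PatientRecord(
    patient_id="P001", age_at_diagnosis=64, sex="F",
    smoking_status="former", vital_status=VitalStatus.ALIVE,
    cancer_type="NSCLC", mutations=["KRAS G12C"],
    treatments=[Treatment("carboplatin + pemetrexed", start_day=30)],
)
```

Keeping biomarkers such as PD-L1 and MSI as optional fields reflects the fact that not every patient is tested for every marker, so a missing value should not be read as a negative result.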

In the first two years of GENIE BPC, the plan is to collect expanded clinical data, including prior cancer-directed medicines, pathology, and outcomes, for six cancers in approximately 8,000 records at three or four GENIE sites. The first-year priorities are lung, colorectal, breast, prostate, bladder, and pancreatic cancer.

"Upon this foundation, we hope to expand to multiple cancer types and go consortium-wide," Riely said. In the third and fourth years, the consortium is planning pan-cancer implementation for more than 50,000 records, and by the fifth year, the hope is that they'll be analyzing as many cancer types at as many GENIE sites as possible.

To analyze outcomes, GENIE BPC uses the PRISSMM data model, which was developed at the Dana-Farber Cancer Institute. It's a phenomic data standard that uses information from text reports in electronic health records pertaining to a patient's pathology, radiology, imaging, symptoms, molecular markers, and a medical oncologist's assessment to create a more complete picture of a cancer patient's overall health. Importantly, Riely explained, it uses real-world endpoints, such as real-world overall survival (rwOS) and real-world time to treatment failure (rwTTF), to judge the efficacy of treatments.

GENIE BPC "seeks to connect genomics and patient outcomes as a function of treatment," he added.

The project is currently aggregating data from more than 1,800 patients and will present those data going forward. But the researchers started with a smaller subset of patients for their first dataset, looking at 100 patients with 113 non-small cell lung cancer (NSCLC) tumors: 99 adenocarcinomas, eight squamous tumors, and six other types.

All patients' tumors underwent somatic analysis. EGFR mutations were the most common driver alterations, at 27 percent, followed by KRAS G12C mutations at 18 percent, other KRAS mutations at 18 percent, and various other targetable mutations. About 24 percent of the tumors had no targetable driver mutation.

Riely also noted other types of data available from the analysis. "With GENIE data as a foundation, we have a broad number of characteristics for patients' tumor DNA. This includes tumor mutation burden," he said. "We see a broad range of tumor mutation burdens from very few mutations all the way to tumors with more than 26 mutations per megabase."

As part of the new GENIE BPC analysis, the researchers also incorporated patients' treatment histories where that information was available. The most common treatment regimen was carboplatin and pemetrexed, but a number of patients also received immune checkpoint inhibitors such as pembrolizumab (Merck's Keytruda) as monotherapy. The researchers compared the efficacy of these treatment strategies and found that patients on the combination regimen had a median overall survival of one year, compared with about five months for patients on single-agent immune checkpoint inhibitors.
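Median survival figures like these are conventionally estimated with the Kaplan-Meier method, which accounts for patients who were still alive at last follow-up (censored observations) instead of discarding them. The article does not say which method GENIE BPC used, so the Python function below is only a generic sketch of a Kaplan-Meier median; the function name and its inputs are hypothetical.

```python
from typing import Optional, Sequence


def km_median(times: Sequence[float], events: Sequence[bool]) -> Optional[float]:
    """Kaplan-Meier estimate of median survival time.

    times:  follow-up duration for each patient (e.g. months)
    events: True if the event (e.g. death) was observed, False if censored
    Returns the earliest time at which estimated survival falls to 0.5 or
    below, or None if it never does within the observed follow-up.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every observation tied at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            if survival <= 0.5:
                return t
        # Patients who died or were censored at t leave the risk set.
        at_risk -= removed
    return None
```

Censored patients still count in the risk set up to their last follow-up, which is why a Kaplan-Meier median can differ noticeably from the plain median of observed survival times.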

"But we really need to go beyond overall survival, because that's a commonly available endpoint," Riely said. So the researchers also examined real-world progression-free survival. Using the PRISSMM framework, they looked at two metrics: PFS-I, progression-free survival based on imaging studies and radiology reports, and PFS-M, progression-free survival based on medical evaluation reports and oncologists' progress notes.

"In the real world, sometimes progression is noted first on the scan, and then by the physician. And sometimes progression is noted first by a physician and confirmed by a scan. Both of those endpoints are important to identify," Riely said.
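To picture how one patient can contribute two different progression dates, the Python sketch below derives a PFS-I and a PFS-M value from separate sources. It assumes, as is conventional for PFS, that death also counts as an event, and that each source (imaging reports for PFS-I, oncologist notes for PFS-M) supplies its own list of progression times measured from the start of a regimen. The function and class names are hypothetical and are not part of PRISSMM itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfsResult:
    time: float   # months from regimen start
    event: bool   # True = progression or death observed; False = censored


def pfs(progression_times: list[float], death_time: Optional[float],
        last_followup: float) -> PfsResult:
    """Time to first progression (from one source) or death, whichever is first.

    All times are in months from regimen start. If neither progression nor
    death is recorded, the patient is censored at last follow-up.
    """
    # Ignore any progression recorded before the regimen started.
    candidates = [t for t in progression_times if t >= 0]
    if death_time is not None:
        candidates.append(death_time)
    if candidates:
        return PfsResult(min(candidates), True)
    return PfsResult(last_followup, False)


# Invented patient: the oncologist notes progression at 2.5 months,
# and a scan confirms it at 3.1 months.
imaging_progression = [3.1]
oncologist_progression = [2.5]
pfs_i = pfs(imaging_progression, death_time=None, last_followup=10.0)
pfs_m = pfs(oncologist_progression, death_time=None, last_followup=10.0)
```

Here PFS-M ends before PFS-I because the physician noted progression first; in another patient the order could be reversed, which is why tracking both endpoints can matter.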

He noted that the real-world data showed a median PFS-I of 3.3 months and a median PFS-M of 6.9 months for the combination regimen. The median PFS for the immune checkpoint inhibitor monotherapies was 4.9 months, a value that falls between the two combination-regimen estimates, he added.

GENIE BPC is building on GENIE's foundation by curating clinical data to link genomic and phenomic data, Riely concluded.

In an email to GenomeWeb, Sawyers concurred, adding that the biopharma collaboration "allows us to take GENIE to a whole new level by abstracting longitudinal data on tens of thousands of patients over the next several years." The new initiative enables research on a scale that wasn't possible a few years ago, he noted. The findings from the first lung cancer cohort show that it is possible to proceed with other tumor types and to begin expanding beyond the pilot phase of the project to additional GENIE clinical sites.

"In addition to providing a unique, open resource for the cancer research community, we believe this new phase of GENIE can serve as a powerful catalyst for real-world evidence initiatives in molecularly defined subsets of cancer patients," Sawyers said. "In addition, it demonstrates the willingness of cancer centers and biopharma to come together in a non-competitive way to share data that benefits the entire cancer community."