Novartis Team Develops cfDNA-Based Machine Learning Model to Detect Clonal Hematopoiesis Without Matched Blood or Tumor Tissue

Apr 03, 2023 17:12 CST Updated Apr 07, 14:49
Novartis

Drug Development and Manufacturing

Clonal hematopoiesis (CH) is a common aging-related condition characterized by the clonal expansion of hematopoietic stem cells, which increases an individual's risk of developing hematologic malignancies, heart disease, and stroke. Cancer patients have a higher incidence of clonal hematopoiesis of indeterminate potential (CHIP) and myeloproliferative neoplasms, and studies have shown that CHIP is associated with poor prognosis and shorter overall survival in these patients. Cell-free DNA (cfDNA) testing, a non-invasive method, has been widely used in clinical trials to determine the genomic landscape of cancers, monitor minimal residual disease, and potentially detect early-stage cancer. However, because tracing the origin of mutations found in cfDNA has required matched blood and tumor samples, cfDNA sequencing has not been widely used to identify CH status in cancer patients.
Recently, a team from the Novartis Institutes for BioMedical Research in the United States published an article in Science Translational Medicine titled "Clonal hematopoiesis detection in patients with cancer using cell-free DNA sequencing". The researchers developed a classification model that identifies CH in cancer patients from cfDNA alone, without the need for matched blood or tumor tissue. The model can distinguish blood mutations arising from CHIP from tumor-derived mutations, demonstrating the feasibility of detecting CH through cfDNA sequencing alone. Using this classification model, the research team found that about 30% of the samples in a cohort of 4,324 tumor cfDNA samples were identified with CH, and that the incidence of CH varied by tumor type. Matched RNA sequencing data showed increased inflammation in the tumor microenvironment (TME) of patients carrying CH, particularly neutrophil activation, indicating that CHIP may have potential pathogenic mechanisms.
First, the researchers established a dataset containing white blood cell, plasma cfDNA, and tumor sequences from 124 patients with metastatic cancer and 47 healthy controls. The tumor sequencing data were generated using the MSK-IMPACT panel, which includes all protein-coding exons of 410 cancer-related genes. Matched plasma and white blood cell data were generated using GRAIL Inc.'s 508-gene panel, with 314 mutated genes present in this dataset. A total of 1,400 single nucleotide variants (SNVs) derived from either tumors or white blood cells in this dataset were used to train random forest and logistic regression machine learning models. Plasma data were then used to classify the origin of mutations, yielding the CH classification model (Figure 1a). The researchers evaluated performance at various thresholds and found that the classification model performed consistently well (Figure 1b). Compared with the random forest approach, the logistic regression method emphasized a broader set of features, including relevant SBS mutational signatures (Figures 1c and 1d).

Figure 1. Classification methods and performance for distinguishing blood and tumor-derived mutations. Source: Science Translational Medicine
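
The training step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the two features (plasma variant allele fraction and a CHIP-gene flag), the synthetic variants, and every function name below are assumptions made purely for illustration, and the logistic regression is a plain gradient-descent version written with the Python standard library.

```python
import math
import random


def make_toy_variants(n, seed=0):
    """Generate SYNTHETIC variants; these distributions are arbitrary assumptions.

    Each row is ([vaf, in_chip_gene], label), where label 1 means
    blood-derived (CH) and label 0 means tumor-derived.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        blood = rng.random() < 0.5
        if blood:
            vaf = rng.uniform(0.001, 0.05)
            in_chip = 1.0 if rng.random() < 0.8 else 0.0
        else:
            vaf = rng.uniform(0.01, 0.40)
            in_chip = 1.0 if rng.random() < 0.1 else 0.0
        rows.append(([vaf, in_chip], 1 if blood else 0))
    return rows


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, epochs=500, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict_blood_probability(model, x):
    """Return the modeled probability that a variant is blood-derived."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(model, rows, threshold=0.5):
    """Fraction of variants whose thresholded prediction matches the label."""
    hits = sum(
        (predict_blood_probability(model, x) >= threshold) == bool(y)
        for x, y in rows
    )
    return hits / len(rows)


train_rows = make_toy_variants(400, seed=1)
held_out = make_toy_variants(200, seed=2)
model = train_logistic(train_rows)
print(f"held-out accuracy on toy data: {accuracy(model, held_out):.2f}")
```

Varying the `threshold` argument of `accuracy` mirrors, in a very simplified way, the idea of evaluating the classifier at several decision thresholds.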

To further validate the performance of the CH classification model and investigate the role of CH in cancer, researchers collected baseline cfDNA sequencing data from 4,324 patients with advanced metastatic cancer in Novartis oncology clinical trials. Using the logistic regression and random forest models, they classified 35,148 detected SNVs, and the results showed that the CH classification model can distinguish blood- and tumor-derived mutations in clinical tumor samples. The predicted blood mutations were divided into likely CHIP driver genes (DNMT3A, TET2, ASXL1, JAK2, SF3B1, PPM1D) and other genes commonly mutated in myeloid or blood cancers (Figure 2A). The CHIP driver genes with the highest predicted proportion of blood mutations were DNMT3A and TET2, followed by SF3B1, CBL, and KMT2C (Figure 2B). Tumor DNA sequencing from 38 patients with non-small cell lung cancer (NSCLC) and 130 patients with estrogen receptor-positive breast cancer (ER+ BC), covering 946 SNVs, also confirmed the validity of the two models, with the logistic regression model classifying the biological origin of SNVs more accurately. Using the logistic regression model, the researchers divided patients into CH-positive (typical CH gene), CH-myeloid (presumed myeloid driver gene), or CH-negative groups. Among the 4,324 cancer patients, 30.4% carried blood-derived mutations consistent with CH, and the proportion of CH-positive patients increased significantly with age (Figure 2C).

Figure 2. CH Characteristics in Cancer Patients. Source: Science Translational Medicine
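
The patient-level grouping described above can be sketched as a simple rule over the genes in which a patient has predicted blood-derived mutations. This is an illustration under stated assumptions, not the study's code: the CHIP driver gene list is taken from the article, but the contents of `OTHER_MYELOID_GENES`, the precedence of CH-positive over CH-myeloid, and the function names are hypothetical.

```python
from collections import Counter

# CHIP driver genes as listed in the article.
CHIP_DRIVER_GENES = frozenset(
    {"DNMT3A", "TET2", "ASXL1", "JAK2", "SF3B1", "PPM1D"}
)

# ASSUMPTION: the article does not enumerate the "other myeloid" genes;
# these two entries are placeholders for illustration only.
OTHER_MYELOID_GENES = frozenset({"CBL", "KMT2C"})

STATUSES = ("CH-positive", "CH-myeloid", "CH-negative")


def classify_patient(blood_mutation_genes,
                     chip_genes=CHIP_DRIVER_GENES,
                     myeloid_genes=OTHER_MYELOID_GENES):
    """Assign a CH status from genes with predicted blood-derived mutations.

    ASSUMPTION: a typical CH gene takes precedence over a myeloid gene.
    """
    genes = {gene.upper() for gene in blood_mutation_genes}
    if genes & chip_genes:
        return "CH-positive"
    if genes & myeloid_genes:
        return "CH-myeloid"
    return "CH-negative"


def ch_status_fractions(cohort):
    """Return the fraction of patients in each CH status group.

    `cohort` maps a patient ID to that patient's list of genes.
    """
    if not cohort:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(classify_patient(genes) for genes in cohort.values())
    return {status: counts[status] / len(cohort) for status in STATUSES}


# Hypothetical four-patient cohort, invented for this example.
toy_cohort = {
    "patient_1": ["DNMT3A"],
    "patient_2": ["CBL"],
    "patient_3": [],
    "patient_4": ["TET2", "TP53"],
}
print(ch_status_fractions(toy_cohort))
```

Keeping the gene sets as parameters makes it easy to swap in a curated list without changing the grouping logic.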

The study also found that patients with non-small cell lung cancer (NSCLC), cutaneous melanoma, mesothelioma, or anaplastic thyroid cancer were more likely to have CH mutations than patients with other tumor types (Figure 3A), and that exposure to chemotherapy also increased the incidence of CHIP. Compared with other cancer patients, patients with NSCLC were more likely to carry DNMT3A blood mutations, patients with melanoma or NSCLC were more likely to carry TET2 mutations, and patients with triple-negative breast cancer (TNBC) were more likely to carry TP53 mutations (Figure 3B).

Figure 3. Cancer-specific differences in CH incidence. Source: Science Translational Medicine

To analyze the relationship between CH and the expression of tumor and inflammatory gene signatures in the TME, researchers selected baseline tumor RNA sequencing data from 819 patients in the baseline cfDNA sequencing dataset, of whom 32.8% were CH-positive and 39.1% were CH-myeloid. When the gene expression profiles of CH-positive and CH-negative patients were compared, 119 genes were upregulated and 5 genes were downregulated, with the upregulated genes significantly enriched in neutrophil degranulation, extravasation, and inflammatory response (Figure 4A). In patients with CH mutations and a variant allele fraction (VAF) > 2%, these changes were more pronounced (Figure 4B), indicating that CH is associated with increased expression of tumor and inflammatory gene signatures in the TME.

In many types of cancer, an increased neutrophil-to-lymphocyte ratio (NLR) is associated with poor prognosis. Researchers found that CH-positive status was associated with elevated NLR, and that NLR was significantly increased in patients with DNMT3A mutations.

Figure 4. Gene expression changes associated with CH status in tumor biopsies. Source: Science Translational Medicine

Researchers also explored whether CH patients respond differently to treatment. An analysis of samples from 1,731 ER+ BC patients showed that those receiving ribociclib (a CDK4/6 inhibitor) combined with endocrine therapy (ET) had longer progression-free survival (PFS), with 17% and 9% of the samples testing positive for DNMT3A and TET2 mutations, respectively (Figure 5a). In patients with TET2 frameshift (FS) or nonsense (NON) mutations, there was no difference in PFS between those treated with ribociclib + ET and those treated with ET alone (Figure 5b), indicating that the interaction between treatment and TET2 mutation status was not significant. In summary, the study shows that different CH mutations may have varying impacts on treatment response, offering new research perspectives on how CH positivity affects disease progression in cancer patients.

Figure 5. Treatment response in ER+ BC patients carrying CH mutations. Source: Science Translational Medicine

cfDNA is a mixture of normal DNA and tumor-derived DNA. The source of gene mutations detected in plasma can only be clearly traced by comparing deep matched sequencing from white blood cells (WBC) and plasma. However, this method is too costly, and commercial cfDNA testing usually does not include WBC sequencing. Therefore, in the absence of matched blood sequencing data, developing computational methods to annotate CH in cfDNA sequencing data will expand the utility of plasma sequencing and enable the study of CH's impact on cancer.

Lauren Fairchild, the corresponding author of the article and a data scientist at the Novartis Institutes for BioMedical Research, stated: "Our open-source method provides a computational solution for finding CH mutations in cfDNA, reducing the need for costly WBC sequencing in clinical settings. These findings suggest that CH as an additional biomarker of inflammation in the tumor microenvironment and its impact on patient treatment response warrant further investigation." However, the exact mechanism linking CHIP to increased tumor inflammatory signaling remains unknown, which is one of the limitations of this study. Additionally, more cancer types need to be included in the machine learning models to improve their accuracy.
In summary, this study proposes an accurate machine learning method capable of detecting CHIP variants from cfDNA samples, reducing the need for WBC sequencing in clinical settings. The method can be used to characterize CH in the tumor microenvironment (TME) and correlate patients' CH status with local and systemic biomarkers of inflammation. The study shows that CH patients exhibit increased neutrophil and inflammatory activity in tumors, TME, and peripheral blood. By considering CH as an additional biomarker for TME inflammation, this research offers new insights into exploring the impact of CH on patient treatment response.
References:
Fairchild L, Whalen J, D'Aco K, et al. Clonal hematopoiesis detection in patients with cancer using cell-free DNA sequencing. Sci Transl Med. 2023;15(689):eabm8729. doi:10.1126/scitranslmed.abm8729
https://www.science.org/doi/10.1126/scitranslmed.abm8729