iMAP: A Novel Single-Cell Data Integration Method Based on Adversarial Paired Transfer Networks

Feb 24, 2021 10:01 CST Updated 10:01
Abiosciences

Life Science Technology Developer

On February 18, 2021, Zemin Zhang's laboratory at the Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, in collaboration with the Beijing Advanced Innovation Center for Genomics (ICG), the Peking-Tsinghua Center for Life Sciences (CLS), and Abiosciences, published a bioinformatics methodology paper titled “iMAP: integration of multiple single-cell datasets by adversarial paired transfer networks” in the journal Genome Biology. The study proposes iMAP, a novel method for single-cell data integration based on deep autoencoders and generative adversarial networks.

Integrating datasets from multiple sources is a key step toward drawing reliable new insights from single-cell RNA sequencing. However, technical variation inevitably exists among datasets generated in different experimental batches. The primary challenge in developing batch effect correction methods lies in eliminating this technical variation while preserving the true biological differences between experiments. Current mainstream batch effect correction methods struggle to balance these two objectives reliably.

Dongfang Wang, a postdoctoral fellow in Zemin Zhang’s laboratory, and colleagues developed iMAP, a novel method that offers a new approach to the effective integration of single-cell data. iMAP combines the advantages of two state-of-the-art unsupervised deep network architectures: deep autoencoders and generative adversarial networks (GANs) (Figure 1).

The primary role of the GAN is to accurately align the gene expression distributions of cells of the same type across different datasets. However, the cellular composition of real-world biological datasets is highly complex: the cell types present in different datasets may overlap only partially, and the proportion of a given cell type can vary substantially from one dataset to another.

Therefore, iMAP first constructs a novel autoencoder architecture to extract low-dimensional representations of cells. These representations mitigate batch effects to a certain extent while preserving genuine biological differences across datasets. Next, iMAP constructs random walk mutual nearest neighbor (rwMNN) cell pairs from these representations. The pairs serve as self-generated training data that guide the subsequent GAN toward an accurate mixing of cellular gene expression distributions.
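The pairing idea can be illustrated with a minimal sketch. The code below finds plain mutual nearest neighbor (MNN) pairs between two batches; it is not the authors' implementation and it omits the random walk extension that rwMNN adds. The function names (`k_nearest`, `mutual_nearest_pairs`) and the toy coordinates are hypothetical, standing in for the low-dimensional representations an encoder would produce.

```python
import math

def k_nearest(queries, references, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    neighbors = []
    for q in queries:
        # Rank reference indices by Euclidean distance to the query point.
        order = sorted(range(len(references)), key=lambda j: math.dist(q, references[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbors in the opposite batch."""
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]

# Toy 2-D representations (hypothetical values, for illustration only).
# batch_b contains the same two cell groups as batch_a, slightly shifted,
# plus one batch-specific cell that lies far from everything in batch_a.
batch_a = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
batch_b = [(0.1, 0.0), (5.1, 5.2), (20.0, -20.0)]

pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
print(pairs)  # → [(0, 0), (2, 1)]
```

In this toy example the batch-specific cell (index 2 of `batch_b`) ends up in no pair, because none of the cells in `batch_a` counts it as a nearest neighbor. This shows how requiring the neighbor relation to hold in both directions can keep dataset-specific cells from being forced onto an unrelated cell type.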

Compared with other methods, iMAP can not only match the gene expression distributions of the same cell types across batches but also identify the cell types specific to each dataset. The authors demonstrated the effectiveness and reliability of iMAP on more than ten datasets of varying scales generated by different sequencing technologies. Compared with other deep learning-based methods, iMAP also shows a significant speed advantage on large-scale datasets. The authors further applied iMAP to tumor-infiltrating immune cell datasets and, by integrating data generated with Smart-seq2 and with 10x Genomics technologies, discovered novel intercellular interactions within the tumor microenvironment.

iMAP is available as a free Python package (https://github.com/Svvord/iMAP) that enables users to integrate single-cell transcriptomic data. With the widespread adoption of single-cell sequencing technologies and the generation of large-scale datasets, iMAP may serve as a valuable tool for integrating data from different experimental batches and may inform the development of future algorithms.

Dongfang Wang, a postdoctoral fellow at the Biomedical Pioneering Innovation Center (BIOPIC)/School of Life Sciences, Peking University, and Siyu Hou, a doctoral student at Tsinghua University, are co-first authors of the paper. Dongfang Wang and Professor Zemin Zhang of BIOPIC/School of Life Sciences are the corresponding authors. This work was supported by the National Natural Science Foundation of China, the Beijing Advanced Innovation Center for Genomics (ICG), the Peking-Tsinghua Center for Life Sciences (CLS), and Analytical Biosciences Limited.

Paper link:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02280-8