
AtacWorks is a deep learning toolkit for epigenomics research that reduces the cost and time required for rare-cell and single-cell experiments.

Just as travelers often pack more clothes than they will actually wear, most cells in the human body carry a complete copy of the genome, with billions of base pairs packed into the nucleus.
Each cell, however, uses only the genes it needs, and different cell types, such as liver, blood, or skin cells, activate different genes. The DNA regions that determine a cell’s unique function are open and readily accessible, while the rest remains wrapped around proteins.
Researchers from NVIDIA and the Department of Stem Cell and Regenerative Biology at Harvard University have developed a deep learning toolkit to help scientists study these accessible DNA regions, even when sample data are noisy or limited, as is often the case in the early detection of cancer and other genetic diseases.
AtacWorks, described in a paper recently published in Nature Communications, both denoises sequencing data and identifies accessible regions of DNA. Running on NVIDIA Tensor Core GPUs, it can perform inference on a whole genome in just half an hour. The software is available on NGC, NVIDIA’s hub of GPU-optimized software.
AtacWorks works with ATAC-seq, a popular method for finding open regions of the genome in both healthy and diseased cells, which can provide key insights for drug discovery.
ATAC-seq typically requires tens of thousands of cells to produce a clean signal, which makes it very difficult to study rare cell types such as the hematopoietic stem cells that give rise to blood cells and platelets. By applying AtacWorks to ATAC-seq data, researchers can achieve results of comparable quality with just dozens of cells. This lets scientists learn more about the regulatory sequences active in rare cell types and identify mutations that make people more vulnerable to disease.
“With AtacWorks, we can perform single-cell experiments that would previously have required ten times the number of cells,” said Jason Buenrostro, an associate professor at Harvard University, co-author of the paper, and the developer of the ATAC-seq method. “Using GPU-accelerated deep learning to denoise low-coverage sequencing data effectively helps us investigate epigenetic changes associated with rare cell development and disease.”
Buenrostro developed ATAC-seq in 2013 as an epigenomic profiling method for locating accessible, or open, regions of chromatin. The method measures signal intensity at every region across the entire genome and has been widely adopted by leading genomics research laboratories and pharmaceutical companies. Peaks in the signal indicate open regions of DNA.
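The idea that peaks mark open regions can be illustrated with a deliberately simple sketch: treat any run of consecutive positions whose signal meets a fixed threshold as an open region. The function `call_peaks`, its threshold, and its minimum-width parameter are hypothetical; real ATAC-seq pipelines use statistical peak callers, and this is not how AtacWorks identifies peaks.

```python
from typing import List, Optional, Tuple


def call_peaks(signal: List[float], threshold: float, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where signal >= threshold
    for at least `min_width` consecutive positions (hypothetical helper)."""
    peaks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # entering a region at or above the threshold
        elif value < threshold and start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None  # leaving the region
    # Close a region that runs to the end of the track.
    if start is not None and len(signal) - start >= min_width:
        peaks.append((start, len(signal)))
    return peaks


if __name__ == "__main__":
    # A toy per-base signal track with two elevated stretches.
    track = [0, 1, 0, 5, 8, 9, 7, 1, 0, 0, 6, 6, 0]
    print(call_peaks(track, threshold=5, min_width=2))
```

A fixed threshold is easy to follow but ignores local background and sequencing depth, which is exactly why noisy, low-coverage tracks make this kind of call unreliable.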
The fewer cells available, the noisier the data, and the harder it becomes to determine which regions of DNA are accessible.
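A simple counting model shows why fewer cells mean noisier data. Assume, purely as an illustration and not as the paper’s model, that the number of reads N landing in a given region follows a Poisson distribution whose mean λ grows with the number of cells and the sequencing depth. The relative noise is then:

```latex
\[
\operatorname{Var}(N) = \lambda,
\qquad
\frac{\sqrt{\operatorname{Var}(N)}}{\mathbb{E}[N]} = \frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}}
\]
```

Under this assumption, cutting the expected read count in a region 100-fold makes its relative fluctuation 10 times larger, so weak peaks become hard to distinguish from background.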
AtacWorks is a PyTorch-based convolutional neural network. It was trained on labeled pairs of matching ATAC-seq datasets, one high-quality and one noisy. Given a downsampled copy of the data, the model learns to predict an accurate, high-quality version of the signal and to identify peaks in it.
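To make the paired-training setup concrete, here is a minimal standard-library sketch of how one training example could be built: a clean, high-coverage set of reads is randomly downsampled to create the noisy input, and the clean coverage track plus per-base peak labels serve as the targets. The helpers `coverage` and `make_training_pair`, the fixed read length, and the threshold-based peak labels are hypothetical simplifications. This is not AtacWorks’ actual data pipeline, and the neural network itself is omitted.

```python
import random
from typing import Dict, List


def coverage(read_starts: List[int], length: int, read_len: int) -> List[int]:
    """Count how many reads overlap each base of a region `length` bp long."""
    track = [0] * length
    for start in read_starts:
        # Clip each read to the region boundaries before counting it.
        for pos in range(max(start, 0), min(start + read_len, length)):
            track[pos] += 1
    return track


def make_training_pair(
    read_starts: List[int],
    length: int,
    read_len: int,
    fraction: float,
    peak_threshold: int,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Build one (noisy input, clean targets) example by downsampling reads.

    Hypothetical helper: peak labels come from a fixed threshold on the
    clean track, a stand-in for a proper peak caller.
    """
    rng = random.Random(seed)
    n_keep = round(len(read_starts) * fraction)
    # Sample reads without replacement to mimic a lower-depth experiment.
    noisy_reads = rng.sample(read_starts, n_keep)
    clean = coverage(read_starts, length, read_len)
    noisy = coverage(noisy_reads, length, read_len)
    labels = [1 if depth >= peak_threshold else 0 for depth in clean]
    return {"input": noisy, "target_signal": clean, "target_peaks": labels}


if __name__ == "__main__":
    gen = random.Random(1)
    # 5,000 hypothetical reads clustered around position 500 (a mock open region).
    reads = [int(gen.gauss(500, 40)) for _ in range(5000)]
    # Keep 1/50 of the reads, mirroring a 50M -> 1M read reduction.
    pair = make_training_pair(reads, length=1000, read_len=50, fraction=1 / 50, peak_threshold=200)
    print("clean max depth:", max(pair["target_signal"]))
    print("noisy max depth:", max(pair["input"]))
    print("peak bases:", sum(pair["target_peaks"]))
```

Because the noisy input is generated from the same reads as the clean target, every example is perfectly matched, which lets a model learn the mapping from low-depth to high-depth signal without collecting a second experiment.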
Researchers found that, with AtacWorks, they could identify accessible chromatin in noisy data containing just 1 million reads nearly as well as traditional methods could using a clean dataset of 50 million reads. This allows scientists to conduct studies with fewer cells, significantly reducing the costs of sample collection and sequencing.
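The read counts above correspond to the following fraction of the clean dataset’s sequencing depth:

```latex
\[
\frac{1 \times 10^{6}\ \text{reads}}{50 \times 10^{6}\ \text{reads}} = \frac{1}{50} = 0.02 = 2\%
\]
```

In other words, the noisy input contained one-fiftieth of the reads in the clean reference dataset.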
AtacWorks also cuts analysis time and cost. On NVIDIA Tensor Core GPUs, the model runs inference on an entire genome in less than 30 minutes, whereas the same process takes 15 hours on a system with 32 CPU cores.
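Converting both runtimes to minutes gives a rough sense of the speedup:

```latex
\[
\frac{15\ \text{h}}{30\ \text{min}} = \frac{15 \times 60\ \text{min}}{30\ \text{min}} = \frac{900}{30} = 30
\]
```

Because the GPU run takes less than 30 minutes, 30 times is a lower bound on the speedup over the 32-core CPU system for this workload.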
Avantika Lal, lead author of the paper and a researcher at NVIDIA, stated, “For very rare cell types, existing methods alone are insufficient to study their DNA differences. AtacWorks not only helps reduce the cost of acquiring chromatin accessibility data but also opens up new possibilities for drug development and diagnostics.”
Observing accessible regions of DNA can help medical researchers identify specific mutations or biomarkers that increase susceptibility to diseases such as Alzheimer’s disease, heart disease, or cancer. These insights can also inform drug discovery by giving researchers a better understanding of disease mechanisms.
In the Nature Communications paper, researchers at Harvard University applied AtacWorks to a dataset of the stem cells that give rise to red and white blood cells, rare subtypes that could not be studied with traditional methods.
With a sample of just 50 cells, the team used AtacWorks to identify distinct DNA regions associated with cells that develop into white blood cells, as well as separate sequences associated with cells that develop into red blood cells.
To learn more about NVIDIA’s work in healthcare and life sciences, watch GTC 2021, held April 12 to 16; registration is free. The Healthcare and Life Sciences track features 16 live webinars, 18 special events, and more than 100 on-demand sessions, including Avantika Lal’s talk, “Deep Learning and Accelerated Computing for Epigenomic Data.”
The DOI of this Nature Communications paper is 10.1038/s41467-021-21765-5.