Renhe Future: A 3-Year-Old Biotech Startup Breaks World Records in Genomic Data Compression and Computing

Jul 07, 2017 08:00 CST Updated 08:00

VCBeat noted in an earlier article, “Gene Testing Hits a Wall with Big Data: 42 Gene Companies Offer Solutions!”, that as sequencing technology becomes increasingly commercialized, the storage, transmission, analysis, and interpretation of genomic big data will be the next critical barrier to overcome after sequencing cost.

 

Genomic data generated during sequencing, often exceeding hundreds of gigabytes per individual, is a goldmine waiting to be tapped. Storing, transmitting, analyzing, and interpreting these data are therefore indispensable steps, and they place stringent demands on high-ratio compression algorithms, efficient transmission strategies, ultra-fast computing platforms, and specialized interpretation services.

 

At the Intel Life Sciences IT Forum held not long ago, data compression, storage, and high-performance computing took center stage. The Broad Institute made its debut in China, jointly establishing the GATK China Community with Intel, BGI, Alibaba Cloud, and Inspur. For the first time, FPGA was a keyword throughout the event, signaling that Intel will increase its investment in heterogeneous computing in the coming years.


Beyond these industry giants, a startup called Renhe Future has also drawn significant attention: in just three years since its founding, it has developed a data compression algorithm that improves efficiency by 20-fold over traditional methods, and its cloud-based genomic data computing system has reduced whole-genome computation time from days to merely 10 minutes.




In 2016, the company won both the data compression and the computational acceleration categories of the global open competition held at the 11th International Conference on Genomics (ICG), setting new world records in each.

 

Surprisingly, Renhe Future is actually a biotechnology company. So how did a biotech firm become a tech powerhouse and break IT world records?


When Business Meets Technology: Entrepreneurship with Preparation

 

Renhe Future was founded in 2014 by Yuan Mengxi, Huang Wenjing, and Dr. Song Zhuo. Its initial team of more than ten members brought together several North American-trained PhDs in genetics, bioinformatics, computer science, and medicine, and combined their skills with the founders’ expertise in finance and business.

 

2014 was a peak year for genetic testing startups in China. According to the year-end industry review VCBeat published last year, more than 35 new companies were founded that year, Renhe Future among them.

 

That same year, the China Food and Drug Administration (CFDA) and the National Health and Family Planning Commission (NHFPC) suspended all clinical applications of high-throughput sequencing. Renhe Future, however, was not simply riding an industry wave: it took flight only after years of groundwork, as the three founders had spent five years preparing before the company was formally established.

 

In 2009, U.S. startups such as 23andMe and Knome were driving the first wave of commercialization in genetic testing. Inspired by this trend, Yuan Mengxi and Song Zhuo, both then studying in the U.S., each returned to China during their summer vacations to research the domestic genetic testing market.

 

As fate would have it, both visited the director of the Health Checkup Center at the Third Xiangya Hospital at around the same time, and that is how they met. In a sense, this trip was a turning point on the road to founding the company.

 

Their first meeting, at the KFC in Changsha Railway Station, marked the start of preparations for Genetalks (Renhe Future).

 

That same year, Huang Wenjing, Yuan Mengxi’s classmate at Cornell’s Johnson Graduate School of Management, joined the founding team. Drawing on a technical platform provided by Chinese scientists in San Diego, the team began its earliest R&D and data analysis work in 2009. During this period it developed initial prototypes of its first disease-gene information database and of a single-molecule labeling sequencing method for detecting low-frequency mutations.

 

In 2011, the team returned to China to reassess the genetic testing market, intending to begin commercialization in earnest. After careful analysis, however, they concluded that the domestic market still had neither market-access requirements nor regulatory oversight, and that the timing was therefore not yet right. The team decided to keep building its resources and wait for the best moment to enter.

 

Subsequently, while other team members continued collaborative R&D at research institutes across North America, Yuan Mengxi, Song Zhuo, and Huang Wenjing joined IDG Capital, Berry Genomics, and Eli Lilly and Company, respectively. There they led professional teams from complementary vantage points: venture capital investment management, biotechnology translation, and healthcare market operations.

 

It was not until 2014, when clinical gene sequencing was suspended, that the team realized with excitement that the industry was finally about to get under way, since the regulation the market had lacked in 2011 was now arriving. The team swiftly recalled its U.S.-based members and, within two months, completed financing, laboratory planning, core technology deployment, and the build-out of a domestic team, marking a new milestone with the launch of its R&D and testing base in Changsha.


Proactive Vision: Anticipating Future Trends


When Song Zhuo began working with high-throughput sequencing as a doctoral student, Roche’s 454 pyrosequencing technology had been available for less than a year; he was among the first cohort of doctoral students to work in the field.


At Vanderbilt University, Song Zhuo’s research focused on human genetics and bioinformatics. Perhaps even then he instinctively foresaw that data processing and analysis would become the bottleneck for the entire industry. The integration of biotechnology and information technology (BT+IT) was built into the technical team’s core ethos from the very beginning.

 

Accordingly, from its inception the company has also positioned itself strategically in the IT sector.

 

In 2014, Renhe Future partnered successively with Amazon AWS, Alibaba Cloud, and Intel. Building on the underlying services these IT giants provide, it has developed several innovative, high-performance bioinformatics solutions.


Architecting the Data Transmission and Analysis Workflow to Shatter World Records


Currently, the company's big data product line includes two solutions: transmission and compression of genomic data, and high-performance computing for interpretation.


GTZ Transmission Compression Solution

 

GTZ is a data transmission and compression solution jointly developed by Renhe Future and the Hunan Provincial Engineering Research Center.

 



By integrating data transmission, compression, and distribution into a single platform, GTZ cuts the time needed to transmit large-scale genomic data by 90% and reduces disk storage requirements by 90%. Under the same bandwidth, it delivers 10 times the transmission capacity of the established transfer tool Aspera. Compared with gzip, the widely used traditional compression tool, it compresses 10 times faster and achieves a compression ratio roughly three times higher. The result is a high-efficiency, low-cost solution for transmitting and storing genomic big data.


An individual’s genomic data is approximately 3 GB in size; however, at a sequencing depth of 30x and including base quality scores and other associated data, the final whole-genome dataset exceeds 200 GB.
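To see how roughly 3 GB of genome becomes more than 200 GB of raw data, here is a back-of-the-envelope estimate of uncompressed FASTQ size. It is a sketch under stated assumptions: the 150 bp read length and 60-byte read header are hypothetical values, not figures from the article, and every base and quality score is counted as one ASCII byte.

```python
# Back-of-the-envelope FASTQ size estimate for a 30x human genome.
# Assumptions (not from the article): 150 bp reads, ~60-byte "@..." headers,
# one ASCII byte per base and per quality score, Unix newlines.

GENOME_BASES = 3_000_000_000  # ~3 Gb human genome (article: "approximately 3 GB")
DEPTH = 30                    # 30x sequencing depth
READ_LEN = 150                # hypothetical read length
HEADER_LEN = 60               # hypothetical header length


def fastq_bytes(genome_bases: int, depth: int, read_len: int, header_len: int) -> int:
    """Estimate uncompressed FASTQ size in bytes."""
    n_reads = genome_bases * depth // read_len
    # Each 4-line record: header + "\n", bases + "\n", "+\n", qualities + "\n"
    bytes_per_read = (header_len + 1) + (read_len + 1) + 2 + (read_len + 1)
    return n_reads * bytes_per_read


size = fastq_bytes(GENOME_BASES, DEPTH, READ_LEN, HEADER_LEN)
print(f"{size / 1e9:.0f} GB")  # -> 219 GB
```

Because quality scores take as much space as the bases themselves, the raw file is more than twice the number of sequenced bases, which is why 30x coverage of a 3 Gb genome lands above 200 GB.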


Storing one such whole-genome dataset in the cloud (e.g., on the Amazon S3 object storage service) costs 400 yuan per year uncompressed, 140 yuan per year with gzip compression, and as little as 40 yuan per year with GTZ compression.
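These prices follow directly from how much data remains after compression: 140/400 means gzip leaves about 35% of the original size, and 40/400 means GTZ leaves about 10%. The sketch below assumes, as a simplification, that storage cost scales linearly with stored bytes; the fractions are derived from the article’s numbers rather than measured.

```python
# Annual cloud storage cost for one genome, assuming cost scales linearly
# with stored bytes. The baseline and the remaining-size fractions are
# derived from the article's figures (400, 140, and 40 yuan per year).

BASELINE_YUAN_PER_YEAR = 400.0

# Fraction of the original size left after compression (140/400 and 40/400).
REMAINING_FRACTION = {
    "uncompressed": 1.00,
    "gzip": 0.35,
    "GTZ": 0.10,
}


def annual_cost(method: str) -> float:
    """Yearly storage cost in yuan for one dataset stored with `method`."""
    return BASELINE_YUAN_PER_YEAR * REMAINING_FRACTION[method]


for method in REMAINING_FRACTION:
    print(f"{method:>12}: {annual_cost(method):.0f} yuan/year")
```

The same fractions explain the compression-ratio claim: gzip shrinks data by a factor of about 1/0.35 ≈ 2.9, GTZ by about 1/0.10 = 10, a difference of roughly 3.5 times.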




Currently, companies in China have purchased a large number of NovaSeq next-generation sequencers manufactured by Illumina. A single NovaSeq sequencer can generate 6 TB of data within 30 hours, and over 1.5 PB of data per year when operating at full capacity. If GTZ compression is adopted, it can reduce storage costs by more than RMB 1.5 million per NovaSeq instrument.
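The per-instrument figures can be reproduced approximately from the article’s own numbers. The sketch assumes the instrument runs back to back all year, and it carries one labeled assumption: that the 400 yuan/year S3 price above corresponds to one ~200 GB whole-genome dataset, i.e. about 2 yuan per GB per year.

```python
# Rough yearly data volume and storage savings for one NovaSeq, using the
# article's figures plus one labeled assumption about the per-GB price.

HOURS_PER_YEAR = 365 * 24
TB_PER_RUN = 6.0        # article: 6 TB within 30 hours
HOURS_PER_RUN = 30.0

# Assumption: 400 yuan/year stores one ~200 GB whole-genome dataset,
# i.e. about 2 yuan per GB per year.
YUAN_PER_GB_YEAR = 400.0 / 200.0
SAVED_FRACTION = 0.90   # article: GTZ cuts disk storage requirements by 90%


def yearly_output_tb(hours: float = HOURS_PER_YEAR) -> float:
    """Data produced when the instrument runs back to back at full capacity."""
    return hours / HOURS_PER_RUN * TB_PER_RUN


def yearly_savings_yuan(tb: float) -> float:
    """Storage cost avoided by keeping `tb` terabytes for one year (1 TB = 1000 GB)."""
    return tb * 1000 * YUAN_PER_GB_YEAR * SAVED_FRACTION


tb = yearly_output_tb()
print(f"{tb / 1000:.2f} PB per year")                # -> 1.75 PB per year
print(f"{yearly_savings_yuan(tb):,.0f} yuan saved")  # -> 3,153,600 yuan saved
```

Under these assumptions the estimate is consistent with the article’s “over 1.5 PB” and “more than RMB 1.5 million”; actual savings depend on utilization, how long data is retained, and real storage pricing.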


Furthermore, unlike traditional transmission solutions, GTZ employs patented technology that compresses data during transmission, enabling stable and efficient utilization of full bandwidth.
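The article does not describe how GTZ’s patented technique works. As a generic illustration of the underlying idea, compressing data while it is being sent rather than in a separate step beforehand, the sketch below pipelines Python’s standard `zlib` compressor with a sender thread so that compression and transfer overlap. `CHUNK_SIZE`, the queue size, and the in-memory `sink` standing in for a network socket are all assumptions.

```python
import io
import queue
import threading
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB read size (hypothetical tuning value)


def compress_stream(src, out_q: queue.Queue, level: int = 6) -> None:
    """Producer: read raw chunks, compress them, hand compressed pieces to the sender."""
    comp = zlib.compressobj(level)
    try:
        while chunk := src.read(CHUNK_SIZE):
            piece = comp.compress(chunk)
            if piece:  # zlib may buffer input and return b"" for small chunks
                out_q.put(piece)
        out_q.put(comp.flush())
    finally:
        out_q.put(None)  # sentinel: end of stream


def send_stream(out_q: queue.Queue, sink) -> int:
    """Consumer: 'transmit' each compressed piece as soon as it is ready."""
    sent = 0
    while (piece := out_q.get()) is not None:
        sink.write(piece)  # stand-in for socket.sendall(piece)
        sent += len(piece)
    return sent


def transfer(src, sink) -> int:
    """Compress and send concurrently; returns the number of bytes sent."""
    q: queue.Queue = queue.Queue(maxsize=8)  # bounded so compression cannot run far ahead
    producer = threading.Thread(target=compress_stream, args=(src, q))
    producer.start()
    sent = send_stream(q, sink)
    producer.join()
    return sent


# Usage with a toy FASTQ-like payload and an in-memory "network".
raw = b"@read\nACGTACGTAC\n+\nIIIIIIIIII\n" * 100_000
sink = io.BytesIO()
n = transfer(io.BytesIO(raw), sink)
assert zlib.decompress(sink.getvalue()) == raw
print(f"{len(raw)} raw bytes sent as {n} compressed bytes")
```

Because the link only ever carries compressed bytes and the sender never waits for the whole file to be compressed first, effective throughput can exceed the raw bandwidth; a dedicated genomic compressor would exploit FASTQ structure far better than general-purpose `zlib`.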


For comparison, mainstream gzip shrinks sequencing data to roughly 35% of its original size, whereas GTZ shrinks it to roughly 10%, which is where the roughly threefold improvement in compression ratio comes from.


High-Performance Computing Solutions


1. GT-WGS


GT-WGS is a genomic cloud computing platform built on Amazon AWS. Using distributed computing on the public cloud, it coordinates hundreds of high-performance computers to cut the analysis time for 30X human whole-genome sequencing data to under 10 minutes, a reduction of 23 hours. Despite this speed, GT-WGS maintains high accuracy: its results show greater than 99% concordance with the standard GATK pipeline.
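The article does not explain how GT-WGS divides the work among machines. One common way to parallelize whole-genome analysis is scatter-gather by genomic region: tile the genome into intervals, process each interval independently, then merge the results in order. The sketch below shows that pattern with threads standing in for cloud nodes; `analyze_region` is a hypothetical placeholder, not part of any real pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 0-based, half-open


def make_intervals(contig_lengths: Dict[str, int], chunk: int) -> List[Interval]:
    """Scatter step: tile every contig into fixed-size, non-overlapping intervals."""
    intervals = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk):
            intervals.append((contig, start, min(start + chunk, length)))
    return intervals


def analyze_region(interval: Interval) -> List[str]:
    """Hypothetical per-region worker; a real pipeline would align reads and call variants here."""
    contig, start, end = interval
    return [f"{contig}:{start}-{end} processed"]


def scatter_gather(contig_lengths: Dict[str, int], chunk: int, workers: int) -> List[str]:
    """Run the worker on every interval in parallel, then gather results in genome order."""
    intervals = make_intervals(contig_lengths, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the merged output stays position-sorted
        per_region = pool.map(analyze_region, intervals)
        return [record for records in per_region for record in records]


records = scatter_gather({"chr1": 250, "chr2": 120}, chunk=100, workers=4)
print(records[0])   # -> chr1:0-100 processed
print(records[-1])  # -> chr2:100-120 processed
```

With hundreds of machines each taking one interval, wall-clock time approaches the time for the slowest interval plus the merge. Real pipelines usually pad interval boundaries so that reads spanning two regions are not lost.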

 

Normally, going from a genetic sample to results involves six steps: sample extraction, library preparation, quality control, sequencing, analysis, and interpretation. Under standard protocols this takes at least 50 hours. The GT-WGS protocol cuts library preparation by 1 hour and analysis by 23 hours, shortening the turnaround time of personal genome testing services (from sampling to report) to roughly one day.

 



The machine-hour cost for GT-WGS is approximately $16, offering a 90-fold increase in data analysis speed compared to a single standard server and reducing cloud computing costs by 75%.


2. GTX One


Cloud computing offers many advantages, such as elasticity and flexibility, but much of China’s genomic data is still stored offline. Hardware-accelerated systems for local computation will therefore inevitably become a strategic focus of future genomic data analysis.

 

Last year, Renhe Future launched the GTX One, an all-in-one data analysis appliance based on CPU+FPGA heterogeneous hardware acceleration. Its PCIe 3.0 FPGA card gives a standard PC genomic data analysis capability equivalent to that of hundreds of servers.

 

By customizing and optimizing the computational pipeline for genetic data analysis, a single GTX One can complete alignment and variant analysis of a 30X whole genome within 15 minutes, setting a new record for the lowest energy consumption in genetic data analysis. This year, the company upgraded the GTX One’s interfaces.

 

In simple terms, a single GTX One device delivers the analytical power of 150 standard servers, maximizing the reduction in procurement and operational costs for computing clusters while accelerating genomic analysis.


Data Interpretation Framework Based on Text Mining

 

CNV is a database built by text mining the literature available through NCBI; it identifies and extracts associations between human phenotypes and genotypes. It is similar to the services offered by DNA Digest and Genomenon, which VCBeat has reported on previously.


Typically, only about 6.6% of publications state the association between a disease and gene mutations in their titles and abstracts. Without specialized tools, researchers must therefore read full texts to find the information they need, which takes a great deal of their time.

 

By leveraging text mining to extract and restructure knowledge from literature, CNV enables an automated workflow for literature mining, freeing researchers from tedious and time-consuming literature retrieval processes.
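The article does not describe CNV’s extraction method. A simple, widely used baseline for gene-phenotype association mining is sentence-level co-occurrence: flag any sentence that mentions both a gene and a phenotype, and count how often each pair appears across abstracts. The sketch below implements that baseline; the tiny `GENES` and `PHENOTYPES` lexicons are hypothetical stand-ins for the curated vocabularies a production system would use.

```python
import re
from collections import Counter
from itertools import product
from typing import Iterable, Set, Tuple

# Hypothetical toy lexicons; a real system would use curated gene and phenotype vocabularies.
GENES: Set[str] = {"BRCA1", "BRCA2", "TP53", "CFTR"}
PHENOTYPES: Set[str] = {"breast cancer", "ovarian cancer", "cystic fibrosis"}

# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def find_terms(sentence: str) -> Tuple[Set[str], Set[str]]:
    """Return the genes (exact token match) and phenotypes (case-insensitive phrase match)."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    lowered = sentence.lower()
    return GENES & tokens, {p for p in PHENOTYPES if p in lowered}


def cooccurrences(abstracts: Iterable[str]) -> Counter:
    """Count (gene, phenotype) pairs that appear in the same sentence."""
    counts: Counter = Counter()
    for abstract in abstracts:
        for sentence in SENTENCE_SPLIT.split(abstract):
            genes, phenos = find_terms(sentence)
            counts.update(product(sorted(genes), sorted(phenos)))
    return counts


abstracts = [
    "Germline BRCA1 mutations increase the risk of breast cancer. TP53 was not assessed.",
    "We report novel CFTR variants in patients with cystic fibrosis.",
    "BRCA1 and BRCA2 carriers showed elevated ovarian cancer incidence.",
]
for (gene, phenotype), n in sorted(cooccurrences(abstracts).items()):
    print(gene, phenotype, n)
```

Co-occurrence favors recall over precision (it cannot tell “BRCA1 causes X” from “BRCA1 does not cause X”), so systems of this kind typically add relation classification on top of it.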

 

CNV currently covers all literature abstracts available through NCBI and is updated monthly.

 

Integration of BT and IT


At this point, you may have a question: Isn’t this an IT company?

 

Not entirely. After moving south from Beijing to Hunan, Renhe Future spent two years building a medical laboratory, an engineering center, a gene bank, a research institute, and a demonstration center. It has also built a B2B testing business through partnerships with hospitals and health checkup institutions, and its testing portfolio now covers both health management and clinical diagnostics.

 



Overall, Renhe Future is both a biotechnology company and an IT company.


Compared with pure biotechnology companies, Renhe Future’s IT capabilities are a competitive advantage. Yet just as IT giants form cross-industry alliances with biotech firms, building superior products for biotechnology inevitably requires biotechnological expertise. The company’s true strength lies in its integration of IT and BT (biotechnology).

 

Interdisciplinary integration is the trend of the future.


For the biotechnology (BT) sector, information technology (IT) serves as a tool; whereas for the IT sector, BT represents a vast and complex data source, necessitating the development of specialized analytical tools to address diverse challenges.

 

This process began in 2000, when the first human whole-genome sequence data were generated. Since then, the IT sector has been developing analytical tools tailored to the diverse needs of the biological sciences. However, the plummeting cost of sequencing has led to explosive growth in data volume, and current data scales are gradually exceeding what existing analytical tools can process. The challenge facing the IT sector is no longer “how to compute,” but rather “how to compute faster and how to store.”

 

This represents a new demand from the biotechnology (BT) sector for information technology (IT), as well as new challenges and opportunities that the IT sector must confront. Against this backdrop, bioinformatics companies with interdisciplinary backgrounds, such as Seven Bridges, DNAnexus, and CLC Bio, have emerged. Tech giants including IBM, Intel, Microsoft, and Google have also recognized the future prospects in this field and are vying to capture market share.

 

In 2016, Microsoft partnered with Spiral Genetics to launch the BioGraph™ Suite analytics tool, while Intel joined forces with BGI and the Broad Institute to focus on high-performance computing and storage optimization. These tech giants aim to leverage their IT strengths to cross over into the biological market. However, it is undeniable that their biotechnology (BT) capabilities remain a weakness. Therefore, these giants typically choose to form alliances with BT companies to maximize their strengths and mitigate their weaknesses.

 

The entry of IT giants alongside biotechnology (BT) companies indicates, on one hand, that biological data analysis will be a major future trend, with global leaders such as IBM and Intel seeking to capture a share of the market. On the other hand, it demonstrates that establishing a strong foothold in this field requires more than just IT prowess; companies like Renhe Future, which integrate IT and BT, are likely to gain a competitive advantage in the future.