
On July 26, 2017, the 2017 AWS Technology Summit, themed “Cloud • Taking Charge of the Future,” was held at the China National Convention Center in Beijing. The summit focused on hot topics in cloud computing, with ten technical sub-venues covering big data and artificial intelligence, architecture, security, IoT, and more. Leading experts from the technical teams of 360, NVIDIA, and many other companies jointly explored paths for innovative development in cloud computing.
Having used IT (information technology) to set new records in the BT (biotechnology) industry, Renhe Future was invited to attend the summit. This makes it the second biotechnology company to be invited, following BGI in 2014.

Song Zhuo, CTO of Renhe Future, spoke about integrating genetic big data with cloud computing and introduced the construction of a high-performance supercomputing system for genetic data based on AWS cloud services. The highlights are as follows:
As genetic testing becomes increasingly widespread, the cost of gene sequencing continues to fall and a growing number of people benefit. As a result, the scale of genetic data is growing explosively.
Taking Illumina’s latest NovaSeq sequencer as an example, its data generation rate at full capacity is 6 TB per 30 hours. The file size of genomic sequencing data for a single individual is approximately 200 GB, meaning the instrument can produce one person’s worth of genomic sequencing data per hour. It is reported that domestic orders for the NovaSeq in China have exceeded 100 units. Based on estimates from various types of sequencing equipment already ordered, nationwide genomic industry data in China is projected to surpass 100 PB in 2017 (1 PB = 1 million GB).
Sequencing data are raw data that require extensive computational analysis and interpretation to reveal their clinical and health significance. Such large-scale data pose a severe challenge to computational interpretation.
Currently, analyzing 200 GB of one individual’s genomic data takes 30 hours of computation on a single machine. The surge of big data has already arrived; if analysis continues at this speed, it will inevitably lag behind data generation and data will pile up.
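The gap between data generation and analysis can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above (6 TB per 30-hour run, 200 GB per genome, 30 hours of single-machine analysis per genome) and assumes 1 TB = 1,000 GB; the function names are illustrative.

```python
import math

# Figures quoted in the article; 1 TB is taken as 1,000 GB (an assumption).
SEQUENCER_OUTPUT_GB = 6000      # output of one full-capacity run
RUN_HOURS = 30                  # duration of that run
GENOME_SIZE_GB = 200            # sequencing data for one individual
ANALYSIS_HOURS_PER_GENOME = 30  # single-machine analysis time per genome


def genomes_produced_per_hour() -> float:
    """Genomes' worth of data one sequencer emits per hour."""
    return SEQUENCER_OUTPUT_GB / RUN_HOURS / GENOME_SIZE_GB


def genomes_analyzed_per_hour(machines: int) -> float:
    """Analysis throughput if every machine works on its own genome."""
    return machines / ANALYSIS_HOURS_PER_GENOME


def backlog_after(hours: float, machines: int) -> float:
    """Unanalyzed genomes accumulated after `hours` of continuous sequencing."""
    surplus = genomes_produced_per_hour() - genomes_analyzed_per_hour(machines)
    return max(0.0, surplus * hours)


def machines_to_keep_up() -> int:
    """Smallest number of machines whose throughput matches one sequencer."""
    return math.ceil(genomes_produced_per_hour() * ANALYSIS_HOURS_PER_GENOME)


if __name__ == "__main__":
    print(genomes_produced_per_hour())      # genomes per hour from one sequencer
    print(backlog_after(30, machines=1))    # backlog after one 30-hour run
    print(machines_to_keep_up())            # machines needed per sequencer
```

Under these assumptions a single analysis machine falls roughly 29 genomes behind during one 30-hour run, and about 30 such machines would be needed just to keep pace with one sequencer.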
Due to the inherent characteristics of genomic data and the fact that the cost of whole-genome sequencing is declining at a rate exceeding Moore’s Law, cost control in data transmission, storage, computing, and processing has become an industry-wide challenge.
Since its inception in 2014, Renhe Future has made forward-looking deployments in big data compressed storage, transmission, and high-performance computing. Furthermore, taking into account the actual conditions in China, it has developed two solutions in the field of computational acceleration: elastic cloud computing and local hardware acceleration.
Cloud computing is a virtualized computing model based on internet-related services. It features excellent dynamic scalability, capable of delivering powerful computational performance on the order of 10 trillion operations per second. Moreover, its "access over ownership" paradigm contributes to its lower costs.
Adding machines does not improve computational performance linearly: because of the I/O wall caused by massive data transfer, expanding computational resources beyond a certain point can significantly degrade performance.
Renhe Future has built a cloud computing acceleration system on the AWS Cloud platform. Leveraging innovative data distribution and data shuffling technologies, and by developing the high-performance distributed database StageDB in conjunction with biological genomic knowledge, it has successfully completed the analysis of 400 GB (55x) human genome data within 18 minutes. This achievement establishes an approximately ideal linear relationship between computational performance and the scale of computational resources.
According to Song Zhuo, the development of GTX.WGS technology was akin to fighting three major battles:
Battle One: High-Speed Data Distribution
The first challenge was the high-speed distribution of ultra-large-scale data across 250 AWS EC2 servers. Renhe Future developed a unique big data partitioning technology based on the biological characteristics of the genome and the data-balance requirements of high-performance computing. This compressed a task that originally took 66 minutes to within 1 minute and reduced the computation time of the overall analysis task to 3-4 hours.
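The article does not describe how the partitioning works internally. As a purely illustrative sketch of the general idea of balanced, genome-aware partitioning, the code below cuts each chromosome into fixed-size regions and hands the heaviest remaining region to the least-loaded worker. The function names, the toy chromosome lengths, and the use of region length as a stand-in for workload are all assumptions, not Renhe Future's method.

```python
import heapq


def split_into_bins(chrom_lengths, bin_size):
    """Cut each chromosome into consecutive regions of at most bin_size bases."""
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            end = min(start + bin_size, length)
            bins.append((chrom, start, end))
    return bins


def assign_bins(bins, weights, n_workers):
    """Greedy balancing: heaviest remaining bin goes to the least-loaded worker."""
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(n_workers)}
    # Visit bins from heaviest to lightest so large regions are placed first.
    order = sorted(range(len(bins)), key=lambda i: weights[i], reverse=True)
    for i in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(bins[i])
        heapq.heappush(heap, (load + weights[i], worker))
    return assignment


if __name__ == "__main__":
    # Toy lengths, not real chromosome sizes.
    chroms = {"chrA": 1000, "chrB": 650, "chrC": 120}
    bins = split_into_bins(chroms, bin_size=100)
    # Assumption: workload is proportional to region length.
    weights = [end - start for _, start, end in bins]
    plan = assign_bins(bins, weights, n_workers=4)
    for worker, regions in plan.items():
        print(worker, sum(end - start for _, start, end in regions))
```

With this greedy rule the gap between the busiest and the idlest worker never exceeds the weight of the single heaviest region, which is why smaller regions give a more even split.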
Battle Two: Data Shuffling
The company's R&D personnel adopted the AWS S3 object storage solution and developed data shuffling technology to rearrange the massive number of segmented data files. This completed the task of arranging 10^9 DNA fragments by position on the genome within 20-25 minutes and compressed the overall computation time to 60 minutes.
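To illustrate what rearranging fragments by genomic position involves, here is a minimal in-memory sketch of a shuffle: reads arriving in arbitrary chunks are regrouped by genomic region, each group is sorted on its own, and the groups are concatenated in region order. The tuple layout `(chrom, pos, read_id)`, the function names, and the region size are hypothetical; a real system would hold each group as a file or storage object rather than a Python list, and this is not a description of Renhe Future's implementation.

```python
from collections import defaultdict


def shuffle_by_region(read_chunks, bin_size):
    """Regroup reads from arbitrary input chunks into per-region groups."""
    groups = defaultdict(list)
    for chunk in read_chunks:
        for chrom, pos, read_id in chunk:
            # The key identifies the genomic region the read falls in.
            groups[(chrom, pos // bin_size)].append((chrom, pos, read_id))
    return groups


def sort_groups(groups):
    """Sort each group independently, then concatenate groups in key order."""
    ordered = []
    for key in sorted(groups):
        ordered.extend(sorted(groups[key]))
    return ordered


if __name__ == "__main__":
    chunks = [
        [("chr1", 250, "r1"), ("chr2", 10, "r2")],
        [("chr1", 5, "r3"), ("chr1", 120, "r4")],
    ]
    groups = shuffle_by_region(chunks, bin_size=100)
    for read in sort_groups(groups):
        print(read)
```

Because every group covers a disjoint position range, the groups can be sorted in parallel on different machines and the concatenated result is still globally ordered. Chromosome names are compared as plain strings here, which is a simplification.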
Battle Three: Breakthroughs in Storage
Although this performance had already reached the leading level in genetic big data analysis both in China and internationally, Renhe Future was not satisfied. The company went on to tackle data storage and developed StageDB, an advanced key-value database, which reduced the data rearrangement time of the previous step to 40 seconds and the total time to 18 minutes. With this result, it won the championship of the Computational Acceleration Competition at the 11th International Conference on Genomics (ICG) in November 2016.
In terms of hardware acceleration, Renhe Future has independently designed and developed an FPGA hardware acceleration card for genetic data analysis and built the GTX-One, a specialized computer for genetic data analysis. This single machine can complete alignment and mutation analysis of a 30X whole genome within 15 minutes. While achieving the world’s fastest computational speed, it also sets a new record for the lowest energy consumption in genetic data analysis.
In addition, Renhe Future has developed GTX.Zip, an integrated solution that combines three core functions: compressed storage of genomic big data, full-bandwidth transmission, and data distribution. By achieving ultra-high compression efficiency for genomic data, GTX.Zip significantly reduces storage costs. Its features—including transmission of compressed data, full-load transmission, and simultaneous compression and transmission—provide an efficient and practical alternative to shipping hard drives for the distribution and transfer of genomic big data.
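The idea of compressing and transmitting at the same time can be sketched with Python's standard `zlib` module: each chunk is compressed as it is read and the compressed piece is handed on immediately, so nothing waits for the whole file. This uses general-purpose DEFLATE compression only to show the streaming pattern; it is not GTX.Zip's algorithm, and the function names are illustrative.

```python
import zlib


def compress_stream(chunks, level=6):
    """Yield compressed pieces as soon as each input chunk has been consumed."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:  # the compressor may buffer and return nothing yet
            yield piece
    yield compressor.flush()  # emit whatever is still buffered


def decompress_stream(pieces):
    """Receiver side: rebuild the original bytes piece by piece."""
    decompressor = zlib.decompressobj()
    for piece in pieces:
        data = decompressor.decompress(piece)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Toy stand-in for sequencing data read from disk in chunks.
    source = [b"ACGT" * 1000, b"TTGA" * 500]
    sent = list(compress_stream(source))  # each piece could go over a socket
    received = b"".join(decompress_stream(sent))
    print(len(b"".join(source)), sum(len(p) for p in sent), received == b"".join(source))
```

In a real transfer each yielded piece would be written to a network connection instead of collected in a list, so that compression and transmission overlap.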
In fact, by integrating its proprietary gene big data interpretation solution, Renhe Future has also built a comprehensive analysis system for gene sequencing data, enabling rapid end-to-end processing from the moment data is generated by the sequencer. This addresses the bottlenecks of time-consuming and resource-intensive data analysis in genetic testing services, thereby expanding genetic testing to broader populations and wider applications, alleviating pain points in the general health industry and providing momentum for its significant advancement.