As a critical factor of production, data resources harbor immense value and are regarded as the “oil” of the future-oriented digital economy. In light of this, China officially designated data as a factor of production in 2019, making it the seventh alongside labor, capital, land, knowledge, technology, and management.
Since 2022, news regarding the marketization of data elements has become increasingly frequent. Medical data, long dormant, may be approaching a critical tipping point for an “exploitation boom”!
In June, Henan Province designated six pilot programs for the market-based allocation reform of production factors. Notably, it was specified that the Zhengzhou Healthcare Security Administration would lead the pilot work on the market-based allocation of data elements in Henan Province, leveraging the National Healthcare Security Administration’s Big Data Innovation Application Platform to explore the development of a market system for data elements.
In October, the Zhejiang Institute of Standardization and other entities researched and drafted the “Guidelines for Data Asset Recognition (Draft for Comment)” and solicited public feedback. This is also the first recommended local standard in China specifically developed for data asset recognition, attracting widespread attention.
Jiangsu and other regions have explored the circulation and application of medical data by hosting data application competitions, providing participating teams with de-identified medical data and a secure, compliant development environment for data applications.
After years of ups and downs, progress in the mining and application of healthcare data has been far from ideal. Can it achieve a true breakthrough by leveraging the “marketization of data as a factor of production”? VCBeat conducted interviews with industry experts to explore this question.
Although data, as a new factor of production, is regarded as the “oil” of the digital economy era, its value lies not in the data itself but in the insights generated through its application and analysis. Therefore, if data cannot be effectively utilized and mined, it will remain as difficult to leverage as crude oil buried deep underground.
Among these, medical data is widely recognized for its immense value. Clinical diagnosis and treatment, hospital management, epidemic prevention and control, medical technology innovation, health insurance, and patients can all benefit significantly from it. Taking the market size of domestic medical big data solutions as an example, the market reached RMB 17.8 billion in 2020, with an estimated compound annual growth rate (CAGR) of 39.3% from 2020 to 2026. Based on this projection, the market size was expected to reach RMB 34.7 billion in 2022.
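The 2022 projection can be sanity-checked with the standard compound-growth formula. The sketch below is illustrative only: the function name `project_market_size` is hypothetical, and the inputs are the rounded figures quoted above, so the result (roughly RMB 34.5 billion) lands near, rather than exactly on, the cited RMB 34.7 billion.

```python
def project_market_size(base: float, cagr: float, years: int) -> float:
    """Compound a base-year value forward: base * (1 + cagr) ** years."""
    return base * (1 + cagr) ** years


# Figures quoted in the article: RMB 17.8 billion in 2020, 39.3% CAGR.
size_2022 = project_market_size(17.8, 0.393, 2)
print(f"Projected 2022 market size: RMB {size_2022:.1f} billion")
```

The small gap between the two figures is most likely down to rounding in the published base value or growth rate.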
However, contrary to the traditional view of data as an asset, it may be more accurate to describe these data as a liability for all stakeholders, given that they cannot generate value without processing.
From a balance sheet perspective alone, the physical storage of raw data is not merely an asset but also, in a sense, a “liability.” Data storage and management impose substantial economic costs on healthcare institutions and related entities; without practical application scenarios, this indeed constitutes a significant burden.
In December 2021, the General Office of the State Council issued the Overall Plan for the Comprehensive Reform Pilot of Market-based Allocation of Production Factors, stating that the layout of pilot regions and the preparation and approval of implementation plans should be completed in the first half of 2022, with efforts to achieve phased results in the pilot work by 2023. Subsequently, various localities began their corresponding layout planning, leading to the scene described at the beginning of this article.
However, an obvious reality is that the current application of medical data is far from ideal, characterized by caution and conservatism, data silos, and limited application scenarios. The most critical need at present is for clear implementation policies, leading institutions, and standardized processes: “Policy implementation is paramount. Medical data assets differ from those in other industries, as their unique nature is closely intertwined with policy. As long as policies are clearly defined, it is natural for enterprises to legally provide various services along the value chain or develop marketable products through data asset R&D in accordance with policy guidelines,” said Hong Lei, COO of Qiyi Medical, in an interview with VCBeat. Qiyi Medical is involved in both big data and digital therapeutics.
“From a legal and compliance perspective, most of the current challenges in medical data applications can certainly be resolved once the government provides clear guidance and requirements. With well-defined policies in place, a ‘good money drives out bad’ dynamic can take shape within the industry—companies that are properly qualified, technologically sound, and high-quality in service will be able to operate compliantly, thereby enhancing market efficiency and value,” she added.
On one hand, the intensive rollout of various regulatory policies has established clear red lines for data applications; however, while these regulations define what is prohibited, they fail to specify what is permissible or how to proceed, leaving the marketization of data impossible to put into practice. On the other hand, the overall quality of existing medical data remains poor, still falling short of the standards required for commercialization.
In fact, circulating medical data on the market is difficult everywhere in the world, as it is subject to strict regulation.
In Europe and the United States, where data marketization began earlier, the application of medical data has reached a certain scale, giving rise to a specialized healthcare data services industry. By providing Platform-as-a-Service (PaaS) solutions to healthcare institutions, data service companies transform vast amounts of raw data into usable datasets and leverage artificial intelligence or machine learning to offer decision support. After years of development, these sectors have attracted numerous startups and industry giants, primarily operating under two business models.
One approach is to charge healthcare providers and insurance payers, delivering better clinical outcomes, improved efficiency, and cost savings through data-driven decision support. The other is to offer free or low-cost services to healthcare institutions, then monetize by charging data users such as pharmaceutical companies for access to the data collected in the background.
However, the pricing, circulation, and application of medical data should follow a planned and orderly approach with Chinese characteristics, rather than blindly following the models of specific European or American countries. “China is a socialist country with universal health insurance coverage. Its healthcare security system differs from those in Europe and the United States, which institutionally determines that the ownership, rights and interests, pricing, circulation, and application of our medical data must align with China’s unique characteristics and the interests of its entire population,” said Hong Lei.
For this reason, the circulation and application models of medical data with Chinese characteristics may need to be explored and reformed within the existing institutional framework. Merely accumulating and managing data without practical application will ultimately become a burden on the state. Therefore, it is necessary to accelerate the design, exploration, and implementation of new application models through reform-oriented practices.
According to VCBeat, some regions that have adopted a national health insurance system have been exploring the marketization of medical data for a considerable period, allowing for the observation of tangible application value. The utilization of medical data in these regions is primarily facilitated by specially established “National Health Insurance Insurer Information Integration and Application Service Centers.”
These initiatives began several years ago, driven by the data needs of large healthcare institutions aiming to enhance operational efficiency. The center initiated pilot programs for the centralized processing of medical data. Through this institution and corresponding legislation, public agencies and academic research organizations engaged in industrial applications are permitted to apply for access to National Health Insurance data. This facilitates the center’s use of AI-enhanced algorithms for data identification and processing; for instance, imaging data such as CT and MRI scans are processed through deep learning models. The refined database is then utilized to support healthcare institutions in analyzing hospital management aspects—including procurement, distribution, medication replenishment, and consumables—thereby enabling smarter and more efficient healthcare operations.
These exploratory applications may offer us some insights.
Wang Bing, General Manager of the National Health and Medical Big Data (Eastern) Center (hereinafter referred to as the Eastern Center), has long been engaged in data-related work and possesses a deep understanding of the reasons why medical data has remained difficult to utilize effectively.
She believes that the primary reason for the suboptimal utilization of medical data is its fragmented distribution across various entities. Government agencies, healthcare institutions, academic and research organizations, and certain health and medical enterprises have all accumulated corresponding health and medical data resources through their long-term operations, thereby becoming the de facto owners and controllers of such data. This has directly resulted in significant difficulties in defining data ownership rights.
As raw data continues to be processed, ownership of medical data becomes increasingly blurred. “For instance, when we undergo a CT scan at a hospital, the original, complete imaging data constitutes one dataset. Once physicians’ diagnostic information is added to this imaging dataset, a new dataset is generated,” explained Wang Bing.
Unclear ownership will lead to a series of issues, such as the confirmation of rights, governance, and circulation of data elements. More critically, data collection will become exceptionally difficult.
“If we regard data resources as the most fundamental factor of production, they must first be processed into tradable products before any transaction can take place—data needs to undergo a series of steps including extraction, aggregation, and processing to become a product. Although we often draw an analogy between data and oil, their essential natures are quite different. Ownership of oil is clearly defined, and it is depleted with use. In contrast, ownership of medical data is highly fragmented, with hospitals, governments, academic institutions, enterprises, and even individuals each holding portions of the data. Furthermore, data is unique in that it is not depleted by use; it can be reused indefinitely and may even generate new data at various stages as it circulates. Due to concerns over protecting the resources under their control, different entities exhibit varying willingness to share their data. All these factors make data collection and sharing difficult,” Wang Bing explained to VCBeat.
Secondly, there is significant variation in data quality. Within the same region, the quality of data submitted by tertiary hospitals, secondary hospitals, and primary care institutions varies considerably. Differences in equipment and operational quality control affect the quality of the raw data as it is generated, but the two primary causes lie elsewhere: inconsistent standards across the information systems built by healthcare IT vendors, and inconsistent data collection methods.
“In the past, the development of health information systems lacked unified guidelines, resulting in inconsistent standards and varying database structures to support business digitalization. Therefore, when implementers (such as local health commissions) sought to collect specific data, they found that the system development standards and database table structures generating such data differed across institutions. Even if data were collected according to a unified standard, interpretation and cleaning remained highly challenging. This is the first reason,” explained Wang Bing.
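The table-structure problem described here can be pictured as a field-mapping exercise. The following is a minimal sketch, not any real platform's implementation: the vendor names, column names, and the three-field unified schema are all hypothetical, chosen only to show why differing structures make aggregation laborious.

```python
from typing import Any

# Hypothetical per-vendor mappings: local column name -> unified field name.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "vendor_a": {"pat_id": "patient_id", "exam_dt": "exam_date", "modality": "modality"},
    "vendor_b": {"PatientNo": "patient_id", "CheckDate": "exam_date", "ExamType": "modality"},
}

# Hypothetical unified schema that a collecting body might require.
UNIFIED_FIELDS = ("patient_id", "exam_date", "modality")


def to_unified(record: dict[str, Any], vendor: str) -> dict[str, Any]:
    """Rename a vendor-specific record's fields to the unified schema.

    Fields the source system did not supply are kept as None so that
    gaps stay visible during later cleaning.
    """
    mapping = FIELD_MAPS[vendor]
    renamed = {mapping[k]: v for k, v in record.items() if k in mapping}
    return {field: renamed.get(field) for field in UNIFIED_FIELDS}


rec_a = to_unified({"pat_id": "P001", "exam_dt": "2022-06-01", "modality": "CT"}, "vendor_a")
rec_b = to_unified({"PatientNo": "P002", "ExamType": "MRI"}, "vendor_b")
print(rec_a)
print(rec_b)
```

Every additional vendor needs its own mapping, and a mapping only renames fields; it cannot reconcile differences in meaning or coding, which is why interpretation and cleaning remain hard even under a unified collection standard.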
“The second reason is the inconsistency in data collection methods. For instance, tertiary hospitals and most secondary hospitals obtain data through direct database connectivity, which helps ensure relatively high data quality. However, many primary healthcare institutions, constrained by inadequate IT infrastructure, rely on manual data entry. Given the shortage of physicians at these grassroots facilities, data reporting tasks are often delegated to nurses or non-medical personnel, resulting in highly variable data quality. This leads to significant disparities in data quality from the same entity. According to the ‘bucket effect,’ the inclusion of poor-quality data can drag down the overall data quality, rendering it difficult to utilize,” she added.
In fact, despite the strong public demand for open access to residents’ health record data, a major reason for the slow progress at this stage is the poor quality of medical data. Without significant investment in data governance and processing, the health record data returned to residents is prone to deviations.
Finally, there is the issue of economic viability. Traditional factors of production follow an “extraction–processing–product design–sales” pathway. The economic calculation for this pathway works forward on a cost-plus basis: the final price is determined by summing the costs of each stage in the process and adding a profit margin.
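The forward calculation for traditional factors can be written down in a few lines. This is a sketch under stated assumptions: the function name `cost_plus_price`, the stage costs, and the treatment of the margin as a percentage of total cost are all illustrative, not figures or definitions from the article.

```python
def cost_plus_price(stage_costs: dict[str, float], margin_rate: float) -> float:
    """Sum the cost of every stage, then add a margin on top of the total."""
    total_cost = sum(stage_costs.values())
    return total_cost * (1 + margin_rate)


# Illustrative stage costs for the four-stage pathway (arbitrary units).
stage_costs = {
    "extraction": 40.0,
    "processing": 25.0,
    "product_design": 15.0,
    "sales": 20.0,
}
price = cost_plus_price(stage_costs, margin_rate=0.2)
print(f"Cost-plus price: {price:.2f}")
```

The formula only works when each stage cost is known in advance, which is exactly the input that is hard to pin down for data.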
Wang Bing believes that calculating the economic value of data as a factor of production currently faces significant challenges: “The difficulty lies precisely in designing a pathway for confirming data value. Given the aforementioned difficulties in data collection and the substantial variations in data quality, it becomes evident that there is no way to accurately measure and finely control the costs associated with the front-end processes (collection, storage, and governance). Consequently, the substantial investments made by market entities in processing semi-finished data products often fail to yield marketable products capable of attracting a sufficient user base. It is akin to panning for gold dust in the desert; while everyone recognizes the value of gold dust, enterprises lack the incentive to invest when the costs of panning and processing, product quality, and market expectations are all difficult to quantify. As a result, the gold dust remains dormant in the desert.”
Regarding how to effectively utilize medical data elements, Wang Bing put forward her perspective. She believes that for provinces to treat medical data as a factor of production and achieve governance and circulation of all data elements, the essential prerequisite is to implement digital reform, which comprises two components: building an architecture with “one top and one bottom,” and driving reform through digitalization projects.
Establishing a “One Top and One Bottom” Organizational Structure
In reality, although central departments have issued numerous policies to develop the digital economy and advance digital reform, such reform draws on the intelligent-governance concepts of multiple commissions, offices, and bureaus and therefore requires strong policy guidance. Consequently, a significant number of reform policies are formulated and implemented at the provincial level. Wang Bing stated that, given China’s current national conditions, a more pragmatic approach is for each provincial government to collaborate with professional digital reform industry organizations to co-establish an organizational platform supporting industry digitization and data services, thereby building “industry-specific new digital infrastructure” equipped with public digital technical capabilities and cross-sector industrial service capabilities.
The so-called industry-specific new digital infrastructure refers to the common technological foundation supporting the Industrial Internet. Data services, on the other hand, are industry-empowering services advanced by leveraging this digital infrastructure. These services are built upon competent authorities’ approval of data valorization projects for specific scenarios, the planning and implementation of multiple digitalization initiatives, and the coordinated mobilization of multi-party professional data talent resources.
“Reform measures in the fields of public health and medical services are closely aligned with provincial-level departments. Therefore, I believe that the first and most critical step is for the provincial government to uniformly formulate a top-level design, establishing an organizational platform that encompasses the entire industry and industrial chain—from the organization, acquisition, and governance of data resource elements, to supply-demand matching, and ultimately driving industrial chain development. Without this foundational step, efforts across various stakeholders will remain fragmented and unimplementable.”
“The supporting entity for the reform-oriented development of data resource elements also requires corresponding leadership task forces and management teams at the provincial level. This is because medical data resources are not only within the purview of health administrative authorities but also involve cross-departmental coordination among drug regulation, medical insurance, development and reform, industry and information technology, finance, cyberspace administration, and legal affairs, as well as industry guidance, standards, and norms.” Wang Bing explained to VCBeat what “one top” means.
“I believe that the most critical prerequisites for facilitating the circulation, organization, governance, and development of data resource factors at the provincial level are ‘one bottom’ and ‘one top.’ ‘One bottom’ refers to establishing an organizational platform infrastructure, while ‘one top’ entails forming a province-wide integrated leadership task force along with corresponding working groups. Through the coordination of these two elements, long-term future plans can be formulated and subsequently broken down and implemented into relevant industry and industrial plans,” added Wang Bing.
Furthermore, the new organizational structure must maintain compatibility and coordination with existing institutions. For instance, some provinces have already established big data centers under their jurisdiction to oversee government-related data resources across the entire province. Their responsibilities also include formulating management rules for the aggregation, processing, and circulation of corresponding government-related data. Logical consistency must be ensured to facilitate future integration with big data exchanges, which operate at the downstream end of the data resource element chain.
Once the organizational structure is in place, the platform can connect with the various data sources, organize their data resources, and aggregate them into a provincial data asset catalog that is continuously updated and iterated. At the same time, it must bring together the talent needed to govern data resource elements (such as professional annotators and experts in various specialized fields), the data resources held by different parties, and new types of service providers such as data auditors and data intermediaries. More importantly, it must organize the data demand side.
“Through shared development and collaborative participation by all stakeholders, genuine supply-demand matching can be achieved, continuously fostering various industry applications. These include public convenience and benefit services, improvements in medical service quality or management efficiency, as well as diverse industrial applications such as AI-oriented, financial and insurance-oriented, and new drug R&D-oriented services. Large-scale datasets stripped of the sensitive information in the raw data (often referred to as ‘data components’ in many regions) are accumulated and transformed into priceable, repeatedly tradable commodities. These are ultimately delivered to data trading entities, such as data asset exchanges, to promote large-scale transactions. In this way, data exchanges can further establish rules for data pricing, trading, taxation, and other aspects, thereby integrating the entire value chain,” summarized Wang Bing.
Accelerating Progress Through Digitalization Projects
However, since hospitals that hold the majority of clinical diagnosis and treatment data lack sufficient incentive to share it, the effectiveness of marketizing data resources will fall short if these obstacles cannot be overcome. To address this issue, Wang Bing believes that key problems must be tackled: “Ultimately, there are three main reasons why hospitals lack the motivation to share data. First, data sharing is not directly or clearly linked to hospitals’ KPIs and returns. Second, in the absence of clear rules, such sharing may be perceived as a loss of assets. Third, there are potential risks of data breaches.”
It is also necessary to find ways to break down “data silos.” Wang Bing introduced the Jiangsu experience, which involves driving data collection through the implementation of large-scale digitalization projects that benefit residents across the province, improve the quality of medical services, or enhance management efficiency, thereby achieving rapid integration of medical data throughout the province.
So-called digitalization projects refer to projects undertaken on digital infrastructure, planned and implemented in a unified manner around the digitalization goals of a specific sector across the entire province. These projects typically encompass the full-chain implementation of data acquisition, storage, governance, and utilization within the scope of the digitalization objectives for that specific sector.
These digitalization initiatives must engage multiple stakeholders and implement end-to-end digital transformation according to unified standards, such as digital reform projects in the pharmaceutical and medical device distribution sector, and projects for imaging digitalization and integrated service reform. For instance, Jiangsu Province’s imaging platform project represents a promising endeavor that can facilitate the rapid integration of healthcare data through several key steps.
First, unified standards were released to achieve the comprehensive digitalization of medical imaging. “The initial step in establishing a provincial-level imaging platform must be to standardize data aggregation, security management, and application services for medical imaging across the entire province. Corresponding unified deployment and implementation should be carried out in all healthcare institutions throughout the province, enabling the central aggregation of data in standardized formats from hospitals. Through applications such as cross-institutional image retrieval and processing, patients can access their historical clinical data from other regions and healthcare facilities at any medical institution, thereby providing convenient and beneficial services to the public,” she explained.
Second, as the data has been standardized, the quality of data aggregated at the central level is high, enabling the formation of a series of datasets. These datasets can support the development of various industry applications, such as facilitating digital supervision by medical insurance authorities. By leveraging medical insurance payments as a regulatory tool to curb overtreatment in hospitals, patients can avoid unnecessary repeated imaging examinations, thereby reducing physical harm.
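To make the supervision use case concrete, the sketch below shows one way standardized, centrally aggregated imaging records could be screened for possible repeat examinations. It is a hypothetical illustration: the field names, the 30-day window, and the matching rule are assumptions, not a description of how Jiangsu's platform or any medical insurance authority actually works.

```python
from datetime import date


def find_repeat_exams(exams: list[dict], window_days: int = 30) -> list[tuple[dict, dict]]:
    """Pair up exams of the same patient, modality, and body part that were
    performed at different institutions within window_days of each other."""
    ordered = sorted(exams, key=lambda e: e["exam_date"])
    last_seen: dict[tuple, dict] = {}
    repeats = []
    for exam in ordered:
        key = (exam["patient_id"], exam["modality"], exam["body_part"])
        prev = last_seen.get(key)
        if (
            prev is not None
            and prev["institution"] != exam["institution"]
            and (exam["exam_date"] - prev["exam_date"]).days <= window_days
        ):
            repeats.append((prev, exam))
        last_seen[key] = exam
    return repeats


# Hypothetical records in a unified format.
exams = [
    {"patient_id": "P001", "modality": "CT", "body_part": "chest",
     "institution": "Hospital A", "exam_date": date(2022, 6, 1)},
    {"patient_id": "P001", "modality": "CT", "body_part": "chest",
     "institution": "Hospital B", "exam_date": date(2022, 6, 10)},
    {"patient_id": "P002", "modality": "MRI", "body_part": "knee",
     "institution": "Hospital A", "exam_date": date(2022, 6, 5)},
]
flagged = find_repeat_exams(exams)
print(f"Candidate repeat exams: {len(flagged)}")
```

A flagged pair is only a candidate for review, since a second scan may be clinically justified. The point is that such a check becomes possible only once records from different hospitals share one format.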
Third, once data resources are truly consolidated at the central end, the steps for leveraging data services to support industrial innovation and development can be streamlined. For instance, by utilizing the data resources at the central end in conjunction with public technical support platforms and development environments, applications such as AI-assisted decision-making and innovative drug research and development can be carried out.
Fourth, the province-wide implementation of a digitalization project can establish a “digital highway” linking the government, tens of thousands of medical institutions, and the industrial chain, thereby laying the foundation for data collection, data aggregation, and future application-integrated marketing.
“Such a large-scale digitalization project with practical value can achieve multiple wins. It not only fulfills its original goal of supporting industry applications—such as benefiting the public, enhancing healthcare service governance, and improving management efficiency—but also serves as a key practice for high-quality data aggregation across the province. Through the planned launch of multiple projects, the network for digital upgrading and data services is gradually being perfected. By complementing this with the ‘one top, one bottom’ framework established by the organizational platform and the Digitalization Leadership Group to build a digital service network that promotes industrial interconnectivity, and by supporting the implementation of intelligent government governance, we can continuously strengthen the provincial data resource hub, optimize asset management and service matchmaking, and simultaneously advance both technical support and business model innovation tailored to supply-and-demand scenarios. Only in this way can the entire circulation market of the province be fully integrated. This is a vast system,” summarized Wang Bing.
Currently, various provinces are successively introducing and implementing strategic plans for the marketization of healthcare data as a factor of production. It is worth noting that while a certain degree of “diversified development” helps identify models better suited to China’s context, the lack of uniform standards across provinces raises the potential risk of creating new “data silos.” Addressing this concern may require higher-level coordination to establish unified standards.
As the cultivation of the data factor market accelerates, we may soon witness progress in the market-based circulation of healthcare data as a production factor. However, given that data factors possess characteristics distinct from traditional production factors such as labor, land, capital, and information, establishing an ideal institutional framework to support the marketization of data as a production factor will require a considerably long process.
Let us wait and see.