New technologies have significantly changed how data is captured and used in the life sciences and drug development. With the spread of portable devices, mobile health applications, and social media, researchers now have access to more data streams from which to extract insights. The potential for large-scale data collection has expanded further as companies with direct-to-consumer products, services, and partnerships, such as Apple (with its ResearchKit framework), Google, and 23andMe, enter the life sciences market directly or through collaborations. This abundance of data gives scientists greater opportunities to understand patients at the individual level, but it also presents a formidable data management challenge.
What is the value of possessing such vast amounts of information if you cannot make sense of it? Data can only be analyzed effectively if it is well organized, whether it originates from scientific journals, electronic medical records (EMRs), social media, or wearable devices. This is where next-generation informatics solutions come into play. To extract actionable insights from this ocean of data, life sciences and pharmaceutical companies need a clear data management strategy for harmonizing their data.
Big Data: Real-Time Synchronization
In pharmaceutical and life sciences R&D, attention to big data has largely focused on current data. In addition to the newer sources mentioned above, companies primarily consider typical data streams such as electronic health records, genomics and screening data, clinical trial data, and mobile diagnostic and monitoring data. As the volume of information compounds, data will increasingly be recognized as shifting from “historical fact” to “real time,” because it can be applied directly at the individual level.
The diversity and quality of data pose challenges. Data are captured at varying levels of detail, without harmonization or standardization. Trust in information also varies, because incomplete, unverified, outdated, or retracted data are frequently encountered. Given this inconsistency, how can we use data to make reliable clinical and drug R&D decisions quickly, or to accurately predict research and treatment outcomes?
New Strategy Required: Harmonized Data
Given that bringing a single drug to market costs more than $2 billion on average and can take up to 15 years, researchers must be able to derive reliable insights from data to minimize risk. Pharmaceutical and life sciences companies need a new mindset and continuous, advanced data analytics to keep data aligned across the entire enterprise. Such a collective, consistent approach can inform research decisions, but it is of little value while research remains trapped in “information silos.”
Many organizations still rely on manual data management, with individual scientists and research institutions seeking answers from data at different times and in different ways. Harmonizing data addresses this by letting researchers search across diverse data sources with similar algorithms on different platforms, and so establish connections among seemingly unrelated information. Harmonization should begin at the time of data collection, because preserving the context of the data is crucial. Only then can researchers extract useful information.
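To make harmonization at the point of collection more concrete, here is a minimal Python sketch. It assumes two illustrative sources ("emr" and "wearable") and a hand-written field mapping; the field names, the `FIELD_MAP` table, and the `HarmonizedRecord` type are all hypothetical and not taken from any particular product. Each incoming value is converted to a canonical measure and unit, while its context (source, timestamp, and the untouched original payload) travels with it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class HarmonizedRecord:
    patient_id: str
    measure: str             # canonical measure name
    value: float             # value expressed in the canonical unit
    unit: str
    source: str              # context: which system produced the value
    collected_at: datetime   # context: when it was captured
    raw: Dict[str, object]   # context: copy of the original payload


# Hypothetical mapping: (source, field) -> (canonical measure, unit, factor)
FIELD_MAP = {
    ("emr", "weight_lb"): ("body_weight", "kg", 0.45359237),
    ("emr", "pulse_bpm"): ("heart_rate", "bpm", 1.0),
    ("wearable", "weight_kg"): ("body_weight", "kg", 1.0),
    ("wearable", "hr"): ("heart_rate", "bpm", 1.0),
}


def harmonize(source: str, payload: Dict[str, object]) -> List[HarmonizedRecord]:
    """Convert one source payload into canonical records, keeping its context."""
    collected_at = datetime.fromisoformat(str(payload["timestamp"]))
    records = []
    for field, value in payload.items():
        mapping = FIELD_MAP.get((source, field))
        if mapping is None:
            continue  # unmapped fields are not lost: they remain in `raw`
        measure, unit, factor = mapping
        records.append(HarmonizedRecord(
            patient_id=str(payload["patient_id"]),
            measure=measure,
            value=float(value) * factor,
            unit=unit,
            source=source,
            collected_at=collected_at,
            raw=dict(payload),
        ))
    return records


# Illustrative payloads (made-up values, not real patient data)
emr_payload = {"patient_id": 17, "timestamp": "2024-01-05T08:30:00+00:00",
               "weight_lb": 150, "pulse_bpm": 72}
wearable_payload = {"patient_id": "17", "timestamp": "2024-01-05T09:00:00+00:00",
                    "hr": 75, "steps": 4200}

records = harmonize("emr", emr_payload) + harmonize("wearable", wearable_payload)

# One query now spans both sources, and each hit still says where it came from.
heart_rates = [(r.source, r.value) for r in records if r.measure == "heart_rate"]
print(heart_rates)
```

Because the mapping is applied when the data arrives, a single query can cover both sources, and anything the mapping does not yet understand (such as the step count here) is still preserved in the raw payload for later use.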
Life Sciences and Technology: The Gray Zone
New technologies have exacerbated information overload. Data harmonization applies not only to multi-source and internal data but also to external, third-party, and commercial data. Pharmaceutical companies have made significant advances in technology adoption; for instance, Roche has used smartphones to continuously collect patient data in Parkinson’s disease clinical trials. However, as tech giants like Google and Apple enter the traditional life sciences sector, pharmaceutical companies face substantial challenges. Pharmaceutical firms do not aspire to become technology companies, yet Google and Apple have raised public expectations about the pace of technological innovation. The key challenge for pharmaceutical companies lies in leveraging existing data to generate parallel datasets.
The public imagines something like a smartwatch that transmits critical data from patients with specific diseases and from those taking medication, with the data captured, viewable at any time, and available for use in real time. In the “Google” era, the public blurs the distinction between data being available and data being actionable. As data sources and the associated demographic factors diversify, data complexity has begun to grow exponentially. Google and Apple have already adopted genotyping and phenotyping approaches to accelerate the commercialization of DNA sequencing and screening. What holds truly significant implications for pharmaceutical companies, however, is the generation of auxiliary datasets.
Deliver Data to Scientists
Finally, all these new technologies and data collection methods require scientific expertise, combined with systematically organized information, so that researchers can access the data seamlessly. Once data has been collected and harmonized, researchers can extract the relevant data hidden in the vast ocean of information through semantic analysis and accurate text mining, a critical step in data acquisition. For information to be organized so that next-generation informatics solutions can surface it, data must be governed by professionally designed taxonomies and ontologies. Through detailed classification, data is parsed and indexed, enabling researchers to uncover new associations and trends across different datasets.
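As a rough illustration of taxonomy-governed indexing, the Python sketch below tags short texts with concepts from a tiny hand-made taxonomy and indexes each text under the matched concept and all of its broader terms. The taxonomy, the synonym list, and the sample texts are invented for illustration; a production system would rely on a professionally curated vocabulary and far more robust text mining than simple phrase matching.

```python
import re
from collections import defaultdict

# Hypothetical mini-taxonomy: concept -> broader concept (None for the root)
PARENT = {
    "neurodegenerative disease": None,
    "parkinson disease": "neurodegenerative disease",
    "alzheimer disease": "neurodegenerative disease",
}

# Hypothetical synonym list: surface form -> canonical concept
SYNONYMS = {
    "neurodegenerative disease": "neurodegenerative disease",
    "parkinson disease": "parkinson disease",
    "parkinson's disease": "parkinson disease",
    "alzheimer disease": "alzheimer disease",
    "alzheimer's disease": "alzheimer disease",
}


def with_ancestors(concept):
    """Yield the concept itself, then each broader concept up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def build_index(docs):
    """Map every concept to the ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Normalize case and typographic apostrophes before matching
        normalized = text.lower().replace("\u2019", "'")
        for term, concept in SYNONYMS.items():
            if re.search(r"\b" + re.escape(term) + r"\b", normalized):
                for c in with_ancestors(concept):
                    index[c].add(doc_id)
    return index


# Illustrative texts from different kinds of sources (invented examples)
DOCS = {
    "trial-01": "Smartphone sensors tracked tremor in Parkinson\u2019s disease patients.",
    "emr-07": "Patient history notes early Alzheimer's disease.",
    "note-03": "Genotyping panel ordered; no neurological findings.",
}

index = build_index(DOCS)
print(sorted(index["neurodegenerative disease"]))
```

Searching on the broader concept returns both the trial record and the medical record even though neither text contains that phrase, which is the kind of cross-dataset association a well-designed taxonomy is meant to expose.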
By extracting meaningful insights through next-generation informatics solutions, researchers are empowered to make reliable, data-driven decisions. Enhancing data utility enables researchers to derive actionable insights, which significantly impact critical corporate decision-making and prompt enterprises to re-evaluate their strategies. The potential power of data in the hands of life science researchers is immense; with the right tools, they can leverage this data to boost productivity while laying a solid foundation for new discoveries.
Author: Tim Hoctor, Vice President, Professional Services, R&D Solutions, Elsevier
Compiled by Chen Kun
Editor: Huang Jia