The Fragmented Landscape of Clinical Data
Hospitals generate vast amounts of data every day from numerous sources. This data, typically stored electronically, is spread across different locations within a hospital. For instance, electronic reports detailing patient treatment are stored within the oncology department, while patient images are housed separately in the radiology department’s Picture Archiving and Communication System (PACS). This segregation extends further, as departments often run separate infrastructures with different software and data formats, leading to a lack of interoperability.
Data fragmentation, in which related information ends up scattered in pieces rather than stored together, is a significant issue. The problem escalates in multicenter studies, where data from different institutions must be combined: the lack of standardization across institutions undermines interoperability and makes it challenging to compile relevant information spread across diverse systems.
Types of Data Fragmentation
The primary issue with data fragmentation lies in the scattering of information and the creation of isolated silos. Different departments or teams often establish these silos independently, without considering the broader need for coordination and integration. Data fragmentation is generally categorized into two main types: physical and logical.
Physical fragmentation occurs when data is dispersed across various locations or storage devices. This scattering can make the integration of data a complex and time-consuming task, as it requires retrieving and consolidating information from multiple sources. The technical difficulties involved in this process can further hinder the efficient use of data.
Logical fragmentation, on the other hand, happens when data segments are logically duplicated or divided across different applications or systems. This can result in different versions of the same data being available in various locations. Such fragmentation complicates data management, as it becomes challenging to ensure consistency and accuracy across all versions. Both types of fragmentation pose significant challenges to the seamless use and analysis of clinical data, ultimately impacting the effectiveness of healthcare delivery and research.
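To make the two failure modes concrete, here is a minimal sketch that consolidates patient records from two hypothetical departmental exports. The layouts, identifiers, and values are invented for illustration: records living in two separate systems stand in for physical fragmentation, while the conflicting duplicate fields stand in for logical fragmentation.

```python
# Illustrative only: record layouts, identifiers, and values are invented.
oncology_export = [
    {"patient_id": "P001", "dob": "1961-04-12", "diagnosis": "C34.1"},
    {"patient_id": "P002", "dob": "1975-09-30", "diagnosis": "C50.9"},
]
pacs_export = [
    {"PatientID": "P001", "BirthDate": "1961-04-12", "Modality": "CT"},
    {"PatientID": "P002", "BirthDate": "1975-03-30", "Modality": "MR"},  # conflicting birth date
]

def consolidate(oncology, pacs):
    """Merge both exports per patient and report conflicting duplicate fields."""
    merged, conflicts = {}, []
    for row in oncology:
        merged[row["patient_id"]] = {"dob": row["dob"], "diagnosis": row["diagnosis"]}
    for row in pacs:
        record = merged.setdefault(row["PatientID"], {})
        if "dob" in record and record["dob"] != row["BirthDate"]:
            conflicts.append((row["PatientID"], "dob", record["dob"], row["BirthDate"]))
        record.setdefault("dob", row["BirthDate"])
        record["modality"] = row["Modality"]
    return merged, conflicts

records, conflicts = consolidate(oncology_export, pacs_export)
print(conflicts)  # [('P002', 'dob', '1975-09-30', '1975-03-30')]
```

This reconciliation step is where fragmentation turns into real cost: every conflicting duplicate must be traced back to an authoritative source before the merged record can be trusted.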
The Exponential Growth of Clinical Data
Over the past decade, the use and production of clinical data have surged, particularly in fields like radiation oncology. New technologies, such as advanced scanners that capture images in less than a second, have led to what is termed a ‘data explosion’. While these technological advancements have improved healthcare quality, they have also generated far more data than anticipated. However, the development of data mining techniques has not kept pace with this rapid data growth.
This immense volume of data surpasses what humans can manage effectively, and much of it remains unexplored. Yet this data holds tremendous potential for developing clinical prediction models that leverage comprehensive information from imaging, genetic banks, and electronic reports. Issues such as missing values and unstructured data, that is, data lacking a predefined model or organization, still pose significant barriers to using it efficiently.
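As a small illustration of those barriers, the sketch below, using the pandas library with invented column names and values, quantifies missing values per field and applies a deliberately naive keyword check to the unstructured report text.

```python
import pandas as pd

# Illustrative only: the columns and values below are invented placeholders.
visits = pd.DataFrame({
    "patient_id":   ["P001", "P002", "P003", "P004"],
    "tumour_stage": ["II", None, "III", None],      # structured, partially missing
    "dose_gy":      [60.0, 66.0, None, 70.0],       # structured, partially missing
    "report_text":  ["No residual disease.", None,
                     "Progression suspected, see imaging.", "Stable."],  # free text
})

# Quantify missingness per column before any modelling is attempted.
print(visits.isna().mean().sort_values(ascending=False))

# A deliberately naive first pass at the unstructured text: flag reports that
# mention progression; a real pipeline would use proper NLP instead.
visits["mentions_progression"] = (
    visits["report_text"].fillna("").str.contains("progression", case=False)
)
print(visits[["patient_id", "mentions_progression"]])
```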
Understanding the ‘Big’ in Big Clinical Data
The term big data in the clinical context encompasses not only the sheer volume of data but also its complexity, unstructured nature, and fragmentation. The concept is often distilled into the four ‘Vs’: Volume, Variety, Velocity, and Veracity.
Volume. The amount of data is growing exponentially, driven by both humans and machines. Traditional storage systems struggle to accommodate this vast influx of data.
Variety. Data comes in multiple forms and from various sources, including structured databases, free text, and images. The challenge lies in storing and retrieving this diverse data efficiently and aligning it across different sources.
Velocity. Big data is generated in a continuous and massive flow, requiring real-time analytics. Understanding the temporal dimension of data velocity is crucial, as data’s utility can vary over time.
Veracity. The complexity of big data often results in inconsistencies and noise, making data veracity the most challenging aspect. Accurate representation of data is critical, especially in clinical contexts.
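As a minimal illustration of what a veracity check can look like, the sketch below flags missing or implausible values using hand-written plausibility ranges; the fields and thresholds are invented for the example.

```python
# Illustrative only: the fields and plausibility ranges are invented for this sketch.
PLAUSIBLE_RANGES = {
    "age_years":      (0, 120),
    "heart_rate_bpm": (20, 250),
    "temperature_c":  (30.0, 43.0),
}

def veracity_issues(record):
    """Return (field, value, reason) tuples for missing or implausible entries."""
    issues = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append((field, value, "missing"))
        elif not low <= value <= high:
            issues.append((field, value, f"outside plausible range {low}-{high}"))
    return issues

print(veracity_issues({"age_years": 210, "heart_rate_bpm": 72}))
# [('age_years', 210, 'outside plausible range 0-120'), ('temperature_c', None, 'missing')]
```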
The Expanded Vs: Validity, Volatility, Viscosity, and Virality
In addition to the primary four ‘Vs’ (Volume, Variety, Velocity, and Veracity), four more properties have been proposed to further capture the complexities of big clinical data: Validity, Volatility, Viscosity, and Virality.
Validity. Ensuring data accuracy for its intended use is crucial, particularly given the sheer volume and veracity challenges inherent in big data. Interestingly, during the initial stages of analysis, it is not always necessary to validate every single data element. Instead, the focus should be on identifying relationships between data elements within the vast dataset. This approach prioritizes the discovery of meaningful patterns and connections over the initial validity of each data point, which can be refined in subsequent analyses.
Volatility. This dimension addresses the lifespan and retention of data. With ever-increasing capacity demands, it is essential to determine how long data needs to be stored and remain accessible. Volatility reflects the dynamic nature of data utility over time, emphasizing the need to balance storage costs against the relevance and necessity of data retention. Understanding when data becomes obsolete helps manage storage efficiently and ensures that only pertinent information is maintained; a minimal retention check along these lines is sketched after this list.
Viscosity. Viscosity pertains to the resistance within the data flow, influenced by the complexity and diversity of data sources. High viscosity can result from integration challenges and the friction encountered during data processing. Transforming raw data into actionable insights often requires significant effort, as the diverse origins and formats of big clinical data can hinder smooth and rapid analysis. Overcoming viscosity involves streamlining data integration processes and improving interoperability to facilitate the seamless flow and processing of information.
Virality. Defined as the rate at which data spreads and is reused, virality measures how frequently data is shared and repurposed beyond its original context. In the clinical domain, high virality indicates that data is not only valuable but also widely applicable, benefiting multiple users and applications. Enhancing the virality of clinical data necessitates fostering an environment of open data sharing, where the benefits of broad data access and reuse are recognized and maximized.
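The retention check referenced under Volatility could look like the following sketch. The record types and retention periods are invented; real retention rules are dictated by legislation and institutional policy rather than code.

```python
from datetime import date, timedelta

# Illustrative only: record types and retention periods are invented; real retention
# rules come from legislation and institutional governance, not from a script.
RETENTION = {
    "raw_imaging":   timedelta(days=365 * 10),
    "audit_logs":    timedelta(days=365 * 2),
    "derived_stats": timedelta(days=365),
}

def is_expired(record_type, created_on, today=None):
    """Return True if a record of this type has outlived its retention period."""
    today = today or date.today()
    return today - created_on > RETENTION[record_type]

print(is_expired("audit_logs", date(2020, 1, 1), today=date(2024, 1, 1)))   # True
print(is_expired("raw_imaging", date(2020, 1, 1), today=date(2024, 1, 1)))  # False
```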
Barriers to Big Data Exchange
Despite advancements in mining and retrieving meaningful information from big clinical data, several barriers hinder its exchange. First, administrative barriers arise due to the additional efforts and personnel costs required for mining such extensive data. Ethical barriers further complicate the situation, as data privacy concerns and varying privacy laws across different countries create significant challenges. Additionally, political barriers manifest in the reluctance to share data and the need for community-wide cooperation, which often proves difficult to achieve. Lastly, technical barriers pose a major obstacle, with poor data interoperability, lack of standardization, and insufficient support for standardized protocols and formats impeding effective data exchange.
The Path Forward: Standardization and Collaboration
Addressing these challenges necessitates a collaborative effort across the healthcare community. Key steps include accelerating progress toward standardized data models using advanced techniques like ontologies and the Semantic Web. Ontologies provide a common terminology, overcoming language barriers and enabling data and metadata to be queried universally. Furthermore, demonstrating the advantages of using real-world clinical data through high-quality research can highlight the benefits of data exchange, fostering broader acceptance and implementation.
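As a small illustration of ontology-backed querying, the sketch below uses the rdflib library to build a toy RDF graph and run a SPARQL query over it. The namespace, concept codes, and labels are placeholders, not bindings to a real ontology such as SNOMED CT or the NCI Thesaurus; the point is that shared identifiers and labels let the same query run against triples from any institution, without per-site mapping code.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Illustrative only: the namespace, concept codes, and labels are placeholders,
# not bindings to a real ontology such as SNOMED CT or the NCI Thesaurus.
EX = Namespace("http://example.org/clinical#")

g = Graph()
g.add((EX.C123, RDF.type, EX.Diagnosis))
g.add((EX.C123, RDFS.label, Literal("non-small cell lung carcinoma")))
g.add((EX.C456, RDF.type, EX.Diagnosis))
g.add((EX.C456, RDFS.label, Literal("breast carcinoma")))
g.add((EX.P001, EX.hasDiagnosis, EX.C123))
g.add((EX.P002, EX.hasDiagnosis, EX.C456))

# The same SPARQL query works no matter which institution produced the triples,
# because the concepts carry shared identifiers and human-readable labels.
query = """
PREFIX ex: <http://example.org/clinical#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?patient ?label WHERE {
    ?patient ex:hasDiagnosis ?dx .
    ?dx rdfs:label ?label .
}
"""
for patient, label in g.query(query):
    print(patient, label)
```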
Conclusion
The rapid increase in data volume presents both challenges and opportunities. While big clinical data is characterized by its volume, variety, velocity, and veracity, several barriers limit its effective exchange. Overcoming these barriers through standardization and collaborative efforts is essential for harnessing the full potential of clinical big data, ultimately leading to improved healthcare outcomes.
Engr. Dex Marco Tiu Guibelondo, B.Sc. Pharm, R.Ph., B.Sc. CpE