Biopharma Companies Discover AI’s Weak Spot: Scientific Context

Marilyne Labasque, PhD – Data Stewardship Practice Lead
Apr 15, 2026



Biopharma organizations and other science-driven industries are at a pivotal moment. Investments in AI are accelerating across discovery, development, and manufacturing, and expectations are rising just as quickly. However, the data that should power these ambitions is often fragmented, inconsistently managed, and missing the context needed for meaningful and responsible use. AI, whether powered by agentic systems or machine learning models, does not fall short because of the technology itself. It falls short when the basics are missing. Without the minimum stewardship and contextualization that automation and reproducibility depend on, AI cannot deliver. The hard truth remains: no matter how advanced the technology becomes, it will not create value if knowledge and data are not shared.

In an AI‑native landscape, not all information can or should be fully structured. What matters is that it is discoverable, well described, and handled in ways that preserve scientific context and ethical use. The cost of falling short shows up every day as rework, repeated experiments, stalled data initiatives and digital programs, alongside increasing scrutiny around traceability, integrity, and reproducibility.

In this context, stewardship gains renewed significance. As AI increases the scale, speed, and reach of data use, the cost of ambiguity, inconsistency, and lost context rises sharply. Decisions that were once recoverable through expert memory or manual review become opaque when automated systems operate on poorly described data. Stewardship provides the discipline needed to preserve scientific intent as data moves across experiments, systems, and analytical uses. By embedding clarity, traceability, and accountability into everyday data practices, stewardship enables organizations to scale analytics and AI with confidence rather than compounding risk.

Why Stewardship Matters Now

Scientific data today forms an intricate ecosystem: multimodal assay outputs, instrument files, electronic lab notebook (ELN) records, laboratory information management system (LIMS) entries, in vivo study data, and chemistry, manufacturing and controls (CMC) packages. Each domain evolves at its own pace. Without stewardship, coherence depends on what individuals remember: where an experiment ran, how an instrument was configured, which field captured the real scientific meaning. Even large platform investments struggle when local conventions and inconsistent practices shape how data is captured, interpreted, and shared.

Expectations across the industry now go far beyond correctness. Teams need auditable lineage, validated processing, robust description, reproducible results, and analytics they can explain with confidence. Stewardship turns these expectations into daily practice. It clarifies meaning, harmonizes descriptions, curates context, strengthens traceability, and makes reuse the rule rather than the exception. Applied systematically, stewardship shortens the time scientists spend searching for information or reconstructing decisions, slows data decay, prevents drift between silos, and lays the foundation for AI that can be trusted.

Stewardship as an Operational Capability

Much of the literature describes what stewardship should be. Far less addresses what stewardship looks like in practice, within the constraints of real scientific environments. Effective stewardship requires working across roles – scientists, data owners, architects, and engineers – so that decisions reflect how experiments are designed, how instruments are used, and how results are consumed. The outcome is practical change that improves how data is captured, described, and reused, without adding friction to scientific work.

Stewardship efforts are most effective when informed by scientific depth and operational consistency. Across data domains such as biospecimen, assay, omics, biologics, CMC, and clinical data, stewardship decisions must reflect real experimental practice rather than theoretical assumptions. Leaders also need a clear view of where they stand and what to address first. Structured diagnostic approaches and capability assessments can provide visibility into current stewardship maturity and help prioritize the most impactful improvements.

Stewardship operating models vary by organization and context. Some functions are embedded within scientific teams, others operate as shared responsibility models, and some are centralized. Regardless of structure, successful efforts follow a consistent pattern: diagnosis of current practices, deliberate design of standards and workflows, development of enabling processes, and sustained delivery and improvement. When approached this way, stewardship holds up in real laboratories and scientific programs.

The Layered Model for AI‑Ready Scientific Data

Becoming AI‑ready requires more than cleaner data. It requires a layered foundation that connects scientific defensibility with machine‑actionable design and responsible use. At the base is trusted data: information that is accurate, attributable, traceable, and handled with the discipline expected in scientific settings. This ensures every data point has a clear origin and every transformation can be explained.

Above that are FAIR‑driven enhancements: semantic alignment, richer description, standardized terminologies, interoperable formats, and normalization practices. These ensure data is understandable to people and usable by automation and machine‑learning pipelines. Without this layer, even high‑quality data remains largely invisible to AI.

Finally, the human layer ensures accountability over time. Data stewards and data owners monitor drift, guide lifecycle management, and safeguard ethical use as new assays, instruments, and workflows emerge.

When these layers work together, organizations create datasets that are not only usable but also suitable for analytics and AI that must be explainable and defensible.

FAIR, Data Quality, and Stewardship: A Unified Perspective

A persistent misconception is that FAIR and data quality measure the same thing. They do not. FAIR – standing for Findable, Accessible, Interoperable, and Reusable – describes how well data and its description support discovery and reuse, especially by machines (1). Data quality focuses on characteristics such as accuracy, completeness, consistency, and timeliness. A dataset may excel at quality yet fail FAIR if its description is sparse or identifiers are not machine‑actionable. The reverse can also be true.
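The divergence between the two dimensions can be made concrete. The sketch below is purely illustrative: the field names, the specific checks, and the scoring are hypothetical stand-ins, not any standard FAIR or quality assessment, but they show how a record can pass every basic quality check while remaining effectively invisible to machine-driven discovery.

```python
# Illustrative sketch only: minimal checks showing that data quality and
# FAIR measure different things. Field names and rules are hypothetical.

def quality_score(record: dict) -> float:
    """Fraction of basic quality checks passed (accuracy/completeness proxies)."""
    checks = [
        record.get("value") is not None,                # completeness
        isinstance(record.get("value"), (int, float)),  # type consistency
        record.get("timestamp") is not None,            # timeliness proxy
    ]
    return sum(checks) / len(checks)

def fair_score(record: dict) -> float:
    """Fraction of minimal FAIR checks passed (machine-actionable description)."""
    checks = [
        bool(record.get("persistent_id")),             # Findable: resolvable identifier
        bool(record.get("access_url")),                # Accessible: retrieval route
        record.get("unit_ontology_term") is not None,  # Interoperable: controlled vocabulary
        bool(record.get("license")),                   # Reusable: stated usage terms
    ]
    return sum(checks) / len(checks)

# A measurement that is accurate, typed, and timestamped, but poorly described:
assay_result = {"value": 42.1, "timestamp": "2026-04-01T09:30:00Z"}
print(quality_score(assay_result))  # 1.0 -> high quality
print(fair_score(assay_result))     # 0.0 -> invisible to machine-driven reuse
```

The same record scores perfectly on the quality checks and zero on the FAIR checks, which is exactly the gap stewardship has to close.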

Both dimensions are essential for AI readiness, and stewardship is the discipline that brings them together. With trained evaluators, structured scoring rubrics, and aligned maturity models, FAIR and quality assessments can reinforce one another rather than duplicate effort. This integration makes FAIR durable, preventing the drift that occurs when assessments are infrequent, inconsistent, or detached from daily practice.

Real‑World Impact: A Practical Example

A large pharmaceutical organization sought to resolve longstanding issues in its CMC equipment reference and master data. Thousands of equipment records had accumulated across sites, each with different naming conventions and inconsistent descriptions. Scientists struggled to select the appropriate record, integrations frequently broke, and teams could not rely on a single, authoritative view of the equipment used in critical CMC processes.

Working across CMC, IT, and digital teams, legacy records were standardized and harmonized, and a dynamic data collection template was introduced to ensure new entries were captured consistently at creation. This effort reconciled thousands of records, established clear naming and de‑duplication rules, and introduced preventive checks so issues were identified at source rather than downstream.
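The core of such an effort can be sketched in a few lines. The normalization rules, record fields, and de-duplication key below are illustrative assumptions, not the organization's actual conventions, but they show the pattern: define a canonical form, then collapse legacy variants onto a single master record.

```python
# Hedged sketch of the harmonization pattern described above: normalize
# legacy equipment names, then de-duplicate on a canonical key.
# Fields and naming rules are illustrative, not real site conventions.
import re

def canonical_name(raw: str) -> str:
    """Collapse case, whitespace, and separator variants into one form."""
    name = raw.strip().lower()
    name = re.sub(r"[\s_\-]+", " ", name)  # unify separators to single spaces
    return name

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one master record per (site, canonical name) key."""
    master: dict[tuple, dict] = {}
    for rec in records:
        key = (rec["site"], canonical_name(rec["name"]))
        master.setdefault(key, {**rec, "name": key[1]})  # first record wins
    return list(master.values())

legacy = [
    {"site": "A", "name": "HPLC  System-01"},
    {"site": "A", "name": "hplc system 01"},  # same instrument, different convention
    {"site": "B", "name": "Bioreactor 200L"},
]
print(len(deduplicate(legacy)))  # 2 master records
```

In practice the canonicalization rules would be agreed with the scientists who use the records, and the same `canonical_name` check would run at data entry, which is what turns remediation into the preventive control mentioned above.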

Within months, the organization gained a unified equipment data model across multiple CMC data environments, with improved FAIR characteristics and reliable interoperability. Scientists and engineers could find and reuse equipment information in hours rather than days, integration issues decreased, and longstanding traceability gaps were closed. The result was not only cleaner and more reliable data, but a sustainable way of working that supported future analytics and model‑driven CMC initiatives.

Assessing Stewardship Capability

Structured stewardship assessments offer a way to understand how scientific data is managed in practice. Adapted from existing stewardship maturity approaches and aligned with FAIR and established quality principles, these assessments evaluate both individual datasets and the conditions surrounding their management (2).

Key components – preservability, accessibility, usability, preventive quality practices, lifecycle monitoring, transparency, integrity, and sustainability – create an evidence‑based view of current capability and the most impactful next steps (2, 3). Clear maturity levels provide a progression path, and alignment with FAIR maturity indicators ensures that improvements support both human understanding and machine‑actionable reuse (4).
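A minimal sketch of how such an assessment can drive prioritization, assuming a simple 1–5 maturity level per component. The example scores are invented; real rubrics, such as the stewardship maturity matrices cited in the references, define evidence criteria for each level.

```python
# Illustrative sketch: score the assessment components listed above on a
# hypothetical 1-5 maturity scale and surface the weakest as priorities.

COMPONENTS = [
    "preservability", "accessibility", "usability",
    "preventive_quality", "lifecycle_monitoring",
    "transparency", "integrity", "sustainability",
]

def prioritize(scores: dict[str, int], top_n: int = 3) -> list[str]:
    """Return the lowest-maturity components as remediation candidates."""
    missing = set(COMPONENTS) - scores.keys()
    if missing:
        raise ValueError(f"unscored components: {missing}")
    return sorted(scores, key=scores.get)[:top_n]

# Hypothetical assessment of one data domain:
example = {
    "preservability": 4, "accessibility": 3, "usability": 2,
    "preventive_quality": 1, "lifecycle_monitoring": 2,
    "transparency": 3, "integrity": 4, "sustainability": 3,
}
print(prioritize(example))  # weakest components first
```

The value of scoring every component, rather than only the obviously weak ones, is that it makes the "most impactful next steps" an output of evidence rather than opinion.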

Evidence That Matters to Executives

Stewardship must demonstrate tangible impact. Indicators such as reduced time‑to‑find, increased reuse, faster remediation turnaround, shorter analytics cycles, and stronger audit outcomes translate stewardship from concept to performance. These measures connect stewardship to outcomes executives care about: productivity, risk reduction, and return on scientific and digital investments.

Organizations that begin in one domain, demonstrate value, and then expand to adjacent areas often experience faster returns and more durable improvement, as teams build confidence and adopt new ways of working. Over time, the foundation strengthens and AI initiatives progress beyond isolated pilots.

The Leadership Mandate

AI‑ready data cannot be bolted onto the end of a project. It must be designed, curated, managed, and monitored from the outset. Leaders who expect AI to play a strategic role must treat stewardship as a core enabler: establish repeatable assessment practices, ensure stewards and related roles receive formal training, embed stewardship into the broader scientific data foundation, and make value visible through shared metrics.

Stewardship is not optional for organizations that aim for scientific excellence. It is the operational backbone that turns data into a durable, strategic asset, supporting innovation, compliance confidence, and AI at scale.

References

  1. Wilkinson MD et al., The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. DOI: 10.1038/sdata.2016.18. Erratum in: Sci Data. 2019 Mar 19;6(1):6. DOI: 10.1038/s41597-019-0009-6
  2. Peng G et al., Crosswalks among stewardship maturity assessment approaches promoting trustworthy FAIR data and repositories. Sci Data. 2022 Sep 21;9(1):576. DOI: 10.1038/s41597-022-01683-x
  3. Peng G et al., Practical Application of a Data Stewardship Maturity Matrix for the NOAA OneStop Project. Data Science Journal. 2019. DOI: 10.5334/dsj-2019-041
  4. FAIR Maturity Matrix: An organisational maturity model of FAIR implementation