
Vision & Business Value

Roche Global IT manages a growing ecosystem of data products across manufacturing sites, supply chain operations, and research facilities worldwide. Today, each new data product requires:

| Activity | Typical effort | Who |
| --- | --- | --- |
| Define data contract | 1–2 weeks | Data architect |
| Create Snowflake schemas | 3–5 days | Data engineer |
| Build dbt models (Bronze/Silver/Gold) | 1–2 weeks | Analytics engineer |
| Write quality checks | 1 week | Data quality specialist |
| Create API specification | 3–5 days | API developer |
| Set up governance metadata | 1 week | Data steward |
| **Total** | **6–8 weeks** | **4–6 specialists** |

This process repeats for every entity in every domain. With hundreds of isolated data products, and the count still growing, the cost compounds:

  • Inconsistency — Each team interprets standards differently. KPI definitions diverge across systems.
  • Fragility — Without data contracts, schema changes cascade unpredictably. A column rename in one system breaks three downstream dashboards.
  • No AI readiness — No semantic definitions exist to support natural-language queries, so generative AI tools cannot access governed data.
  • Linear scaling — Every new business question requires the same 6–8 week cycle with the same specialist bottleneck.

roche-data is a data infrastructure compiler. It takes a canonical data model from RTiS (Roche’s Terminology and Information System) and automatically generates the complete artifact stack:

Figure: Artifact generation — the RTiS model compiled into storage, interface, and policy artifacts.

One command. One source of truth. Complete automation.

The same entity model that defines “what this data means” in RTiS now drives every layer of the data platform — from physical storage through quality assurance to AI-ready APIs.

What previously required 6–8 weeks of specialist work now happens in a single automated pipeline run. A model change in RTiS triggers regeneration of all downstream artifacts, tested and deployed through CI/CD.

Every data product follows identical patterns because they are generated from the same templates. There is no room for interpretation drift — the compiler enforces the standard.
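To make the template-driven guarantee concrete, here is a minimal sketch of how a compiler can render a dbt model from an entity definition. The entity fields, template text, and function names below are illustrative assumptions, not the actual RTiS schema or roche-data templates:

```python
from string import Template

# Hypothetical entity definition, shaped as the compiler might receive
# it from RTiS. The field names are illustrative only.
entity = {
    "name": "batch_record",
    "columns": ["batch_id", "site_code", "release_date"],
}

# Every entity is rendered through the same template, so generated dbt
# models cannot drift in style or structure between teams.
DBT_MODEL = Template(
    "select\n    $columns\nfrom {{ source('bronze', '$name') }}"
)

def render_model(e: dict) -> str:
    """Render one dbt model from one entity definition."""
    return DBT_MODEL.substitute(
        name=e["name"],
        columns=",\n    ".join(e["columns"]),
    )

print(render_model(entity))
```

Because every model passes through the same template, a style or standards change is a one-line template edit followed by regeneration, not a hunt through hundreds of hand-written files.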

The four-gate data quality architecture (G1–G4) is not bolted on after the fact. Quality predicates are embedded directly into the generated dbt views. Every record earns its trust level through progressive validation.
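As a sketch of how embedded gate predicates might be composed into a generated view, consider the following. The gate names G1–G4 come from the architecture above; the SQL conditions, table names, and function are illustrative assumptions:

```python
# Hypothetical gate predicates. G1-G4 are the four gates of the
# architecture; the SQL conditions themselves are invented examples.
GATES = {
    "G1": "batch_id is not null",                       # completeness
    "G2": "site_code in (select code from ref_sites)",  # referential integrity
    "G3": "release_date <= current_date",               # plausibility
    "G4": "qa_signed = true",                           # certification
}
GATE_ORDER = ["G1", "G2", "G3", "G4"]

def gated_view_sql(table: str, up_to: str) -> str:
    """Build view SQL exposing only rows that pass gates G1 through up_to."""
    active = GATE_ORDER[: GATE_ORDER.index(up_to) + 1]
    predicate = "\n  and ".join(GATES[g] for g in active)
    return f"select *\nfrom {table}\nwhere {predicate}"

print(gated_view_sql("silver.batch_record", "G2"))
```

Because the predicates are compiled into the view itself, a record cannot appear at a given trust level without having passed every gate below it.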

Every generated artifact includes semantic definitions that Snowflake Cortex Analyst understands natively. The moment data passes quality certification, it becomes queryable through natural language — no additional integration work required.
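A sketch of what emitting such a semantic definition could look like. The field names are an illustrative subset only; the authoritative schema is Snowflake's Cortex Analyst semantic-model specification:

```python
def semantic_fragment(table: str, measure: str, expr: str) -> str:
    """Emit a YAML-shaped semantic-layer fragment for a generated table.

    The structure here is a hypothetical subset, not the full Cortex
    Analyst semantic-model schema.
    """
    return (
        "tables:\n"
        f"  - name: {table}\n"
        "    measures:\n"
        f"      - name: {measure}\n"
        f"        expr: {expr}\n"
    )

print(semantic_fragment("gold.batch_record", "batch_count", "count(batch_id)"))
```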

Every artifact is version-controlled in git. Every policy is compiled from a declarative YAML definition into enforceable OPA rules. Every data access is logged in Snowflake. The compliance story writes itself.
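The YAML-to-OPA compilation step can be sketched as follows. The policy fields, package name, and roles are hypothetical; only the general pattern (declarative definition in, Rego rule out) reflects the description above:

```python
# Hypothetical declarative policy, as it might be written in YAML and
# parsed into a dict. All names and fields are illustrative.
policy = {
    "package": "roche_data.batch_record",
    "resource": "gold.batch_record",
    "allow_roles": ["qa_lead", "site_manager"],
}

def compile_to_rego(p: dict) -> str:
    """Compile the declarative policy dict into an OPA/Rego allow rule."""
    roles = ", ".join(f'"{r}"' for r in p["allow_roles"])
    return (
        f"package {p['package']}\n\n"
        "import rego.v1\n\n"
        "default allow := false\n\n"
        "allow if {\n"
        f'    input.resource == "{p["resource"]}"\n'
        f"    input.role in {{{roles}}}\n"
        "}\n"
    )

print(compile_to_rego(policy))
```

Since the Rego is generated rather than hand-written, the git history of the YAML definition doubles as the audit trail for every policy change.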

Onboarding a new data domain means defining entities in RTiS and running the compiler. The same tooling, same quality gates, same governance model — applied uniformly across every domain at Roche.

| Stakeholder | Value delivered |
| --- | --- |
| Data domain owners | Self-service data product creation. Define the model; the platform handles the rest. |
| Data engineers | No more hand-writing repetitive dbt models and DDL. Focus on complex transformations. |
| Data stewards | Governance metadata generated automatically. Quality gates enforced consistently. |
| Business analysts | Semantic layer available from day one. Natural-language queries through Cortex Analyst. |
| AI/ML teams | Quality-certified, semantically rich data accessible through MCP tools and APIs. |
| Compliance & audit | Full git-based audit trail. Policy enforcement through OPA. Access logging in Snowflake. |
| Leadership | Predictable timelines. Consistent quality. Measurable ROI on data platform investment. |

roche-data directly enables three strategic priorities:

  1. Roche AI Journey — Generative AI requires trusted, semantically defined data. roche-data is the foundation that makes this possible across all domains.

  2. Operational excellence — Automating the data pipeline lifecycle reduces time-to-value from weeks to minutes while eliminating human error in repetitive engineering tasks.

  3. Global standardization — A single compiler ensures every data domain at Roche follows identical patterns for quality, governance, and access — regardless of which team or site operates it.