From Model Change to "Talk to Your Data"

When a data model changes in RTiS, the roche-data platform automatically propagates that change through every layer of the data infrastructure. What used to require weeks of coordination across multiple teams now happens in a single automated pipeline.

A domain expert updates an entity definition in RTiS — Roche’s canonical terminology and information system. This might mean adding a new attribute to “Waste Tracking,” refining a classification hierarchy, or introducing a new entity for a manufacturing process.

RTiS is the single source of truth. Every downstream artifact derives from this definition.

The pipeline detects the change and pulls the updated model:

```
rdt-model-pull --entity waste-tracking     # fetch the updated entity model from RTiS
rdt-model-compile --entity waste-tracking  # compile it into downstream artifacts
```

This generates a data contract — a versioned, schema-validated YAML document that formally specifies what this data looks like, what quality rules apply, and who owns it. The contract is the handshake between data producers and consumers.
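As an illustration, such a contract might look roughly like this — the field names and structure below are assumptions for the sake of example, not the actual roche-data schema:

```yaml
# Illustrative data contract (hypothetical structure, not the actual
# roche-data schema). Versioned and schema-validated before acceptance.
contract:
  entity: waste-tracking
  version: 2.3.0            # bumped on each model change
  owner: site-quality-team  # the accountable data producer
  schema:
    - name: site_id
      type: string
      required: true        # G1 completeness: must be present on ingest
    - name: waste_kg
      type: decimal
      quality:
        min: 0              # a quality rule that applies to this field
  sla:
    freshness: 24h          # the commitment consumers can rely on
```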

From the contract, the compiler generates:

| Layer | What’s generated | Purpose |
| --- | --- | --- |
| Bronze DDL | CREATE ICEBERG TABLE statement | Iceberg landing zone — append-only, immutable, S3-backed open format |
| dbt Bronze | Model + schema YAML | Materializes raw data with G1 completeness checks |
| dbt Silver | View + schema YAML | Virtual view with G2 validity checks (MRHub identity, referential integrity) |
| dbt Gold | View + schema YAML | Virtual view with G3 business rules (range checks, cross-field consistency) |

Key innovation: Silver and Gold are virtual views, not physical tables. Data is written exactly once to Bronze. Quality gates are embedded as view predicates that execute at query time. A correction in Bronze propagates instantly — no reprocessing required.
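As a sketch of the idea, the quality rules might be declared once in the contract and compiled into the generated views as predicates — the rule names and expressions here are illustrative, not the actual roche-data rule schema:

```yaml
# Hypothetical quality-rule declarations; the compiler would emit each
# expression as a predicate in the generated Silver/Gold view SQL, so
# the checks run at query time rather than in a reprocessing job.
quality_gates:
  G2:                        # validity, enforced in the Silver view
    - rule: mrhub_identity
      expr: site_id IN (SELECT site_id FROM mrhub.sites)
  G3:                        # business rules, enforced in the Gold view
    - rule: waste_kg_range
      expr: waste_kg >= 0 AND waste_kg < 100000
```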

The compiler generates Snowflake Semantic YAML — machine-readable definitions of every metric, dimension, and entity:

  • What does “cycle time” mean? → Defined in the semantic layer
  • How is “waste rate” calculated? → Formula embedded in the semantic definition
  • What dimensions can I filter by? → Enumerated with business-friendly labels

This semantic layer is what makes the data AI-readable. Without it, a language model cannot know that “How much waste did Site X produce last quarter?” maps to a specific table, column, and aggregation.
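For illustration, a fragment of such a semantic definition might look like this, loosely following the shape of Snowflake’s semantic model YAML — the table, column, and metric names are assumptions:

```yaml
# Sketch in the spirit of Snowflake's semantic model YAML; exact keys
# and all waste-tracking names are illustrative.
name: waste_tracking
tables:
  - name: waste_tracking_gold
    base_table:
      database: ROCHE_DATA
      schema: GOLD
      table: WASTE_TRACKING_V
    dimensions:
      - name: site
        expr: site_name
        synonyms: ["plant", "location"]   # business-friendly labels
    time_dimensions:
      - name: reporting_month
        expr: report_date
    measures:
      - name: waste_rate
        expr: waste_kg / output_kg        # formula embedded in the definition
        default_aggregation: avg
```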

Two API artifacts are generated:

  • OpenAPI specification — A standard REST API definition that can be published through Mulesoft as a managed API proxy. Enables programmatic access with standard authentication, rate limiting, and monitoring.

  • MCP tool definition — A Model Context Protocol tool specification that AI assistants (Claude, Cortex Analyst, custom agents) can discover and invoke. This is the bridge between governed data and conversational AI.
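For illustration, a generated MCP tool definition might look roughly like this — MCP tools are declared with a name, a description, and a JSON-Schema input; the tool name and parameters below are assumptions, shown as YAML:

```yaml
# Illustrative MCP tool definition (names and parameters are assumptions).
name: query_waste_tracking
description: >
  Query the certified waste-tracking Gold view. Returns governed,
  quality-assured results with lineage metadata.
inputSchema:
  type: object
  properties:
    site:
      type: string
      description: Site name, e.g. "Basel"
    period:
      type: string
      description: Reporting month, e.g. "2024-06"
  required: [site]
```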

From declarative YAML definitions, the compiler generates:

  • OPA Rego policies — Executable policy rules for validation, access control, workflow enforcement, API authorization, audit triggers, and deployment gates
  • Kubernetes manifests — One OPA deployment per entity, automatically configured with the right policies and lookup data

These policies run as live services on Roche’s managed Kubernetes platform (CaaS), enforcing governance rules at runtime — not just at build time.
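As a sketch, one of the declarative YAML definitions feeding this compilation might look like the following — the policy schema shown is hypothetical:

```yaml
# Hypothetical policy declaration; the compiler would translate each
# rule into an executable Rego policy deployed with the entity's OPA.
entity: waste-tracking
policies:
  - name: restrict-site-access
    type: access_control
    rule: user.site == resource.site_id or "global-auditor" in user.roles
  - name: require-certified-for-api
    type: api_authorization
    rule: resource.quality_status == "certified"
```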

The final gate, G4, compares incoming data against 30-day trends using statistical analysis and AI guardrails:

  • Is this value within expected range?
  • Does this pattern match historical behavior?
  • Are there anomalies that need human review?

Data that passes G4 earns Certified status — the highest trust level. Only certified data is surfaced to Cortex Analyst and external-facing APIs.
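A hypothetical sketch of how such a gate could be configured — the parameter names are assumptions, not the actual G4 schema:

```yaml
# Hypothetical G4 guardrail configuration.
gate: G4
baseline_window: 30d          # compare against the 30-day trend
checks:
  - metric: waste_kg
    method: zscore
    threshold: 3.0            # flag values far outside historical behavior
    on_breach: human_review   # anomalies are routed to a reviewer
certify_on_pass: true         # passing data earns Certified status
```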

With all layers in place, a business user can open Snowflake Cortex Analyst and ask:

“What was the waste tracking compliance rate at the Basel site last month?”

Cortex Analyst:

  1. Resolves the question against the semantic layer (the Snowflake Semantic YAML generated above)
  2. Queries the Gold view (generated by the dbt pipeline, filtered by G3 quality rules)
  3. Returns a governed, quality-assured answer with full data lineage

The same query works through MCP-compatible AI tools, custom dashboards, or the generated REST API.
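For illustration, the corresponding fragment of the generated OpenAPI specification might look like this — the paths, parameters, and version are assumptions:

```yaml
# Illustrative fragment of a generated OpenAPI specification
# (paths, parameters, and schema names are assumptions).
openapi: 3.0.3
info:
  title: Waste Tracking API
  version: 2.3.0              # tracks the data contract version
paths:
  /waste-tracking/metrics:
    get:
      summary: Query certified waste-tracking metrics
      parameters:
        - name: site
          in: query
          schema: { type: string }
        - name: period
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Governed, quality-assured results with lineage
```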


| Stage | Traditional approach | With roche-data |
| --- | --- | --- |
| Model definition | Manual documentation | RTiS (already exists) |
| Contract creation | 1–2 weeks, manual | Seconds, automated |
| Schema + dbt models | 2–3 weeks, 2 engineers | Seconds, automated |
| Quality rules | 1 week, specialist | Compiled from YAML |
| Semantic layer | Often never created | Seconds, automated |
| API specification | 3–5 days, developer | Seconds, automated |
| Policy enforcement | Manual checklists | Compiled to OPA, live enforcement |
| AI readiness | Separate project, months | Built-in from day one |
| Total | 6–8 weeks | Minutes |

For data producers: Define your model in RTiS. The platform handles everything else. You get a data contract that formally describes your commitment to consumers — versioned, validated, and enforced.

For data consumers: Every data product comes with the same quality guarantees, the same semantic definitions, the same API patterns. No more guessing what a column means or whether the data has been validated.

For AI initiatives: Every data product is AI-ready from the moment it’s deployed. Semantic definitions, quality certification, and MCP tool integrations are generated automatically — not retrofitted months later.

For compliance: Every change is tracked in git. Every access is logged in Snowflake. Every policy is declared in YAML and enforced by OPA. The audit trail is generated as a byproduct of normal operations.