From Model Change to "Talk to Your Data"

When a data model changes in RTiS, the roche-data platform automatically propagates that change through every layer of the data infrastructure. What used to require weeks of coordination across multiple teams now happens in a single automated pipeline.

A domain expert updates an entity definition in RTiS — Roche’s canonical terminology and information system. This might mean adding a new attribute to “Waste Tracking,” refining a classification hierarchy, or introducing a new entity for a manufacturing process.

RTiS is the single source of truth. Every downstream artifact derives from this definition.

The pipeline detects the change and pulls the updated model:

```
rdt-model-pull --entity waste-tracking     # fetch the updated entity model from RTiS
rdt-model-compile --entity waste-tracking  # compile it into downstream artifacts
```

This generates a data contract — a versioned, schema-validated YAML document that formally specifies what this data looks like, what quality rules apply, and who owns it. The contract is the handshake between data producers and consumers.
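As an illustration, such a contract might look roughly like this — the field names and structure below are assumptions for the sake of example, not the actual roche-data schema:

```yaml
# Illustrative data contract (hypothetical structure, not the actual
# roche-data schema). Versioned and schema-validated before acceptance.
contract:
  entity: waste-tracking
  version: 2.3.0            # bumped on each model change
  owner: site-quality-team  # the accountable data producer
  schema:
    - name: site_id
      type: string
      required: true        # G1 completeness: must be present on ingest
    - name: waste_kg
      type: decimal
      quality:
        min: 0              # a quality rule that applies to this field
  sla:
    freshness: 24h          # the commitment consumers can rely on
```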

From the contract, the compiler generates:

| Layer | What’s generated | Purpose |
| --- | --- | --- |
| Bronze DDL | CREATE ICEBERG TABLE statement | Iceberg landing zone — append-only, immutable, S3-backed open format |
| dbt Bronze | Model + schema YAML | Materializes raw data with G1 completeness checks |
| dbt Silver | View + schema YAML | Virtual view with G2 validity checks (MRHub identity, referential integrity) |
| dbt Gold | View + schema YAML | Virtual view with G3 business rules (range checks, cross-field consistency) |

Key innovation: Silver and Gold are virtual views, not physical tables. Data is written exactly once to Bronze. Quality gates are embedded as view predicates that execute at query time. A correction in Bronze propagates instantly — no reprocessing required.
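As a sketch of the idea, the quality rules might be declared once in the contract and compiled into the generated views as predicates — the rule names and expressions here are illustrative, not the actual roche-data rule schema:

```yaml
# Hypothetical quality-rule declarations; the compiler would emit each
# expression as a predicate in the generated Silver/Gold view SQL, so
# the checks run at query time rather than in a reprocessing job.
quality_gates:
  G2:                        # validity, enforced in the Silver view
    - rule: mrhub_identity
      expr: site_id IN (SELECT site_id FROM mrhub.sites)
  G3:                        # business rules, enforced in the Gold view
    - rule: waste_kg_range
      expr: waste_kg >= 0 AND waste_kg < 100000
```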

The compiler generates Snowflake Semantic YAML — machine-readable definitions of every metric, dimension, and entity:

  • What does “cycle time” mean? → Defined in the semantic layer
  • How is “waste rate” calculated? → Formula embedded in the semantic definition
  • What dimensions can I filter by? → Enumerated with business-friendly labels

This semantic layer is what makes the data AI-readable. Without it, a language model cannot know that “How much waste did Site X produce last quarter?” maps to a specific table, column, and aggregation.
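For illustration, a fragment of such a semantic definition might look like this, loosely following the shape of Snowflake’s semantic model YAML — the table, column, and metric names are assumptions:

```yaml
# Sketch in the spirit of Snowflake's semantic model YAML; exact keys
# and all waste-tracking names are illustrative.
name: waste_tracking
tables:
  - name: waste_tracking_gold
    base_table:
      database: ROCHE_DATA
      schema: GOLD
      table: WASTE_TRACKING_V
    dimensions:
      - name: site
        expr: site_name
        synonyms: ["plant", "location"]   # business-friendly labels
    time_dimensions:
      - name: reporting_month
        expr: report_date
    measures:
      - name: waste_rate
        expr: waste_kg / output_kg        # formula embedded in the definition
        default_aggregation: avg
```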

Two API artifacts are generated:

  • OpenAPI specification — A standard REST API definition that can be published through Mulesoft as a managed API proxy. Enables programmatic access with standard authentication, rate limiting, and monitoring.

  • MCP tool definition — A Model Context Protocol tool specification that AI assistants (Claude, Cortex Analyst, custom agents) can discover and invoke. This is the bridge between governed data and conversational AI.
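For illustration, a generated MCP tool definition might look roughly like this — MCP tools are declared with a name, a description, and a JSON-Schema input; the tool name and parameters below are assumptions, shown as YAML:

```yaml
# Illustrative MCP tool definition (names and parameters are assumptions).
name: query_waste_tracking
description: >
  Query the certified waste-tracking Gold view. Returns governed,
  quality-assured results with lineage metadata.
inputSchema:
  type: object
  properties:
    site:
      type: string
      description: Site name, e.g. "Basel"
    period:
      type: string
      description: Reporting month, e.g. "2024-06"
  required: [site]
```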

From declarative YAML definitions, the compiler generates:

  • OPA Rego policies — Executable policy rules for validation, access control, workflow enforcement, API authorization, audit triggers, and deployment gates
  • Kubernetes manifests — One OPA deployment per entity, automatically configured with the right policies and lookup data

These policies run as live services on Roche’s managed Kubernetes platform (CaaS), enforcing governance rules at runtime — not just at build time.
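As a sketch, one of the declarative YAML definitions feeding this compilation might look like the following — the policy schema shown is hypothetical:

```yaml
# Hypothetical policy declaration; the compiler would translate each
# rule into an executable Rego policy deployed with the entity's OPA.
entity: waste-tracking
policies:
  - name: restrict-site-access
    type: access_control
    rule: user.site == resource.site_id or "global-auditor" in user.roles
  - name: require-certified-for-api
    type: api_authorization
    rule: resource.quality_status == "certified"
```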

The final gate, G4, compares incoming data against 30-day trends using statistical analysis and AI guardrails:

  • Is this value within expected range?
  • Does this pattern match historical behavior?
  • Are there anomalies that need human review?

Data that passes G4 earns Certified status — the highest trust level. Only certified data is surfaced to Cortex Analyst and external-facing APIs.
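A hypothetical sketch of how such a gate could be configured — the parameter names are assumptions, not the actual G4 schema:

```yaml
# Hypothetical G4 guardrail configuration.
gate: G4
baseline_window: 30d          # compare against the 30-day trend
checks:
  - metric: waste_kg
    method: zscore
    threshold: 3.0            # flag values far outside historical behavior
    on_breach: human_review   # anomalies are routed to a reviewer
certify_on_pass: true         # passing data earns Certified status
```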

With all layers in place, a business user can open Snowflake Cortex Analyst and ask:

“What was the waste tracking compliance rate at the Basel site last month?”

Cortex Analyst:

  1. Resolves the question against the semantic layer (the Snowflake Semantic YAML generated above)
  2. Queries the Gold view (generated by the dbt pipeline, filtered by G3 quality rules)
  3. Returns a governed, quality-assured answer with full data lineage

The same query works through MCP-compatible AI tools, custom dashboards, or the generated REST API.
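For illustration, the corresponding fragment of the generated OpenAPI specification might look like this — the paths, parameters, and version are assumptions:

```yaml
# Illustrative fragment of a generated OpenAPI specification
# (paths, parameters, and schema names are assumptions).
openapi: 3.0.3
info:
  title: Waste Tracking API
  version: 2.3.0              # tracks the data contract version
paths:
  /waste-tracking/metrics:
    get:
      summary: Query certified waste-tracking metrics
      parameters:
        - name: site
          in: query
          schema: { type: string }
        - name: period
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Governed, quality-assured results with lineage
```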


| Stage | Traditional approach | With roche-data |
| --- | --- | --- |
| Model definition | Manual documentation | RTiS (already exists) |
| Contract creation | 1–2 weeks, manual | Seconds, automated |
| Schema + dbt models | 2–3 weeks, 2 engineers | Seconds, automated |
| Quality rules | 1 week, specialist | Compiled from YAML |
| Semantic layer | Often never created | Seconds, automated |
| API specification | 3–5 days, developer | Seconds, automated |
| Policy enforcement | Manual checklists | Compiled to OPA, live enforcement |
| AI readiness | Separate project, months | Built-in from day one |
| Total | 6–8 weeks | Minutes |

For data producers: Define your model in RTiS. The platform handles everything else. You get a data contract that formally describes your commitment to consumers — versioned, validated, and enforced.

For data consumers: Every data product comes with the same quality guarantees, the same semantic definitions, the same API patterns. No more guessing what a column means or whether the data has been validated.

For AI initiatives: Every data product is AI-ready from the moment it’s deployed. Semantic definitions, quality certification, and MCP tool integrations are generated automatically — not retrofitted months later.

For compliance: Every change is tracked in git. Every access is logged in Snowflake. Every policy is declared in YAML and enforced by OPA. The audit trail is generated as a byproduct of normal operations.