From Model Change to "Talk to Your Data"
The End-to-End Pipeline
When a data model changes in RTiS, the roche-data platform automatically propagates that change through every layer of the data infrastructure. What used to require weeks of coordination across multiple teams now happens in a single automated pipeline.
Step-by-Step Journey
1. Model Change in RTiS
A domain expert updates an entity definition in RTiS, Roche’s canonical terminology and information system. This might mean adding a new attribute to “Waste Tracking,” refining a classification hierarchy, or introducing a new entity for a manufacturing process.
RTiS is the single source of truth. Every downstream artifact derives from this definition.
2. Model Pull & Contract Generation
The pipeline detects the change and pulls the updated model:
    rdt-model-pull --entity waste-tracking
    rdt-model-compile --entity waste-tracking

This generates a data contract: a versioned, schema-validated YAML document that formally specifies what the data looks like, which quality rules apply, and who owns it. The contract is the handshake between data producers and consumers.
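To make the idea of a contract concrete, here is a minimal sketch in Python of what such a document might carry and how a consumer could sanity-check it. The field names (`entity`, `version`, `owner`, `fields`, `quality_rules`), the sample values, and the `contract_problems` helper are all assumptions for illustration; the actual roche-data contract schema is not shown in this guide.

```python
# Hypothetical contract shape; key names are illustrative, not the roche-data schema.
REQUIRED_KEYS = {"entity", "version", "owner", "fields", "quality_rules"}

contract = {
    "entity": "waste-tracking",
    "version": "1.2.0",
    "owner": "example-team@example.com",
    "fields": [
        {"name": "site_id", "type": "string", "required": True},
        {"name": "waste_kg", "type": "number", "required": True},
        {"name": "recorded_at", "type": "timestamp", "required": True},
    ],
    "quality_rules": [
        {"gate": "G1", "rule": "not_null", "field": "site_id"},
        {"gate": "G3", "rule": "range", "field": "waste_kg", "min": 0},
    ],
}


def contract_problems(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the contract looks sane."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(doc))]
    field_names = {field["name"] for field in doc.get("fields", [])}
    # Every quality rule must point at a field the contract actually declares.
    for rule in doc.get("quality_rules", []):
        if rule["field"] not in field_names:
            problems.append(f"rule references unknown field: {rule['field']}")
    return problems


print(contract_problems(contract))
```

A check like this is the kind of validation a "schema-validated" contract implies: structure first, then internal consistency between fields and the rules that reference them.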
3. Snowflake Schema & dbt Models
From the contract, the compiler generates:
| Layer | What’s generated | Purpose |
|---|---|---|
| Bronze DDL | CREATE ICEBERG TABLE statement | Iceberg landing zone — append-only, immutable, S3-backed open format |
| dbt Bronze | Model + schema YAML | Materializes raw data with G1 completeness checks |
| dbt Silver | View + schema YAML | Virtual view with G2 validity checks (MRHub identity, referential integrity) |
| dbt Gold | View + schema YAML | Virtual view with G3 business rules (range checks, cross-field consistency) |
Key innovation: Silver and Gold are virtual views, not physical tables. Data is written exactly once to Bronze. Quality gates are embedded as view predicates that execute at query time. A correction in Bronze propagates instantly — no reprocessing required.
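The view-based layering can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for Snowflake, and the table, view, and column names are invented for the example. The `UPDATE` stands in for however a correction actually lands in Bronze on the real platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bronze_waste (site_id TEXT, waste_kg REAL);

    -- Silver: a validity gate expressed as a view predicate (stand-in for G2).
    CREATE VIEW silver_waste AS
        SELECT * FROM bronze_waste WHERE site_id IS NOT NULL;

    -- Gold: a business-rule gate layered on Silver (stand-in for G3).
    CREATE VIEW gold_waste AS
        SELECT * FROM silver_waste WHERE waste_kg >= 0;

    INSERT INTO bronze_waste VALUES ('basel', 12.5), (NULL, 3.0), ('basel', -1.0);
    """
)


def gold_rows() -> list[tuple]:
    """Read Gold; the gates run at query time because Gold is only a view."""
    query = "SELECT site_id, waste_kg FROM gold_waste ORDER BY waste_kg"
    return conn.execute(query).fetchall()


before = gold_rows()  # only the row that passes both gates
# Correct the invalid Bronze row; nothing is rebuilt or reprocessed.
conn.execute("UPDATE bronze_waste SET waste_kg = 1.0 WHERE waste_kg = -1.0")
after = gold_rows()   # the corrected row is now visible through Gold

print(before)
print(after)
```

Because Silver and Gold hold no data of their own, the second read reflects the correction immediately. The trade-off is that the gate predicates are evaluated on every query rather than once at load time.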
4. Semantic Layer
The compiler generates Snowflake Semantic YAML, a set of machine-readable definitions of every metric, dimension, and entity:
- What does “cycle time” mean? → Defined in the semantic layer
- How is “waste rate” calculated? → Formula embedded in the semantic definition
- What dimensions can I filter by? → Enumerated with business-friendly labels
This semantic layer is what makes the data AI-readable. Without it, a language model cannot know that “How much waste did Site X produce last quarter?” maps to a specific table, column, and aggregation.
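As a rough illustration of that mapping, the sketch below resolves a business-level metric and filters into a parameterized SQL query. The dictionary layout, table name, and column names are assumptions made up for this example. It does not reproduce the Snowflake Semantic YAML format or how Cortex Analyst resolves questions internally.

```python
# Hypothetical semantic definitions; names are illustrative only.
SEMANTIC_MODEL = {
    "table": "gold_waste_tracking",
    "metrics": {"total waste": "SUM(waste_kg)"},
    "dimensions": {"site": "site_name", "quarter": "fiscal_quarter"},
}


def build_query(metric: str, filters: dict[str, str]) -> tuple[str, list[str]]:
    """Translate a business metric plus dimension filters into SQL and bind parameters."""
    expression = SEMANTIC_MODEL["metrics"][metric]  # KeyError if the metric is undefined
    clauses, params = [], []
    for label, value in filters.items():
        column = SEMANTIC_MODEL["dimensions"][label]  # business label -> physical column
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT {expression} FROM {SEMANTIC_MODEL['table']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query("total waste", {"site": "Site X", "quarter": "last quarter"})
print(sql)
print(params)
```

The point of the sketch is that every piece of the final query (table, aggregation, filter columns) comes from the definitions rather than from guesswork. An undefined metric or dimension fails loudly instead of producing a plausible but wrong query.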
5. API & MCP Tool Generation
Two API artifacts are generated:
- OpenAPI specification: A standard REST API definition that can be published through MuleSoft as a managed API proxy. It enables programmatic access with standard authentication, rate limiting, and monitoring.
- MCP tool definition: A Model Context Protocol tool specification that AI assistants (Claude, Cortex Analyst, custom agents) can discover and invoke. This is the bridge between governed data and conversational AI.
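For orientation, an MCP tool is described by a name, a description, and a JSON Schema for its inputs (`inputSchema`). The sketch below derives such a definition from a list of entity fields. The `filterable` flag, the type mapping, and the `query_` naming pattern are assumptions for this example, not the output of the roche-data compiler.

```python
import json

# Hypothetical mapping from contract field types to JSON Schema types.
TYPE_MAP = {"string": "string", "number": "number", "timestamp": "string"}


def mcp_tool_from_fields(entity: str, fields: list[dict]) -> dict:
    """Build an MCP-style tool definition exposing filterable fields as optional inputs."""
    properties = {
        field["name"]: {"type": TYPE_MAP[field["type"]]}
        for field in fields
        if field.get("filterable")
    }
    return {
        "name": "query_" + entity.replace("-", "_"),
        "description": f"Query certified {entity} records.",
        "inputSchema": {"type": "object", "properties": properties, "required": []},
    }


tool = mcp_tool_from_fields(
    "waste-tracking",
    [
        {"name": "site_id", "type": "string", "filterable": True},
        {"name": "recorded_at", "type": "timestamp", "filterable": True},
        {"name": "waste_kg", "type": "number"},
    ],
)
print(json.dumps(tool, indent=2))
```

An assistant that lists the available tools would see this definition and know which arguments it may pass, without any entity-specific code on the client side.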
6. Policy Compilation & Deployment
From declarative YAML definitions, the compiler generates:
- OPA Rego policies — Executable policy rules for validation, access control, workflow enforcement, API authorization, audit triggers, and deployment gates
- Kubernetes manifests — One OPA deployment per entity, automatically configured with the right policies and lookup data
These policies run as live services on Roche’s managed Kubernetes platform (CaaS), enforcing governance rules at runtime — not just at build time.
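The compilation step can be pictured as a template that turns a declarative rule into Rego source. The sketch below renders a role-based access policy; the package naming, the `allowed_roles` rule, and the `input.role` field are assumptions chosen for illustration, and the generated policies on the platform may look quite different.

```python
def compile_access_policy(entity: str, allowed_roles: list[str]) -> str:
    """Render a minimal Rego access policy for one entity (illustrative template only)."""
    package = entity.replace("-", "_")
    quoted = ", ".join(f'"{role}"' for role in sorted(allowed_roles))
    # In Rego, {} is an empty object, so an empty set must be written as set().
    roles_literal = "{" + quoted + "}" if allowed_roles else "set()"
    lines = [
        f"package {package}.access",
        "",
        "import rego.v1",
        "",
        "default allow := false",
        "",
        f"allowed_roles := {roles_literal}",
        "",
        "allow if input.role in allowed_roles",
        "",
    ]
    return "\n".join(lines)


policy_text = compile_access_policy("waste-tracking", ["steward", "analyst"])
print(policy_text)
```

The policy is deny-by-default: a request is allowed only when its role appears in the declared set, so an omitted or misspelled role fails closed.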
7. Quality Certification (G4)
The final gate compares incoming data against 30-day trends using statistical analysis and AI guardrails:
- Is this value within expected range?
- Does this pattern match historical behavior?
- Are there anomalies that need human review?
Data that passes G4 earns Certified status — the highest trust level. Only certified data is surfaced to Cortex Analyst and external-facing APIs.
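One simple way to picture the statistical part of this gate is a z-score test against the trailing window. The sketch below is an assumption about how such a check could work, not the platform's actual G4 logic: the threshold of 3 standard deviations, the status labels, and the function name are all hypothetical, and the AI guardrails are not modeled.

```python
from statistics import mean, stdev


def g4_check(history: list[float], value: float, z_threshold: float = 3.0) -> str:
    """Classify a new value against a trailing window (for example, 30 daily values)."""
    if len(history) < 2:
        return "needs_review"  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return "certified" if value == mu else "needs_review"
    z_score = abs(value - mu) / sigma
    return "certified" if z_score <= z_threshold else "needs_review"


# Thirty synthetic daily values cycling between 10 and 14.
history = [10.0 + (day % 5) for day in range(30)]
print(g4_check(history, 12.5))
print(g4_check(history, 40.0))
```

A value close to the recent mean passes, while a value far outside the historical spread is routed to human review rather than rejected outright, which matches the "anomalies that need human review" question above.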
8. “Talk to Your Data”
With all layers in place, a business user can open Snowflake Cortex Analyst and ask:
“What was the waste tracking compliance rate at the Basel site last month?”
Cortex Analyst:
- Resolves the question against the semantic layer (generated in step 4)
- Queries the Gold view (generated in step 3, filtered by G3 quality rules)
- Returns a governed, quality-assured answer with full data lineage
The same query works through MCP-compatible AI tools, custom dashboards, or the generated REST API.
Timeline Comparison
| Stage | Traditional approach | With roche-data |
|---|---|---|
| Model definition | Manual documentation | RTiS (already exists) |
| Contract creation | 1–2 weeks, manual | Seconds, automated |
| Schema + dbt models | 2–3 weeks, 2 engineers | Seconds, automated |
| Quality rules | 1 week, specialist | Compiled from YAML |
| Semantic layer | Often never created | Seconds, automated |
| API specification | 3–5 days, developer | Seconds, automated |
| Policy enforcement | Manual checklists | Compiled to OPA, live enforcement |
| AI readiness | Separate project, months | Built-in from day one |
| Total | 6–8 weeks | Minutes |
What Changes for Your Team
For data producers: Define your model in RTiS. The platform handles everything else. You get a data contract that formally describes your commitment to consumers: versioned, validated, and enforced.
For data consumers: Every data product comes with the same quality guarantees, the same semantic definitions, the same API patterns. No more guessing what a column means or whether the data has been validated.
For AI initiatives: Every data product is AI-ready from the moment it’s deployed. Semantic definitions, quality certification, and MCP tool integrations are generated automatically — not retrofitted months later.
For compliance: Every change is tracked in git. Every access is logged in Snowflake. Every policy is declared in YAML and enforced by OPA. The audit trail is generated as a byproduct of normal operations.