
From Data Model to AI-Ready Insights — In Minutes

A data infrastructure compiler that turns Roche's canonical data models into production-ready pipelines, governed APIs, and conversational AI — fully automated, fully auditable, fully compliant.

RDT MODEL is a data infrastructure compiler for Roche Global IT. A single command, rdt-model-compile run --entity <name>, takes an RTiS ontology model and assembles its metadata from three sources: RTiS (schema), Collibra (governance), and platform configuration (infrastructure). It then orchestrates 18 specialized CLI modules across 6 pipeline phases to produce a complete data product: Snowflake layers, data contracts, OPA policies, SDKs, MCP tools, API specs, documentation, and an audit trail. All artifacts are committed to git and deployed through GitHub Actions.
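To make the metadata-assembly step concrete, here is a minimal sketch that assumes each source has already been fetched as a plain dict. The `assemble` function, the `EnrichedModel` class, and every field name (`attributes`, `pii`, `sla_hours`, and so on) are hypothetical and do not reflect the real RTiS or Collibra payloads. The point is that each source owns its own slice of the model, and that disagreements between sources are surfaced rather than silently merged.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EnrichedModel:
    """Hypothetical container for one entity's merged metadata."""
    entity: str
    schema: Dict[str, Any]          # from RTiS: attributes and their types
    governance: Dict[str, Any]      # from Collibra: ownership, SLA, ...
    infrastructure: Dict[str, Any]  # from platform config: database, warehouse
    pii_columns: List[str] = field(default_factory=list)


def assemble(entity: str, rtis: Dict[str, Any], collibra: Dict[str, Any],
             platform: Dict[str, Any]) -> EnrichedModel:
    """Merge the three sources; each source owns its own slice of the model."""
    attributes = rtis["attributes"]
    pii_flags = collibra.get("pii", {})
    # A PII flag on an attribute that RTiS does not define signals drift
    # between the two systems, so fail loudly instead of ignoring it.
    unknown = set(pii_flags) - set(attributes)
    if unknown:
        raise ValueError(f"Collibra flags unknown attributes: {sorted(unknown)}")
    return EnrichedModel(
        entity=entity,
        schema=rtis,
        governance={k: v for k, v in collibra.items() if k != "pii"},
        infrastructure=platform,
        pii_columns=sorted(name for name, flagged in pii_flags.items() if flagged),
    )


model = assemble(
    "patient",
    rtis={"attributes": {"patient_id": "VARCHAR", "birth_date": "DATE"}},
    collibra={"owner": "clinical-data", "sla_hours": 24, "pii": {"birth_date": True}},
    platform={"database": "RDT_DEV", "warehouse": "RDT_WH"},
)
print(model.pii_columns)  # ['birth_date']
```

Rejecting unknown PII flags is a deliberate choice in this sketch. A governance flag that points at a column the schema no longer has is usually a sign that one system has drifted, and silently dropping it could leave sensitive data unmasked.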

RTiS is the grammar. RDT MODEL is the compiler. Git is the object store. dbt is the runtime.


Module | Purpose
rdt-model-pull | Pull RTiS entity model (ontologies, terminologies, synonyms)
rdt-model-profile | Profile existing database tables for upstream discovery

Module | Purpose
rdt-model-govern | Pull governance metadata from Collibra (ownership, SLAs, PII flags)
rdt-model-infer | Optional: LLM-based term mappings, descriptions, and DQ rules

Module | Purpose
rdt-model-compile | Pipeline orchestrator: compile all artifacts from the enriched model
rdt-model-validate | Validate all generated artifacts against schemas, syntax rules, and contracts

Module | Purpose
rdt-model-store | Deploy Snowflake DDL: Bronze table, Silver/Gold/Semantic views via dbt
rdt-model-policy | Generate OPA Rego policies, Kubernetes manifests, and dbt DQ tests
rdt-model-api | Publish OpenAPI spec to Mulesoft Anypoint
rdt-model-mcp | Build and register MCP tool interface for AI agents
rdt-model-sdk | Generate Python + CLI SDKs with typed access to Snowflake artifacts
rdt-model-contract | Generate data contract (datacontract.com 1.1.0) for pipeline validation

Module | Purpose
rdt-model-register | Push lineage to Collibra, register in Horizon + RDM
rdt-model-gupri | Register persistent identifiers with GUPRI
rdt-model-search | Push offline documentation to Sinequa enterprise search

Module | Purpose
rdt-model-docs | Generate Starlight reference documentation from all sources
rdt-model-cidb | Create ServiceNow change records for production deployments
rdt-model-event | Publish data product creation/update events to Solace event bus
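The module groups in these tables can be read as the six pipeline phases. The following sketch shows one way a phase-ordered run could be sequenced. It takes the module names directly from the tables, and it assumes that the six groups are the phases in the order listed. `run_pipeline` and the `runner` hook are hypothetical. A real orchestrator would invoke each module's CLI, whose flags are not documented here, so the sketch deliberately avoids guessing them. It also treats rdt-model-compile as just another step, even though the compile command is the orchestrator itself.

```python
from typing import Callable, List

# Module names and grouping follow the six tables above, read in order as the
# six pipeline phases (the ordering is an assumption of this sketch).
PHASES: List[List[str]] = [
    ["rdt-model-pull", "rdt-model-profile"],
    ["rdt-model-govern", "rdt-model-infer"],
    ["rdt-model-compile", "rdt-model-validate"],
    ["rdt-model-store", "rdt-model-policy", "rdt-model-api",
     "rdt-model-mcp", "rdt-model-sdk", "rdt-model-contract"],
    ["rdt-model-register", "rdt-model-gupri", "rdt-model-search"],
    ["rdt-model-docs", "rdt-model-cidb", "rdt-model-event"],
]
OPTIONAL = {"rdt-model-infer"}  # the only module the tables mark as optional


def run_pipeline(entity: str, runner: Callable[[str, str], bool],
                 enable_optional: bool = False) -> List[str]:
    """Run every phase in order and return the modules that were executed.

    All modules in a phase run; if any of them fails, the pipeline halts
    before the next phase. `runner(module, entity)` is a hypothetical hook
    that returns True on success.
    """
    executed: List[str] = []
    for number, modules in enumerate(PHASES, start=1):
        failed = []
        for module in modules:
            if module in OPTIONAL and not enable_optional:
                continue  # LLM enrichment is opt-in
            executed.append(module)
            if not runner(module, entity):
                failed.append(module)
        if failed:
            raise RuntimeError(f"phase {number} failed: {failed}")
    return executed


# Dry run: a runner that always succeeds.
print(len(run_pipeline("patient", lambda module, entity: True)))  # 17
```

Letting every module in a failing phase finish before halting reports all failures in that phase at once. Halting before the next phase keeps later steps, such as registration or change records, from acting on artifacts that did not build.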

Roche operates hundreds of isolated data products across global sites, each with its own schema definitions, quality rules, and access patterns. Answering a new business question today takes 6–8 weeks and requires specialist intervention at every layer.

RDT MODEL changes this equation. One command. Minutes, not months.

Automated Pipeline Generation

A single rdt-model-compile run command reads a canonical data model from RTiS and generates every downstream artifact: data contracts, Snowflake schemas, dbt models across Bronze/Silver/Gold layers, semantic definitions, OpenAPI specs, and AI tool definitions. All committed to git. All deployed through CI/CD.
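To show what "generating" the Snowflake and dbt layers can mean in practice, here is a minimal sketch of two such generators. One emits Snowflake DDL for a Bronze landing table. The other emits a dbt model that exposes a Silver view over it. The `BRONZE` schema name, the `_LOADED_AT` audit column, and the `bronze` dbt source name are illustrative assumptions, not the tool's real naming conventions. The dbt constructs themselves, `config(materialized='view')` and `source(...)`, are standard dbt Jinja. The source must also be declared in a dbt `sources.yml` file for the model to compile.

```python
from typing import Dict


def bronze_ddl(database: str, entity: str, attributes: Dict[str, str]) -> str:
    """Snowflake DDL for the Bronze landing table (schema naming is assumed)."""
    columns = ",\n".join(f"    {name} {sql_type}" for name, sql_type in attributes.items())
    # _LOADED_AT is an illustrative load-metadata column, not a documented one.
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.BRONZE.{entity.upper()} (\n"
        f"{columns},\n"
        "    _LOADED_AT TIMESTAMP_NTZ\n"
        ");"
    )


def silver_model(entity: str, attributes: Dict[str, str]) -> str:
    """dbt model for the Silver view, reading the Bronze table as a dbt source."""
    select_list = ",\n".join(f"    {name}" for name in attributes)
    return (
        "{{ config(materialized='view') }}\n\n"
        f"select\n{select_list}\n"
        # Quadrupled braces in an f-string render as literal {{ }} for Jinja.
        f"from {{{{ source('bronze', '{entity}') }}}}\n"
    )


attributes = {"PATIENT_ID": "VARCHAR", "BIRTH_DATE": "DATE"}
print(bronze_ddl("RDT_DEV", "patient", attributes))
print(silver_model("patient", attributes))
```

Because both outputs are plain text derived from the same attribute map, they can be committed to git and diffed like any other source file. This is what makes the "git is the object store" framing workable.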

Four-Gate Data Quality

Every data record passes through four progressive quality gates — from technical completeness (G1) through business validity (G2), domain-specific rules (G3), to AI-readiness certification (G4). No record reaches a dashboard or AI model without earning its trust level.
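Because the gates are progressive, a record's trust level is simply the last gate it passes before the first one it fails. The sketch below illustrates that cascade. Every individual check is invented for illustration: the field names, the `ALLOWED_SITES` list, and the idea that G4 means "diagnosis mapped to a terminology term" are assumptions, not the real gate rules. Those rules would come from the model and its policies.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
ALLOWED_SITES = {"site-a", "site-b"}  # hypothetical domain reference list

# Each gate pairs a label with an illustrative check; order matters because
# the gates are progressive. The checks themselves are made up for the sketch.
GATES: List[Tuple[str, Callable[[Record], bool]]] = [
    # G1, technical completeness: required fields are present
    ("G1", lambda r: all(r.get(k) is not None
                         for k in ("patient_id", "site", "birth_date", "visit_date"))),
    # G2, business validity: ISO-8601 date strings compare correctly as text
    ("G2", lambda r: r["birth_date"] <= r["visit_date"]),
    # G3, domain-specific rule: the site must be a known one
    ("G3", lambda r: r["site"] in ALLOWED_SITES),
    # G4, AI-readiness: the diagnosis is mapped to a terminology term
    ("G4", lambda r: r.get("diagnosis_term") is not None),
]


def trust_level(record: Record) -> Optional[str]:
    """Return the last gate passed, stopping at the first failure."""
    level = None
    for name, check in GATES:
        if not check(record):
            break
        level = name
    return level


record = {"patient_id": "P1", "site": "site-a", "birth_date": "1980-01-01",
          "visit_date": "2024-05-01", "diagnosis_term": "C50"}
print(trust_level(record))                              # G4
print(trust_level({**record, "diagnosis_term": None}))  # G3
```

Ordering matters here. Later checks such as G2 can assume the fields guaranteed by G1 exist, which keeps each rule short and avoids defensive null handling in every gate.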

Enterprise Policy Enforcement

A unified policy engine powered by Open Policy Agent (OPA) governs six domains from a single YAML definition: validation rules, row/column access control, workflow state machines, API authorization, audit compliance triggers, and deployment gates.
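The single-definition idea can be sketched in two steps: split one parsed policy file into its six domain sections, then render a domain into Rego. Here, the row-filter part of access control is rendered. The domain keys (`validation`, `access`, `workflow`, `api`, `audit`, `deployment`), the `row_filter` structure, and the `rdt.<entity>.access` package naming are all hypothetical. The parsed YAML is stood in for by a dict so the sketch needs no YAML library. The emitted Rego uses the `import rego.v1` / `if` syntax accepted by recent OPA releases.

```python
from typing import Any, Dict

# Hypothetical keys for the six domains named above.
REQUIRED_DOMAINS = ("validation", "access", "workflow", "api", "audit", "deployment")


def split_policy(definition: Dict[str, Any]) -> Dict[str, Any]:
    """Split one parsed policy definition into its six domain sections."""
    policies = definition.get("policies", {})
    missing = [d for d in REQUIRED_DOMAINS if d not in policies]
    if missing:
        raise ValueError(f"policy definition is missing domains: {missing}")
    return {d: policies[d] for d in REQUIRED_DOMAINS}


def row_access_rego(entity: str, access: Dict[str, Any]) -> str:
    """Render the row-filter part of the access domain as a Rego module."""
    column = access["row_filter"]["column"]
    user_attribute = access["row_filter"]["user_attribute"]
    return "\n".join([
        f"package rdt.{entity}.access",
        "",
        "import rego.v1",
        "",
        "default allow_row := false",
        "",
        "# A row is visible when its column matches the caller's attribute.",
        "allow_row if {",
        f"    input.row.{column} == input.user.{user_attribute}",
        "}",
        "",
    ])


# In practice this dict would come from parsing the YAML file.
definition = {"policies": {d: {} for d in REQUIRED_DOMAINS}}
definition["policies"]["access"] = {"row_filter": {"column": "site", "user_attribute": "site"}}
print(row_access_rego("patient", split_policy(definition)["access"]))
```

Requiring all six sections up front means a policy file cannot quietly omit, say, its deployment gates. An incomplete definition fails at compile time instead of at enforcement time.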

AI-Ready from Day One

Every generated artifact includes semantic definitions that Snowflake Cortex Analyst and MCP-compatible AI tools understand natively. The moment data passes all four quality gates, it’s available for natural-language queries.
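On the MCP side, a generated tool is essentially a descriptor with a name, a description, and a JSON Schema for its input, which is the shape MCP servers return from `tools/list`. The sketch below builds such descriptors and publishes them only for data certified at G4. The `query_<entity>` naming, the `question` parameter, the `certified_gate` field, and the example catalog entries are all hypothetical. The Cortex Analyst semantic model, which the text also mentions, is not shown here.

```python
import json
from typing import Any, Dict, List


def mcp_tool(entity: str, description: str, filters: Dict[str, str]) -> Dict[str, Any]:
    """Build a tool descriptor with the name/description/inputSchema fields MCP uses."""
    properties = {
        name: {"type": json_type, "description": f"Optional filter on {name}"}
        for name, json_type in filters.items()
    }
    properties["question"] = {"type": "string",
                              "description": "Natural-language question about the data"}
    return {
        "name": f"query_{entity}",
        "description": description,
        "inputSchema": {"type": "object", "properties": properties, "required": ["question"]},
    }


def publishable_tools(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expose only entities whose data is certified at G4 (hypothetical field)."""
    return [mcp_tool(e["name"], e["description"], e["filters"])
            for e in entities if e["certified_gate"] == "G4"]


catalog = [
    {"name": "patient", "description": "Certified patient records",
     "filters": {"site": "string"}, "certified_gate": "G4"},
    {"name": "batch", "description": "Manufacturing batches",
     "filters": {"plant": "string"}, "certified_gate": "G2"},
]
print(json.dumps(publishable_tools(catalog), indent=2))
```

Gating at publication time, rather than inside each tool call, means an AI agent never discovers a tool for data that has not yet earned G4. This matches the text's "no record reaches an AI model without earning its trust level."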