
Solution Architecture

Field | Value
Document Title | RDT MODEL — Solution Architecture
Version | 1.0
Date | 2026-05-03
Status | Draft
Classification | Roche Internal

Role | Name
Author | Sebastian Streit
Reviewer | Xavier Gutierrez
Approver | Nick Perry
Approver | Paulina Maria Swiecicka

Version | Date | Author | Changes
0.1 | 2026-05-03 | Sebastian Streit | Initial draft — all sections

This document describes the solution architecture for RDT MODEL — a data infrastructure compiler built as a Rust CLI platform for Roche Global IT. The system takes an RTiS ontology model as input and produces a complete, certified data product as output: all Snowflake layers (Bronze/Silver/Gold/Semantic), data contract, OPA policies, SDK, MCP tool, API specification, documentation, and audit trail.

A single command — rdt-model-compile run --entity <name> — orchestrates 18 specialised CLI modules across 6 pipeline phases to deliver every artifact required for a data product to be discoverable, governed, quality-assured, and AI-ready.

This architecture covers:

  • The 18-module CLI pipeline and its orchestration model
  • All 21 external system integrations (sources, targets, bidirectional)
  • The Snowflake medallion architecture (Bronze/Silver/Gold/Semantic)
  • Data quality enforcement (OPA real-time + dbt batch, gates G1–G4)
  • The shared Rust library (rdt-model-common) and cross-cutting patterns
  • Two Streamlit in Snowflake UI applications (CRUD, Ratification)
  • Physical infrastructure: Snowflake, CaaS Kubernetes, Vault, GitHub Actions
  • Environment strategy: dev/test/prod through configuration, not separate systems
# | Assumption
A1 | RTiS is the canonical source of truth for all data model definitions. Every data product originates from an RTiS entity.
A2 | Snowflake is the sole target analytics platform. All medallion layers deploy to a single Snowflake account with schema-level isolation per environment.
A3 | Stub-first development: modules implement the full interface using stub clients until access tasks (A01–A19) are resolved. The pipeline is runnable in --dry-run mode without credentials.
A4 | All environments (dev/test/prod) share the same physical infrastructure. Separation is achieved through configuration (schema prefixes, K8s namespaces, Vault paths).
A5 | GitHub Actions is the CI/CD platform. All deployment flows run through GitHub Actions workflows.
A6 | Collibra is the enterprise data governance platform. Stewardship metadata is pulled at generation time; lineage is pushed at deployment time.
A7 | PingFederate (via Snowflake WAM) is the OAuth identity provider for all Snowflake access.
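Assumption A4 implies that environment separation is purely a naming concern. A minimal sketch of the idea — the function name is illustrative, not from the actual codebase:

```rust
// Minimal sketch of environment isolation by configuration (assumption A4):
// the same deployment code targets dev/test/prod through schema naming
// conventions rather than separate Snowflake accounts. The function name
// is illustrative only.
fn schema_name(env: &str, layer: &str) -> String {
    format!("{}_{}", env.to_uppercase(), layer.to_uppercase())
}

fn main() {
    // Produces DEV_BRONZE, TEST_BRONZE, PROD_BRONZE -- the convention
    // named in constraint C2.
    for env in ["dev", "test", "prod"] {
        println!("{}", schema_name(env, "bronze"));
    }
    assert_eq!(schema_name("dev", "bronze"), "DEV_BRONZE");
}
```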
# | Constraint | Impact
C1 | Access tasks (A01–A19) block live integrations with external systems. Until resolved, all modules use StubClient implementations returning fixture data. | Modules are developed and tested against stubs; live integration is a configuration change, not a code change.
C2 | Single Snowflake account for all environments. No separate accounts for dev/test/prod. | Environment isolation relies on schema naming conventions (DEV_BRONZE, TEST_BRONZE, PROD_BRONZE).
C3 | Roche VPN required for RTiS, GUPRI, MRHub, and Vault access. GitHub Actions runners must be on-network. | CI/CD runners must be self-hosted or use Roche’s VPN-connected runner pool.
C4 | Collibra deployment model (on-prem vs. cloud) not yet confirmed. Network path may require async batch sync. | Architecture supports both real-time REST and batch file exchange patterns.
C5 | Rust expertise required for CLI development. | Mitigated by LLM-assisted development and comprehensive ADR documentation.
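The stub-first pattern in A3/C1 can be sketched as a trait with interchangeable implementations. This is a simplified illustration, not the actual rdt-model-common code: the real clients are async, and the type and field names below are assumptions:

```rust
// Sketch of the stub-first pattern (A3/C1). Real clients in
// rdt-model-common are async and have richer types; this synchronous
// version is illustrative only.
use std::collections::HashMap;

// One trait per external system integration.
trait RTisClient {
    fn fetch_entity(&self, entity: &str) -> Result<HashMap<String, String>, String>;
}

// Stub implementation: fixture data, no credentials required.
struct StubRTisClient;

impl RTisClient for StubRTisClient {
    fn fetch_entity(&self, entity: &str) -> Result<HashMap<String, String>, String> {
        let mut model = HashMap::new();
        model.insert("entity".to_string(), entity.to_string());
        model.insert("source".to_string(), "fixture".to_string());
        Ok(model)
    }
}

// Choosing stub vs. live is a configuration decision, not a code change.
fn make_client(live: bool) -> Box<dyn RTisClient> {
    if live {
        unimplemented!("HttpRTisClient: blocked on access task A01");
    } else {
        Box::new(StubRTisClient)
    }
}

fn main() {
    let client = make_client(false);
    let model = client.fetch_entity("waste-tracking").unwrap();
    assert_eq!(model.get("source").map(String::as_str), Some("fixture"));
}
```

Because every caller holds a `Box<dyn RTisClient>`, resolving an access task means swapping the constructor's return value — exactly the "configuration change, not a code change" promise in C1.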
Document | Location
ADR 0001 — Project Vision | adr/0001-project-vision.md
ADR 0007 — Data Product Lifecycle | adr/0007-data-product-lifecycle.md
ADR 0009 — Module I/O Contracts | adr/0009-module-io-contracts.md
ADR 0010 — Environment Strategy | adr/0010-environment-strategy.md
ADR 0011 — Pipeline Restructure | adr/0011-pipeline-restructure.md
Pipeline Overview | docs/src/content/docs/architecture/pipeline-overview.md

Term | Definition
RTiS | Roche Terminology and Information Services — the canonical source of data model definitions, ontologies, terminologies, and synonyms. Deployed on AWS behind Roche VPN.
GUPRI | Globally Unique Persistent Roche Identifier — a persistent identifier system that assigns resolvable URIs to every artifact. Ensures global uniqueness across Roche systems.
Collibra | Enterprise data governance platform providing stewardship, ownership, classification, SLAs, PII flags, and lineage tracking. Bidirectional: provides metadata at generation, receives lineage on deployment.
Data Product | The complete output of one pipeline run for one entity: all Snowflake layers, data contract, policies, SDK, MCP tool, API spec, documentation, and audit trail. There is no partial product.
Medallion Architecture | A layered data architecture pattern: Bronze (raw, append-only), Silver (curated, validity-checked), Gold (business-ready, rule-checked), Semantic (AI-queryable, metric definitions). See ADR 0004.
OPA | Open Policy Agent — a general-purpose policy engine. Used here for real-time data quality enforcement and access control, deployed as containers on CaaS Kubernetes.
Rego | The declarative policy language used by OPA. Generated from YAML rule definitions by rdt-model-policy.
dbt | Data Build Tool — a SQL-first transformation framework. Used here for batch data quality enforcement and view materialisation in Snowflake.
Data Contract | A machine-readable specification (datacontract.com 1.1.0) defining the schema, SLA, quality expectations, and ownership of a data product. Generated by rdt-model-contract.
MCP | Model Context Protocol — an open standard for AI tool definitions. Generated MCP tools expose Gold data products to AI agents (Cortex Analyst, Claude).
CaaS | Container as a Service — Roche’s managed Kubernetes platform (Rancher-based). Hosts OPA policy containers and bundle refresh jobs.
Cortex Analyst | Snowflake’s AI-powered natural language query engine. Consumes Semantic view definitions to answer business questions in natural language.
PingFederate | Roche’s enterprise identity provider. All OAuth flows (including Snowflake WAM) route through PingFederate for authentication.
WAM | Web Access Management — Snowflake’s OAuth integration layer that delegates authentication to PingFederate via client_credentials grant.
DQ Gate | Data Quality Gate — one of four mandatory quality checkpoints (G1–G4) that data passes through before certification. Each gate has specific checks and failure consequences.
Stub Client | A test implementation of a system integration trait that returns fixture data from local JSON files. Enables full pipeline execution without live credentials.
Entity | A logical data object defined in RTiS (e.g., “waste-tracking”, “site-energy”). The unit of work for the pipeline — one entity produces one complete data product.
MRHub | Master Reference Hub — Roche’s master data system providing reference data for G2 validity checks and Solace change events.
Solace | Enterprise event bus for publishing data product lifecycle events (creation, update, deprecation).
Sinequa | Enterprise search engine. Receives offline documentation for data product discovery across Roche.
Mulesoft | API management platform (Anypoint). Publishes generated OpenAPI specifications as managed, governed APIs.
Snowflake Horizon | Snowflake’s cross-account data governance and discovery layer. Used for registering data products for cross-account access.

No formal data product architecture exists today. The current state across Roche data domains is characterised by:

Manual, linear process. Creating a new data product takes 3–6 months of specialist involvement. Each business question requires a dedicated data engineer, manual Snowflake provisioning, hand-written dbt models, and ad-hoc quality checks. There is no reusable infrastructure.

Disconnected systems. RTiS holds ontology definitions, Collibra holds governance metadata, Snowflake hosts the data — but no automated pipeline connects them. Metadata flows are manual, inconsistent, and frequently stale.

No shared semantic layer. KPI definitions diverge across teams. The same metric exists in 4+ variants. Global reporting requires manual Excel reconciliation between domain teams.

No data contracts. Upstream schema changes cascade unpredictably into downstream consumers. Breakages surface in production dashboards and board presentations. There is no machine-readable contract between producer and consumer.

Inconsistent quality assurance. Some data products have strict dbt tests; others have none. Users cannot distinguish certified data from unchecked data. This creates a false sense of quality that is more dangerous than having no quality gates at all.

No AI readiness. Zero data products have semantic definitions suitable for natural language queries. Cortex Analyst cannot be deployed. AI agents have no MCP tools to access governed data.

The immediate trigger is the Global Sites Network: 100+ operational tools that cannot communicate, producing siloed data with no shared semantics. The same structural problems exist across all Roche data domains. The platform is designed as a horizontal solution serving all domains, with Global Sites Network as the pilot.


RDT MODEL is a data infrastructure compiler: it takes a declarative model definition as input and produces a complete, deployable data product as output.

RTiS is the grammar. roche-data is the compiler. Git is the object store. dbt is the runtime.

The compiler is implemented as a Rust CLI workspace containing 18 specialised binary modules plus one shared library. A single orchestration command — rdt-model-compile run --entity <name> — invokes all modules in dependency order across 6 sequential phases. Within each phase, modules that share no data dependency execute in parallel.

From one RTiS entity definition, the compiler produces:

# | Artifact | Purpose
1 | Bronze table + G1 DQ | Physical append-only landing, schema enforcement
2 | Silver view + G2 DQ | Curated data, validity-checked against master data
3 | Gold view + G3 DQ | Business-ready data, rule-checked and SLA-governed
4 | Semantic view | AI-queryable metrics for Cortex Analyst
5 | Data contract | Machine-readable schema + SLA + quality spec (datacontract.com 1.1.0)
6 | OPA policies | Real-time DQ enforcement + access control (6 policy domains)
7 | dbt tests | Batch DQ enforcement, aligned with OPA rules
8 | Python + CLI SDK | Type-safe programmatic access for consumers
9 | MCP tool | AI agent tool definition (Cortex Analyst, Claude)
10 | OpenAPI spec | REST API contract for managed publication
11 | Documentation | Offline docs for enterprise search (Sinequa)
12 | Platform events | Solace creation/update events + ServiceNow change records
13 | Audit trail | CSRD/GDPR/GxP-compliant creation audit

All artifacts are committed to git, deployed through GitHub Actions CI/CD, and registered with GUPRI persistent identifiers. There is no partial product — every entity gets the full stack.

The platform is designed to scale across three dimensions:

Multi-domain scaling. Global Sites Network is the pilot domain. The same CLI, templates, and pipeline serve every Roche data domain. Adding a new domain requires only RTiS model definitions and domain-specific business rules — no platform changes.

LLM enrichment. Claude on AWS Bedrock enriches metadata where RTiS coverage is insufficient: terminology mappings, synonym generation, field descriptions, and DQ rule suggestions. All LLM output is human-reviewable and subject to the four-eyes PR rule.

Self-service. Two Streamlit in Snowflake UI applications provide non-technical users with CRUD and ratification capabilities. The Starlight documentation site auto-generates from pipeline artifacts — it cannot drift from implementation.

Iterative quality. Data products ship structurally complete on day one. Quality rules improve iteratively: Bronze rules are mechanical (from schema), Silver rules add master data checks, Gold rules incorporate business logic from domain experts. The pipeline re-runs on model changes without re-architecture.

Alternative | Reason for rejection
Commercial data catalog (DataHub, Atlan, Unity Catalog) | These catalog what already exists. They do not generate the artifacts that make data AI-ready. RDT MODEL is upstream — it generates the metadata catalogs consume. Not mutually exclusive: Collibra remains as the governance layer.
Python CLI toolbox | Python CLIs carry virtualenv and dependency management debt into every CI pipeline. A Rust binary ships as a single file with no runtime dependencies, starts an order of magnitude faster in CI, and eliminates “it works on my machine” failures.
Central team builds all products | Preserves linear scalability: new business question = new ticket = 6–8 weeks. The platform shifts this to exponential: new question = one CLI command = one CI cycle.
Separate repos per domain | Artifact types are deeply coupled (contract YAML that Bronze depends on comes from the same model as Semantic YAML). Monorepo keeps versions consistent and makes template changes propagate to all domains simultaneously.

A data product follows this lifecycle (see ADR 0007):

graph LR
DEFINE["**DEFINE**<br/>RTiS + Collibra"]
GENERATE["**GENERATE**<br/>CLI runs 18 mods"]
VALIDATE["**VALIDATE**<br/>Schemas + DQ"]
DEPLOY["**DEPLOY**<br/>CI/CD promote"]
CERTIFY["**CERTIFY**<br/>G4 pass → stable"]
REFINE["**REFINE**<br/>PR-based rule updates"]
DEFINE --> GENERATE --> VALIDATE --> DEPLOY --> CERTIFY
CERTIFY --> REFINE
REFINE --> DEFINE
  1. Define — Data stewards define the entity in RTiS (schema, ontology, relationships) and configure governance in Collibra (ownership, SLA, classification).
  2. Generate — rdt-model-compile run orchestrates all 18 modules to produce the complete artifact set.
  3. Validate — rdt-model-validate checks all artifacts against JSON Schemas, syntax rules, and cross-references.
  4. Deploy — GitHub Actions CI/CD promotes validated artifacts through dev → test → prod.
  5. Certify — G4 Consistency gate confirms data meets trend baselines and AI guardrails.
  6. Refine — Domain experts improve quality rules and business logic through normal PR workflow. Re-run generates updated artifacts.

The 18 modules are organised into 6 sequential phases. Phases run in order; modules within a phase run in parallel when they share no data dependency.

graph LR
subgraph Phase1["**Phase 1: INGEST**"]
Pull
Profile
end
subgraph Phase2["**Phase 2: ENRICH**"]
Govern
Infer
end
subgraph Phase3["**Phase 3: PREPARE**"]
Compile
Validate
end
subgraph Phase4["**Phase 4: DEPLOY**"]
Store
Policy
Api
Mcp
Sdk
Contract
end
subgraph Phase5["**Phase 5: REGISTER**"]
Register
Gupri
Search
end
subgraph Phase6["**Phase 6: SUPPORT**"]
Docs
Cidb
Event
end
Phase1 --> Phase2 --> Phase3 --> Phase4 --> Phase5 --> Phase6
Phase | Name | Purpose | Modules | Parallelism
1 | Ingest | Acquire source data from upstream systems | pull, profile | Full (no dependencies between modules)
2 | Enrich | Add governance metadata and LLM intelligence | govern, infer | Full (no dependencies between modules)
3 | Prepare | Compile artifacts and validate correctness | compile, validate | Sequential (validate depends on compile)
4 | Deploy | Push artifacts to target systems | store, policy, api, mcp, sdk, contract | Full (6 modules in parallel)
5 | Register | Announce to enterprise catalogs | register, gupri, search | Full (3 modules in parallel)
6 | Support | Generate docs, compliance records, events | docs, cidb, event | Full (3 modules in parallel)
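The phase model above — strictly sequential phases, parallel modules within a phase — can be sketched as follows. This is a hypothetical illustration: threads stand in for the child processes the real orchestrator spawns, and everything except the module names is an assumption:

```rust
use std::thread;

// Hypothetical sketch of the phase model: phases run strictly in order;
// modules inside a phase run in parallel. Threads stand in for child
// processes here. Module names mirror the phase table.
fn run_phase(modules: &[&'static str]) -> Vec<String> {
    let handles: Vec<_> = modules
        .iter()
        .map(|m| {
            let name = *m;
            // Real orchestrator: spawn `rdt-model-{name}` as a child process.
            thread::spawn(move || format!("{name}: ok"))
        })
        .collect();
    // Joining every handle before returning is what makes phases sequential.
    handles
        .into_iter()
        .map(|h| h.join().expect("module thread panicked"))
        .collect()
}

fn main() {
    let phases: [(&str, &[&'static str]); 6] = [
        ("Ingest", &["pull", "profile"]),
        ("Enrich", &["govern", "infer"]),
        // In reality Phase 3 is sequential: validate waits for compile.
        ("Prepare", &["compile", "validate"]),
        ("Deploy", &["store", "policy", "api", "mcp", "sdk", "contract"]),
        ("Register", &["register", "gupri", "search"]),
        ("Support", &["docs", "cidb", "event"]),
    ];
    for (phase, modules) in phases {
        for result in run_phase(modules) {
            println!("[{phase}] {result}");
        }
    }
}
```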
Actor | Responsibility | Interaction
Data Engineer | Defines entity models in RTiS, authors rules.yaml and policies.yaml, runs the pipeline, reviews generated artifacts | CLI + Git + PR workflow
Data Steward | Maintains governance metadata in Collibra (ownership, SLA, classification, PII flags). Ratifies changes via Streamlit UI. | Collibra + Ratification UI
Platform Team | Maintains the CLI codebase, templates, and infrastructure. Resolves access tasks. Manages CI/CD workflows. | Rust development + GitHub Actions
Domain Expert | Refines Gold business rules and Semantic view definitions. Reviews LLM enrichment suggestions. | PR review + rules.yaml authoring
Consumer | Queries data products via SDK, API, Cortex Analyst, or direct Snowflake access. Relies on data contracts for stability guarantees. | SDK + API + SQL + natural language
AI Agent | Accesses Gold data products via MCP tools. Uses Semantic view definitions for natural language understanding. | MCP protocol + Cortex Analyst
sequenceDiagram
participant DE as Data Engineer
participant PL as Platform (automated)
participant DS as Data Steward
DE->>PL: Define entity in RTiS
DS->>PL: Configure governance in Collibra
DE->>PL: Run pipeline (single cmd)
PL->>PL: Pull model + governance
PL->>PL: Enrich with LLM
PL->>PL: Compile all artifacts
PL->>PL: Validate schemas
PL->>PL: Deploy to Snowflake
PL->>PL: Register in catalogs
PL->>PL: Publish events
PL->>DE: Commit artifacts to git, open PR
DE->>DE: Review PR (4-eyes rule)
DE->>PL: Merge to main
PL->>PL: CI/CD promotes: dev → test → prod
PL->>DS: G4 certification
DS->>DS: Ratify changes via Streamlit UI

The platform operates on a three-stage conceptual model: source ontology → enriched entity model → generated artifacts.

graph TD
RTiS["**RTiS Ontology**<br/>Schema, fields, types,<br/>relationships, terminologies"]
Collibra["**Collibra Governance**<br/>Ownership, SLA, PII,<br/>classification"]
LLM["**Claude on Bedrock**<br/>Term mapping, synonyms,<br/>descriptions"]
Entity["**Entity Model (enriched)**<br/>model.json + governance.json<br/>+ suggestions.json"]
RTiS --> Entity
Collibra --> Entity
LLM --> Entity
Entity --> Snowflake["**Snowflake**<br/>Bronze DDL, Silver SQL,<br/>Gold SQL, Semantic"]
Entity --> Quality["**Quality**<br/>OPA Rego, dbt tests,<br/>K8s deploy, Bundle cfg"]
Entity --> Consumer["**Consumer Access**<br/>SDK, API spec,<br/>MCP tool, Docs"]
Entity --> Platform["**Platform**<br/>Data contract, GUPRI URI,<br/>Solace event, Audit trail"]

Key entities:

Entity | Description | Cardinality
RTiS Entity | A logical data object (e.g., “waste-tracking”, “site-energy”). The unit of work. | 1 per data product
Field | A typed attribute within an entity. Carries RTiS metadata (type, terminology, synonyms). | N per entity
Rule Group | A collection of validation rules authored in rules.yaml. Compiled to both OPA and dbt. | 1 per entity
Policy Set | Access, workflow, API, audit, and deployment policies in policies.yaml. | 1 per entity
Governance Record | Collibra-sourced stewardship: owner, steward, SLA, classification, PII flags. | 1 per entity
Data Product | The complete output bundle: all Snowflake layers + all artifacts. | 1 per entity
GUPRI Record | Persistent identifier registration. Every artifact and entity gets a resolvable URI. | N per entity

Relationships:

graph LR
Entity["**RTiS Entity**"]
Entity -->|1:N| Field
Entity -->|1:1| RuleGroup["Rule Group"]
Entity -->|1:1| PolicySet["Policy Set"]
Entity -->|1:1| GovRecord["Governance Record<br/>(from Collibra)"]
Entity -->|1:1| DataProduct["Data Product<br/>(generated)"]
DataProduct -->|1:N| GUPRI["GUPRI Record<br/>(registered)"]
DataProduct -->|1:1| Contract["Data Contract<br/>(generated)"]
DataProduct -->|1:4| Layers["Snowflake Layers<br/>(Bronze/Silver/Gold/Semantic)"]

Data governance is architecturally embedded, not bolted on. Two systems enforce quality, and one system provides governance metadata.

Governance metadata source: Collibra

Collibra is the authoritative source for all governance metadata. Ownership, SLAs, data classification, and PII flags are not authored locally — they are pulled from Collibra where data stewards maintain them. The pipeline pulls governance at generation time and pushes lineage at deployment time.

Metadata | Source | Used by
Data owner | Collibra | Data contract, documentation
Data steward | Collibra | Ratification UI, change notifications
SLA (availability, freshness) | Collibra | Data contract, G4 monitoring
Data classification | Collibra | OPA access policies, PII handling
PII flags | Collibra | Column-level masking policies
Terms of use | Collibra | Data contract, SDK documentation
Lineage | Generated → Collibra | Enterprise lineage graph

Quality enforcement: four gates

Data passes through four mandatory quality gates before certification:

graph LR
G1["**G1 COMPLETENESS**<br/>Bronze Layer<br/><br/>Schema match<br/>Type conformance<br/>NOT NULL enforced<br/><br/>_FAIL: file rejected_"]
G2["**G2 VALIDITY**<br/>Silver View<br/><br/>MRHub identity<br/>Orphan detection<br/>Referential integrity<br/><br/>_FAIL: row excluded from Silver_"]
G3["**G3 BUSINESS RULES**<br/>Gold View<br/><br/>Range checks<br/>Cross-field<br/>Freshness SLA<br/><br/>_FAIL: row excluded, steward alerted_"]
G4["**G4 CONSISTENCY**<br/>Certified Product<br/><br/>Trend deviation<br/>AI guardrail<br/>30-day baseline<br/><br/>_FAIL: visible w/ Warning badge_"]
G1 --> G2 --> G3 --> G4

Dual enforcement: OPA (real-time) + dbt (batch)

Rules are defined once in rules.yaml and compiled to two execution targets:

Target | Engine | Context | Latency | Used by
OPA | Rego policies on Kubernetes | API boundary, form validation, real-time checks | Milliseconds | UI, API consumers, integration tests
dbt | SQL tests in Snowflake | Pipeline execution, batch validation, regression detection | Seconds–minutes | CI/CD, scheduled runs, monitoring

Both targets are generated from the same rules.yaml source by rdt-model-policy. They must stay in sync — a rule that passes in OPA must also pass in dbt, and vice versa.
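The single-source idea can be pictured with a hypothetical rule definition. The actual rules.yaml schema consumed by rdt-model-policy is not reproduced in this document, so every field name below is an illustrative assumption:

```yaml
# Hypothetical rules.yaml fragment (field names are illustrative -- the
# real schema consumed by rdt-model-policy may differ). One declarative
# rule, compiled to two targets: an OPA Rego deny rule (real-time,
# milliseconds) and a dbt SQL test (batch, seconds-minutes).
entity: waste-tracking
rules:
  - id: G3-disposal-weight-range
    gate: G3
    field: disposal_weight_kg
    check: range
    min: 0
    max: 50000
    on_fail: exclude_row_and_alert_steward
```

Because both execution targets derive from this one declaration, the sync requirement holds by construction: neither the Rego nor the dbt test is hand-edited.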

The platform integrates with 21 external systems across source, target, and bidirectional roles.

graph TD
Platform["**RDT MODEL Platform**<br/>18 CLI Modules + rdt-model-common<br/>+ 2 Streamlit UI apps"]
subgraph Sources["SOURCES"]
RTiS["RTiS (ontology)"]
Aurora["Aurora PostgreSQL"]
MRHub["MRHub (master data)"]
Bedrock["Claude/Bedrock (LLM)"]
Vault["Vault (secrets)"]
end
subgraph Bidirectional["BIDIRECTIONAL"]
Collibra["Collibra (governance ↔ lineage)"]
GUPRI["GUPRI (register ↔ resolve)"]
Horizon["Snowflake Horizon (discovery ↔ governance)"]
end
subgraph Targets["TARGETS"]
Snowflake["Snowflake (DDL + views)"]
K8s["CaaS/Kubernetes (OPA pods)"]
Mulesoft["Mulesoft (API)"]
Solace["Solace (events)"]
Sinequa["Sinequa (search)"]
ServiceNow["ServiceNow (CIDM)"]
Artifactory["Artifactory (Docker images)"]
DataMP["Data Marketplace"]
MCPReg["MCP Registry"]
end
subgraph PlatformSvc["PLATFORM SERVICES"]
GHA["GitHub Actions (CI/CD)"]
Ping["PingFederate/WAM (OAuth)"]
Starlight["Starlight/Astro (docs)"]
end
Sources --> Platform
Platform <--> Bidirectional
Platform --> Targets
PlatformSvc -.-> Platform

System classification:

Role | Systems | Data Flow
Source | RTiS, Aurora PostgreSQL, MRHub, Claude/Bedrock, Vault | System → Platform
Bidirectional | Collibra, GUPRI, Snowflake Horizon | System ↔ Platform
Target | Snowflake, CaaS/K8s, Mulesoft, Solace, Sinequa, ServiceNow, Artifactory, Data Marketplace, MCP Registry | Platform → System
Platform | GitHub Actions, PingFederate/WAM, Starlight/Astro | Infrastructure

Each row represents a data exchange between the RDT MODEL platform and an external system. Interface IDs are used for traceability to access tasks.

IF-ID | Source | Target | Data Exchanged | Frequency | Protocol | Auth | Module | Access Task | Status
IF-01 | RTiS | Platform | Entity definitions, ontologies, terminologies, synonyms | On-demand (pipeline trigger) | REST / GraphQL | OAuth (PingFederate) | pull | A01 | Stub
IF-02 | Aurora PostgreSQL | Platform | Upstream table metadata (columns, types, constraints) | On-demand (profiling) | PostgreSQL wire protocol | Username/password (Vault) | profile | A18 | Stub
IF-03 | Snowflake WAM | Platform | OAuth access tokens for Snowflake operations | Per-session | REST OIDC (client_credentials) | OAuth (PingFederate) | All Snowflake ops | A19 | Live
IF-04 | Platform | Snowflake | Bronze DDL, dbt models (Silver/Gold/Semantic views) | On deployment | Snowflake REST API | OAuth (WAM token) | store | A05/A06 | Partial
IF-05 | Collibra | Platform | Governance metadata: ownership, SLA, classification, PII | On-demand (pipeline trigger) | REST API | OAuth | govern | A07 | Stub
IF-06 | Platform | Collibra | Lineage records after deployment | On deployment | REST API | OAuth | register | A08 | Stub
IF-07 | Claude (Bedrock) | Platform | LLM-generated term mappings, descriptions, DQ suggestions | On-demand (enrichment) | AWS Bedrock API | AWS IAM (SigV4) | infer | — | Planned
IF-08 | Platform | CaaS/K8s | OPA deployment manifests, Rego bundles, ConfigMaps | On deployment | Kubernetes API | Rancher token | policy | A13 | Active
IF-09 | Platform | Artifactory | OPA container images | On build | Docker Registry v2 | Token | policy (build) | — | Planned
IF-10 | Platform | Mulesoft | OpenAPI specifications for managed API publication | On deployment | Anypoint Platform API | OAuth | api | A09 | Stub
IF-11 | Platform | MCP Registry | MCP tool definitions for AI agent registration | On deployment | TBD | TBD | mcp | — | Planned
IF-12 | GUPRI | Platform | Persistent identifier resolution (existing URIs) | On-demand | REST API | OAuth (PingFederate) | gupri | A02 | Stub
IF-13 | Platform | GUPRI | Persistent identifier registration (new URIs) | On deployment | REST API | OAuth (PingFederate) | gupri | A02 | Stub
IF-14 | Platform | Snowflake Horizon | Cross-account discovery and governance metadata | On deployment | Snowflake API | OAuth (WAM token) | register | — | Stub
IF-15 | Platform | Data Marketplace | Data product catalog registration | On deployment | REST API | TBD | register | A15 | Stub
IF-16 | Platform | Sinequa | Offline documentation for enterprise search indexing | On deployment | TBD | TBD | search | A17 | Stub
IF-17 | MRHub | Platform | Master reference data for G2 validity lookups | On-demand | REST API | OAuth | policy | A03 | Not started
IF-18 | MRHub (Solace) | Platform | Change events for master data updates | Continuous | Solace event subscription | Token | policy | A04 | Not started
IF-19 | Platform | Solace | Data product creation/update lifecycle events | On deployment | Solace event publish | Token | event | A04 | Stub
IF-20 | Platform | ServiceNow | Change management records (CIDM) | On deployment | REST Table API | OAuth | cidb | A12 | Stub
IF-21 | Vault | Platform | Secrets (database credentials, API keys, tokens) | On CI/CD run | REST API (AppRole / OIDC) | AppRole / OIDC JWT | All (via CI) | A16 | Live

Interface status legend:

Status | Meaning
Live | Integration is operational with real credentials.
Partial | Authentication works; functional integration pending (e.g., Snowflake auth live, schema provisioning pending).
Active | Infrastructure access confirmed; integration under development.
Stub | Module implements the interface using StubClient with fixture data. Switching to live is a configuration change.
Planned | Module exists but integration work has not started.
Not started | Access task not yet filed or investigated.

This is a greenfield platform — there is no legacy system to migrate from. However, entity onboarding involves importing existing data structures:

Entity onboarding via rdt-model-profile

For entities that exist in upstream databases but lack RTiS representation, rdt-model-profile provides a discovery path:

graph LR
DB["**Upstream DB**<br/>Aurora PG / Snowflake"]
Profile["**rdt-model-profile**<br/>Discover tables<br/>Extract schema<br/>Suggest model"]
RTiS["**RTiS**<br/>Register as<br/>new entity<br/>(manual step)"]
DB -->|profile| Profile -->|suggest| RTiS
  1. rdt-model-profile connects to the upstream database (Aurora PostgreSQL or Snowflake).
  2. Extracts table metadata: column names, types, constraints, sample data statistics.
  3. Produces a suggested entity model that a data engineer reviews and registers in RTiS.
  4. Once registered in RTiS, the standard pipeline takes over.

This is a discovery aid, not an automated migration. The data engineer makes all decisions about entity structure, naming, and classification. The profile output is a suggestion that accelerates the manual RTiS registration process.

Data backfill

Historical data from upstream systems is loaded into Bronze tables through the standard ingestion path. There is no special migration tool — Bronze tables are append-only, and historical data is simply the first batch of appended records. The Silver/Gold/Semantic views immediately operate over this data once loaded.


The logical architecture is organised by pipeline phase. Each module follows the conventions in ADR 0008 (CLI module standards) and ADR 0009 (module I/O contracts). See also the Pipeline Overview for data flow and workspace isolation.

Phase: 1 — Ingest
Purpose: Fetch an entity definition from RTiS and write a frozen JSON snapshot to the pipeline workspace. This is the entry point for every data product — all downstream modules consume the snapshot.
Parallelizable: Yes (parallel with profile within Phase 1)

Direction | Artifact | Path | Format
Input | RTiS entity ID | CLI argument (--entity) | String
Output | Frozen entity snapshot | models/{entity}/model.json | JSON
Output | Module result envelope | stdout (when --json) | JSON

System | Client Trait | Auth | Interface | Access Task | Status
RTiS | RTisClient | OAuth (PingFederate) / Basic Auth | IF-01 | A01 | Stub
  • Language: Rust (edition 2021)
  • Async: Yes — tokio::runtime::Runtime (Pattern A)
  • Key crates: reqwest (HTTP), async-trait, chrono, uuid (UUIDv7 run correlation)
  • Templates: None (data-only module)
Command | Description
pull | Fetch entity from RTiS and write model.json
diff | Show changes between local snapshot and RTiS (planned)
list | List available entities in RTiS
snapshot | Create versioned snapshot (planned)

Stub — Full command structure implemented. StubRTisClient returns fixture data from cli/common/src/clients/fixtures/rtis/. HttpRTisClient implemented with JSON-LD response mapping, pending A01 resolution for live RTiS access.
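The "module result envelope" emitted on stdout with --json can be made concrete with a sketch. The actual envelope schema is defined in ADR 0009 and is not reproduced here, so the fields below are assumptions; serialisation is hand-rolled to keep the sketch dependency-free (the real modules use serde):

```rust
// Hypothetical shape of the module result envelope printed to stdout
// when --json is passed. Field names are assumptions; ADR 0009 defines
// the real contract. JSON is built by hand so the sketch has no
// external dependencies.
struct ModuleResult {
    module: &'static str,
    entity: String,
    status: &'static str,
    artifacts: Vec<String>,
}

impl ModuleResult {
    fn to_json(&self) -> String {
        let artifacts = self
            .artifacts
            .iter()
            .map(|a| format!("\"{a}\""))
            .collect::<Vec<_>>()
            .join(",");
        format!(
            "{{\"module\":\"{}\",\"entity\":\"{}\",\"status\":\"{}\",\"artifacts\":[{}]}}",
            self.module, self.entity, self.status, artifacts
        )
    }
}

fn main() {
    let result = ModuleResult {
        module: "rdt-model-pull",
        entity: "waste-tracking".to_string(),
        status: "ok",
        artifacts: vec!["models/waste-tracking/model.json".to_string()],
    };
    println!("{}", result.to_json());
}
```

A structured, machine-readable envelope on stdout is what lets the orchestrator capture and aggregate per-module results without parsing log output.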


Phase: 1 — Ingest (optional support module)
Purpose: Discover and profile existing database tables in upstream systems (Aurora PostgreSQL, Snowflake) to suggest entity models for RTiS registration. Used for onboarding entities that lack RTiS representation.
Parallelizable: Yes (parallel with pull within Phase 1)

Direction | Artifact | Path | Format
Input | Database connection + table identifier | CLI arguments (--database-type, --schema, --table) | String
Input | Sample row count | CLI argument (--sample-rows, default 100, max 10,000) | Integer
Output | Table structural metadata | {output_dir}/{db_type}/{schema}.{table}.profile.json | JSON
Output | Sample data | Embedded in profile JSON | JSON

System | Client Trait | Auth | Interface | Access Task | Status
Aurora PostgreSQL | DatabaseProbe | Username/password (Vault) | IF-02 | A18 | Stub
Snowflake | DatabaseProbe | OAuth (WAM token) | IF-04 | A19 | Stub
  • Language: Rust (edition 2021)
  • Async: Yes — tokio::runtime::Runtime (Pattern A)
  • Key crates: reqwest, async-trait, regex (identifier validation), flate2 (optional gzip)
  • Templates: None (data-only module)
Command | Description
profile | Profile a database table: extract schema, types, constraints, sample data

Stub — StubDatabaseProbe returns deterministic fixture data. SQL identifier validation implemented. Real Snowflake and Aurora PostgreSQL probes pending access tasks A18/A19.


Phase: 2 — Enrich
Purpose: Pull governance metadata from Collibra for an entity — ownership, stewardship, data classification, SLAs, PII flags, and terms of use. This metadata feeds into the data contract, documentation, and access policies.
Parallelizable: Yes (parallel with infer within Phase 2)

Direction | Artifact | Path | Format
Input | Entity ID | CLI argument (--entity) | String
Input | model.json (from Phase 1) | models/{entity}/model.json | JSON
Output | Governance metadata | models/{entity}/governance.json | JSON
Output | Module result envelope | stdout (when --json) | JSON

System | Client Trait | Auth | Interface | Access Task | Status
Collibra | CollibraClient | OAuth (client_id + client_secret + x-meta-bridge-key) | IF-05 | A07 | Stub
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] macro
  • Key crates: tokio, tracing
  • Templates: None (data passthrough)
Command | Description
pull | Fetch governance metadata from Collibra
status | Show Collibra sync status (planned)

Stub — StubCollibraClient returns fixture CollibraMetadata. HttpCollibraClient implemented with pagination support. Blocked on A07 (Collibra access task).


Phase: 2 — Enrich (optional)
Purpose: Enrich entity metadata using Claude on AWS Bedrock — generate term mappings, business-friendly synonyms, field descriptions, and DQ rule suggestions where RTiS coverage is insufficient. All suggestions are human-reviewable.
Parallelizable: Yes (parallel with govern within Phase 2)

Direction | Artifact | Path | Format
Input | Entity ID | CLI argument (--entity) | String
Input | model.json (from Phase 1) | models/{entity}/model.json | JSON
Input | Scope filter (optional) | CLI argument (--scope: terms, descriptions, rules) | String
Output | LLM enrichment suggestions | models/{entity}/suggestions.json | JSON
Output | Module result envelope | stdout (when --json) | JSON

System | Client Trait | Auth | Interface | Access Task | Status
Claude (AWS Bedrock) | LlmClient | AWS IAM (SigV4) | IF-07 | — | Planned
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] macro
  • Key crates: tokio, serde, tracing
  • Templates: None (data-only module)
| Command | Description |
| --- | --- |
| suggest | Generate LLM suggestions with optional scope filter |

Planned — Async skeleton implemented. StubLlmClient returns hardcoded suggestions. Real Bedrock integration not yet started. LLM provider confirmed as Claude on AWS Bedrock.


Phase: 3 — Prepare
Purpose: Pipeline orchestrator — invokes all downstream modules in dependency order, manages workspace lifecycle, aggregates results, and handles artifact promotion from workspace to repository paths.
Parallelizable: No (sequential orchestrator; spawns parallel modules within phases)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity ID | CLI argument (--entity) | String |
| Input | model.json, governance.json, suggestions.json | From Phase 1–2 outputs | JSON |
| Input | rules.yaml, policies.yaml | models/{entity}/ | YAML |
| Output | Orchestration result | compile-result.json (workspace) | JSON |
| Output | All artifacts (20+) | compile/artifacts/ (workspace) | Mixed |

None — the orchestrator delegates all external calls to downstream modules.

  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Spawns child processes via std::process::Command.
  • Key crates: serde, serde_json, tracing
  • Templates: None (orchestrator has no template responsibility)
| Command | Description |
| --- | --- |
| run | Execute full pipeline (or single stage with --stage) |
| status | Show pipeline status (planned) |
| semantic | Generate Semantic YAML (delegated, planned) |
| openapi | Generate OpenAPI spec (delegated, planned) |
| mcp | Generate MCP tool definition (delegated, planned) |
| rules | Compile rules.yaml to OPA Rego (delegated, planned) |
| policies | Compile policies.yaml to OPA Rego (delegated, planned) |
| k8s | Generate OPA K8s manifests (delegated, planned) |

Planned — CLI structure defined with 8 subcommands. Orchestration logic not yet implemented. Will spawn modules as child processes with --json to capture result envelopes.


Phase: 3 — Prepare
Purpose: Validate all generated artifacts against JSON Schemas, syntax rules, and cross-references. Acts as a quality gate — the pipeline does not proceed to Phase 4 (Deploy) unless validation passes.
Parallelizable: No (must run after compile completes)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | All artifacts from Phase 3 | compile/artifacts/ (workspace) | Mixed |
| Input | JSON Schemas | Embedded at compile time via include_str! | JSON Schema |
| Output | Validation report | stdout | Text / JSON (with --json) |
| Output | Exit code | 0 (pass) or non-zero (fail) | Process exit |

None — pure local file validation.

  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Pure file I/O.
  • Key crates: jsonschema (JSON Schema validation), serde_json, tracing
  • Templates: None (validation only)
| Command | Description |
| --- | --- |
| all | Validate all artifacts for an entity |
| contract | Validate data contract YAML |
| schema | Validate any JSON/YAML against a schema |
| dbt | Validate dbt model files |
| semantic | Validate semantic YAML |
| rules | Validate rules against a rule group using OPA |

Planned — CLI structure defined with 6 subcommands. Validation logic not yet implemented. Will use jsonschema crate with embedded schemas.


Phase: 4 — Deploy
Purpose: Generate all Snowflake storage artifacts — Bronze DDL, dbt models for Bronze/Silver/Gold layers, and Semantic view definitions. Optionally deploys DDL to Snowflake.
Parallelizable: Yes (parallel with policy, api, mcp, sdk, contract within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | Governance metadata | models/{entity}/governance.json | JSON |
| Input | Rules definition | models/{entity}/rules.yaml | YAML |
| Output | Bronze DDL | snowflake/ddl/{entity}.sql | SQL |
| Output | dbt Bronze model + schema | dbt/models/bronze/{entity}.sql + .yml | SQL + YAML |
| Output | dbt Silver view + schema | dbt/models/silver/{entity}_silver.sql + .yml | SQL + YAML |
| Output | dbt Gold view + schema | dbt/models/gold/{entity}_gold.sql + .yml | SQL + YAML |
| Output | Semantic view | dbt/models/semantic/{entity}.semantic.yml | YAML |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| Snowflake | SnowflakeClient | OAuth (WAM token) | IF-04 | A05/A06 | Stub |
  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Pure template rendering.
  • Key crates: tera (templates), serde_yaml, chrono
  • Templates (8): bronze_ddl.sql.tera, dbt_bronze.sql.tera, dbt_bronze.yml.tera, silver.sql.tera, silver.yml.tera, gold.sql.tera, gold.yml.tera, semantic.yml.tera
| Command | Description |
| --- | --- |
| generate | Generate all Snowflake artifacts (optionally filter by --layer) |
| apply | Deploy DDL to Snowflake (planned) |

Stub — All 8 Tera templates present. Framework implemented with template rendering logic. StubSnowflakeClient for deployment. Execution pending full wiring.


Phase: 4 — Deploy
Purpose: Compile rules.yaml and policies.yaml into OPA Rego policies, generate Kubernetes deployment manifests, and produce dbt test files. Handles both real-time (OPA) and batch (dbt) DQ enforcement from a single rule source.
Parallelizable: Yes (parallel within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Rules definition | models/{entity}/rules.yaml | YAML |
| Input | Policies definition | models/{entity}/policies.yaml | YAML |
| Input | Entity model | models/{entity}/model.json | JSON |
| Output | OPA Rego policies | k8s/{entity}/*.rego | Rego |
| Output | K8s Deployment | k8s/{entity}/opa-deployment.yaml | YAML |
| Output | K8s Service | k8s/{entity}/opa-service.yaml | YAML |
| Output | Bundle ConfigMap | k8s/{entity}/bundle-configmap.yaml | YAML |
| Output | Bundle refresh CronJob | k8s/{entity}/bundle-refresh-cronjob.yaml | YAML |
| Output | dbt test files | dbt/tests/{entity}/ | SQL |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| CaaS/Kubernetes | (pending) | Rancher token | IF-08 | A13 | Active |
| MRHub | (pending) | OAuth | IF-17 | A03 | Not started |
  • Language: Rust (edition 2021)
  • Async: Mixed — sync for generation (Pattern B), async for deploy
  • Key crates: tera (templates), jsonschema (rule validation), serde_yaml, tokio, reqwest
  • Templates (7): rego-policy.rego.tera, rego-validation.rego.tera, opa-deployment.yaml.tera, opa-service.yaml.tera, bundle-configmap.yaml.tera, bundle-refresh-cronjob.yaml.tera, plus dbt test template
  • Schemas (3): rules.schema.json, policies.schema.json, validation-response.schema.json
| Command | Description |
| --- | --- |
| generate | Generate Rego policies and K8s manifests (optionally filter by --domain) |
| deploy | Deploy to CaaS cluster (planned) |
| evaluate | Local OPA policy evaluation against input data |
| dbt | Generate dbt test files (optionally filter by --gate) |

Stub — All 7 templates and 3 schemas present. CaaS access confirmed (Rancher token active). Framework implemented; execution pending full wiring.


Phase: 4 — Deploy
Purpose: Generate OpenAPI 3.x specifications from entity models and publish to Mulesoft Anypoint Platform for managed API exposure.
Parallelizable: Yes (parallel within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | Governance metadata | models/{entity}/governance.json | JSON |
| Output | OpenAPI spec | apis/{entity}/openapi.yaml | YAML (OpenAPI 3.x) |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| Mulesoft | MulesoftClient | Anypoint OAuth | IF-10 | A09 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tera (templates), tokio, serde
  • Templates: Planned (OpenAPI YAML template)
| Command | Description |
| --- | --- |
| generate | Generate OpenAPI 3.x specification |
| publish | Publish to Mulesoft Anypoint (planned) |

Planned — Framework in place. No templates yet. StubMulesoftClient provides fixture responses. Blocked on A09 (Mulesoft access).


Phase: 4 — Deploy
Purpose: Generate MCP (Model Context Protocol) tool definitions that expose Gold data products to AI agents — Cortex Analyst, Claude, and other MCP-compatible agents.
Parallelizable: Yes (parallel within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | Semantic view definition | dbt/models/semantic/{entity}.semantic.yml | YAML |
| Output | MCP tool definition | apis/{entity}/mcp_tool.json | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| MCP Registry | McpRegistryClient | TBD | IF-11 | | Planned |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tera (templates), tokio, serde
  • Templates: Planned (MCP tool JSON template)
| Command | Description |
| --- | --- |
| generate | Generate MCP tool definition |
| register | Register with MCP registry (hosting TBD) |

Planned — Framework in place. MCP hosting model under investigation (Snowflake MCP not available on company account).


Phase: 4 — Deploy
Purpose: Generate type-safe SDK clients for programmatic data product access — Python package and cross-compiled Rust CLI targeting 5 platforms.
Parallelizable: Yes (parallel within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | Data contract | models/{entity}/datacontract.yaml | YAML |
| Output | Python SDK | sdks/{entity}/python/ | Python package |
| Output | CLI SDK source | sdks/{entity}/cli/ | Rust source |

None — pure code generation module.

  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Pure code generation.
  • Key crates: tera (templates), serde_json, serde_yaml, chrono
  • Templates: Planned (Python + Rust CLI templates)
| Command | Description |
| --- | --- |
| python | Generate Python SDK package |
| cli generate | Generate Rust CLI SDK source |
| cli build | Cross-compile CLI binaries (optionally --target <triple>) |

Planned — Framework implemented. No templates yet. Pure generation module with no external dependencies.


Phase: 4 — Deploy
Purpose: Generate a datacontract.com 1.1.0 YAML specification for the entity’s Gold and Semantic views. The contract defines schema, SLA, quality expectations, and ownership in a machine-readable format.
Parallelizable: Yes (parallel within Phase 4)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | Governance metadata | models/{entity}/governance.json | JSON |
| Input | GUPRI record | models/{entity}/gupri.yaml | YAML |
| Output | Data contract | models/{entity}/datacontract.yaml | YAML (datacontract.com 1.1.0) |

None — pure template rendering module.

  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Pure template rendering.
  • Key crates: tera (templates), jsonschema (output validation), serde_yaml, chrono
  • Templates (1): contract.yaml.tera
  • Schemas (1): contract.schema.json (validates generated output)
| Command | Description |
| --- | --- |
| generate | Generate datacontract.yaml for an entity |

Production-ready — Fully implemented with template rendering, schema validation, GUPRI integration, and dry-run support. Unit tests verify schema compliance.


Phase: 5 — Register
Purpose: Register the deployed data product across enterprise discovery and governance systems — push lineage to Collibra, register in Snowflake Horizon, and catalog in Roche Data Marketplace.
Parallelizable: Yes (parallel with gupri and search within Phase 5)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | All Phase 4 deployment results | deploy/*-result.json | JSON |
| Input | Entity model + governance | models/{entity}/ | JSON + YAML |
| Output | Registration confirmations | register/register-result.json | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| Collibra | CollibraClient | OAuth | IF-06 | A08 | Stub |
| Snowflake Horizon | HorizonClient | OAuth (WAM) | IF-14 | | Stub |
| Data Marketplace | (TBD) | TBD | IF-15 | A15 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tokio, serde
  • Templates: None
| Command | Description |
| --- | --- |
| collibra | Push lineage records to Collibra |
| horizon | Register in Snowflake Horizon |
| rdm | Register in Roche Data Marketplace |
| all | Run all three registrations |

Planned — Framework in place. All client traits defined with stubs. Blocked on A07/A08 (Collibra), A15 (Data Marketplace).


Phase: 5 — Register
Purpose: Register artifacts with GUPRI (Globally Unique Persistent Roche Identifier) to obtain resolvable URIs for every data product artifact.
Parallelizable: Yes (parallel within Phase 5)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity ID + artifact type | CLI arguments | String |
| Output | GUPRI record | models/{entity}/gupri.yaml | YAML |
| Output | Module result envelope | stdout (when --json) | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| GUPRI | GupriClient | OAuth (PingFederate) | IF-12, IF-13 | A02 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tokio, reqwest, async-trait, jsonschema, serde_yaml
  • Schemas (1): gupri.schema.json (validates GUPRI records)
  • Templates: None
| Command | Description |
| --- | --- |
| register | Register artifact and obtain GUPRI URI (--artifact-type) |
| resolve | Resolve an existing GUPRI URI to its record |

Production-ready — Fully implemented with StubGupriClient, schema validation, YAML output, and dry-run support. Pending A02 for live GUPRI API integration.


Phase: 5 — Register
Purpose: Push offline documentation to the Sinequa enterprise search engine for data product discovery across Roche.
Parallelizable: Yes (parallel within Phase 5)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | All generated artifacts | Various paths | Mixed |
| Output | Search index push confirmation | register/search-result.json | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| Sinequa | SinequaClient | TBD | IF-16 | A17 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tokio, serde
  • Templates: None
| Command | Description |
| --- | --- |
| push | Push documentation to Sinequa search index |

Planned — Framework in place. StubSinequaClient defined. Integration mechanism (API vs. file drop) TBD. Blocked on A17.


Phase: 6 — Support
Purpose: Generate Starlight reference documentation from all pipeline artifacts — clap definitions, JSON Schemas, ADRs, contracts, and API specs. The docs site is a build artifact that cannot drift from implementation.
Parallelizable: Yes (parallel with cidb and event within Phase 6)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | All Phase 4 artifacts | Various | Mixed (SQL, YAML, JSON, Rego) |
| Input | ADR files | adr/ | Markdown |
| Input | CLI definitions | Embedded in binaries | Clap metadata |
| Output | Reference docs | docs/src/content/docs/reference/ | Markdown |
| Output | Architecture docs | docs/src/content/docs/architecture/ | Markdown |
| Output | Status docs | docs/src/content/docs/status/ | Markdown |

None — pure generation from local files.

  • Language: Rust (edition 2021)
  • Async: No — synchronous (Pattern B). Pure file reading and template rendering.
  • Key crates: serde_json, serde
  • Templates: Planned (Markdown templates for each doc type)
| Command | Description |
| --- | --- |
| generate | Generate all Starlight reference documentation |

Planned — Framework in place. Generation logic not yet implemented. Output directories (reference/, architecture/, status/) are exclusively owned by this module — no manual editing.


Phase: 6 — Support
Purpose: Create ServiceNow change management records (CIDM) for production deployments. Provides an audit trail for change control compliance.
Parallelizable: Yes (parallel within Phase 6)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Deployment results | Phase 4–5 result envelopes | JSON |
| Input | Entity metadata | models/{entity}/ | JSON + YAML |
| Output | Change request record | support/cidb-result.json | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| ServiceNow | ServicenowClient | OAuth | IF-20 | A12 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tokio, serde
  • Templates: None
| Command | Description |
| --- | --- |
| register | Create ServiceNow change request for deployment |

Planned — Framework in place. StubServicenowClient defined. Blocked on A12 (ServiceNow access).


Phase: 6 — Support
Purpose: Publish data product lifecycle events to the Solace enterprise event bus — notifying downstream systems and consumers of new, updated, or deprecated data products.
Parallelizable: Yes (parallel within Phase 6)

| Direction | Artifact | Path | Format |
| --- | --- | --- | --- |
| Input | Entity model | models/{entity}/model.json | JSON |
| Input | GUPRI record | models/{entity}/gupri.yaml | YAML |
| Output | Event publication confirmation | support/event-result.json | JSON |

| System | Client Trait | Auth | Interface | Access Task | Status |
| --- | --- | --- | --- | --- | --- |
| Solace | SolaceClient | Token | IF-19 | A04 | Stub |
  • Language: Rust (edition 2021)
  • Async: Yes — #[tokio::main] (Pattern A)
  • Key crates: tokio, chrono, serde, serde_json, serde_yaml, jsonschema
  • Schemas (1): event.schema.json (validates event payloads)
  • Event types: Created, Updated, Verified, SupersededBy
  • Topic pattern: rdt/data-product/{entity_id}/{event_type}
| Command | Description |
| --- | --- |
| publish | Publish lifecycle event (optionally --event-type, default: created) |

Production-ready — Fully implemented with client certificate authentication (PEM), schema-validated event payloads, topic-based routing, and dry-run support. A04 resolved 2026-05-07. Additionally, all CLI modules now publish automatic execution events to Solace via rdt-model-common/events.rs.
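The documented topic pattern `rdt/data-product/{entity_id}/{event_type}` can be sketched as a small topic builder. The lowercase segment names below are an assumption for illustration — the real casing convention is defined by the module, not by this sketch.

```rust
/// The four documented lifecycle event types.
#[derive(Debug, Clone, Copy)]
pub enum EventType {
    Created,
    Updated,
    Verified,
    SupersededBy,
}

impl EventType {
    fn as_segment(&self) -> &'static str {
        // Assumed lowercase topic segments (illustrative).
        match self {
            EventType::Created => "created",
            EventType::Updated => "updated",
            EventType::Verified => "verified",
            EventType::SupersededBy => "superseded-by",
        }
    }
}

/// Build a Solace topic following the documented pattern
/// rdt/data-product/{entity_id}/{event_type}.
pub fn event_topic(entity_id: &str, event_type: EventType) -> String {
    format!("rdt/data-product/{}/{}", entity_id, event_type.as_segment())
}
```

Topic-based routing lets consumers subscribe to a single entity, a single event type, or everything under `rdt/data-product/` via wildcards.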


rdt-model-common is the [lib] member of the Cargo workspace. Every rdt-model-* binary depends on it. It provides no CLI interface — only shared types, traits, and utilities.

rdt-model-common/
├── src/
│ ├── lib.rs ← module exports
│ ├── cli.rs ← GlobalOpts: --target, --entity, --dry-run, --quiet, --json, --verbose
│ ├── config.rs ← Config: roche-data.toml + env var overrides + environment resolution
│ ├── paths.rs ← All output path functions (one per artifact type)
│ ├── errors.rs ← CliError enum with exit code mapping
│ ├── exit_codes.rs ← Standard exit codes per ADR 0008
│ ├── fs.rs ← write_artifact() + write_json_artifact() → OutputAction
│ ├── reporting.rs ← init_tracing(), ModuleResultBuilder, ModuleResult, OutputAction
│ ├── run_id.rs ← UUIDv7 run correlation IDs
│ ├── models/
│ │ └── mod.rs ← Entity, GupriRecord, CollibraMetadata, RulesDefinition, etc.
│ ├── clients/
│ │ ├── mod.rs ← re-exports all client traits
│ │ ├── rtis.rs ← RTisClient trait + StubRTisClient
│ │ ├── collibra.rs ← CollibraClient trait + HttpCollibraClient + StubCollibraClient
│ │ ├── gupri.rs ← GupriClient trait + StubGupriClient
│ │ ├── snowflake.rs ← SnowflakeClient trait + StubSnowflakeClient
│ │ ├── snowflake_auth.rs ← SnowflakeAuth (OAuth WAM token exchange)
│ │ ├── postgres_auth.rs ← PostgresAuth (Vault credential retrieval)
│ │ ├── horizon.rs ← HorizonClient trait + StubHorizonClient
│ │ ├── solace.rs ← SolaceClient trait + StubSolaceClient
│ │ ├── mulesoft.rs ← MulesoftClient trait + StubMulesoftClient
│ │ ├── mcp_registry.rs ← McpRegistryClient trait + StubMcpRegistryClient
│ │ ├── servicenow.rs ← ServicenowClient trait + StubServicenowClient
│ │ ├── sinequa.rs ← SinequaClient trait + StubSinequaClient
│ │ ├── rdm.rs ← RdmClient trait + StubRdmClient
│ │ └── llm.rs ← LlmClient trait + StubLlmClient
│ └── json/
│ ├── mod.rs ← re-exports
│ ├── handler.rs ← JsonHandler: simd-json parse, jsonschema validate, BufWriter write
│ ├── schema_cache.rs ← Lazy OnceLock-based compiled schema cache
│ └── errors.rs ← JsonValidationError enum
└── schemas/
├── manifest.json ← Shared base manifest schema
└── result.json ← Shared result envelope schema

Client Trait Pattern

Every external system has a Rust trait and two implementations: a real HTTP client and a stub. The binary layer receives &dyn SystemClient via dependency injection, enabling transparent switching between production and dry-run mode.

graph LR
Binary["**Binary Module**<br/>e.g., rdt-model-pull"]
Trait["**Trait (async)**<br/>e.g., RTisClient<br/>get_entity()<br/>list_entities()"]
Http["**HttpRTisClient**<br/>(live HTTP)"]
Stub["**StubRTisClient**<br/>(fixture data)"]
Binary --> Trait
Http -.->|implements| Trait
Stub -.->|implements| Trait

Client selection logic: if --dry-run is set or credentials are missing, the module uses the stub client. Otherwise, it instantiates the real HTTP client. This is a configuration decision, not a code change.
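A minimal sketch of the trait-plus-stub pattern and the selection rule, in simplified synchronous form (the real traits are async and the live client is an HTTP implementation; the fixture payload here is invented for illustration):

```rust
/// Simplified, synchronous sketch of a client trait
/// (the real traits are async).
pub trait RTisClient {
    fn get_entity(&self, entity_id: &str) -> Result<String, String>;
}

pub struct StubRTisClient;

impl RTisClient for StubRTisClient {
    fn get_entity(&self, entity_id: &str) -> Result<String, String> {
        // Deterministic fixture data, as in stub-first development.
        Ok(format!("{{\"entity\": \"{}\", \"source\": \"stub\"}}", entity_id))
    }
}

/// Selection rule from the text: stub when --dry-run is set or
/// credentials are missing; otherwise the live HTTP client.
pub fn select_client(dry_run: bool, has_credentials: bool) -> Box<dyn RTisClient> {
    if dry_run || !has_credentials {
        Box::new(StubRTisClient)
    } else {
        // In the real code this branch would construct the HTTP client,
        // e.g. HttpRTisClient::new(...); elided in this sketch.
        Box::new(StubRTisClient)
    }
}
```

Because the binary only ever holds a `Box<dyn RTisClient>` (or `&dyn`), swapping implementations never touches command logic.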

JSON Handling (json/)

Centralised JSON processing with three guarantees:

  1. Parsing: simd_json for 2–4x faster throughput with SIMD acceleration
  2. Validation: Three-layer approach — jsonschema at entry, serde types for structure, garde for business logic
  3. Writing: Direct-to-file via BufWriter (no full string allocation)

The JsonHandler facade provides parse() for internal/trusted data and parse_validated() for external input that requires schema validation. A lazy OnceLock-based schema cache compiles schemas once and reuses them.
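The compile-once cache can be sketched with a `std::sync::OnceLock`, using a plain `String` as a stand-in for a compiled validator (the real cache stores jsonschema-compiled schemas, and the key names below are assumptions):

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Stand-in for a compiled JSON Schema validator (illustrative).
type CompiledSchema = String;

static SCHEMA_CACHE: OnceLock<HashMap<&'static str, CompiledSchema>> = OnceLock::new();

/// Compile all embedded schemas exactly once, then serve them
/// from the cache on every subsequent call.
pub fn compiled_schema(name: &str) -> Option<&'static CompiledSchema> {
    let cache = SCHEMA_CACHE.get_or_init(|| {
        let mut m = HashMap::new();
        // Real code would compile include_str!-embedded schemas here.
        m.insert("manifest", "compiled(manifest.json)".to_string());
        m.insert("result", "compiled(result.json)".to_string());
        m
    });
    cache.get(name)
}
```

`OnceLock::get_or_init` guarantees the compilation closure runs at most once even under concurrent access, so repeated validations pay only a map lookup.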

Path Management (paths.rs)

One function per artifact type. All output paths are constructed here — never inline in commands or generators. Adding a new artifact type means adding one function.
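The convention can be sketched as follows; the two functions mirror paths named elsewhere in this document, but their signatures are illustrative, not the actual `paths.rs` API:

```rust
use std::path::PathBuf;

/// One function per artifact type — callers never build paths inline.
/// (Illustrative signatures; paths match those documented above.)
pub fn model_path(entity: &str) -> PathBuf {
    PathBuf::from(format!("models/{entity}/model.json"))
}

pub fn bronze_ddl_path(entity: &str) -> PathBuf {
    PathBuf::from(format!("snowflake/ddl/{entity}.sql"))
}
```

Centralising construction means a path change is a one-line edit, and grep for the function name finds every consumer.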

Reporting (reporting.rs)

Two-track system used by every binary:

| Track | Target | Purpose |
| --- | --- | --- |
| Structured tracing | stderr | Human-readable progress (info!, debug!, warn!) |
| Result envelope | stdout | Machine-readable JSON for orchestrator integration |

init_reporting() is called first in every main(). Verbosity is controlled by --verbose / --quiet / RUST_LOG.
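The stdout/stderr split can be sketched as below. The struct and field names are invented for illustration — the real envelope is serde-serialised against the shared result.json schema:

```rust
/// Illustrative sketch of the two-track contract: the envelope owns
/// stdout; human-readable progress stays on stderr.
pub struct ModuleResult {
    pub module: String,
    pub run_id: String,
    pub success: bool,
}

impl ModuleResult {
    /// Hand-rolled JSON for the sketch (real code uses serde).
    pub fn to_json(&self) -> String {
        format!(
            "{{\"module\":\"{}\",\"run_id\":\"{}\",\"success\":{}}}",
            self.module, self.run_id, self.success
        )
    }

    pub fn emit(&self, json_mode: bool) {
        if json_mode {
            // stdout carries only the machine-readable envelope,
            // so the orchestrator can parse it without filtering.
            println!("{}", self.to_json());
        } else {
            eprintln!("{} finished (run {})", self.module, self.run_id);
        }
    }
}
```

Keeping stdout clean is what lets the orchestrator run children with `--json` and parse their output directly.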

Filesystem Helpers (fs.rs)

write_artifact() and write_json_artifact() handle file output. They return OutputAction (Wrote, Skipped, Updated) which feeds into the result envelope. They respect --dry-run mode and use tracing for progress reporting.
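A minimal sketch of `write_artifact()` with the `OutputAction` variants named above; error handling and tracing are simplified, and the exact semantics of the real helper may differ:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Mirrors the OutputAction variants named in the text.
#[derive(Debug, PartialEq)]
pub enum OutputAction {
    Wrote,
    Updated,
    Skipped,
}

/// Sketch: skip in dry-run mode, otherwise write and report whether
/// the file was new (Wrote) or replaced (Updated).
pub fn write_artifact(path: &Path, content: &str, dry_run: bool) -> io::Result<OutputAction> {
    if dry_run {
        return Ok(OutputAction::Skipped);
    }
    let existed = path.exists();
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, content)?;
    Ok(if existed { OutputAction::Updated } else { OutputAction::Wrote })
}
```

Returning the action (rather than `()`) is what lets the result envelope report exactly which artifacts changed on a given run.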


Two Streamlit in Snowflake applications provide consumer-facing interfaces outside the CLI pipeline. They are Python applications deployed directly into Snowflake’s Streamlit hosting environment.

Purpose: Render a data entry form from a crud.json artifact, enabling create/read/update/delete operations against an entity’s Snowflake tables.

| Attribute | Value |
| --- | --- |
| Directory | ui/crud/ |
| Entry point | app.py |
| Input | models/{entity}/crud.json (generated by pipeline) |
| Dependencies | streamlit, snowflake-snowpark-python |
| Schema | schemas/crud.schema.json |
| Deployed to | Streamlit in Snowflake |
| Status | Scaffold (form rendering from spec; CRUD operations not yet wired) |

Data flow: The CLI pipeline generates crud.json → Streamlit app reads the spec → renders dynamic form → executes SQL against Snowflake session.

rdt-ui-ratification (Steward Ratification)

Purpose: Enable data stewards to review and approve/change taxonomies, synonyms, definitions, and data tags (PII, classification, usage restrictions) for an entity.

| Attribute | Value |
| --- | --- |
| Directory | ui/ratification/ |
| Entry point | app.py |
| Input | models/{entity}/model.json + models/{entity}/governance.json |
| Dependencies | streamlit, snowflake-snowpark-python |
| Deployed to | Streamlit in Snowflake |
| Status | Scaffold (taxonomy display; approval workflow not yet wired) |

Data flow: Steward opens app → reviews entity metadata (taxonomies, definitions, governance) → approves or requests changes → changes feed back into the pipeline as governance updates.


4.4.21 Cross-Cutting Architectural Patterns


These patterns are enforced across all 18 binary modules.

Generators are pure functions that take a resolved Model and return Result<String>. No filesystem access, no HTTP calls, no side effects. The command layer handles input loading and output writing. Generators live in the module that owns the artifact — never in rdt-model-compile.

use anyhow::{Context, Result};

pub fn generate_datacontract(
    model: &Model,
    gupri: &GupriRecord,
    governance: &CollibraMetadata,
) -> Result<String> {
    let tmpl = include_str!("../templates/contract.yaml.tera");
    let mut ctx = tera::Context::new();
    ctx.insert("model", model);
    ctx.insert("gupri", gupri);
    ctx.insert("governance", governance);
    tera::Tera::one_off(tmpl, &ctx, false)
        .context("failed to render data contract template")
}

All Tera templates are embedded at compile time using include_str!. The binary is fully self-contained — no runtime template file access. Template changes require recompilation, which triggers CI validation.

Every binary calls cli.global.init_reporting() as the first action in main(). All progress uses tracing macros (never println!). When --json is passed, a machine-readable result envelope is emitted to stdout for orchestrator consumption.

Every rdt-model-* command requires --target dev|test|prod. There is no default. The target drives:

  • Snowflake schema prefix (DEV_BRONZE, TEST_BRONZE, PROD_BRONZE)
  • Kubernetes namespace (rdt-model-dev, rdt-model-test, rdt-model-prod)
  • Vault secret path (secret/dev/ci, secret/test/ci, secret/prod/ci)
  • dbt target profile
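The target-to-name mapping above is purely mechanical and can be sketched as a small enum; the method names here are illustrative, not the actual config API:

```rust
/// Sketch of the names driven by --target, per the list above.
#[derive(Clone, Copy)]
pub enum Target {
    Dev,
    Test,
    Prod,
}

impl Target {
    fn slug(&self) -> &'static str {
        match self {
            Target::Dev => "dev",
            Target::Test => "test",
            Target::Prod => "prod",
        }
    }

    /// e.g. schema("BRONZE") -> "DEV_BRONZE"
    pub fn schema(&self, layer: &str) -> String {
        format!("{}_{}", self.slug().to_uppercase(), layer)
    }

    /// e.g. "rdt-model-prod"
    pub fn namespace(&self) -> String {
        format!("rdt-model-{}", self.slug())
    }

    /// e.g. "secret/test/ci"
    pub fn vault_path(&self) -> String {
        format!("secret/{}/ci", self.slug())
    }
}
```

Making the target mandatory (no default) forces every invocation to state which of these name sets it will touch.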

anyhow::Result throughout. Every ? has .context(...) for error chain clarity. No .unwrap() or .expect() outside #[cfg(test)]. Exit codes are standardised: 0 (success), 1 (runtime error), 2 (validation error), 3 (config error).
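The exit-code mapping can be sketched as follows; the variant names are illustrative stand-ins for the real `CliError` enum in errors.rs:

```rust
/// Sketch of the error-to-exit-code mapping described above.
/// (Variant names are illustrative, per ADR 0008's standard codes.)
#[derive(Debug)]
pub enum CliError {
    Runtime(String),    // exit code 1
    Validation(String), // exit code 2
    Config(String),     // exit code 3
}

impl CliError {
    pub fn exit_code(&self) -> i32 {
        match self {
            CliError::Runtime(_) => 1,
            CliError::Validation(_) => 2,
            CliError::Config(_) => 3,
        }
    }
}
```

A stable mapping lets the orchestrator and CI distinguish "artifact failed validation" from "misconfigured environment" without parsing error text.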

Every external system integration follows the stub-first pattern. Modules implement the full interface using StubClient implementations that return fixture data. Switching to live is a configuration change (credentials present + --dry-run not set), not a code change. This keeps the full pipeline runnable without any credentials.


All environments share the same physical infrastructure. Separation is achieved through configuration — schema prefixes, namespaces, and Vault paths — not through separate systems. See ADR 0010 for full rationale.

graph TD
subgraph Snowflake["**Snowflake (Cloud)**<br/>Account: roche-gsn | Database: RDT_MODEL"]
subgraph SFDev["DEV"]
DEV_B["DEV_BRONZE"]
DEV_S["DEV_SILVER"]
DEV_G["DEV_GOLD"]
DEV_SE["DEV_SEMANTIC"]
end
subgraph SFTest["TEST"]
TEST_B["TEST_BRONZE"]
TEST_S["TEST_SILVER"]
TEST_G["TEST_GOLD"]
TEST_SE["TEST_SEMANTIC"]
end
subgraph SFProd["PROD"]
PROD_B["PROD_BRONZE"]
PROD_S["PROD_SILVER"]
PROD_G["PROD_GOLD"]
PROD_SE["PROD_SEMANTIC"]
end
SiS["Streamlit: rdt-ui-crud, rdt-ui-ratification"]
Cortex["Cortex Analyst"]
end
subgraph K8s["**CaaS / Kubernetes (Rancher)**<br/>Cluster: Cloud Prod eu-central-1 | Project: rdt_model"]
K8sDev["ns: rdt-model-dev<br/>OPA pods + CronJobs"]
K8sTest["ns: rdt-model-test<br/>OPA pods + CronJobs"]
K8sProd["ns: rdt-model-prod<br/>OPA pods + CronJobs"]
end
subgraph VaultSvc["**HashiCorp Vault**<br/>Auth: OIDC + AppRole | KV v2"]
VDev["secret/dev/ci/"]
VTest["secret/test/ci/"]
VProd["secret/prod/ci/"]
VCommon["secret/common/caas"]
end
subgraph GHA["**GitHub Actions (CI/CD)**"]
Workflows["validate.yml, deploy.yml, docs.yml"]
Envs["Environments: dev, test, prod"]
Runners["Runners: Roche VPN-connected"]
end
subgraph Ping["**PingFederate / WAM (Identity)**"]
OAuth["OAuth 2.0 client_credentials"]
WAM["Snowflake WAM integration"]
end
graph TD
subgraph Internal["**ROCHE INTERNAL NETWORK**"]
Dev["Developer Workstation<br/>rdt-model-* CLI"]
GHA["GitHub Actions Runner<br/>rdt-model-* CI (VPN)"]
subgraph VPN["**Roche VPN / Corporate Network**"]
RTiS2["RTiS (AWS+VPN)"]
GUPRI2["GUPRI (AWS+VPN)"]
MRHub2["MRHub (AWS+VPN)"]
Vault2["Vault (internal)"]
CaaS2["CaaS/Rancher"]
Ping2["PingFederate"]
SN2["ServiceNow"]
Art2["Artifactory"]
end
Dev --> VPN
GHA --> VPN
end
subgraph Cloud["**CLOUD / EXTERNAL**"]
SF2["Snowflake (HTTPS)"]
Bedrock2["AWS Bedrock (Claude)"]
Mule2["Mulesoft Anypoint"]
Sol2["Solace (Event Bus)"]
end
VPN -->|HTTPS| Cloud

Key network constraints:

| System | Network Zone | Access Method |
| --- | --- | --- |
| RTiS, GUPRI, MRHub | AWS behind Roche VPN | HTTPS from VPN-connected clients |
| Vault, CaaS, PingFederate | Roche internal | Direct internal HTTPS |
| Snowflake | Cloud (public endpoint) | HTTPS with OAuth (WAM/PingFederate) |
| AWS Bedrock | AWS Cloud | HTTPS with IAM SigV4 |
| Mulesoft, Solace | Cloud/Hybrid | HTTPS with OAuth |
| GitHub Actions | Cloud runners + VPN | Self-hosted runners on Roche VPN |

All environments share identical infrastructure. Isolation is achieved through configuration at three levels:

| Layer | Dev | Test | Prod |
| --- | --- | --- | --- |
| Snowflake schemas | DEV_BRONZE, DEV_SILVER, DEV_GOLD, DEV_SEMANTIC | TEST_BRONZE, TEST_SILVER, TEST_GOLD, TEST_SEMANTIC | PROD_BRONZE, PROD_SILVER, PROD_GOLD, PROD_SEMANTIC |
| K8s namespace | rdt-model-dev | rdt-model-test | rdt-model-prod |
| Vault path | secret/dev/ci/ | secret/test/ci/ | secret/prod/ci/ |
| dbt target | dev | test | prod |
| GitHub Environment | dev (auto-deploy) | test (manual approval) | prod (reviewer approval) |

Config resolution:

base roche-data.toml → [environments.{target}] overrides → env var overrides
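The three-step precedence above can be sketched as a single resolution function — highest-priority source wins. This is an illustrative simplification of what config.rs does, not its actual signature:

```rust
/// Sketch of config resolution precedence:
/// env var override > [environments.{target}] override > base value.
pub fn resolve(
    base: &str,
    env_override: Option<&str>,
    env_var: Option<&str>,
) -> String {
    env_var
        .or(env_override)
        .unwrap_or(base)
        .to_string()
}
```

For example, a warehouse size set in the base file is superseded by the target's `[environments.prod]` block, which in turn yields to an explicit environment variable.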

CI/CD promotion flow:

graph LR
Push["Push to main"] --> DEV["Deploy to DEV<br/>(auto)"]
DEV --> TEST["Deploy to TEST<br/>(manual approval)"]
TEST --> PROD["Deploy to PROD<br/>(reviewer approval)"]

OPA policy containers are built and deployed to CaaS Kubernetes:

| Component | Registry | Image | Deployment |
| --- | --- | --- | --- |
| OPA sidecar | Roche Artifactory | artifactory.roche.com/rdt-model/opa:{version} | K8s Deployment |
| Bundle refresh | Roche Artifactory | artifactory.roche.com/rdt-model/bundle-refresh:{version} | K8s CronJob |

Images are built in GitHub Actions, pushed to Artifactory, and deployed via generated Kubernetes manifests. Each entity gets its own OPA deployment with entity-specific Rego bundles.

graph TD
subgraph SF["**SNOWFLAKE — RDT_MODEL Database**"]
subgraph Bronze["{ENV}_BRONZE (physical — append-only)"]
BWT["waste_tracking"]
BSE["site_energy"]
BVQ["vendor_quality"]
end
subgraph Silver["{ENV}_SILVER (views — G2 validity)"]
SWT["waste_tracking_silver"]
SSE["site_energy_silver"]
SVQ["vendor_quality_silver"]
end
subgraph Gold["{ENV}_GOLD (views — G3 business rules)"]
GWT["waste_tracking_gold"]
GSE["site_energy_gold"]
GVQ["vendor_quality_gold"]
end
subgraph Semantic["{ENV}_SEMANTIC (views — Cortex Analyst)"]
SMWT["waste_tracking_semantic"]
SMSE["site_energy_semantic"]
SMVQ["vendor_quality_semantic"]
end
Audit["AUDIT (cross-env, append-only)<br/>pipeline_audit_log"]
end
Bronze --> Silver --> Gold --> Semantic

Key design decisions (from ADR 0004):

  • Bronze is the only physical write. Silver, Gold, and Semantic are views.
  • Views eliminate schema migration at Silver/Gold/Semantic layers.
  • DQ gates run at query time (view predicates), not at write time.
  • Snowflake result cache and micro-partition pruning handle view performance.

| Profile | Description | Scale | Primary Interaction |
| --- | --- | --- | --- |
| Data Engineer | Roche domain data engineers who define entities, author rules, and run the pipeline. Power CLI users. | 5–15 across all domains (Phase 0–2), scaling to 50+ (Phase 5) | CLI + Git |
| Data Steward | Governance professionals maintaining metadata in Collibra. Non-technical, use Streamlit UI for ratification. | 3–10 per domain | Streamlit UI + Collibra |
| Platform Admin | Team maintaining the CLI codebase, templates, CI/CD, and infrastructure. | 2–5 | Rust development + GitHub |
| Domain Expert | Business analysts refining Gold rules and Semantic definitions via PR review. | 10–30 per domain | PR review + YAML authoring |
| Consumer (Human) | Analysts and scientists querying data products via SQL, SDK, or Cortex Analyst. | 100–1000+ per domain | SQL + SDK + NLQ |
| Consumer (AI Agent) | AI agents accessing data products via MCP tools. | Unbounded | MCP protocol |
| CI/CD Pipeline | GitHub Actions workflows running the pipeline on every merge. | Concurrent per entity × environment | CLI (--json mode) |
| Operation | Target | Constraint |
| --- | --- | --- |
| Full pipeline (compile run) | < 5 minutes per entity | Includes all 18 modules, stub mode |
| Single module (template rendering) | < 10 seconds | Pure Tera rendering, no network |
| Single module (API call + render) | < 30 seconds | Includes HTTP call + template rendering |
| Artifact validation (validate all) | < 15 seconds per entity | Schema validation of 20+ artifacts |
| Profile discovery | < 60 seconds per table | Database metadata extraction |

| Query Pattern | Target | Mechanism |
| --- | --- | --- |
| Gold view — single entity KPI | < 5 seconds | Snowflake result cache + micro-partition pruning |
| Semantic view — Cortex Analyst query | < 10 seconds | NLQ → SQL → view chain |
| Silver view — full entity scan | < 30 seconds | Columnar scan, partition pruning on date |
| Bronze table — historical backfill query | < 60 seconds | Clustering key on reporting_date |

| Stage | Target |
| --- | --- |
| PR validation (compile + validate) | < 3 minutes |
| Full deployment (dev) | < 10 minutes |
| Promotion (test → prod) | < 5 minutes (after approval) |
| Phase | Entity Count | Domains | Concurrent Pipelines |
| --- | --- | --- | --- |
| Phase 0–1 (current) | 3–5 entities | 1 (Global Sites Network) | 1 |
| Phase 2–3 | 20–50 entities | 2–3 domains | 5 |
| Phase 4–5 | 100–500 entities | 10+ domains | 20 |

| Storage | Growth Model | Retention |
| --- | --- | --- |
| Git repository (artifacts) | ~500 KB per entity (20+ files) | Indefinite (git history) |
| Snowflake Bronze tables | Append-only, entity-dependent | Time-travel + retention policy (TBD) |
| OPA bundles (K8s ConfigMaps) | ~10 KB per entity | Current version only |
| Docker images (Artifactory) | ~50 MB per OPA image version | Last 5 versions |

| Environment | Warehouse Size | Auto-suspend | Usage Pattern |
| --- | --- | --- | --- |
| Dev | X-Small | 60 seconds | Interactive development |
| Test | Small | 120 seconds | CI/CD validation runs |
| Prod | Medium | 300 seconds | Scheduled pipeline + analyst queries |
| Component | Availability Target | Mechanism |
|---|---|---|
| Snowflake (query) | 99.9% (platform SLA) | Snowflake managed HA, multi-AZ |
| CaaS/K8s (OPA) | 99.5% | Replica count ≥ 2 for prod, health checks |
| GitHub Actions (CI/CD) | 99.9% (platform SLA) | GitHub managed |
| Vault (secrets) | 99.9% | Vault HA cluster (Roche managed) |
| Pipeline execution | Best-effort | Retry on transient failures; stub fallback |
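The "retry on transient failures" policy for pipeline execution can be sketched as a bounded exponential backoff loop. This is a minimal illustration, not the actual implementation; the function name `run_with_retry`, the attempt limit, and the 100 ms base delay are assumptions for the example.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical sketch of a best-effort retry policy: run a fallible
/// operation up to `max_attempts` times, backing off exponentially
/// (100 ms, 200 ms, 400 ms, ...) between attempts.
fn run_with_retry<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e); // attempts exhausted: surface the last error
                }
                // back off before the next attempt
                sleep(Duration::from_millis(100 * 2u64.pow(attempt - 1)));
            }
        }
    }
}

fn main() {
    // Simulated transient failure: fails twice, then succeeds.
    let mut calls = 0;
    let result = run_with_retry(5, || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok("ok") }
    });
    assert_eq!(result, Ok("ok"));
    assert_eq!(calls, 3);
}
```

A real module would additionally distinguish transient errors (timeouts, 5xx) from permanent ones (auth failures) before retrying, and fall back to the stub client where configured.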
| Component | RPO | RTO | Strategy |
|---|---|---|---|
| Source code + artifacts | 0 (git) | Minutes | Git clone from GitHub (distributed) |
| Snowflake data | Per Snowflake Time Travel (up to 90 days) | Hours | Snowflake native DR (failover) |
| OPA policies | 0 (git) | Minutes | Re-deploy from git (K8s manifests) |
| Vault secrets | Per Vault snapshot schedule | Hours | Vault snapshot restore |
| Pipeline state | N/A (stateless) | Immediate | Re-run pipeline (idempotent) |

Git as the artifact store provides inherent DR: all generated artifacts are committed to git, and the repository is the source of truth. Any lost deployment can be recreated by re-running the pipeline against the committed model.

| Concern | Control |
|---|---|
| Authentication | OAuth 2.0 via PingFederate (all systems). AWS IAM for Bedrock. |
| Authorization | Snowflake RBAC (role per environment). K8s RBAC (namespace-scoped). Vault policies (path-scoped). |
| Secrets management | HashiCorp Vault (OIDC + AppRole). No secrets in code or CI variables. |
| Data classification | Collibra-sourced PII flags → column-level masking in Snowflake. |
| Audit | Append-only audit table in Snowflake. Git history for all artifact changes. |
| Network | Roche VPN for internal systems. HTTPS for all external calls. No plain HTTP. |
| Supply chain | `Cargo.lock` pinned. GitHub Dependabot for CVE alerts. |
| Aspect | Approach |
|---|---|
| Observability | Structured tracing (stderr) with level control (`--verbose`, `--quiet`, `RUST_LOG`). Machine-readable result envelopes (`--json`) for aggregation. |
| Debugging | `--dry-run` mode for safe testing. `--verbose` for full trace output. Workspace retention (`--keep-workspace`) for post-mortem inspection. |
| Code quality | `cargo clippy` (deny warnings). `cargo test --workspace` in CI. Integration test feature flag (`integration`). |
| Documentation | Auto-generated from artifacts by `rdt-model-docs`. Cannot drift from implementation. ADRs for architectural decisions. |
| Dependency management | Cargo workspace with shared dependency versions. Dependabot alerts. Minimal external dependencies for pure-rendering modules. |
| Template evolution | Template changes propagate to all entities on the next pipeline run. No per-entity customisation — consistency enforced by design. |
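A machine-readable result envelope of the kind `--json` mode emits could look like the sketch below. The field names (`module`, `status`, `duration_ms`) are assumptions for illustration; the production code would build this with `serde_json` from the stack table rather than hand-formatting strings.

```rust
/// Hypothetical sketch of a --json result envelope. Field names are
/// illustrative; real code would serialize a struct with serde_json
/// (which also handles string escaping correctly).
fn result_envelope(module: &str, status: &str, duration_ms: u64) -> String {
    format!(
        r#"{{"module":"{module}","status":"{status}","duration_ms":{duration_ms}}}"#
    )
}

fn main() {
    let out = result_envelope("rdt-model-pull", "success", 842);
    assert!(out.contains(r#""status":"success""#));
    println!("{out}");
}
```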

| Category | Technology | Version | Purpose |
|---|---|---|---|
| Language | Rust | Edition 2021 | CLI implementation (18 binaries + 1 library) |
| Build | Cargo | Workspace | Multi-crate build, dependency management |
| CLI framework | clap | 4.x | Command-line argument parsing, subcommands |
| Template engine | Tera | 1.x | Embedded template rendering (SQL, YAML, Rego, K8s manifests) |
| JSON parsing | simd-json | 0.14 | SIMD-accelerated JSON parsing (via JsonHandler) |
| JSON Schema | jsonschema | 0.18 | Artifact validation (Draft 2020-12) |
| Serialization | serde + serde_json + serde_yaml | 1.x | JSON/YAML serialization/deserialization |
| HTTP client | reqwest | 0.12 | External system API calls |
| Async runtime | tokio | 1.x | Async I/O for network-bound modules |
| Async traits | async-trait | 0.1 | Async trait definitions for client traits |
| Tracing | tracing + tracing-subscriber | 0.1 | Structured logging and diagnostics |
| Date/time | chrono | 0.4 | Timestamps, date handling |
| UUID | uuid | 1.x | UUIDv7 run correlation IDs |
| Compression | flate2 | 1.x | Optional gzip for profile output |
| Regex | regex | 1.x | SQL identifier validation, pattern matching |
| Env files | dotenvy | 0.15 | .env file loading |
| Data platform | Snowflake | — | Medallion architecture (Bronze/Silver/Gold/Semantic) |
| Transformation | dbt | Core | View generation, batch DQ tests |
| Policy engine | Open Policy Agent | 0.x | Real-time DQ enforcement, access control |
| Policy language | Rego | — | Policy definitions compiled from YAML DSL |
| Container platform | Kubernetes (Rancher/CaaS) | 1.x | OPA deployment, bundle refresh jobs |
| Container registry | Artifactory | — | Docker image storage for OPA containers |
| Secret management | HashiCorp Vault | — | OIDC + AppRole auth, KV v2 secrets |
| Identity provider | PingFederate | — | OAuth 2.0 (client_credentials) for all systems |
| CI/CD | GitHub Actions | — | Validate, deploy, docs workflows |
| Documentation | Starlight (Astro) | — | Generated reference documentation site |
| LLM | Claude on AWS Bedrock | — | Metadata enrichment (term mapping, descriptions) |
| UI framework | Streamlit in Snowflake | — | CRUD and ratification web applications |
| Data contract | datacontract.com | 1.1.0 | Machine-readable schema + SLA + quality spec |
| Event bus | Solace | — | Enterprise event publishing |
| Search | Sinequa | — | Enterprise search indexing |
| API gateway | Mulesoft (Anypoint) | — | Managed API publication |
| AI query | Snowflake Cortex Analyst | — | Natural language query over Semantic views |

| Path | Contents | Used by |
|---|---|---|
| `secret/common/caas` | Rancher token, cluster URL | `rdt-model-policy` (K8s deployment) |
| `secret/common/artifactory` | Docker registry credentials | CI/CD (image push) |
| `secret/common/github` | GitHub App credentials | CI/CD workflows |
| Path Pattern | Contents | Used by |
|---|---|---|
| `secret/{env}/ci/snowflake` | Snowflake OAuth client_id/secret, account, warehouse, role | `rdt-model-store`, all Snowflake ops |
| `secret/{env}/ci/collibra` | Collibra API client_id/secret, bridge key | `rdt-model-govern`, `rdt-model-register` |
| `secret/{env}/ci/rtis` | RTiS API credentials (Basic Auth or OAuth) | `rdt-model-pull` |
| `secret/{env}/ci/gupri` | GUPRI API credentials | `rdt-model-gupri` |
| `secret/{env}/ci/mulesoft` | Anypoint Platform credentials | `rdt-model-api` |
| `secret/{env}/ci/solace` | Solace connection credentials | `rdt-model-event` |
| `secret/{env}/ci/servicenow` | ServiceNow API credentials | `rdt-model-cidb` |
| `secret/{env}/ci/sinequa` | Sinequa API credentials | `rdt-model-search` |
| `secret/{env}/ci/bedrock` | AWS IAM credentials for Bedrock | `rdt-model-infer` |
| `secret/{env}/ci/postgres` | Aurora PostgreSQL credentials | `rdt-model-profile` |
| `secret/{env}/ci/mrhub` | MRHub API credentials | `rdt-model-policy` |

Here `{env}` is one of `dev`, `test`, or `prod`.
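The per-environment path convention can be captured by a small helper that rejects unknown environments before any Vault call is made. This is an illustrative sketch only; `vault_secret_path` is a hypothetical name, not a function in the codebase.

```rust
/// Hypothetical helper for the Vault path convention above:
/// per-environment CI secrets live under `secret/{env}/ci/{system}`,
/// and `{env}` must be one of dev, test, prod.
fn vault_secret_path(env: &str, system: &str) -> Result<String, String> {
    match env {
        "dev" | "test" | "prod" => Ok(format!("secret/{env}/ci/{system}")),
        other => Err(format!("unknown environment: {other}")),
    }
}

fn main() {
    // Valid environment: path is built from the pattern.
    assert_eq!(
        vault_secret_path("dev", "snowflake").as_deref(),
        Ok("secret/dev/ci/snowflake")
    );
    // Invalid environment: fail fast, before contacting Vault.
    assert!(vault_secret_path("staging", "snowflake").is_err());
}
```

Validating the environment at path-construction time keeps the dev/test/prod isolation a compile-adjacent invariant rather than a runtime Vault permission error.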


Access tasks track the provisioning of credentials and network paths to external systems. Each task is a GitHub Issue.

| ID | System | Description | Issue | Status |
|---|---|---|---|---|
| A01 | RTiS | REST API credentials + network path | #15 | Pending |
| A02 | GUPRI | REST API credentials + network path | #16 | Pending |
| A03 | MRHub | REST API credentials for G2 lookups | #24 | Not started |
| A04 | MRHub / Solace | Solace event subscription + publish credentials | #24 | Not started |
| A05 | Snowflake | Service account, database, schema provisioning | #23 | Partial (auth live) |
| A06 | Snowflake | Cortex Analyst feature enablement | #23 | Pending |
| A07 | Collibra | REST API credentials for governance metadata pull | #25 | Pending |
| A08 | Collibra | REST API credentials for lineage push | #25 | Pending |
| A09 | Mulesoft | Anypoint Platform API credentials | #26 | Pending |
| A10 | GitHub Actions | Workflow configuration + runner access | — | Done |
| A11 | GitHub Actions | Runner VPN access for internal systems | — | Done |
| A12 | ServiceNow | Table API credentials for CIDM | #27 | Pending |
| A13 | CaaS/K8s | Rancher access + namespace provisioning | #28 | Active |
| A14 | LeanIX | EA catalog API credentials (stretch) | #29 | Not started |
| A15 | Data Marketplace | Registry API credentials (stretch) | #30 | Not started |
| A16 | Vault | OIDC + AppRole configuration for CI | #70 | Done |
| A17 | Sinequa | Search API credentials + push mechanism | #80 | Pending |
| A18 | Aurora PostgreSQL | Database connection credentials for profiling | TBD | Not started |
| A19 | Snowflake WAM | OAuth token exchange configuration | TBD | Done |

| ADR | Title | Status | Sections Referenced |
|---|---|---|---|
| 0001 | Project Vision | Accepted | §1, §3, §4.1, §4.2 |
| 0002 | Rust as CLI Implementation Language | Accepted | §4.1.2, Appendix A |
| 0003 | Monorepo Structure | Accepted | §4.1.2, §4.4 |
| 0004 | Virtual Medallion Architecture | Accepted | §4.3.2, §4.5.5 |
| 0005 | Rule Engine — MODEL DSL to OPA on K8s | Accepted | §4.3.2, §4.4.8 |
| 0005b | OPA as MODEL Unified Policy Engine | Accepted | §4.3.2, §4.4.8 |
| 0006 | Multi-Binary Cargo Workspace | Superseded by 0011 | — |
| 0007 | Data Product Lifecycle | Proposed | §4.2.1, §4.2.2 |
| 0008 | CLI Module Development Standards | Proposed | §4.4, §4.4.21 |
| 0009 | Module I/O Contracts | Accepted | §4.4, Pipeline Overview |
| 0010 | Environment Strategy | Proposed | §4.5.3 |
| 0011 | Pipeline Restructure (19-module / 6-phase) | Accepted | §4.2.2, §4.4 |

| Module | Phase | Async | Templates | Schemas | Implementation | Client Trait |
|---|---|---|---|---|---|---|
| `rdt-model-pull` | 1 | Yes | 0 | 1 | Stub (HTTP client ready) | `RTisClient` |
| `rdt-model-profile` | 1 | Yes | 0 | 2 | Stub | `DatabaseProbe` |
| `rdt-model-govern` | 2 | Yes | 0 | 0 | Stub (HTTP client ready) | `CollibraClient` |
| `rdt-model-infer` | 2 | Yes | 0 | 1 | Planned | `LlmClient` |
| `rdt-model-compile` | 3 | No | 0 | 0 | Planned (orchestrator) | None |
| `rdt-model-validate` | 3 | No | 0 | 0 | Planned | None |
| `rdt-model-store` | 4 | No | 8 | 0 | Stub (templates ready) | `SnowflakeClient` |
| `rdt-model-policy` | 4 | Mixed | 7 | 3 | Stub (templates ready) | (pending) |
| `rdt-model-api` | 4 | Yes | 0 | 0 | Planned | `MulesoftClient` |
| `rdt-model-mcp` | 4 | Yes | 0 | 0 | Planned | `McpRegistryClient` |
| `rdt-model-sdk` | 4 | No | 0 | 0 | Planned | None |
| `rdt-model-contract` | 4 | No | 1 | 1 | Production | None |
| `rdt-model-register` | 5 | Yes | 0 | 0 | Planned | `CollibraClient`, `HorizonClient` |
| `rdt-model-gupri` | 5 | Yes | 0 | 1 | Production | `GupriClient` |
| `rdt-model-search` | 5 | Yes | 0 | 0 | Planned | `SinequaClient` |
| `rdt-model-docs` | 6 | No | 0 | 0 | Planned | None |
| `rdt-model-cidb` | 6 | Yes | 0 | 0 | Planned | `ServicenowClient` |
| `rdt-model-event` | 6 | Yes | 0 | 1 | Production | `SolaceClient` |

**Production** = fully executable with fixtures (not stubbed logic). **Stub** = framework with templates/schemas present; execution delegates to stub clients. **Planned** = CLI skeleton defined; execution not yet implemented.
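The phase assignments in the module inventory imply the orchestrator's execution order: modules are grouped by phase and earlier phases run first. The sketch below illustrates that ordering with data from the table; the function `phase_of` and the sort-based scheduling are assumptions for illustration, not the actual orchestrator code.

```rust
/// Illustrative phase lookup for the 18 pipeline modules,
/// transcribed from the module inventory table.
fn phase_of(module: &str) -> Option<u8> {
    Some(match module {
        "rdt-model-pull" | "rdt-model-profile" => 1,
        "rdt-model-govern" | "rdt-model-infer" => 2,
        "rdt-model-compile" | "rdt-model-validate" => 3,
        "rdt-model-store" | "rdt-model-policy" | "rdt-model-api"
        | "rdt-model-mcp" | "rdt-model-sdk" | "rdt-model-contract" => 4,
        "rdt-model-register" | "rdt-model-gupri" | "rdt-model-search" => 5,
        "rdt-model-docs" | "rdt-model-cidb" | "rdt-model-event" => 6,
        _ => return None,
    })
}

fn main() {
    let mut modules = vec!["rdt-model-docs", "rdt-model-pull", "rdt-model-store"];
    // Stable sort by phase: earlier phases first, in-phase order preserved.
    modules.sort_by_key(|m| phase_of(m).unwrap_or(u8::MAX));
    assert_eq!(
        modules,
        vec!["rdt-model-pull", "rdt-model-store", "rdt-model-docs"]
    );
}
```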


| Diagram | Location | Used In |
|---|---|---|
| Platform Flow (6 phases, ASCII) | Inline in §4.2.2 | §4.2, §4.4 |
| CLI Module Architecture (SVG) | `docs/src/assets/diagrams/model-cli.svg` | §4.4 |
| Medallion Architecture (SVG) | `docs/src/assets/diagrams/model-medallion.svg` | §4.3, §4.5 |
| System Context (ASCII) | Inline in §4.3.3 | §4.3 |
| Physical Infrastructure (ASCII) | Inline in §4.5.1 | §4.5 |
| Network Topology (ASCII) | Inline in §4.5.2 | §4.5 |
| Data Storage Layout (ASCII) | Inline in §4.5.5 | §4.5 |
| DQ Gate Flow (ASCII) | Inline in §4.3.2 | §4.3 |
| Data Product Lifecycle (ASCII) | Inline in §4.2.1 | §4.2 |
| Business Process Flow (ASCII) | Inline in §4.2.4 | §4.2 |
| Conceptual Data Model (ASCII) | Inline in §4.3.1 | §4.3 |