Solution Architecture

Field	Value
Document Title	RDT MODEL — Solution Architecture
Version	1.0
Date	2026-05-03
Status	Draft
Classification	Roche Internal

Authorship

Role	Name
Author	Sebastian Streit
Reviewer	Xavier Gutierrez
Approver	Nick Perry
Approver	Paulina Maria Swiecicka

Change History

Version	Date	Author	Changes
0.1	2026-05-03	Sebastian Streit	Initial draft — all sections

1. Purpose

This document describes the solution architecture for RDT MODEL — a data infrastructure compiler built as a Rust CLI platform for Roche Global IT. The system takes an RTiS ontology model as input and produces a complete, certified data product as output: all Snowflake layers (Bronze/Silver/Gold/Semantic), data contract, OPA policies, SDK, MCP tool, API specification, documentation, and audit trail.

A single command, `rdt-model-compile run --entity <name>`, orchestrates 18 specialised CLI modules across 6 pipeline phases to deliver every artifact required for a data product to be discoverable, governed, quality-assured, and AI-ready.

1.1 Scope

This architecture covers:

The 18-module CLI pipeline and its orchestration model
All 21 external system integrations (sources, targets, bidirectional)
The Snowflake medallion architecture (Bronze/Silver/Gold/Semantic)
Data quality enforcement (OPA real-time + dbt batch, gates G1–G4)
The shared Rust library (rdt-model-common) and cross-cutting patterns
Two Streamlit in Snowflake UI applications (CRUD, Ratification)
Physical infrastructure: Snowflake, CaaS Kubernetes, Vault, GitHub Actions
Environment strategy: dev/test/prod through configuration, not separate systems

1.2 Assumptions

#	Assumption
A1	RTiS is the canonical source of truth for all data model definitions. Every data product originates from an RTiS entity.
A2	Snowflake is the sole target analytics platform. All medallion layers deploy to a single Snowflake account with schema-level isolation per environment.
A3	Stub-first development: modules implement the full interface using stub clients until access tasks (A01–A19) are resolved. The pipeline is runnable in `--dry-run` mode without credentials.
A4	All environments (dev/test/prod) share the same physical infrastructure. Separation is achieved through configuration (schema prefixes, K8s namespaces, Vault paths).
A5	GitHub Actions is the CI/CD platform. All deployment flows run through GitHub Actions workflows.
A6	Collibra is the enterprise data governance platform. Stewardship metadata is pulled at generation time; lineage is pushed at deployment time.
A7	PingFederate (via Snowflake WAM) is the OAuth identity provider for all Snowflake access.

1.3 Constraints

#	Constraint	Impact
C1	Access tasks (A01–A19) block live integrations with external systems. Until resolved, all modules use `StubClient` implementations returning fixture data.	Modules are developed and tested against stubs; live integration is a configuration change, not a code change.
C2	Single Snowflake account for all environments. No separate accounts for dev/test/prod.	Environment isolation relies on schema naming conventions (`DEV_BRONZE`, `TEST_BRONZE`, `PROD_BRONZE`).
C3	Roche VPN required for RTiS, GUPRI, MRHub, and Vault access. GitHub Actions runners must be on-network.	CI/CD runners must be self-hosted or use Roche’s VPN-connected runner pool.
C4	Collibra deployment model (on-prem vs. cloud) not yet confirmed. Network path may require async batch sync.	Architecture supports both real-time REST and batch file exchange patterns.
C5	Rust expertise required for CLI development.	Mitigated by LLM-assisted development and comprehensive ADR documentation.
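
The schema-level isolation in A2, A4, and C2 can be illustrated with a small naming helper. This is a minimal sketch, not the platform's code: the `Environment` and `Layer` enums and the `schema_name` function are hypothetical, and only the three `*_BRONZE` names are confirmed by this document. The Silver, Gold, and Semantic names are assumed to follow the same prefix pattern.

```rust
/// Deployment environments that share one Snowflake account (constraint C2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Environment {
    Dev,
    Test,
    Prod,
}

/// Medallion layers listed in the scope section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Bronze,
    Silver,
    Gold,
    Semantic,
}

impl Environment {
    /// Schema prefix for this environment.
    fn prefix(self) -> &'static str {
        match self {
            Environment::Dev => "DEV",
            Environment::Test => "TEST",
            Environment::Prod => "PROD",
        }
    }
}

impl Layer {
    /// Schema suffix for this layer (assumed for all layers except Bronze).
    fn suffix(self) -> &'static str {
        match self {
            Layer::Bronze => "BRONZE",
            Layer::Silver => "SILVER",
            Layer::Gold => "GOLD",
            Layer::Semantic => "SEMANTIC",
        }
    }
}

/// Builds an environment-qualified schema name such as `DEV_BRONZE`.
pub fn schema_name(env: Environment, layer: Layer) -> String {
    format!("{}_{}", env.prefix(), layer.suffix())
}
```

Because the environment is an argument rather than a separate code path, promoting from dev to prod changes a value, not the generated artifacts.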

1.4 Related Documents

Document	Location
ADR 0001 — Project Vision	`adr/0001-project-vision.md`
ADR 0007 — Data Product Lifecycle	`adr/0007-data-product-lifecycle.md`
ADR 0009 — Module I/O Contracts	`adr/0009-module-io-contracts.md`
ADR 0010 — Environment Strategy	`adr/0010-environment-strategy.md`
ADR 0011 — Pipeline Restructure	`adr/0011-pipeline-restructure.md`
Pipeline Overview	`docs/src/content/docs/architecture/pipeline-overview.md`

2. Definitions

Term	Definition
RTiS	Roche Terminology and Information Services — the canonical source of data model definitions, ontologies, terminologies, and synonyms. Deployed on AWS behind Roche VPN.
GUPRI	Globally Unique Persistent Roche Identifier — a persistent identifier system that assigns resolvable URIs to every artifact. Ensures global uniqueness across Roche systems.
Collibra	Enterprise data governance platform providing stewardship, ownership, classification, SLAs, PII flags, and lineage tracking. Bidirectional: provides metadata at generation, receives lineage on deployment.
Data Product	The complete output of one pipeline run for one entity: all Snowflake layers, data contract, policies, SDK, MCP tool, API spec, documentation, and audit trail. There is no partial product.
Medallion Architecture	A layered data architecture pattern: Bronze (raw, append-only), Silver (curated, validity-checked), Gold (business-ready, rule-checked), Semantic (AI-queryable, metric definitions). See ADR 0004.
OPA	Open Policy Agent — a general-purpose policy engine. Used here for real-time data quality enforcement and access control, deployed as containers on CaaS Kubernetes.
Rego	The declarative policy language used by OPA. Generated from YAML rule definitions by `rdt-model-policy`.
dbt	Data Build Tool — a SQL-first transformation framework. Used here for batch data quality enforcement and view materialisation in Snowflake.
Data Contract	A machine-readable specification (datacontract.com 1.1.0) defining the schema, SLA, quality expectations, and ownership of a data product. Generated by `rdt-model-contract`.
MCP	Model Context Protocol — an open standard for AI tool definitions. Generated MCP tools expose Gold data products to AI agents (Cortex Analyst, Claude).
CaaS	Container as a Service — Roche’s managed Kubernetes platform (Rancher-based). Hosts OPA policy containers and bundle refresh jobs.
Cortex Analyst	Snowflake’s AI-powered natural language query engine. Consumes Semantic view definitions to answer business questions in natural language.
PingFederate	Roche’s enterprise identity provider. All OAuth flows (including Snowflake WAM) route through PingFederate for authentication.
WAM	Web Access Management — Snowflake’s OAuth integration layer that delegates authentication to PingFederate via `client_credentials` grant.
DQ Gate	Data Quality Gate — one of four mandatory quality checkpoints (G1–G4) that data passes through before certification. Each gate has specific checks and failure consequences.
Stub Client	A test implementation of a system integration trait that returns fixture data from local JSON files. Enables full pipeline execution without live credentials.
Entity	A logical data object defined in RTiS (e.g., “waste-tracking”, “site-energy”). The unit of work for the pipeline — one entity produces one complete data product.
MRHub	Master Reference Hub — Roche’s master data system providing reference data for G2 validity checks and Solace change events.
Solace	Enterprise event bus for publishing data product lifecycle events (creation, update, deprecation).
Sinequa	Enterprise search engine. Receives offline documentation for data product discovery across Roche.
Mulesoft	API management platform (Anypoint). Publishes generated OpenAPI specifications as managed, governed APIs.
Snowflake Horizon	Snowflake’s cross-account data governance and discovery layer. Used for registering data products for cross-account access.

3. Current State (AS-IS)

No formal data product architecture exists today. The current state across Roche data domains is characterised by:

Manual, linear process. Creating a new data product takes 3–6 months of specialist involvement. Each business question requires a dedicated data engineer, manual Snowflake provisioning, hand-written dbt models, and ad-hoc quality checks. There is no reusable infrastructure.

Disconnected systems. RTiS holds ontology definitions, Collibra holds governance metadata, Snowflake hosts the data — but no automated pipeline connects them. Metadata flows are manual, inconsistent, and frequently stale.

No shared semantic layer. KPI definitions diverge across teams. The same metric exists in 4+ variants. Global reporting requires manual Excel reconciliation between domain teams.

No data contracts. Upstream schema changes cascade unpredictably into downstream consumers. Breakages surface in production dashboards and board presentations. There is no machine-readable contract between producer and consumer.

Inconsistent quality assurance. Some data products have strict dbt tests; others have none. Users cannot distinguish certified data from unchecked data. This creates a false sense of quality that is more dangerous than having no quality gates at all.

No AI readiness. Zero data products have semantic definitions suitable for natural language queries. Cortex Analyst cannot be deployed. AI agents have no MCP tools to access governed data.

The immediate trigger is the Global Sites Network: 100+ operational tools that cannot communicate, producing siloed data with no shared semantics. The same structural problems exist across all Roche data domains. The platform is designed as a horizontal solution serving all domains, with Global Sites Network as the pilot.

4. Proposed Architecture (TO-BE)

4.1 Solution Overview

RDT MODEL is a data infrastructure compiler: it takes a declarative model definition as input and produces a complete, deployable data product as output.

RTiS is the grammar. RDT MODEL is the compiler. Git is the object store. dbt is the runtime.

The compiler is implemented as a Rust CLI workspace containing 18 specialised binary modules plus one shared library. A single orchestration command, `rdt-model-compile run --entity <name>`, invokes all modules in dependency order across 6 sequential phases. Within each phase, modules that share no data dependency execute in parallel.

From one RTiS entity definition, the compiler produces:

#	Artifact	Purpose
1	Bronze table + G1 DQ	Physical append-only landing, schema enforcement
2	Silver view + G2 DQ	Curated data, validity-checked against master data
3	Gold view + G3 DQ	Business-ready data, rule-checked and SLA-governed
4	Semantic view	AI-queryable metrics for Cortex Analyst
5	Data contract	Machine-readable schema + SLA + quality spec (datacontract.com 1.1.0)
6	OPA policies	Real-time DQ enforcement + access control (6 policy domains)
7	dbt tests	Batch DQ enforcement, aligned with OPA rules
8	Python + CLI SDK	Type-safe programmatic access for consumers
9	MCP tool	AI agent tool definition (Cortex Analyst, Claude)
10	OpenAPI spec	REST API contract for managed publication
11	Documentation	Offline docs for enterprise search (Sinequa)
12	Platform events	Solace creation/update events + ServiceNow change records
13	Audit trail	CSRD/GDPR/GxP-compliant creation audit

All artifacts are committed to git, deployed through GitHub Actions CI/CD, and registered with GUPRI persistent identifiers. There is no partial product — every entity gets the full stack.

4.1.1 Evolutionary Architecture

The platform is designed to scale across four dimensions:

Multi-domain scaling. Global Sites Network is the pilot domain. The same CLI, templates, and pipeline serve every Roche data domain. Adding a new domain requires only RTiS model definitions and domain-specific business rules — no platform changes.

LLM enrichment. Claude on AWS Bedrock enriches metadata where RTiS coverage is insufficient: terminology mappings, synonym generation, field descriptions, and DQ rule suggestions. All LLM output is human-reviewable and subject to the four-eyes PR rule.

Self-service. Two Streamlit in Snowflake UI applications provide non-technical users with CRUD and ratification capabilities. The Starlight documentation site auto-generates from pipeline artifacts — it cannot drift from implementation.

Iterative quality. Data products ship structurally complete on day one. Quality rules improve iteratively: Bronze rules are mechanical (from schema), Silver rules add master data checks, Gold rules incorporate business logic from domain experts. The pipeline re-runs on model changes without re-architecture.

4.1.2 Alternatives Rejected

Alternative	Reason for rejection
Commercial data catalog (DataHub, Atlan, Unity Catalog)	These catalog what already exists. They do not generate the artifacts that make data AI-ready. RDT MODEL is upstream — it generates the metadata catalogs consume. Not mutually exclusive: Collibra remains as the governance layer.
Python CLI toolbox	Python CLIs carry virtualenv and dependency management debt into every CI pipeline. A Rust binary ships as a single file with no runtime dependencies, starts an order of magnitude faster in CI, and eliminates “it works on my machine” failures.
Central team builds all products	Keeps delivery linear in specialist effort: new business question = new ticket = 6–8 weeks. With the platform, a new question costs one CLI command and one CI cycle.
Separate repos per domain	Artifact types are deeply coupled (contract YAML that Bronze depends on comes from the same model as Semantic YAML). Monorepo keeps versions consistent and makes template changes propagate to all domains simultaneously.

4.2 Business Architecture

4.2.1 Data Product Lifecycle

A data product follows this lifecycle (see ADR 0007):

graph LR
    DEFINE["**DEFINE**<br/>RTiS + Collibra"]
    GENERATE["**GENERATE**<br/>CLI runs 18 mods"]
    VALIDATE["**VALIDATE**<br/>Schemas + DQ"]
    DEPLOY["**DEPLOY**<br/>CI/CD promote"]
    CERTIFY["**CERTIFY**<br/>G4 pass → stable"]
    REFINE["**REFINE**<br/>PR-based rule updates"]

    DEFINE --> GENERATE --> VALIDATE --> DEPLOY --> CERTIFY
    CERTIFY --> REFINE
    REFINE --> DEFINE

Define — Data stewards define the entity in RTiS (schema, ontology, relationships) and configure governance in Collibra (ownership, SLA, classification).
Generate — rdt-model-compile run orchestrates all 18 modules to produce the complete artifact set.
Validate — rdt-model-validate checks all artifacts against JSON Schemas, syntax rules, and cross-references.
Deploy — GitHub Actions CI/CD promotes validated artifacts through dev → test → prod.
Certify — G4 Consistency gate confirms data meets trend baselines and AI guardrails.
Refine — Domain experts improve quality rules and business logic through normal PR workflow. Re-run generates updated artifacts.
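
The lifecycle above is a closed loop, which can be expressed as a small state machine. The following is a sketch only; the `Stage` enum and its `next` method are hypothetical names and are not taken from ADR 0007.

```rust
/// Lifecycle stages of a data product, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Define,
    Generate,
    Validate,
    Deploy,
    Certify,
    Refine,
}

impl Stage {
    /// Returns the stage that follows `self`. Refine loops back to Define,
    /// matching the REFINE --> DEFINE edge in the diagram.
    pub fn next(self) -> Stage {
        match self {
            Stage::Define => Stage::Generate,
            Stage::Generate => Stage::Validate,
            Stage::Validate => Stage::Deploy,
            Stage::Deploy => Stage::Certify,
            Stage::Certify => Stage::Refine,
            Stage::Refine => Stage::Define,
        }
    }
}
```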

4.2.2 Pipeline Phases

The 18 modules are organised into 6 sequential phases. Phases run in order; modules within a phase run in parallel when they share no data dependency.

graph LR
    subgraph Phase1["**Phase 1: INGEST**"]
        Pull
        Profile
    end
    subgraph Phase2["**Phase 2: ENRICH**"]
        Govern
        Infer
    end
    subgraph Phase3["**Phase 3: PREPARE**"]
        Compile
        Validate
    end
    subgraph Phase4["**Phase 4: DEPLOY**"]
        Store
        Policy
        Api
        Mcp
        Sdk
        Contract
    end
    subgraph Phase5["**Phase 5: REGISTER**"]
        Register
        Gupri
        Search
    end
    subgraph Phase6["**Phase 6: SUPPORT**"]
        Docs
        Cidb
        Event
    end

    Phase1 --> Phase2 --> Phase3 --> Phase4 --> Phase5 --> Phase6

Phase	Name	Purpose	Modules	Parallelism
1	Ingest	Acquire source data from upstream systems	`pull`, `profile`	Full (no dependencies between modules)
2	Enrich	Add governance metadata and LLM intelligence	`govern`, `infer`	Full (no dependencies between modules)
3	Prepare	Compile artifacts and validate correctness	`compile`, `validate`	Sequential (`validate` depends on `compile`)
4	Deploy	Push artifacts to target systems	`store`, `policy`, `api`, `mcp`, `sdk`, `contract`	Full (6 modules in parallel)
5	Register	Announce to enterprise catalogs	`register`, `gupri`, `search`	Full (3 modules in parallel)
6	Support	Generate docs, compliance records, events	`docs`, `cidb`, `event`	Full (3 modules in parallel)
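
The scheduling rule in the table (phases in order, independent modules in parallel, `validate` after `compile`) might be sketched as follows. This is an illustration under assumptions, not the orchestrator's implementation: `Phase`, `pipeline`, and `run_pipeline` are hypothetical names, each module is represented by a caller-supplied function, and OS threads from the standard library stand in for whatever concurrency the real CLI uses.

```rust
use std::thread;

/// One pipeline phase. Groups run one after another; the modules inside a
/// group share no data dependency and run in parallel.
pub struct Phase {
    pub name: &'static str,
    pub groups: Vec<Vec<&'static str>>,
}

/// The six phases and 18 modules from the table above.
pub fn pipeline() -> Vec<Phase> {
    vec![
        Phase { name: "ingest", groups: vec![vec!["pull", "profile"]] },
        Phase { name: "enrich", groups: vec![vec!["govern", "infer"]] },
        // `validate` depends on `compile`, so Prepare has two sequential groups.
        Phase { name: "prepare", groups: vec![vec!["compile"], vec!["validate"]] },
        Phase {
            name: "deploy",
            groups: vec![vec!["store", "policy", "api", "mcp", "sdk", "contract"]],
        },
        Phase { name: "register", groups: vec![vec!["register", "gupri", "search"]] },
        Phase { name: "support", groups: vec![vec!["docs", "cidb", "event"]] },
    ]
}

/// Runs every module through `run_module` and returns the completed module
/// names. Stops after the first group that contains a failure.
pub fn run_pipeline<F>(phases: &[Phase], run_module: F) -> Result<Vec<String>, String>
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    let run = &run_module;
    let mut completed = Vec::new();
    for phase in phases {
        for group in &phase.groups {
            // Spawn one scoped thread per module, then wait for all of them.
            let results: Vec<Result<(), String>> = thread::scope(|scope| {
                let handles: Vec<_> = group
                    .iter()
                    .copied()
                    .map(|module| scope.spawn(move || run(module)))
                    .collect();
                handles
                    .into_iter()
                    .map(|handle| handle.join().expect("module thread panicked"))
                    .collect()
            });
            for (module, result) in group.iter().zip(results) {
                result.map_err(|e| format!("{}/{}: {}", phase.name, module, e))?;
                completed.push(module.to_string());
            }
        }
    }
    Ok(completed)
}
```

A failure in one module lets the other modules of the same group finish, but no later group or phase starts. Whether the real orchestrator behaves this way is not stated in this document.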

4.2.3 Actor Roles

Actor	Responsibility	Interaction
Data Engineer	Defines entity models in RTiS, authors `rules.yaml` and `policies.yaml`, runs the pipeline, reviews generated artifacts	CLI + Git + PR workflow
Data Steward	Maintains governance metadata in Collibra (ownership, SLA, classification, PII flags). Ratifies changes via Streamlit UI.	Collibra + Ratification UI
Platform Team	Maintains the CLI codebase, templates, and infrastructure. Resolves access tasks. Manages CI/CD workflows.	Rust development + GitHub Actions
Domain Expert	Refines Gold business rules and Semantic view definitions. Reviews LLM enrichment suggestions.	PR review + `rules.yaml` authoring
Consumer	Queries data products via SDK, API, Cortex Analyst, or direct Snowflake access. Relies on data contracts for stability guarantees.	SDK + API + SQL + natural language
AI Agent	Accesses Gold data products via MCP tools. Uses Semantic view definitions for natural language understanding.	MCP protocol + Cortex Analyst

4.2.4 Business Process Flow

sequenceDiagram
    participant DE as Data Engineer
    participant PL as Platform (automated)
    participant DS as Data Steward

    DE->>PL: Define entity in RTiS
    DS->>PL: Configure governance in Collibra
    DE->>PL: Run pipeline (single cmd)
    PL->>PL: Pull model + governance
    PL->>PL: Enrich with LLM
    PL->>PL: Compile all artifacts
    PL->>PL: Validate schemas
    PL->>PL: Deploy to Snowflake
    PL->>PL: Register in catalogs
    PL->>PL: Publish events
    PL->>DE: Commit artifacts to git, open PR
    DE->>DE: Review PR (4-eyes rule)
    DE->>PL: Merge to main
    PL->>PL: CI/CD promotes: dev → test → prod
    PL->>DS: G4 certification
    DS->>DS: Ratify changes via Streamlit UI

4.3 Data Architecture

4.3.1 Conceptual Data Model

The platform operates on a three-stage conceptual model: source ontology → enriched entity model → generated artifacts.

graph TD
    RTiS["**RTiS Ontology**<br/>Schema, fields, types,<br/>relationships, terminologies"]
    Collibra["**Collibra Governance**<br/>Ownership, SLA, PII,<br/>classification"]
    LLM["**Claude on Bedrock**<br/>Term mapping, synonyms,<br/>descriptions"]
    Entity["**Entity Model (enriched)**<br/>model.json + governance.json<br/>+ suggestions.json"]

    RTiS --> Entity
    Collibra --> Entity
    LLM --> Entity

    Entity --> Snowflake["**Snowflake**<br/>Bronze DDL, Silver SQL,<br/>Gold SQL, Semantic"]
    Entity --> Quality["**Quality**<br/>OPA Rego, dbt tests,<br/>K8s deploy, Bundle cfg"]
    Entity --> Consumer["**Consumer Access**<br/>SDK, API spec,<br/>MCP tool, Docs"]
    Entity --> Platform["**Platform**<br/>Data contract, GUPRI URI,<br/>Solace event, Audit trail"]

Key entities:

Entity	Description	Cardinality
RTiS Entity	A logical data object (e.g., “waste-tracking”, “site-energy”). The unit of work.	1 per data product
Field	A typed attribute within an entity. Carries RTiS metadata (type, terminology, synonyms).	N per entity
Rule Group	A collection of validation rules authored in `rules.yaml`. Compiled to both OPA and dbt.	1 per entity
Policy Set	Access, workflow, API, audit, and deployment policies in `policies.yaml`.	1 per entity
Governance Record	Collibra-sourced stewardship: owner, steward, SLA, classification, PII flags.	1 per entity
Data Product	The complete output bundle: all Snowflake layers + all artifacts.	1 per entity
GUPRI Record	Persistent identifier registration. Every artifact and entity gets a resolvable URI.	N per entity

Relationships:

graph LR
    Entity["**RTiS Entity**"]
    Entity -->|1:N| Field
    Entity -->|1:1| RuleGroup["Rule Group"]
    Entity -->|1:1| PolicySet["Policy Set"]
    Entity -->|1:1| GovRecord["Governance Record<br/>(from Collibra)"]
    Entity -->|1:1| DataProduct["Data Product<br/>(generated)"]
    DataProduct -->|1:N| GUPRI["GUPRI Record<br/>(registered)"]
    DataProduct -->|1:1| Contract["Data Contract<br/>(generated)"]
    DataProduct -->|1:4| Layers["Snowflake Layers<br/>(Bronze/Silver/Gold/Semantic)"]

4.3.2 Data Governance Architecture

Data governance is architecturally embedded, not bolted on. Two systems enforce quality, and one system provides governance metadata.

Governance metadata source: Collibra

Collibra is the authoritative source for all governance metadata. Ownership, SLAs, data classification, and PII flags are not authored locally — they are pulled from Collibra where data stewards maintain them. The pipeline pulls governance at generation time and pushes lineage at deployment time.

Metadata	Source	Used by
Data owner	Collibra	Data contract, documentation
Data steward	Collibra	Ratification UI, change notifications
SLA (availability, freshness)	Collibra	Data contract, G4 monitoring
Data classification	Collibra	OPA access policies, PII handling
PII flags	Collibra	Column-level masking policies
Terms of use	Collibra	Data contract, SDK documentation
Lineage	Generated → Collibra	Enterprise lineage graph

Quality enforcement: four gates

Data passes through four mandatory quality gates before certification:

graph LR
    G1["**G1 COMPLETENESS**<br/>Bronze Layer<br/><br/>Schema match<br/>Type conformance<br/>NOT NULL enforced<br/><br/>_FAIL: file rejected_"]
    G2["**G2 VALIDITY**<br/>Silver View<br/><br/>MRHub identity<br/>Orphan detection<br/>Referential integrity<br/><br/>_FAIL: row excluded from Silver_"]
    G3["**G3 BUSINESS RULES**<br/>Gold View<br/><br/>Range checks<br/>Cross-field<br/>Freshness SLA<br/><br/>_FAIL: row excluded, steward alerted_"]
    G4["**G4 CONSISTENCY**<br/>Certified Product<br/><br/>Trend deviation<br/>AI guardrail<br/>30-day baseline<br/><br/>_FAIL: visible w/ Warning badge_"]

    G1 --> G2 --> G3 --> G4
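
The gate-to-consequence mapping in the diagram can be captured as data. The sketch below uses hypothetical type names (`Gate`, `FailureAction`); the layers and failure consequences themselves are taken from the diagram.

```rust
/// The four mandatory data quality gates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate {
    G1Completeness,
    G2Validity,
    G3BusinessRules,
    G4Consistency,
}

/// What happens to data that fails a gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureAction {
    RejectFile,
    ExcludeRow,
    ExcludeRowAndAlertSteward,
    ShowWithWarningBadge,
}

impl Gate {
    /// The layer or product stage the gate protects.
    pub fn layer(self) -> &'static str {
        match self {
            Gate::G1Completeness => "Bronze",
            Gate::G2Validity => "Silver",
            Gate::G3BusinessRules => "Gold",
            Gate::G4Consistency => "Certified product",
        }
    }

    /// The failure consequence listed for the gate.
    pub fn on_failure(self) -> FailureAction {
        match self {
            Gate::G1Completeness => FailureAction::RejectFile,
            Gate::G2Validity => FailureAction::ExcludeRow,
            Gate::G3BusinessRules => FailureAction::ExcludeRowAndAlertSteward,
            Gate::G4Consistency => FailureAction::ShowWithWarningBadge,
        }
    }
}
```

Note the change in granularity: G1 rejects a whole file, G2 and G3 drop individual rows, and G4 never hides data but flags it.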

Dual enforcement: OPA (real-time) + dbt (batch)

Rules are defined once in `rules.yaml` and compiled to two execution targets:

Target	Engine	Context	Latency	Used by
OPA	Rego policies on Kubernetes	API boundary, form validation, real-time checks	Milliseconds	UI, API consumers, integration tests
dbt	SQL tests in Snowflake	Pipeline execution, batch validation, regression detection	Seconds–minutes	CI/CD, scheduled runs, monitoring

Both targets are generated from the same `rules.yaml` source by `rdt-model-policy`. They must stay in sync: a rule that passes in OPA must also pass in dbt, and vice versa.
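
To make the single-source idea concrete, the sketch below renders one range rule into both a Rego fragment and a dbt singular test. Everything here is illustrative: the `RangeRule` struct, the column `weight_kg`, the package `dq.waste_tracking`, and the model name `gold_waste_tracking` are hypothetical, and the actual `rules.yaml` schema and the templates used by `rdt-model-policy` are not shown in this document. The Rego output assumes an OPA version that accepts `import rego.v1`.

```rust
/// A numeric range rule, standing in for one entry of `rules.yaml`.
pub struct RangeRule {
    pub id: &'static str,
    pub column: &'static str,
    pub min: f64,
    pub max: f64,
}

impl RangeRule {
    /// Renders Rego that adds a message to `deny` when a single input record
    /// falls outside the range.
    pub fn to_rego(&self, package: &str) -> String {
        format!(
            "package {pkg}\n\nimport rego.v1\n\n\
             deny contains msg if {{\n    input.{col} < {min}\n    msg := \"{id}: {col} below {min}\"\n}}\n\n\
             deny contains msg if {{\n    input.{col} > {max}\n    msg := \"{id}: {col} above {max}\"\n}}\n",
            pkg = package,
            col = self.column,
            min = self.min,
            max = self.max,
            id = self.id
        )
    }

    /// Renders a dbt singular test: a query that returns the failing rows,
    /// so the test passes when the result set is empty.
    pub fn to_dbt_test(&self, model: &str) -> String {
        format!(
            "select * from {{{{ ref('{model}') }}}} where {col} < {min} or {col} > {max}",
            model = model,
            col = self.column,
            min = self.min,
            max = self.max
        )
    }
}
```

Because both renderings read the same `min` and `max`, the two targets cannot disagree on the bounds. Differences in null handling or numeric types between Rego and SQL would still need explicit tests.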

4.3.3 System Context

The platform integrates with 20 external systems over 21 interfaces (Section 4.3.4), across source, target, bidirectional, and platform-service roles.

graph TD
    Platform["**RDT MODEL Platform**<br/>18 CLI Modules + rdt-model-common<br/>+ 2 Streamlit UI apps"]

    subgraph Sources["SOURCES"]
        RTiS["RTiS (ontology)"]
        Aurora["Aurora PostgreSQL"]
        MRHub["MRHub (master data)"]
        Bedrock["Claude/Bedrock (LLM)"]
        Vault["Vault (secrets)"]
    end

    subgraph Bidirectional["BIDIRECTIONAL"]
        Collibra["Collibra (governance ↔ lineage)"]
        GUPRI["GUPRI (register ↔ resolve)"]
        Horizon["Snowflake Horizon (discovery ↔ governance)"]
    end

    subgraph Targets["TARGETS"]
        Snowflake["Snowflake (DDL + views)"]
        K8s["CaaS/Kubernetes (OPA pods)"]
        Mulesoft["Mulesoft (API)"]
        Solace["Solace (events)"]
        Sinequa["Sinequa (search)"]
        ServiceNow["ServiceNow (CIDM)"]
        Artifactory["Artifactory (Docker images)"]
        DataMP["Data Marketplace"]
        MCPReg["MCP Registry"]
    end

    subgraph PlatformSvc["PLATFORM SERVICES"]
        GHA["GitHub Actions (CI/CD)"]
        Ping["PingFederate/WAM (OAuth)"]
        Starlight["Starlight/Astro (docs)"]
    end

    Sources --> Platform
    Platform <--> Bidirectional
    Platform --> Targets
    PlatformSvc -.-> Platform

System classification:

Role	Systems	Data Flow
Source	RTiS, Aurora PostgreSQL, MRHub, Claude/Bedrock, Vault	System → Platform
Bidirectional	Collibra, GUPRI, Snowflake Horizon	System ↔ Platform
Target	Snowflake, CaaS/K8s, Mulesoft, Solace, Sinequa, ServiceNow, Artifactory, Data Marketplace, MCP Registry	Platform → System
Platform	GitHub Actions, PingFederate/WAM, Starlight/Astro	Infrastructure

4.3.4 Interface Summary

Each row represents a data exchange between the RDT MODEL platform and an external system. Interface IDs are used for traceability to access tasks.

IF-ID	Source	Target	Data Exchanged	Frequency	Protocol	Auth	Module	Access Task	Status
IF-01	RTiS	Platform	Entity definitions, ontologies, terminologies, synonyms	On-demand (pipeline trigger)	REST / GraphQL	OAuth (PingFederate)	`pull`	A01	Stub
IF-02	Aurora PostgreSQL	Platform	Upstream table metadata (columns, types, constraints)	On-demand (profiling)	PostgreSQL wire protocol	Username/password (Vault)	`profile`	A18	Stub
IF-03	Snowflake WAM	Platform	OAuth access tokens for Snowflake operations	Per-session	REST OIDC (`client_credentials`)	OAuth (PingFederate)	All Snowflake ops	A19	Live
IF-04	Platform	Snowflake	Bronze DDL, dbt models (Silver/Gold/Semantic views)	On deployment	Snowflake REST API	OAuth (WAM token)	`store`	A05/A06	Partial
IF-05	Collibra	Platform	Governance metadata: ownership, SLA, classification, PII	On-demand (pipeline trigger)	REST API	OAuth	`govern`	A07	Stub
IF-06	Platform	Collibra	Lineage records after deployment	On deployment	REST API	OAuth	`register`	A08	Stub
IF-07	Claude (Bedrock)	Platform	LLM-generated term mappings, descriptions, DQ suggestions	On-demand (enrichment)	AWS Bedrock API	AWS IAM (SigV4)	`infer`	—	Planned
IF-08	Platform	CaaS/K8s	OPA deployment manifests, Rego bundles, ConfigMaps	On deployment	Kubernetes API	Rancher token	`policy`	A13	Active
IF-09	Platform	Artifactory	OPA container images	On build	Docker Registry v2	Token	`policy` (build)	—	Planned
IF-10	Platform	Mulesoft	OpenAPI specifications for managed API publication	On deployment	Anypoint Platform API	OAuth	`api`	A09	Stub
IF-11	Platform	MCP Registry	MCP tool definitions for AI agent registration	On deployment	TBD	TBD	`mcp`	—	Planned
IF-12	GUPRI	Platform	Persistent identifier resolution (existing URIs)	On-demand	REST API	OAuth (PingFederate)	`gupri`	A02	Stub
IF-13	Platform	GUPRI	Persistent identifier registration (new URIs)	On deployment	REST API	OAuth (PingFederate)	`gupri`	A02	Stub
IF-14	Platform	Snowflake Horizon	Cross-account discovery and governance metadata	On deployment	Snowflake API	OAuth (WAM token)	`register`	—	Stub
IF-15	Platform	Data Marketplace	Data product catalog registration	On deployment	REST API	TBD	`register`	A15	Stub
IF-16	Platform	Sinequa	Offline documentation for enterprise search indexing	On deployment	TBD	TBD	`search`	A17	Stub
IF-17	MRHub	Platform	Master reference data for G2 validity lookups	On-demand	REST API	OAuth	`policy`	A03	Not started
IF-18	MRHub (Solace)	Platform	Change events for master data updates	Continuous	Solace event subscription	Token	`policy`	A04	Not started
IF-19	Platform	Solace	Data product creation/update lifecycle events	On deployment	Solace event publish	Token	`event`	A04	Stub
IF-20	Platform	ServiceNow	Change management records (CIDM)	On deployment	REST Table API	OAuth	`cidb`	A12	Stub
IF-21	Vault	Platform	Secrets (database credentials, API keys, tokens)	On CI/CD run	REST API (AppRole / OIDC)	AppRole / OIDC JWT	All (via CI)	A16	Live

Interface status legend:

Status	Meaning
Live	Integration is operational with real credentials.
Partial	Authentication works; functional integration pending (e.g., Snowflake auth live, schema provisioning pending).
Active	Infrastructure access confirmed; integration under development.
Stub	Module implements the interface using `StubClient` with fixture data. Switching to live is a configuration change.
Planned	Module exists but integration work has not started.
Not started	Access task not yet filed or investigated.

4.3.5 Data Migration

This is a greenfield platform — there is no legacy system to migrate from. However, entity onboarding involves importing existing data structures:

Entity onboarding via rdt-model-profile

For entities that exist in upstream databases but lack RTiS representation, `rdt-model-profile` provides a discovery path:

graph LR
    DB["**Upstream DB**<br/>Aurora PG / Snowflake"]
    Profile["**rdt-model-profile**<br/>Discover tables<br/>Extract schema<br/>Suggest model"]
    RTiS["**RTiS**<br/>Register as<br/>new entity<br/>(manual step)"]

    DB -->|profile| Profile -->|suggest| RTiS

rdt-model-profile connects to the upstream database (Aurora PostgreSQL or Snowflake).
Extracts table metadata: column names, types, constraints, sample data statistics.
Produces a suggested entity model that a data engineer reviews and registers in RTiS.
Once registered in RTiS, the standard pipeline takes over.

This is a discovery aid, not an automated migration. The data engineer makes all decisions about entity structure, naming, and classification. The profile output is a suggestion that accelerates the manual RTiS registration process.

Data backfill

Historical data from upstream systems is loaded into Bronze tables through the standard ingestion path. There is no special migration tool — Bronze tables are append-only, and historical data is simply the first batch of appended records. The Silver/Gold/Semantic views immediately operate over this data once loaded.

4.4 Logical Architecture

The logical architecture is organised by pipeline phase. Each module follows the conventions in ADR 0008 (CLI module standards) and ADR 0009 (module I/O contracts). See also the Pipeline Overview for data flow and workspace isolation.

4.4.1 rdt-model-pull

Phase: 1 (Ingest)
Purpose: Fetch an entity definition from RTiS and write a frozen JSON snapshot to the pipeline workspace. This is the entry point for every data product; all downstream modules consume the snapshot.
Parallelizable: Yes (parallel with `profile` within Phase 1)

Input/Output

Direction	Artifact	Path	Format
Input	RTiS entity ID	CLI argument (`--entity`)	String
Output	Frozen entity snapshot	`models/{entity}/model.json`	JSON
Output	Module result envelope	stdout (when `--json`)	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
RTiS	`RTisClient`	OAuth (PingFederate) / Basic Auth	IF-01	A01	Stub

Technology

Language: Rust (edition 2021)
Async: Yes, via `tokio::runtime::Runtime` (Pattern A)
Key crates: `reqwest` (HTTP), `async-trait`, `chrono`, `uuid` (UUIDv7 run correlation)
Templates: None (data-only module)

Subcommands

Command	Description
`pull`	Fetch entity from RTiS and write `model.json`
`diff`	Show changes between local snapshot and RTiS (planned)
`list`	List available entities in RTiS
`snapshot`	Create versioned snapshot (planned)

Current Status

Stub: full command structure implemented. `StubRTisClient` returns fixture data from `cli/common/src/clients/fixtures/rtis/`. `HttpRTisClient` is implemented with JSON-LD response mapping and awaits A01 resolution for live RTiS access.
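
The stub-first pattern behind this status can be sketched with one trait and two implementations. The type names `RTisClient`, `StubRTisClient`, and `HttpRTisClient` come from this document, but everything else is an assumption for illustration: the method signature, the `EntitySnapshot` fields, the `select_client` helper, and the in-memory fixture (the real stub reads JSON fixture files, and the real trait is async).

```rust
use std::collections::HashMap;

/// Minimal stand-in for the frozen snapshot written to `model.json`.
#[derive(Debug, Clone, PartialEq)]
pub struct EntitySnapshot {
    pub entity: String,
    pub fields: Vec<String>,
}

/// Assumed shape of the RTiS client trait (synchronous here for brevity).
pub trait RTisClient {
    fn fetch_entity(&self, entity: &str) -> Result<EntitySnapshot, String>;
}

/// Stub implementation backed by in-memory fixtures.
pub struct StubRTisClient {
    fixtures: HashMap<String, EntitySnapshot>,
}

impl StubRTisClient {
    pub fn new(snapshots: Vec<EntitySnapshot>) -> Self {
        let fixtures = snapshots
            .into_iter()
            .map(|s| (s.entity.clone(), s))
            .collect();
        StubRTisClient { fixtures }
    }
}

impl RTisClient for StubRTisClient {
    fn fetch_entity(&self, entity: &str) -> Result<EntitySnapshot, String> {
        self.fixtures
            .get(entity)
            .cloned()
            .ok_or_else(|| format!("no fixture for entity '{entity}'"))
    }
}

/// Placeholder for the live client; no network call is made in this sketch.
pub struct HttpRTisClient {
    pub base_url: String,
}

impl RTisClient for HttpRTisClient {
    fn fetch_entity(&self, entity: &str) -> Result<EntitySnapshot, String> {
        Err(format!(
            "live call to {}/{} is not implemented in this sketch",
            self.base_url, entity
        ))
    }
}

/// Chooses the implementation from configuration, so callers never change.
pub fn select_client(live: bool, base_url: &str) -> Box<dyn RTisClient> {
    if live {
        Box::new(HttpRTisClient { base_url: base_url.to_string() })
    } else {
        Box::new(StubRTisClient::new(vec![EntitySnapshot {
            entity: "waste-tracking".to_string(),
            fields: vec!["site_id".to_string(), "weight_kg".to_string()],
        }]))
    }
}
```

Code that holds a `Box<dyn RTisClient>` is identical in stub and live mode, which is what makes the switch a configuration change rather than a code change.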

4.4.2 rdt-model-profile

Phase: 1 (Ingest; optional support module)
Purpose: Discover and profile existing database tables in upstream systems (Aurora PostgreSQL, Snowflake) to suggest entity models for RTiS registration. Used for onboarding entities that lack RTiS representation.
Parallelizable: Yes (parallel with `pull` within Phase 1)

Input/Output

Direction	Artifact	Path	Format
Input	Database connection + table identifier	CLI arguments (`--database-type`, `--schema`, `--table`)	String
Input	Sample row count	CLI argument (`--sample-rows`, default 100, max 10,000)	Integer
Output	Table structural metadata	`{output_dir}/{db_type}/{schema}.{table}.profile.json`	JSON
Output	Sample data	Embedded in profile JSON	JSON
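
The output path pattern and the sample-row bounds from the table can be sketched as two small helpers. The function names are hypothetical, the `db_type` value `postgres` in the usage below is only an example, and rejecting (rather than clamping) a value above the maximum is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Default and maximum for `--sample-rows`, as listed in the table above.
pub const DEFAULT_SAMPLE_ROWS: u32 = 100;
pub const MAX_SAMPLE_ROWS: u32 = 10_000;

/// Resolves the effective sample size. Values above the maximum are rejected
/// here; the real CLI may clamp instead.
pub fn resolve_sample_rows(requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(DEFAULT_SAMPLE_ROWS),
        Some(n) if n <= MAX_SAMPLE_ROWS => Ok(n),
        Some(n) => Err(format!("--sample-rows {n} exceeds the maximum of {MAX_SAMPLE_ROWS}")),
    }
}

/// Builds `{output_dir}/{db_type}/{schema}.{table}.profile.json`.
pub fn profile_path(output_dir: &Path, db_type: &str, schema: &str, table: &str) -> PathBuf {
    output_dir
        .join(db_type)
        .join(format!("{schema}.{table}.profile.json"))
}
```

For example, `profile_path(Path::new("out"), "postgres", "public", "waste")` yields a path ending in `public.waste.profile.json` under `out/postgres`.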

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Aurora PostgreSQL	`DatabaseProbe`	Username/password (Vault)	IF-02	A18	Stub
Snowflake	`DatabaseProbe`	OAuth (WAM token)	IF-04	A19	Stub

Technology

Language: Rust (edition 2021)
Async: Yes, via `tokio::runtime::Runtime` (Pattern A)
Key crates: `reqwest`, `async-trait`, `regex` (identifier validation), `flate2` (optional gzip)
Templates: None (data-only module)

Subcommands

Command	Description
`profile`	Profile a database table: extract schema, types, constraints, sample data

Current Status

Stub — StubDatabaseProbe returns deterministic fixture data. SQL identifier validation implemented. Real Snowflake and Aurora PostgreSQL probes pending access tasks A18/A19.
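The input checks described for this module can be sketched as below. This is a minimal illustration, not the module's code: the function names are hypothetical, the identifier rule (ASCII letter or underscore first, then letters, digits, or underscores) is an assumed conservative rule, and rejecting values above the maximum rather than clamping them is also an assumption. The sketch uses plain character checks instead of the regex crate so that it compiles with the standard library alone.

```rust
/// Default and maximum sample sizes, as stated for `--sample-rows`.
pub const DEFAULT_SAMPLE_ROWS: u32 = 100;
pub const MAX_SAMPLE_ROWS: u32 = 10_000;

/// Assumed rule for unquoted SQL identifiers: an ASCII letter or underscore,
/// followed by ASCII letters, digits, or underscores.
pub fn is_valid_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    first_ok && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Applies the default when no value is given and rejects values above the
/// maximum (rejecting instead of clamping is an assumption of this sketch).
pub fn resolve_sample_rows(requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(DEFAULT_SAMPLE_ROWS),
        Some(n) if n <= MAX_SAMPLE_ROWS => Ok(n),
        Some(n) => Err(format!(
            "--sample-rows {n} exceeds the maximum of {}",
            MAX_SAMPLE_ROWS
        )),
    }
}
```

Validating `--schema` and `--table` before they reach any SQL text keeps the probe safe even when the values come from untrusted input.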

4.4.3 rdt-model-govern

Phase: 2 — Enrich
Purpose: Pull governance metadata from Collibra for an entity — ownership, stewardship, data classification, SLAs, PII flags, and terms of use. This metadata feeds into the data contract, documentation, and access policies.
Parallelizable: Yes (parallel with infer within Phase 2)

Input/Output

Direction	Artifact	Path	Format
Input	Entity ID	CLI argument (`--entity`)	String
Input	model.json (from Phase 1)	`models/{entity}/model.json`	JSON
Output	Governance metadata	`models/{entity}/governance.json`	JSON
Output	Module result envelope	stdout (when `--json`)	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Collibra	`CollibraClient`	OAuth (`client_id` + `client_secret` + `x-meta-bridge-key`)	IF-05	A07	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, tracing
Templates: None (data passthrough)

Subcommands

Command	Description
`pull`	Fetch governance metadata from Collibra
`status`	Show Collibra sync status (planned)

Current Status

Stub — StubCollibraClient returns fixture CollibraMetadata. HttpCollibraClient implemented with pagination support. Blocked on A07 (Collibra access task).

4.4.4 rdt-model-infer

Phase: 2 — Enrich (optional)
Purpose: Enrich entity metadata using Claude on AWS Bedrock — generate term mappings, business-friendly synonyms, field descriptions, and DQ rule suggestions where RTiS coverage is insufficient. All suggestions are human-reviewable.
Parallelizable: Yes (parallel with govern within Phase 2)

Input/Output

Direction	Artifact	Path	Format
Input	Entity ID	CLI argument (`--entity`)	String
Input	model.json (from Phase 1)	`models/{entity}/model.json`	JSON
Input	Scope filter (optional)	CLI argument (`--scope`: terms, descriptions, rules)	String
Output	LLM enrichment suggestions	`models/{entity}/suggestions.json`	JSON
Output	Module result envelope	stdout (when `--json`)	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Claude (AWS Bedrock)	`LlmClient`	AWS IAM (SigV4)	IF-07	—	Planned

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, serde, tracing
Templates: None (data-only module)

Subcommands

Command	Description
`suggest`	Generate LLM suggestions with optional scope filter

Current Status

Planned — Async skeleton implemented. StubLlmClient returns hardcoded suggestions. Real Bedrock integration not yet started. LLM provider confirmed as Claude on AWS Bedrock.

4.4.5 rdt-model-compile

Phase: 3 — Prepare
Purpose: Pipeline orchestrator — invokes all downstream modules in dependency order, manages workspace lifecycle, aggregates results, and handles artifact promotion from workspace to repository paths.
Parallelizable: No (sequential orchestrator; spawns parallel modules within phases)

Input/Output

Direction	Artifact	Path	Format
Input	Entity ID	CLI argument (`--entity`)	String
Input	model.json, governance.json, suggestions.json	From Phase 1–2 outputs	JSON
Input	rules.yaml, policies.yaml	`models/{entity}/`	YAML
Output	Orchestration result	`compile-result.json` (workspace)	JSON
Output	All artifacts (20+)	`compile/artifacts/` (workspace)	Mixed

External System Integration

None — the orchestrator delegates all external calls to downstream modules.

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Spawns child processes via std::process::Command.
Key crates: serde, serde_json, tracing
Templates: None (orchestrator has no template responsibility)

Subcommands

Command	Description
`run`	Execute full pipeline (or single stage with `--stage`)
`status`	Show pipeline status (planned)
`semantic`	Generate Semantic YAML (delegated, planned)
`openapi`	Generate OpenAPI spec (delegated, planned)
`mcp`	Generate MCP tool definition (delegated, planned)
`rules`	Compile rules.yaml to OPA Rego (delegated, planned)
`policies`	Compile policies.yaml to OPA Rego (delegated, planned)
`k8s`	Generate OPA K8s manifests (delegated, planned)

Current Status

Planned — CLI structure defined with 8 subcommands. Orchestration logic not yet implemented. Will spawn modules as child processes with --json to capture result envelopes.
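Since the orchestration logic is not yet implemented, the following is only a sketch of how a module might be spawned as a child process with --json so that its result envelope can be read from stdout. The function names, the argument order, and the choice to let the child's stderr pass through are assumptions for illustration.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Builds the command line for one module invocation.
/// Hypothetical helper: flag order and naming are assumed, not taken from the code base.
pub fn module_command(module: &str, subcommand: &str, entity: &str, target: &str) -> Command {
    let mut cmd = Command::new(format!("rdt-model-{module}"));
    cmd.arg(subcommand)
        .arg("--entity")
        .arg(entity)
        .arg("--target")
        .arg(target)
        .arg("--json");
    cmd
}

/// Runs the child, returning its exit code and captured stdout (the result envelope).
/// stderr is inherited so the child's tracing output stays visible to the operator.
pub fn run_module(mut cmd: Command) -> io::Result<(i32, String)> {
    cmd.stderr(Stdio::inherit());
    let output = cmd.output()?;
    // A missing code means the child was terminated by a signal; -1 is a placeholder.
    let code = output.status.code().unwrap_or(-1);
    Ok((code, String::from_utf8_lossy(&output.stdout).into_owned()))
}
```

Keeping command construction separate from execution makes the invocation testable without the module binaries being installed.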

4.4.6 rdt-model-validate

Phase: 3 — Prepare
Purpose: Validate all generated artifacts against JSON Schemas, syntax rules, and cross-references. Acts as a quality gate — the pipeline does not proceed to Phase 4 (Deploy) unless validation passes.
Parallelizable: No (must run after compile completes)

Input/Output

Direction	Artifact	Path	Format
Input	All artifacts from Phase 3	`compile/artifacts/` (workspace)	Mixed
Input	JSON Schemas	Embedded at compile time via `include_str!`	JSON Schema
Output	Validation report	stdout	Text / JSON (with `--json`)
Output	Exit code	0 (pass) or non-zero (fail)	Process exit

External System Integration

None — pure local file validation.

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Pure file I/O.
Key crates: jsonschema (JSON Schema validation), serde_json, tracing
Templates: None (validation only)

Subcommands

Command	Description
`all`	Validate all artifacts for an entity
`contract`	Validate data contract YAML
`schema`	Validate any JSON/YAML against a schema
`dbt`	Validate dbt model files
`semantic`	Validate semantic YAML
`rules`	Validate rules against a rule group using OPA

Current Status

Planned — CLI structure defined with 6 subcommands. Validation logic not yet implemented. Will use jsonschema crate with embedded schemas.

4.4.7 rdt-model-store

Phase: 4 — Deploy
Purpose: Generate all Snowflake storage artifacts — Bronze DDL, dbt models for Bronze/Silver/Gold layers, and Semantic view definitions. Optionally deploys DDL to Snowflake.
Parallelizable: Yes (parallel with policy, api, mcp, sdk, contract within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	Governance metadata	`models/{entity}/governance.json`	JSON
Input	Rules definition	`models/{entity}/rules.yaml`	YAML
Output	Bronze DDL	`snowflake/ddl/{entity}.sql`	SQL
Output	dbt Bronze model + schema	`dbt/models/bronze/{entity}.sql` + `.yml`	SQL + YAML
Output	dbt Silver view + schema	`dbt/models/silver/{entity}_silver.sql` + `.yml`	SQL + YAML
Output	dbt Gold view + schema	`dbt/models/gold/{entity}_gold.sql` + `.yml`	SQL + YAML
Output	Semantic view	`dbt/models/semantic/{entity}.semantic.yml`	YAML

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Snowflake	`SnowflakeClient`	OAuth (WAM token)	IF-04	A05/A06	Stub

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Pure template rendering.
Key crates: tera (templates), serde_yaml, chrono
Templates (8): bronze_ddl.sql.tera, dbt_bronze.sql.tera, dbt_bronze.yml.tera, silver.sql.tera, silver.yml.tera, gold.sql.tera, gold.yml.tera, semantic.yml.tera

Subcommands

Command	Description
`generate`	Generate all Snowflake artifacts (optionally filter by `--layer`)
`apply`	Deploy DDL to Snowflake (planned)

Current Status

Stub — All 8 Tera templates present. Framework implemented with template rendering logic. StubSnowflakeClient for deployment. Execution pending full wiring.

4.4.8 rdt-model-policy

Phase: 4 — Deploy
Purpose: Compile rules.yaml and policies.yaml into OPA Rego policies, generate Kubernetes deployment manifests, and produce dbt test files. Handles both real-time (OPA) and batch (dbt) DQ enforcement from a single rule source.
Parallelizable: Yes (parallel within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Rules definition	`models/{entity}/rules.yaml`	YAML
Input	Policies definition	`models/{entity}/policies.yaml`	YAML
Input	Entity model	`models/{entity}/model.json`	JSON
Output	OPA Rego policies	`k8s/{entity}/*.rego`	Rego
Output	K8s Deployment	`k8s/{entity}/opa-deployment.yaml`	YAML
Output	K8s Service	`k8s/{entity}/opa-service.yaml`	YAML
Output	Bundle ConfigMap	`k8s/{entity}/bundle-configmap.yaml`	YAML
Output	Bundle refresh CronJob	`k8s/{entity}/bundle-refresh-cronjob.yaml`	YAML
Output	dbt test files	`dbt/tests/{entity}/`	SQL

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
CaaS/Kubernetes	(pending)	Rancher token	IF-08	A13	Active
MRHub	(pending)	OAuth	IF-17	A03	Not started

Technology

Language: Rust (edition 2021)
Async: Mixed — sync for generation (Pattern B), async for deploy
Key crates: tera (templates), jsonschema (rule validation), serde_yaml, tokio, reqwest
Templates (7): rego-policy.rego.tera, rego-validation.rego.tera, opa-deployment.yaml.tera, opa-service.yaml.tera, bundle-configmap.yaml.tera, bundle-refresh-cronjob.yaml.tera, plus dbt test template
Schemas (3): rules.schema.json, policies.schema.json, validation-response.schema.json

Subcommands

Command	Description
`generate`	Generate Rego policies and K8s manifests (optionally filter by `--domain`)
`deploy`	Deploy to CaaS cluster (planned)
`evaluate`	Local OPA policy evaluation against input data
`dbt`	Generate dbt test files (optionally filter by `--gate`)

Current Status

Stub — All 7 templates and 3 schemas present. CaaS access confirmed (Rancher token active). Framework implemented; execution pending full wiring.

4.4.9 rdt-model-api

Phase: 4 — Deploy
Purpose: Generate OpenAPI 3.x specifications from entity models and publish to Mulesoft Anypoint Platform for managed API exposure.
Parallelizable: Yes (parallel within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	Governance metadata	`models/{entity}/governance.json`	JSON
Output	OpenAPI spec	`apis/{entity}/openapi.yaml`	YAML (OpenAPI 3.x)

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Mulesoft	`MulesoftClient`	Anypoint OAuth	IF-10	A09	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tera (templates), tokio, serde
Templates: Planned (OpenAPI YAML template)

Subcommands

Command	Description
`generate`	Generate OpenAPI 3.x specification
`publish`	Publish to Mulesoft Anypoint (planned)

Current Status

Planned — Framework in place. No templates yet. StubMulesoftClient provides fixture responses. Blocked on A09 (Mulesoft access).

4.4.10 rdt-model-mcp

Phase: 4 — Deploy
Purpose: Generate MCP (Model Context Protocol) tool definitions that expose Gold data products to AI agents — Cortex Analyst, Claude, and other MCP-compatible agents.
Parallelizable: Yes (parallel within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	Semantic view definition	`dbt/models/semantic/{entity}.semantic.yml`	YAML
Output	MCP tool definition	`apis/{entity}/mcp_tool.json`	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
MCP Registry	`McpRegistryClient`	TBD	IF-11	—	Planned

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tera (templates), tokio, serde
Templates: Planned (MCP tool JSON template)

Subcommands

Command	Description
`generate`	Generate MCP tool definition
`register`	Register with MCP registry (hosting TBD)

Current Status

Planned — Framework in place. MCP hosting model under investigation (Snowflake MCP not available on company account).

4.4.11 rdt-model-sdk

Phase: 4 — Deploy
Purpose: Generate type-safe SDK clients for programmatic data product access — Python package and cross-compiled Rust CLI targeting 5 platforms.
Parallelizable: Yes (parallel within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	Data contract	`models/{entity}/datacontract.yaml`	YAML
Output	Python SDK	`sdks/{entity}/python/`	Python package
Output	CLI SDK source	`sdks/{entity}/cli/`	Rust source

External System Integration

None — pure code generation module.

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Pure code generation.
Key crates: tera (templates), serde_json, serde_yaml, chrono
Templates: Planned (Python + Rust CLI templates)

Subcommands

Command	Description
`python`	Generate Python SDK package
`cli generate`	Generate Rust CLI SDK source
`cli build`	Cross-compile CLI binaries (optionally `--target <triple>`)

Current Status

Planned — Framework implemented. No templates yet. Pure generation module with no external dependencies.

4.4.12 rdt-model-contract

Phase: 4 — Deploy
Purpose: Generate a datacontract.com 1.1.0 YAML specification for the entity’s Gold and Semantic views. The contract defines schema, SLA, quality expectations, and ownership in a machine-readable format.
Parallelizable: Yes (parallel within Phase 4)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	Governance metadata	`models/{entity}/governance.json`	JSON
Input	GUPRI record	`models/{entity}/gupri.yaml`	YAML
Output	Data contract	`models/{entity}/datacontract.yaml`	YAML (datacontract.com 1.1.0)

External System Integration

None — pure template rendering module.

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Pure template rendering.
Key crates: tera (templates), jsonschema (output validation), serde_yaml, chrono
Templates (1): contract.yaml.tera
Schemas (1): contract.schema.json (validates generated output)

Subcommands

Command	Description
`generate`	Generate datacontract.yaml for an entity

Current Status

Production-ready — Fully implemented with template rendering, schema validation, GUPRI integration, and dry-run support. Unit tests verify schema compliance.

4.4.13 rdt-model-register

Phase: 5 — Register
Purpose: Register the deployed data product across enterprise discovery and governance systems — push lineage to Collibra, register in Snowflake Horizon, and catalog in Roche Data Marketplace.
Parallelizable: Yes (parallel with gupri and search within Phase 5)

Input/Output

Direction	Artifact	Path	Format
Input	All Phase 4 deployment results	`deploy/*-result.json`	JSON
Input	Entity model + governance	`models/{entity}/`	JSON + YAML
Output	Registration confirmations	`register/register-result.json`	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Collibra	`CollibraClient`	OAuth	IF-06	A08	Stub
Snowflake Horizon	`HorizonClient`	OAuth (WAM)	IF-14	—	Stub
Data Marketplace	(TBD)	TBD	IF-15	A15	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, serde
Templates: None

Subcommands

Command	Description
`collibra`	Push lineage records to Collibra
`horizon`	Register in Snowflake Horizon
`rdm`	Register in Roche Data Marketplace
`all`	Run all three registrations

Current Status

Planned — Framework in place. All client traits defined with stubs. Blocked on A07/A08 (Collibra), A15 (Data Marketplace).

4.4.14 rdt-model-gupri

Phase: 5 — Register
Purpose: Register artifacts with GUPRI (Globally Unique Persistent Roche Identifier) to obtain resolvable URIs for every data product artifact.
Parallelizable: Yes (parallel within Phase 5)

Input/Output

Direction	Artifact	Path	Format
Input	Entity ID + artifact type	CLI arguments	String
Output	GUPRI record	`models/{entity}/gupri.yaml`	YAML
Output	Module result envelope	stdout (when `--json`)	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
GUPRI	`GupriClient`	OAuth (PingFederate)	IF-12, IF-13	A02	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, reqwest, async-trait, jsonschema, serde_yaml
Schemas (1): gupri.schema.json (validates GUPRI records)
Templates: None

Subcommands

Command	Description
`register`	Register artifact and obtain GUPRI URI (`--artifact-type`)
`resolve`	Resolve an existing GUPRI URI to its record

Current Status

Production-ready — Fully implemented with StubGupriClient, schema validation, YAML output, and dry-run support. Pending A02 for live GUPRI API integration.

4.4.15 rdt-model-search

Phase: 5 — Register
Purpose: Push offline documentation to the Sinequa enterprise search engine for data product discovery across Roche.
Parallelizable: Yes (parallel within Phase 5)

Input/Output

Direction	Artifact	Path	Format
Input	All generated artifacts	Various paths	Mixed
Output	Search index push confirmation	`register/search-result.json`	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Sinequa	`SinequaClient`	TBD	IF-16	A17	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, serde
Templates: None

Subcommands

Command	Description
`push`	Push documentation to Sinequa search index

Current Status

Planned — Framework in place. StubSinequaClient defined. Integration mechanism (API vs. file drop) TBD. Blocked on A17.

4.4.16 rdt-model-docs

Phase: 6 — Support
Purpose: Generate Starlight reference documentation from all pipeline artifacts — clap definitions, JSON Schemas, ADRs, contracts, and API specs. The docs site is a build artifact that cannot drift from implementation.
Parallelizable: Yes (parallel with cidb and event within Phase 6)

Input/Output

Direction	Artifact	Path	Format
Input	All Phase 4 artifacts	Various	Mixed (SQL, YAML, JSON, Rego)
Input	ADR files	`adr/`	Markdown
Input	CLI definitions	Embedded in binaries	Clap metadata
Output	Reference docs	`docs/src/content/docs/reference/`	Markdown
Output	Architecture docs	`docs/src/content/docs/architecture/`	Markdown
Output	Status docs	`docs/src/content/docs/status/`	Markdown

External System Integration

None — pure generation from local files.

Technology

Language: Rust (edition 2021)
Async: No — synchronous (Pattern B). Pure file reading and template rendering.
Key crates: serde_json, serde
Templates: Planned (Markdown templates for each doc type)

Subcommands

Command	Description
`generate`	Generate all Starlight reference documentation

Current Status

Planned — Framework in place. Generation logic not yet implemented. Output directories (reference/, architecture/, status/) are exclusively owned by this module — no manual editing.

4.4.17 rdt-model-cidb

Phase: 6 — Support
Purpose: Create ServiceNow change management records (CIDM) for production deployments. Provides audit trail for change control compliance.
Parallelizable: Yes (parallel within Phase 6)

Input/Output

Direction	Artifact	Path	Format
Input	Deployment results	Phase 4–5 result envelopes	JSON
Input	Entity metadata	`models/{entity}/`	JSON + YAML
Output	Change request record	`support/cidb-result.json`	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
ServiceNow	`ServicenowClient`	OAuth	IF-20	A12	Stub

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, serde
Templates: None

Subcommands

Command	Description
`register`	Create ServiceNow change request for deployment

Current Status

Planned — Framework in place. StubServicenowClient defined. Blocked on A12 (ServiceNow access).

4.4.18 rdt-model-event

Phase: 6 — Support
Purpose: Publish data product lifecycle events to the Solace enterprise event bus — notifying downstream systems and consumers of new, updated, or deprecated data products.
Parallelizable: Yes (parallel within Phase 6)

Input/Output

Direction	Artifact	Path	Format
Input	Entity model	`models/{entity}/model.json`	JSON
Input	GUPRI record	`models/{entity}/gupri.yaml`	YAML
Output	Event publication confirmation	`support/event-result.json`	JSON

External System Integration

System	Client Trait	Auth	Interface	Access Task	Status
Solace	`SolaceClient`	Client certificate (PEM)	IF-19	A04	Active

Technology

Language: Rust (edition 2021)
Async: Yes — #[tokio::main] (Pattern A)
Key crates: tokio, chrono, serde, serde_json, serde_yaml, jsonschema
Schemas (1): event.schema.json (validates event payloads)
Event types: Created, Updated, Verified, SupersededBy
Topic pattern: rdt/data-product/{entity_id}/{event_type}
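The topic pattern above could be applied as in the following sketch. The enum mirrors the four listed event types; the lowercase snake_case spelling of each topic segment (in particular `superseded_by`) and the rejection of entity IDs containing `/` are assumptions, since only the default `created` is stated. `EventType::topic_segment` and `topic_for` are hypothetical names.

```rust
/// The four lifecycle event types listed for this module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventType {
    Created,
    Updated,
    Verified,
    SupersededBy,
}

impl EventType {
    /// Topic segment per event type. The snake_case spelling is an assumption.
    pub fn topic_segment(self) -> &'static str {
        match self {
            EventType::Created => "created",
            EventType::Updated => "updated",
            EventType::Verified => "verified",
            EventType::SupersededBy => "superseded_by",
        }
    }
}

/// Builds `rdt/data-product/{entity_id}/{event_type}`.
/// Rejects IDs that would add or remove topic levels (an assumed safeguard).
pub fn topic_for(entity_id: &str, event_type: EventType) -> Result<String, String> {
    if entity_id.is_empty() || entity_id.contains('/') {
        return Err(format!("invalid entity id for topic: '{entity_id}'"));
    }
    Ok(format!(
        "rdt/data-product/{entity_id}/{}",
        event_type.topic_segment()
    ))
}
```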

Subcommands

Command	Description
`publish`	Publish lifecycle event (optionally `--event-type`, default: `created`)

Current Status

Production-ready — Fully implemented with client certificate authentication (PEM), schema-validated event payloads, topic-based routing, and dry-run support. A04 resolved 2026-05-07. Additionally, all CLI modules now publish automatic execution events to Solace via rdt-model-common/events.rs.

4.4.19 rdt-model-common (Shared Library)

rdt-model-common is the [lib] member of the Cargo workspace. Every rdt-model-* binary depends on it. It provides no CLI interface — only shared types, traits, and utilities.

Architecture

rdt-model-common/
├── src/
│   ├── lib.rs              ← module exports
│   ├── cli.rs              ← GlobalOpts: --target, --entity, --dry-run, --quiet, --json, --verbose
│   ├── config.rs           ← Config: roche-data.toml + env var overrides + environment resolution
│   ├── paths.rs            ← All output path functions (one per artifact type)
│   ├── errors.rs           ← CliError enum with exit code mapping
│   ├── exit_codes.rs       ← Standard exit codes per ADR 0008
│   ├── fs.rs               ← write_artifact() + write_json_artifact() → OutputAction
│   ├── reporting.rs        ← init_tracing(), ModuleResultBuilder, ModuleResult, OutputAction
│   ├── run_id.rs           ← UUIDv7 run correlation IDs
│   ├── models/
│   │   └── mod.rs          ← Entity, GupriRecord, CollibraMetadata, RulesDefinition, etc.
│   ├── clients/
│   │   ├── mod.rs          ← re-exports all client traits
│   │   ├── rtis.rs         ← RTisClient trait + StubRTisClient
│   │   ├── collibra.rs     ← CollibraClient trait + HttpCollibraClient + StubCollibraClient
│   │   ├── gupri.rs        ← GupriClient trait + StubGupriClient
│   │   ├── snowflake.rs    ← SnowflakeClient trait + StubSnowflakeClient
│   │   ├── snowflake_auth.rs ← SnowflakeAuth (OAuth WAM token exchange)
│   │   ├── postgres_auth.rs  ← PostgresAuth (Vault credential retrieval)
│   │   ├── horizon.rs      ← HorizonClient trait + StubHorizonClient
│   │   ├── solace.rs       ← SolaceClient trait + StubSolaceClient
│   │   ├── mulesoft.rs     ← MulesoftClient trait + StubMulesoftClient
│   │   ├── mcp_registry.rs ← McpRegistryClient trait + StubMcpRegistryClient
│   │   ├── servicenow.rs   ← ServicenowClient trait + StubServicenowClient
│   │   ├── sinequa.rs      ← SinequaClient trait + StubSinequaClient
│   │   ├── rdm.rs          ← RdmClient trait + StubRdmClient
│   │   └── llm.rs          ← LlmClient trait + StubLlmClient
│   └── json/
│       ├── mod.rs          ← re-exports
│       ├── handler.rs      ← JsonHandler: simd-json parse, jsonschema validate, BufWriter write
│       ├── schema_cache.rs ← Lazy OnceLock-based compiled schema cache
│       └── errors.rs       ← JsonValidationError enum
└── schemas/
    ├── manifest.json       ← Shared base manifest schema
    └── result.json         ← Shared result envelope schema

Key Subsystems

Client Trait Pattern

Every external system has a Rust trait and two implementations: a real HTTP client and a stub. The binary layer receives &dyn SystemClient via dependency injection, enabling transparent switching between production and dry-run mode.

graph LR
    Binary["**Binary Module**<br/>e.g., rdt-model-pull"]
    Trait["**Trait (async)**<br/>e.g., RTisClient<br/>get_entity()<br/>list_entities()"]
    Http["**HttpRTisClient**<br/>(live HTTP)"]
    Stub["**StubRTisClient**<br/>(fixture data)"]

    Binary --> Trait
    Http -.->|implements| Trait
    Stub -.->|implements| Trait

Client selection logic: if --dry-run is set or credentials are missing, the module uses the stub client. Otherwise, it instantiates the real HTTP client. This is a configuration decision, not a code change.
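A minimal sketch of this selection logic follows. It is deliberately simplified: the trait is synchronous and named `EntitySource` (a hypothetical stand-in for the async `RTisClient`), the clients are returned as `Box<dyn EntitySource>` rather than borrowed as `&dyn`, and "credentials present" is reduced to a non-empty token. None of these names come from the code base.

```rust
/// Hypothetical, synchronous stand-in for an async client trait such as `RTisClient`.
pub trait EntitySource {
    fn get_entity(&self, entity_id: &str) -> Result<String, String>;
    /// Reports which implementation is in use (for logging and tests).
    fn kind(&self) -> &'static str;
}

pub struct StubSource;

pub struct HttpSource {
    base_url: String,
    token: String,
}

impl EntitySource for StubSource {
    fn get_entity(&self, entity_id: &str) -> Result<String, String> {
        // A real stub would load a fixture file; this returns canned JSON.
        Ok(format!("{{\"id\":\"{entity_id}\",\"source\":\"fixture\"}}"))
    }
    fn kind(&self) -> &'static str {
        "stub"
    }
}

impl EntitySource for HttpSource {
    fn get_entity(&self, entity_id: &str) -> Result<String, String> {
        // Placeholder: a live client would issue an authenticated HTTP request here.
        Err(format!(
            "live call to {}/{} is not implemented in this sketch (token length {})",
            self.base_url,
            entity_id,
            self.token.len()
        ))
    }
    fn kind(&self) -> &'static str {
        "http"
    }
}

/// Stub when `--dry-run` is set or no credentials are available; live client otherwise.
pub fn select_client(dry_run: bool, base_url: &str, token: Option<String>) -> Box<dyn EntitySource> {
    match (dry_run, token) {
        (false, Some(token)) if !token.is_empty() => Box::new(HttpSource {
            base_url: base_url.to_string(),
            token,
        }),
        _ => Box::new(StubSource),
    }
}
```

Because callers only see the trait object, the command layer stays identical in dry-run and live mode.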

JSON Handling (json/)

Centralised JSON processing with three guarantees:

Parsing: simd_json, using SIMD acceleration for roughly 2–4x higher parsing throughput
Validation: Three-layer approach — jsonschema at entry, serde types for structure, garde for business logic
Writing: Direct-to-file via BufWriter (no full string allocation)

The JsonHandler facade provides parse() for internal/trusted data and parse_validated() for external input that requires schema validation. A lazy OnceLock-based schema cache compiles schemas once and reuses them.
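The compile-once behaviour of such a cache can be sketched with `std::sync::OnceLock` as below. This is an illustration under assumptions: `CompiledSchema` and `compile` are hypothetical stand-ins for a real compiled validator, and keying the cache by schema name behind a `Mutex` is one possible layout, not necessarily the one used in schema_cache.rs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled JSON Schema validator.
#[derive(Debug)]
pub struct CompiledSchema {
    pub name: String,
    pub required: Vec<String>,
}

/// Stand-in for the expensive compile step: parses a comma-separated field list.
fn compile(name: &str, source: &str) -> CompiledSchema {
    CompiledSchema {
        name: name.to_string(),
        required: source
            .split(',')
            .map(|field| field.trim().to_string())
            .filter(|field| !field.is_empty())
            .collect(),
    }
}

/// Process-wide cache, created lazily on first use.
static CACHE: OnceLock<Mutex<HashMap<String, Arc<CompiledSchema>>>> = OnceLock::new();

/// Returns the cached schema for `name`, compiling it only on the first request.
pub fn cached_schema(name: &str, source: &str) -> Arc<CompiledSchema> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    // Recover the map even if another thread panicked while holding the lock.
    let mut guard = cache.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard
        .entry(name.to_string())
        .or_insert_with(|| Arc::new(compile(name, source)))
        .clone()
}
```

Returning an `Arc` lets callers keep using a schema after the lock is released.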

Path Management (paths.rs)

One function per artifact type. All output paths are constructed here — never inline in commands or generators. Adding a new artifact type means adding one function.
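As an illustration of the one-function-per-artifact rule, a few path helpers might look like the sketch below. The function names and the explicit `root` parameter are assumptions; the relative paths are the ones listed in the module Input/Output tables.

```rust
use std::path::{Path, PathBuf};

/// `models/{entity}/model.json`
pub fn model_path(root: &Path, entity: &str) -> PathBuf {
    root.join("models").join(entity).join("model.json")
}

/// `models/{entity}/governance.json`
pub fn governance_path(root: &Path, entity: &str) -> PathBuf {
    root.join("models").join(entity).join("governance.json")
}

/// `models/{entity}/datacontract.yaml`
pub fn datacontract_path(root: &Path, entity: &str) -> PathBuf {
    root.join("models").join(entity).join("datacontract.yaml")
}

/// `snowflake/ddl/{entity}.sql`
pub fn bronze_ddl_path(root: &Path, entity: &str) -> PathBuf {
    root.join("snowflake").join("ddl").join(format!("{entity}.sql"))
}
```

With every path built in one place, a layout change touches a single function instead of every command that reads or writes the artifact.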

Reporting (reporting.rs)

Two-track system used by every binary:

Track	Target	Purpose
Structured tracing	stderr	Human-readable progress (`info!`, `debug!`, `warn!`)
Result envelope	stdout	Machine-readable JSON for orchestrator integration

init_reporting() is called first in every main(). Verbosity is controlled by --verbose / --quiet / RUST_LOG.

Filesystem Helpers (fs.rs)

write_artifact() and write_json_artifact() handle file output. They return OutputAction (Wrote, Skipped, Updated) which feeds into the result envelope. They respect --dry-run mode and use tracing for progress reporting.
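The sketch below shows one way a dry-run-aware writer could report an `OutputAction`. The meaning assigned to each variant is an assumption made for this example: `Wrote` for a new file, `Updated` for an existing file whose content changed, and `Skipped` for unchanged content or dry-run. The signature is simplified and tracing calls are omitted.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum OutputAction {
    Wrote,
    Updated,
    Skipped,
}

/// Writes `contents` to `path` unless `dry_run` is set or the file is already identical.
pub fn write_artifact(path: &Path, contents: &str, dry_run: bool) -> io::Result<OutputAction> {
    if dry_run {
        // Assumed behaviour: dry-run never touches the filesystem.
        return Ok(OutputAction::Skipped);
    }
    let existing = match fs::read_to_string(path) {
        Ok(text) => Some(text),
        Err(err) if err.kind() == io::ErrorKind::NotFound => None,
        Err(err) => return Err(err),
    };
    let action = match existing {
        Some(ref text) if text.as_str() == contents => return Ok(OutputAction::Skipped),
        Some(_) => OutputAction::Updated,
        None => OutputAction::Wrote,
    };
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, contents)?;
    Ok(action)
}
```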

4.4.20 Streamlit UI Applications

Two Streamlit in Snowflake applications provide consumer-facing interfaces outside the CLI pipeline. They are Python applications deployed directly into Snowflake’s Streamlit hosting environment.

rdt-ui-crud (Entity CRUD)

Purpose: Render a data entry form from a crud.json artifact, enabling create/read/update/delete operations against an entity’s Snowflake tables.

Attribute	Value
Directory	`ui/crud/`
Entry point	`app.py`
Input	`models/{entity}/crud.json` (generated by pipeline)
Dependencies	`streamlit`, `snowflake-snowpark-python`
Schema	`schemas/crud.schema.json`
Deployed to	Streamlit in Snowflake
Status	Scaffold (form rendering from spec; CRUD operations not yet wired)

Data flow: The CLI pipeline generates crud.json → Streamlit app reads the spec → renders dynamic form → executes SQL against Snowflake session.

rdt-ui-ratification (Steward Ratification)

Purpose: Enable data stewards to review and approve/change taxonomies, synonyms, definitions, and data tags (PII, classification, usage restrictions) for an entity.

Attribute	Value
Directory	`ui/ratification/`
Entry point	`app.py`
Input	`models/{entity}/model.json` + `models/{entity}/governance.json`
Dependencies	`streamlit`, `snowflake-snowpark-python`
Deployed to	Streamlit in Snowflake
Status	Scaffold (taxonomy display; approval workflow not yet wired)

Data flow: Steward opens app → reviews entity metadata (taxonomies, definitions, governance) → approves or requests changes → changes feed back into the pipeline as governance updates.

4.4.21 Cross-Cutting Architectural Patterns

These patterns are enforced across all 18 binary modules.

Pattern 1: Generator Purity

Generators are pure functions that take a resolved Model and return Result<String>. No filesystem access, no HTTP calls, no side effects. The command layer handles input loading and output writing. Generators live in the module that owns the artifact — never in rdt-model-compile.

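use anyhow::{Context, Result};

// Pure generator: takes resolved inputs and returns the rendered YAML.
// No filesystem, network, or other side effects; the command layer writes the result.
pub fn generate_datacontract(
    model: &Model,
    gupri: &GupriRecord,
    governance: &CollibraMetadata,
) -> Result<String> {
    // The template is embedded at compile time, so no file is read at runtime.
    let tmpl = include_str!("../templates/contract.yaml.tera");
    let mut ctx = tera::Context::new();
    ctx.insert("model", model);
    ctx.insert("gupri", gupri);
    ctx.insert("governance", governance);
    // autoescape = false: the output is YAML, not HTML.
    tera::Tera::one_off(tmpl, &ctx, false)
        .context("failed to render data contract template")
}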

Pattern 2: Embedded Templates

All Tera templates are embedded at compile time using include_str!. The binary is fully self-contained — no runtime template file access. Template changes require recompilation, which triggers CI validation.

Pattern 3: Mandatory Reporting

Every binary calls cli.global.init_reporting() as the first action in main(). All progress uses tracing macros (never println!). When --json is passed, a machine-readable result envelope is emitted to stdout for orchestrator consumption.

Pattern 4: Environment Targeting

Every rdt-model-* command requires --target dev|test|prod. There is no default. The target drives:

Snowflake schema prefix (DEV_BRONZE, TEST_BRONZE, PROD_BRONZE)
Kubernetes namespace (rdt-model-dev, rdt-model-test, rdt-model-prod)
Vault secret path (secret/dev/ci, secret/test/ci, secret/prod/ci)
dbt target profile
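The derivations listed above can be sketched as a small enum. The type and method names are hypothetical; the produced values follow the naming shown in the list (for example `TEST_BRONZE`, `rdt-model-test`, `secret/test/ci`). Parsing fails for anything other than the three targets, which matches the "no default" rule.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Dev,
    Test,
    Prod,
}

impl FromStr for Target {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "dev" => Ok(Target::Dev),
            "test" => Ok(Target::Test),
            "prod" => Ok(Target::Prod),
            other => Err(format!("unknown target '{other}', expected dev|test|prod")),
        }
    }
}

impl Target {
    pub fn as_str(self) -> &'static str {
        match self {
            Target::Dev => "dev",
            Target::Test => "test",
            Target::Prod => "prod",
        }
    }

    /// Snowflake schema name, e.g. `DEV_BRONZE` for layer "bronze".
    pub fn snowflake_schema(self, layer: &str) -> String {
        format!("{}_{}", self.as_str().to_uppercase(), layer.to_uppercase())
    }

    /// Kubernetes namespace, e.g. `rdt-model-dev`.
    pub fn k8s_namespace(self) -> String {
        format!("rdt-model-{}", self.as_str())
    }

    /// Vault secret path, e.g. `secret/dev/ci`.
    pub fn vault_path(self) -> String {
        format!("secret/{}/ci", self.as_str())
    }

    /// dbt target profile name.
    pub fn dbt_target(self) -> &'static str {
        self.as_str()
    }
}
```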

Pattern 5: Error Handling

anyhow::Result throughout. Every ? has .context(...) for error chain clarity. No .unwrap() or .expect() outside #[cfg(test)]. Exit codes are standardised: 0 (success), 1 (runtime error), 2 (validation error), 3 (config error).
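The exit code mapping can be sketched as follows. The variant names and the `exit_code_for` helper are hypothetical; only the four numeric codes come from the text above. The sketch uses a plain enum and the standard library so that it stands alone, whereas the real modules wrap errors with anyhow.

```rust
use std::fmt;

/// Hypothetical error categories, one per non-zero exit code.
#[derive(Debug)]
pub enum CliError {
    Runtime(String),
    Validation(String),
    Config(String),
}

impl CliError {
    /// 1 = runtime error, 2 = validation error, 3 = config error.
    pub fn exit_code(&self) -> i32 {
        match self {
            CliError::Runtime(_) => 1,
            CliError::Validation(_) => 2,
            CliError::Config(_) => 3,
        }
    }
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CliError::Runtime(msg) => write!(f, "runtime error: {msg}"),
            CliError::Validation(msg) => write!(f, "validation error: {msg}"),
            CliError::Config(msg) => write!(f, "config error: {msg}"),
        }
    }
}

impl std::error::Error for CliError {}

/// Maps a command result to the process exit code (0 on success).
pub fn exit_code_for(result: &Result<(), CliError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(err) => err.exit_code(),
    }
}
```

A `main()` would typically pass the value from `exit_code_for` to `std::process::exit` after logging the error chain.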

Pattern 6: Stub-First Development

Every external system integration follows the stub-first pattern. Modules implement the full interface using StubClient implementations that return fixture data. Switching to live is a configuration change (credentials present + --dry-run not set), not a code change. This keeps the full pipeline runnable without any credentials.

4.5 Physical Architecture

4.5.1 Infrastructure Overview

All environments share the same physical infrastructure. Separation is achieved through configuration — schema prefixes, namespaces, and Vault paths — not through separate systems. See ADR 0010 for full rationale.

graph TD
    subgraph Snowflake["**Snowflake (Cloud)**<br/>Account: roche-gsn | Database: RDT_MODEL"]
        subgraph SFDev["DEV"]
            DEV_B["DEV_BRONZE"]
            DEV_S["DEV_SILVER"]
            DEV_G["DEV_GOLD"]
            DEV_SE["DEV_SEMANTIC"]
        end
        subgraph SFTest["TEST"]
            TEST_B["TEST_BRONZE"]
            TEST_S["TEST_SILVER"]
            TEST_G["TEST_GOLD"]
            TEST_SE["TEST_SEMANTIC"]
        end
        subgraph SFProd["PROD"]
            PROD_B["PROD_BRONZE"]
            PROD_S["PROD_SILVER"]
            PROD_G["PROD_GOLD"]
            PROD_SE["PROD_SEMANTIC"]
        end
        SiS["Streamlit: rdt-ui-crud, rdt-ui-ratification"]
        Cortex["Cortex Analyst"]
    end

    subgraph K8s["**CaaS / Kubernetes (Rancher)**<br/>Cluster: Cloud Prod eu-central-1 | Project: rdt_model"]
        K8sDev["ns: rdt-model-dev<br/>OPA pods + CronJobs"]
        K8sTest["ns: rdt-model-test<br/>OPA pods + CronJobs"]
        K8sProd["ns: rdt-model-prod<br/>OPA pods + CronJobs"]
    end

    subgraph VaultSvc["**HashiCorp Vault**<br/>Auth: OIDC + AppRole | KV v2"]
        VDev["secret/dev/ci/"]
        VTest["secret/test/ci/"]
        VProd["secret/prod/ci/"]
        VCommon["secret/common/caas"]
    end

    subgraph GHA["**GitHub Actions (CI/CD)**"]
        Workflows["validate.yml, deploy.yml, docs.yml"]
        Envs["Environments: dev, test, prod"]
        Runners["Runners: Roche VPN-connected"]
    end

    subgraph Ping["**PingFederate / WAM (Identity)**"]
        OAuth["OAuth 2.0 client_credentials"]
        WAM["Snowflake WAM integration"]
    end

4.5.2 Network Topology

graph TD
    subgraph Internal["**ROCHE INTERNAL NETWORK**"]
        Dev["Developer Workstation<br/>rdt-model-* CLI"]
        GHA["GitHub Actions Runner<br/>rdt-model-* CI (VPN)"]
        subgraph VPN["**Roche VPN / Corporate Network**"]
            RTiS2["RTiS (AWS+VPN)"]
            GUPRI2["GUPRI (AWS+VPN)"]
            MRHub2["MRHub (AWS+VPN)"]
            Vault2["Vault (internal)"]
            CaaS2["CaaS/Rancher"]
            Ping2["PingFederate"]
            SN2["ServiceNow"]
            Art2["Artifactory"]
        end
        Dev --> VPN
        GHA --> VPN
    end

    subgraph Cloud["**CLOUD / EXTERNAL**"]
        SF2["Snowflake (HTTPS)"]
        Bedrock2["AWS Bedrock (Claude)"]
        Mule2["Mulesoft Anypoint"]
        Sol2["Solace (Event Bus)"]
    end

    VPN -->|HTTPS| Cloud

Key network constraints:

System	Network Zone	Access Method
RTiS, GUPRI, MRHub	AWS behind Roche VPN	HTTPS from VPN-connected clients
Vault, CaaS, PingFederate	Roche internal	Direct internal HTTPS
Snowflake	Cloud (public endpoint)	HTTPS with OAuth (WAM/PingFederate)
AWS Bedrock	AWS Cloud	HTTPS with IAM SigV4
Mulesoft, Solace	Cloud/Hybrid	HTTPS with OAuth
GitHub Actions	Cloud runners + VPN	Self-hosted runners on Roche VPN

4.5.3 Environment Strategy

All environments share identical infrastructure. Isolation is achieved through configuration at five layers:

Layer	Dev	Test	Prod
Snowflake schemas	`DEV_BRONZE`, `DEV_SILVER`, `DEV_GOLD`, `DEV_SEMANTIC`	`TEST_BRONZE`, `TEST_SILVER`, `TEST_GOLD`, `TEST_SEMANTIC`	`PROD_BRONZE`, `PROD_SILVER`, `PROD_GOLD`, `PROD_SEMANTIC`
K8s namespace	`rdt-model-dev`	`rdt-model-test`	`rdt-model-prod`
Vault path	`secret/dev/ci/`	`secret/test/ci/`	`secret/prod/ci/`
dbt target	`dev`	`test`	`prod`
GitHub Environment	`dev` (auto-deploy)	`test` (manual approval)	`prod` (reviewer approval)

Config resolution:

base roche-data.toml → [environments.{target}] overrides → env var overrides
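
The resolution order above can be read as a three-layer merge in which later layers win. The following is a minimal sketch under stated assumptions, not the actual `rdt-model-common` implementation: the function name `resolve_config`, the flattened key/value representation, and the `RDT_MODEL_` environment variable prefix used in the usage note are all hypothetical.

```rust
use std::collections::HashMap;

/// Flat key/value settings. The real configuration is TOML; it is
/// flattened here only to keep the sketch short.
type Settings = HashMap<String, String>;

/// Merge three layers so that later layers override earlier ones:
/// base values, then `[environments.{target}]` overrides, then env vars.
/// `env_prefix` (hypothetical) selects which environment variables apply.
fn resolve_config(
    base: &Settings,
    env_overrides: &Settings,
    env_vars: &[(String, String)],
    env_prefix: &str,
) -> Settings {
    let mut resolved = base.clone();
    // Layer 2: per-environment overrides replace base values.
    for (key, value) in env_overrides {
        resolved.insert(key.clone(), value.clone());
    }
    // Layer 3: prefixed environment variables win over both file layers.
    for (name, value) in env_vars {
        if let Some(key) = name.strip_prefix(env_prefix) {
            resolved.insert(key.to_lowercase(), value.clone());
        }
    }
    resolved
}
```

With a base `warehouse` value, a dev override for `warehouse`, and an environment variable `RDT_MODEL_ROLE`, the resolved settings would carry the dev warehouse and the role from the environment variable. Unprefixed variables such as `PATH` are ignored.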

CI/CD promotion flow:

graph LR
    Push["Push to main"] --> DEV["Deploy to DEV<br/>(auto)"]
    DEV --> TEST["Deploy to TEST<br/>(manual approval)"]
    TEST --> PROD["Deploy to PROD<br/>(reviewer approval)"]
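
The promotion flow can be modelled as an ordered set of environments, each protected by a gate. This is an illustrative sketch only; the type and function names (`Env`, `Gate`, `may_deploy`) are hypothetical, and in practice the gates are enforced by GitHub Environments rather than by application code.

```rust
/// Deployment environments in promotion order.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Env {
    Dev,
    Test,
    Prod,
}

/// What must happen before a deployment to an environment may start.
#[derive(Debug, PartialEq)]
enum Gate {
    Auto,
    ManualApproval,
    ReviewerApproval,
}

impl Env {
    /// Gate protecting each environment, per the GitHub Environment row above.
    fn gate(self) -> Gate {
        match self {
            Env::Dev => Gate::Auto,
            Env::Test => Gate::ManualApproval,
            Env::Prod => Gate::ReviewerApproval,
        }
    }

    /// Next stage in the promotion flow, or `None` after prod.
    fn next(self) -> Option<Env> {
        match self {
            Env::Dev => Some(Env::Test),
            Env::Test => Some(Env::Prod),
            Env::Prod => None,
        }
    }
}

/// A deployment proceeds immediately for auto-deploy targets and
/// otherwise only once the required approval has been granted.
fn may_deploy(target: Env, approved: bool) -> bool {
    match target.gate() {
        Gate::Auto => true,
        Gate::ManualApproval | Gate::ReviewerApproval => approved,
    }
}
```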

4.5.4 Container and Image Strategy

OPA policy containers are built and deployed to CaaS Kubernetes:

Component	Registry	Image	Deployment
OPA sidecar	Roche Artifactory	`artifactory.roche.com/rdt-model/opa:{version}`	K8s Deployment
Bundle refresh	Roche Artifactory	`artifactory.roche.com/rdt-model/bundle-refresh:{version}`	K8s CronJob

Images are built in GitHub Actions, pushed to Artifactory, and deployed via generated Kubernetes manifests. Each entity gets its own OPA deployment with entity-specific Rego bundles.

4.5.5 Data Storage Architecture

graph TD
    subgraph SF["**SNOWFLAKE — RDT_MODEL Database**"]
        subgraph Bronze["{ENV}_BRONZE (physical — append-only)"]
            BWT["waste_tracking"]
            BSE["site_energy"]
            BVQ["vendor_quality"]
        end
        subgraph Silver["{ENV}_SILVER (views — G2 validity)"]
            SWT["waste_tracking_silver"]
            SSE["site_energy_silver"]
            SVQ["vendor_quality_silver"]
        end
        subgraph Gold["{ENV}_GOLD (views — G3 business rules)"]
            GWT["waste_tracking_gold"]
            GSE["site_energy_gold"]
            GVQ["vendor_quality_gold"]
        end
        subgraph Semantic["{ENV}_SEMANTIC (views — Cortex Analyst)"]
            SMWT["waste_tracking_semantic"]
            SMSE["site_energy_semantic"]
            SMVQ["vendor_quality_semantic"]
        end
        Audit["AUDIT (cross-env, append-only)<br/>pipeline_audit_log"]
    end

    Bronze --> Silver --> Gold --> Semantic

Key design decisions (from ADR 0004):

Bronze is the only layer that is physically written. Silver, Gold, and Semantic are views.
Views eliminate schema migration at Silver/Gold/Semantic layers.
The G2 (Silver) and G3 (Gold) DQ gates run at query time as view predicates, not at write time.
Snowflake result cache and micro-partition pruning handle view performance.
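
The naming convention visible in the diagram (a bare entity name for the Bronze table, and a `_silver`, `_gold`, or `_semantic` suffix for the downstream views, each in its `{ENV}_<LAYER>` schema) can be sketched as follows. The helper names `qualified_name` and `lineage` are hypothetical, and the three-part `RDT_MODEL.<schema>.<object>` form assumes the database name shown in the diagram.

```rust
/// Medallion layers in dependency order (Bronze feeds Silver, and so on).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Layer {
    Bronze,
    Silver,
    Gold,
    Semantic,
}

impl Layer {
    const ALL: [Layer; 4] = [Layer::Bronze, Layer::Silver, Layer::Gold, Layer::Semantic];

    /// Schema suffix that follows the environment prefix, e.g. `DEV_GOLD`.
    fn schema_suffix(self) -> &'static str {
        match self {
            Layer::Bronze => "BRONZE",
            Layer::Silver => "SILVER",
            Layer::Gold => "GOLD",
            Layer::Semantic => "SEMANTIC",
        }
    }

    /// Only Bronze is a physical table; every other layer is a view.
    fn is_view(self) -> bool {
        self != Layer::Bronze
    }
}

/// Fully qualified object name for an entity at a given layer.
fn qualified_name(env: &str, entity: &str, layer: Layer) -> String {
    let object = match layer {
        Layer::Bronze => entity.to_string(),
        Layer::Silver => format!("{}_silver", entity),
        Layer::Gold => format!("{}_gold", entity),
        Layer::Semantic => format!("{}_semantic", entity),
    };
    format!("RDT_MODEL.{}_{}.{}", env.to_uppercase(), layer.schema_suffix(), object)
}

/// The full Bronze → Semantic chain for one entity, in dependency order.
fn lineage(env: &str, entity: &str) -> Vec<String> {
    Layer::ALL.iter().map(|&layer| qualified_name(env, entity, layer)).collect()
}
```

For example, `qualified_name("dev", "waste_tracking", Layer::Gold)` yields `RDT_MODEL.DEV_GOLD.waste_tracking_gold`.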

4.6 Non-Functional Requirements

4.6.1 User Profiles

Profile	Description	Scale	Primary Interaction
Data Engineer	Roche domain data engineers who define entities, author rules, and run the pipeline. Power CLI users.	5–15 across all domains (Phase 0–2), scaling to 50+ (Phase 5)	CLI + Git
Data Steward	Governance professionals maintaining metadata in Collibra. Non-technical, use Streamlit UI for ratification.	3–10 per domain	Streamlit UI + Collibra
Platform Admin	Team maintaining the CLI codebase, templates, CI/CD, and infrastructure.	2–5	Rust development + GitHub
Domain Expert	Business analysts refining Gold rules and Semantic definitions via PR review.	10–30 per domain	PR review + YAML authoring
Consumer (Human)	Analysts and scientists querying data products via SQL, SDK, or Cortex Analyst.	100–1000+ per domain	SQL + SDK + NLQ
Consumer (AI Agent)	AI agents accessing data products via MCP tools.	Unbounded	MCP protocol
CI/CD Pipeline	GitHub Actions workflows running the pipeline on every merge.	Concurrent per entity × environment	CLI (`--json` mode)

4.6.2 Performance Requirements

CLI Execution

Operation	Target	Constraint
Full pipeline (`compile run`)	< 5 minutes per entity	Includes all 18 modules, stub mode
Single module (template rendering)	< 10 seconds	Pure Tera rendering, no network
Single module (API call + render)	< 30 seconds	Includes HTTP call + template rendering
Artifact validation (`validate all`)	< 15 seconds per entity	Schema validation of 20+ artifacts
Profile discovery	< 60 seconds per table	Database metadata extraction

Snowflake Query Performance

Query Pattern	Target	Mechanism
Gold view — single entity KPI	< 5 seconds	Snowflake result cache + micro-partition pruning
Semantic view — Cortex Analyst query	< 10 seconds	NLQ → SQL → view chain
Silver view — full entity scan	< 30 seconds	Columnar scan, partition pruning on date
Bronze table — historical backfill query	< 60 seconds	Clustering key on `reporting_date`

CI/CD Pipeline

Stage	Target
PR validation (compile + validate)	< 3 minutes
Full deployment (dev)	< 10 minutes
Promotion (test → prod)	< 5 minutes (after approval)

4.6.3 Capacity Requirements

Entity Scaling

Phase	Entity Count	Domains	Concurrent Pipelines
Phase 0–1 (current)	3–5 entities	1 (Global Sites Network)	1
Phase 2–3	20–50 entities	2–3 domains	5
Phase 4–5	100–500 entities	10+ domains	20

Artifact Storage

Storage	Growth Model	Retention
Git repository (artifacts)	~500 KB per entity (20+ files)	Indefinite (git history)
Snowflake Bronze tables	Append-only, entity-dependent	Time-travel + retention policy (TBD)
OPA bundles (K8s ConfigMaps)	~10 KB per entity	Current version only
Docker images (Artifactory)	~50 MB per OPA image version	Last 5 versions
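
As a rough worked estimate, the per-entity figures above can be multiplied out against the entity counts from the Entity Scaling table. The helper names below are hypothetical, and the calculation deliberately ignores Git history growth, which the table does not quantify.

```rust
/// Estimated storage in KB for `entities` entities at `kb_per_entity` each.
fn storage_kb(entities: u64, kb_per_entity: u64) -> u64 {
    entities * kb_per_entity
}

/// Convert KB to MB using decimal units (1 MB = 1000 KB).
fn kb_to_mb(kb: u64) -> f64 {
    kb as f64 / 1000.0
}
```

At the Phase 4–5 upper bound of 500 entities, `storage_kb(500, 500)` gives 250,000 KB (about 250 MB) of Git artifacts for a single revision, and `storage_kb(500, 10)` gives 5,000 KB (about 5 MB) of OPA bundles.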

Snowflake Compute

Environment	Warehouse Size	Auto-suspend	Usage Pattern
Dev	X-Small	60 seconds	Interactive development
Test	Small	120 seconds	CI/CD validation runs
Prod	Medium	300 seconds	Scheduled pipeline + analyst queries

4.6.4 Business Continuity

4.6.4.1 Availability

Component	Availability Target	Mechanism
Snowflake (query)	99.9% (platform SLA)	Snowflake managed HA, multi-AZ
CaaS/K8s (OPA)	99.5%	Replica count ≥ 2 for prod, health checks
GitHub Actions (CI/CD)	99.9% (platform SLA)	GitHub managed
Vault (secrets)	99.9%	Vault HA cluster (Roche managed)
Pipeline execution	Best-effort	Retry on transient failures; stub fallback
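
The "retry on transient failures" mechanism for pipeline execution could look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: the `StepError` classification, the `retry_with_backoff` name, and the exponential backoff policy are all hypothetical choices.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification for a pipeline step.
#[derive(Debug, PartialEq)]
enum StepError {
    /// Worth retrying, e.g. a timeout or a dropped connection.
    Transient(String),
    /// Retrying will not help, e.g. rejected credentials.
    Permanent(String),
}

/// Run `op` until it succeeds, fails permanently, or has been attempted
/// `max_attempts` times. `op` always runs at least once. The delay
/// doubles after each transient failure: base, 2x base, 4x base, ...
fn retry_with_backoff<T, F>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: F,
) -> Result<T, StepError>
where
    F: FnMut() -> Result<T, StepError>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(StepError::Transient(msg)) => {
                attempt += 1;
                if attempt >= max_attempts {
                    // Attempts exhausted: surface the last transient error.
                    return Err(StepError::Transient(msg));
                }
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
            }
            // Permanent errors are returned immediately without retrying.
            Err(err) => return Err(err),
        }
    }
}
```

A caller that receives an `Err` after retries are exhausted could then decide whether to fall back to a stub client, in line with the "stub fallback" noted in the table.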

4.6.4.2 Disaster Recovery

Component	RPO	RTO	Strategy
Source code + artifacts	0 (git)	Minutes	Git clone from GitHub (distributed)
Snowflake data	Per Snowflake Time Travel (up to 90 days)	Hours	Snowflake native DR (failover)
OPA policies	0 (git)	Minutes	Re-deploy from git (K8s manifests)
Vault secrets	Per Vault snapshot schedule	Hours	Vault snapshot restore
Pipeline state	N/A (stateless)	Immediate	Re-run pipeline (idempotent)

Using Git as the artifact store provides inherent DR. All generated artifacts are committed to Git, so the repository is the source of truth for deployable artifacts. Any lost deployment can be recreated by re-running the pipeline against the committed model.

4.6.4.3 Security

Concern	Control
Authentication	OAuth 2.0 via PingFederate for all systems except AWS Bedrock, which uses AWS IAM (SigV4).
Authorization	Snowflake RBAC (role per environment). K8s RBAC (namespace scoped). Vault policies (path scoped).
Secrets management	HashiCorp Vault (OIDC + AppRole). No secrets in code or CI variables.
Data classification	Collibra-sourced PII flags → column-level masking in Snowflake.
Audit	Append-only audit table in Snowflake. Git history for all artifact changes.
Network	Roche VPN for internal systems. HTTPS for all external calls. No plain HTTP.
Supply chain	Cargo.lock pinned. GitHub Dependabot for CVE alerts.

4.6.4.4 Maintainability

Aspect	Approach
Observability	Structured tracing (stderr) with level control (`--verbose`, `--quiet`, `RUST_LOG`). Machine-readable result envelopes (`--json`) for aggregation.
Debugging	`--dry-run` mode for safe testing. `--verbose` for full trace output. Workspace retention (`--keep-workspace`) for post-mortem inspection.
Code quality	Cargo clippy (deny warnings). `cargo test --workspace` in CI. Integration test feature flag (`integration`).
Documentation	Auto-generated from artifacts by `rdt-model-docs`. Cannot drift from implementation. ADRs for architectural decisions.
Dependency management	Cargo workspace with shared dependency versions. Dependabot alerts. Minimal external dependencies for pure-rendering modules.
Template evolution	Template changes propagate to all entities on next pipeline run. No per-entity customisation — consistency enforced by design.

Appendix A: Technology Stack

Category	Technology	Version	Purpose
Language	Rust	Edition 2021	CLI implementation (18 binaries + 1 library)
Build	Cargo	Workspace	Multi-crate build, dependency management
CLI framework	clap	4.x	Command-line argument parsing, subcommands
Template engine	Tera	1.x	Embedded template rendering (SQL, YAML, Rego, K8s manifests)
JSON parsing	simd-json	0.14	SIMD-accelerated JSON parsing (via JsonHandler)
JSON Schema	jsonschema	0.18	Artifact validation (Draft 2020-12)
Serialization	serde + serde_json + serde_yaml	1.x	JSON/YAML serialization/deserialization
HTTP client	reqwest	0.12	External system API calls
Async runtime	tokio	1.x	Async I/O for network-bound modules
Async traits	async-trait	0.1	Async trait definitions for client traits
Tracing	tracing + tracing-subscriber	0.1	Structured logging and diagnostics
Date/time	chrono	0.4	Timestamps, date handling
UUID	uuid	1.x	UUIDv7 run correlation IDs
Compression	flate2	1.x	Optional gzip for profile output
Regex	regex	1.x	SQL identifier validation, pattern matching
Env files	dotenvy	0.15	`.env` file loading
Data platform	Snowflake	—	Medallion architecture (Bronze/Silver/Gold/Semantic)
Transformation	dbt	Core	View generation, batch DQ tests
Policy engine	Open Policy Agent	0.x	Real-time DQ enforcement, access control
Policy language	Rego	—	Policy definitions compiled from YAML DSL
Container platform	Kubernetes (Rancher/CaaS)	1.x	OPA deployment, bundle refresh jobs
Container registry	Artifactory	—	Docker image storage for OPA containers
Secret management	HashiCorp Vault	—	OIDC + AppRole auth, KV v2 secrets
Identity provider	PingFederate	—	OAuth 2.0 (client_credentials) for all systems
CI/CD	GitHub Actions	—	Validate, deploy, docs workflows
Documentation	Starlight (Astro)	—	Generated reference documentation site
LLM	Claude on AWS Bedrock	—	Metadata enrichment (term mapping, descriptions)
UI framework	Streamlit in Snowflake	—	CRUD and ratification web applications
Data contract	datacontract.com	1.1.0	Machine-readable schema + SLA + quality spec
Event bus	Solace	—	Enterprise event publishing
Search	Sinequa	—	Enterprise search indexing
API gateway	Mulesoft (Anypoint)	—	Managed API publication
AI query	Snowflake Cortex Analyst	—	Natural language query over Semantic views

Appendix B: Vault Path Mapping

Common Paths (shared across environments)

Path	Contents	Used by
`secret/common/caas`	Rancher token, cluster URL	`rdt-model-policy` (K8s deployment)
`secret/common/artifactory`	Docker registry credentials	CI/CD (image push)
`secret/common/github`	GitHub App credentials	CI/CD workflows

Per-Environment Paths

Path Pattern	Contents	Used by
`secret/{env}/ci/snowflake`	Snowflake OAuth client_id/secret, account, warehouse, role	`rdt-model-store`, all Snowflake ops
`secret/{env}/ci/collibra`	Collibra API client_id/secret, bridge key	`rdt-model-govern`, `rdt-model-register`
`secret/{env}/ci/rtis`	RTiS API credentials (Basic Auth or OAuth)	`rdt-model-pull`
`secret/{env}/ci/gupri`	GUPRI API credentials	`rdt-model-gupri`
`secret/{env}/ci/mulesoft`	Anypoint Platform credentials	`rdt-model-api`
`secret/{env}/ci/solace`	Solace connection credentials	`rdt-model-event`
`secret/{env}/ci/servicenow`	ServiceNow API credentials	`rdt-model-cidb`
`secret/{env}/ci/sinequa`	Sinequa API credentials	`rdt-model-search`
`secret/{env}/ci/bedrock`	AWS IAM credentials for Bedrock	`rdt-model-infer`
`secret/{env}/ci/postgres`	Aurora PostgreSQL credentials	`rdt-model-profile`
`secret/{env}/ci/mrhub`	MRHub API credentials	`rdt-model-policy`

Where {env} is one of dev, test, prod.
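
The per-environment path pattern can be expressed as a small helper that rejects unknown environments before building the path. This is a sketch under assumptions: the `Env` type and the `vault_path` function are hypothetical names, not part of the documented codebase.

```rust
use std::str::FromStr;

/// The three environments allowed in `{env}`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Env {
    Dev,
    Test,
    Prod,
}

impl Env {
    fn as_str(self) -> &'static str {
        match self {
            Env::Dev => "dev",
            Env::Test => "test",
            Env::Prod => "prod",
        }
    }
}

impl FromStr for Env {
    type Err = String;

    /// Accept only the exact lowercase names dev, test, and prod.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "dev" => Ok(Env::Dev),
            "test" => Ok(Env::Test),
            "prod" => Ok(Env::Prod),
            other => Err(format!("unknown environment: {}", other)),
        }
    }
}

/// Build `secret/{env}/ci/{system}` for a per-environment secret.
fn vault_path(env: Env, system: &str) -> String {
    format!("secret/{}/ci/{}", env.as_str(), system)
}
```

For example, parsing `"test"` and calling `vault_path(env, "snowflake")` yields `secret/test/ci/snowflake`, while parsing `"staging"` returns an error instead of producing a path that does not exist in Vault.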

Appendix C: Access Task Status Matrix

Access tasks track the provisioning of credentials and network paths to external systems. Each task is tracked as a GitHub Issue where an issue number is listed.

ID	System	Description	Issue	Status
A01	RTiS	REST API credentials + network path	#15	Pending
A02	GUPRI	REST API credentials + network path	#16	Pending
A03	MRHub	REST API credentials for G2 lookups	#24	Not started
A04	MRHub / Solace	Solace event subscription + publish credentials	#24	Not started
A05	Snowflake	Service account, database, schema provisioning	#23	Partial (auth live)
A06	Snowflake	Cortex Analyst feature enablement	#23	Pending
A07	Collibra	REST API credentials for governance metadata pull	#25	Pending
A08	Collibra	REST API credentials for lineage push	#25	Pending
A09	Mulesoft	Anypoint Platform API credentials	#26	Pending
A10	GitHub Actions	Workflow configuration + runner access	—	Done
A11	GitHub Actions	Runner VPN access for internal systems	—	Done
A12	ServiceNow	Table API credentials for CIDM	#27	Pending
A13	CaaS/K8s	Rancher access + namespace provisioning	#28	Active
A14	LeanIX	EA catalog API credentials (stretch)	#29	Not started
A15	Data Marketplace	Registry API credentials (stretch)	#30	Not started
A16	Vault	OIDC + AppRole configuration for CI	#70	Done
A17	Sinequa	Search API credentials + push mechanism	#80	Pending
A18	Aurora PostgreSQL	Database connection credentials for profiling	TBD	Not started
A19	Snowflake WAM	OAuth token exchange configuration	TBD	Done

Appendix D: ADR Cross-Reference

ADR	Title	Status	Sections Referenced
0001	Project Vision	Accepted	§1, §3, §4.1, §4.2
0002	Rust as CLI Implementation Language	Accepted	§4.1.2, Appendix A
0003	Monorepo Structure	Accepted	§4.1.2, §4.4
0004	Virtual Medallion Architecture	Accepted	§4.3.2, §4.5.5
0005	Rule Engine — MODEL DSL to OPA on K8s	Accepted	§4.3.2, §4.4.8
0005b	OPA as MODEL Unified Policy Engine	Accepted	§4.3.2, §4.4.8
0006	Multi-Binary Cargo Workspace	Superseded by 0011	—
0007	Data Product Lifecycle	Proposed	§4.2.1, §4.2.2
0008	CLI Module Development Standards	Proposed	§4.4, §4.4.21
0009	Module I/O Contracts	Accepted	§4.4, Pipeline Overview
0010	Environment Strategy	Proposed	§4.5.3
0011	Pipeline Restructure (19-module / 6-phase)	Accepted	§4.2.2, §4.4

Appendix E: Module Implementation Status

Module	Phase	Async	Templates	Schemas	Implementation	Client Trait
`rdt-model-pull`	1	Yes	0	1	Stub (HTTP client ready)	`RTisClient`
`rdt-model-profile`	1	Yes	0	2	Stub	`DatabaseProbe`
`rdt-model-govern`	2	Yes	0	0	Stub (HTTP client ready)	`CollibraClient`
`rdt-model-infer`	2	Yes	0	1	Planned	`LlmClient`
`rdt-model-compile`	3	No	0	0	Planned (orchestrator)	None
`rdt-model-validate`	3	No	0	0	Planned	None
`rdt-model-store`	4	No	8	0	Stub (templates ready)	`SnowflakeClient`
`rdt-model-policy`	4	Mixed	7	3	Stub (templates ready)	(pending)
`rdt-model-api`	4	Yes	0	0	Planned	`MulesoftClient`
`rdt-model-mcp`	4	Yes	0	0	Planned	`McpRegistryClient`
`rdt-model-sdk`	4	No	0	0	Planned	None
`rdt-model-contract`	4	No	1	1	Production	None
`rdt-model-register`	5	Yes	0	0	Planned	`CollibraClient`, `HorizonClient`
`rdt-model-gupri`	5	Yes	0	1	Production	`GupriClient`
`rdt-model-search`	5	Yes	0	0	Planned	`SinequaClient`
`rdt-model-docs`	6	No	0	0	Planned	None
`rdt-model-cidb`	6	Yes	0	0	Planned	`ServicenowClient`
`rdt-model-event`	6	Yes	0	1	Production	`SolaceClient`

Production = fully executable with fixtures (not stubbed logic). Stub = framework with templates/schemas present; execution delegates to stub clients. Planned = CLI skeleton defined; execution not yet implemented.

Appendix F: Diagram Index

Diagram	Location	Used In
Platform Flow (6 phases, ASCII)	Inline in §4.2.2	§4.2, §4.4
CLI Module Architecture (SVG)	`docs/src/assets/diagrams/model-cli.svg`	§4.4
Medallion Architecture (SVG)	`docs/src/assets/diagrams/model-medallion.svg`	§4.3, §4.5
System Context (ASCII)	Inline in §4.3.3	§4.3
Physical Infrastructure (ASCII)	Inline in §4.5.1	§4.5
Network Topology (ASCII)	Inline in §4.5.2	§4.5
Data Storage Layout (ASCII)	Inline in §4.5.5	§4.5
DQ Gate Flow (ASCII)	Inline in §4.3.2	§4.3
Data Product Lifecycle (ASCII)	Inline in §4.2.1	§4.2
Business Process Flow (ASCII)	Inline in §4.2.4	§4.2
Conceptual Data Model (ASCII)	Inline in §4.3.1	§4.3
