Governance, Auditing & Compliance
Governance by Design
Section titled “Governance by Design”Traditional data platforms bolt governance on as an afterthought — adding audit logging months after launch, retrofitting access controls, manually maintaining compliance documentation. roche-data takes a fundamentally different approach: governance is generated, not configured.
Every artifact the compiler produces carries governance properties from the moment it’s created:
- Data contracts define ownership, quality SLAs, and access classification
- dbt models embed quality gate predicates directly into view definitions
- CEL-based DQ rules enforce validation at batch (dbt) and interactive (Snowflake UDF) layers
- Git history provides an immutable audit trail for every change
Three Pillars of Auditability
Section titled “Three Pillars of Auditability”1. Git — The Immutable Change Ledger
Section titled “1. Git — The Immutable Change Ledger”Every artifact in the roche-data platform is version-controlled in git. This means:
| What’s tracked | How it’s tracked |
|---|---|
| Model changes | Every RTiS pull is committed as a versioned model.json snapshot |
| Generated artifacts | Data contracts, DDL, dbt models, APIs — all committed and diffable |
| Policy definitions | YAML policy files reviewed in pull requests before deployment |
| Configuration changes | roche-data.toml changes tracked with full history |
| Deployment decisions | Every merge to main triggers deployment — the merge commit is the deployment record |
Why this matters: When an auditor asks “Who changed this data contract, when, and why?” — the answer is a git log command. No separate audit database to maintain. No log rotation to manage. The audit trail is the same system developers already use.
Pull request reviews add a human approval layer. Every change to a policy, contract, or quality rule requires peer review before it reaches production. The PR discussion is preserved as context for future audits.
2. Snowflake — Access & Query Auditing
Section titled “2. Snowflake — Access & Query Auditing”Snowflake provides enterprise-grade access auditing out of the box:
| Audit capability | What it captures |
|---|---|
| Query history | Every SQL query executed, by whom, when, and how long it took |
| Access history | Which users and roles accessed which tables and columns |
| Login history | Authentication events, failed attempts, session management |
| Data sharing | Cross-account data access and secure view usage |
| Copy history | Data loading events from external stages |
roche-data enhances this with the virtual medallion architecture (Bronze/Silver/Gold views). Because Silver and Gold are views — not physical tables — the query history captures exactly which quality-filtered data each user actually consumed. An auditor can reconstruct:
- What raw data arrived (Bronze table inserts)
- What quality filters were applied (Silver/Gold view definitions in git)
- Who queried which filtered view (Snowflake query history)
- What results they received (query + view definition = deterministic output)
This creates an end-to-end lineage from raw data landing to business consumption — without any additional tooling.
3. CEL-Based Data Quality Rules (ADR 0013)
Section titled “3. CEL-Based Data Quality Rules (ADR 0013)”The shared DQ rule library provides declarative, portable validation — rules written once in CEL (Common Expression Language), compiled to multiple execution targets:
| Execution target | Scale | Use case |
|---|---|---|
| dbt tests (CEL → SQL) | Millions of rows per batch | Batch enforcement in dbt Cloud |
| Snowflake UDFs (Python/SQL) | Single values | Interactive validation from Steward UI |
| CLI local (cel-interpreter) | Development/CI | Testing rules before deployment |
Three Rule Tiers
Section titled “Three Rule Tiers”| Tier | Dependencies | Example |
|---|---|---|
| Tier 1 — Atomic | None (self-contained) | “Email matches @roche.com pattern” |
| Tier 2 — Lookup | Snowflake reference tables | ”Vendor code exists in MRHub and is ACTIVE” |
| Tier 3 — Business | Composes multiple checks | ”Batch weight within 5% of recipe-based expected value” |
How It Works
Section titled “How It Works”- Domain experts define rules as CEL expressions in YAML files (
rules/library/) - Entity stewards bind rules to properties in
models/{entity}/rules.yaml - The compiler (
rdt-model-policy compile) transpiles CEL to SQL (dbt tests) and generates Snowflake UDFs - dbt Cloud runs batch tests at scale; Snowflake UDFs enable per-record validation
- Git tracks all rule definitions and bindings — every change goes through PR review
Key principle: Rules are code. They live in git, go through pull request review, are tested locally via
eval-rule/test-rule, and are deployed through the same pipeline as every other artifact. The same CEL expression powers both batch and interactive enforcement.
Four-Gate Data Quality
Section titled “Four-Gate Data Quality”Data quality is not a separate process — it’s built into the generated views:
Gate 1 — Completeness (Bronze)
Section titled “Gate 1 — Completeness (Bronze)”What: Technical validation — format, types, schema compliance
How: SQL predicates embedded in the Bronze materialization check that incoming records match the data contract schema. Records that fail are rejected at the source with detailed error logging.
Audit trail: Failed records are logged with rejection reason. Source systems receive error notifications.
Gate 2 — Validity (Silver View)
Section titled “Gate 2 — Validity (Silver View)”What: Business identity validation — MRHub lookups, referential integrity
How: The Silver view includes predicates that filter records against master data. A waste record referencing a non-existent site ID is included in Bronze (the raw data is preserved) but excluded from Silver (it doesn’t appear in validated views).
Audit trail: Filtered records remain visible in Bronze for investigation. The view definition (in git) documents exactly what filter was applied.
Gate 3 — Business Rules (Gold View)
Section titled “Gate 3 — Business Rules (Gold View)”What: Domain-specific checks — range validation, cross-field consistency, freshness SLAs
How: The Gold view adds business rule predicates compiled from rules.yaml. A measurement outside the physically possible range, or a record older than the configured freshness SLA, is excluded from the Gold view.
Audit trail: Rule definitions in git. Excluded records visible in Silver. Metric Stewards alerted on exclusions.
Gate 4 — Consistency (Certified)
Section titled “Gate 4 — Consistency (Certified)”What: Statistical trend analysis and AI guardrails
How: Incoming data is compared against 30-day rolling averages. Significant deviations trigger review. Data that passes receives Certified status — the highest trust level.
Audit trail: Trend analysis results logged. Anomaly flags visible in dashboards. Only Certified data surfaced to Cortex Analyst and external APIs.
Compliance Benefits
Section titled “Compliance Benefits”| Requirement | How roche-data addresses it |
|---|---|
| Data lineage | Git history traces every artifact from RTiS model to deployed view |
| Access control | Snowflake RBAC enforces row/column-level access; Snowflake logs all queries |
| Change management | ServiceNow change requests created automatically before production deployment |
| Data quality | Four-gate architecture with embedded quality predicates; no manual QA steps |
| Policy enforcement | CEL rules compiled to dbt tests (batch) and Snowflake UDFs (interactive) — see ADR 0013 |
| Audit trail | Git (changes) + Snowflake (access) + dbt test results (quality decisions) = complete picture |
| Reproducibility | Generated artifacts are deterministic — same input always produces same output |
| Separation of duties | Model owners define rules; the compiler enforces them; PR reviews gate deployment |
For Auditors
Section titled “For Auditors”If you’re reviewing this platform for compliance, here’s where to find what you need:
| Question | Where to look |
|---|---|
| ”What changed and when?” | git log on the repository — every artifact change is a commit |
| ”Who approved this change?” | Pull request history on GitHub — every merge requires review |
| ”What quality rules apply?” | models/{entity}/rules.yaml (bindings) and rules/library/ (rule definitions) in git |
| ”Who accessed this data?” | Snowflake QUERY_HISTORY and ACCESS_HISTORY views |
| ”What policies are enforced?” | dbt/tests/{entity}/dq_all_rules.sql (compiled tests) + snowflake/udf/dq_*.sql (UDFs) in git |
| ”How was this deployed?” | deploy.yml GitHub Actions workflow + ServiceNow change request |
Related
Section titled “Related”- Quality Assurance Approach — End-to-end QA process, testing, and failure remediation
- DQ Rule Library Reference — Rule format, CEL syntax, and transpilation details
- rdt-model-policy Module — CLI commands for rule compilation and testing