Governance, Auditing & Compliance

Traditional data platforms bolt governance on as an afterthought — adding audit logging months after launch, retrofitting access controls, manually maintaining compliance documentation. roche-data takes a fundamentally different approach: governance is generated, not configured.

Every artifact the compiler produces carries governance properties from the moment it’s created:

  • Data contracts define ownership, quality SLAs, and access classification
  • dbt models embed quality gate predicates directly into view definitions
  • CEL-based DQ rules enforce validation at batch (dbt) and interactive (Snowflake UDF) layers
  • Git history provides an immutable audit trail for every change

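As an illustration of governance properties carried by a contract, a data contract might look like the following. This is a hypothetical sketch — the field names are illustrative, not the actual roche-data contract schema:

```yaml
# Hypothetical data contract fields -- illustrative only; the real
# roche-data contract schema may differ.
contract:
  entity: waste_record
  owner: site-data-stewards@roche.com     # ownership
  classification: internal                # access classification
  quality_sla:
    freshness_hours: 24                   # quality SLA
    completeness_pct: 99.5
  schema:
    - name: site_id
      type: string
      nullable: false
```

Because the contract is a versioned file, its ownership and classification are auditable through the same git history as every other artifact.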
Every artifact in the roche-data platform is version-controlled in git. This means:

| What’s tracked | How it’s tracked |
| --- | --- |
| Model changes | Every RTiS pull is committed as a versioned `model.json` snapshot |
| Generated artifacts | Data contracts, DDL, dbt models, APIs — all committed and diffable |
| Policy definitions | YAML policy files reviewed in pull requests before deployment |
| Configuration changes | `roche-data.toml` changes tracked with full history |
| Deployment decisions | Every merge to main triggers deployment — the merge commit is the deployment record |

Why this matters: When an auditor asks “Who changed this data contract, when, and why?” — the answer is a git log command. No separate audit database to maintain. No log rotation to manage. The audit trail is the same system developers already use.

Pull request reviews add a human approval layer. Every change to a policy, contract, or quality rule requires peer review before it reaches production. The PR discussion is preserved as context for future audits.

Snowflake provides enterprise-grade access auditing out of the box:

| Audit capability | What it captures |
| --- | --- |
| Query history | Every SQL query executed, by whom, when, and how long it took |
| Access history | Which users and roles accessed which tables and columns |
| Login history | Authentication events, failed attempts, session management |
| Data sharing | Cross-account data access and secure view usage |
| Copy history | Data loading events from external stages |

roche-data enhances this with the virtual medallion architecture (Bronze/Silver/Gold views). Because Silver and Gold are views — not physical tables — the query history captures exactly which quality-filtered data each user actually consumed. An auditor can reconstruct:

  1. What raw data arrived (Bronze table inserts)
  2. What quality filters were applied (Silver/Gold view definitions in git)
  3. Who queried which filtered view (Snowflake query history)
  4. What results they received (query + view definition = deterministic output)

This creates an end-to-end lineage from raw data landing to business consumption — without any additional tooling.

3. CEL-Based Data Quality Rules (ADR 0013)

The shared DQ rule library provides declarative, portable validation — rules written once in CEL (Common Expression Language), compiled to multiple execution targets:

| Execution target | Scale | Use case |
| --- | --- | --- |
| dbt tests (CEL → SQL) | Millions of rows per batch | Batch enforcement in dbt Cloud |
| Snowflake UDFs (Python/SQL) | Single values | Interactive validation from the Steward UI |
| CLI local (cel-interpreter) | Development/CI | Testing rules before deployment |
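
The "one rule, many targets" idea can be sketched in a few lines. This is a toy illustration, not the real `rdt-model-policy` compiler: it handles a single CEL pattern and emits both a SQL predicate (as a dbt test would use) and a Python callable (as an interactive check would use):

```python
import re

# Toy sketch of compiling one CEL-style rule to two execution targets.
# NOT the real rdt-model-policy compiler -- it supports only the single
# pattern field.endsWith("literal").
CEL_RULE = 'email.endsWith("@roche.com")'

def cel_endswith_to_sql(expr: str) -> str:
    """Translate field.endsWith("lit") into a SQL predicate."""
    m = re.fullmatch(r'(\w+)\.endsWith\("([^"]+)"\)', expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    field, literal = m.groups()
    return f"{field} LIKE '%{literal}'"

def cel_endswith_to_python(expr: str):
    """Translate the same pattern into a per-record Python check."""
    m = re.fullmatch(r'(\w+)\.endsWith\("([^"]+)"\)', expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    field, literal = m.groups()
    return lambda record: str(record[field]).endswith(literal)

sql_predicate = cel_endswith_to_sql(CEL_RULE)    # batch target (dbt test)
check = cel_endswith_to_python(CEL_RULE)         # interactive target
print(sql_predicate)                             # email LIKE '%@roche.com'
print(check({"email": "jane.doe@roche.com"}))    # True
```

The real compiler covers the full CEL grammar, but the principle is the same: one declarative expression, multiple generated enforcement artifacts.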

Rules are organized into three dependency tiers:

| Tier | Dependencies | Example |
| --- | --- | --- |
| Tier 1 — Atomic | None (self-contained) | “Email matches @roche.com pattern” |
| Tier 2 — Lookup | Snowflake reference tables | “Vendor code exists in MRHub and is ACTIVE” |
| Tier 3 — Business | Composes multiple checks | “Batch weight within 5% of recipe-based expected value” |

The rule lifecycle:

  1. Domain experts define rules as CEL expressions in YAML files (rules/library/)
  2. Entity stewards bind rules to properties in models/{entity}/rules.yaml
  3. The compiler (rdt-model-policy compile) transpiles CEL to SQL (dbt tests) and generates Snowflake UDFs
  4. dbt Cloud runs batch tests at scale; Snowflake UDFs enable per-record validation
  5. Git tracks all rule definitions and bindings — every change goes through PR review
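
A rule definition and its binding might look like the following. These field names and file contents are hypothetical, shown only to make the two-file split concrete:

```yaml
# Hypothetical contents -- field names are illustrative, not the actual
# roche-data rule schema.

# rules/library/email_domain.yaml (Tier 1 -- atomic)
rule:
  id: email_roche_domain
  tier: 1
  expression: 'email.endsWith("@roche.com")'
  message: "Email must be a roche.com address"

# models/{entity}/rules.yaml -- binding the rule to a property
bindings:
  - property: contact_email
    rule: email_roche_domain
    severity: error
```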

Key principle: Rules are code. They live in git, go through pull request review, are tested locally via eval-rule/test-rule, and are deployed through the same pipeline as every other artifact. The same CEL expression powers both batch and interactive enforcement.


Data quality is not a separate process — it’s enforced through four successive gates built into the generated views:

Gate 1: Schema Validation (Bronze)

What: Technical validation — format, types, schema compliance

How: SQL predicates embedded in the Bronze materialization check that incoming records match the data contract schema. Records that fail are rejected at the source with detailed error logging.

Audit trail: Failed records are logged with rejection reason. Source systems receive error notifications.

Gate 2: Referential Integrity (Silver)

What: Business identity validation — MRHub lookups, referential integrity

How: The Silver view includes predicates that filter records against master data. A waste record referencing a non-existent site ID is included in Bronze (the raw data is preserved) but excluded from Silver (it doesn’t appear in validated views).

Audit trail: Filtered records remain visible in Bronze for investigation. The view definition (in git) documents exactly what filter was applied.

Gate 3: Business Rules (Gold)

What: Domain-specific checks — range validation, cross-field consistency, freshness SLAs

How: The Gold view adds business rule predicates compiled from rules.yaml. A measurement outside the physically possible range, or a record older than the configured freshness SLA, is excluded from the Gold view.

Audit trail: Rule definitions in git. Excluded records visible in Silver. Metric Stewards alerted on exclusions.
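
The Bronze-to-Gold filtering described above can be demonstrated end to end. The sketch below uses SQLite as a stand-in for Snowflake, with hypothetical table and column names; the point is that each layer is a view whose predicate narrows the previous layer without deleting anything:

```python
import sqlite3

# Minimal sketch of the virtual medallion layering, using SQLite in place
# of Snowflake. Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bronze_waste (site_id TEXT, weight_kg REAL);
CREATE TABLE master_sites (site_id TEXT PRIMARY KEY);
INSERT INTO master_sites VALUES ('BASEL'), ('KAISERAUGST');
INSERT INTO bronze_waste VALUES
  ('BASEL', 120.0),      -- valid record
  ('UNKNOWN', 80.0),     -- fails Silver: site not in master data
  ('BASEL', -5.0);       -- fails Gold: physically impossible weight

-- Silver: referential-integrity predicate against master data.
CREATE VIEW silver_waste AS
SELECT b.* FROM bronze_waste b
JOIN master_sites m USING (site_id);

-- Gold: business-rule predicate layered on top of Silver.
CREATE VIEW gold_waste AS
SELECT * FROM silver_waste WHERE weight_kg > 0;
""")
counts = {t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("bronze_waste", "silver_waste", "gold_waste")}
print(counts)  # {'bronze_waste': 3, 'silver_waste': 2, 'gold_waste': 1}
```

Every excluded record stays queryable one layer down, which is exactly what makes the exclusions auditable.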

Gate 4: Statistical Guardrails (Certified)

What: Statistical trend analysis and AI guardrails

How: Incoming data is compared against 30-day rolling averages. Significant deviations trigger review. Data that passes receives Certified status — the highest trust level.

Audit trail: Trend analysis results logged. Anomaly flags visible in dashboards. Only Certified data surfaced to Cortex Analyst and external APIs.
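
The rolling-average comparison can be sketched as follows. The window size matches the 30-day figure above, but the deviation threshold is an illustrative assumption, not the platform's actual parameter:

```python
from statistics import mean

# Sketch of the 30-day rolling-average guardrail. The 50% threshold is
# an assumed value for illustration; the real platform's tuning may differ.
WINDOW_DAYS = 30
DEVIATION_THRESHOLD = 0.5   # flag values >50% away from the rolling mean

def anomaly_flag(history: list[float], new_value: float) -> bool:
    """Return True when new_value deviates too far from the rolling mean."""
    window = history[-WINDOW_DAYS:]
    baseline = mean(window)
    return abs(new_value - baseline) / baseline > DEVIATION_THRESHOLD

daily_totals = [100.0] * 30                # stable 30-day history
print(anomaly_flag(daily_totals, 104.0))   # False: within normal range
print(anomaly_flag(daily_totals, 200.0))   # True: held for review
```

A value that trips the flag is routed to review rather than Certified, so downstream consumers (Cortex Analyst, external APIs) only ever see data that passed the guardrail.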


| Requirement | How roche-data addresses it |
| --- | --- |
| Data lineage | Git history traces every artifact from RTiS model to deployed view |
| Access control | Snowflake RBAC enforces row/column-level access; Snowflake logs all queries |
| Change management | ServiceNow change requests created automatically before production deployment |
| Data quality | Four-gate architecture with embedded quality predicates; no manual QA steps |
| Policy enforcement | CEL rules compiled to dbt tests (batch) and Snowflake UDFs (interactive) — see ADR 0013 |
| Audit trail | Git (changes) + Snowflake (access) + dbt test results (quality decisions) = complete picture |
| Reproducibility | Generated artifacts are deterministic — same input always produces same output |
| Separation of duties | Model owners define rules; the compiler enforces them; PR reviews gate deployment |

If you’re reviewing this platform for compliance, here’s where to find what you need:

| Question | Where to look |
| --- | --- |
| “What changed and when?” | `git log` on the repository — every artifact change is a commit |
| “Who approved this change?” | Pull request history on GitHub — every merge requires review |
| “What quality rules apply?” | `models/{entity}/rules.yaml` (bindings) and `rules/library/` (rule definitions) in git |
| “Who accessed this data?” | Snowflake `QUERY_HISTORY` and `ACCESS_HISTORY` views |
| “What policies are enforced?” | `dbt/tests/{entity}/dq_all_rules.sql` (compiled tests) + `snowflake/udf/dq_*.sql` (UDFs) in git |
| “How was this deployed?” | `deploy.yml` GitHub Actions workflow + ServiceNow change request |