Governance, Auditing & Compliance

Traditional data platforms bolt governance on as an afterthought — adding audit logging months after launch, retrofitting access controls, manually maintaining compliance documentation. roche-data takes a fundamentally different approach: governance is generated, not configured.

Every artifact the compiler produces carries governance properties from the moment it’s created:

  • Data contracts define ownership, quality SLAs, and access classification
  • dbt models embed quality gate predicates directly into view definitions
  • CEL-based DQ rules enforce validation at batch (dbt) and interactive (Snowflake UDF) layers
  • Git history provides an immutable audit trail for every change

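As an illustration of governance properties carried by a contract, a data contract might look like the following. This is a hypothetical sketch — the field names are illustrative, not the actual roche-data contract schema:

```yaml
# Hypothetical data contract fields -- illustrative only; the real
# roche-data contract schema may differ.
contract:
  entity: waste_record
  owner: site-data-stewards@roche.com     # ownership
  classification: internal                # access classification
  quality_sla:
    freshness_hours: 24                   # quality SLA
    completeness_pct: 99.5
  schema:
    - name: site_id
      type: string
      nullable: false
```

Because the contract is a versioned file, its ownership and classification are auditable through the same git history as every other artifact.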
Every artifact in the roche-data platform is version-controlled in git. This means:

| What’s tracked | How it’s tracked |
| --- | --- |
| Model changes | Every RTiS pull is committed as a versioned `model.json` snapshot |
| Generated artifacts | Data contracts, DDL, dbt models, APIs — all committed and diffable |
| Policy definitions | YAML policy files reviewed in pull requests before deployment |
| Configuration changes | `roche-data.toml` changes tracked with full history |
| Deployment decisions | Every merge to main triggers deployment — the merge commit is the deployment record |

Why this matters: When an auditor asks “Who changed this data contract, when, and why?” — the answer is a git log command. No separate audit database to maintain. No log rotation to manage. The audit trail is the same system developers already use.

Pull request reviews add a human approval layer. Every change to a policy, contract, or quality rule requires peer review before it reaches production. The PR discussion is preserved as context for future audits.

Snowflake provides enterprise-grade access auditing out of the box:

| Audit capability | What it captures |
| --- | --- |
| Query history | Every SQL query executed, by whom, when, and how long it took |
| Access history | Which users and roles accessed which tables and columns |
| Login history | Authentication events, failed attempts, session management |
| Data sharing | Cross-account data access and secure view usage |
| Copy history | Data loading events from external stages |

roche-data enhances this with the virtual medallion architecture (Bronze/Silver/Gold views). Because Silver and Gold are views — not physical tables — the query history captures exactly which quality-filtered data each user actually consumed. An auditor can reconstruct:

  1. What raw data arrived (Bronze table inserts)
  2. What quality filters were applied (Silver/Gold view definitions in git)
  3. Who queried which filtered view (Snowflake query history)
  4. What results they received (query + view definition = deterministic output)

This creates an end-to-end lineage from raw data landing to business consumption — without any additional tooling.

3. CEL-Based Data Quality Rules (ADR 0013)

The shared DQ rule library provides declarative, portable validation — rules written once in CEL (Common Expression Language), compiled to multiple execution targets:

| Execution target | Scale | Use case |
| --- | --- | --- |
| dbt tests (CEL → SQL) | Millions of rows per batch | Batch enforcement in dbt Cloud |
| Snowflake UDFs (Python/SQL) | Single values | Interactive validation from the Steward UI |
| CLI local (cel-interpreter) | Development/CI | Testing rules before deployment |
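
The "one rule, many targets" idea can be sketched in a few lines. This is a toy illustration, not the real `rdt-model-policy` compiler: it handles a single CEL pattern and emits both a SQL predicate (as a dbt test would use) and a Python callable (as an interactive check would use):

```python
import re

# Toy sketch of compiling one CEL-style rule to two execution targets.
# NOT the real rdt-model-policy compiler -- it supports only the single
# pattern field.endsWith("literal").
CEL_RULE = 'email.endsWith("@roche.com")'

def cel_endswith_to_sql(expr: str) -> str:
    """Translate field.endsWith("lit") into a SQL predicate."""
    m = re.fullmatch(r'(\w+)\.endsWith\("([^"]+)"\)', expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    field, literal = m.groups()
    return f"{field} LIKE '%{literal}'"

def cel_endswith_to_python(expr: str):
    """Translate the same pattern into a per-record Python check."""
    m = re.fullmatch(r'(\w+)\.endsWith\("([^"]+)"\)', expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    field, literal = m.groups()
    return lambda record: str(record[field]).endswith(literal)

sql_predicate = cel_endswith_to_sql(CEL_RULE)    # batch target (dbt test)
check = cel_endswith_to_python(CEL_RULE)         # interactive target
print(sql_predicate)                             # email LIKE '%@roche.com'
print(check({"email": "jane.doe@roche.com"}))    # True
```

The real compiler covers the full CEL grammar, but the principle is the same: one declarative expression, multiple generated enforcement artifacts.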

Rules are organized into three dependency tiers:

| Tier | Dependencies | Example |
| --- | --- | --- |
| Tier 1 — Atomic | None (self-contained) | “Email matches @roche.com pattern” |
| Tier 2 — Lookup | Snowflake reference tables | “Vendor code exists in MRHub and is ACTIVE” |
| Tier 3 — Business | Composes multiple checks | “Batch weight within 5% of recipe-based expected value” |

The rule lifecycle:

  1. Domain experts define rules as CEL expressions in YAML files (rules/library/)
  2. Entity stewards bind rules to properties in models/{entity}/rules.yaml
  3. The compiler (rdt-model-policy compile) transpiles CEL to SQL (dbt tests) and generates Snowflake UDFs
  4. dbt Cloud runs batch tests at scale; Snowflake UDFs enable per-record validation
  5. Git tracks all rule definitions and bindings — every change goes through PR review
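
A rule definition and its binding might look like the following. These field names and file contents are hypothetical, shown only to make the two-file split concrete:

```yaml
# Hypothetical contents -- field names are illustrative, not the actual
# roche-data rule schema.

# rules/library/email_domain.yaml (Tier 1 -- atomic)
rule:
  id: email_roche_domain
  tier: 1
  expression: 'email.endsWith("@roche.com")'
  message: "Email must be a roche.com address"

# models/{entity}/rules.yaml -- binding the rule to a property
bindings:
  - property: contact_email
    rule: email_roche_domain
    severity: error
```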

Key principle: Rules are code. They live in git, go through pull request review, are tested locally via eval-rule/test-rule, and are deployed through the same pipeline as every other artifact. The same CEL expression powers both batch and interactive enforcement.


Data quality is not a separate process — it’s enforced through four successive gates built into the generated views:

Gate 1: Schema Validation (Bronze)

What: Technical validation — format, types, schema compliance

How: SQL predicates embedded in the Bronze materialization check that incoming records match the data contract schema. Records that fail are rejected at the source with detailed error logging.

Audit trail: Failed records are logged with rejection reason. Source systems receive error notifications.

Gate 2: Referential Integrity (Silver)

What: Business identity validation — MRHub lookups, referential integrity

How: The Silver view includes predicates that filter records against master data. A waste record referencing a non-existent site ID is included in Bronze (the raw data is preserved) but excluded from Silver (it doesn’t appear in validated views).

Audit trail: Filtered records remain visible in Bronze for investigation. The view definition (in git) documents exactly what filter was applied.

Gate 3: Business Rules (Gold)

What: Domain-specific checks — range validation, cross-field consistency, freshness SLAs

How: The Gold view adds business rule predicates compiled from rules.yaml. A measurement outside the physically possible range, or a record older than the configured freshness SLA, is excluded from the Gold view.

Audit trail: Rule definitions in git. Excluded records visible in Silver. Metric Stewards alerted on exclusions.
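
The Bronze-to-Gold filtering described above can be demonstrated end to end. The sketch below uses SQLite as a stand-in for Snowflake, with hypothetical table and column names; the point is that each layer is a view whose predicate narrows the previous layer without deleting anything:

```python
import sqlite3

# Minimal sketch of the virtual medallion layering, using SQLite in place
# of Snowflake. Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bronze_waste (site_id TEXT, weight_kg REAL);
CREATE TABLE master_sites (site_id TEXT PRIMARY KEY);
INSERT INTO master_sites VALUES ('BASEL'), ('KAISERAUGST');
INSERT INTO bronze_waste VALUES
  ('BASEL', 120.0),      -- valid record
  ('UNKNOWN', 80.0),     -- fails Silver: site not in master data
  ('BASEL', -5.0);       -- fails Gold: physically impossible weight

-- Silver: referential-integrity predicate against master data.
CREATE VIEW silver_waste AS
SELECT b.* FROM bronze_waste b
JOIN master_sites m USING (site_id);

-- Gold: business-rule predicate layered on top of Silver.
CREATE VIEW gold_waste AS
SELECT * FROM silver_waste WHERE weight_kg > 0;
""")
counts = {t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("bronze_waste", "silver_waste", "gold_waste")}
print(counts)  # {'bronze_waste': 3, 'silver_waste': 2, 'gold_waste': 1}
```

Every excluded record stays queryable one layer down, which is exactly what makes the exclusions auditable.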

Gate 4: Statistical Guardrails (Certified)

What: Statistical trend analysis and AI guardrails

How: Incoming data is compared against 30-day rolling averages. Significant deviations trigger review. Data that passes receives Certified status — the highest trust level.

Audit trail: Trend analysis results logged. Anomaly flags visible in dashboards. Only Certified data surfaced to Cortex Analyst and external APIs.
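
The rolling-average comparison can be sketched as follows. The window size matches the 30-day figure above, but the deviation threshold is an illustrative assumption, not the platform's actual parameter:

```python
from statistics import mean

# Sketch of the 30-day rolling-average guardrail. The 50% threshold is
# an assumed value for illustration; the real platform's tuning may differ.
WINDOW_DAYS = 30
DEVIATION_THRESHOLD = 0.5   # flag values >50% away from the rolling mean

def anomaly_flag(history: list[float], new_value: float) -> bool:
    """Return True when new_value deviates too far from the rolling mean."""
    window = history[-WINDOW_DAYS:]
    baseline = mean(window)
    return abs(new_value - baseline) / baseline > DEVIATION_THRESHOLD

daily_totals = [100.0] * 30                # stable 30-day history
print(anomaly_flag(daily_totals, 104.0))   # False: within normal range
print(anomaly_flag(daily_totals, 200.0))   # True: held for review
```

A value that trips the flag is routed to review rather than Certified, so downstream consumers (Cortex Analyst, external APIs) only ever see data that passed the guardrail.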


| Requirement | How roche-data addresses it |
| --- | --- |
| Data lineage | Git history traces every artifact from RTiS model to deployed view |
| Access control | Snowflake RBAC enforces row/column-level access; Snowflake logs all queries |
| Change management | ServiceNow change requests created automatically before production deployment |
| Data quality | Four-gate architecture with embedded quality predicates; no manual QA steps |
| Policy enforcement | CEL rules compiled to dbt tests (batch) and Snowflake UDFs (interactive) — see ADR 0013 |
| Audit trail | Git (changes) + Snowflake (access) + dbt test results (quality decisions) = complete picture |
| Reproducibility | Generated artifacts are deterministic — same input always produces same output |
| Separation of duties | Model owners define rules; the compiler enforces them; PR reviews gate deployment |

If you’re reviewing this platform for compliance, here’s where to find what you need:

| Question | Where to look |
| --- | --- |
| “What changed and when?” | `git log` on the repository — every artifact change is a commit |
| “Who approved this change?” | Pull request history on GitHub — every merge requires review |
| “What quality rules apply?” | `models/{entity}/rules.yaml` (bindings) and `rules/library/` (rule definitions) in git |
| “Who accessed this data?” | Snowflake `QUERY_HISTORY` and `ACCESS_HISTORY` views |
| “What policies are enforced?” | `dbt/tests/{entity}/dq_all_rules.sql` (compiled tests) + `snowflake/udf/dq_*.sql` (UDFs) in git |
| “How was this deployed?” | `deploy.yml` GitHub Actions workflow + ServiceNow change request |