DQ Rule Library

The DQ Rule Library is a shared repository of validation rules defined using CEL (Common Expression Language). Rules are defined once and bound to any entity’s properties, then compiled to multiple execution targets.

Architecture

Plane	System	What it holds
Definition	Git (`rules/library/`)	Rule logic (CEL), metadata, version
Assignment	Git (`models/{entity}/rules.yaml`)	Which rules apply to which properties
Execution	dbt Cloud / Snowflake UDF	Running the checks

Rule Tiers

Tier	Name	Characteristics	Execution
1	Atomic	No external dependencies	All targets (local, dbt, UDF)
2	Lookup	Requires Snowflake reference data	dbt + UDF only
3	Business	Complex multi-check rules	dbt + UDF only

Writing a Rule

Rules live in rules/library/{tier}/{domain}/{rule-id}.yaml:

id: valid-roche-email
version: "1.0.0"
tier: 1
domain: format
semantic_types: [email, roche_email]
tags: [baseline, roche-100, pii]
description: Validates Roche corporate email format
cel: |
  value.email.matches("^[a-zA-Z0-9._%+-]+@roche\\.com$")
message: "Must be a valid @roche.com email address"

CEL Expressions

value is a struct representing the full record. Access fields as value.field_name.

Supported operations:

CEL	SQL (transpiled)
`value.col == "x"`	`col = 'x'`
`value.col > 0`	`col > 0`
`value.a > value.b`	`a > b`
`value.col.matches("pat")`	`REGEXP_LIKE(col, 'pat')`
`value.col.contains("x")`	`CONTAINS(col, 'x')`
`value.col.startsWith("x")`	`STARTSWITH(col, 'x')`
`value.col.endsWith("x")`	`ENDSWITH(col, 'x')`
`value.col.size()`	`LENGTH(col)`
`value.col in ["a","b"]`	`col IN ('a', 'b')`
`expr1 && expr2`	`(expr1) AND (expr2)`
`expr1 \|\| expr2`	`(expr1) OR (expr2)`
`!expr`	`NOT (expr)`

Tier 2 Rules (Lookup)

Add a refs: section declaring external data sources:

id: vendor-exists-active
tier: 2
refs:
  mrhub_vendors:
    source: snowflake
    table: GOLD_MRHUB_VENDORS
    key_column: CODE
    refresh: PT1H
cel: |
  value.vendor_code in ref.mrhub_vendors.codes
    && ref.mrhub_vendors[value.vendor_code].status == "ACTIVE"

CEL → SQL for lookups:

CEL	SQL
`value.col in ref.X.codes`	`col IN (SELECT key FROM table)`
`ref.X[value.col].field == "y"`	`(SELECT field FROM table WHERE key = col) = 'y'`

Binding Rules to Entities

models/{entity}/rules.yaml maps properties to library rules:

rule_group: EntityDQ
entity_ref: "rtis://roche/entities/my-entity"
version: "2.0.0"

bindings:
  # Single-field binding
  - property: email_address
    rules: [valid-roche-email, not-empty]

  # Cross-field binding
  - properties: [end_date, start_date]
    rules: [end-after-start]

composites:
  - id: record-complete
    operator: AND
    requires: [valid-roche-email, not-empty]
    message: "Record failed validation"

Single-field: The rule’s generic CEL field name is remapped to the bound property. Cross-field: The CEL expression uses the actual field names as written.

Compilation

# Compile rules for an entity
rdt-model-policy --target dev --entity my-entity compile

# Output:
#   dbt/tests/my-entity/dq_valid_roche_email.sql
#   dbt/tests/my-entity/dq_not_empty.sql
#   dbt/tests/my-entity/dq_end_after_start.sql
#   snowflake/udf/dq_valid_roche_email.sql
#   snowflake/udf/dq_not_empty.sql
#   snowflake/udf/dq_end_after_start.sql

dbt Test Output

Generated tests use NULL-safe COALESCE — NULL column values are excluded (not treated as DQ failures):

-- Generated by rdt-model-policy compile — do not edit
-- Rule: valid-roche-email
-- Bound to: email_address

SELECT *
FROM {{ ref('my_entity_silver') }}
WHERE COALESCE(NOT (
    REGEXP_LIKE(email_address, '^[a-zA-Z0-9._%+-]+@roche\.com$')
), FALSE)

Snowflake UDF Output (Tier 1)

CREATE OR REPLACE FUNCTION DP_MDM.DQ_VALID_ROCHE_EMAIL(val VARCHAR)
RETURNS OBJECT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('celpy==0.4.0')
HANDLER = 'evaluate'
AS $$
import celpy
...
$$;

Testing Rules Locally

# Single value
rdt-model-policy --target dev eval-rule \
  --rule valid-roche-email \
  --value '{"email": "user@roche.com"}'

# Fixture file
rdt-model-policy --target dev test-rule \
  --rule positive-number \
  --fixtures rules/fixtures/positive-number-test.json

Binding Validation

Before compiling, validate that binding properties exist as fields in the entity model:

rdt-model-policy --target dev --entity my-entity validate-bindings

Comparison is case-insensitive (model fields are UPPER_CASE, bindings use lowercase). The compile command also runs this check automatically when model.json is present.

Rule Registry

rules/registry.yaml is auto-generated by rdt-model-policy catalog:

rdt-model-policy --target dev catalog
rdt-model-policy --target dev catalog --strict  # fail on duplicate CEL

This produces a searchable index of all rules sorted by ID, with metadata (tier, domain, semantic types, file path, content hash).

Content Deduplication

The registry includes a SHA-256 hash of each rule’s normalized CEL expression (cel_hash). During catalog, rules with identical CEL are detected and reported as warnings. Use --strict to fail the build — useful as a CI gate to prevent duplicate rules from merging.

Directory Structure

rules/
├── library/
│   ├── tier1-atomic/          ← 40 rules (format, range, enum, string, temporal, type)
│   ├── tier2-lookup/          ← lookup rules (require Snowflake ref data)
│   └── tier3-business/        ← complex business rules
├── baseline/
│   └── roche-100.yaml         ← index of baseline rule IDs
├── schemas/
│   └── rule.schema.json       ← JSON Schema for rule YAML validation
├── fixtures/                   ← test fixture files for test-rule command
└── registry.yaml              ← auto-generated catalog

Execution Targets

Target	Scale	Use case	Technology
dbt Cloud	Millions of rows	Batch DQ enforcement	CEL → SQL transpilation
Snowflake UDF	Single values	Interactive validation (UI)	celpy (Python)
CLI local	Single values	Developer testing, CI	cel-interpreter (Rust)
Mulesoft API (future)	Request/response	Real-time external	cel-java

Quality Assurance Approach — End-to-end QA process, four gates, roles
rdt-model-policy Module — CLI subcommands for compilation and testing
Governance, Auditing & Compliance — How DQ fits the broader governance model
ADR 0013 — CEL-based DQ Rule Library — Architecture decision