DQ Rule Library
The DQ Rule Library is a shared repository of validation rules defined using CEL (Common Expression Language). Rules are defined once and bound to any entity’s properties, then compiled to multiple execution targets.
Architecture
Section titled “Architecture”| Plane | System | What it holds |
|---|---|---|
| Definition | Git (rules/library/) | Rule logic (CEL), metadata, version |
| Assignment | Git (models/{entity}/rules.yaml) | Which rules apply to which properties |
| Execution | dbt Cloud / Snowflake UDF | Running the checks |
Rule Tiers
Section titled “Rule Tiers”| Tier | Name | Characteristics | Execution |
|---|---|---|---|
| 1 | Atomic | No external dependencies | All targets (local, dbt, UDF) |
| 2 | Lookup | Requires Snowflake reference data | dbt + UDF only |
| 3 | Business | Complex multi-check rules | dbt + UDF only |
Writing a Rule
Section titled “Writing a Rule”Rules live in rules/library/{tier}/{domain}/{rule-id}.yaml:
id: valid-roche-emailversion: "1.0.0"tier: 1domain: formatsemantic_types: [email, roche_email]tags: [baseline, roche-100, pii]description: Validates Roche corporate email formatcel: | value.email.matches("^[a-zA-Z0-9._%+-]+@roche\\.com$")message: "Must be a valid @roche.com email address"CEL Expressions
Section titled “CEL Expressions”value is a struct representing the full record. Access fields as value.field_name.
Supported operations:
| CEL | SQL (transpiled) |
|---|---|
value.col == "x" | col = 'x' |
value.col > 0 | col > 0 |
value.a > value.b | a > b |
value.col.matches("pat") | REGEXP_LIKE(col, 'pat') |
value.col.contains("x") | CONTAINS(col, 'x') |
value.col.startsWith("x") | STARTSWITH(col, 'x') |
value.col.endsWith("x") | ENDSWITH(col, 'x') |
value.col.size() | LENGTH(col) |
value.col in ["a","b"] | col IN ('a', 'b') |
expr1 && expr2 | (expr1) AND (expr2) |
expr1 || expr2 | (expr1) OR (expr2) |
!expr | NOT (expr) |
Tier 2 Rules (Lookup)
Section titled “Tier 2 Rules (Lookup)”Add a refs: section declaring external data sources:
id: vendor-exists-activetier: 2refs: mrhub_vendors: source: snowflake table: GOLD_MRHUB_VENDORS key_column: CODE refresh: PT1Hcel: | value.vendor_code in ref.mrhub_vendors.codes && ref.mrhub_vendors[value.vendor_code].status == "ACTIVE"CEL → SQL for lookups:
| CEL | SQL |
|---|---|
value.col in ref.X.codes | col IN (SELECT key FROM table) |
ref.X[value.col].field == "y" | (SELECT field FROM table WHERE key = col) = 'y' |
Binding Rules to Entities
Section titled “Binding Rules to Entities”models/{entity}/rules.yaml maps properties to library rules:
rule_group: EntityDQentity_ref: "rtis://roche/entities/my-entity"version: "2.0.0"
bindings: # Single-field binding - property: email_address rules: [valid-roche-email, not-empty]
# Cross-field binding - properties: [end_date, start_date] rules: [end-after-start]
composites: - id: record-complete operator: AND requires: [valid-roche-email, not-empty] message: "Record failed validation"Single-field: The rule’s generic CEL field name is remapped to the bound property. Cross-field: The CEL expression uses the actual field names as written.
Compilation
Section titled “Compilation”# Compile rules for an entityrdt-model-policy --target dev --entity my-entity compile
# Output:# dbt/tests/my-entity/dq_valid_roche_email.sql# dbt/tests/my-entity/dq_not_empty.sql# dbt/tests/my-entity/dq_end_after_start.sql# snowflake/udf/dq_valid_roche_email.sql# snowflake/udf/dq_not_empty.sql# snowflake/udf/dq_end_after_start.sqldbt Test Output
Section titled “dbt Test Output”Generated tests use NULL-safe COALESCE — NULL column values are excluded (not treated as DQ failures):
-- Generated by rdt-model-policy compile — do not edit-- Rule: valid-roche-email-- Bound to: email_address
SELECT *FROM {{ ref('my_entity_silver') }}WHERE COALESCE(NOT ( REGEXP_LIKE(email_address, '^[a-zA-Z0-9._%+-]+@roche\.com$')), FALSE)Snowflake UDF Output (Tier 1)
Section titled “Snowflake UDF Output (Tier 1)”CREATE OR REPLACE FUNCTION DP_MDM.DQ_VALID_ROCHE_EMAIL(val VARCHAR)RETURNS OBJECTLANGUAGE PYTHONRUNTIME_VERSION = '3.11'PACKAGES = ('celpy==0.4.0')HANDLER = 'evaluate'AS $$import celpy...$$;Testing Rules Locally
Section titled “Testing Rules Locally”# Single valuerdt-model-policy --target dev eval-rule \ --rule valid-roche-email \ --value '{"email": "user@roche.com"}'
# Fixture filerdt-model-policy --target dev test-rule \ --rule positive-number \ --fixtures rules/fixtures/positive-number-test.jsonBinding Validation
Section titled “Binding Validation”Before compiling, validate that binding properties exist as fields in the entity model:
rdt-model-policy --target dev --entity my-entity validate-bindingsComparison is case-insensitive (model fields are UPPER_CASE, bindings use lowercase). The compile command also runs this check automatically when model.json is present.
Rule Registry
Section titled “Rule Registry”rules/registry.yaml is auto-generated by rdt-model-policy catalog:
rdt-model-policy --target dev catalogrdt-model-policy --target dev catalog --strict # fail on duplicate CELThis produces a searchable index of all rules sorted by ID, with metadata (tier, domain, semantic types, file path, content hash).
Content Deduplication
Section titled “Content Deduplication”The registry includes a SHA-256 hash of each rule’s normalized CEL expression (cel_hash). During catalog, rules with identical CEL are detected and reported as warnings. Use --strict to fail the build — useful as a CI gate to prevent duplicate rules from merging.
Directory Structure
Section titled “Directory Structure”rules/├── library/│ ├── tier1-atomic/ ← 40 rules (format, range, enum, string, temporal, type)│ ├── tier2-lookup/ ← lookup rules (require Snowflake ref data)│ └── tier3-business/ ← complex business rules├── baseline/│ └── roche-100.yaml ← index of baseline rule IDs├── schemas/│ └── rule.schema.json ← JSON Schema for rule YAML validation├── fixtures/ ← test fixture files for test-rule command└── registry.yaml ← auto-generated catalogExecution Targets
Section titled “Execution Targets”| Target | Scale | Use case | Technology |
|---|---|---|---|
| dbt Cloud | Millions of rows | Batch DQ enforcement | CEL → SQL transpilation |
| Snowflake UDF | Single values | Interactive validation (UI) | celpy (Python) |
| CLI local | Single values | Developer testing, CI | cel-interpreter (Rust) |
| Mulesoft API (future) | Request/response | Real-time external | cel-java |
Related
Section titled “Related”- Quality Assurance Approach — End-to-end QA process, four gates, roles
- rdt-model-policy Module — CLI subcommands for compilation and testing
- Governance, Auditing & Compliance — How DQ fits the broader governance model
- ADR 0013 — CEL-based DQ Rule Library — Architecture decision