Skip to content

DQ Rule Library

The DQ Rule Library is a shared repository of validation rules defined using CEL (Common Expression Language). Rules are defined once and bound to any entity’s properties, then compiled to multiple execution targets.

PlaneSystemWhat it holds
DefinitionGit (rules/library/)Rule logic (CEL), metadata, version
AssignmentGit (models/{entity}/rules.yaml)Which rules apply to which properties
Executiondbt Cloud / Snowflake UDFRunning the checks
TierNameCharacteristicsExecution
1AtomicNo external dependenciesAll targets (local, dbt, UDF)
2LookupRequires Snowflake reference datadbt + UDF only
3BusinessComplex multi-check rulesdbt + UDF only

Rules live in rules/library/{tier}/{domain}/{rule-id}.yaml:

id: valid-roche-email
version: "1.0.0"
tier: 1
domain: format
semantic_types: [email, roche_email]
tags: [baseline, roche-100, pii]
description: Validates Roche corporate email format
cel: |
value.email.matches("^[a-zA-Z0-9._%+-]+@roche\\.com$")
message: "Must be a valid @roche.com email address"

value is a struct representing the full record. Access fields as value.field_name.

Supported operations:

CELSQL (transpiled)
value.col == "x"col = 'x'
value.col > 0col > 0
value.a > value.ba > b
value.col.matches("pat")REGEXP_LIKE(col, 'pat')
value.col.contains("x")CONTAINS(col, 'x')
value.col.startsWith("x")STARTSWITH(col, 'x')
value.col.endsWith("x")ENDSWITH(col, 'x')
value.col.size()LENGTH(col)
value.col in ["a","b"]col IN ('a', 'b')
expr1 && expr2(expr1) AND (expr2)
expr1 || expr2(expr1) OR (expr2)
!exprNOT (expr)

Add a refs: section declaring external data sources:

id: vendor-exists-active
tier: 2
refs:
mrhub_vendors:
source: snowflake
table: GOLD_MRHUB_VENDORS
key_column: CODE
refresh: PT1H
cel: |
value.vendor_code in ref.mrhub_vendors.codes
&& ref.mrhub_vendors[value.vendor_code].status == "ACTIVE"

CEL → SQL for lookups:

CELSQL
value.col in ref.X.codescol IN (SELECT key FROM table)
ref.X[value.col].field == "y"(SELECT field FROM table WHERE key = col) = 'y'

models/{entity}/rules.yaml maps properties to library rules:

rule_group: EntityDQ
entity_ref: "rtis://roche/entities/my-entity"
version: "2.0.0"
bindings:
# Single-field binding
- property: email_address
rules: [valid-roche-email, not-empty]
# Cross-field binding
- properties: [end_date, start_date]
rules: [end-after-start]
composites:
- id: record-complete
operator: AND
requires: [valid-roche-email, not-empty]
message: "Record failed validation"

Single-field: The rule’s generic CEL field name is remapped to the bound property. Cross-field: The CEL expression uses the actual field names as written.

Terminal window
# Compile rules for an entity
rdt-model-policy --target dev --entity my-entity compile
# Output:
# dbt/tests/my-entity/dq_valid_roche_email.sql
# dbt/tests/my-entity/dq_not_empty.sql
# dbt/tests/my-entity/dq_end_after_start.sql
# snowflake/udf/dq_valid_roche_email.sql
# snowflake/udf/dq_not_empty.sql
# snowflake/udf/dq_end_after_start.sql

Generated tests use NULL-safe COALESCE — NULL column values are excluded (not treated as DQ failures):

-- Generated by rdt-model-policy compile — do not edit
-- Rule: valid-roche-email
-- Bound to: email_address
SELECT *
FROM {{ ref('my_entity_silver') }}
WHERE COALESCE(NOT (
REGEXP_LIKE(email_address, '^[a-zA-Z0-9._%+-]+@roche\.com$')
), FALSE)
CREATE OR REPLACE FUNCTION DP_MDM.DQ_VALID_ROCHE_EMAIL(val VARCHAR)
RETURNS OBJECT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('celpy==0.4.0')
HANDLER = 'evaluate'
AS $$
import celpy
...
$$;
Terminal window
# Single value
rdt-model-policy --target dev eval-rule \
--rule valid-roche-email \
--value '{"email": "user@roche.com"}'
# Fixture file
rdt-model-policy --target dev test-rule \
--rule positive-number \
--fixtures rules/fixtures/positive-number-test.json

Before compiling, validate that binding properties exist as fields in the entity model:

Terminal window
rdt-model-policy --target dev --entity my-entity validate-bindings

Comparison is case-insensitive (model fields are UPPER_CASE, bindings use lowercase). The compile command also runs this check automatically when model.json is present.

rules/registry.yaml is auto-generated by rdt-model-policy catalog:

Terminal window
rdt-model-policy --target dev catalog
rdt-model-policy --target dev catalog --strict # fail on duplicate CEL

This produces a searchable index of all rules sorted by ID, with metadata (tier, domain, semantic types, file path, content hash).

The registry includes a SHA-256 hash of each rule’s normalized CEL expression (cel_hash). During catalog, rules with identical CEL are detected and reported as warnings. Use --strict to fail the build — useful as a CI gate to prevent duplicate rules from merging.

rules/
├── library/
│ ├── tier1-atomic/ ← 40 rules (format, range, enum, string, temporal, type)
│ ├── tier2-lookup/ ← lookup rules (require Snowflake ref data)
│ └── tier3-business/ ← complex business rules
├── baseline/
│ └── roche-100.yaml ← index of baseline rule IDs
├── schemas/
│ └── rule.schema.json ← JSON Schema for rule YAML validation
├── fixtures/ ← test fixture files for test-rule command
└── registry.yaml ← auto-generated catalog
TargetScaleUse caseTechnology
dbt CloudMillions of rowsBatch DQ enforcementCEL → SQL transpilation
Snowflake UDFSingle valuesInteractive validation (UI)celpy (Python)
CLI localSingle valuesDeveloper testing, CIcel-interpreter (Rust)
Mulesoft API (future)Request/responseReal-time externalcel-java