Skip to content

Writing DQ Rules

Step-by-step guide for authoring new DQ rules, binding them to entities, and testing before deployment.

Create a YAML file in rules/library/{tier}/{domain}/{rule-id}.yaml:

id: valid-roche-email
version: 1.0.0
tier: 1
domain: format
semantic_types: [email, roche_email]
tags: [roche-100]
description: Validates Roche corporate email format
cel: 'value.email.matches("^[a-zA-Z0-9._%+-]+@roche\\.com$")'
message: "Must be a valid @roche.com email address"

Requirements:

  • id must match the filename (without .yaml)
  • tier must match the directory: tier1-atomic/ → tier 1, tier2-lookup/ → tier 2
  • cel must be a valid CEL expression returning a boolean
  • semantic_types determines which fields the rule auto-binds to
CELSQL (transpiled)
value.field == "X"field = 'X'
value.field > 0field > 0
value.field.matches("^pattern$")REGEXP_LIKE(field, '^pattern$')
value.field.size() > 0LENGTH(field) > 0
value.field.contains("X")CONTAINS(field, 'X')
value.field.startsWith("X")STARTSWITH(field, 'X')
value.field in ["A", "B"]field IN ('A', 'B')
value.end > value.startend > start
!exprNOT (expr)
expr1 && expr2(expr1) AND (expr2)

For rules that check against reference data, add a refs section:

id: vendor-exists-active
version: 1.0.0
tier: 2
domain: vendor
semantic_types: [vendor_code]
refs:
mrhub_vendors:
source: snowflake
table: GOLD_MRHUB_VENDORS
key_column: CODE
refresh: PT1H
cel: 'value.vendor_code in ref.mrhub_vendors.codes && ref.mrhub_vendors[value.vendor_code].status == "ACTIVE"'
message: "Vendor must exist in MRHub and be active"

Create a JSON file with test cases:

{
"cases": [
{"input": {"email": "user@roche.com"}, "expected": true},
{"input": {"email": "USER@ROCHE.COM"}, "expected": false},
{"input": {"email": "user@gmail.com"}, "expected": false},
{"input": {"email": ""}, "expected": false}
]
}

Save to rules/fixtures/{rule-id}-test.json.

Terminal window
# Single value test
rdt-model-policy eval-rule \
--rule valid-roche-email \
--value '{"email": "user@roche.com"}'
# Batch test with fixtures
rdt-model-policy test-rule \
--rule valid-roche-email \
--fixtures rules/fixtures/valid-roche-email-test.json

All test cases must pass before proceeding.

Terminal window
rdt-model-policy --target dev catalog

This regenerates rules/registry.yaml with your new rule included, along with a SHA-256 content hash for deduplication. If your rule’s CEL expression is identical to an existing rule, you’ll see a warning — consider reusing the existing rule instead.

Use --strict in CI to block duplicate rules from merging:

Terminal window
rdt-model-policy --target dev catalog --strict

Edit models/{entity}/rules.yaml:

bindings:
# Single-field binding
- property: contact_email
rules: [valid-roche-email]
# Cross-field binding (rule uses actual field names)
- properties: [end_date, start_date]
rules: [end-after-start]

Combine multiple rules into a single pass/fail check:

composites:
- id: record-complete
operator: AND
requires: [valid-uuid, valid-roche-email, positive-number]
message: "Record failed composite validation"

Before compiling, verify that your binding properties match actual model fields:

Terminal window
rdt-model-policy --target dev --entity waste-tracking validate-bindings

This catches typos and mismatches (case-insensitive comparison). The compile command also runs this check automatically.

Terminal window
# Compile to dbt tests + UDFs
rdt-model-policy --target dev --entity waste-tracking compile
# Verify generated output
cat dbt/tests/waste-tracking/dq_all_rules.sql
cat snowflake/udf/dq_valid_roche_email.sql
Terminal window
git add rules/library/ rules/fixtures/ models/waste-tracking/rules.yaml
git add dbt/tests/ snowflake/udf/ rules/registry.yaml
git commit -m "feat(policy): add valid-roche-email rule for waste-tracking"
gh pr create

The PR review validates:

  • CEL expression correctness
  • Test fixture coverage
  • Appropriate binding to entity properties
  • Generated SQL is correct
# Bind both rules to the same property
- property: name
rules: [is-not-null, not-empty]
cel: 'value.status in ["ACTIVE", "INACTIVE", "DECOMMISSIONED"]'
- properties: [end_date, start_date]
rules: [end-after-start]
cel: 'value.percent >= 0 && value.percent <= 100'