Writing DQ Rules

Step-by-step guide for authoring new DQ rules, binding them to entities, and testing before deployment.

1. Write the Rule

Create a YAML file in rules/library/{tier}/{domain}/{rule-id}.yaml:

id: valid-roche-email
version: 1.0.0
tier: 1
domain: format
semantic_types: [email, roche_email]
tags: [roche-100]
description: Validates Roche corporate email format
cel: 'value.email.matches("^[a-zA-Z0-9._%+-]+@roche\\.com$")'
message: "Must be a valid @roche.com email address"

Requirements:

id must match the filename (without .yaml)
tier must match the directory: tier1-atomic/ → tier 1, tier2-lookup/ → tier 2
cel must be a valid CEL expression returning a boolean
semantic_types determines which fields the rule auto-binds to

CEL Quick Reference

CEL	SQL (transpiled)
`value.field == "X"`	`field = 'X'`
`value.field > 0`	`field > 0`
`value.field.matches("^pattern$")`	`REGEXP_LIKE(field, '^pattern$')`
`value.field.size() > 0`	`LENGTH(field) > 0`
`value.field.contains("X")`	`CONTAINS(field, 'X')`
`value.field.startsWith("X")`	`STARTSWITH(field, 'X')`
`value.field in ["A", "B"]`	`field IN ('A', 'B')`
`value.end > value.start`	`end > start`
`!expr`	`NOT (expr)`
`expr1 && expr2`	`(expr1) AND (expr2)`

Tier 2 Rules (Lookup)

For rules that check against reference data, add a refs section:

id: vendor-exists-active
version: 1.0.0
tier: 2
domain: vendor
semantic_types: [vendor_code]
refs:
  mrhub_vendors:
    source: snowflake
    table: GOLD_MRHUB_VENDORS
    key_column: CODE
    refresh: PT1H
cel: 'value.vendor_code in ref.mrhub_vendors.codes && ref.mrhub_vendors[value.vendor_code].status == "ACTIVE"'
message: "Vendor must exist in MRHub and be active"

2. Write Test Fixtures

Create a JSON file with test cases:

{
  "cases": [
    {"input": {"email": "user@roche.com"}, "expected": true},
    {"input": {"email": "USER@ROCHE.COM"}, "expected": false},
    {"input": {"email": "user@gmail.com"}, "expected": false},
    {"input": {"email": ""}, "expected": false}
  ]
}

Save to rules/fixtures/{rule-id}-test.json.

3. Test Locally

# Single value test
rdt-model-policy eval-rule \
  --rule valid-roche-email \
  --value '{"email": "user@roche.com"}'

# Batch test with fixtures
rdt-model-policy test-rule \
  --rule valid-roche-email \
  --fixtures rules/fixtures/valid-roche-email-test.json

All test cases must pass before proceeding.

4. Update the Catalog

rdt-model-policy --target dev catalog

This regenerates rules/registry.yaml with your new rule included, along with a SHA-256 content hash for deduplication. If your rule’s CEL expression is identical to an existing rule, you’ll see a warning — consider reusing the existing rule instead.

Use --strict in CI to block duplicate rules from merging:

rdt-model-policy --target dev catalog --strict

5. Bind to an Entity

Edit models/{entity}/rules.yaml:

bindings:
  # Single-field binding
  - property: contact_email
    rules: [valid-roche-email]

  # Cross-field binding (rule uses actual field names)
  - properties: [end_date, start_date]
    rules: [end-after-start]

Composite Bindings

Combine multiple rules into a single pass/fail check:

composites:
  - id: record-complete
    operator: AND
    requires: [valid-uuid, valid-roche-email, positive-number]
    message: "Record failed composite validation"

6. Validate Bindings

Before compiling, verify that your binding properties match actual model fields:

rdt-model-policy --target dev --entity waste-tracking validate-bindings

This catches typos and mismatches (case-insensitive comparison). The compile command also runs this check automatically.

7. Compile and Verify

# Compile to dbt tests + UDFs
rdt-model-policy --target dev --entity waste-tracking compile

# Verify generated output
cat dbt/tests/waste-tracking/dq_all_rules.sql
cat snowflake/udf/dq_valid_roche_email.sql

8. Submit for Review

git add rules/library/ rules/fixtures/ models/waste-tracking/rules.yaml
git add dbt/tests/ snowflake/udf/ rules/registry.yaml
git commit -m "feat(policy): add valid-roche-email rule for waste-tracking"
gh pr create

The PR review validates:

CEL expression correctness
Test fixture coverage
Appropriate binding to entity properties
Generated SQL is correct

Common Patterns

Required field (not null + not empty)

# Bind both rules to the same property
- property: name
  rules: [is-not-null, not-empty]

Enum validation

cel: 'value.status in ["ACTIVE", "INACTIVE", "DECOMMISSIONED"]'

Date range cross-field

- properties: [end_date, start_date]
  rules: [end-after-start]

Bounded numeric

cel: 'value.percent >= 0 && value.percent <= 100'

Quality Assurance Approach — End-to-end QA process and four-gate architecture
DQ Rule Library Reference — Complete specification
rdt-model-policy Module — CLI reference