
Pipeline Overview

The RDT MODEL platform produces a complete data product from a single RTiS entity. The pipeline chains 18 modules across 6 phases. Each module reads a JSON manifest, writes a JSON result, and operates in an isolated workspace that enables safe parallel execution across entities.

This page describes the pipeline structure. See the Modules section for each module’s documentation.

Pipeline phases — 6 sequential phases with modules that can run in parallel within each phase

Phases run sequentially. Modules within the same phase can run in parallel when they have no dependency on each other.

Every arrow is a concrete JSON path connecting one module’s output to the next module’s input.

Pipeline data flow — external systems feeding into and consuming from the pipeline workspace

| Phase | Module | Depends on | Parallel with |
| --- | --- | --- | --- |
| 1 Ingest | `rdt-model-pull` | (entry point) | Profile |
| 1 Ingest | `rdt-model-profile` (optional) | (entry point) | Pull |
| 2 Enrich | `rdt-model-govern` | (entry point) | Infer |
| 2 Enrich | `rdt-model-infer` (optional) | Pull | Govern |
| 3 Prepare | `rdt-model-compile` | Pull, Govern, Infer | (none) |
| 3 Prepare | `rdt-model-validate` | Compile | (none) |
| 4 Deploy | `rdt-model-store` | Compile, Validate | Policy, Api, Mcp, Sdk, Contract |
| 4 Deploy | `rdt-model-policy` | Compile, Validate | Store, Api, Mcp, Sdk, Contract |
| 4 Deploy | `rdt-model-api` | Compile, Validate | Store, Policy, Mcp, Sdk, Contract |
| 4 Deploy | `rdt-model-mcp` | Compile, Validate | Store, Policy, Api, Sdk, Contract |
| 4 Deploy | `rdt-model-sdk` | Compile, Validate | Store, Policy, Api, Mcp, Contract |
| 4 Deploy | `rdt-model-contract` | Compile, Validate | Store, Policy, Api, Mcp, Sdk |
| 5 Register | `rdt-model-register` | All Deploy modules | Gupri, Search |
| 5 Register | `rdt-model-gupri` | Compile | Register, Search |
| 5 Register | `rdt-model-search` | Compile | Register, Gupri |
| 6 Support | `rdt-model-docs` | Compile | Cidb, Event |
| 6 Support | `rdt-model-cidb` | All Register modules | Docs, Event |
| 6 Support | `rdt-model-event` | All Register modules | Docs, Cidb |
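The "Depends on" column is the only scheduling rule an orchestrator needs: a module may start once every module it depends on has finished. A minimal sketch of that check (the `ready` function and the inline deps table are illustrative, not the platform's real scheduler):

```rust
use std::collections::{HashMap, HashSet};

/// A module is ready when all of its dependencies have finished.
/// Modules absent from the deps table are entry points and always ready.
fn ready(deps: &HashMap<&str, Vec<&str>>, done: &HashSet<&str>, module: &str) -> bool {
    deps.get(module)
        .map(|d| d.iter().all(|m| done.contains(m)))
        .unwrap_or(true)
}
```

With `deps = {compile: [pull, govern, infer], store: [compile, validate]}`, `compile` becomes ready only after all three of its upstream modules finish, while `pull` is ready immediately.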

This diagram shows exactly which output field feeds which input field across module boundaries.

Data wiring between modules — field-level connections from outputs to inputs

Every pipeline run creates an isolated workspace directory, enabling safe parallel execution.

```
{base_dir}/rdt-{entity_id}-{run_id}/
```

| Component | Source | Example |
| --- | --- | --- |
| `base_dir` | `$RDT_WORKSPACE_DIR`, then `$TMPDIR`, then `/tmp` | `/tmp` |
| `entity_id` | Sanitised entity name | `waste-tracking` |
| `run_id` | UUIDv4 | `a1b2c3d4-e5f6-7890-abcd-ef1234567890` |
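In Rust terms, that resolution might look like the following sketch. The function names `resolve_base_dir` and `workspace_dir` are hypothetical, not the platform's `paths.rs` API:

```rust
use std::env;
use std::path::{Path, PathBuf};

/// base_dir: $RDT_WORKSPACE_DIR, then $TMPDIR, then /tmp (illustrative).
fn resolve_base_dir() -> PathBuf {
    env::var("RDT_WORKSPACE_DIR")
        .or_else(|_| env::var("TMPDIR"))
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("/tmp"))
}

/// {base_dir}/rdt-{entity_id}-{run_id}; entity_id is assumed pre-sanitised.
fn workspace_dir(base_dir: &Path, entity_id: &str, run_id: &str) -> PathBuf {
    base_dir.join(format!("rdt-{entity_id}-{run_id}"))
}
```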

Workspace layout — directory tree showing per-phase subdirectories and result files

  1. Create — orchestrator generates workspace with unique UUID.
  2. Populate — each module writes to its own subdirectory.
  3. Promote — after validation, copy artifacts to final repo paths via paths.rs.
  4. Clean up — delete workspace (or retain with --keep-workspace for debugging).
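Condensed into code, the four steps for a single run could look like this hedged sketch; `run_lifecycle` is a hypothetical name and the promote step is elided:

```rust
use std::fs;
use std::path::Path;

/// Illustrative lifecycle for one run. The real promote step maps
/// artifacts to repo paths via paths.rs; it is elided here.
fn run_lifecycle(ws: &Path, keep_workspace: bool) -> std::io::Result<()> {
    fs::create_dir_all(ws.join("compile"))?;                        // 1. Create
    fs::write(ws.join("compile/compile-result.json"), b"{}")?;      // 2. Populate
    // 3. Promote: copy validated artifacts to their final repo paths
    if !keep_workspace {
        fs::remove_dir_all(ws)?;                                    // 4. Clean up
    }
    Ok(())
}
```

Passing `keep_workspace = true` mirrors the `--keep-workspace` debugging flag: the directory survives for inspection.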

Multiple entities run concurrently in separate workspaces. Even two runs of the same entity are safe — different UUIDs mean different directories.

```
Entity A: /tmp/rdt-waste-tracking-{uuid-a}/   ← independent
Entity B: /tmp/rdt-site-energy-{uuid-b}/      ← independent
Entity C: /tmp/rdt-vendor-quality-{uuid-c}/   ← independent
```

The pipeline is composed by an external orchestrator (shell script or GitHub Actions), not by a module. Each step invokes the binary with a JSON manifest.

```bash
#!/usr/bin/env bash
set -euo pipefail

ENTITY="$1"
RUN_ID="$(uuidgen)"
WS="/tmp/rdt-${ENTITY}-${RUN_ID}"
mkdir -p "$WS"

# Phase 1: Ingest
rdt-model-pull --manifest <(jq -n \
  --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}')

# Phase 2: Enrich
rdt-model-govern --manifest <(jq -n \
  --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}')

# Phase 3: Prepare
rdt-model-compile --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}')
rdt-model-validate --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}')

# Phase 4: Deploy (parallel)
rdt-model-store --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
rdt-model-policy --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
rdt-model-api --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
rdt-model-mcp --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
rdt-model-sdk --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
rdt-model-contract --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws, model_path: "pull/model.json"}') &
wait

# Phase 5: Register (parallel)
rdt-model-register --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
rdt-model-gupri --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
rdt-model-search --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
wait

# Phase 6: Support (parallel)
rdt-model-docs --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
rdt-model-cidb --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
rdt-model-event --manifest <(jq -n --arg e "$ENTITY" --arg ws "$WS" \
  '{entity_id: $e, workspace: $ws}') &
wait

# Promote and clean up
rdt-model-compile promote --workspace "$WS" --entity "$ENTITY"
rm -rf "$WS"
```
The same script scales out as a GitHub Actions matrix, one job per entity:

```yaml
jobs:
  pipeline:
    runs-on: ubuntu-latest  # assumed; the original snippet omits the runner
    strategy:
      matrix:
        entity: [waste-tracking, site-energy, vendor-quality]
    steps:
      - uses: actions/checkout@v4
      - name: Run pipeline
        run: ./scripts/pipeline.sh ${{ matrix.entity }}
```

Each matrix job runs in its own runner — full parallelism, zero contention.

After validation passes, the promote step copies artifacts from the workspace to their final repository paths. This keeps the repo untouched if validation fails.

```bash
rdt-model-compile promote --workspace "$WS" --entity waste-tracking
```

Promote reads `compile/compile-result.json`, maps each artifact to its destination via `paths.rs`, and copies it into place. It then writes `promote-result.json` listing the files created, updated, or unchanged.
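The created/updated/unchanged classification can be sketched per artifact as follows. `promote_one` and `Outcome` are illustrative names; the real promote step also consults `compile-result.json` and `paths.rs`:

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Outcome { Created, Updated, Unchanged }

/// Compare the artifact with its destination, then copy only if needed.
fn promote_one(src: &Path, dest: &Path) -> std::io::Result<Outcome> {
    let new = fs::read(src)?;
    let outcome = match fs::read(dest) {
        Ok(old) if old == new => Outcome::Unchanged, // identical bytes
        Ok(_) => Outcome::Updated,                   // exists but differs
        Err(_) => Outcome::Created,                  // no destination yet
    };
    if outcome != Outcome::Unchanged {
        if let Some(parent) = dest.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::copy(src, dest)?;
    }
    Ok(outcome)
}
```

Skipping the copy for unchanged files keeps promote idempotent: re-running it against an already-promoted repo is a no-op.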

Every JSON exchanged between modules has a corresponding JSON Schema. Validation happens at three layers:

| Layer | Where | What |
| --- | --- | --- |
| Library | Inside every module | `common::manifest::load_and_validate()` validates manifests on entry and results on exit |
| CLI | `rdt-model-validate schema` | Standalone validation of any JSON against any schema |
| Embedded | At compile time | All schemas are embedded via `include_str!`; the binary is self-contained |

The jsonschema crate (already a workspace dependency) handles all validation. No external tool needed.
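The library layer can be pictured as a wrapper that refuses to run on an invalid manifest and refuses to emit an invalid result. This is a generic sketch with hypothetical names; the real `common::manifest::load_and_validate()` checks against the embedded JSON Schemas via the `jsonschema` crate:

```rust
/// Validate on entry, run the module body, validate on exit.
/// The validator closures stand in for JSON Schema checks.
fn run_module<M, R>(
    manifest: M,
    validate_manifest: impl Fn(&M) -> Result<(), String>,
    body: impl Fn(M) -> R,
    validate_result: impl Fn(&R) -> Result<(), String>,
) -> Result<R, String> {
    validate_manifest(&manifest)?; // reject bad input before doing any work
    let result = body(manifest);
    validate_result(&result)?;     // never write an invalid result
    Ok(result)
}
```

The same shape applies to every module, which is why the check lives in the shared `common` library rather than in each binary.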

  • ADR 0006 — Multi-binary workspace (superseded by ADR 0011)
  • ADR 0007 — Data product lifecycle (defines the pipeline phases)
  • ADR 0008 — CLI module standards (defines per-binary conventions)
  • ADR 0009 — Module I/O contracts (full specification)
  • ADR 0011 — Pipeline restructure (18-module / 6-phase inventory)