rdt-model-profile

rdt-model-profile is an ingest module for upstream discovery. When onboarding a new data product from an existing database table that has no RTiS representation yet, this module profiles the table and produces a JSON snapshot containing:

  • Full table structure (columns, types, constraints, indexes, foreign keys)
  • Column-level statistics (nulls, distinct counts, histograms)
  • Sample data rows (array-of-arrays format)
  • DDL reconstruction
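To make the array-of-arrays sample format and the column-level statistics concrete, here is a minimal sketch that computes null and distinct counts for one column of sampled rows. The `SampleData` and `ColumnStats` types, their field names, and `column_stats` are hypothetical illustrations, not the module's actual snapshot schema or code.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory view of the sample section of a snapshot.
/// Field names are illustrative, not the module's actual JSON keys.
struct SampleData {
    /// Column names, in the same order as the values in each row.
    columns: Vec<String>,
    /// Array-of-arrays: one inner Vec per row; `None` stands for SQL NULL.
    rows: Vec<Vec<Option<String>>>,
}

/// Null count and distinct non-null count for a single column.
#[derive(Debug, PartialEq)]
struct ColumnStats {
    nulls: usize,
    distinct: usize,
}

/// Returns `None` when the column name is not present in `sample.columns`.
fn column_stats(sample: &SampleData, column: &str) -> Option<ColumnStats> {
    // Rows carry no names, so resolve the column name to a position once.
    let idx = sample.columns.iter().position(|c| c == column)?;
    let mut nulls = 0;
    let mut seen: HashSet<&str> = HashSet::new();
    for row in &sample.rows {
        // A row that is too short is treated like a NULL in this sketch.
        match row.get(idx).and_then(|v| v.as_deref()) {
            Some(value) => {
                seen.insert(value);
            }
            None => nulls += 1,
        }
    }
    Some(ColumnStats { nulls, distinct: seen.len() })
}
```

Storing the column names once and each row as a bare array keeps the snapshot compact, at the cost of a name-to-index lookup like the one above.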

rdt-model-infer consumes this output and uses an LLM to generate enriched metadata (terminologies, synonyms, descriptions), which then seeds a new RTiS entity. After that, the standard pipeline takes over.

Pipeline phase: Phase 1 — Ingest

Profile existing table → LLM enrichment → Create RTiS entity → Standard pipeline
(profile)                (infer)          (manual/assisted)    (pull → compile → store)

Usage:

cargo run -p rdt-model-profile -- --target dev --dry-run profile \
  --database-type snowflake --schema PUBLIC --table CUSTOMERS

cargo run -p rdt-model-profile -- --target dev profile \
  --database-type snowflake --schema PUBLIC --table CUSTOMERS

cargo run -p rdt-model-profile -- --target dev profile \
  --database-type aurora-postgres --schema public --table customers

cargo run -p rdt-model-profile -- --target dev profile \
  --database-type snowflake --schema PUBLIC --table CUSTOMERS \
  --sample-rows 500 --output-dir /tmp/my-profiles --gzip

Configuration:

| Key | Source | Description |
|---|---|---|
| aurora.host | roche-data.toml | Aurora PostgreSQL hostname |
| aurora.port | roche-data.toml | Aurora PostgreSQL port (default 5432) |
| aurora.database | roche-data.toml | Aurora database name |
| aurora.ssl_mode | roche-data.toml | SSL mode (default “require”) |
| snowflake.auth_method | roche-data.toml | Auth method: “keypair” (default) or “wam_oauth” |
| AURORA_USERNAME | Environment variable | Aurora service account username |
| AURORA_PASSWORD | Environment variable | Aurora service account password |
| SNOWFLAKE_WAM_TOKEN_URL | Environment variable | WAM OAuth token endpoint |
| SNOWFLAKE_WAM_CLIENT_ID | Environment variable | WAM OAuth client ID |
| SNOWFLAKE_WAM_CLIENT_SECRET | Environment variable | WAM OAuth client secret |
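As an illustration of how the Snowflake settings above might combine, the following sketch resolves `snowflake.auth_method` with its documented default and requires the three WAM environment variables only when “wam_oauth” is selected. The `SnowflakeAuth` enum and `resolve_snowflake_auth` function are hypothetical names and do not reflect the module's actual configuration code. The environment lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
/// Snowflake auth methods named in the configuration table.
/// The enum itself is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
enum SnowflakeAuth {
    Keypair,
    WamOauth {
        token_url: String,
        client_id: String,
        client_secret: String,
    },
}

/// `configured` is the value of `snowflake.auth_method` (None when the key
/// is absent); `env` looks up an environment variable by name.
fn resolve_snowflake_auth(
    configured: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<SnowflakeAuth, String> {
    // Turn a missing variable into a readable error message.
    let require = |name: &str| {
        env(name).ok_or_else(|| format!("missing environment variable {}", name))
    };
    // "keypair" is the documented default when nothing is configured.
    match configured.unwrap_or("keypair") {
        "keypair" => Ok(SnowflakeAuth::Keypair),
        "wam_oauth" => Ok(SnowflakeAuth::WamOauth {
            token_url: require("SNOWFLAKE_WAM_TOKEN_URL")?,
            client_id: require("SNOWFLAKE_WAM_CLIENT_ID")?,
            client_secret: require("SNOWFLAKE_WAM_CLIENT_SECRET")?,
        }),
        other => Err(format!("unknown snowflake.auth_method: {}", other)),
    }
}
```

In a real binary the lookup could be supplied as `|name| std::env::var(name).ok()`.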

Dependencies: None from other pipeline modules — this is a standalone entry point for upstream discovery.

Output:

| File | Format | Description |
|---|---|---|
| {output_dir}/{db_type}/{schema}.{table}.profile.json | JSON | Table metadata, statistics, and sample data |
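The file naming pattern can be expressed as a small helper. This is a sketch only: `profile_path` is a hypothetical function, and it does not account for `--gzip`, whose effect on the file name is not stated here.

```rust
use std::path::{Path, PathBuf};

/// Builds `{output_dir}/{db_type}/{schema}.{table}.profile.json`.
/// Hypothetical helper; the module's own path logic may differ.
fn profile_path(output_dir: &Path, db_type: &str, schema: &str, table: &str) -> PathBuf {
    output_dir
        .join(db_type)
        .join(format!("{}.{}.profile.json", schema, table))
}
```

For example, `profile_path(Path::new("/tmp/my-profiles"), "snowflake", "PUBLIC", "CUSTOMERS")` yields `/tmp/my-profiles/snowflake/PUBLIC.CUSTOMERS.profile.json`.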

Output is temporary — not committed to git. The output path is printed to stdout for scripting:

Wrote /tmp/rdt-abc123/snowflake/PUBLIC.CUSTOMERS.profile.json
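A calling script or tool can recover the path from that line. The sketch below assumes the line always has the form `Wrote <path>`, as in the example above; `written_path` is a hypothetical helper, not part of the module.

```rust
use std::path::PathBuf;

/// Extracts the path from the last `Wrote <path>` line in captured stdout.
/// Returns `None` when no such line is present.
fn written_path(stdout: &str) -> Option<PathBuf> {
    stdout
        .lines()
        .rev() // prefer the most recent matching line
        .find_map(|line| line.trim().strip_prefix("Wrote "))
        .map(PathBuf::from)
}
```

Scanning from the end means any log lines printed before the final `Wrote` line are ignored.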

Access gates:

  • A18 — Aurora PostgreSQL rdt_profiler read-only role + credentials
  • A19 — Snowflake WAM OAuth client registration

Until both access gates are resolved, the module uses StubDatabaseProbe with fixture data.
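The stub approach can be pictured as a probe trait with a fixture-backed implementation. Only the name StubDatabaseProbe comes from this page; the `DatabaseProbe` trait, its `column_names` method, the `fixture_columns` field, and `describe` are assumptions made for illustration, and the real interface may look quite different.

```rust
/// Hypothetical probe interface; the real trait in rdt-model-profile may differ.
trait DatabaseProbe {
    fn column_names(&self, schema: &str, table: &str) -> Result<Vec<String>, String>;
}

/// Stand-in that answers from fixture data instead of a live connection.
struct StubDatabaseProbe {
    fixture_columns: Vec<String>,
}

impl DatabaseProbe for StubDatabaseProbe {
    fn column_names(&self, _schema: &str, _table: &str) -> Result<Vec<String>, String> {
        // Ignore the requested table and return the canned fixture.
        Ok(self.fixture_columns.clone())
    }
}

/// Code written against the trait works unchanged with a stub or a live probe.
fn describe(probe: &dyn DatabaseProbe, schema: &str, table: &str) -> Result<String, String> {
    let columns = probe.column_names(schema, table)?;
    Ok(format!("{}.{}: {}", schema, table, columns.join(", ")))
}
```

With this shape, replacing the stub with a live Aurora or Snowflake probe once the access gates resolve would not require changes to callers such as `describe`.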