rdt-model-profile
rdt-model-profile is an ingest module for upstream discovery. When onboarding a new data product from an existing database table that has no RTiS representation yet, this module profiles the table and produces a JSON snapshot containing:
- Full table structure (columns, types, constraints, indexes, foreign keys)
- Column-level statistics (nulls, distinct counts, histograms)
- Sample data rows (array-of-arrays format)
- DDL reconstruction
This output is consumed by rdt-model-infer to generate enriched metadata (terminologies, synonyms, descriptions) via LLM, which then seeds a new RTiS entity. After that, the standard pipeline takes over.
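
The column-level statistics listed above can be illustrated with a small sketch. It assumes sample rows arrive in the array-of-arrays shape, with `None` standing in for SQL NULL; the `ColumnStats` struct and `column_stats` function are hypothetical names for illustration, not the module's actual types, and histograms are left out.

```rust
use std::collections::HashSet;

/// Per-column summary. Field names are illustrative, not the real snapshot schema.
#[derive(Debug, PartialEq)]
struct ColumnStats {
    nulls: usize,
    distinct: usize,
}

/// Computes null and distinct counts for each column of an array-of-arrays sample.
/// `None` stands in for SQL NULL; a row shorter than `width` counts as NULL
/// for its missing cells.
fn column_stats(rows: &[Vec<Option<String>>], width: usize) -> Vec<ColumnStats> {
    (0..width)
        .map(|col| {
            let mut nulls = 0;
            let mut seen: HashSet<&str> = HashSet::new();
            for row in rows {
                match row.get(col).and_then(|cell| cell.as_deref()) {
                    Some(value) => {
                        seen.insert(value);
                    }
                    None => nulls += 1,
                }
            }
            ColumnStats { nulls, distinct: seen.len() }
        })
        .collect()
}
```

Note that counts taken from a sample only approximate the full table; a real profiler would more likely compute them with aggregate queries in the database.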
Pipeline phase: Phase 1 — Ingest
Profile existing table (profile) → LLM enrichment (infer) → Create RTiS entity (manual/assisted) → Standard pipeline (pull → compile → store)

Usage:

    cargo run -p rdt-model-profile -- --target dev --dry-run profile \
      --database-type snowflake --schema PUBLIC --table CUSTOMERS

    cargo run -p rdt-model-profile -- --target dev profile \
      --database-type snowflake --schema PUBLIC --table CUSTOMERS

    cargo run -p rdt-model-profile -- --target dev profile \
      --database-type aurora-postgres --schema public --table customers

    cargo run -p rdt-model-profile -- --target dev profile \
      --database-type snowflake --schema PUBLIC --table CUSTOMERS \
      --sample-rows 500 --output-dir /tmp/my-profiles --gzip

Configuration:
| Key | Source | Description |
|---|---|---|
| `aurora.host` | roche-data.toml | Aurora PostgreSQL hostname |
| `aurora.port` | roche-data.toml | Aurora PostgreSQL port (default `5432`) |
| `aurora.database` | roche-data.toml | Aurora database name |
| `aurora.ssl_mode` | roche-data.toml | SSL mode (default `"require"`) |
| `snowflake.auth_method` | roche-data.toml | Auth method: `"keypair"` (default) or `"wam_oauth"` |
| `AURORA_USERNAME` | Environment variable | Aurora service account username |
| `AURORA_PASSWORD` | Environment variable | Aurora service account password |
| `SNOWFLAKE_WAM_TOKEN_URL` | Environment variable | WAM OAuth token endpoint |
| `SNOWFLAKE_WAM_CLIENT_ID` | Environment variable | WAM OAuth client ID |
| `SNOWFLAKE_WAM_CLIENT_SECRET` | Environment variable | WAM OAuth client secret |
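
As a rough sketch of how the defaults in the table might be applied, the helpers below resolve optional values to the documented fallbacks. All function and type names are hypothetical, and the idea that the three `SNOWFLAKE_WAM_*` variables are needed only when `snowflake.auth_method` is `"wam_oauth"` is an assumption, not something the table states.

```rust
/// Snowflake auth methods named in the configuration table.
#[derive(Debug, PartialEq)]
enum SnowflakeAuth {
    Keypair,
    WamOauth,
}

/// Maps an optional `snowflake.auth_method` value to an enum,
/// falling back to the documented default ("keypair") when the key is absent.
fn parse_auth_method(raw: Option<&str>) -> Result<SnowflakeAuth, String> {
    match raw {
        None | Some("keypair") => Ok(SnowflakeAuth::Keypair),
        Some("wam_oauth") => Ok(SnowflakeAuth::WamOauth),
        Some(other) => Err(format!("unknown snowflake.auth_method: {other}")),
    }
}

/// Applies the documented Aurora port default.
fn aurora_port(raw: Option<u16>) -> u16 {
    raw.unwrap_or(5432)
}

/// Applies the documented Aurora SSL mode default.
fn aurora_ssl_mode(raw: Option<&str>) -> &str {
    raw.unwrap_or("require")
}

/// Environment variables assumed to be required for each auth method.
/// ASSUMPTION: the WAM variables matter only for WAM OAuth.
fn required_wam_vars(auth: &SnowflakeAuth) -> &'static [&'static str] {
    match auth {
        SnowflakeAuth::WamOauth => &[
            "SNOWFLAKE_WAM_TOKEN_URL",
            "SNOWFLAKE_WAM_CLIENT_ID",
            "SNOWFLAKE_WAM_CLIENT_SECRET",
        ],
        SnowflakeAuth::Keypair => &[],
    }
}
```

Rejecting unknown auth values, rather than silently falling back to the default, is a design choice of this sketch; it surfaces typos in `roche-data.toml` early.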
Dependencies: None from other pipeline modules — this is a standalone entry point for upstream discovery.
Output:
| File | Format | Description |
|---|---|---|
| `{output_dir}/{db_type}/{schema}.{table}.profile.json` | JSON | Table metadata, statistics, and sample data |
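
The path template in the table can be sketched as below, along with a helper a wrapper script might use to recover the path from the tool's `Wrote <path>` stdout line. Both function names are hypothetical, and the sketch ignores `--gzip`, whose effect on the file name is not documented here.

```rust
use std::path::{Path, PathBuf};

/// Builds `{output_dir}/{db_type}/{schema}.{table}.profile.json`.
fn profile_path(output_dir: &Path, db_type: &str, schema: &str, table: &str) -> PathBuf {
    output_dir
        .join(db_type)
        .join(format!("{schema}.{table}.profile.json"))
}

/// Extracts the path from a `Wrote <path>` line, or returns `None`
/// when the line does not have that shape.
fn parse_wrote_line(line: &str) -> Option<PathBuf> {
    line.trim().strip_prefix("Wrote ").map(PathBuf::from)
}
```

Returning `Option` from the parser lets a caller skip unrelated log lines instead of failing on them.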
Output is temporary — not committed to git. The output path is printed to stdout for scripting:

    Wrote /tmp/rdt-abc123/snowflake/PUBLIC.CUSTOMERS.profile.json

Access gates:
- A18: Aurora PostgreSQL `rdt_profiler` read-only role + credentials
- A19: Snowflake WAM OAuth client registration
Uses StubDatabaseProbe with fixture data until both access tasks resolve.
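
One way such a stub can work is sketched below, assuming the module hides database access behind a trait. Only the name `StubDatabaseProbe` comes from this README; the `DatabaseProbe` trait, its `columns` method, and the `describe` helper are hypothetical, and the real interface may look quite different.

```rust
/// Hypothetical probe interface; the real trait in rdt-model-profile may differ.
trait DatabaseProbe {
    /// Returns (column name, column type) pairs for a table.
    fn columns(&self, schema: &str, table: &str) -> Result<Vec<(String, String)>, String>;
}

/// Stand-in that serves fixed fixture data instead of querying a database.
struct StubDatabaseProbe {
    fixture: Vec<(String, String)>,
}

impl DatabaseProbe for StubDatabaseProbe {
    fn columns(&self, _schema: &str, _table: &str) -> Result<Vec<(String, String)>, String> {
        // The stub ignores its arguments and always returns the same fixture.
        Ok(self.fixture.clone())
    }
}

/// Code written against the trait works unchanged once a live probe replaces the stub.
fn describe(probe: &dyn DatabaseProbe, schema: &str, table: &str) -> Result<String, String> {
    let cols = probe.columns(schema, table)?;
    let body: Vec<String> = cols.iter().map(|(n, t)| format!("{n} {t}")).collect();
    Ok(format!("{schema}.{table}({})", body.join(", ")))
}
```

With this shape, swapping in Aurora or Snowflake probes after A18 and A19 resolve would only require new `DatabaseProbe` implementations; callers such as `describe` stay the same.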