Gnorman the Diff Gnome mascot Diff Gnome Documentation

CLI-first cross-database verification and repair. MySQL, PostgreSQL, SQL Server.

Diff Gnome is a CLI tool for verifying that two datasets actually match. It's built for real cross-database verification and repair workflows — compare two tables, dry-run or apply syncs, preflight risky jobs with doctor, clean duplicate rows with dedupe, restore backup exports, and batch everything through config files.

MySQL, PostgreSQL, SQL Server
Schema comparison, row counts, chunked row comparison
Recursive drill-down for mismatched chunks
Sample missing, extra, and changed rows
Apply/dry-run sync workflows
Doctor preflight checks
Dedupe and restore workflows
Batch run configs (YAML or JSON)
Text, JSON, and Markdown output

⚙️ How It Works

Diff Gnome uses chunk-hashing to efficiently compare large tables without scanning every row individually.

01
Inspect schemas
Compares column names, types, and nullability on both sides using loose or strict matching rules.
02
Compare row counts
Counts filtered rows on both sides. A mismatch here flags a problem before any data streaming begins.
03
Stream & chunk-hash
Streams both tables ordered by the configured key columns. Matching chunks are skipped instantly with a hash comparison.
04
Recursive drill-down
When a chunk hash mismatches, Diff Gnome recursively splits the chunk until it isolates exactly which rows changed.
05
Record missing / extra / changed
When keys diverge, rows are classified as missing from right, extra on right, or changed — with samples in the report.

🔍 compare

Compare two tables across databases. Reports schema differences, row count mismatches, and changed/missing/extra rows.

basic compare — mysql → postgres
$ ./diff-gnome compare \
  --left-driver mysql \
  --left-dsn 'user:pass@tcp(localhost:3306)/appdb' \
  --right-driver postgres \
  --right-dsn 'postgres://user:pass@localhost:5432/warehouse?sslmode=disable' \
  --table customer_orders \
  --key id

Fail on any difference

Use --fail-on-diff to exit with a non-zero status when any schema, count, or row differences are found — useful in CI pipelines.

--fail-on-diff
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn '...' \
  --right-driver postgres --right-dsn '...' \
  --table customer_orders --key id \
  --fail-on-diff

Export mismatches to a file

Use --mismatch-file to export every mismatch as a JSON object per line (JSONL) while keeping the main report compact.

--mismatch-file
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn '...' \
  --right-driver sqlserver --right-dsn '...' \
  --table customer_orders --key id \
  --mismatch-file reports/customer-orders-mismatches.jsonl

Filter rows with --where

--where filter
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn '...' \
  --right-driver postgres --right-dsn '...' \
  --table customer_orders --key id \
  --where "created_at >= '2026-01-01'"

Different table names on each side

--left-table / --right-table
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn '...' \
  --right-driver sqlserver --right-dsn '...' \
  --left-table customer_orders_archive \
  --right-table customer_orders \
  --key id

Composite keys

composite --key
$ ./diff-gnome compare \
  --table customer_order_lines \
  --key order_id,line_number \
  --left-driver mysql --left-dsn '...' \
  --right-driver sqlserver --right-dsn '...'

Keyless compare

Omit --key to match rows on full row contents instead of a primary key.

keyless
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn 'user:pass@tcp(left-host:3306)/appdb' \
  --right-driver mysql --right-dsn 'user:pass@tcp(right-host:3306)/appdb' \
  --table event_journal

JSON output

--output json
$ ./diff-gnome compare ... --output json

🔄 sync

Sync differences from left to right. Always dry-run first — add --apply only when you're confident in the output.

💡
Run without --apply first to preview exactly what will be inserted, updated, or deleted. Diff Gnome will show you the counts before touching anything.
dry-run, then apply
# Preview first
$ ./diff-gnome sync \
  --left-driver mysql --left-dsn '...' \
  --right-driver postgres --right-dsn '...' \
  --table customer_orders --key id \
  --where "created_at >= '2026-01-01'" \
  --max-writes 500

# Apply when ready
$ ./diff-gnome sync ... --apply

Insert-only (keyless)

--insert-only can run without a key — matches rows on full row contents and inserts rows that exist only on the right.

Safety flags

FlagDescription
--max-writes NAbort if the sync would exceed N total inserts + updates
--max-deletes NAbort if the sync would exceed N deletes
--require-whereRefuse to run without a --where filter
--delete-missingAlso delete rows that exist on right but not left
--insert-onlyOnly insert missing rows, never update or delete
--write-batch-size NCommit writes in batches of N rows

🩺 doctor

A read-only preflight command for two-sided runs. Run doctor before any sync that touches production data.

Doctor checks:

Schema compatibility (using the same loose/strict type rules as compare and sync)
Filtered row counts on both sides
Duplicate-key groups on both sides for the key you provided
Whether the chosen key is backed by an index on each side
Whether the --where columns are backed by leading index columns
How many indexes exist on the target table (flags likely slow writes)
Left-side triggers that would fire during sync --apply or dedupe --apply
Safety warnings based on --apply, --max-writes, --max-deletes, --require-where
⚠️
Doctor does not compare every row and never writes data. Use it to catch weak keys, duplicate-key risk, missing indexes, and trigger-heavy targets before you run a real sync.
doctor preflight
$ ./diff-gnome doctor \
  --left-driver sqlserver --left-dsn 'sqlserver://user:pass@sqlhost:1433?database=appdb&encrypt=disable' \
  --right-driver mysql --right-dsn 'user:pass@tcp(target:3306)/appdb' \
  --table candidate_records \
  --key record_id \
  --where "updated_at >= '2026-01-01'" \
  --apply --max-writes 5000

🧹 dedupe

Finds and removes duplicate rows within a single datasource. Runs a backup export before deleting anything when --backup-file is set.

Use --ignore-column to exclude noise columns (like updated_at) from duplicate matching. --apply supports SQL Server, MySQL, and PostgreSQL.

dedupe with backup
$ ./diff-gnome dedupe \
  --driver sqlserver \
  --dsn 'sqlserver://user:pass@sqlhost:1433?database=appdb&encrypt=disable' \
  --table monthly_statement_items \
  --key account_id,statement_month,recipient_id,invoice_number \
  --ignore-column updated \
  --backup-file reports/monthly-statement-items-dedupe-backup.sql \
  --backup-format sql \
  --apply

♻️ restore

Replays a backup export (SQL or JSONL) back into a datasource. Use SQL backups for direct replay, or JSONL when you want Diff Gnome to translate structured backup records into row-level inserts, updates, and deletes.

Use --verify-only for a stronger read-only check — for JSONL backups it validates the referenced tables and columns against the target without replaying anything.

restore from SQL backup
$ ./diff-gnome restore \
  --driver sqlserver \
  --dsn 'sqlserver://user:pass@sqlhost:1433?database=targetdb&encrypt=disable' \
  --input-file reports/monthly-statement-items-dedupe-backup.sql \
  --input-format sql \
  --max-statements 500 \
  --transaction \
  --apply

run (batch)

Run multiple compare, sync, dedupe, restore, or doctor jobs from a single YAML or JSON config file, with bounded parallelism.

run from config
$ ./diff-gnome run --config examples/run-config.yaml

# Override parallelism from the CLI
$ ./diff-gnome run --config examples/run-config.yaml --parallel 4
Gnorman
For multi-connection setups, prefer diff-gnome run --config over hidden shell state. Named connections keep your DSNs out of shell history and make your jobs reproducible.

🔐 Credentials & Env Vars

Keep DSNs out of your shell history by using environment variables instead of inline credentials.

standard env vars
$ export DIFFGNOME_LEFT_DSN='user:pass@tcp(source-host:3306)/appdb'
$ export DIFFGNOME_RIGHT_DSN='postgres://user:pass@localhost:5432/targetdb?sslmode=disable'

$ ./diff-gnome compare \
  --left-driver mysql \
  --right-driver postgres \
  --table customer_orders --key id

Custom env var names

Point at any environment variable name using --left-dsn-env and --left-driver-env.

custom env var names
$ export SOURCE_DB_DSN='user:pass@tcp(source-host:3306)/appdb'
$ export SOURCE_DB_DRIVER='mysql'

$ ./diff-gnome compare \
  --left-driver-env SOURCE_DB_DRIVER \
  --left-dsn-env SOURCE_DB_DSN \
  --right-driver-env TARGET_DB_DRIVER \
  --right-dsn-env TARGET_DB_DSN \
  --table customer_orders --key id

🚩 Common Flags

Column handling

FlagDescription
--ignore-column colExclude a column from schema comparison and row matching (useful for audit columns like changedon)
--date-only-column colNormalize a column to calendar date only — handles cases where one side stores timestamps and the other stores dates
--equivalent-value '...'Treat rewritten display values as equal. Format: col[,col]:value=value|value=value
--empty-string-as-nullTreat empty strings and NULLs as equivalent during matching

Ignore timestamp precision differences

--ignore-column
$ ./diff-gnome compare \
  --left-driver mysql --left-dsn '...' \
  --right-driver sqlserver --right-dsn '...' \
  --table customer_orders --key id \
  --ignore-column changedon

Normalize timestamps to date only

--date-only-column
$ ./diff-gnome compare \
  --left-table reporting.activity_log_archive \
  --right-table activity_log \
  --ignore-column changedon \
  --date-only-column changedate

Treat empty strings as NULL

--empty-string-as-null
$ ./diff-gnome compare ... --empty-string-as-null

📤 Output Formats

All commands support --output text (default), --output json, or --output markdown.

FormatUse case
textHuman-readable terminal output
jsonMachine-readable, pipe to jq or monitoring tools
markdownReports stored in repos or rendered in CI

🧩 Type Matching

Type matching is loose by default. Loose mode handles common cross-database type differences automatically:

Temporal types: SMALLDATETIME, DATETIME, DATETIME2, TIMESTAMP, TIMESTAMPTZ are treated as compatible
Date and datetime text values are normalized when they parse cleanly (VARCHAR vs DATE)
Integer families: INT, UNSIGNED INT, TINYINT are treated as compatible
Text families: VARCHAR and TEXT are treated as compatible
Numeric values are normalized so 1.16 and 1.160 compare as equal
Common punctuation variants are normalized during text comparison

Use --strict-types or strict_types: true in config when you want exact type matches.

📋 Config File Reference

Run config files (YAML or JSON) support the following top-level fields:

FieldDescription
parallelHow many jobs run at once
continue_on_errorKeep going after a job fails
animateShow animated batch progress on stderr
summary_onlySuppress per-job output; show only live progress and final summary
output_dirDirectory for generated job reports and mismatch exports
summary_fileWrite a consolidated run summary to output_dir/run-summary.txt
summary_formattext, json, or markdown
exception_fileWrite a consolidated failure report to output_dir/run-exceptions.txt
exception_formattext, json, or markdown
connectionsNamed connection blocks — avoids repeating DSNs across jobs
defaultsShared settings applied to all jobs unless overridden
jobsList of jobs: mode: compare, sync, dedupe, restore, or doctor

Job fields mirror CLI flags using snake_case. Two-sided jobs use left and right connection references. Single-datasource jobs (dedupe, restore) use connection.

Per-job file outputs

FieldDescription
output_file: trueWrite a per-job report to output_dir/<job-name>.txt|.json|.md
event_file: trueWrite a machine-readable JSONL event stream to output_dir/<job-name>-events.jsonl
mismatch_file: trueWrite every mismatch as one JSON object per line to output_dir/<job-name>-mismatches.jsonl (compare jobs only)
backup_file: trueAuto-generate a backup file path in output_dir (sync and dedupe jobs only)
backup_formatjsonl or sql

📄 Full Config Example

examples/run-config.yaml
parallel: 2
continue_on_error: true
animate: true
summary_only: true
output_dir: reports
summary_file: true
summary_format: markdown
exception_file: true
exception_format: json

connections:
  target_sqlserver:
    driver: sqlserver
    dsn_env: DIFFGNOME_LEFT_DSN
  source_mysql:
    driver: mysql
    dsn_env: DIFFGNOME_RIGHT_DSN

defaults:
  left: target_sqlserver
  right: source_mysql
  output: text
  chunk_size: 10000
  require_where: true
  output_file: true
  mismatch_file: true
  backup_file: true
  backup_format: sql

jobs:
  - name: audit-trail-compare
    mode: compare
    left_table: reporting.activity_log_archive
    right_table: activity_log
    ignore_columns: changedon
    date_only_columns: changedate

  - name: candidate-doctor
    mode: doctor
    table: candidate_records
    key: record_id
    where: "updated_at >= '2026-01-01'"
    apply: true
    max_writes: 5000
    max_deletes: 100

  - name: candidate-sync
    mode: sync
    table: candidate_records
    key: record_id
    where: "updated_at >= '2026-01-01'"
    apply: true
    write_batch_size: 1000
    output: json

📌 Constraints

Every key column must be non-null and sortable
compare can run without a key by matching rows on full row contents
sync --insert-only can run without a key by matching on full row contents
Use --left-table / --right-table when source and target table names differ
The --where clause is trusted SQL and passed through to both databases unchanged
Only one table is supported per individual compare or sync job
SQL Server support paginates with TOP n ordered by the configured key columns
Table and column names must be bare identifiers or dot-qualified identifiers

🔑 License

Diff Gnome is a commercial tool. A 14-day free trial is included with every download — no credit card required. Licenses are activated per machine.

license commands
# Activate this machine
$ ./diff-gnome license activate --key YOUR-LICENSE-KEY

# Check status and force a live refresh
$ ./diff-gnome license status --check

# Deactivate before swapping machines
$ ./diff-gnome license deactivate

# Open the purchase page
$ ./diff-gnome license purchase
Gnorman
Questions? Reach Gnorman at workshop@gnorman.codes