Before any mass CSV upload, I stage the file and then import inside a single transaction: UTF-8 only, ISO 8601 dates, a strict schema, and unique keys checked. I double-check row counts and checksums, and watch for Excel stripping leading zeros. csvkit and Notepad++ help. What’s your go-to for delimiter drift?
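For anyone who wants the stage-then-import step spelled out, here is a minimal sketch. It assumes SQLite via Python's standard library, and the `staging`/`target` tables, the `id`/`zip`/`shipped` columns, and the `staged_import` helper are all hypothetical names for illustration, not anything from the workflow above. It leaves out the checksum step.

```python
import csv
import io
import sqlite3
from datetime import date

def staged_import(conn, csv_text):
    """Load csv_text into staging, validate it, then copy to target in one transaction."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("DELETE FROM staging")
        conn.executemany(
            "INSERT INTO staging (id, zip, shipped) VALUES (:id, :zip, :shipped)",
            rows,
        )
        staged, distinct = conn.execute(
            "SELECT COUNT(*), COUNT(DISTINCT id) FROM staging"
        ).fetchone()
        if staged != len(rows):
            raise ValueError("row count mismatch between file and staging")
        if distinct != staged:
            raise ValueError("duplicate ids in upload")
        for row in rows:
            date.fromisoformat(row["shipped"])  # raises ValueError on a bad date
        conn.execute("INSERT INTO target SELECT id, zip, shipped FROM staging")
    return staged

# Hypothetical schema: zip is TEXT so leading zeros survive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id TEXT, zip TEXT, shipped TEXT)")
conn.execute("CREATE TABLE target (id TEXT PRIMARY KEY, zip TEXT, shipped TEXT)")

loaded = staged_import(
    conn, "id,zip,shipped\n1,02134,2024-01-05\n2,00501,2024-01-06\n"
)
```

Because the validation runs inside the same `with conn:` block as the inserts, a failed check rolls back the staging load too, so nothing partial reaches `target`.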
Had a carrier dump switch from comma to semicolon at row ~12k. The row count matched, but the columns blew up. For drift, I sample the first and last 200 rows, normalize with csvformat -T, and check for constant field counts with a quick Python csv.reader pass. If the counts vary, split the file into segments by delimiter and reconvert each one. Also bind Notepad++ column mode to a hotkey to eyeball rogue quotes fast.
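The csv.reader field-count check could look roughly like this. It is a sketch, not the exact script from the post: `field_count_runs` is a made-up helper name, and the sample data is invented to mimic a comma-to-semicolon flip.

```python
import csv
import io

def field_count_runs(lines, delimiter=","):
    """Group consecutive records by field count.

    Returns [(first_record, last_record, n_fields), ...]. Record numbers
    count parsed CSV records, which can differ from physical line numbers
    when quoted fields contain newlines.
    """
    runs = []
    for rec_num, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        n = len(row)
        if runs and runs[-1][2] == n:
            runs[-1][1] = rec_num  # extend the current run
        else:
            runs.append([rec_num, rec_num, n])  # field count changed: new run
    return [tuple(r) for r in runs]

# Invented sample: the delimiter flips to ';' at the third record.
sample = "id,name,zip\n1,Ann,02134\n2;Bob;00501\n3;Cy;10001\n"
runs = field_count_runs(io.StringIO(sample))
print(runs)  # [(1, 2, 3), (3, 4, 1)]
```

More than one run means the field count is not constant, and the run boundaries give the record numbers where the file would need to be split before reconverting each segment.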
Agree with KeyCrew: I add a dialect sanity check (sample head/middle/tail and assert the same delimiter, quote character, and column count), then dry-run 1k rows into a temp table with strict types. Reason: it catches mid-file delimiter flips, quoting weirdness, and encoding/newline issues before the big import.
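A rough sketch of the head/middle/tail check, under some assumptions: the candidate delimiter list, the `guess_delimiter` and `check_dialect` names, and the 200-line window are all illustrative choices. It only compares delimiter and column count, not the quote character, and it takes the file as a list of lines already in memory, so a window boundary can cut through a quoted multi-line field.

```python
import csv

CANDIDATES = [",", ";", "\t", "|"]  # assumed set of plausible delimiters

def guess_delimiter(lines):
    """Return (delimiter, n_columns) for the candidate that gives one
    consistent column count greater than 1, or None if none does."""
    best = None
    for d in CANDIDATES:
        counts = {len(row) for row in csv.reader(lines, delimiter=d) if row}
        if len(counts) == 1:
            n = counts.pop()
            if n > 1 and (best is None or n > best[1]):
                best = (d, n)
    return best

def check_dialect(lines, window=200):
    """Raise ValueError unless head, middle, and tail agree on the dialect."""
    mid = max(0, len(lines) // 2 - window // 2)
    windows = {
        "head": lines[:window],
        "middle": lines[mid:mid + window],
        "tail": lines[-window:],
    }
    guesses = {name: guess_delimiter(chunk) for name, chunk in windows.items()}
    if None in guesses.values() or len(set(guesses.values())) != 1:
        raise ValueError(f"dialect drift: {guesses}")
    return guesses["head"]

clean = ["a,b,c\n"] * 10
drifted = ["a,b,c\n"] * 6 + ["a;b;c\n"] * 6

dialect = check_dialect(clean, window=3)
try:
    check_dialect(drifted, window=3)
    drift_caught = False
except ValueError:
    drift_caught = True
```

A flip between two sampled windows can still slip through, so this works best as a cheap gate before the full field-count pass and the dry run, not as a replacement for them.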