Sanity-checking CSVs before upload

emwa490 · January 18, 2026, 5:44pm

Anyone using a lightweight tool to sanity-check CSVs before upload? I’m moving from about 200-line monthly entries to 5,000+ rows a week and testing OpenRefine vs. Excel Power Query — what happens if this scales, and is there a simple way to enforce date/ID rules without going full ETL?

crimson-atlas74 · January 22, 2026, 6:38am

OpenRefine for fixes; Frictionless validate checks ID rules and ISO 8601 dates against a schema: https://frictionlessdata.io. Power Query’s fine for one-off cleanup, but a schema you can rerun on every file wins.
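
For anyone wondering what "schema" means here, a rough sketch of a Table Schema descriptor, written out from Python so it can live next to the upload script. The column names (id, date), the ID pattern, and the file name schema.json are all made up for illustration; swap in your own.

```python
import json

# Hypothetical columns and ID format (two capitals + six digits); adjust to the real file.
SCHEMA = {
    "fields": [
        {
            "name": "id",
            "type": "string",
            "constraints": {
                "required": True,
                "unique": True,  # per-file uniqueness only
                "pattern": "[A-Z]{2}[0-9]{6}",
            },
        },
        {
            "name": "date",
            "type": "date",
            "format": "default",  # Table Schema's default date format is ISO 8601 (YYYY-MM-DD)
            "constraints": {"required": True},
        },
    ]
}


def write_schema(path="schema.json"):
    """Write the descriptor as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(SCHEMA, fh, indent=2)
    return path
```

With the file written, something like `frictionless validate data.csv --schema schema.json` should check each row against it; confirm the exact flags against the docs for your installed version.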

KeyCrew · January 29, 2026, 4:03am

Building on @crimson-atlas74’s schema point, set up a pre-upload gate with csvkit — csvclean/csvstat plus a tiny check for ‘YYYY-MM-DD’ and ID pattern — so the file fails fast before upload (think bouncer at the door): https://csvkit.readthedocs.io. Do you need cross-file ID uniqueness, or just per-file?
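
The "tiny check" can be a few lines of standard-library Python. This is only a sketch: the column names (id, date) and the ID pattern are placeholders I made up, and it covers per-file uniqueness only, not cross-file.

```python
import csv
import re
import sys
from datetime import datetime

# Hypothetical ID format: two capitals followed by six digits.
ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{6}")
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not DATE_SHAPE.fullmatch(text):
        return False  # strptime alone would accept unpadded values like 2026-1-5
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False  # right shape, impossible date (e.g. 2026-02-30)
    return True


def check_rows(rows, id_col="id", date_col="date"):
    """Return (line_number, message) pairs for every rule violation.

    Line numbers assume a header on line 1 and no multi-line quoted fields.
    """
    problems = []
    first_seen = {}
    for line_no, row in enumerate(rows, start=2):
        raw_id = (row.get(id_col) or "").strip()
        raw_date = (row.get(date_col) or "").strip()
        if not ID_PATTERN.fullmatch(raw_id):
            problems.append((line_no, f"bad id {raw_id!r}"))
        elif raw_id in first_seen:
            problems.append(
                (line_no, f"duplicate id {raw_id!r}, first on line {first_seen[raw_id]}")
            )
        else:
            first_seen[raw_id] = line_no
        if not is_iso_date(raw_date):
            problems.append((line_no, f"bad date {raw_date!r}"))
    return problems


def main(path):
    """Print problems to stderr; return 1 if any were found, else 0."""
    with open(path, newline="", encoding="utf-8") as fh:
        problems = check_rows(csv.DictReader(fh))
    for line_no, message in problems:
        print(f"{path}:{line_no}: {message}", file=sys.stderr)
    return 1 if problems else 0
```

Calling `sys.exit(main("upload.csv"))` from a small wrapper gives a non-zero exit code on any violation, so the upload step can refuse to run. At 5,000 rows a week this runs in well under a second, and the ID dictionary stays small.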