Thurs May 16, 2024 5:48pm PST
Show HN: Open-source tool for data cleaning with LLM
Data cleaning is often the first step in DS/ML/BI work, and a tedious one. It involves tasks such as deduplication, data type casting, handling missing values, and correcting typos.
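
To make those tasks concrete, here is a minimal hand-written sketch in Pandas. It is not code from the project; the `clean` function, the column names, and the typo mapping are all made up for illustration.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: applies four common cleaning steps to a toy table."""
    out = df.copy()
    # Correct typos: trim whitespace, then fix known misspellings via an explicit map.
    out["city"] = out["city"].str.strip().replace({"New Yrok": "New York"})
    # Data type casting: text to numbers; unparseable entries become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Missing values: fill with the column median (one possible policy among many).
    out["age"] = out["age"].fillna(out["age"].median())
    # Deduplication: drop rows that are identical after the fixes above.
    return out.drop_duplicates().reset_index(drop=True)

raw = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob", "Cy"],
    "city": ["New York", "New Yrok ", "Boston", "Boston"],
    "age": ["31", "31", "n/a", "45"],
})
cleaned = clean(raw)
print(cleaned)
```

Note that the order matters here: the two "Ann" rows only become duplicates once the typo is fixed, which is part of why writing these steps by hand gets tedious.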

I'm working on an open-source project that guides you through the data cleaning process. It currently supports tables in Pandas, Snowflake, and DuckDB, and outputs dbt SQL/YAML for the cleaning steps.

We take an agent approach, which is very different from Copilot. With Copilot, the human has to specify each step, which is burdensome. An LLM agent does the heavy lifting instead: it explores the data, figures out what needs cleaning, and guides the human through the steps.
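
A rough sketch of the "explore, then propose" idea, using only the Python standard library. A few fixed rules stand in for the LLM here, so this is an illustration of the workflow and not how the project works internally; `propose_cleaning_steps` and `is_number` are hypothetical names.

```python
from collections import Counter

def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def propose_cleaning_steps(rows):
    """Profile a list of dict rows and return suggestions for a human to confirm."""
    suggestions = []
    # Count exact duplicate rows (keys are unique, so sorting items is safe).
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(c - 1 for c in counts.values())
    if duplicates:
        suggestions.append(f"remove {duplicates} duplicate row(s)")
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(v in (None, "") for v in values)
        if missing:
            suggestions.append(f"{col}: handle {missing} missing value(s)")
        present = [v for v in values if v not in (None, "")]
        # Flag text columns whose non-missing values all look numeric.
        if present and all(isinstance(v, str) and is_number(v) for v in present):
            suggestions.append(f"{col}: cast text to a numeric type")
    return suggestions

rows = [
    {"id": "1", "city": "Boston"},
    {"id": "1", "city": "Boston"},
    {"id": "2", "city": ""},
]
steps = propose_cleaning_steps(rows)
for step in steps:
    print("Suggested:", step)
```

The point of the design is that the human reviews a list of proposed steps instead of having to come up with them.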

Here is a 1-minute demo: https://youtu.be/D7jw43ccOkg

Please let me know your thoughts and feedback!
