This started out as a weekend hack with gpt-4o-mini, using the very basic strategy of "just ask the AI to OCR the document". It turned out to outperform our existing Unstructured/Textract implementation, at roughly the same cost.
In particular, we've seen the vision models do a great job on charts, infographics, and handwritten text. Documents are a visual format after all, so a vision model makes sense!
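The approach really is that simple. Here's a minimal sketch of the idea, using the official OpenAI chat completions API; the prompt wording and model name are illustrative, not the exact ones the package uses:

```python
import base64


def build_ocr_request(image_path: str, model: str = "gpt-4o-mini") -> dict:
    """Build a chat-completions payload asking a vision model to
    transcribe a page image into markdown."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # The whole "OCR strategy" is just this instruction...
                {"type": "text",
                 "text": "Convert this page to markdown. Return only the markdown."},
                # ...plus the page image, inlined as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


# Send it with the official OpenAI client (requires OPENAI_API_KEY):
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**build_ocr_request("page.png"))
# markdown = resp.choices[0].message.content
```

For multi-page PDFs, the same request is repeated per rendered page image and the results are concatenated.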
I posted the first experiments on HN, and since then, we've had some great contributors who have helped turn this into a full package. We have two versions now:
- pip package: https://pypi.org/project/py-zerox/
- npm package: https://www.npmjs.com/package/zerox
Next up, we're working on building an open source dataset for fine-tuning. We've seen some early success with a charts => markdown fine-tuning dataset, and we're excited to keep building.
Github: https://github.com/getomni-ai/zerox
You can try out a hosted version here: https://getomni.ai/ocr-demo