After trying to build something more substantial (a script that takes a text description and attempts to scrape the described info out of a collection of PDFs/websites), I ran into a number of annoyances getting it out of the prototype stage:
* Parsing prompt responses to make sure they match my expected schema (e.g. I want XPath selectors, but the model sometimes hallucinates a bare DOM id instead); see the first sketch below
* Hacks to keep the input inside the context window (especially when the context isn't easily vector-searchable, e.g. a DOM tree); a pruning approach is sketched below
* Retry logic for when the model returns something unusable (third sketch below)
* Measuring how well the system does across a set of examples (last sketch below)
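On the schema point, the cheapest guard I found was validating the selector before trusting it. A minimal sketch using lxml (the `validate_xpath` helper and its path heuristic are mine, not a library API):

```python
from lxml import etree

def validate_xpath(candidate: str) -> bool:
    """True if the model's output compiles as XPath and looks like a
    path rather than a bare identifier."""
    try:
        etree.XPath(candidate)
    except etree.XPathSyntaxError:
        return False
    # A hallucinated DOM id like "main-content" still compiles (it's a
    # valid element-name test), so also heuristically require path syntax.
    return any(tok in candidate for tok in ("/", "[", "::"))
```

The second check matters because the compile step alone accepts exactly the bare-id hallucinations I was trying to catch.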
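On the context-window point, pruning the DOM before prompting got me further than chunking it. A rough sketch, again with lxml (`max_text_len` and the tag list are arbitrary choices on my part, nothing principled):

```python
from lxml import html

def prune_dom(doc_text: str, max_text_len: int = 80) -> str:
    """Shrink an HTML document before prompting: drop tags the model
    never needs for selector-writing, truncate long text runs, and keep
    the structure and attributes that selectors are built from."""
    tree = html.fromstring(doc_text)
    # Drop content that inflates token count but can't appear in a selector.
    for el in tree.xpath("//script | //style | //noscript"):
        parent = el.getparent()
        if parent is not None:
            parent.remove(el)
    # Truncate long text so the model sees what's there without paying for all of it.
    for el in tree.iter():
        if el.text and len(el.text) > max_text_len:
            el.text = el.text[:max_text_len] + "..."
        if el.tail and len(el.tail) > max_text_len:
            el.tail = el.tail[:max_text_len] + "..."
    return html.tostring(tree, encoding="unicode")
```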
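Retry logic ended up inseparable from the validation above: re-prompt until the output passes the check, with backoff so you aren't hammering the API. A sketch (`call_model` stands in for whatever client call you're actually making):

```python
import time

def retry_until_valid(call_model, is_valid, max_attempts=3, base_delay=1.0):
    """Re-invoke the model until its output passes validation.

    call_model: zero-arg function returning the raw model output.
    is_valid:   predicate, e.g. validate_xpath above.
    """
    for attempt in range(max_attempts):
        output = call_model()
        if is_valid(output):
            return output
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```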
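And for measurement, even a crude accuracy number over hand-labeled examples beat eyeballing outputs. A minimal version (the `(source, expected)` pair format is just an assumption about how a test set might be stored):

```python
def evaluate(extract, examples):
    """Score an extraction pipeline over (source, expected) pairs.

    Exceptions count as misses, since a crash is still a failure
    from the user's point of view.
    """
    hits = 0
    for source, expected in examples:
        try:
            hits += int(extract(source) == expected)
        except Exception:
            pass
    return hits / len(examples)
```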
AI Twitter is full of claims that LLMs, AutoGPT, etc. are cure-alls, but what practical issues actually come up when you try to build on top of these yourself?