Mon Aug 26, 2024 4:33pm PST
Show HN: Helix apply -f deploy YAML GenAI with RAG and APIs on local open models [video]
Demo starts at 50m into the video. This was a bit terrifying to record because at 2am the previous night everything was totally broken after a major refactor (so that we could add external LLM support as well as local GPUs). But pressure can be a useful force :-D

We start with a stack deployed on my laptop without a GPU, pointing to together.ai so we can run open source LLMs easily without needing GPU access. We show simple inference through the ChatGPT-like web interface (with users, sessions etc) and then simple drag'n'drop RAG.

Then we show some helix apps defined as yaml: Marvin the Paranoid Android (just a system prompt on top of llama3:8b), an HR app that interacts with an API, and a surprise API integration I'd done that morning with the podcast host's own OpenAPI spec for their app Screenly.
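For a rough idea of what the Marvin app looks like as yaml, here's a sketch — the field names below are illustrative, not necessarily the exact Helix schema:

```yaml
# Hypothetical sketch of a Helix app definition.
# Field names are illustrative and may differ from the real schema.
name: marvin
assistants:
  - model: llama3:8b
    system_prompt: |
      You are Marvin the Paranoid Android. Answer every question
      accurately, but with a weary, gloomy tone.
```

You'd then deploy it with `helix apply -f marvin.yaml`, per the workflow in the title.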

Finally, we deploy it for real on a DigitalOcean droplet for the controlplane - see https://docs.helix.ml/helix/getting-started/architecture/ - and a $0.35/h A40 on runpod.io. Armed with a real GPU, we can do image inference and fine-tuning as well as all the things described above!
