We start with the stack deployed on my laptop, which has no GPU, pointing at together.ai so we can run open source LLMs without needing any GPU access. We show simple inference through the ChatGPT-like web interface (with users, sessions etc.) and then simple drag'n'drop RAG.
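To give a rough idea of what "pointing at together.ai" means in practice: the control plane just needs an OpenAI-compatible endpoint and an API key instead of a local GPU runner. This is a minimal sketch of that idea only; the key names and model id below are illustrative assumptions, not the exact Helix settings, so check the Helix install docs for the real configuration.

```yaml
# Illustrative only: the control plane talks to together.ai's
# OpenAI-compatible API instead of a local GPU runner.
# Key names here are assumptions, not the exact Helix schema.
inference:
  provider: togetherai
  base_url: https://api.together.xyz/v1   # together.ai's OpenAI-compatible endpoint
  api_key: ${TOGETHER_API_KEY}            # set in your environment
  default_model: meta-llama/Llama-3-8b-chat-hf  # example model id on together.ai
```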
Then we show some Helix apps defined as YAML: Marvin the Paranoid Android (just a system prompt on top of llama3:8b), an HR app that interacts with an API, and a surprise API integration I'd built that morning against the podcast host's own OpenAPI spec for their app, Screenly.
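For a flavour of what these YAML app definitions look like, here's a sketch of a Marvin-style app: a system prompt layered on top of llama3:8b. The field names are meant to show the shape rather than the exact Helix app schema, so treat this as YAML pseudocode and see the Helix docs for the real format.

```yaml
# Sketch of a "system prompt on top of a model" app, in the spirit of the
# Marvin demo. Field names are illustrative; the real Helix schema may differ.
name: marvin
description: Marvin the Paranoid Android
assistants:
  - name: Marvin
    model: llama3:8b
    system_prompt: |
      You are Marvin the Paranoid Android. Answer every question accurately,
      but in a weary, gloomy tone, with frequent complaints about your vast
      and chronically underused intellect.
```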
Finally, we deploy it for real: a DigitalOcean droplet for the control plane (see https://docs.helix.ml/helix/getting-started/architecture/) and a $0.35/h A40 on runpod.io as the GPU runner. Armed with a real GPU, we can do image inference and fine-tuning as well as everything described above!