For regular dictation, hold the right Ctrl key while speaking, then release to have your words typed automatically wherever your cursor is - perfect for coding, emails, or chat apps.
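Under the hood, the dictation loop is essentially: wait for the hotkey, record until it's released, transcribe, then type. Here's a minimal sketch of that flow in Python, assuming the `keyboard`, `sounddevice`, and `openai-whisper` packages (the actual implementation may differ):

```python
import numpy as np
import sounddevice as sd
import keyboard   # pip install keyboard
import whisper    # pip install openai-whisper

SAMPLE_RATE = 16000                  # Whisper expects 16 kHz mono audio
model = whisper.load_model("base")   # model size is an illustrative choice

def record_while_held(key: str) -> np.ndarray:
    """Capture microphone audio for as long as `key` stays pressed."""
    keyboard.wait(key)               # block until the key goes down
    chunks = []
    def callback(indata, frames, time, status):
        chunks.append(indata.copy())
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        dtype="float32", callback=callback):
        while keyboard.is_pressed(key):   # keep streaming until release
            sd.sleep(50)
    if not chunks:
        return np.zeros(0, dtype="float32")
    return np.concatenate(chunks).flatten()

while True:
    audio = record_while_held("right ctrl")
    if audio.size:
        # Whisper's transcribe() accepts a float32 NumPy waveform directly
        text = model.transcribe(audio, fp16=False)["text"].strip()
        keyboard.write(text + " ")   # type the words at the cursor
```

Because `keyboard.write` synthesizes real key events, the text lands in whatever window has focus, which is what makes this work in any app.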
The more interesting feature is the AI command mode: hold the Scroll Lock key, speak a prompt, and a local LLM responds based on both your words AND a screenshot of what you're looking at. The AI's response gets typed directly into your application as if you typed it yourself.
Everything runs locally: Whisper for transcription and Ollama for the LLM (I recommend gemma3:27b for best results; the screenshot feature needs a vision-capable model). No cloud services or API costs.
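For the command mode, the core call is just a screenshot plus the transcribed prompt sent to a multimodal model. A rough sketch using the `ollama` Python client and Pillow's `ImageGrab` (illustrative wiring, not the project's exact code):

```python
import io
import keyboard
import ollama                        # pip install ollama
from PIL import ImageGrab            # pip install pillow

def ai_command(prompt_text: str) -> str:
    """Send the spoken prompt plus a screenshot to a local vision model."""
    shot = ImageGrab.grab()          # capture the current screen
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    response = ollama.chat(
        model="gemma3:27b",          # vision-capable local model
        messages=[{
            "role": "user",
            "content": prompt_text,
            "images": [buf.getvalue()],   # raw PNG bytes
        }],
    )
    return response["message"]["content"]

# e.g. after transcribing a held-Scroll-Lock recording:
# reply = ai_command("Reply politely to the email on my screen")
# keyboard.write(reply)              # type the answer at the cursor
```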
The tool was inspired by Karpathy's "vibe coding" and builds upon Vlad's whisper-keyboard project. I extended it to work with local models and added the screenshot context feature, which makes the AI much more useful for everyday tasks like writing emails.
Let me know what you think!