Fri Jan 3, 2025 4:04pm PST
Ask HN: Do you think Python will disappear for LLM inference?
For non-LLM models, we have a wide variety of frameworks with Python interfaces for running them, usually thin wrappers around C/C++ cores. This makes it impractical to run models directly from languages like Rust: you'd need to implement an idiomatic layer for each model runtime, something that already exists for Python. Nvidia's Triton Inference Server covers a lot, but is it even designed for embedded use? And how feasible is adding custom logic?

For LLMs, it feels like the opposite. There’s a smaller set of frameworks (e.g., llama.cpp, vllm), each supporting a wide range of models. This makes it relatively straightforward to integrate them into other languages like Go, as you only need to maintain a few idiomatic layers.
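Concretely, that idiomatic layer can stay pretty thin. Here's a minimal Go sketch that talks to a locally running llama.cpp server through its OpenAI-compatible chat endpoint; the port, URL, and model name are assumptions about the deployment, not something the runtime mandates.

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // Request/response shapes for an OpenAI-compatible chat endpoint,
    // such as the one llama.cpp's llama-server exposes. The path and
    // port below are assumptions; adjust them to your setup.
    type chatMessage struct {
        Role    string `json:"role"`
        Content string `json:"content"`
    }

    type chatRequest struct {
        Model    string        `json:"model"`
        Messages []chatMessage `json:"messages"`
    }

    type chatResponse struct {
        Choices []struct {
            Message chatMessage `json:"message"`
        } `json:"choices"`
    }

    func main() {
        body, err := json.Marshal(chatRequest{
            Model:    "local", // typically ignored by llama-server; it serves the loaded GGUF
            Messages: []chatMessage{{Role: "user", Content: "Hello"}},
        })
        if err != nil {
            panic(err)
        }

        resp, err := http.Post("http://localhost:8080/v1/chat/completions",
            "application/json", bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var out chatResponse
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            panic(err)
        }
        if len(out.Choices) > 0 {
            fmt.Println(out.Choices[0].Message.Content)
        }
    }

Swap the runtime behind it (vLLM also speaks the OpenAI protocol) and the Go side barely changes, which is the point: a handful of runtimes means a handful of layers to maintain.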

To me, it's a no-brainer that Go or Rust will replace Python for serving LLMs. The serving layer itself (request handling, scheduling, tokenization) is CPU-intensive, Python is comparatively slow, and the small number of LLM runtimes makes the transition straightforward.
