This article is the result of research I conducted between April and June 2025. The focus is on solutions from the EU (and Germany in particular).
Open-source large language models (also called “open-weight” models) have emerged as viable alternatives to proprietary solutions (ChatGPT, Claude), giving organisations greater control over their AI infrastructure and data sovereignty. Leading examples include DeepSeek, Qwen, Llama and Mistral.
From Local Development to Production
When first exploring open-source LLMs, most developers start with tools like Ollama, which excels at local development: you run and access the LLM on the same machine, typically your own computer. Ollama is perfect for experimentation and prototyping, but it’s designed for single-user, local scenarios.
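As a minimal sketch of that local workflow, assuming Ollama is already installed and its daemon is running (the model name `llama3.2` is illustrative; any model from the Ollama library works the same way):

```shell
# Download a model and query it interactively from the CLI
ollama pull llama3.2
ollama run llama3.2 "Explain data sovereignty in one sentence."

# Ollama also serves a local HTTP API on port 11434,
# which is how applications on the same machine talk to it
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain data sovereignty in one sentence.",
  "stream": false
}'
```

Note that the API binds to localhost by default, which is exactly the limitation the next section is about: it is not built to be exposed to users over the internet.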
What if you want to build an application that allows users to access the LLM over the internet? What happens when you need to serve hundreds or thousands of users simultaneously?
Note on terminology: I’ve noticed people say “local” when they mean self-hosting. In this guide, “local” describes offline use on your own computer, while “self-hosting” means running the model on your own infrastructure and making it accessible over the internet. Also, this guide covers inference (querying trained models), not training or…
