
How to host an open source LLM

Oct 24, 2025


This article is the result of research I conducted between April and June 2025. The focus is on solutions from the EU (and Germany in particular).

Open-source large language models (also called “open-weight” models) have emerged as viable alternatives to proprietary solutions such as ChatGPT and Claude, giving organisations more control over their AI infrastructure and data sovereignty. Leading examples include DeepSeek, Qwen, Llama, and Mistral.

From Local Development to Production

When first exploring open-source LLMs, most developers start with tools like Ollama, which excels at local development: you run and access the LLM on the same machine (your own computer). Ollama is perfect for experimentation and prototyping, but it’s designed for single-user, local scenarios.
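
As a quick illustration, here is a minimal sketch of querying Ollama’s built-in REST API, which listens on localhost:11434 by default. The model name is just an example; use whichever model you have pulled.

# A minimal sketch of local inference via Ollama's REST API.
# Ollama serves http://localhost:11434 by default; the model
# name below is illustrative.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # e.g. after running `ollama pull llama3.2`
        "prompt": "Explain data sovereignty in one sentence.",
        "stream": False,      # return the full answer at once
    },
    timeout=120,
)
print(response.json()["response"])

Note that the request goes to localhost: both the client and the model live on your machine, which is exactly what makes this a “local” setup.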

What if you want to build an application that allows users to access the LLM over the internet? What happens when you need to serve hundreds or thousands of users simultaneously?
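
To make that scenario concrete, here is a hedged sketch of what a client might look like once the model runs on your own server rather than your laptop. Many self-hosted inference servers expose an OpenAI-compatible chat endpoint; the base URL, API key, and model name below are hypothetical placeholders, not a specific product’s API.

# A sketch of querying a self-hosted LLM over the internet.
# Assumes an OpenAI-compatible chat endpoint; the base URL,
# API key, and model name are hypothetical placeholders.
import requests

BASE_URL = "https://llm.example.com/v1"  # your own infrastructure
API_KEY = "sk-your-own-key"              # auth that you control

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "mistral-7b-instruct",  # illustrative model name
        "messages": [
            {"role": "user", "content": "Hello from a web client!"}
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])

The difference from the Ollama sketch above is where the request goes: a remote server you operate, with authentication in front of it, serving many users at once.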

Note on terminology: I’ve noticed that people often say “local” when they mean self-hosting. I use “local” for offline use on your own computer, while “self-hosting” means hosting on your own infrastructure, accessible over the internet. Also, this guide covers inference (querying trained models), not training or…



Written by Alexa Steinbrück

A mix of web development and critical AI discourse. I love dogs and dictionaries!
