Problem

I run several LLM backends in my homelab: local models via Ollama, plus API access to multiple cloud providers. Without a proxy, every application needs its own configuration for each backend, and adding a new model or switching between backends means touching multiple configs.

Solution

LiteLLM proxy running in an LXC container on Proxmox. It provides a single OpenAI-compatible API endpoint and handles routing to the appropriate backend based on the model name in the request.
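
A minimal config.yaml sketch of what this looks like (the Ollama host, model names, and environment-variable names are illustrative assumptions, not my exact setup; the wildcard pass-through entry follows LiteLLM's router docs and is worth verifying against your version):

```yaml
model_list:
  # name on the left is what clients request; litellm_params.model
  # is the provider/model string LiteLLM actually routes to
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://192.168.1.50:11434   # local Ollama instance
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY # read key from environment
  - model_name: "openrouter/*"              # wildcard: pass through to OpenRouter
    litellm_params:
      model: "openrouter/*"
      api_key: os.environ/OPENROUTER_API_KEY
```

Applications then point their OpenAI-compatible client at the proxy's single base URL and select a backend purely by model name.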

Resource usage is negligible: 0.6% vCPU, 1 GB RAM, 3 GB storage.

Backends

Currently routing to Ollama (local), OpenAI, Anthropic, Google Gemini, MiniMax, xAI, and Nvidia NIM, with OpenRouter as a fallback for everything else.
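
Conceptually, the routing decision is just prefix matching on the requested model name, with anything unmatched falling through to the OpenRouter fallback. A toy Python sketch of that idea (endpoints and prefixes are illustrative, not LiteLLM internals):

```python
# Map model-name prefixes to backend base URLs; order is the match priority.
BACKENDS = {
    "ollama/": "http://localhost:11434",
    "openai/": "https://api.openai.com/v1",
    "anthropic/": "https://api.anthropic.com",
}
OPENROUTER = "https://openrouter.ai/api/v1"  # catch-all fallback


def route(model: str) -> str:
    """Return the backend base URL for a requested model name."""
    for prefix, base_url in BACKENDS.items():
        if model.startswith(prefix):
            return base_url
    # No known prefix matched: send the request to OpenRouter.
    return OPENROUTER
```

The nice property is that clients never see any of this: they speak the OpenAI API to one endpoint, and swapping a backend is a change to the routing table only.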

Advantages

Disadvantages