Problem
I run several LLM backends in my homelab - local models via Ollama plus API access to multiple cloud providers. Without a proxy, every application needs its own configuration for each backend, and adding new models or switching between them means touching multiple configs.
Solution
LiteLLM proxy running in an LXC container on Proxmox. It provides a single OpenAI-compatible API endpoint and handles routing to the appropriate backend based on the model name in the request.
Resource usage is negligible: 0.6% of a vCPU, 1 GB RAM, and 3 GB of storage.
Backends
Currently routing to: Ollama (local), OpenAI, Anthropic, Google Gemini, MiniMax, xAI, Nvidia NIM, and OpenRouter as a fallback for everything else.
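A routing setup like this can be sketched in LiteLLM's config.yaml. The model names, hostnames, and the wildcard fallback below are illustrative; check the LiteLLM docs for the wildcard routing syntax supported by your version:

```yaml
model_list:
  # Local model served by Ollama (hostname is a placeholder)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://ollama.lan:11434

  # Cloud providers; keys are read from environment variables
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

  # OpenRouter as the catch-all for everything else
  - model_name: "*"
    litellm_params:
      model: openrouter/*
      api_key: os.environ/OPENROUTER_API_KEY
```

Clients request a model by its `model_name`; the proxy maps it to the right backend.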
Advantages
- Single endpoint for everything. Most tools support OpenAI-compatible APIs. Point them at the internal endpoint, set an API key, done. No per-tool configuration when you add a new model.
- Caching via Redis. Identical requests return cached responses. Useful for repeated prompts in automation workflows - cuts latency and cost.
- Observability. Unified logging across all backends - token usage, latency, cost per request, model breakdown. One dashboard instead of checking each provider separately.
- Access control. Virtual API keys map to specific models or budgets. Different keys for different applications, revoke individually without touching other tools.
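The "single endpoint" point is easiest to see in the request shape: it is a standard OpenAI-style chat completion aimed at the proxy instead of a provider, and only the `model` field changes. A minimal stdlib-only sketch, where the hostname, port, and virtual key are placeholders:

```python
import json
import urllib.request

# All applications talk to the same internal endpoint; LiteLLM routes
# on the "model" field. Hostname, port, and key are illustrative.
PROXY_URL = "http://litellm.lan:4000/v1/chat/completions"
VIRTUAL_KEY = "sk-app1-virtual-key"  # per-application virtual key

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the proxy."""
    payload = {
        "model": model,  # the proxy routes on this name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Identical call shape whether the model is local or cloud:
req = build_request("llama3", "hello")    # would route to Ollama
req2 = build_request("gpt-4o", "hello")   # would route to OpenAI
# urllib.request.urlopen(req) would send it; omitted here.
```

Any OpenAI SDK works the same way: set its base URL to the proxy and pass the virtual key as the API key.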
Disadvantages
- Single point of failure. If the proxy goes down, everything goes down. Worth setting up a health check.
- Security surface. The proxy holds all your provider API keys in one place. If the LXC is compromised, all keys are compromised. Keep it in an isolated VLAN, not exposed to the internet.
- Debugging gets harder. Errors from upstream providers get wrapped in LiteLLM's response format. Sometimes you need to check raw logs to figure out whether the issue is the proxy config or the provider.
- Redis adds complexity. Caching is useful, but now you have another service to maintain. Cache invalidation and stale responses can cause subtle bugs in stateful workflows.
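The health check mentioned above takes a few lines of stdlib Python. The hostname is a placeholder, and the `/health/liveliness` path is LiteLLM's liveness probe as I understand it; verify the path against the docs for your version:

```python
import urllib.request
import urllib.error

def is_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the proxy answers its liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(
            f"{base_url}/health/liveliness", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: proxy is down.
        return False

# Wire this into cron, Uptime Kuma, or similar, and alert when it
# flips to False. Example: is_alive("http://litellm.lan:4000")
```

Polling from a machine outside the LXC also catches container- and network-level failures, not just a hung process.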