FreeInference Deployment

FastAPI + systemd (current)

We serve OpenRouter-compatible traffic directly through a FastAPI application listening on port 80. Removing Nginx reduces operational overhead, keeps debugging straightforward, and lets systemd own the lifecycle of the gateway process.

Overview

┌─────────────┐      ┌─────────────────┐      ┌────────────────────┐
│  OpenRouter │─────▶│ FastAPI Gateway │─────▶│ Model Executors... │
└─────────────┘      └─────────────────┘      └────────────────────┘
  • FastAPI binds to 0.0.0.0:80 and exposes the /v1 endpoints consumed by OpenRouter clients (an example request follows this list).

  • The gateway handles request authentication, routing, and backpressure before invoking the selected model adapter.

  • systemd supervises the process, ensuring automatic restarts after crashes or host reboots.
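
For a concrete picture of the /v1 surface, an OpenRouter-style chat completion against the current gateway looks like the request below. The model identifier is taken from the legacy examples later in this document and may differ on a given deployment; list the live ones via /v1/models, and add an Authorization header if the gateway requires an API key.

# Example chat completion against the current gateway (model name illustrative)
curl -X POST https://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/Qwen_Qwen3-Coder-480B-A35B-Instruct-FP8", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'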

Deployment Steps

  1. Install runtime dependencies

    Ensure the Python environment and model weights are ready. Confirm the FastAPI entry point (serving.servers.bootstrap:app) is reachable via uvicorn or the configured launcher script.
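
    A quick foreground run is an easy smoke test before wiring up systemd. This assumes the repository root is the current directory and uses an unprivileged local port; the /health path matches the health check documented under Runtime Operations below.

    # Run the gateway locally on an unprivileged port
    uvicorn serving.servers.bootstrap:app --host 127.0.0.1 --port 8080

    # In another shell, hit the health endpoint
    curl http://127.0.0.1:8080/health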

  2. Create the unit file

    sudo tee /etc/systemd/system/freeinference.service <<'UNIT'
    [Unit]
    Description=FreeInference FastAPI service
    After=network-online.target
    Wants=network-online.target
    
    [Service]
    Type=simple
    User=ubuntu
    # Allow binding to port 80 without running the service as root
    AmbientCapabilities=CAP_NET_BIND_SERVICE
    WorkingDirectory=/home/ubuntu/hybridInference
    ExecStart=/usr/bin/env uvicorn serving.servers.bootstrap:app --host 0.0.0.0 --port 80
    Restart=always
    RestartSec=5
    Environment=PYTHONUNBUFFERED=1
    
    [Install]
    WantedBy=multi-user.target
    UNIT
    

    Replace User, WorkingDirectory, and Environment entries as needed for the target host. The repository carries a maintained version of this unit at infrastructure/systemd/hybrid_inference.service; copy or symlink it into /etc/systemd/system/freeinference.service during deploys.
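
    For example, assuming the repository is checked out at the WorkingDirectory used above:

    sudo cp /home/ubuntu/hybridInference/infrastructure/systemd/hybrid_inference.service \
            /etc/systemd/system/freeinference.service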

  3. Reload and enable the service

    sudo systemctl daemon-reload
    sudo systemctl enable freeinference.service
    sudo systemctl start freeinference.service
    sudo systemctl status freeinference.service
    

Runtime Operations

  • Restart on demand: sudo systemctl restart freeinference.service

  • Follow logs: journalctl -u freeinference.service -f

  • Health check: curl https://freeinference.org/health

  • List registered models: curl https://freeinference.org/v1/models | jq (both checks are combined into a probe sketch below)
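
The two checks above can be combined into a small probe for cron or an external monitor. This is a sketch built only from the endpoints listed in this section; it assumes curl and jq are available on the monitoring host.

#!/usr/bin/env bash
# Probe sketch: exits non-zero if either endpoint is unreachable
# or returns a non-2xx status.
set -euo pipefail

BASE_URL="https://freeinference.org"

# -f makes curl fail on HTTP errors; --max-time bounds each check.
curl -fsS --max-time 10 "${BASE_URL}/health" > /dev/null
curl -fsS --max-time 10 "${BASE_URL}/v1/models" | jq . > /dev/null

echo "freeinference gateway healthy"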

Why We Dropped Nginx

  • FastAPI already terminates HTTP and exposes the required OpenRouter-compatible endpoints.

  • Nginx added another moving part, increasing failover complexity and opaque error handling.

  • Debugging latency or request routing is simpler when traffic is handled in a single process.

Legacy Architectures

Nginx (v2, abandoned)

We briefly fronted FastAPI (running on port 8080) with vanilla Nginx, which exposed freeinference.org on port 80 and terminated TLS for the public endpoint. Once Cloudflare took over edge TLS duties, the extra hop mostly added deployment and observability complexity without material benefit, so the setup was removed.

Nginx + Lua via OpenResty (v1, abandoned)

We previously relied on OpenResty (Nginx + Lua) to provide a production routing tier across multiple LLM backends. The stack handled model mapping, load balancing, health checks, and error handling. We keep the installation notes for posterity.

Overview

┌─────────────┐      ┌──────────────────┐      ┌─────────────────┐
│   Client    │─────▶│     OpenResty    │─────▶│    Backend 1    │
│  (API Call) │      │     (Router)     │      │   (Qwen@8000)   │
└─────────────┘      │ - Model Mapping  │      └─────────────────┘
                     │ - Load Balancing │      ┌─────────────────┐
                     │ - Health Checks  │─────▶│    Backend 2    │
                     │ - Error Handling │      │  (Llama@8001)   │
                     └──────────────────┘      └─────────────────┘

Installation Notes

# Add repository
wget -O - https://openresty.org/package/pubkey.gpg | sudo apt-key add -
echo "deb http://openresty.org/package/ubuntu $(lsb_release -sc) main" | \
    sudo tee /etc/apt/sources.list.d/openresty.list

# Install
sudo apt-get update
sudo apt-get install openresty

# Create directories
sudo mkdir -p /usr/local/openresty/nginx/conf/sites-available
sudo mkdir -p /usr/local/openresty/nginx/conf/sites-enabled

# Copy the site config file
sudo cp <your config file> /usr/local/openresty/nginx/conf/sites-available/vllm

# Enable the site
sudo ln -s /usr/local/openresty/nginx/conf/sites-available/vllm \
           /usr/local/openresty/nginx/conf/sites-enabled/vllm

# In the main config (/usr/local/openresty/nginx/conf/nginx.conf),
# add the Lua settings and include the enabled sites inside the http block:
http {
    # ... Others ...

    # Lua settings
    lua_package_path "/usr/local/openresty/lualib/?.lua;;";
    lua_shared_dict model_cache 10m;

    # Include Site Configuration
    include /usr/local/openresty/nginx/conf/sites-enabled/*;
}
# Test the OpenResty config
sudo openresty -t

# Start
sudo systemctl start openresty

# Enable auto-start
sudo systemctl enable openresty

# Reload OpenResty
sudo openresty -s reload

# Check the health endpoint
curl https://freeinference.org/health

# List all models
curl https://freeinference.org/v1/models | jq

# Chat with Qwen3-Coder
curl -X POST http://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/Qwen_Qwen3-Coder-480B-A35B-Instruct-FP8", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'

# Chat with llama4-scout
curl -X POST http://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/meta-llama_Llama-4-Scout-17B-16E", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 50}'

Nginx (v0, abandoned)

The original setup used stock Nginx with a hand-edited site config:

sudo vim /etc/nginx/sites-available/vllm
sudo nginx -t
sudo systemctl reload nginx

# Test the endpoint
curl https://freeinference.org/v1/models