HybridInference API Documentation

OpenRouter-compatible API for accessing multiple LLM models

Get started with HybridInference in minutes. Our API provides seamless access to state-of-the-art language models including Llama 3.3, Llama 4, Gemini, GPT-5, and Claude.

Key Features

  - Fast & Reliable: Low-latency inference with automatic failover
  - OpenRouter Compatible: Drop-in replacement for the OpenRouter API
  - Multiple Models: Access Llama, Gemini, GPT, and Claude models
  - Free Tier Available: Get started at no cost on the free tier
  - Production Ready: Built for scale with monitoring and observability
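
The automatic failover behavior can also be approximated client-side by trying models in order. A minimal sketch (the helper and the fake_send stub are illustrative, and gemini-2.5-flash is assumed as a fallback ID; in practice send would wrap client.chat.completions.create):

```python
def complete_with_failover(models, send):
    """Try each model in order; return (model, response) from the first success.

    `send` is any callable that takes a model ID and returns a response.
    """
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:
            last_error = exc  # remember the failure, try the next model
    raise RuntimeError("all models failed") from last_error

# Stubbed demo: the first model "fails", the second succeeds.
def fake_send(model):
    if model == "llama-3.3-70b-instruct":
        raise ConnectionError("upstream timeout")
    return "Hello!"

model, reply = complete_with_failover(
    ["llama-3.3-70b-instruct", "gemini-2.5-flash"], fake_send
)
print(model, reply)
```

The same pattern works unchanged against the live API because every model is served behind one endpoint.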

Getting Started

  1. Get your API key (contact the team)

  2. Install the OpenAI SDK:

    pip install openai
    
  3. Make your first request:

    import openai
    
    # Point the standard OpenAI SDK at the HybridInference endpoint
    client = openai.OpenAI(
        base_url="https://freeinference.org/v1",
        api_key="your-api-key-here"
    )
    
    # Send a chat completion request to a hosted model
    response = client.chat.completions.create(
        model="llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    
    print(response.choices[0].message.content)
    

See the Quick Start guide for more details.
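In production, transient network errors are usually worth retrying with backoff before giving up on a request. A minimal sketch (with_retries is a hypothetical helper, not part of the SDK; the flaky_request stub stands in for a real API call):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(); on exception, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stubbed demo: fails twice, then succeeds on the third call.
state = {"calls": 0}
def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky_request, base_delay=0.01))
```

Wrapping the client.chat.completions.create call from the example above in such a helper keeps retry policy in one place.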

Available Models

Model                     Context Length   Pricing
--------------------------------------------------
Llama 3.3 70B Instruct    131K tokens      Free
Llama 4 Maverick          128K tokens      Free
Gemini 2.5 Flash          1M tokens        Free
GPT-5                     128K tokens      Free

See Available Models for the complete list.
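
The context lengths above can drive model selection programmatically, e.g. picking the smallest window that still fits a request. A sketch (model IDs other than llama-3.3-70b-instruct are assumptions about the naming scheme, and token counts are approximate):

```python
# Context windows from the table above (model IDs other than
# llama-3.3-70b-instruct are assumed, not confirmed).
CONTEXT_WINDOWS = {
    "llama-3.3-70b-instruct": 131_000,
    "llama-4-maverick": 128_000,
    "gemini-2.5-flash": 1_000_000,
    "gpt-5": 128_000,
}

def smallest_fitting_model(prompt_tokens, reply_budget=1024):
    """Pick the model with the smallest context window that still fits."""
    needed = prompt_tokens + reply_budget
    fitting = [(ctx, name) for name, ctx in CONTEXT_WINDOWS.items() if ctx >= needed]
    if not fitting:
        raise ValueError(f"no available model can fit {needed} tokens")
    return min(fitting)[1]

print(smallest_fitting_model(200_000))  # -> gemini-2.5-flash
```

A long-document summarization job would route to Gemini 2.5 Flash this way, while short chats stay on the smaller-context models.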

Support

Need help? Check out: