HybridInference API Documentation

OpenRouter-compatible API for accessing multiple LLM models

Get started with HybridInference in minutes. Our API provides seamless access to state-of-the-art language models including Llama 3.3, Llama 4, Gemini, GPT-5, and Claude.

Key Features

  - Fast & Reliable: Low-latency inference with automatic failover
  - OpenRouter Compatible: Drop-in replacement for the OpenRouter API
  - Multiple Models: Access Llama, Gemini, GPT, and Claude models
  - Free Tier Available: Get started at no cost on the free tier
  - Production Ready: Built for scale with monitoring and observability
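
The automatic failover behavior can also be approximated client-side by trying models in order. A minimal sketch (the helper and the fake_send stub are illustrative, and gemini-2.5-flash is assumed as a fallback ID; in practice send would wrap client.chat.completions.create):

```python
def complete_with_failover(models, send):
    """Try each model in order; return (model, response) from the first success.

    `send` is any callable that takes a model ID and returns a response.
    """
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:
            last_error = exc  # remember the failure, try the next model
    raise RuntimeError("all models failed") from last_error

# Stubbed demo: the first model "fails", the second succeeds.
def fake_send(model):
    if model == "llama-3.3-70b-instruct":
        raise ConnectionError("upstream timeout")
    return "Hello!"

model, reply = complete_with_failover(
    ["llama-3.3-70b-instruct", "gemini-2.5-flash"], fake_send
)
print(model, reply)
```

The same pattern works unchanged against the live API because every model is served behind one endpoint.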

Getting Started

  1. Get your API key (contact the team)

  2. Install the OpenAI SDK:

    pip install openai
    
  3. Make your first request:

    import openai
    
    # Point the standard OpenAI SDK at the HybridInference endpoint
    client = openai.OpenAI(
        base_url="https://freeinference.org/v1",
        api_key="your-api-key-here"
    )
    
    # Send a chat completion request to a hosted model
    response = client.chat.completions.create(
        model="llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    
    print(response.choices[0].message.content)
    

See the Quick Start guide for more details.
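In production, transient network errors are usually worth retrying with backoff before giving up on a request. A minimal sketch (with_retries is a hypothetical helper, not part of the SDK; the flaky_request stub stands in for a real API call):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(); on exception, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stubbed demo: fails twice, then succeeds on the third call.
state = {"calls": 0}
def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky_request, base_delay=0.01))
```

Wrapping the client.chat.completions.create call from the example above in such a helper keeps retry policy in one place.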

Available Models

Model                     Context Length   Pricing
--------------------------------------------------
Llama 3.3 70B Instruct    131K tokens      Free
Llama 4 Maverick          128K tokens      Free
Gemini 2.5 Flash          1M tokens        Free
GPT-5                     128K tokens      Free

See Available Models for the complete list.
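
The context lengths above can drive model selection programmatically, e.g. picking the smallest window that still fits a request. A sketch (model IDs other than llama-3.3-70b-instruct are assumptions about the naming scheme, and token counts are approximate):

```python
# Context windows from the table above (model IDs other than
# llama-3.3-70b-instruct are assumed, not confirmed).
CONTEXT_WINDOWS = {
    "llama-3.3-70b-instruct": 131_000,
    "llama-4-maverick": 128_000,
    "gemini-2.5-flash": 1_000_000,
    "gpt-5": 128_000,
}

def smallest_fitting_model(prompt_tokens, reply_budget=1024):
    """Pick the model with the smallest context window that still fits."""
    needed = prompt_tokens + reply_budget
    fitting = [(ctx, name) for name, ctx in CONTEXT_WINDOWS.items() if ctx >= needed]
    if not fitting:
        raise ValueError(f"no available model can fit {needed} tokens")
    return min(fitting)[1]

print(smallest_fitting_model(200_000))  # -> gemini-2.5-flash
```

A long-document summarization job would route to Gemini 2.5 Flash this way, while short chats stay on the smaller-context models.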

Support

Need help? Check out: