Code Examples

Essential examples for using the HybridInference API.

Installation

Python:

pip install openai  # examples below use the v1 Python SDK (openai >= 1.0)

JavaScript/TypeScript:

npm install openai

Basic Chat Completion

Python

import openai

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in simple terms"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://freeinference.org/v1',
  apiKey: 'your-api-key-here',
});

const response = await client.chat.completions.create({
  model: 'llama-3.3-70b-instruct',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning in simple terms' }
  ],
  temperature: 0.7,
  max_tokens: 1000
});

console.log(response.choices[0].message.content);

curl

curl https://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'

Streaming Responses

Stream tokens back as they are generated, so users see output immediately instead of waiting for the full response.

Python

import openai

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here"
)

stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "Write a short story about a robot"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JavaScript

const stream = await client.chat.completions.create({
  model: 'llama-3.3-70b-instruct',
  messages: [
    { role: 'user', content: 'Write a short story about a robot' }
  ],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}

curl

curl https://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Write a short story about a robot"}],
    "stream": true
  }' \
  --no-buffer

Function Calling

Let the model request calls to functions you define; your code runs the function and returns the result.

Python

import openai
import json

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Check if model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"Function to call: {function_name}")
    print(f"Arguments: {function_args}")

    # Here you would call your actual function
    # result = get_weather(**function_args)
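
The response above only contains the model's request to call get_weather. To get a final natural-language answer, run the function yourself, append its result as a tool message, and call the API again. A minimal sketch continuing the example above, following the standard OpenAI tool-message convention (the hard-coded weather values are a placeholder for your real function result):

# Continue the conversation: include the assistant's tool call and the
# function result, then ask the model for the final answer.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    response.choices[0].message,  # assistant message containing the tool call
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps({"temperature": 18, "unit": "celsius"})  # placeholder result
    }
]

follow_up = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=messages,
    tools=tools
)

print(follow_up.choices[0].message.content)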

JavaScript

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather in a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City name, e.g. San Francisco'
          },
          unit: {
            type: 'string',
            enum: ['celsius', 'fahrenheit']
          }
        },
        required: ['location']
      }
    }
  }
];

const response = await client.chat.completions.create({
  model: 'llama-3.3-70b-instruct',
  messages: [
    { role: 'user', content: "What's the weather in Paris?" }
  ],
  tools: tools,
  tool_choice: 'auto'
});

if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  const functionName = toolCall.function.name;
  const functionArgs = JSON.parse(toolCall.function.arguments);

  console.log(`Function to call: ${functionName}`);
  console.log(`Arguments:`, functionArgs);

  // Here you would call your actual function
  // const result = await getWeather(functionArgs);
}

Structured Output (JSON Mode)

Ask the model to respond with valid JSON.

Python

import openai
import json

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Extract the name, age, and occupation from: John is a 30-year-old software engineer. Return as JSON."
        }
    ],
    response_format={"type": "json_object"}
)

# Parse the JSON response
content = response.choices[0].message.content

# Note: Response may be wrapped in markdown code blocks
# Handle both pure JSON and markdown-wrapped JSON
try:
    result = json.loads(content)
except json.JSONDecodeError:
    # Extract from markdown if needed
    if "```json" in content:
        start = content.find("```json") + 7
        end = content.find("```", start)
        content = content[start:end].strip()
        result = json.loads(content)
    else:
        raise

print(result)
# Output: {'name': 'John', 'age': 30, 'occupation': 'software engineer'}

JavaScript

const response = await client.chat.completions.create({
  model: 'llama-3.3-70b-instruct',
  messages: [
    {
      role: 'user',
      content: 'Extract the name, age, and occupation from: John is a 30-year-old software engineer. Return as JSON.'
    }
  ],
  response_format: { type: 'json_object' }
});

let content = response.choices[0].message.content;

// Handle markdown-wrapped JSON
try {
  const result = JSON.parse(content);
  console.log(result);
} catch (error) {
  if (content.includes('```json')) {
    const start = content.indexOf('```json') + 7;
    const end = content.indexOf('```', start);
    content = content.substring(start, end).trim();
    const result = JSON.parse(content);
    console.log(result);
  } else {
    throw error;
  }
}

Tips

Temperature Settings

  • 0.0 - 0.3: Deterministic, focused (good for factual tasks, code generation)

  • 0.7: Balanced (general use)

  • 0.9 - 1.5: Creative, diverse (good for storytelling, brainstorming)
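
For example, an extraction call and a brainstorming call might differ only in this one parameter. A brief sketch reusing the client from the examples above:

# Low temperature: repeatable, focused output (extraction, code)
factual = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "List the capitals of the Baltic states."}],
    temperature=0.2
)

# High temperature: more varied, creative output (brainstorming)
creative = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Brainstorm five names for a weather app."}],
    temperature=1.1
)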

Max Tokens

Always set max_tokens to control response length and costs:

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this article..."}],
    max_tokens=500  # Limit response length
)

Choosing Models

  • llama-3.3-70b-instruct: Best for general tasks, long context

  • llama-4-scout: Fastest inference

  • gemini-2.5-flash: Multimodal, high throughput

  • glm-4.5: Chinese language support

See the Models page for full details.
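
If the gateway also exposes the OpenAI-compatible /v1/models endpoint (an assumption, not confirmed by the examples above), you can list the available model IDs programmatically, reusing the client from earlier:

# Assumes the standard OpenAI-compatible /v1/models endpoint is available
for model in client.models.list():
    print(model.id)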