HybridInference

Developer Guide:

  • Installation
    • Production (Docker)
    • Development Setup
      • System Requirements
      • Using uv (Recommended)
      • Using conda
    • Configuration
      • Environment Variables
    • Verification
    • Documentation Structure
    • Troubleshooting
  • Deployment Guide
    • Quick Start (Docker)
    • Prerequisites
    • Service Architecture
    • Common Operations
    • Configuration
      • Environment Variables
      • Local GPU Endpoints
    • Nginx and HTTPS
      • Cloudflare
    • Monitoring
      • Health Checks
      • Alerting
    • Database
    • Troubleshooting
      • Service won’t start
      • Rebuild after code changes
      • Full reset (preserves data)
      • Full reset (destroy data)
  • Architecture Overview
    • System Architecture
    • Network Layer
    • Core Components
      • Serving Layer (serving/)
      • Routing Layer (routing/)
      • Configuration (config/)
      • Deploy (deploy/)
    • Key Design Principles
    • Data Flow
  • Hybrid Inference Routing System
    • Architecture
      • Decision Layer (routing/manager.py + routing/strategies/weight.py)
      • Execution Layer (routing/routers.py)
    • Features
    • Configuration
      • Required Files
      • Optional Files
      • Example Configuration (60/40 split):
    • Running the Server
    • API Endpoints
    • Extending the System
      • Adding New Strategies
      • RouteWise Strategy
      • Health Monitoring
    • Session affinity
    • Migration Notes
  • Adding a New Local Model
    • Overview
    • Private Server (No Public Internet)
    • Step 1: Start the Local Model Server
    • Step 2: Add the Model to config/models.yaml
    • Step 3: Add Optional Remote Fallbacks
    • Step 4: Restart the Gateway
    • Step 5: Verify Through HybridInference
    • Routing Notes
    • Troubleshooting
      • Model Does Not Appear in /v1/models
      • Gateway Cannot Reach the Local Server
      • Requests Fail After Registration
  • Adding a New Model (OpenRouter-Compatible)
    • Overview
    • Quick Start
      • Adding a Model with an Existing Provider
    • Adding a New Provider
      • Step 1: Create Provider Adapter
      • Step 2: Register the Adapter
      • Step 3: Add Model Configuration
      • Step 4: Configure Environment Variables
      • Step 5: Test the Integration
    • Configuration Reference
      • ModelConfig Fields
      • Route Configuration
      • Supported Adapter Kinds
      • Hybrid Routing
    • BaseAdapter API Reference
      • Required Methods
      • Utility Methods
      • Available Attributes
    • Advanced Features
      • Multi-Modal Support
      • Tool/Function Calling
      • Structured Output (JSON Mode)
      • Rate Limiting (Optional)
    • Examples
      • Example 1: OpenAI-Compatible Provider
      • Example 2: Custom API Format
      • Example 3: Local Deployment
    • Troubleshooting
      • Model Not Appearing in /v1/models
      • Authentication Failures
      • Response Format Errors
      • Streaming Issues
    • Best Practices
    • See Also
  • Configuration Guide
    • 1. Environment Variables and Priority
      • Variable Substitution
      • Configuration Priority
    • 2. models.yaml (Required)
      • Key Points:
    • 3. routing.yaml (Optional)
      • Fixed-Ratio Strategy
      • Health Checking (Optional)
      • Example: Hybrid Deployment (60% local / 40% remote)
      • How It Works:
      • Local-Only Deployment
    • 4. Running the System
      • Set Environment Variables:
      • Start the Server:
      • Verify Operation:
    • 5. FAQ
  • PostgreSQL Admin Playbook
    • Environment Variables
    • Restart Admin Stack
    • Access pgAdmin Securely
    • Register Primary Database
    • Post-Restart Checks
  • OpenRouter-Compatible API Gateway
    • Architecture
      • Key Components
    • Features
    • Development Setup
      • Prerequisites
      • Create Environment
      • Local Environment Variables
      • Run Locally
      • Quick Checks
    • Production Deployment
    • API Surface
      • Example Requests
    • Logging and Metrics
    • Testing
    • Troubleshooting
    • Related Docs
  • FreeInference Deployment
    • Cloudflare + Nginx + FastAPI (current)
      • Deployment
      • Runtime Operations
      • Why Nginx Is Back
    • Legacy Architectures
      • FastAPI direct (v3, abandoned)
      • Nginx (v2, abandoned)
      • Nginx + Lua via OpenResty (v1, abandoned)
        • Overview
        • Installation Notes
      • Nginx (v0, abandoned)
  • Claude Code Setup
    • Quick Setup (macOS / Linux)
    • Manual Setup
    • Available Models
    • Usage
    • Troubleshooting
    • Uninstall
  • FASRC Deployment
    • Docker
  • Contributing
    • Development Setup
    • Code Quality Standards
      • Pre-commit Hooks
    • Development Workflow
    • Testing
    • Documentation
    • Pull Request Guidelines
  • Staging Guide
    • Why this staging shape
    • Default ports
    • First-time server bootstrap
    • Required env for a typical staging run
    • Start staging
    • SSH forwarding
    • Common operations
    • Notes for model monitor


© Copyright 2025-2026, Harvard System Lab.
