# Multi-LLM Provider Setup Guide
## Overview
The AI routing service now supports three LLM providers:
- Anthropic Claude (default)
- OpenAI (GPT-4, GPT-3.5)
- Ollama (local, free)
## Quick Setup

### Option 1: Anthropic Claude (Default)
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022
```
### Option 2: OpenAI
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=openai
OPENAI_API_KEY=sk-...
AI_ROUTING_MODEL=gpt-4-turbo-preview
```
### Option 3: Ollama (Local, Free)
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434  # Optional
AI_ROUTING_MODEL=llama2
```
First, install Ollama:

```bash
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai

# Pull a model
ollama pull llama2
```
## Provider Comparison
| Provider | Cost | Speed | Setup | Privacy | Best For |
|---|---|---|---|---|---|
| Anthropic | $$ | Fast | Easy | Cloud | Production |
| OpenAI | $$-$$$ | Fast | Easy | Cloud | Production |
| Ollama | Free | Medium | Medium | Local | Development, Privacy |
## Environment Variables

### Required (Provider-Specific)

Anthropic:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```
OpenAI:

```bash
OPENAI_API_KEY=sk-...
```
Ollama:

```bash
# No API key needed!
# Just install Ollama and pull a model
```
### Optional

```bash
# Provider selection (default: anthropic)
AI_ROUTING_PROVIDER=anthropic  # or 'openai' or 'ollama'

# Model selection (provider-specific defaults)
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022  # Anthropic
AI_ROUTING_MODEL=gpt-4-turbo-preview         # OpenAI
AI_ROUTING_MODEL=llama2                      # Ollama

# Ollama base URL (default: http://localhost:11434)
OLLAMA_BASE_URL=http://localhost:11434

# Timeout (default: 10000ms)
AI_ROUTING_TIMEOUT_MS=10000
```
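For reference, here is a minimal sketch of how a service might read these variables with the defaults listed above. The function and type names are assumptions for illustration, not the actual implementation:

```typescript
// config.ts — hypothetical sketch of loading the routing env vars.
type Provider = 'anthropic' | 'openai' | 'ollama';

interface RoutingConfig {
  enabled: boolean;
  provider: Provider;
  model?: string; // provider-specific default applied elsewhere
  ollamaBaseUrl: string;
  timeoutMs: number;
}

export function loadRoutingConfig(env = process.env): RoutingConfig {
  return {
    enabled: env.AI_ROUTING_ENABLED === 'true',
    provider: (env.AI_ROUTING_PROVIDER as Provider) ?? 'anthropic',
    model: env.AI_ROUTING_MODEL,
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
    timeoutMs: Number(env.AI_ROUTING_TIMEOUT_MS ?? 10000),
  };
}
```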
## Model Selection Guide

### Anthropic Models
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| `claude-3-5-sonnet-20241022` | $$ | Fast | Recommended - Best balance |
| `claude-3-opus-20240229` | $$$ | Medium | Highest quality |
| `claude-3-haiku-20240307` | $ | Very Fast | Fast, cheap |
### OpenAI Models
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| `gpt-4-turbo-preview` | $$$ | Fast | Recommended - Best quality |
| `gpt-4` | $$$ | Fast | High quality |
| `gpt-3.5-turbo` | $ | Very Fast | Fast, cheap |
### Ollama Models
| Model | Size | Speed | Use Case |
|---|---|---|---|
| `llama2` | 7B | Medium | Recommended - Good balance |
| `llama2:13b` | 13B | Slow | Better accuracy |
| `mistral` | 7B | Fast | Fast, efficient |
| `codellama` | 7B | Medium | Code-focused |
Pull models:

```bash
ollama pull llama2
ollama pull mistral
ollama pull codellama
```
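If you want to verify programmatically that a model is available before routing to Ollama, one option is to query Ollama's `GET /api/tags` endpoint (the same endpoint used in the troubleshooting section below). A sketch, assuming the default base URL and Node 18+ for the global `fetch`:

```typescript
// checkOllamaModel.ts — sketch: verify a model is pulled locally by
// listing installed models via Ollama's GET /api/tags endpoint.
interface OllamaTagsResponse {
  models: { name: string }[];
}

export async function hasOllamaModel(
  model: string,
  baseUrl = 'http://localhost:11434',
): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  const data = (await res.json()) as OllamaTagsResponse;
  // Installed names may carry a tag suffix, e.g. "llama2:latest".
  return data.models.some(
    (m) => m.name === model || m.name.startsWith(`${model}:`),
  );
}

// Usage:
// hasOllamaModel('llama2').then((ok) =>
//   console.log(ok ? 'ready' : 'run: ollama pull llama2'));
```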
## Auto-Provider Selection
The system automatically selects an available provider if the requested one is not configured:
Priority order:

1. Requested provider (`AI_ROUTING_PROVIDER`)
2. Anthropic (if `ANTHROPIC_API_KEY` set)
3. OpenAI (if `OPENAI_API_KEY` set)
4. Ollama (always available, but may fail if not running)
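A minimal sketch of this fallback logic, mirroring the priority order above (the function name is an assumption; the actual implementation may differ):

```typescript
// selectProvider.ts — sketch of the documented fallback order.
type Provider = 'anthropic' | 'openai' | 'ollama';

export function selectProvider(env = process.env): Provider {
  const requested = env.AI_ROUTING_PROVIDER as Provider | undefined;

  // 1. Honor the requested provider if its prerequisites are met.
  if (requested === 'anthropic' && env.ANTHROPIC_API_KEY) return 'anthropic';
  if (requested === 'openai' && env.OPENAI_API_KEY) return 'openai';
  if (requested === 'ollama') return 'ollama';

  // 2-3. Otherwise, fall back to whichever cloud provider has a key.
  if (env.ANTHROPIC_API_KEY) return 'anthropic';
  if (env.OPENAI_API_KEY) return 'openai';

  // 4. Ollama needs no key, but calls fail if the daemon isn't running.
  return 'ollama';
}
```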
## Testing Providers

### Test Anthropic
```bash
curl -X POST http://localhost:3000/api/v1/route/ai \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "tenant-001",
    "app_id": "app-001",
    "roles": ["developer"],
    "query": "Show network device inventory"
  }'
```
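For scripted checks, the same request can be sent from Node 18+ with the global `fetch`; a sketch, with the endpoint and payload taken from the curl example above:

```typescript
// testRoute.ts — sketch: the same routing request, sent from Node.
async function main() {
  const res = await fetch('http://localhost:3000/api/v1/route/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      tenant_id: 'tenant-001',
      app_id: 'app-001',
      roles: ['developer'],
      query: 'Show network device inventory',
    }),
  });
  console.log(res.status, await res.json());
}

main().catch(console.error);
```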
### Test OpenAI

```bash
# Set provider
export AI_ROUTING_PROVIDER=openai
export OPENAI_API_KEY=sk-...

# Restart service
npm run dev

# Same API call
curl -X POST http://localhost:3000/api/v1/route/ai ...
```
### Test Ollama

```bash
# Start Ollama (if not running)
ollama serve

# Pull model (if not already pulled)
ollama pull llama2

# Set provider
export AI_ROUTING_PROVIDER=ollama

# Restart service
npm run dev

# Same API call
curl -X POST http://localhost:3000/api/v1/route/ai ...
```
## Troubleshooting

### Issue: "No LLM providers are configured"

Solution: Configure at least one provider:

- Set `ANTHROPIC_API_KEY`, OR
- Set `OPENAI_API_KEY`, OR
- Install and run Ollama
Issue: "Ollama API error: Connection refused"
Solution:
- Make sure Ollama is running:
ollama serve - Check
OLLAMA_BASE_URLis correct - Verify Ollama is accessible:
curl http://localhost:11434/api/tags
Issue: "Ollama model not found"
Solution: Pull the model first:
ollama pull llama2
### Issue: Slow responses with Ollama

Solutions:

- Use a smaller model (`llama2:7b` instead of `llama2:13b`)
- Use GPU-accelerated Ollama
- Increase the timeout: `AI_ROUTING_TIMEOUT_MS=30000`
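To show where the timeout applies, here is a sketch of enforcing a request timeout around an LLM HTTP call using the standard `AbortController` (the wrapper name is hypothetical; the service's actual timeout handling may differ):

```typescript
// fetchWithTimeout.ts — sketch: abort a slow LLM call after
// AI_ROUTING_TIMEOUT_MS milliseconds.
export async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = Number(process.env.AI_ROUTING_TIMEOUT_MS ?? 10000),
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The request rejects with an AbortError once the timer fires.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```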
## Cost Optimization

### Development
- Use Ollama (free, local)
- No API costs
- Privacy for sensitive data
### Production (Low Volume)
- Use Claude Haiku or GPT-3.5 (cheaper)
- ~$0.01 per query
### Production (High Volume)

- Use deterministic routing (`/api/v1/route`) for clear queries
- Use AI routing only for ambiguous queries
- Consider caching LLM responses (see the sketch after this list)
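A minimal sketch of response caching, assuming the routing decision is stable for an identical (tenant, roles, query) tuple; the key scheme, TTL, and function names here are assumptions, not the service's actual behavior:

```typescript
// routeCache.ts — sketch: in-memory cache for AI routing responses.
interface CacheEntry {
  value: unknown;
  expiresAt: number;
}

const cache = new Map<string, CacheEntry>();
const TTL_MS = 5 * 60 * 1000; // 5 minutes; tune for your workload

export async function cachedRoute(
  key: { tenant_id: string; roles: string[]; query: string },
  route: () => Promise<unknown>, // the underlying LLM routing call
): Promise<unknown> {
  // Normalize the key so trivially different queries share an entry.
  const k = JSON.stringify([
    key.tenant_id,
    [...key.roles].sort(),
    key.query.trim().toLowerCase(),
  ]);

  const hit = cache.get(k);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const value = await route();
  cache.set(k, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```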
## Provider-Specific Notes

### Anthropic
- Excellent reasoning capabilities
- Good for Chain of Thought
- Fast response times
- Requires API key
### OpenAI
- Widely available
- Good performance
- Multiple model options
- Requires API key
### Ollama
- Free (runs locally)
- Privacy (data stays local)
- No rate limits
- Requires local setup
- Performance depends on hardware
- Best for development/testing
## Switching Providers

You can switch providers without code changes:

```bash
# Switch to OpenAI
export AI_ROUTING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
npm run dev

# Switch to Ollama
export AI_ROUTING_PROVIDER=ollama
npm run dev

# Switch back to Anthropic
export AI_ROUTING_PROVIDER=anthropic
npm run dev
```
The system detects which providers are configured and falls back in the priority order described in Auto-Provider Selection above.
## Next Steps
- See AI Routing Quick Start for basic usage
- See Multi-LLM Provider Support for technical details
- See RAG and CoT Implementation for how it works