# Multi-LLM Provider Setup Guide
## Overview
The AI routing service now supports three LLM providers:
- Anthropic Claude (default)
- OpenAI (GPT-4, GPT-3.5)
- Ollama (local, free)
## Quick Setup

### Option 1: Anthropic Claude (Default)
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022
```
### Option 2: OpenAI
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=openai
OPENAI_API_KEY=sk-...
AI_ROUTING_MODEL=gpt-4-turbo-preview
```
### Option 3: Ollama (Local, Free)
```bash
# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434  # Optional
AI_ROUTING_MODEL=llama2
```
First, install Ollama:

```bash
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai

# Pull a model
ollama pull llama2
```
## Provider Comparison
| Provider | Cost | Speed | Setup | Privacy | Best For |
|---|---|---|---|---|---|
| Anthropic | $$ | Fast | Easy | Cloud | Production |
| OpenAI | $$-$$$ | Fast | Easy | Cloud | Production |
| Ollama | Free | Medium | Medium | Local | Development, Privacy |
## Environment Variables

### Required (Provider-Specific)

Anthropic:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```
OpenAI:

```bash
OPENAI_API_KEY=sk-...
```
Ollama:

```bash
# No API key needed!
# Just install Ollama and pull a model
```
### Optional

```bash
# Provider selection (default: anthropic)
AI_ROUTING_PROVIDER=anthropic  # or 'openai' or 'ollama'

# Model selection (provider-specific defaults)
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022  # Anthropic
AI_ROUTING_MODEL=gpt-4-turbo-preview         # OpenAI
AI_ROUTING_MODEL=llama2                      # Ollama

# Ollama base URL (default: http://localhost:11434)
OLLAMA_BASE_URL=http://localhost:11434

# Timeout (default: 10000ms)
AI_ROUTING_TIMEOUT_MS=10000
```
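For reference, here is a minimal sketch of how a service might read these variables with the defaults listed above. The function and type names are assumptions for illustration, not the actual implementation:

```typescript
// config.ts — hypothetical sketch of loading the routing env vars.
type Provider = 'anthropic' | 'openai' | 'ollama';

interface RoutingConfig {
  enabled: boolean;
  provider: Provider;
  model?: string; // provider-specific default applied elsewhere
  ollamaBaseUrl: string;
  timeoutMs: number;
}

export function loadRoutingConfig(env = process.env): RoutingConfig {
  return {
    enabled: env.AI_ROUTING_ENABLED === 'true',
    provider: (env.AI_ROUTING_PROVIDER as Provider) ?? 'anthropic',
    model: env.AI_ROUTING_MODEL,
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
    timeoutMs: Number(env.AI_ROUTING_TIMEOUT_MS ?? 10000),
  };
}
```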
## Model Selection Guide

### Anthropic Models
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| `claude-3-5-sonnet-20241022` | $$ | Fast | Recommended - Best balance |
| `claude-3-opus-20240229` | $$$ | Medium | Highest quality |
| `claude-3-haiku-20240307` | $ | Very Fast | Fast, cheap |
### OpenAI Models
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| `gpt-4-turbo-preview` | $$$ | Fast | Recommended - Best quality |
| `gpt-4` | $$$ | Fast | High quality |
| `gpt-3.5-turbo` | $ | Very Fast | Fast, cheap |
### Ollama Models
| Model | Size | Speed | Use Case |
|---|---|---|---|
| `llama2` | 7B | Medium | Recommended - Good balance |
| `llama2:13b` | 13B | Slow | Better accuracy |
| `mistral` | 7B | Fast | Fast, efficient |
| `codellama` | 7B | Medium | Code-focused |
Pull models:

```bash
ollama pull llama2
ollama pull mistral
ollama pull codellama
```
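If you want to verify programmatically that a model is available before routing to Ollama, one option is to query Ollama's `GET /api/tags` endpoint (the same endpoint used in the troubleshooting section below). A sketch, assuming the default base URL and Node 18+ for the global `fetch`:

```typescript
// checkOllamaModel.ts — sketch: verify a model is pulled locally by
// listing installed models via Ollama's GET /api/tags endpoint.
interface OllamaTagsResponse {
  models: { name: string }[];
}

export async function hasOllamaModel(
  model: string,
  baseUrl = 'http://localhost:11434',
): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  const data = (await res.json()) as OllamaTagsResponse;
  // Installed names may carry a tag suffix, e.g. "llama2:latest".
  return data.models.some(
    (m) => m.name === model || m.name.startsWith(`${model}:`),
  );
}

// Usage:
// hasOllamaModel('llama2').then((ok) =>
//   console.log(ok ? 'ready' : 'run: ollama pull llama2'));
```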
## Auto-Provider Selection
The system automatically selects an available provider if the requested one is not configured:
Priority order:

1. Requested provider (`AI_ROUTING_PROVIDER`)
2. Anthropic (if `ANTHROPIC_API_KEY` set)
3. OpenAI (if `OPENAI_API_KEY` set)
4. Ollama (always available, but may fail if not running)
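A minimal sketch of this fallback logic, mirroring the priority order above (the function name is an assumption; the actual implementation may differ):

```typescript
// selectProvider.ts — sketch of the documented fallback order.
type Provider = 'anthropic' | 'openai' | 'ollama';

export function selectProvider(env = process.env): Provider {
  const requested = env.AI_ROUTING_PROVIDER as Provider | undefined;

  // 1. Honor the requested provider if its prerequisites are met.
  if (requested === 'anthropic' && env.ANTHROPIC_API_KEY) return 'anthropic';
  if (requested === 'openai' && env.OPENAI_API_KEY) return 'openai';
  if (requested === 'ollama') return 'ollama';

  // 2-3. Otherwise, fall back to whichever cloud provider has a key.
  if (env.ANTHROPIC_API_KEY) return 'anthropic';
  if (env.OPENAI_API_KEY) return 'openai';

  // 4. Ollama needs no key, but calls fail if the daemon isn't running.
  return 'ollama';
}
```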
## Testing Providers

### Test Anthropic
```bash
curl -X POST http://localhost:3000/api/v1/route/ai \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "tenant-001",
    "app_id": "app-001",
    "roles": ["developer"],
    "query": "Show network device inventory"
  }'
```
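For scripted checks, the same request can be sent from Node 18+ with the global `fetch`; a sketch, with the endpoint and payload taken from the curl example above:

```typescript
// testRoute.ts — sketch: the same routing request, sent from Node.
async function main() {
  const res = await fetch('http://localhost:3000/api/v1/route/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      tenant_id: 'tenant-001',
      app_id: 'app-001',
      roles: ['developer'],
      query: 'Show network device inventory',
    }),
  });
  console.log(res.status, await res.json());
}

main().catch(console.error);
```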
### Test OpenAI

```bash
# Set provider
export AI_ROUTING_PROVIDER=openai
export OPENAI_API_KEY=sk-...

# Restart service
npm run dev

# Same API call
curl -X POST http://localhost:3000/api/v1/route/ai ...
```
### Test Ollama

```bash
# Start Ollama (if not running)
ollama serve

# Pull model (if not already pulled)
ollama pull llama2

# Set provider
export AI_ROUTING_PROVIDER=ollama

# Restart service
npm run dev

# Same API call
curl -X POST http://localhost:3000/api/v1/route/ai ...
```
## Troubleshooting

### Issue: "No LLM providers are configured"

Solution: Configure at least one provider:

- Set `ANTHROPIC_API_KEY`, OR
- Set `OPENAI_API_KEY`, OR
- Install and run Ollama
Issue: "Ollama API error: Connection refused"
Solution:
- Make sure Ollama is running:
ollama serve - Check
OLLAMA_BASE_URLis correct - Verify Ollama is accessible:
curl http://localhost:11434/api/tags
Issue: "Ollama model not found"
Solution: Pull the model first:
ollama pull llama2
### Issue: Slow responses with Ollama

Solutions:

- Use a smaller model (`llama2:7b` instead of `llama2:13b`)
- Use GPU-accelerated Ollama
- Increase the timeout: `AI_ROUTING_TIMEOUT_MS=30000`
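To show where the timeout applies, here is a sketch of enforcing a request timeout around an LLM HTTP call using the standard `AbortController` (the wrapper name is hypothetical; the service's actual timeout handling may differ):

```typescript
// fetchWithTimeout.ts — sketch: abort a slow LLM call after
// AI_ROUTING_TIMEOUT_MS milliseconds.
export async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = Number(process.env.AI_ROUTING_TIMEOUT_MS ?? 10000),
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The request rejects with an AbortError once the timer fires.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```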
## Cost Optimization

### Development
- Use Ollama (free, local)
- No API costs
- Privacy for sensitive data
### Production (Low Volume)
- Use Claude Haiku or GPT-3.5 (cheaper)
- ~$0.01 per query
### Production (High Volume)

- Use deterministic routing (`/api/v1/route`) for clear queries
- Use AI routing only for ambiguous queries
- Consider caching LLM responses (see the sketch after this list)
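A minimal sketch of response caching, assuming the routing decision is stable for an identical (tenant, roles, query) tuple; the key scheme, TTL, and function names here are assumptions, not the service's actual behavior:

```typescript
// routeCache.ts — sketch: in-memory cache for AI routing responses.
interface CacheEntry {
  value: unknown;
  expiresAt: number;
}

const cache = new Map<string, CacheEntry>();
const TTL_MS = 5 * 60 * 1000; // 5 minutes; tune for your workload

export async function cachedRoute(
  key: { tenant_id: string; roles: string[]; query: string },
  route: () => Promise<unknown>, // the underlying LLM routing call
): Promise<unknown> {
  // Normalize the key so trivially different queries share an entry.
  const k = JSON.stringify([
    key.tenant_id,
    [...key.roles].sort(),
    key.query.trim().toLowerCase(),
  ]);

  const hit = cache.get(k);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const value = await route();
  cache.set(k, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```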
## Provider-Specific Notes

### Anthropic
- Excellent reasoning capabilities
- Good for Chain of Thought
- Fast response times
- Requires API key
### OpenAI
- Widely available
- Good performance
- Multiple model options
- Requires API key
### Ollama
- Free (runs locally)
- Privacy (data stays local)
- No rate limits
- Requires local setup
- Performance depends on hardware
- Best for development/testing
## Switching Providers

You can switch providers without code changes:

```bash
# Switch to OpenAI
export AI_ROUTING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
npm run dev

# Switch to Ollama
export AI_ROUTING_PROVIDER=ollama
npm run dev

# Switch back to Anthropic
export AI_ROUTING_PROVIDER=anthropic
npm run dev
```
The system detects which providers are configured and falls back in the priority order described in Auto-Provider Selection above.
## Next Steps
- See AI Routing Quick Start for basic usage
- See Multi-LLM Provider Support for technical details
- See RAG and CoT Implementation for how it works