Multi-LLM Provider Support Explanation
Current Implementation
The AI routing orchestrator currently uses Anthropic Claude exclusively:
// Current implementation (hardcoded to Anthropic)
private anthropic: Anthropic;
private model: string;
private constructor() {
this.anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
this.model = process.env.AI_ROUTING_MODEL || 'claude-3-5-sonnet-20241022';
}
// LLM call
const response = await this.anthropic.messages.create({
model: this.model,
max_tokens: 2000,
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
Can It Support OpenAI or Ollama?
Yes, absolutely! The architecture can be extended to support multiple LLM providers. Here's how:
Architecture for Multi-Provider Support
Option 1: Provider Abstraction Layer (Recommended)
Create an abstraction layer that supports multiple providers:
// Abstract LLM interface
interface LLMProvider {
invoke(prompt: string, options: LLMOptions): Promise<string>;
}
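// LLMOptions is referenced above but not defined anywhere in this document;
// a minimal assumed shape (an assumption, not existing code) could be:
interface LLMOptions {
  model: string;
  maxTokens: number;
  temperature: number;
}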
// Provider implementations
class AnthropicProvider implements LLMProvider {
// assumes `this.anthropic` is an Anthropic SDK client, initialized as in the current implementation shown above
async invoke(prompt: string, options: LLMOptions): Promise<string> {
const response = await this.anthropic.messages.create({
model: options.model,
max_tokens: options.maxTokens,
temperature: options.temperature,
messages: [{ role: 'user', content: prompt }],
});
return response.content[0].text;
}
}
class OpenAIProvider implements LLMProvider {
// assumes `this.openai` is an OpenAI SDK client; the full OpenAI example later in this document shows the constructor
async invoke(prompt: string, options: LLMOptions): Promise<string> {
const response = await this.openai.chat.completions.create({
model: options.model,
max_tokens: options.maxTokens,
temperature: options.temperature,
messages: [{ role: 'user', content: prompt }],
});
return response.choices[0].message.content || ''; // content can be null in the OpenAI SDK's types
}
}
class OllamaProvider implements LLMProvider {
async invoke(prompt: string, options: LLMOptions): Promise<string> {
const response = await fetch('http://localhost:11434/api/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: options.model,
prompt: prompt,
stream: false,
}),
});
const data = await response.json();
return data.response;
}
}
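With this abstraction in place, the calling code no longer depends on which provider sits behind it. A minimal usage sketch (variable names and values are illustrative, assuming the LLMOptions shape above):
// Swap the concrete class to switch providers; the call site stays identical
const llm: LLMProvider = new AnthropicProvider(); // or new OpenAIProvider() / new OllamaProvider()
// `prompt` is the routing prompt built elsewhere in the orchestrator
const answer = await llm.invoke(prompt, {
  model: 'claude-3-5-sonnet-20241022',
  maxTokens: 2000,
  temperature: 0,
});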
Option 2: Unified SDK (LangChain)
Use LangChain which provides a unified interface:
import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
import { ChatOllama } from '@langchain/community/chat_models/ollama';
// Unified interface
const provider = process.env.AI_ROUTING_PROVIDER || 'anthropic';
let llm;
switch (provider) {
case 'openai':
llm = new ChatOpenAI({
modelName: process.env.AI_ROUTING_MODEL || 'gpt-4-turbo-preview',
temperature: 0,
});
break;
case 'anthropic':
llm = new ChatAnthropic({
modelName: process.env.AI_ROUTING_MODEL || 'claude-3-5-sonnet-20241022',
temperature: 0,
});
break;
case 'ollama':
llm = new ChatOllama({
model: process.env.AI_ROUTING_MODEL || 'llama2',
baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
});
break;
default:
throw new Error(`Unsupported AI_ROUTING_PROVIDER: ${provider}`);
}
// Use unified interface
const response = await llm.invoke(prompt);
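// Note: invoke() on a LangChain chat model returns an AIMessage rather than a
// plain string; for ordinary text responses the text is on its content property:
const text = response.content;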
Provider-Specific Considerations
1. OpenAI
API Format:
// OpenAI uses different message format
messages: [
{ role: 'system', content: '...' },
{ role: 'user', content: '...' }
]
// Response format
response.choices[0].message.content
Models:
- `gpt-4-turbo-preview` - Best accuracy, higher cost
- `gpt-4` - Good balance
- `gpt-3.5-turbo` - Fast, cheaper
Advantages:
- ✅ Widely available
- ✅ Good performance
- ✅ Well-documented
- ✅ Fast response times
Disadvantages:
- ❌ Higher cost than local models
- ❌ Requires API key
- ❌ Rate limits
Cost: ~$0.03-0.10 per query (GPT-4)
2. Ollama (Local)
API Format:
// Ollama uses REST API (not SDK)
POST http://localhost:11434/api/generate
{
"model": "llama2",
"prompt": "...",
"stream": false
}
// Response format
{ "response": "..." }
Models:
- `llama2` - Good general purpose
- `llama2:13b` - Better accuracy
- `mistral` - Fast, efficient
- `codellama` - Code-focused
Advantages:
- ✅ Free (runs locally)
- ✅ No API costs
- ✅ Privacy (data stays local)
- ✅ No rate limits
- ✅ Offline capable
Disadvantages:
- ❌ Requires local setup
- ❌ Lower accuracy than cloud models
- ❌ Slower (depends on hardware)
- ❌ Requires GPU for good performance
Cost: $0 (but requires hardware)
3. Anthropic Claude (Current)
API Format:
// Current implementation
messages: [{ role: 'user', content: '...' }]
// Response format
response.content[0].text
Models:
- `claude-3-5-sonnet-20241022` - Best accuracy
- `claude-3-opus-20240229` - Highest quality
- `claude-3-haiku-20240307` - Fast, cheap
Advantages:
- ✅ Excellent reasoning
- ✅ Well-suited to chain-of-thought (CoT) prompting
- ✅ Fast response times
Disadvantages:
- ❌ Higher cost than local
- ❌ Requires API key
Cost: ~$0.01-0.05 per query
Implementation Strategy
Step 1: Create Provider Factory
// src/services/llm-provider-factory.ts
export class LLMProviderFactory {
static create(provider: string): LLMProvider {
switch (provider) {
case 'openai':
return new OpenAIProvider();
case 'anthropic':
return new AnthropicProvider();
case 'ollama':
return new OllamaProvider();
default:
throw new Error(`Unsupported LLM provider: ${provider}`);
}
}
}
Step 2: Update AI Routing Orchestrator
// src/services/ai-routing-orchestrator.service.ts
export class AIRoutingOrchestrator {
private llmProvider: LLMProvider;
private provider: string;
private model: string;
private constructor() {
this.provider = process.env.AI_ROUTING_PROVIDER || 'anthropic';
this.model = process.env.AI_ROUTING_MODEL || this.getDefaultModel();
this.llmProvider = LLMProviderFactory.create(this.provider);
}
private getDefaultModel(): string {
switch (this.provider) {
case 'openai':
return 'gpt-4-turbo-preview';
case 'anthropic':
return 'claude-3-5-sonnet-20241022';
case 'ollama':
return 'llama2';
default:
return 'claude-3-5-sonnet-20241022';
}
}
private async cotReasoning(...): Promise<...> {
// Build prompt (same for all providers)
const prompt = this.buildCotPrompt(...);
// Use provider abstraction
const response = await this.llmProvider.invoke(prompt, {
model: this.model,
maxTokens: 2000,
temperature: 0,
});
return this.parseCotResponse(response, candidates);
}
}
Step 3: Environment Variables
# Provider selection
AI_ROUTING_PROVIDER=anthropic # or 'openai' or 'ollama'
# Provider-specific API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# Provider-specific models
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022 # or gpt-4-turbo-preview or llama2
# Ollama-specific
OLLAMA_BASE_URL=http://localhost:11434
Comparison Table
| Provider | Cost | Speed | Accuracy | Setup | Privacy |
|---|---|---|---|---|---|
| Anthropic Claude | $$ | Fast | Excellent | Easy | Cloud |
| OpenAI GPT-4 | $$$ | Fast | Excellent | Easy | Cloud |
| OpenAI GPT-3.5 | $ | Very Fast | Good | Easy | Cloud |
| Ollama (Local) | Free | Medium | Good* | Medium | Local |
*Accuracy depends on model and hardware
Recommended Approach
For Production (Cloud)
Option A: Multi-Provider Support
- Support OpenAI and Anthropic
- Let users choose based on cost/performance
- Default to Anthropic (current)
Option B: Provider Selection Logic
// Auto-select based on availability
if (process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY) {
provider = 'openai';
} else if (process.env.ANTHROPIC_API_KEY) {
provider = 'anthropic';
} else {
throw new Error('No LLM provider configured');
}
For Development/Testing (Local)
Use Ollama:
- Free to test
- No API costs
- Privacy for sensitive data
- Good for development
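For local development, a working configuration could look like this (values are illustrative, using the variables from Step 3 above):
AI_ROUTING_PROVIDER=ollama
AI_ROUTING_MODEL=llama2
OLLAMA_BASE_URL=http://localhost:11434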
Hybrid Approach
// Use Ollama for development, cloud for production
const provider = process.env.NODE_ENV === 'production'
? (process.env.AI_ROUTING_PROVIDER || 'anthropic')
: 'ollama';
Code Changes Required
Minimal Changes (Provider Abstraction)
- Create `LLMProvider` interface (~50 lines)
- Implement providers (~100 lines each)
- Update `AIRoutingOrchestrator` (~20 lines)
- Add environment variables (documentation)
Dependencies Needed
For OpenAI:
npm install openai
For Ollama:
# No SDK needed - uses fetch/axios
# But can use:
npm install ollama # Optional official SDK
For LangChain (Unified):
npm install @langchain/openai @langchain/anthropic @langchain/community
Example: Using OpenAI
// OpenAI provider implementation
import OpenAI from 'openai';
class OpenAIProvider implements LLMProvider {
private client: OpenAI;
constructor() {
this.client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
}
async invoke(prompt: string, options: LLMOptions): Promise<string> {
const response = await this.client.chat.completions.create({
model: options.model || 'gpt-4-turbo-preview',
messages: [
{
role: 'user',
content: prompt,
},
],
max_tokens: options.maxTokens || 2000,
temperature: options.temperature || 0,
});
return response.choices[0].message.content || '';
}
}
Example: Using Ollama
// Ollama provider implementation
class OllamaProvider implements LLMProvider {
private baseUrl: string;
constructor() {
this.baseUrl = process.env.OLLAMA_BASE_URL || 'http://localhost:11434';
}
async invoke(prompt: string, options: LLMOptions): Promise<string> {
const response = await fetch(`${this.baseUrl}/api/generate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: options.model || 'llama2',
prompt: prompt,
stream: false,
options: {
temperature: options.temperature || 0,
num_predict: options.maxTokens || 2000,
},
}),
});
if (!response.ok) {
throw new Error(`Ollama API error: ${response.statusText}`);
}
const data = await response.json();
return data.response;
}
}
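Note that the chosen model must already be available locally before the first request; with the Ollama CLI that is typically done once via:
ollama pull llama2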
Benefits of Multi-Provider Support
- Flexibility: Choose provider based on needs
- Cost Optimization: Use cheaper providers when appropriate
- Resilience: Fallback if one provider fails (see the sketch after this list)
- Privacy: Use local Ollama for sensitive data
- Development: Test locally without API costs
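As an illustration of the resilience point above, a fallback can itself be expressed as an `LLMProvider` that wraps two other providers. The class below is a minimal sketch built on the Option 1 interface; the name and behavior are assumptions, not part of the current code:
class FallbackProvider implements LLMProvider {
  constructor(
    private primary: LLMProvider,
    private secondary: LLMProvider,
  ) {}

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    try {
      // Try the preferred provider first (e.g. Anthropic or OpenAI)
      return await this.primary.invoke(prompt, options);
    } catch (error) {
      // On any failure (network error, rate limit, outage), fall back to the
      // secondary provider (e.g. a local Ollama instance)
      return await this.secondary.invoke(prompt, options);
    }
  }
}
In practice each wrapped provider would also need its own default model, since a single `options.model` value (for example a Claude model name) is not valid across providers.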
Current Limitations
The current implementation is hardcoded to Anthropic:
- Direct SDK usage: Uses `@anthropic-ai/sdk` directly
- No abstraction: LLM calls are embedded in `cotReasoning()`
- Single provider: Only supports Anthropic
To support multiple providers, you need to:
- Extract LLM calls into provider abstraction
- Support different API formats
- Handle provider-specific configurations
- Add provider selection logic
Recommendation
Yes, it can and should support multiple providers! The current architecture makes it straightforward to add:
- Short-term: Add OpenAI support (similar API to Anthropic)
- Medium-term: Add Ollama support (for local development)
- Long-term: Use LangChain for unified interface
The prompt format and reasoning logic are provider-agnostic, so only the API call layer needs to change.
Summary
✅ Can use OpenAI: Yes, with provider abstraction
✅ Can use Ollama: Yes, with REST API integration
✅ Current limitation: Hardcoded to Anthropic
✅ Solution: Create provider abstraction layer
✅ Effort: ~200-300 lines of code
✅ Benefit: Flexibility, cost optimization, privacy options
The implementation is well-structured for this extension: the CoT prompt construction and response parsing are already separate from the raw LLM API call, so only the call layer needs to change to swap providers.