Multi-LLM Provider Support Explanation

Current Implementation

The AI routing orchestrator currently uses Anthropic Claude exclusively:

// Current implementation (hardcoded to Anthropic)
private anthropic: Anthropic;
private model: string;

private constructor() {
  this.anthropic = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
  });
  this.model = process.env.AI_ROUTING_MODEL || 'claude-3-5-sonnet-20241022';
}

// LLM call
const response = await this.anthropic.messages.create({
  model: this.model,
  max_tokens: 2000,
  temperature: 0,
  messages: [{ role: 'user', content: prompt }],
});

Can It Support OpenAI or Ollama?

Yes, absolutely! The architecture can be extended to support multiple LLM providers. Here's how:


Architecture for Multi-Provider Support

Option 1: Provider Abstraction Layer

Create an abstraction layer that supports multiple providers:

// Abstract LLM interface
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';

interface LLMOptions {
  model: string;
  maxTokens: number;
  temperature: number;
}

interface LLMProvider {
  invoke(prompt: string, options: LLMOptions): Promise<string>;
}

// Provider implementations
class AnthropicProvider implements LLMProvider {
  private anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    const response = await this.anthropic.messages.create({
      model: options.model,
      max_tokens: options.maxTokens,
      temperature: options.temperature,
      messages: [{ role: 'user', content: prompt }],
    });
    // Anthropic returns an array of content blocks
    return response.content[0].text;
  }
}

class OpenAIProvider implements LLMProvider {
  private openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    const response = await this.openai.chat.completions.create({
      model: options.model,
      max_tokens: options.maxTokens,
      temperature: options.temperature,
      messages: [{ role: 'user', content: prompt }],
    });
    // OpenAI's message content can be null, so default to an empty string
    return response.choices[0].message.content ?? '';
  }
}

class OllamaProvider implements LLMProvider {
  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    // Ollama exposes a local REST API; no SDK is required
    const response = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: options.model,
        prompt: prompt,
        stream: false,
      }),
    });
    const data = await response.json();
    return data.response;
  }
}
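
With this abstraction in place, the orchestrator depends only on the LLMProvider interface. A minimal usage sketch (assuming the provider classes above):

// The caller sees only the LLMProvider interface, so implementations are interchangeable
// (run inside an async function)
const provider: LLMProvider =
  process.env.AI_ROUTING_PROVIDER === 'ollama'
    ? new OllamaProvider()
    : new AnthropicProvider();

const answer = await provider.invoke('Classify this routing request...', {
  model: process.env.AI_ROUTING_MODEL || 'claude-3-5-sonnet-20241022',
  maxTokens: 2000,
  temperature: 0,
});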

Option 2: Unified SDK (LangChain)

Use LangChain, which provides a unified chat-model interface across providers:

import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
import { ChatOllama } from '@langchain/community/chat_models/ollama';

// Unified interface
const provider = process.env.AI_ROUTING_PROVIDER || 'anthropic';
let llm;

switch (provider) {
  case 'openai':
    llm = new ChatOpenAI({
      modelName: process.env.AI_ROUTING_MODEL || 'gpt-4-turbo-preview',
      temperature: 0,
    });
    break;
  case 'anthropic':
    llm = new ChatAnthropic({
      modelName: process.env.AI_ROUTING_MODEL || 'claude-3-5-sonnet-20241022',
      temperature: 0,
    });
    break;
  case 'ollama':
    llm = new ChatOllama({
      model: process.env.AI_ROUTING_MODEL || 'llama2',
      baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
    });
    break;
}

// Use unified interface
const response = await llm.invoke(prompt);

Provider-Specific Considerations

1. OpenAI

API Format:

// OpenAI uses different message format
messages: [
  { role: 'system', content: '...' },
  { role: 'user', content: '...' }
]

// Response format
response.choices[0].message.content

Models:

  • gpt-4-turbo-preview - Best accuracy, higher cost
  • gpt-4 - Good balance
  • gpt-3.5-turbo - Fast, cheaper

Advantages:

  • ✅ Widely available
  • ✅ Good performance
  • ✅ Well-documented
  • ✅ Fast response times

Disadvantages:

  • ❌ Higher cost than local models
  • ❌ Requires API key
  • ❌ Rate limits

Cost: ~$0.01-0.03 per query (GPT-3.5) to ~$0.03-0.10 per query (GPT-4)

2. Ollama (Local)

API Format:

// Ollama uses REST API (not SDK)
POST http://localhost:11434/api/generate
{
  "model": "llama2",
  "prompt": "...",
  "stream": false
}

// Response format
{ "response": "..." }

Models:

  • llama2 - Good general purpose
  • llama2:13b - Better accuracy
  • mistral - Fast, efficient
  • codellama - Code-focused

Advantages:

  • ✅ Free (runs locally)
  • ✅ No API costs
  • ✅ Privacy (data stays local)
  • ✅ No rate limits
  • ✅ Offline capable

Disadvantages:

  • ❌ Requires local setup
  • ❌ Lower accuracy than cloud models
  • ❌ Slower (depends on hardware)
  • ❌ Requires GPU for good performance

Cost: $0 (but requires hardware)

3. Anthropic Claude (Current)

API Format:

// Current implementation
messages: [{ role: 'user', content: '...' }]

// Response format
response.content[0].text

Models:

  • claude-3-5-sonnet-20241022 - Best accuracy
  • claude-3-opus-20240229 - Highest quality
  • claude-3-haiku-20240307 - Fast, cheap

Advantages:

  • ✅ Excellent reasoning
  • ✅ Well-suited to chain-of-thought (CoT) reasoning
  • ✅ Fast response times

Disadvantages:

  • ❌ Higher cost than local
  • ❌ Requires API key

Cost: ~$0.01-0.05 per query


Implementation Strategy

Step 1: Create Provider Factory

// src/services/llm-provider-factory.ts
export class LLMProviderFactory {
  static create(provider: string): LLMProvider {
    switch (provider) {
      case 'openai':
        return new OpenAIProvider();
      case 'anthropic':
        return new AnthropicProvider();
      case 'ollama':
        return new OllamaProvider();
      default:
        throw new Error(`Unsupported LLM provider: ${provider}`);
    }
  }
}

Step 2: Update AI Routing Orchestrator

// src/services/ai-routing-orchestrator.service.ts
export class AIRoutingOrchestrator {
  private llmProvider: LLMProvider;
  private provider: string;
  private model: string;

  private constructor() {
    this.provider = process.env.AI_ROUTING_PROVIDER || 'anthropic';
    this.model = process.env.AI_ROUTING_MODEL || this.getDefaultModel();
    this.llmProvider = LLMProviderFactory.create(this.provider);
  }

  private getDefaultModel(): string {
    switch (this.provider) {
      case 'openai':
        return 'gpt-4-turbo-preview';
      case 'anthropic':
        return 'claude-3-5-sonnet-20241022';
      case 'ollama':
        return 'llama2';
      default:
        return 'claude-3-5-sonnet-20241022';
    }
  }

  private async cotReasoning(...): Promise<...> {
    // Build prompt (same for all providers)
    const prompt = this.buildCotPrompt(...);

    // Use provider abstraction
    const response = await this.llmProvider.invoke(prompt, {
      model: this.model,
      maxTokens: 2000,
      temperature: 0,
    });

    return this.parseCotResponse(response, candidates);
  }
}

Step 3: Environment Variables

# Provider selection
AI_ROUTING_PROVIDER=anthropic  # or 'openai' or 'ollama'

# Provider-specific API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Provider-specific models
AI_ROUTING_MODEL=claude-3-5-sonnet-20241022  # or gpt-4-turbo-preview or llama2

# Ollama-specific
OLLAMA_BASE_URL=http://localhost:11434
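
For illustration, a small sketch of how these variables might be validated at startup (the resolveLLMConfig helper below is hypothetical, not part of the current codebase):

// Hypothetical startup check: fail fast if the selected provider is missing credentials
function resolveLLMConfig() {
  const provider = process.env.AI_ROUTING_PROVIDER || 'anthropic';

  if (provider === 'anthropic' && !process.env.ANTHROPIC_API_KEY) {
    throw new Error('ANTHROPIC_API_KEY is required when AI_ROUTING_PROVIDER=anthropic');
  }
  if (provider === 'openai' && !process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY is required when AI_ROUTING_PROVIDER=openai');
  }

  return {
    provider,
    model: process.env.AI_ROUTING_MODEL, // falls back to the provider default in the orchestrator
    ollamaBaseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
  };
}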

Comparison Table

| Provider         | Cost | Speed     | Accuracy  | Setup  | Privacy |
|------------------|------|-----------|-----------|--------|---------|
| Anthropic Claude | $$   | Fast      | Excellent | Easy   | Cloud   |
| OpenAI GPT-4     | $$$  | Fast      | Excellent | Easy   | Cloud   |
| OpenAI GPT-3.5   | $    | Very Fast | Good      | Easy   | Cloud   |
| Ollama (Local)   | Free | Medium    | Good*     | Medium | Local   |

*Accuracy depends on model and hardware


For Production (Cloud)

Option A: Multi-Provider Support

  • Support OpenAI and Anthropic
  • Let users choose based on cost/performance
  • Default to Anthropic (current)

Option B: Provider Selection Logic

// Auto-select based on availability
if (process.env.OPENAI_API_KEY && !process.env.ANTHROPIC_API_KEY) {
  provider = 'openai';
} else if (process.env.ANTHROPIC_API_KEY) {
  provider = 'anthropic';
} else {
  throw new Error('No LLM provider configured');
}

For Development/Testing (Local)

Use Ollama:

  • Free to test
  • No API costs
  • Privacy for sensitive data
  • Good for development

Hybrid Approach

// Use Ollama for development, cloud for production
const provider = process.env.NODE_ENV === 'production'
  ? (process.env.AI_ROUTING_PROVIDER || 'anthropic')
  : 'ollama';

Code Changes Required

Minimal Changes (Provider Abstraction)

  1. Create LLMProvider interface (~50 lines)
  2. Implement providers (~100 lines each)
  3. Update AIRoutingOrchestrator (~20 lines)
  4. Add environment variables (documentation)

Dependencies Needed

For OpenAI:

npm install openai

For Ollama:

# No SDK needed - uses fetch/axios
# But can use:
npm install ollama  # Optional official SDK

For LangChain (Unified):

npm install @langchain/openai @langchain/anthropic @langchain/community

Example: Using OpenAI

// OpenAI provider implementation
import OpenAI from 'openai';

class OpenAIProvider implements LLMProvider {
  private client: OpenAI;

  constructor() {
    this.client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });
  }

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    const response = await this.client.chat.completions.create({
      model: options.model || 'gpt-4-turbo-preview',
      messages: [
        {
          role: 'user',
          content: prompt,
        },
      ],
      max_tokens: options.maxTokens || 2000,
      temperature: options.temperature || 0,
    });

    return response.choices[0].message.content || '';
  }
}

Example: Using Ollama

// Ollama provider implementation
class OllamaProvider implements LLMProvider {
  private baseUrl: string;

  constructor() {
    this.baseUrl = process.env.OLLAMA_BASE_URL || 'http://localhost:11434';
  }

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: options.model || 'llama2',
        prompt: prompt,
        stream: false,
        options: {
          temperature: options.temperature || 0,
          num_predict: options.maxTokens || 2000,
        },
      }),
    });

    if (!response.ok) {
      throw new Error(`Ollama API error: ${response.statusText}`);
    }

    const data = await response.json();
    return data.response;
  }
}

Benefits of Multi-Provider Support

  1. Flexibility: Choose provider based on needs
  2. Cost Optimization: Use cheaper providers when appropriate
  3. Resilience: Fallback if one provider fails (see the sketch after this list)
  4. Privacy: Use local Ollama for sensitive data
  5. Development: Test locally without API costs
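
For illustration, a minimal fallback sketch (the FallbackProvider class is hypothetical, not part of the current codebase; it assumes the LLMProvider interface and providers defined above):

// Hypothetical wrapper: try a primary provider, fall back to a secondary one on failure
class FallbackProvider implements LLMProvider {
  constructor(
    private primary: LLMProvider,
    private secondary: LLMProvider,
  ) {}

  async invoke(prompt: string, options: LLMOptions): Promise<string> {
    try {
      return await this.primary.invoke(prompt, options);
    } catch (error) {
      console.warn('Primary LLM provider failed, falling back:', error);
      // Note: in practice each provider needs its own default model name
      return this.secondary.invoke(prompt, options);
    }
  }
}

// Example: prefer Anthropic, fall back to a local Ollama instance
const resilientLLM = new FallbackProvider(new AnthropicProvider(), new OllamaProvider());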

Current Limitations

The current implementation is hardcoded to Anthropic:

  1. Direct SDK usage: Uses @anthropic-ai/sdk directly
  2. No abstraction: LLM calls are embedded in cotReasoning()
  3. Single provider: Only supports Anthropic

To support multiple providers, you need to:

  • Extract LLM calls into provider abstraction
  • Support different API formats
  • Handle provider-specific configurations
  • Add provider selection logic

Recommendation

Yes, it can and should support multiple providers! The current architecture makes it straightforward to add:

  1. Short-term: Add OpenAI support (similar API to Anthropic)
  2. Medium-term: Add Ollama support (for local development)
  3. Long-term: Use LangChain for unified interface

The prompt format and reasoning logic are provider-agnostic, so only the API call layer needs to change.


Summary

  • Can use OpenAI: Yes, with provider abstraction
  • Can use Ollama: Yes, with REST API integration
  • Current limitation: Hardcoded to Anthropic
  • Solution: Create provider abstraction layer
  • Effort: ~200-300 lines of code
  • Benefit: Flexibility, cost optimization, privacy options

The implementation is well-structured for this extension: the CoT prompt construction and response parsing are separate from the LLM API call itself, so only that call layer needs to change when swapping providers.
