Ollama Docker Setup Guide

Overview

This guide explains how to set up Ollama with Llama 3.2 in Docker for use with the AI routing service.


Quick Start

1. Start Ollama Service

# Start Ollama container
docker-compose up -d ollama

# Wait for Ollama to be ready (about 60 seconds)
docker-compose logs -f ollama
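
If you prefer to script the wait instead of watching the logs, a minimal readiness check that polls the Ollama API (assuming the default port mapping above) looks like this:

# Poll the API until Ollama responds, instead of tailing the logs manually
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 5
done
echo "Ollama is ready"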

2. Pull Llama 3.2 Model

# Use the setup script (recommended)
./scripts/setup-ollama.sh

# Or manually:
docker-compose exec ollama ollama pull llama3.2

3. Configure Environment

Add to your .env file:

AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
AI_ROUTING_MODEL=llama3.2

4. Restart Service

docker-compose restart haya-routing

Docker Compose Configuration

The docker-compose.yml includes an Ollama service:

ollama:
  image: ollama/ollama:latest
  ports:
    - "11434:11434"
  volumes:
    - ollama_data:/root/.ollama
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:11434/api/tags || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 60s

Key points:

  • Port: 11434 (standard Ollama port)
  • Volume: ollama_data persists models between restarts
  • Health check: Verifies Ollama is ready
  • GPU support: Configured for NVIDIA GPU (optional)
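
To confirm the health check is passing from the host, you can ask Docker directly (a quick check, assuming the service is named ollama as above):

# Print the health status reported by the container's health check
docker inspect --format='{{.State.Health.Status}}' $(docker-compose ps -q ollama)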

GPU vs CPU Mode

GPU Mode

If you have an NVIDIA GPU, enable the following in the Ollama service in docker-compose.yml:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

Requirements:

  • NVIDIA GPU with CUDA support
  • nvidia-docker2 installed
  • Docker with GPU support

Benefits:

  • Much faster inference (typically 10-100x speedup over CPU)
  • Lower per-request latency and higher throughput
  • Can handle larger models
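
Once the GPU block is enabled, one way to confirm the container can actually see the GPU (assuming the NVIDIA container runtime makes nvidia-smi available inside the container) is:

# Should list the GPU; if this fails, the container has no GPU access
docker-compose exec ollama nvidia-smi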

CPU Mode

If you don't have a GPU, the service will run on CPU (slower but works):

# Uncomment these in docker-compose.yml
environment:
  - OLLAMA_NUM_PARALLEL=1
  - OLLAMA_MAX_LOADED_MODELS=1

Note: CPU mode is slower but functional for development/testing.


Available Models

Llama 3.2 (Default)

docker-compose exec ollama ollama pull llama3.2
  • Size: ~2GB
  • Best for: General purpose, good balance
  • Speed: Fast (with GPU)

Other Models

# Llama 3.2 (3B - smaller, faster)
docker-compose exec ollama ollama pull llama3.2:3b

# Llama 3.2 (1B - smallest, fastest)
docker-compose exec ollama ollama pull llama3.2:1b

# Mistral (alternative)
docker-compose exec ollama ollama pull mistral

# Code Llama (for code-focused tasks)
docker-compose exec ollama ollama pull codellama
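
Whichever model you pull, the routing service uses only the model named in .env, so update it to match. A hypothetical example for Mistral:

# .env
AI_ROUTING_MODEL=mistral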

Verification

Check Ollama is Running

# Check container status
docker-compose ps ollama

# Check logs
docker-compose logs ollama

# Test API
curl http://localhost:11434/api/tags
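
If jq is installed, the /api/tags response can be reduced to just the model names:

# List only the names of locally available models
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'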

List Available Models

docker-compose exec ollama ollama list

Test Model

docker-compose exec ollama ollama run llama3.2 "Hello, how are you?"

Test via API

curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.2",
    "prompt": "What is routing?",
    "stream": false
  }'
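
With "stream": false the endpoint returns a single JSON object; the generated text is in its response field, which jq can extract:

curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "What is routing?", "stream": false}' \
  | jq -r '.response'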

Troubleshooting

Issue: Container won't start

Check logs:

docker-compose logs ollama

Common causes:

  • Port 11434 already in use
  • Insufficient memory
  • Docker not running

Solution:

# Check if port is in use
lsof -i :11434

# Stop conflicting service or change port in docker-compose.yml
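
If you do change the host port, point the routing service at the new mapping as well. For example, with a hypothetical "11435:11434" mapping in docker-compose.yml:

# .env
OLLAMA_BASE_URL=http://localhost:11435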

Issue: Model download fails

Solution:

# Retry download
docker-compose exec ollama ollama pull llama3.2

# Check disk space
df -h

# Check network connectivity
docker-compose exec ollama ping -c 3 8.8.8.8

Issue: Slow performance

Solutions:

  1. Use GPU (if available):

    # Install nvidia-docker2
    # Restart Docker
    docker-compose up -d ollama
    
  2. Use smaller model:

    docker-compose exec ollama ollama pull llama3.2:1b
    # Update .env: AI_ROUTING_MODEL=llama3.2:1b
    
  3. Increase timeout:

    # In .env
    AI_ROUTING_TIMEOUT_MS=30000
    

Issue: Out of memory

Solutions:

  1. Use smaller model (llama3.2:1b or llama3.2:3b)
  2. Reduce parallel requests:
    environment:
      - OLLAMA_NUM_PARALLEL=1
    
  3. Increase Docker memory limit

Performance Tips

1. Use GPU When Available

GPU provides 10-100x speedup:

# Check GPU availability
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

2. Choose Right Model Size

Model        | Size   | Speed (CPU) | Speed (GPU) | Use Case
llama3.2:1b  | ~700MB | Slow        | Fast        | Development
llama3.2:3b  | ~2GB   | Medium      | Very Fast   | Recommended
llama3.2     | ~2GB   | Medium      | Very Fast   | Production

3. Cache Models

Models are cached in ollama_data volume, so they persist between restarts.
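
You can confirm the volume exists and see where Docker stores it on disk (the prefixed volume name haya-routing_ollama_data matches the backup example later in this guide):

docker volume inspect haya-routing_ollama_data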

4. Monitor Resources

# Check container resources
docker stats ollama

# Check disk usage
docker system df

Integration with AI Routing

Once Ollama is running:

1. Update Environment

# .env
AI_ROUTING_ENABLED=true
AI_ROUTING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
AI_ROUTING_MODEL=llama3.2

2. Restart Service

docker-compose restart haya-routing

3. Test AI Routing

curl -X POST http://localhost:3000/api/v1/route/ai \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "tenant-001",
    "app_id": "app-001",
    "roles": ["developer"],
    "query": "Show network device inventory"
  }'

Maintenance

Update Ollama

docker-compose pull ollama
docker-compose up -d ollama
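
After the container is recreated, you can confirm which version is running:

docker-compose exec ollama ollama --version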

Update Model

docker-compose exec ollama ollama pull llama3.2

Clean Up

# Remove unused models
docker-compose exec ollama ollama list
docker-compose exec ollama ollama rm <model-name>

# Remove volume (deletes all models)
docker-compose down -v

Backup Models

# Models are stored in ollama_data volume
docker run --rm -v haya-routing_ollama_data:/data -v $(pwd):/backup \
  alpine tar czf /backup/ollama-backup.tar.gz /data
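
To restore that archive into a fresh ollama_data volume, the reverse operation is a sketch like this (tar stored the paths relative to /, so extracting at / recreates /data):

# Restore a previously created backup into the ollama_data volume
docker run --rm -v haya-routing_ollama_data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/ollama-backup.tar.gz -C /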

Resource Requirements

Minimum (CPU Mode)

  • CPU: 2+ cores
  • RAM: 4GB
  • Disk: 5GB (for model)
  • Speed: ~5-10 seconds per request

Recommended (GPU Mode)

  • GPU: NVIDIA with 4GB+ VRAM
  • CPU: 2+ cores
  • RAM: 8GB
  • Disk: 5GB
  • Speed: ~1-2 seconds per request

Summary

  • Ollama is configured in docker-compose.yml
  • Runs on port 11434
  • Models persist in the ollama_data volume
  • GPU support included (optional)
  • Setup script available: ./scripts/setup-ollama.sh

Next steps:

  1. Run docker-compose up -d ollama
  2. Run ./scripts/setup-ollama.sh to pull Llama 3.2
  3. Configure .env with AI_ROUTING_PROVIDER=ollama
  4. Restart service and test!
