Ollama (Local Models)

Configure MyDeskBot to use Ollama for running AI models locally.

Overview

Ollama lets you run AI models completely offline on your own hardware, making it a good fit for privacy-sensitive work.

Supported Models

Ollama supports many open-source models:

  • Llama 3 - Meta's open models
  • Mistral - High-quality open models
  • Codestral - Specialized for coding
  • Phi-3 - Small, efficient models
  • And many more...

Getting Started

1. Install Ollama

Visit ollama.ai and download for your platform:

  • macOS - Download DMG and install
  • Windows - Download EXE installer
  • Linux - Run install script: curl -fsSL https://ollama.ai/install.sh | sh

2. Pull a Model

bash
# Llama 3 (recommended)
ollama pull llama3

# Mistral
ollama pull mistral

# Codestral (for coding)
ollama pull codestral

# Phi-3 (small, fast)
ollama pull phi3
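Once a model is pulled, you can chat with it interactively via ollama run llama3, or call Ollama's local HTTP API, which listens on port 11434 by default. A minimal sketch of a generate request (the live curl call is commented out because it needs the server running):

```shell
# Sketch of a request to Ollama's /api/generate endpoint (default port 11434).
# "stream": false returns the full response in one JSON object.
PAYLOAD='{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
echo "$PAYLOAD"
# Requires a running Ollama server:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```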

Model Selection

Model         Size     Best For          RAM Required
Llama 3 8B    4.7GB    General use       8GB
Llama 3 70B   40GB     Complex tasks     64GB
Mistral 7B    4.1GB    Coding, general   8GB
Codestral     6.7GB    Coding            8GB
Phi-3         2.3GB    Quick tasks       4GB
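A useful rule of thumb from the table above: the model's weights must fit in available RAM with some headroom left over. A quick way to check on Linux (a sketch; on macOS, sysctl hw.memsize reports total memory instead):

```shell
# Report available RAM in GB from /proc/meminfo (Linux only).
awk '/MemAvailable/ {printf "Available RAM: %.1f GB\n", $2/1024/1024}' /proc/meminfo
```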

Hardware Requirements

  • CPU: Any modern CPU
  • RAM: At least 8GB (more for larger models)
  • GPU: Optional, but greatly improves speed
    • NVIDIA GPUs with 8GB+ VRAM recommended
    • AMD GPUs also supported

Advantages

Privacy

  • 100% Offline - No internet required
  • Data Stays Local - Your data never leaves your device
  • No API Keys - No third-party services needed

Cost

  • Free to Use - No per-token costs
  • One-time Hardware Cost - Invest once, use forever
  • Unlimited Usage - No rate limits

Flexibility

  • Custom Models - Fine-tune your own models
  • Multiple Models - Switch between models easily
  • Version Control - Pin to specific model versions

Limitations

  • Hardware Dependent - Requires capable hardware
  • Slower than Cloud - Generally slower than API-based models
  • Model Quality - Not as capable as GPT-4 or Claude 3.5
  • Limited Multimodal - Some models lack image capabilities

Troubleshooting

Connection Failed

  • Ensure Ollama is running: ollama serve
  • Check the endpoint URL
  • Verify port 11434 is not blocked
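The checks above can be combined into a quick health probe. This is a sketch assuming the default endpoint; adjust the host and port if you changed them:

```shell
# Probe Ollama's HTTP API on its default port (11434).
ENDPOINT="http://localhost:11434"
if curl -s --max-time 2 "$ENDPOINT" >/dev/null 2>&1; then
  echo "Ollama is reachable at $ENDPOINT"
else
  echo "Ollama is not reachable at $ENDPOINT - try: ollama serve"
fi
```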

Out of Memory

  • Use a smaller model
  • Close other applications
  • Consider upgrading RAM

Slow Performance

  • Use a GPU-accelerated setup
  • Choose a smaller model
  • Reduce context length
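Context length can be lowered per request through the num_ctx option in Ollama's API. A sketch (2048 is an illustrative value, not a recommendation; the live curl call needs a running server):

```shell
# Smaller num_ctx means less memory and faster prompt processing,
# at the cost of a shorter usable context window.
PAYLOAD='{"model": "llama3", "prompt": "Summarize this.", "options": {"num_ctx": 2048}, "stream": false}'
echo "$PAYLOAD"
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```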

Advanced Usage

Custom Models

bash
# Create a custom Modelfile
cat > Modelfile <<EOF
FROM llama3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."
EOF

# Build the model
ollama create mycoder -f Modelfile
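After ollama create, the custom model appears in the local registry (ollama list) and can be called like any pulled model; the SYSTEM prompt from the Modelfile is applied automatically. A sketch using the chat endpoint (the live curl call is commented out since it needs the server running):

```shell
# /api/chat takes a messages array instead of a bare prompt.
PAYLOAD='{"model": "mycoder", "messages": [{"role": "user", "content": "Reverse a string in Python."}], "stream": false}'
echo "$PAYLOAD"
# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```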

GPU Acceleration

Ollama automatically uses an available GPU. For NVIDIA GPUs, ensure CUDA is installed.

Quantization

Use quantized models to reduce memory usage:

bash
ollama pull llama3:8b-q4_k_m  # 4-bit quantization
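Back-of-envelope arithmetic shows why 4-bit quantization helps: an 8B-parameter model at 16 bits per weight needs roughly 8e9 × 2 bytes = 16 GB, while at 4 bits it needs roughly 8e9 × 0.5 bytes = 4 GB (weights only; the runtime adds overhead for the KV cache and activations).

```shell
# Approximate weight memory for an 8B-parameter model at two precisions.
PARAMS=8000000000
echo "16-bit: $((PARAMS * 2 / 1000000000)) GB"   # 2 bytes per weight
echo "4-bit:  $((PARAMS / 2 / 1000000000)) GB"   # 0.5 bytes per weight
```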

See Also