# Ollama (Local Models)
Configure MyDeskBot to use Ollama for running AI models locally.
## Overview

Ollama lets you run AI models completely offline on your own hardware, which makes it a good fit for privacy-sensitive work.
## Supported Models

Ollama supports many open-source models:

- **Llama 3** - Meta's open models
- **Mistral** - High-quality open models
- **Codestral** - Specialized for coding
- **Phi-3** - Small, efficient models
- And many more...
## Getting Started

### 1. Install Ollama

Visit ollama.ai and download the installer for your platform:

- **macOS** - Download the DMG and install
- **Windows** - Download the EXE installer
- **Linux** - Run the install script:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```
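After installation, you can confirm the CLI works and start the server (this assumes `ollama` is on your `PATH`; on macOS and Windows the desktop app usually starts the server for you):

```bash
# Confirm the CLI is installed
ollama --version

# Start the server manually if it is not already running
ollama serve
```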
### 2. Pull a Model

```bash
# Llama 3 (recommended)
ollama pull llama3

# Mistral
ollama pull mistral

# Codestral (for coding)
ollama pull codestral

# Phi-3 (small, fast)
ollama pull phi3
```

## Model Selection
### Recommended Models

| Model | Size | Best For | RAM Required |
|---|---|---|---|
| Llama 3 8B | 4.7GB | General use | 8GB |
| Llama 3 70B | 40GB | Complex tasks | 64GB |
| Mistral 7B | 4.1GB | Coding, general | 8GB |
| Codestral | 6.7GB | Coding | 8GB |
| Phi-3 | 2.3GB | Quick tasks | 4GB |
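Before downloading a large model, it can help to see what is already installed and inspect a candidate's details (a quick sketch using the standard Ollama CLI; requires Ollama to be installed and running):

```bash
# List locally installed models with their tags and sizes
ollama list

# Show details for a pulled model (parameters, quantization, context length)
ollama show llama3
```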
### Hardware Requirements

- **CPU**: Any modern CPU
- **RAM**: At least 8GB (more for larger models)
- **GPU**: Optional, but greatly improves speed
  - NVIDIA GPUs with 8GB+ VRAM recommended
  - AMD GPUs are also supported
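As a rough sanity check against the table above, you can compare total system RAM to a model's stated requirement. This is a Linux-only sketch (it reads `/proc/meminfo`); the `req_gb=8` value is just the Llama 3 8B row used as an example:

```bash
# Compare total RAM (GB) to a model's stated requirement
req_gb=8  # example: Llama 3 8B from the table above
total_gb=$(awk '/MemTotal/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
if [ "$total_gb" -ge "$req_gb" ]; then
  echo "OK: ${total_gb}GB RAM should be enough for this model"
else
  echo "Warning: ${total_gb}GB RAM is below the recommended ${req_gb}GB"
fi
```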
## Advantages

### Privacy

- **100% Offline** - No internet required
- **Data Stays Local** - Your data never leaves your device
- **No API Keys** - No third-party services needed

### Cost

- **Free to Use** - No per-token costs
- **One-time Hardware Cost** - Invest once, use forever
- **Unlimited Usage** - No rate limits

### Flexibility

- **Custom Models** - Fine-tune your own models
- **Multiple Models** - Switch between models easily
- **Version Control** - Pin to specific model versions
## Limitations

- **Hardware Dependent** - Requires capable hardware
- **Slower than Cloud** - Generally slower than API-based models
- **Model Quality** - Not as capable as GPT-4 or Claude 3.5
- **Limited Multimodal** - Some models lack image capabilities
## Troubleshooting

### Connection Failed

- Ensure Ollama is running: `ollama serve`
- Check the endpoint URL
- Verify port 11434 is not blocked
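A quick way to check whether the server is reachable is to hit its REST API on the default port (this sketch assumes the default endpoint `http://localhost:11434`):

```bash
# A running Ollama server answers /api/tags with a JSON list of installed models
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:11434/api/tags || true)
if [ "$status" = "200" ]; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable (HTTP status: $status)"
fi
```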
### Out of Memory

- Use a smaller model
- Close other applications
- Consider upgrading RAM
### Slow Performance

- Use a GPU-accelerated setup
- Choose a smaller model
- Reduce context length
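Context length can be reduced per request through Ollama's REST API using the `num_ctx` option (a sketch against the default endpoint; requires a running server and a pulled `llama3`):

```bash
# Ask for a 2048-token context window instead of the model default
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "options": { "num_ctx": 2048 }
}'
```

Smaller context windows reduce both memory use and per-token latency, at the cost of how much conversation history the model can see.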
## Advanced Usage

### Custom Models

```bash
# Create a custom Modelfile
cat > Modelfile <<EOF
FROM llama3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful coding assistant.
EOF

# Build the model
ollama create mycoder -f Modelfile
```

### GPU Acceleration
Ollama automatically uses an available GPU when one is detected. For NVIDIA GPUs, ensure the CUDA drivers are installed.
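To confirm the GPU is actually being used, two quick checks (a sketch; `nvidia-smi` assumes NVIDIA drivers are installed):

```bash
# Check that the NVIDIA driver sees the GPU
nvidia-smi

# With a model loaded, the PROCESSOR column shows how much of it runs on the GPU
ollama ps
```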
### Quantization

Use quantized models to reduce memory usage:

```bash
ollama pull llama3:8b-q4_k_m  # 4-bit quantization
```