Ollama (Local Models)

Configure MyDeskBot to use Ollama for running AI models locally.

Overview

Ollama lets you run AI models completely offline on your own hardware, making it a good fit for privacy-sensitive work.

Supported Models

Ollama supports many open-source models:

  • Llama 3 - Meta's open models
  • Mistral - High-quality open models
  • Codestral - Specialized for coding
  • Phi-3 - Small, efficient models
  • And many more...

Getting Started

1. Install Ollama

Visit ollama.ai and download for your platform:

  • macOS - Download DMG and install
  • Windows - Download EXE installer
  • Linux - Run install script: curl -fsSL https://ollama.ai/install.sh | sh

2. Pull a Model

bash
# Llama 3 (recommended)
ollama pull llama3

# Mistral
ollama pull mistral

# Codestral (for coding)
ollama pull codestral

# Phi-3 (small, fast)
ollama pull phi3
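Once a model is pulled, you can chat with it interactively via ollama run llama3, or call Ollama's local HTTP API, which listens on port 11434 by default. A minimal sketch of a generate request (the live curl call is commented out because it needs the server running):

```shell
# Sketch of a request to Ollama's /api/generate endpoint (default port 11434).
# "stream": false returns the full response in one JSON object.
PAYLOAD='{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
echo "$PAYLOAD"
# Requires a running Ollama server:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```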

Model Selection

Model         Size     Best For          RAM Required
Llama 3 8B    4.7GB    General use       8GB
Llama 3 70B   40GB     Complex tasks     64GB
Mistral 7B    4.1GB    Coding, general   8GB
Codestral     6.7GB    Coding            8GB
Phi-3         2.3GB    Quick tasks       4GB
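A useful rule of thumb from the table above: the model's weights must fit in available RAM with some headroom left over. A quick way to check on Linux (a sketch; on macOS, sysctl hw.memsize reports total memory instead):

```shell
# Report available RAM in GB from /proc/meminfo (Linux only).
awk '/MemAvailable/ {printf "Available RAM: %.1f GB\n", $2/1024/1024}' /proc/meminfo
```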

Hardware Requirements

  • CPU: Any modern CPU
  • RAM: At least 8GB (more for larger models)
  • GPU: Optional, but greatly improves speed
    • NVIDIA GPUs with 8GB+ VRAM recommended
    • AMD GPUs also supported

Advantages

Privacy

  • 100% Offline - No internet required
  • Data Stays Local - Your data never leaves your device
  • No API Keys - No third-party services needed

Cost

  • Free to Use - No per-token costs
  • One-time Hardware Cost - Invest once, use forever
  • Unlimited Usage - No rate limits

Flexibility

  • Custom Models - Fine-tune your own models
  • Multiple Models - Switch between models easily
  • Version Control - Pin to specific model versions

Limitations

  • Hardware Dependent - Requires capable hardware
  • Slower than Cloud - Generally slower than API-based models
  • Model Quality - Not as capable as GPT-4 or Claude 3.5
  • Limited Multimodal - Some models lack image capabilities

Troubleshooting

Connection Failed

  • Ensure Ollama is running: ollama serve
  • Check the endpoint URL
  • Verify port 11434 is not blocked
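The checks above can be combined into a quick health probe. This is a sketch assuming the default endpoint; adjust the host and port if you changed them:

```shell
# Probe Ollama's HTTP API on its default port (11434).
ENDPOINT="http://localhost:11434"
if curl -s --max-time 2 "$ENDPOINT" >/dev/null 2>&1; then
  echo "Ollama is reachable at $ENDPOINT"
else
  echo "Ollama is not reachable at $ENDPOINT - try: ollama serve"
fi
```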

Out of Memory

  • Use a smaller model
  • Close other applications
  • Consider upgrading RAM

Slow Performance

  • Use a GPU-accelerated setup
  • Choose a smaller model
  • Reduce context length
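Context length can be lowered per request through the num_ctx option in Ollama's API. A sketch (2048 is an illustrative value, not a recommendation; the live curl call needs a running server):

```shell
# Smaller num_ctx means less memory and faster prompt processing,
# at the cost of a shorter usable context window.
PAYLOAD='{"model": "llama3", "prompt": "Summarize this.", "options": {"num_ctx": 2048}, "stream": false}'
echo "$PAYLOAD"
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```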

Advanced Usage

Custom Models

bash
# Create a custom Modelfile
cat > Modelfile <<EOF
FROM llama3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a helpful coding assistant."
EOF

# Build the model
ollama create mycoder -f Modelfile
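After ollama create, the custom model appears in the local registry (ollama list) and can be called like any pulled model; the SYSTEM prompt from the Modelfile is applied automatically. A sketch using the chat endpoint (the live curl call is commented out since it needs the server running):

```shell
# /api/chat takes a messages array instead of a bare prompt.
PAYLOAD='{"model": "mycoder", "messages": [{"role": "user", "content": "Reverse a string in Python."}], "stream": false}'
echo "$PAYLOAD"
# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```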

GPU Acceleration

Ollama automatically uses an available GPU. For NVIDIA GPUs, ensure CUDA is installed.

Quantization

Use quantized models to reduce memory usage:

bash
ollama pull llama3:8b-q4_k_m  # 4-bit quantization
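Back-of-envelope arithmetic shows why 4-bit quantization helps: an 8B-parameter model at 16 bits per weight needs roughly 8e9 × 2 bytes = 16 GB, while at 4 bits it needs roughly 8e9 × 0.5 bytes = 4 GB (weights only; the runtime adds overhead for the KV cache and activations).

```shell
# Approximate weight memory for an 8B-parameter model at two precisions.
PARAMS=8000000000
echo "16-bit: $((PARAMS * 2 / 1000000000)) GB"   # 2 bytes per weight
echo "4-bit:  $((PARAMS / 2 / 1000000000)) GB"   # 0.5 bytes per weight
```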

See Also