Yesterday, DeepSeek released a series of very powerful language models including DeepSeek R1 and a number of distilled (smaller) models that are based on the Qwen and Llama architectures. These models have made a lot of noise in the AI community for their performance, reasoning capabilities and most importantly for being open source with an MIT license.
I have been testing these models both through their official API and locally on my MacBook Pro, and the performance has been impressive, even for the smaller models such as the 8B and 14B variants. DeepSeek has published benchmarks comparing R1 against other state-of-the-art models from OpenAI and Anthropic.
In this guide, I'm going to walk you through how to set up Ollama and run the latest DeepSeek R1 models locally on your own computer. But before we get started, let's take a look at the models themselves.
DeepSeek R1
DeepSeek R1 is a large language model that focuses on reasoning. It can handle tasks that need multi-step problem-solving and logical thinking. The model uses a special training method that puts more emphasis on Reinforcement Learning (RL) instead of Supervised Fine-Tuning (SFT). This approach helps the model to be better at figuring things out on its own.
The model is open source, which means its weights are available under the MIT license. This allows people to use it for commercial purposes, make changes to it, and create new versions based on it. This is different from many other big language models that are not open source.
Distilled Models: Smaller but Still Powerful
DeepSeek AI also released smaller versions of the model. These distilled models come in different sizes like 1.5B, 7B, 8B, 14B, 32B, and 70B parameters. They are based on Qwen and Llama architectures. These smaller models keep a lot of the reasoning power of the bigger model but are easier to use on personal computers.
The smaller models, especially the 8B and smaller ones, can run on regular computers with CPUs, GPUs, or Apple Silicon. This makes them easy for people to experiment with at home.
What is Ollama?
Ollama is a tool that lets you run and manage large language models (LLMs) on your own computer. It makes it easier to download, run, and use these models without needing a powerful server. Ollama supports various operating systems, including macOS, Linux, and Windows. It is designed to be simple to use, with basic commands to pull, run, and manage models.
Ollama also provides a way to use the models through an API, which allows you to integrate them into other applications. Importantly, Ollama offers an experimental compatibility layer with the OpenAI API. This means you can often use existing applications and tools designed for OpenAI with your local Ollama server. It can be configured to use GPUs for faster processing, and it offers features like custom model creation and model sharing. Ollama is a great way to explore and use LLMs without relying on cloud-based services.
Installing Ollama
Before you can use DeepSeek models, you need to install Ollama. Here's how to do it on different operating systems:
macOS
- Go to the Ollama website and download the macOS installer.
- Open the downloaded file and drag the Ollama application to your Applications folder.
- Start the Ollama application. It will run in the background and appear in your menu bar.
- Open a terminal and type `ollama -v` to check if the installation was successful.
Linux
- Open a terminal and run the following command to install Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

- If you prefer a manual install, download the correct `.tgz` package from the Ollama website. Then, extract the package to `/usr` using these commands:

```bash
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
```

To start Ollama, run `ollama serve`. You can check if it's working by typing `ollama -v` in another terminal.

- For a more reliable setup, create a systemd service. First, create a user and group for Ollama:

```bash
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
```

- Then, create a service file at `/etc/systemd/system/ollama.service` with the following content:

```ini
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=default.target
```

- Finally, start and enable the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
```
Windows
- Go to the Ollama website and download the Windows installer (`OllamaSetup.exe`).
- Run the installer. Ollama will be installed in your user profile.
- Ollama will run in the background and show up in your system tray.
- Open a command prompt or PowerShell and type `ollama -v` to check if the installation was successful.
Understanding Ollama Commands
Ollama uses simple commands to manage models. Here are some key commands you'll need:
- `ollama -v`: Checks the installed version of Ollama.
- `ollama pull <model_name>:<tag>`: Downloads a model from the Ollama library.
- `ollama run <model_name>:<tag>`: Runs a model and starts an interactive chat session.
- `ollama create <model_name> -f <Modelfile>`: Creates a custom model using a Modelfile.
- `ollama show <model_name>`: Shows information about a model.
- `ollama ps`: Lists the models that are currently running.
- `ollama stop <model_name>`: Unloads a model from memory.
- `ollama cp <source_model> <destination_model>`: Copies a model.
- `ollama rm <model_name>`: Deletes a model.
- `ollama push <model_name>:<tag>`: Uploads a model to a model library.
DeepSeek Models on Ollama
DeepSeek models are available on the Ollama library in different sizes and formats. Here's a breakdown:
- Model Sizes: The models come in various sizes, such as 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b. The 'b' stands for billion parameters. Larger models usually perform better but need more resources.
- Quantized Versions: Some models are available in quantized versions (e.g., `q4_K_M`, `q8_0`). These versions use less memory and can run faster, but may have a slight drop in quality.
- Distilled Versions: DeepSeek also offers distilled versions (e.g., `qwen-distill`, `llama-distill`). These are smaller models that have been trained to act like the larger ones, balancing performance and resource use.
- Tags: Each model has a `latest` tag and specific tags that show the size, quantization, and distillation method.
Using DeepSeek Models
Here's how to use DeepSeek models with Ollama:
Pulling a Model
To download a DeepSeek model, use the command:
```bash
ollama pull deepseek-r1:<model_tag>
```

Replace `<model_tag>` with the specific tag of the model you want to use. For example:

- To download the latest 7B model:

```bash
ollama pull deepseek-r1:7b
```

- To download the 14B Qwen-distilled model with `q4_K_M` quantization:

```bash
ollama pull deepseek-r1:14b-qwen-distill-q4_K_M
```

- To download the 70B Llama-distilled model with `fp16` precision:

```bash
ollama pull deepseek-r1:70b-llama-distill-fp16
```
Here are some of the available tags:
- `latest`
- `1.5b`
- `7b`
- `8b`
- `14b`
- `32b`
- `70b`
- `671b`
- `1.5b-qwen-distill-fp16`
- `1.5b-qwen-distill-q4_K_M`
- `1.5b-qwen-distill-q8_0`
- `7b-qwen-distill-fp16`
- `7b-qwen-distill-q4_K_M`
- `7b-qwen-distill-q8_0`
- `8b-llama-distill-fp16`
- `8b-llama-distill-q4_K_M`
- `8b-llama-distill-q8_0`
- `14b-qwen-distill-fp16`
- `14b-qwen-distill-q4_K_M`
- `14b-qwen-distill-q8_0`
- `32b-qwen-distill-fp16`
- `32b-qwen-distill-q4_K_M`
- `32b-qwen-distill-q8_0`
- `70b-llama-distill-fp16`
- `70b-llama-distill-q4_K_M`
- `70b-llama-distill-q8_0`
Running a Model
After downloading a model, you can run it using the command:
```bash
ollama run deepseek-r1:<model_tag>
```

For example:

- To run the latest 7B model:

```bash
ollama run deepseek-r1:7b
```

- To run the 14B Qwen-distilled model with `q4_K_M` quantization:

```bash
ollama run deepseek-r1:14b-qwen-distill-q4_K_M
```

- To run the 70B Llama-distilled model with `fp16` precision:

```bash
ollama run deepseek-r1:70b-llama-distill-fp16
```
This will start an interactive chat session where you can ask the model questions.
Using the API
You can also use the Ollama API with DeepSeek models. Here's an example using `curl`:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Write a short poem about the stars."
}'
```

For chat completions:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    {
      "role": "user",
      "content": "Write a short poem about the stars."
    }
  ]
}'
```
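If you prefer calling the native API from Python rather than `curl`, the same chat endpoint works with the `requests` library. This is a minimal sketch, assuming Ollama is running locally on the default port and `deepseek-r1:7b` has already been pulled:

```python
# Minimal sketch: call Ollama's native chat endpoint from Python.
# Assumes a local Ollama server on port 11434 and `pip install requests`.
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:7b",
        "messages": [
            {"role": "user", "content": "Write a short poem about the stars."}
        ],
        "stream": False,  # return one JSON object instead of streamed chunks
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["message"]["content"])
```

Note that the R1-series models include their reasoning in the output, so you may see a `<think>...</think>` block before the final answer.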
Using the OpenAI-Compatible API
Ollama provides an experimental compatibility layer with parts of the OpenAI API. This allows you to use existing applications and tools designed for OpenAI with your local Ollama server.
Key Concepts
- API Endpoint: Ollama's OpenAI-compatible API is served at `http://localhost:11434/v1`.
- Authentication: Ollama's API doesn't require an API key for local use. You can often use a placeholder like `"ollama"` for the `api_key` parameter in your client.
- Partial Compatibility: Ollama's compatibility is experimental and partial. Not all features of the OpenAI API are supported, and there may be some differences in behavior.
- Focus on Core Functionality: Ollama primarily aims to support the core functionality of the OpenAI API, such as chat completions, text completions, model listing, and embeddings.
Supported Endpoints and Features
Here's a breakdown of the supported endpoints and their features:
- `/v1/chat/completions`
  - Purpose: Generate chat-style responses.
  - Supported Features:
    - Chat completions (multi-turn conversations).
    - Streaming responses (real-time output).
    - JSON mode (structured JSON output).
    - Reproducible outputs (using a `seed`).
    - Vision (multimodal models like `llava` that can process images).
    - Tools (function calling).
  - Supported Request Fields:
    - `model`: The name of the Ollama model to use.
    - `messages`: An array of message objects, each with a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (text or image).
    - `frequency_penalty`, `presence_penalty`: Controls repetition.
    - `response_format`: Specifies the output format (e.g., `json`).
    - `seed`: For reproducible outputs.
    - `stop`: Sequences to stop generation.
    - `stream`: Enables/disables streaming.
    - `stream_options`: Additional options for streaming.
      - `include_usage`: Includes usage information in the stream.
    - `temperature`: Controls randomness.
    - `top_p`: Controls diversity.
    - `max_tokens`: Maximum tokens to generate.
    - `tools`: List of tools the model can access.
- `/v1/completions`
  - Purpose: Generate text completions.
  - Supported Features:
    - Text completions (single-turn generation).
    - Streaming responses.
    - JSON mode.
    - Reproducible outputs.
  - Supported Request Fields:
    - `model`: The name of the Ollama model.
    - `prompt`: The input text.
    - `frequency_penalty`, `presence_penalty`: Controls repetition.
    - `seed`: For reproducible outputs.
    - `stop`: Stop sequences.
    - `stream`: Enables/disables streaming.
    - `stream_options`: Additional options for streaming.
      - `include_usage`: Includes usage information in the stream.
    - `temperature`: Controls randomness.
    - `top_p`: Controls diversity.
    - `max_tokens`: Maximum tokens to generate.
    - `suffix`: Text to append after the model's response.
- `/v1/models`: Lists the models available locally.
- `/v1/models/{model}`: Retrieves information about a specific model.
- `/v1/embeddings`: Generates embeddings.
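To make the request fields above concrete, here is a hedged sketch that exercises a few of them (`temperature`, `seed`, `max_tokens`, and JSON mode via `response_format`) against `/v1/chat/completions` using the OpenAI Python library. It assumes a local Ollama server with `deepseek-r1:7b` pulled; smaller models may not always emit perfectly valid JSON, so treat it as illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Chat completion exercising several of the supported request fields.
resp = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": 'Return a JSON object with keys "name" and "type" describing the star Sirius.'},
    ],
    temperature=0.2,                           # lower randomness
    seed=42,                                   # reproducible outputs
    max_tokens=200,                            # cap on generated tokens
    response_format={"type": "json_object"},   # JSON mode
)
print(resp.choices[0].message.content)
```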
How to Use Ollama with OpenAI Clients
Here's how to configure popular OpenAI clients to work with Ollama:
- OpenAI Python Library:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # Required but ignored
)

# Example chat completion
chat_completion = client.chat.completions.create(
    messages=[
        {'role': 'user', 'content': 'Say this is a test'},
    ],
    model='deepseek-r1:7b',
)

# Example text completion
completion = client.completions.create(
    model="deepseek-r1:7b",
    prompt="Say this is a test",
)

# Example list models
list_completion = client.models.list()

# Example get model info
model = client.models.retrieve("deepseek-r1:7b")
```
- OpenAI JavaScript Library:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1/',
  apiKey: 'ollama', // Required but ignored
});

// Example chat completion
const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'deepseek-r1:7b',
});

// Example text completion
const completion = await openai.completions.create({
  model: "deepseek-r1:7b",
  prompt: "Say this is a test.",
});

// Example list models
const listCompletion = await openai.models.list();

// Example get model info
const model = await openai.models.retrieve("deepseek-r1:7b");
```
- `curl` (Direct API Calls):

```bash
# Chat completion
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:7b",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

# Text completion
curl http://localhost:11434/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:7b",
    "prompt": "Say this is a test"
  }'

# List models
curl http://localhost:11434/v1/models

# Get model info
curl http://localhost:11434/v1/models/deepseek-r1:7b

# Embeddings
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "all-minilm",
    "input": ["why is the sky blue?", "why is the grass green?"]
  }'
```
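Streaming is listed as a supported feature but not shown in the examples above. Here is a small sketch, again with the OpenAI Python library and the same local-server assumptions, that prints tokens as they arrive:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Stream a chat completion and print each token delta as it is generated.
stream = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Explain model distillation in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```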
Choosing the Right Model
When choosing a DeepSeek model, consider these factors:
- Size: Larger models generally perform better but need more resources. Start with smaller models if you have limited resources.
- Quantization: Quantized models use less memory but may have slightly lower quality.
- Distillation: Distilled models offer a good balance between performance and resource usage.
It's best to experiment with different models to see which one works best for you.
Additional Tips
- Always check the Ollama library for the latest models and tags.
- Use `ollama ps` to monitor the resources used by your models.
- You can adjust parameters like `temperature`, `top_p`, and `num_ctx` to fine-tune the model's output (see the sketch below).
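`num_ctx` (the context window size) is not part of the OpenAI-compatible API, but all three parameters can be passed through the `options` object of Ollama's native endpoints. A minimal sketch, assuming a local server and the `requests` library:

```python
import requests

# Sketch: tune sampling and context-window parameters via Ollama's native API.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "Summarize why open-weight models matter in one paragraph.",
        "stream": False,
        "options": {
            "temperature": 0.6,  # randomness
            "top_p": 0.9,        # nucleus sampling / diversity
            "num_ctx": 8192,     # context window size in tokens
        },
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```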
Troubleshooting
If you have any issues, check the Ollama logs:
- macOS: `~/.ollama/logs/server.log`
- Linux: `journalctl -u ollama --no-pager`
- Windows: `%LOCALAPPDATA%\Ollama\server.log`
You can also set the `OLLAMA_DEBUG=1` environment variable for more detailed logs.
Going Further with LLMs
Of course, running these models locally is just the beginning. You can integrate them into your own applications using the API, and build custom tools such as chatbots, research assistants with Retrieval-Augmented Generation (RAG), and more.
I have written a number of guides on exploring these models further such as:
Setting up Postgres and pgvector with Docker for building RAG applications - Learn how to set up Postgres and pgvector with Docker for RAG (Retrieval-Augmented Generation) in this step-by-step guide.
Deep Dive into Vector Similarity Search within Postgres and pgvector - Learn how to use pgvector to make vector similarity search easier in Postgres. Discover functions for creating indexes, querying vectors, and more.
Creating AI Agents in Node Using the AI SDK - Learn how to create AI agents in Node using the AI SDK to automate workflows and tasks.
How to Enrich Customer Data with LLMs and Web Crawling - Learn how to use LLMs and Puppeteer to crawl customer websites and enrich their data within your SaaS product.
Conclusion
I hope this guide has been helpful in showing you how easy it is to get started with Ollama and run state-of-the-art language models on your own computer. Remember that you are not limited to DeepSeek models; you can run any model available in the Ollama library, or even models from other platforms like Hugging Face.
If you have any questions or feedback, please let me know in the comments below.