Run DeepSeek-R1 on Your Laptop with Ollama

Yesterday, DeepSeek released a series of very powerful language models, including DeepSeek R1 and a number of distilled (smaller) models based on the Qwen and Llama architectures. These models have made a lot of noise in the AI community for their performance, their reasoning capabilities, and, most importantly, for being open source under an MIT license.

I have been testing these models both through their API and locally on my MacBook Pro, and I have to say the performance has been impressive, even for the smaller models such as the 8B and 14B variants. Here's a benchmark comparing DeepSeek R1 to other state-of-the-art models from OpenAI and Anthropic.

(Image: DeepSeek R1 benchmark comparison)

In this guide, I'm going to walk you through how to set up Ollama and run the latest DeepSeek R1 models locally on your own computer. But before we get started, let's take a look at the models themselves.

DeepSeek R1

DeepSeek R1 is a large language model that focuses on reasoning. It can handle tasks that need multi-step problem-solving and logical thinking. The model uses a training method that puts more emphasis on Reinforcement Learning (RL) than on Supervised Fine-Tuning (SFT), which helps it become better at figuring things out on its own.

The model is open source, which means its weights are available under the MIT license. This allows people to use it for commercial purposes, make changes to it, and create new versions based on it. This is different from many other big language models that are not open source.

Distilled Models: Smaller but Still Powerful

DeepSeek AI also released smaller versions of the model. These distilled models come in different sizes like 1.5B, 7B, 8B, 14B, 32B, and 70B parameters. They are based on Qwen and Llama architectures. These smaller models keep a lot of the reasoning power of the bigger model but are easier to use on personal computers.

The smaller models, especially the 8B and smaller ones, can run on regular computers with CPUs, GPUs, or Apple Silicon. This makes them easy for people to experiment with at home.

What is Ollama?

Ollama is a tool that lets you run and manage large language models (LLMs) on your own computer. It makes it easier to download, run, and use these models without needing a powerful server. Ollama supports various operating systems, including macOS, Linux, and Windows. It is designed to be simple to use, with basic commands to pull, run, and manage models.

Ollama also provides a way to use the models through an API, which allows you to integrate them into other applications. Importantly, Ollama offers an experimental compatibility layer with the OpenAI API. This means you can often use existing applications and tools designed for OpenAI with your local Ollama server. It can be configured to use GPUs for faster processing, and it offers features like custom model creation and model sharing. Ollama is a great way to explore and use LLMs without relying on cloud-based services.

Installing Ollama

Before you can use DeepSeek models, you need to install Ollama. Here's how to do it on different operating systems:

macOS

  1. Go to the Ollama website and download the macOS installer.
  2. Open the downloaded file and drag the Ollama application to your Applications folder.
  3. Start the Ollama application. It will run in the background and show up in your menu bar.
  4. Open a terminal and type ollama -v to check if the installation was successful.

Linux

  1. Open a terminal and run the following command to install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. If you prefer a manual install, download the correct .tgz package from the Ollama website. Then, extract the package to /usr using these commands:

    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz
    
  3. To start Ollama, run ollama serve. You can check if it's working by typing ollama -v in another terminal.

  4. For a more reliable setup, create a systemd service. First, create a user and group for Ollama:

    sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
    sudo usermod -a -G ollama $(whoami)
    
  5. Then, create a service file in /etc/systemd/system/ollama.service with the following content:

    [Unit]
    Description=Ollama Service
    After=network-online.target
    
    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3
    Environment="PATH=$PATH"
    
    [Install]
    WantedBy=default.target
    
  6. Finally, start and enable the service:

    sudo systemctl daemon-reload
    sudo systemctl enable ollama
    sudo systemctl start ollama
    sudo systemctl status ollama
    

Windows

  1. Go to the Ollama website and download the Windows installer (OllamaSetup.exe).
  2. Run the installer. Ollama will be installed in your user profile.
  3. Ollama will run in the background and show up in your system tray.
  4. Open a command prompt or PowerShell and type ollama -v to check if the installation was successful.

Understanding Ollama Commands

Ollama uses simple commands to manage models. Here are some key commands you'll need:

  • ollama -v: Checks the installed version of Ollama.
  • ollama pull <model_name>:<tag>: Downloads a model from the Ollama library.
  • ollama run <model_name>:<tag>: Runs a model and starts an interactive chat session.
  • ollama create <model_name> -f <Modelfile>: Creates a custom model using a Modelfile.
  • ollama show <model_name>: Shows information about a model.
  • ollama list: Lists the models you have downloaded.
  • ollama ps: Lists the models that are currently loaded in memory.
  • ollama stop <model_name>: Unloads a running model from memory.
  • ollama cp <source_model> <destination_model>: Copies a model.
  • ollama rm <model_name>: Deletes a model from your machine.
  • ollama push <model_name>:<tag>: Uploads a model to a model library.

DeepSeek Models on Ollama

DeepSeek models are available on the Ollama library in different sizes and formats. Here's a breakdown:

  • Model Sizes: The models come in various sizes, such as 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b. The 'b' stands for billion parameters. Larger models usually perform better but need more resources.
  • Quantized Versions: Some models are available in quantized versions (e.g., q4_K_M, q8_0). These versions use less memory and can run faster, but may have a slight drop in quality. A rough memory estimate is sketched after this list.
  • Distilled Versions: DeepSeek also offers distilled versions (e.g., qwen-distill, llama-distill). These are smaller models that have been trained to act like the larger ones, balancing performance and resource use.
  • Tags: Each model has a latest tag and specific tags that show the size, quantization, and distillation method.
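
As a rough rule of thumb, the memory needed for a model's weights is the parameter count times the bits per weight, divided by eight. The Python sketch below illustrates the idea; the bits-per-weight values are approximations I'm assuming for fp16, q8_0, and q4_K_M, and actual usage will be higher once you account for the context window and runtime overhead.

    # Back-of-the-envelope memory estimate for model weights at different
    # quantization levels. The bits-per-weight values are approximations,
    # and real usage is higher once the context window and runtime overhead
    # are included.
    BITS_PER_WEIGHT = {
        "fp16": 16.0,
        "q8_0": 8.5,    # roughly 8 bits per weight plus block overhead
        "q4_K_M": 4.8,  # roughly 4-5 bits per weight on average
    }

    def approx_weight_gb(params_billions: float, quant: str) -> float:
        """Approximate size of the weights in gigabytes."""
        bits = BITS_PER_WEIGHT[quant]
        return params_billions * 1e9 * bits / 8 / 1e9

    for size in (7, 14, 32, 70):
        row = ", ".join(
            f"{quant}: ~{approx_weight_gb(size, quant):.0f} GB"
            for quant in BITS_PER_WEIGHT
        )
        print(f"{size}B -> {row}")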

Using DeepSeek Models

Here's how to use DeepSeek models with Ollama:

Pulling a Model

To download a DeepSeek model, use the command:

ollama pull deepseek-r1:<model_tag>

Replace <model_tag> with the specific tag of the model you want to use. For example:

  • To download the latest 7B model:

    ollama pull deepseek-r1:7b
    
  • To download the 14B Qwen-distilled model with q4_K_M quantization:

    ollama pull deepseek-r1:14b-qwen-distill-q4_K_M
    
  • To download the 70B Llama-distilled model with fp16 precision:

    ollama pull deepseek-r1:70b-llama-distill-fp16
    

Here are some of the available tags:

  • latest
  • 1.5b
  • 7b
  • 8b
  • 14b
  • 32b
  • 70b
  • 671b
  • 1.5b-qwen-distill-fp16
  • 1.5b-qwen-distill-q4_K_M
  • 1.5b-qwen-distill-q8_0
  • 14b-qwen-distill-fp16
  • 14b-qwen-distill-q4_K_M
  • 14b-qwen-distill-q8_0
  • 32b-qwen-distill-fp16
  • 32b-qwen-distill-q4_K_M
  • 32b-qwen-distill-q8_0
  • 70b-llama-distill-fp16
  • 70b-llama-distill-q4_K_M
  • 70b-llama-distill-q8_0
  • 7b-qwen-distill-fp16
  • 7b-qwen-distill-q4_K_M
  • 7b-qwen-distill-q8_0
  • 8b-llama-distill-fp16
  • 8b-llama-distill-q4_K_M
  • 8b-llama-distill-q8_0

Running a Model

After downloading a model, you can run it using the command:

ollama run deepseek-r1:<model_tag>

For example:

  • To run the latest 7B model:

    ollama run deepseek-r1:7b
    
  • To run the 14B Qwen-distilled model with q4_K_M quantization:

    ollama run deepseek-r1:14b-qwen-distill-q4_K_M
    
  • To run the 70B Llama-distilled model with fp16 precision:

    ollama run deepseek-r1:70b-llama-distill-fp16
    

This will start an interactive chat session where you can ask the model questions.

Using the API

You can also use the Ollama API with DeepSeek models. Here's an example using curl:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Write a short poem about the stars."
}'

For chat completions:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    {
      "role": "user",
      "content": "Write a short poem about the stars."
    }
  ]
}'
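
Both endpoints stream their responses as newline-delimited JSON by default. If you want to consume that stream from code rather than curl, here's a minimal Python sketch; it assumes you have the requests package installed and the deepseek-r1:7b model already pulled.

    import json
    import requests  # assumes the requests package is installed

    # Stream a chat response from the local Ollama server. The /api/chat
    # endpoint streams newline-delimited JSON objects by default; each chunk
    # carries a piece of the assistant's reply until "done" is true.
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "deepseek-r1:7b",
            "messages": [
                {"role": "user", "content": "Write a short poem about the stars."}
            ],
            "stream": True,
        },
        stream=True,
    )

    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break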

Using the OpenAI-Compatible API

Ollama provides an experimental compatibility layer with parts of the OpenAI API. This allows you to use existing applications and tools designed for OpenAI with your local Ollama server.

Key Concepts

  • API Endpoint: Ollama's OpenAI-compatible API is served at http://localhost:11434/v1.
  • Authentication: Ollama's API doesn't require an API key for local use. You can often use a placeholder like "ollama" for the api_key parameter in your client.
  • Partial Compatibility: Ollama's compatibility is experimental and partial. Not all features of the OpenAI API are supported, and there may be some differences in behavior.
  • Focus on Core Functionality: Ollama primarily aims to support the core functionality of the OpenAI API, such as chat completions, text completions, model listing, and embeddings.

Supported Endpoints and Features

Here's a breakdown of the supported endpoints and their features:

  1. /v1/chat/completions

    • Purpose: Generate chat-style responses.
    • Supported Features:
      • Chat completions (multi-turn conversations).
      • Streaming responses (real-time output).
      • JSON mode (structured JSON output).
      • Reproducible outputs (using a seed).
      • Vision (multimodal models like llava that can process images).
      • Tools (function calling).
    • Supported Request Fields:
      • model: The name of the Ollama model to use.
      • messages: An array of message objects, each with a role (system, user, assistant, or tool) and content (text or image).
      • frequency_penalty, presence_penalty: Controls repetition.
      • response_format: Specifies the output format (e.g. json).
      • seed: For reproducible outputs.
      • stop: Sequences to stop generation.
      • stream: Enables/disables streaming.
      • stream_options: Additional options for streaming.
        • include_usage: Includes usage information in the stream.
      • temperature: Controls randomness.
      • top_p: Controls diversity.
      • max_tokens: Maximum tokens to generate.
      • tools: List of tools the model can access.
  2. /v1/completions

    • Purpose: Generate text completions.
    • Supported Features:
      • Text completions (single-turn generation).
      • Streaming responses.
      • JSON mode.
      • Reproducible outputs.
    • Supported Request Fields:
      • model: The name of the Ollama model.
      • prompt: The input text.
      • frequency_penalty, presence_penalty: Controls repetition.
      • seed: For reproducible outputs.
      • stop: Stop sequences.
      • stream: Enables/disables streaming.
      • stream_options: Additional options for streaming.
        • include_usage: Includes usage information in the stream.
      • temperature: Controls randomness.
      • top_p: Controls diversity.
      • max_tokens: Maximum tokens to generate.
      • suffix: Text to append after the model's response.
  3. /v1/models

    • Purpose: List the models available locally.
  4. /v1/models/{model}

    • Purpose: Retrieve details about a specific model.
  5. /v1/embeddings

    • Purpose: Generate embeddings for text input (using an embedding model such as all-minilm).

How to Use Ollama with OpenAI Clients

Here's how to configure popular OpenAI clients to work with Ollama:

  1. OpenAI Python Library:

    from openai import OpenAI
    
    client = OpenAI(
        base_url='http://localhost:11434/v1/',
        api_key='ollama',  # Required but ignored
    )
    
    # Example chat completion
    chat_completion = client.chat.completions.create(
        messages=[
            {'role': 'user', 'content': 'Say this is a test'},
        ],
        model='deepseek-r1:7b',
    )
    
    # Example text completion
    completion = client.completions.create(
        model="deepseek-r1:7b",
        prompt="Say this is a test",
    )
    
    # Example list models
    list_completion = client.models.list()
    
    # Example get model info
    model = client.models.retrieve("deepseek-r1:7b")
    
  2. OpenAI JavaScript Library:

    import OpenAI from 'openai';
    
    const openai = new OpenAI({
      baseURL: 'http://localhost:11434/v1/',
      apiKey: 'ollama', // Required but ignored
    });
    
    // Example chat completion
    const chatCompletion = await openai.chat.completions.create({
      messages: [{ role: 'user', content: 'Say this is a test' }],
      model: 'deepseek-r1:7b',
    });
    
    // Example text completion
    const completion = await openai.completions.create({
      model: "deepseek-r1:7b",
      prompt: "Say this is a test.",
    });
    
    // Example list models
    const listCompletion = await openai.models.list();
    
    // Example get model info
    const model = await openai.models.retrieve("deepseek-r1:7b");
    
  3. curl (Direct API Calls):

    # Chat completion
    curl http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "deepseek-r1:7b",
            "messages": [
                {
                    "role": "user",
                    "content": "Hello!"
                }
            ]
        }'
    
    # Text completion
    curl http://localhost:11434/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "deepseek-r1:7b",
            "prompt": "Say this is a test"
        }'
    
    # List models
    curl http://localhost:11434/v1/models
    
    # Get model info
    curl http://localhost:11434/v1/models/deepseek-r1:7b
    
    # Embeddings
    curl http://localhost:11434/v1/embeddings \
        -H "Content-Type: application/json" \
        -d '{
            "model": "all-minilm",
            "input": ["why is the sky blue?", "why is the grass green?"]
        }'
    

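Streaming also works through the OpenAI-compatible endpoint. Here's a minimal sketch using the OpenAI Python library configured as above; the prompt is just a placeholder, and it assumes the deepseek-r1:7b model is already pulled.

    from openai import OpenAI

    client = OpenAI(
        base_url='http://localhost:11434/v1/',
        api_key='ollama',  # required by the client but ignored by Ollama
    )

    # Stream tokens as they are generated instead of waiting for the full reply.
    stream = client.chat.completions.create(
        model='deepseek-r1:7b',
        messages=[{'role': 'user', 'content': 'Explain quantization in one paragraph.'}],
        stream=True,
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end='', flush=True)
    print()
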
Choosing the Right Model

When choosing a DeepSeek model, consider these factors:

  • Size: Larger models generally perform better but need more resources. Start with smaller models if you have limited resources.
  • Quantization: Quantized models use less memory but may have slightly lower quality.
  • Distillation: Distilled models offer a good balance between performance and resource usage.

It's best to experiment with different models to see which one works best for you.

Additional Tips

  • Always check the Ollama library for the latest models and tags.
  • Use ollama ps to see which models are currently loaded and how much memory they are using.
  • You can adjust parameters like temperature, top_p, and num_ctx to fine-tune the model's output (see the sketch below).
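
For example, the native API accepts an options object on /api/generate and /api/chat. The Python sketch below is a minimal illustration; the specific values are placeholders, not tuned recommendations, and it assumes the requests package is installed.

    import requests  # assumes the requests package is installed

    # Pass generation options through the native API. The values below are
    # illustrative placeholders, not tuned recommendations.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:7b",
            "prompt": "Summarize why the sky is blue in two sentences.",
            "stream": False,            # return a single JSON object
            "options": {
                "temperature": 0.6,     # lower values make output more deterministic
                "top_p": 0.9,           # nucleus sampling cutoff
                "num_ctx": 4096,        # context window size in tokens
            },
        },
    )
    print(response.json()["response"])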

Troubleshooting

If you have any issues, check the Ollama logs:

  • macOS: ~/.ollama/logs/server.log
  • Linux: journalctl -u ollama --no-pager
  • Windows: %LOCALAPPDATA%\Ollama\server.log

You can also use the OLLAMA_DEBUG=1 environment variable for more detailed logs.

Going Further with LLMs

Of course, running these models locally is just the beginning. You can integrate them into your own applications through the API, build custom tools such as chatbots, create research assistants with Retrieval-Augmented Generation (RAG), and more.

I have written a number of other guides on exploring these models further.

Conclusion

I hope this guide has shown you how easy it is to get started with Ollama and run state-of-the-art language models on your own computer. Remember that you are not limited to DeepSeek models; you can run any model available in the Ollama library, or even pull models directly from other platforms like Hugging Face.

If you have any questions or feedback, please let me know in the comments below.

