The Raspberry Pi has long been the go-to platform for a small, affordable computer that can be used for a wide variety of projects. Ollama, meanwhile, is an open-source project that simplifies getting started with Large Language Models (LLMs), providing pre-built binaries that are easy to install as well as prebuilt models that can be used to answer questions.
In this article, we’ll combine the two by running Ollama on the Raspberry Pi, introducing both along the way and helping remove any barriers to getting started with your own LLM exploration.
Prerequisites
For this example, we’ll be using Ubuntu 24.04 on a Raspberry Pi 5 with 8GB of RAM:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04 LTS
Release: 24.04
Codename: noble
These instructions will work with Raspberry Pi OS as well.
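Since the model is loaded entirely into RAM on a CPU-only machine like this, it’s worth confirming how much memory is actually available before going further. A quick check (your numbers will differ):

$ free -h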
Installation
Install dependencies:
sudo apt install curl git
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.
Add your user to the ollama group:
sudo usermod -aG ollama <username>
newgrp ollama
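At this point the Ollama service should already be running under systemd. As a quick sanity check, assuming the default install that listens on 127.0.0.1:11434, confirm the version and that the API is responding:

$ ollama --version
$ curl http://127.0.0.1:11434/

The curl command should return a short message indicating that Ollama is running; if it doesn’t, check the service with systemctl status ollama.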
Install Models
List the currently installed models:
$ ollama list
NAME ID SIZE MODIFIED
You can see that we currently don’t have any downloaded.
Install and run the llama3.2:1b model:
$ ollama run llama3.2:1b --verbose
pulling manifest
pulling 74701a8c35f6... 100% ▕████████████████████████████████▏ 1.3 GB
pulling 966de95ca8a6... 100% ▕████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████████████████████▏ 6.0 KB
pulling 4f659a1e86d7... 100% ▕████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
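Note that ollama run downloads the model if necessary and then drops you straight into an interactive prompt. If you’d rather just fetch a model, for example to stage several of them ahead of time, you can pull it without starting a chat:

$ ollama pull llama3.2:1b
$ ollama list

After the pull completes, ollama list will show the model along with its size on disk.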
Test Drive Ollama
Ask a question:
>>> why is water wet?
Water is wet because of its unique properties and the way it interacts with our senses. Here's why:

1. **Surface tension**: Water has a natural surface tension that allows it to resist external forces, such as gravity or air pressure, when it's at the surface. This means that water can maintain its shape and keep itself upright in a container, which is why it feels wet.
2. **Adhesion**: Water molecules have a strong attraction to each other, which helps them stick together and form a cohesive layer on surfaces. This adhesion creates a "skin" of sorts around the surface, making it feel wet and slippery.
3. **Sensory perception**: When we touch or put our skin in contact with water, our nerve endings detect changes in temperature, pressure, and texture, which triggers a sensation of wetness. The sensation is often described as cool, refreshing, and invigorating.
4. **Evaporation**: As water evaporates from the surface, it leaves behind a residue that can feel wet to our skin. This process helps to distribute heat evenly across the skin, making us feel cooler when we're wet.

So, in summary, water is wet because of its unique combination of properties, such as surface tension, adhesion, and sensory perception. These factors work together to create an experience of wetness that's both enjoyable and essential for human survival.

total duration:       37.769620479s
load duration:        35.034502ms
prompt eval count:    30 token(s)
prompt eval duration: 1.542s
prompt eval rate:     19.46 tokens/s
eval count:           281 token(s)
eval duration:        36.191s
eval rate:            7.76 tokens/s

>>> Send a message (/? for help)
NOTE: By their very nature, LLMs generate each response anew, so while the answers should be factually consistent, the exact wording varies from run to run; i.e., the output in the example above won’t necessarily be the exact output you see when you ask the same question. This is normal behavior.
Notice the statistics at the end of the output. These appear because we included --verbose on the command line when starting Ollama.

Exit Ollama by entering /bye or pressing Ctrl-D.
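The interactive prompt isn’t the only way to talk to the model. The same service exposes the HTTP API mentioned during installation (127.0.0.1:11434), so as a rough sketch you can send a one-off prompt with curl, with the reply coming back as a single JSON object:

$ curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "why is water wet?",
  "stream": false
}'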
Ollama Maintenance
With the Ollama server running and a few questions run through it, we can look at some statistics for the server process using this command:
$ systemctl status ollama
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-01-01 07:01:41 CST; 3min 53s ago
Main PID: 34167 (ollama)
Tasks: 17 (limit: 9063)
Memory: 1.5G (peak: 1.5G)
CPU: 2min 4.806s
CGroup: /system.slice/ollama.service
├─34167 /usr/local/bin/ollama serve
└─34185 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --threads 4 --n>
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: CPU output buffer size = 1.99 MiB
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: CPU compute buffer size = 544.01 MiB
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: graph nodes = 518
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: graph splits = 1
Jan 01 07:01:46 rpi58gb2 ollama[34167]: time=2025-01-01T07:01:46.495-06:00 level=INFO source=server.go:594 msg="llama runner started in 2.51 seconds"
Jan 01 07:01:46 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:01:46 | 200 | 2.600923681s | 127.0.0.1 | POST "/api/generate"
Jan 01 07:02:19 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:02:19 | 200 | 31.07531176s | 127.0.0.1 | POST "/api/chat"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 29.963µs | 127.0.0.1 | HEAD "/"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 31.033856ms | 127.0.0.1 | POST "/api/show"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 31.687482ms | 127.0.0.1 | POST "/api/generate"
Here, we see the following:
The server has been up for 3min 53s.
Its peak memory usage is 1.5GB.
The amount of CPU time consumed is 2min 4.806s.
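systemctl shows the service as a whole; Ollama can also report which models are currently loaded into memory:

$ ollama ps

This lists each loaded model along with its size and how much longer it will remain resident; by default a model stays in memory for a few minutes after its last request.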
Also notice this line from the systemctl status output:
34185 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 ...
Here we can see where our model files are located on the filesystem:
/usr/share/ollama/.ollama/models
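To get a feel for how much disk space the downloaded models occupy, you can check that directory directly (the path may differ if you installed Ollama another way, and you may need sudo to read it):

$ sudo du -sh /usr/share/ollama/.ollama/models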
Given the size of these model files, it’s often desirable to copy them from an LLM server on one host to a different LLM server running on another host. Rather than pulling them from the Internet multiple times, we can simply copy the relevant files from one host to the next.
In the next article, we’ll explore just how to do this. Stay tuned!