The Raspberry Pi has long been the go-to platform for a small, affordable computer that can be used for a wide variety of projects. Ollama, meanwhile, is an open-source project that simplifies getting started with Large Language Models (LLMs), providing pre-built binaries that are easy to install as well as prebuilt models that can be used to answer questions.
In this article, we’ll combine the two by running Ollama on the Raspberry Pi, introducing both along the way and helping remove any barriers to getting started with your own LLM exploration.
Prerequisites
For this example, we’ll be using Ubuntu 24.04 on a Raspberry Pi 5 with 8GB of RAM:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04 LTS
Release: 24.04
Codename: noble
These instructions will work with Raspberry Pi OS as well.
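Since the model is loaded entirely into RAM on a CPU-only machine like this, it’s worth confirming how much memory is actually available before going further. A quick check (your numbers will differ):

$ free -h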
Installation
Install dependencies:
sudo apt install curl git
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux arm64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.
Add your user to the ollama group:
sudo usermod -aG ollama <username>
newgrp ollama
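At this point the Ollama service should already be running under systemd. As a quick sanity check, assuming the default install that listens on 127.0.0.1:11434, confirm the version and that the API is responding:

$ ollama --version
$ curl http://127.0.0.1:11434/

The curl command should return a short message indicating that Ollama is running; if it doesn’t, check the service with systemctl status ollama.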
Install Models
List the currently installed models:
$ ollama list
NAME ID SIZE MODIFIED
You can see that we currently don’t have any downloaded.
Install and run the llama3.2:1b model:
$ ollama run llama3.2:1b --verbose
pulling manifest
pulling 74701a8c35f6... 100% ▕████████████████████████████████▏ 1.3 GB
pulling 966de95ca8a6... 100% ▕████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████████████████████▏ 6.0 KB
pulling 4f659a1e86d7... 100% ▕████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
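Note that ollama run downloads the model if necessary and then drops you straight into an interactive prompt. If you’d rather just fetch a model, for example to stage several of them ahead of time, you can pull it without starting a chat:

$ ollama pull llama3.2:1b
$ ollama list

After the pull completes, ollama list will show the model along with its size on disk.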
Test Drive Ollama
Ask a question:
>>> why is water wet?
Water is wet because of its unique properties and the way it interacts with our senses. Here's why:

1. **Surface tension**: Water has a natural surface tension that allows it to resist external forces, such as gravity or air pressure, when it's at the surface. This means that water can maintain its shape and keep itself upright in a container, which is why it feels wet.
2. **Adhesion**: Water molecules have a strong attraction to each other, which helps them stick together and form a cohesive layer on surfaces. This adhesion creates a "skin" of sorts around the surface, making it feel wet and slippery.
3. **Sensory perception**: When we touch or put our skin in contact with water, our nerve endings detect changes in temperature, pressure, and texture, which triggers a sensation of wetness. The sensation is often described as cool, refreshing, and invigorating.
4. **Evaporation**: As water evaporates from the surface, it leaves behind a residue that can feel wet to our skin. This process helps to distribute heat evenly across the skin, making us feel cooler when we're wet.

So, in summary, water is wet because of its unique combination of properties, such as surface tension, adhesion, and sensory perception. These factors work together to create an experience of wetness that's both enjoyable and essential for human survival.

total duration:       37.769620479s
load duration:        35.034502ms
prompt eval count:    30 token(s)
prompt eval duration: 1.542s
prompt eval rate:     19.46 tokens/s
eval count:           281 token(s)
eval duration:        36.191s
eval rate:            7.76 tokens/s

>>> Send a message (/? for help)
NOTE: By their very nature, LLMs generate each response anew, so while the answers should be factually consistent, the exact wording varies from run to run; i.e., the output in the example above won’t necessarily be the exact output you see when you ask the same question. This is normal behavior.
Notice the statistics at the end of the output. These appear because we included --verbose on the command line when starting Ollama.

Exit Ollama by entering /bye or pressing Ctrl-D.
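The interactive prompt isn’t the only way to talk to the model. The same service exposes the HTTP API mentioned during installation (127.0.0.1:11434), so as a rough sketch you can send a one-off prompt with curl, with the reply coming back as a single JSON object:

$ curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "why is water wet?",
  "stream": false
}'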
Ollama Maintenance
With the Ollama server running and a few questions run through it, we can look at some statistics for the server process using this command:
$ systemctl status ollama
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-01-01 07:01:41 CST; 3min 53s ago
Main PID: 34167 (ollama)
Tasks: 17 (limit: 9063)
Memory: 1.5G (peak: 1.5G)
CPU: 2min 4.806s
CGroup: /system.slice/ollama.service
├─34167 /usr/local/bin/ollama serve
└─34185 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --threads 4 --n>
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: CPU output buffer size = 1.99 MiB
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: CPU compute buffer size = 544.01 MiB
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: graph nodes = 518
Jan 01 07:01:46 rpi58gb2 ollama[34167]: llama_new_context_with_model: graph splits = 1
Jan 01 07:01:46 rpi58gb2 ollama[34167]: time=2025-01-01T07:01:46.495-06:00 level=INFO source=server.go:594 msg="llama runner started in 2.51 seconds"
Jan 01 07:01:46 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:01:46 | 200 | 2.600923681s | 127.0.0.1 | POST "/api/generate"
Jan 01 07:02:19 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:02:19 | 200 | 31.07531176s | 127.0.0.1 | POST "/api/chat"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 29.963µs | 127.0.0.1 | HEAD "/"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 31.033856ms | 127.0.0.1 | POST "/api/show"
Jan 01 07:04:33 rpi58gb2 ollama[34167]: [GIN] 2025/01/01 - 07:04:33 | 200 | 31.687482ms | 127.0.0.1 | POST "/api/generate"
Here, we see the following:
The server has been up for 3min 53s.
Its peak memory usage is 1.5GB.
The amount of CPU time consumed is 2min 4.806s.
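systemctl shows the service as a whole; Ollama can also report which models are currently loaded into memory:

$ ollama ps

This lists each loaded model along with its size and how much longer it will remain resident; by default a model stays in memory for a few minutes after its last request.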
Also notice this line from the systemctl status output:
34185 /usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 ...
Here we can see where our model files are located on the filesystem:
/usr/share/ollama/.ollama/models
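To get a feel for how much disk space the downloaded models occupy, you can check that directory directly (the path may differ if you installed Ollama another way, and you may need sudo to read it):

$ sudo du -sh /usr/share/ollama/.ollama/models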
Given the size of these model files, it’s often desirable to copy them from an LLM server on one host to a different LLM server running on another host. Rather than pulling them from the Internet multiple times, we can simply copy the relevant files from one host to the next.
In the next article, we’ll explore just how to do this. Stay tuned!