ThemeliOS

ThemeliOS (from Greek θεμέλιο — “foundation”) is an experimental capability-based microkernel operating system written in Rust. It is designed from the ground up to do one thing well: run container workloads securely.

What is ThemeliOS?

ThemeliOS is a from-scratch kernel — it does not use or build on top of Linux. It implements its own memory management, process scheduling, inter-process communication, and security model.

The long-term vision is a minimal, immutable OS that:

  • Boots on virtual machines and bare metal
  • Runs OCI-compatible container images
  • Serves as a Kubernetes/K3s worker node
  • Provides hardware-enforced isolation between containers via capabilities
  • Has no SSH, no shell, and no way to “log in” — all management is via API

Why build a new kernel?

Existing container OSes (Bottlerocket, Talos Linux, Flatcar) all use the Linux kernel with a stripped-down userspace. This is practical, but it inherits Linux’s security model — namespaces and cgroups are opt-in isolation bolted onto a kernel designed for general-purpose computing.

ThemeliOS takes the opposite approach: isolation is the default. The capability-based security model means a process has zero access to anything unless explicitly granted. There’s nothing to escape from because there’s no ambient authority to escalate to.

Project status

ThemeliOS is in early development. See the Milestones page for the current roadmap.

License

MIT — Copyright (c) 2026 Rudi MK

Development Setup

This guide walks through setting up a development environment for ThemeliOS on macOS or Linux.

Prerequisites

1. Rust nightly toolchain

ThemeliOS requires Rust nightly because the build relies on unstable features, most notably -Zbuild-std to rebuild core for the bare-metal targets; the kernel itself is #![no_std]/#![no_main] and uses inline assembly and a custom allocator.

The project pins the exact toolchain via rust-toolchain.toml, so you just need rustup installed — it will automatically download the correct nightly version.

Install rustup (if you don’t have it):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

After cloning the repo, the first cargo command will automatically install the pinned nightly toolchain plus the bare-metal targets (x86_64-unknown-none, aarch64-unknown-none).
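For reference, that behavior is driven by a rust-toolchain.toml along these lines (illustrative only; the repo pins its own exact nightly date and component set):

[toolchain]
channel = "nightly-2025-01-01"   # placeholder; the repo pins a specific nightly
components = ["rust-src"]        # rust-src is required by -Zbuild-std to rebuild core
targets = ["x86_64-unknown-none", "aarch64-unknown-none"]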

You can verify with:

rustup show

You should see a nightly toolchain with the x86_64-unknown-none and aarch64-unknown-none targets listed.

2. QEMU

QEMU emulates the hardware that ThemeliOS runs on. You need qemu-system-x86_64 for the primary amd64 target and optionally qemu-system-aarch64 for arm64.

macOS (Homebrew):

brew install qemu

This installs all QEMU system emulators.

Ubuntu/Debian:

sudo apt install qemu-system-x86 qemu-system-arm

Fedora:

sudo dnf install qemu-system-x86 qemu-system-aarch64

Arch Linux:

sudo pacman -S qemu-full

Verify installation:

qemu-system-x86_64 --version
qemu-system-aarch64 --version

3. xorriso

xorriso creates bootable ISO images. The build pipeline uses it to package the kernel with the Limine bootloader into a hybrid BIOS+UEFI ISO.

macOS (Homebrew):

brew install xorriso

Ubuntu/Debian:

sudo apt install xorriso

Fedora:

sudo dnf install xorriso

4. C compiler (for Limine CLI tool)

The first cargo xtask run downloads and builds the Limine bootloader’s CLI tool, which is a small C program. This requires a C compiler.

  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Linux: gcc or clang (usually pre-installed)

5. mdbook (optional, for building documentation)

cargo install mdbook

Building and running

All build and run commands go through the xtask tool. You never need to invoke cargo build for the kernel directly.

Build the kernel

cargo xtask build

This cross-compiles the kernel for x86_64-unknown-none (the default target).

For arm64:

cargo xtask build --arch arm64

Run in QEMU

cargo xtask run

This builds the kernel, creates a bootable ISO, and launches it in QEMU in headless mode — serial output is piped to your terminal, but no graphical window opens. Press Ctrl+A, X to exit QEMU.

For arm64 (not yet implemented):

cargo xtask run --arch arm64

Build ISO only (without launching QEMU)

cargo xtask iso

This builds the kernel and creates a bootable ISO at target/themelios.iso without launching QEMU. Useful when you want to run QEMU manually with custom flags.
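For example, a manual invocation that mirrors the headless default might look like this (illustrative flags; adjust to taste):

qemu-system-x86_64 -cdrom target/themelios.iso -serial stdio -display none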

Run with QEMU display window

To see the QEMU graphical window (shows the Limine bootloader screen and any framebuffer output):

cargo xtask run --display

This does everything cargo xtask run does but opens a QEMU window instead of running headless. Serial output still goes to your terminal.

Build documentation

cargo xtask docs

This builds both the mdbook (to docs/book/) and the rustdoc API docs.

Shorthand alias

The workspace defines a cargo xt alias, so these also work:

cargo xt build
cargo xt run
cargo xt docs

Project layout

themelios/
├── kernel/          # The kernel crate (#![no_std], bare-metal)
│   └── src/
│       ├── main.rs  # Kernel entry point, module declarations
│       ├── arch/    # Architecture-specific (x86_64, aarch64)
│       ├── mm/      # Memory management
│       ├── sched/   # Scheduler
│       ├── cap/     # Capability system
│       ├── ipc/     # Inter-process communication
│       ├── drivers/ # Device drivers (VirtIO, serial, etc.)
│       ├── fs/      # Filesystem
│       └── net/     # Networking
├── xtask/           # Build tooling (runs on host)
├── docs/            # mdbook documentation
├── .cargo/          # Cargo configuration
└── CLAUDE.md        # Project documentation for AI assistants

IDE setup

VS Code

Install the rust-analyzer extension. It should pick up the workspace configuration automatically.

If rust-analyzer struggles with the #![no_std] kernel crate, you may need to add this to .vscode/settings.json:

{
    "rust-analyzer.cargo.target": "x86_64-unknown-none",
    "rust-analyzer.cargo.buildScripts.enable": true
}

Other editors

Any editor with rust-analyzer LSP support should work. The key setting is ensuring the target is set to x86_64-unknown-none for the kernel crate.

Troubleshooting

“can’t find crate for core”

This means the bare-metal target isn’t installed. Run:

rustup target add x86_64-unknown-none aarch64-unknown-none

Or let rust-toolchain.toml handle it by running any cargo command in the project.

“error: -Zbuild-std is unstable”

You need to be on the nightly toolchain. Check with rustup show — the project’s rust-toolchain.toml should select nightly automatically.

QEMU not found

Make sure QEMU is installed and on your $PATH. See the QEMU installation section above.

Bootloader

ThemeliOS uses the Limine bootloader. This page explains why, how it works, and how it fits into the build pipeline.

Why Limine?

We evaluated several options for booting ThemeliOS:

Option             Pros                                           Cons
Custom UEFI app    Full control                                   Massive effort, x86_64 UEFI only initially
Multiboot2         Simple, QEMU -kernel flag                      BIOS only, no arm64, no UEFI
bootloader crate   Very easy Rust integration                     x86_64 only, no arm64
Limine             BIOS + UEFI, x86_64 + arm64, well-maintained   External dependency

Limine was chosen because:

  1. Multi-architecture: Supports x86_64 and aarch64 (and RISC-V, LoongArch). We need both for our cloud targets.
  2. Multi-firmware: Works on both BIOS (legacy) and UEFI (modern). Cloud platforms use UEFI; QEMU defaults to BIOS.
  3. Higher-half kernel: Limine sets up page tables that map our kernel at 0xffffffff80000000, which is the standard layout for 64-bit kernels.
  4. Clean protocol: The Limine boot protocol gives us a memory map, framebuffer, and other boot info without writing any assembly.
  5. Active maintenance: Regular releases, good documentation.

Cloud compatibility

Limine’s UEFI support means ThemeliOS can boot on:

  • AWS EC2 (Nitro): UEFI supported on most instance types
  • GCP Compute Engine: UEFI supported
  • Azure Gen2 VMs: UEFI
  • Bare metal: UEFI is standard on modern server hardware
  • QEMU/KVM: Both BIOS (default) and UEFI (via OVMF)

The same kernel binary works on all platforms — only the bootloader firmware interface differs, and Limine handles that.

How it works

Boot sequence

  1. Firmware (BIOS or UEFI) loads the Limine bootloader from the boot media
  2. Limine reads limine.conf to find the kernel path and boot protocol
  3. Limine loads the kernel ELF into memory at the addresses specified in the linker script
  4. Limine sets up:
    • 64-bit long mode (x86_64) or EL1 (aarch64)
    • 4-level page tables with identity + higher-half mappings
    • A valid stack
  5. Limine scans the kernel’s .requests ELF section for boot protocol requests
  6. Limine fills in the requests (memory map, framebuffer, etc.)
  7. Limine jumps to the kernel entry point (kmain)

Boot protocol requests

The kernel communicates with Limine through static data structures placed in a special ELF section. These are “requests” — the kernel declares what boot information it needs, and Limine fills in the responses.

use limine::BaseRevision;

// Placed in the .requests ELF section via the linker script
#[used]
#[link_section = ".requests"]
static BASE_REVISION: BaseRevision = BaseRevision::new();
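
Other boot information is requested the same way. For example, a framebuffer request, with names as in the limine 0.5 crate (verify the exact constructors against the pinned version's docs):

use limine::request::FramebufferRequest;

#[used]
#[link_section = ".requests"]
static FRAMEBUFFER_REQUEST: FramebufferRequest = FramebufferRequest::new();

fn framebuffer_info() {
    // get_response() returns Some only after Limine has filled the request.
    if let Some(fb) = FRAMEBUFFER_REQUEST
        .get_response()
        .and_then(|response| response.framebuffers().next())
    {
        let _ = (fb.addr(), fb.width(), fb.height()); // base pointer and dimensions
    }
}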

The linker script places these between start/end markers so Limine knows where to scan:

.data : {
    ...
    KEEP(*(.requests_start_marker))
    KEEP(*(.requests))
    KEEP(*(.requests_end_marker))
}

Configuration file

limine.conf (in the project root) uses the v8 format:

timeout: 0

/ThemeliOS
    protocol: limine
    kernel_path: boot():/boot/themelios

  • timeout: 0 — boot immediately without showing a menu
  • /ThemeliOS — defines a boot entry
  • protocol: limine — use the Limine protocol (not Linux or Multiboot)
  • kernel_path: boot():/boot/themelios — load the kernel from the boot volume

Linker script

The linker script (kernel/linker-x86_64.ld) controls the kernel’s memory layout:

  • Entry point: ENTRY(kmain) — tells the ELF where execution begins
  • Load address: 0xffffffff80000000 — the higher-half virtual address
  • Sections: .text (code), .rodata (constants), .data (mutable data + Limine requests), .bss (zeroed data)

The kernel must be compiled with -Crelocation-model=static to produce a non-PIE executable with fixed addresses that match the linker script.

Build pipeline

The cargo xtask run command handles the full pipeline:

  1. Cross-compile the kernel for x86_64-unknown-none
  2. Download Limine (one-time: git clone of the v8.x-binary branch to target/limine/)
  3. Build Limine CLI (one-time: make compiles limine.c)
  4. Create ISO via xorriso:
    • Copies kernel, Limine files, and limine.conf into an ISO directory structure
    • Creates a hybrid BIOS+UEFI bootable ISO
    • Installs BIOS boot sectors via limine bios-install
  5. Launch QEMU with the ISO attached as a CD-ROM

Limine version

  • Bootloader: v8.x (binary distribution from v8.x-binary branch)
  • Rust crate: limine = "0.5" (boot protocol structures)

The bootloader binaries are cached in target/limine/ and not committed to git.

Architecture Overview

ThemeliOS is a capability-based microkernel. This page explains the high-level design and the reasoning behind key architectural decisions.

Microkernel vs monolithic

In a monolithic kernel (like Linux), drivers, filesystems, and networking all run inside the kernel with full hardware access. A bug in any driver can crash or compromise the entire system.

In a microkernel, only the absolute minimum runs in kernel space:

Kernel space             Userspace
Memory management        Device drivers
Process scheduling       Filesystem
IPC (message passing)    Network stack
Capability enforcement   Container runtime
                         Management API

Everything else runs as isolated userspace processes that communicate via IPC. A buggy driver crashes its own process, not the kernel.

Why microkernel for ThemeliOS? Since we’re building an OS specifically for running untrusted container workloads, minimizing the trusted computing base (the code that can compromise the whole system) is critical. The smaller the kernel, the smaller the attack surface.

Capability-based security

ThemeliOS does not use Linux-style permissions (UID/GID, filesystem permissions) or Linux-style isolation (namespaces, cgroups). Instead, it uses capabilities.

What is a capability?

A capability is an unforgeable token that grants its holder specific permissions on a specific resource. For example:

  • “Read and write to memory region 0x1000–0x2000”
  • “Send messages to IPC endpoint #42”
  • “Access VirtIO block device at MMIO address 0xFE00”

Key properties

  1. No ambient authority: A newly created process has zero capabilities. It can’t do anything until its parent grants it capabilities.

  2. Unforgeable: Capabilities are managed by the kernel. Userspace can’t create them or guess valid ones.

  3. Transferable: Capabilities can be passed between processes via IPC, enabling controlled delegation.

  4. Revocable: A capability can be revoked, immediately cutting off access.

Why not namespaces?

Linux namespaces are “isolation after the fact” — processes start with broad access and namespaces restrict what they can see. Capabilities are “isolation by default” — processes start with nothing and are explicitly granted only what they need.

For a container OS, this means a compromised container literally cannot access resources it wasn’t given capabilities for. There’s no kernel interface to probe, no /proc to read, no syscall to escalate through — the authority simply doesn’t exist.

Inspiration

  • seL4: Formally verified capability microkernel. ThemeliOS borrows its capability model.
  • Fuchsia/Zircon: Google’s capability-based OS. Demonstrates the model works at scale.

Memory model

ThemeliOS uses hardware-enforced memory isolation:

  • Each process runs in its own virtual address space (page tables enforced by the MMU).
  • The kernel has its own address space that userspace cannot access.
  • Shared memory between processes requires explicit capabilities from both sides.

Physical memory management

A frame allocator tracks free physical memory pages (4 KiB). Frames are allocated to:

  • Process page tables
  • Kernel heap
  • Shared memory regions
  • DMA buffers for device drivers

Virtual memory layout

The virtual address space layout will be defined per-architecture, but the general structure is:

0x0000_0000_0000_0000  ┌───────────────────────┐
                       │  Userspace            │
                       │  (per-process)        │
0x0000_7FFF_FFFF_FFFF  └───────────────────────┘
                       ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                         Non-canonical hole
                       └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
0xFFFF_8000_0000_0000  ┌───────────────────────┐
                       │  Kernel space         │
                       │  (shared, all procs)  │
0xFFFF_FFFF_FFFF_FFFF  └───────────────────────┘

(This is the x86_64 layout; aarch64 is similar but with different conventions.)

IPC

Inter-process communication is the backbone of the microkernel. Since drivers, filesystems, and networking all run in userspace, every system operation involves IPC.

Synchronous message passing

The primary mechanism: a client sends a message to a server and blocks until it gets a reply. This is used for request/response patterns like “read this file” or “send this network packet.”
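
A sketch of what this surface could look like, assuming capability slots are plain indices into the caller's CSpace (a hypothetical API, not the final design):

type CapSlot = usize;

#[derive(Debug)]
enum IpcError {
    InvalidSlot, // slot empty or out of range
    Revoked,     // the endpoint capability was revoked mid-call
}

/// A message is deliberately tiny: a few register-sized words plus an
/// optional capability to transfer. Bulk data travels via shared memory.
struct Message {
    data: [u64; 4],
    cap: Option<CapSlot>,
}

trait SyncIpc {
    /// Client side: send `msg` to the endpoint in `slot`, block for the reply.
    fn call(&self, slot: CapSlot, msg: Message) -> Result<Message, IpcError>;
    /// Server side: block until a request arrives on the endpoint in `slot`.
    fn recv(&self, slot: CapSlot) -> Result<Message, IpcError>;
}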

Performance consideration

IPC overhead is the classic criticism of microkernels. ThemeliOS will address this by:

  • Keeping messages small (pointers to shared memory for bulk data)
  • Using register-based fast-path for small messages
  • Careful cache-aware scheduling of communicating processes

Immutability

The OS root filesystem is read-only. The entire OS image is a single artifact that is booted as-is.

  • Updates: Swap the entire image. No package managers, no apt-get, no partial updates.
  • Configuration: Injected at boot time via cloud-init-style metadata or the management API.
  • Ephemeral state: Container images and runtime state live on a RAM-backed ephemeral layer that is lost on reboot.

This model treats nodes as cattle: if a node is unhealthy, replace it with a fresh one. No debugging on the node, no SSHing in, no manual fixes.

Target platforms

ThemeliOS is designed to run as a virtual machine, with bare-metal support as a secondary goal.

Platform                 Status                 Notes
QEMU/KVM (x86_64)        Primary dev target     Used for all development and testing
QEMU (aarch64)           Secondary dev target   ARM64 support
AWS (EC2)                Future                 Nitro hypervisor
GCP (Compute Engine)     Future                 KVM-based
Azure (VMs)              Future                 Hyper-V
Bare metal (headless)    Future                 Server hardware, no GPU/display

Capability System

This document details the design of ThemeliOS’s capability system — the core security mechanism of the kernel.

Status: Design phase. Implementation begins in Phase 2.

Overview

In ThemeliOS, every resource is accessed through capabilities. A capability is a kernel-managed, unforgeable token that encodes:

  1. Which resource (identified by a kernel object ID)
  2. What operations are permitted (a bitmask of rights)

Capability types

Capability type   Resource                        Example rights
MemoryCap         Physical memory region          Read, Write, Execute, Map
EndpointCap       IPC endpoint                    Send, Receive
ThreadCap         Thread/process                  Start, Stop, Suspend, Resume
DeviceCap         Hardware device (MMIO region)   Read, Write
IRQCap            Interrupt line                  Acknowledge, Bind

Capability spaces

Each process has a capability space (CSpace) — a table mapping local capability slots to kernel objects. A process refers to its capabilities by slot index, not by object ID. The kernel translates slot indices to objects on each syscall.

Process A's CSpace:
  Slot 0 → MemoryCap(region=0x1000, rights=RW)
  Slot 1 → EndpointCap(endpoint=#7, rights=Send)
  Slot 2 → (empty)
  Slot 3 → ThreadCap(thread=#12, rights=Start|Stop)

Process B's CSpace:
  Slot 0 → EndpointCap(endpoint=#7, rights=Receive)
  Slot 1 → MemoryCap(region=0x2000, rights=R)

Process A can send to endpoint #7 (slot 1), and Process B can receive from it (slot 0). Neither can access the other’s memory — they’d need explicit capabilities for that.
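
In code, the structures above might look roughly like this (a design-phase sketch, assuming the bitflags crate for rights masks; none of these names are final):

use bitflags::bitflags;

bitflags! {
    #[derive(Clone, Copy)]
    pub struct Rights: u32 {
        const READ    = 1 << 0;
        const WRITE   = 1 << 1;
        const EXECUTE = 1 << 2;
        const SEND    = 1 << 3;
        const RECEIVE = 1 << 4;
    }
}

type KernelObjectId = u32;

/// A kernel-side capability: which object, with which rights.
#[derive(Clone, Copy)]
struct Capability {
    object: KernelObjectId,
    rights: Rights,
}

/// Per-process capability space: slot index → capability.
struct CSpace {
    slots: [Option<Capability>; 64], // fixed-size table, enough for a sketch
}

impl CSpace {
    /// The translation the kernel performs on every syscall: slot → object.
    /// An empty or out-of-range slot simply doesn't resolve.
    fn lookup(&self, slot: usize) -> Option<Capability> {
        self.slots.get(slot).copied().flatten()
    }
}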

Capability operations

Grant

A parent process can grant a capability to a child process, optionally with reduced rights:

Parent has: MemoryCap(region=X, rights=RWX)
Parent grants child: MemoryCap(region=X, rights=R)

The child gets read-only access. Rights can only be reduced, never elevated.
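
Continuing the sketch above, the reduction rule is just bitwise intersection:

impl Capability {
    /// Derive a capability for a child with at most the parent's rights:
    /// the intersection can never contain a right the parent lacks.
    fn derive(&self, requested: Rights) -> Capability {
        Capability {
            object: self.object,
            rights: self.rights & requested,
        }
    }
}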

Transfer via IPC

Capabilities can be attached to IPC messages. This is how services delegate access:

FileServer receives "open /config" request
FileServer replies with MemoryCap(region=file_data, rights=R)
Client now has read access to the file's memory region

Revoke

The kernel (or a process with the appropriate meta-capability) can revoke a capability, immediately invalidating it. Any future use of the revoked slot returns an error.

Container mapping

In ThemeliOS, a “container” is a group of processes sharing a common set of capabilities. The container’s capability set defines its sandbox:

  • Memory: Only the memory regions granted to it
  • Network: Only the network endpoints it has capabilities for
  • Filesystem: Only the filesystem views it’s been granted
  • IPC: Only the services it has endpoint capabilities for

A container cannot discover or access anything outside its capability set. Unlike Linux containers (where a kernel exploit can escape the namespace), escaping a capability sandbox requires forging a kernel object — which is impossible without a kernel memory corruption bug.

Comparison with Linux isolation

Aspect               Linux (namespaces/cgroups)                ThemeliOS (capabilities)
Default              Access everything, restrict selectively   Access nothing, grant explicitly
Enforcement          Kernel checks on each syscall             No syscall exists without capability
Escape risk          Kernel bugs can bypass namespaces         Requires kernel memory corruption
Resource discovery   Can probe for resources                   Can’t even address unknown resources
Granularity          Per-namespace                             Per-object, per-right

Memory Management

This document describes ThemeliOS’s memory management subsystem design.

Status: Design phase. Implementation begins in Phase 1.

Overview

The memory management (MM) subsystem is responsible for:

  1. Physical frame allocation — tracking which 4 KiB pages of physical RAM are free or in use
  2. Virtual memory — creating and managing page tables for each process
  3. Kernel heap — providing dynamic allocation (alloc-style) for kernel data structures

Physical memory

Boot-time discovery

The bootloader provides a memory map describing which physical address ranges are usable RAM, reserved by firmware, or used for MMIO. The frame allocator uses this map to initialize its free list.

Frame allocator

The frame allocator hands out 4 KiB physical memory frames. Initial implementation will use a bitmap allocator:

  • One bit per physical frame (1 = allocated, 0 = free)
  • Simple, predictable, easy to implement
  • For 4 GiB of RAM: bitmap is 128 KiB (manageable)
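
A minimal sketch of such a bitmap allocator (illustrative only; the real one must also seed itself from the boot memory map and skip reserved regions):

const FRAME_SIZE: usize = 4096;

struct BitmapFrameAllocator {
    bitmap: &'static mut [u8], // 1 bit per frame: 1 = allocated, 0 = free
}

impl BitmapFrameAllocator {
    /// Linear scan for a free frame; returns its physical address.
    fn alloc(&mut self) -> Option<usize> {
        for (byte_idx, byte) in self.bitmap.iter_mut().enumerate() {
            if *byte != 0xFF {
                let bit = (!*byte).trailing_zeros() as usize; // first 0 bit
                *byte |= 1 << bit;
                return Some((byte_idx * 8 + bit) * FRAME_SIZE);
            }
        }
        None // out of physical memory
    }

    fn free(&mut self, phys_addr: usize) {
        let frame = phys_addr / FRAME_SIZE;
        self.bitmap[frame / 8] &= !(1 << (frame % 8));
    }
}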

Later optimization: replace with a buddy allocator for efficient allocation of contiguous multi-frame regions (needed for DMA buffers, large pages).

Capability integration

Physical frames are resources protected by capabilities. When a process requests memory:

  1. Kernel allocates a frame from the free pool
  2. Kernel creates a MemoryCap for that frame
  3. Kernel inserts the capability into the process’s CSpace
  4. Process can now map the frame into its address space using the capability

A process cannot access physical memory it doesn’t have a capability for — the page tables are configured to reflect capability permissions.

Virtual memory

Address space layout (x86_64)

 Lower half (user space, per-process):
   0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF

 Upper half (kernel space, shared across all processes):
   0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF
     ├── Physical memory direct map
     ├── Kernel code and data
     ├── Kernel heap
     └── Per-CPU data

Page tables

x86_64 uses 4-level page tables (PML4 → PDPT → PD → PT), each with 512 entries. Each entry is 8 bytes and can point to:

  • The next level table
  • A large page (2 MiB at PD level, 1 GiB at PDPT level)
  • A 4 KiB page (at PT level)
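
Concretely, the hardware slices a virtual address into one 9-bit index per level plus a 12-bit page offset:

/// Split an x86_64 virtual address into its four table indices and offset.
fn table_indices(vaddr: u64) -> (usize, usize, usize, usize, usize) {
    let offset = (vaddr & 0xFFF) as usize;         // bits 0..12
    let pt     = ((vaddr >> 12) & 0x1FF) as usize; // bits 12..21
    let pd     = ((vaddr >> 21) & 0x1FF) as usize; // bits 21..30
    let pdpt   = ((vaddr >> 30) & 0x1FF) as usize; // bits 30..39
    let pml4   = ((vaddr >> 39) & 0x1FF) as usize; // bits 39..48
    (pml4, pdpt, pd, pt, offset)
}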

The kernel manages page tables for each process. When a context switch occurs, the CPU’s CR3 register is loaded with the new process’s PML4 physical address, instantly switching the entire address space.
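
The switch itself is a single register write; a sketch of the unsafe core of a context switch:

use core::arch::asm;

/// Load a new PML4. Writing CR3 also flushes non-global TLB entries,
/// so the old process's translations disappear atomically.
unsafe fn switch_address_space(pml4_phys: u64) {
    asm!("mov cr3, {}", in(reg) pml4_phys, options(nostack, preserves_flags));
}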

aarch64 differences

aarch64 uses a similar 4-level translation table scheme but with different register names (TTBR0/TTBR1 instead of CR3) and different table entry formats. The architecture abstraction layer hides these differences from the rest of the kernel.

Kernel heap

The kernel needs dynamic allocation for data structures like:

  • Process control blocks
  • Capability tables
  • IPC message buffers
  • Driver state

We’ll use the linked_list_allocator crate initially (a simple free-list allocator suitable for #![no_std] kernels), backed by physical frames allocated from the frame allocator.

The kernel heap lives in the upper-half virtual address space and is shared across all contexts (but only accessible from kernel mode).
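
Wiring that up is small. A sketch, where heap_start and heap_size are placeholders for wherever the kernel maps its heap:

use linked_list_allocator::LockedHeap;

#[global_allocator]
static KERNEL_HEAP: LockedHeap = LockedHeap::empty();

/// Called once during MM init, after `heap_start..heap_start + heap_size`
/// has been mapped to frames taken from the frame allocator.
unsafe fn init_kernel_heap(heap_start: *mut u8, heap_size: usize) {
    KERNEL_HEAP.lock().init(heap_start, heap_size);
}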

Memory safety

Rust’s ownership model provides compile-time guarantees against:

  • Use-after-free: The compiler prevents using a frame after it’s been freed
  • Double-free: The compiler prevents freeing a frame twice
  • Data races: Shared mutable access requires synchronization (Mutex, RefCell)

The unsafe keyword is required for raw pointer operations (hardware register access, page table manipulation) — these are confined to small, well-documented blocks.

Milestones

ThemeliOS development is organized into phases. Each phase builds on the previous one and produces a working, testable artifact.

Phase 0 — Boot

Goal: Get the kernel booting on QEMU and printing to the serial console.

Deliverables:

  • Bootloader integration (Limine or UEFI)
  • Architecture-specific early init (x86_64 first)
  • Serial console output (16550 UART on x86_64)
  • “Hello from ThemeliOS” printed on boot
  • cargo xtask run boots the kernel in QEMU end-to-end

What you’ll learn: Bare-metal Rust, the boot process, how hardware/QEMU works at the lowest level.
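
For a taste of what Phase 0's serial output involves, here is a minimal polling write to a 16550 UART at the conventional COM1 port on x86_64 (illustrative; a real driver initializes the UART first and wraps the port in a proper type):

use core::arch::asm;

const COM1: u16 = 0x3F8;

/// Busy-wait until the transmit holding register is empty (LSR bit 5),
/// then write one byte out the serial port.
unsafe fn serial_write_byte(byte: u8) {
    let lsr_port = COM1 + 5; // line status register
    loop {
        let lsr: u8;
        asm!("in al, dx", out("al") lsr, in("dx") lsr_port);
        if lsr & 0x20 != 0 {
            break; // transmitter ready
        }
    }
    asm!("out dx, al", in("al") byte, in("dx") COM1);
}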

Phase 1 — Kernel basics

Goal: A kernel that can manage memory and schedule tasks.

Deliverables:

  • Physical frame allocator (bitmap-based)
  • Virtual memory manager (page table setup, higher-half kernel)
  • Kernel heap allocator
  • Interrupt handling (IDT on x86_64, GIC on aarch64)
  • Timer-driven preemptive scheduler (round-robin)
  • Basic kernel shell over serial (for debugging, will be removed later)
  • aarch64 port of Phase 0 + Phase 1

Phase 2 — Isolation

Goal: Implement the capability system and process isolation.

Deliverables:

  • Capability types and capability space (CSpace)
  • Process creation with isolated address spaces
  • Capability grant, transfer, and revocation
  • Synchronous IPC (message passing between processes)
  • First userspace process (init)

Phase 3 — Storage

Goal: Read from a virtual disk and present a filesystem.

Deliverables:

  • VirtIO block driver (for QEMU’s virtual disk)
  • Read-only filesystem (simple format, possibly custom or FAT)
  • RAM-backed ephemeral writable layer
  • Immutable root image creation tooling

Phase 4 — Networking

Goal: TCP/IP connectivity.

Deliverables:

  • VirtIO network driver
  • Ethernet, ARP, IPv4
  • TCP and UDP
  • Basic socket-like API via capabilities
  • DHCP client

Phase 5 — Containers

Goal: Run OCI container images.

Deliverables:

  • OCI image format parsing and layer unpacking
  • Container lifecycle (create, start, stop, destroy)
  • Container-to-capability mapping (each container gets a capability set)
  • Container networking (virtual interfaces, isolation)
  • Log streaming from containers

Phase 6 — Management

Goal: External API for managing the node.

Deliverables:

  • HTTP or gRPC management API
  • Container management endpoints (create, start, stop, list, logs)
  • Node status and health reporting
  • Configuration injection at boot time
  • No SSH — API is the only interface

Future — Kubernetes

Goal: Serve as a K8s/K3s worker node.

Deliverables (rough):

  • CRI-compatible container runtime
  • kubelet (or custom equivalent)
  • CNI plugin support
  • Node registration with K8s control plane
  • Pod lifecycle management

This phase is explicitly not v1 and will be scoped in detail after Phase 6 is complete.