ThemeliOS
ThemeliOS (from Greek θεμέλιο — “foundation”) is an experimental capability-based microkernel operating system written in Rust. It is designed from the ground up to do one thing well: run container workloads securely.
What is ThemeliOS?
ThemeliOS is a from-scratch kernel — it does not use or build on top of Linux. It implements its own memory management, process scheduling, inter-process communication, and security model.
The long-term vision is a minimal, immutable OS that:
- Boots on virtual machines and bare metal
- Runs OCI-compatible container images
- Serves as a Kubernetes/K3s worker node
- Provides hardware-enforced isolation between containers via capabilities
- Has no SSH, no shell, and no way to “log in” — all management is via API
Why build a new kernel?
Existing container OSes (Bottlerocket, Talos Linux, Flatcar) all use the Linux kernel with a stripped-down userspace. This is practical, but it inherits Linux’s security model — namespaces and cgroups are opt-in isolation bolted onto a kernel designed for general-purpose computing.
ThemeliOS takes the opposite approach: isolation is the default. The capability-based security model means a process has zero access to anything unless explicitly granted. There’s nothing to escape from because there’s no ambient authority to escalate to.
Project status
ThemeliOS is in early development. See the Milestones page for the current roadmap.
License
MIT — Copyright (c) 2026 Rudi MK
Development Setup
This guide walks through setting up a development environment for ThemeliOS on macOS or Linux.
Prerequisites
1. Rust nightly toolchain
ThemeliOS requires Rust nightly because the kernel build relies on unstable features, most notably -Zbuild-std, which compiles core from source for the bare-metal #![no_std]/#![no_main] targets.
The project pins the exact toolchain via rust-toolchain.toml, so you just need rustup installed — it will automatically download the correct nightly version.
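For reference, the pinned file looks roughly like this (illustrative only: the channel date and component list below are placeholders, not the repo’s actual pin):

```toml
# Illustrative rust-toolchain.toml; the real file pins a specific nightly.
[toolchain]
channel = "nightly-2025-01-01"  # placeholder date, not the actual pin
components = ["rust-src"]       # rust-src is required for -Zbuild-std
targets = ["x86_64-unknown-none", "aarch64-unknown-none"]
```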
Install rustup (if you don’t have it):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
After cloning the repo, the first cargo command will automatically install the pinned nightly toolchain plus the bare-metal targets (x86_64-unknown-none, aarch64-unknown-none).
You can verify with:
rustup show
You should see a nightly toolchain with the x86_64-unknown-none and aarch64-unknown-none targets listed.
2. QEMU
QEMU emulates the hardware that ThemeliOS runs on. You need qemu-system-x86_64 for the primary amd64 target and optionally qemu-system-aarch64 for arm64.
macOS (Homebrew):
brew install qemu
This installs all QEMU system emulators.
Ubuntu/Debian:
sudo apt install qemu-system-x86 qemu-system-arm
Fedora:
sudo dnf install qemu-system-x86 qemu-system-aarch64
Arch Linux:
sudo pacman -S qemu-full
Verify installation:
qemu-system-x86_64 --version
qemu-system-aarch64 --version
3. xorriso
xorriso creates bootable ISO images. The build pipeline uses it to package the kernel with the Limine bootloader into a hybrid BIOS+UEFI ISO.
macOS (Homebrew):
brew install xorriso
Ubuntu/Debian:
sudo apt install xorriso
Fedora:
sudo dnf install xorriso
4. C compiler (for Limine CLI tool)
The first cargo xtask run downloads and builds the Limine bootloader’s CLI tool, which is a small C program. This requires a C compiler.
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: gcc or clang (usually pre-installed)
5. mdbook (optional, for building documentation)
cargo install mdbook
Building and running
All build and run commands go through the xtask tool. You never need to invoke cargo build for the kernel directly.
Build the kernel
cargo xtask build
This cross-compiles the kernel for x86_64-unknown-none (the default target).
For arm64:
cargo xtask build --arch arm64
Run in QEMU
cargo xtask run
This builds the kernel, creates a bootable ISO, and launches it in QEMU in headless mode — serial output is piped to your terminal, but no graphical window opens. Press Ctrl+A, X to exit QEMU.
For arm64 (not yet implemented):
cargo xtask run --arch arm64
Build ISO only (without launching QEMU)
cargo xtask iso
This builds the kernel and creates a bootable ISO at target/themelios.iso without launching QEMU. Useful when you want to run QEMU manually with custom flags.
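For example, a manual invocation roughly equivalent to the headless run might look like this (flags are illustrative; adjust to taste):

```sh
# Boot the ISO by hand: serial on stdio, no graphical window.
qemu-system-x86_64 \
  -cdrom target/themelios.iso \
  -serial stdio \
  -display none \
  -m 512M
```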
Run with QEMU display window
To see the QEMU graphical window (shows the Limine bootloader screen and any framebuffer output):
cargo xtask run --display
This does everything cargo xtask run does but opens a QEMU window instead of running headless. Serial output still goes to your terminal.
Build documentation
cargo xtask docs
This builds both the mdbook (to docs/book/) and the rustdoc API docs.
Shorthand alias
The workspace defines a cargo xt alias, so these also work:
cargo xt build
cargo xt run
cargo xt docs
Project layout
themelios/
├── kernel/ # The kernel crate (#![no_std], bare-metal)
│ └── src/
│ ├── main.rs # Kernel entry point, module declarations
│ ├── arch/ # Architecture-specific (x86_64, aarch64)
│ ├── mm/ # Memory management
│ ├── sched/ # Scheduler
│ ├── cap/ # Capability system
│ ├── ipc/ # Inter-process communication
│ ├── drivers/ # Device drivers (VirtIO, serial, etc.)
│ ├── fs/ # Filesystem
│ └── net/ # Networking
├── xtask/ # Build tooling (runs on host)
├── docs/ # mdbook documentation
├── .cargo/ # Cargo configuration
└── CLAUDE.md # Project documentation for AI assistants
IDE setup
VS Code
Install the rust-analyzer extension. It should pick up the workspace configuration automatically.
If rust-analyzer struggles with the #![no_std] kernel crate, you may need to add this to .vscode/settings.json:
{
"rust-analyzer.cargo.target": "x86_64-unknown-none",
"rust-analyzer.cargo.buildScripts.enable": true
}
Other editors
Any editor with rust-analyzer LSP support should work. The key setting is ensuring the target is set to x86_64-unknown-none for the kernel crate.
Troubleshooting
“can’t find crate for core”
This means the bare-metal target isn’t installed. Run:
rustup target add x86_64-unknown-none aarch64-unknown-none
Or let rust-toolchain.toml handle it by running any cargo command in the project.
“error: -Zbuild-std is unstable”
You need to be on the nightly toolchain. Check with rustup show — the project’s rust-toolchain.toml should select nightly automatically.
QEMU not found
Make sure QEMU is installed and on your $PATH. See the QEMU installation section above.
Bootloader
ThemeliOS uses the Limine bootloader. This page explains why, how it works, and how it fits into the build pipeline.
Why Limine?
We evaluated several options for booting ThemeliOS:
| Option | Pros | Cons |
|---|---|---|
| Custom UEFI app | Full control | Massive effort, x86_64 UEFI only initially |
| Multiboot2 | Simple, QEMU -kernel flag | BIOS only, no arm64, no UEFI |
| bootloader crate | Very easy Rust integration | x86_64 only, no arm64 |
| Limine | BIOS + UEFI, x86_64 + arm64, well-maintained | External dependency |
Limine was chosen because:
- Multi-architecture: Supports x86_64 and aarch64 (and RISC-V, LoongArch). We need both for our cloud targets.
- Multi-firmware: Works on both BIOS (legacy) and UEFI (modern). Cloud platforms use UEFI; QEMU defaults to BIOS.
- Higher-half kernel: Limine sets up page tables that map our kernel at 0xffffffff80000000, which is the standard layout for 64-bit kernels.
- Clean protocol: The Limine boot protocol gives us a memory map, framebuffer, and other boot info without writing any assembly.
- Active maintenance: Regular releases, good documentation.
Cloud compatibility
Limine’s UEFI support means ThemeliOS can boot on:
- AWS EC2 (Nitro): UEFI supported on most instance types
- GCP Compute Engine: UEFI supported
- Azure Gen2 VMs: UEFI
- Bare metal: UEFI is standard on modern server hardware
- QEMU/KVM: Both BIOS (default) and UEFI (via OVMF)
The same kernel binary works on all platforms — only the bootloader firmware interface differs, and Limine handles that.
How it works
Boot sequence
1. Firmware (BIOS or UEFI) loads the Limine bootloader from the boot media
2. Limine reads limine.conf to find the kernel path and boot protocol
3. Limine loads the kernel ELF into memory at the addresses specified in the linker script
4. Limine sets up:
   - 64-bit long mode (x86_64) or EL1 (aarch64)
   - 4-level page tables with identity + higher-half mappings
   - A valid stack
5. Limine scans the kernel’s .requests ELF section for boot protocol requests
6. Limine fills in the requests (memory map, framebuffer, etc.)
7. Limine jumps to the kernel entry point (kmain)
Boot protocol requests
The kernel communicates with Limine through static data structures placed in a special ELF section. These are “requests” — the kernel declares what boot information it needs, and Limine fills in the responses.
// Placed in the .requests ELF section via the linker script
#[used]
#[link_section = ".requests"]
static BASE_REVISION: BaseRevision = BaseRevision::new();
The linker script places these between start/end markers so Limine knows where to scan:
.data : {
...
KEEP(*(.requests_start_marker))
KEEP(*(.requests))
KEEP(*(.requests_end_marker))
}
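At runtime the kernel reads the filled-in responses back through the same statics. A sketch of what that can look like with the limine crate (the exact type and method names vary between crate versions, so treat this as illustrative):

```rust
use limine::request::MemoryMapRequest;

#[used]
#[link_section = ".requests"]
static MEMMAP_REQUEST: MemoryMapRequest = MemoryMapRequest::new();

fn init_physical_memory() {
    // If Limine honored the request, the response pointer is non-null.
    if let Some(response) = MEMMAP_REQUEST.get_response() {
        for entry in response.entries() {
            // Each entry describes a physical range (base, length, type:
            // usable RAM, reserved, MMIO, ...). Usable ranges seed the
            // frame allocator.
        }
    }
}
```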
Configuration file
limine.conf (in the project root) uses the v8 format:
timeout: 0
/ThemeliOS
protocol: limine
kernel_path: boot():/boot/themelios
- timeout: 0 — boot immediately without showing a menu
- /ThemeliOS — defines a boot entry
- protocol: limine — use the Limine protocol (not Linux or Multiboot)
- kernel_path: boot():/boot/themelios — load the kernel from the boot volume
Linker script
The linker script (kernel/linker-x86_64.ld) controls the kernel’s memory layout:
- Entry point: ENTRY(kmain) — tells the ELF where execution begins
- Load address: 0xffffffff80000000 — the higher-half virtual address
- Sections: .text (code), .rodata (constants), .data (mutable data + Limine requests), .bss (zeroed data)
The kernel must be compiled with -Crelocation-model=static to produce a non-PIE executable with fixed addresses that match the linker script.
Build pipeline
The cargo xtask run command handles the full pipeline:
1. Cross-compile the kernel for x86_64-unknown-none
2. Download Limine (one-time: git clone of the v8.x-binary branch to target/limine/)
3. Build the Limine CLI (one-time: make compiles limine.c)
4. Create the ISO via xorriso:
   - Copies the kernel, Limine files, and limine.conf into an ISO directory structure
   - Creates a hybrid BIOS+UEFI bootable ISO
   - Installs BIOS boot sectors via limine bios-install
5. Launch QEMU with the ISO attached as a CD-ROM
Limine version
- Bootloader: v8.x (binary distribution from the v8.x-binary branch)
- Rust crate: limine = "0.5" (boot protocol structures)
The bootloader binaries are cached in target/limine/ and not committed to git.
Architecture Overview
ThemeliOS is a capability-based microkernel. This page explains the high-level design and the reasoning behind key architectural decisions.
Microkernel vs monolithic
In a monolithic kernel (like Linux), drivers, filesystems, and networking all run inside the kernel with full hardware access. A bug in any driver can crash or compromise the entire system.
In a microkernel, only the absolute minimum runs in kernel space:
| Kernel space | Userspace |
|---|---|
| Memory management | Device drivers |
| Process scheduling | Filesystem |
| IPC (message passing) | Network stack |
| Capability enforcement | Container runtime |
| | Management API |
Everything else runs as isolated userspace processes that communicate via IPC. A buggy driver crashes its own process, not the kernel.
Why microkernel for ThemeliOS? Since we’re building an OS specifically for running untrusted container workloads, minimizing the trusted computing base (the code that can compromise the whole system) is critical. The smaller the kernel, the smaller the attack surface.
Capability-based security
ThemeliOS does not use Linux-style permissions (UID/GID, filesystem permissions) or Linux-style isolation (namespaces, cgroups). Instead, it uses capabilities.
What is a capability?
A capability is an unforgeable token that grants its holder specific permissions on a specific resource. For example:
- “Read and write to memory region 0x1000–0x2000”
- “Send messages to IPC endpoint #42”
- “Access VirtIO block device at MMIO address 0xFE00”
Key properties
- No ambient authority: A newly created process has zero capabilities. It can’t do anything until its parent grants it capabilities.
- Unforgeable: Capabilities are managed by the kernel. Userspace can’t create them or guess valid ones.
- Transferable: Capabilities can be passed between processes via IPC, enabling controlled delegation.
- Revocable: A capability can be revoked, immediately cutting off access.
Why not namespaces?
Linux namespaces are “isolation after the fact” — processes start with broad access and namespaces restrict what they can see. Capabilities are “isolation by default” — processes start with nothing and are explicitly granted only what they need.
For a container OS, this means a compromised container literally cannot access resources it wasn’t given capabilities for. There’s no kernel interface to probe, no /proc to read, no syscall to escalate through — the authority simply doesn’t exist.
Inspiration
- seL4: Formally verified capability microkernel. ThemeliOS borrows its capability model.
- Fuchsia/Zircon: Google’s capability-based OS. Demonstrates the model works at scale.
Memory model
ThemeliOS uses hardware-enforced memory isolation:
- Each process runs in its own virtual address space (page tables enforced by the MMU).
- The kernel has its own address space that userspace cannot access.
- Shared memory between processes requires explicit capabilities from both sides.
Physical memory management
A frame allocator tracks free physical memory pages (4 KiB). Frames are allocated to:
- Process page tables
- Kernel heap
- Shared memory regions
- DMA buffers for device drivers
Virtual memory layout
The virtual address space layout will be defined per-architecture, but the general structure is:
0x0000_0000_0000_0000 ┌──────────────────────┐
│ Userspace │
│ (per-process) │
0x0000_7FFF_FFFF_FFFF └──────────────────────┘
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
Non-canonical hole
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
0xFFFF_8000_0000_0000 ┌──────────────────────┐
│ Kernel space │
│ (shared, all procs) │
0xFFFF_FFFF_FFFF_FFFF └──────────────────────┘
(This is the x86_64 layout; aarch64 is similar but with different conventions.)
IPC
Inter-process communication is the backbone of the microkernel. Since drivers, filesystems, and networking all run in userspace, every system operation involves IPC.
Synchronous message passing
The primary mechanism: a client sends a message to a server and blocks until it gets a reply. This is used for request/response patterns like “read this file” or “send this network packet.”
Performance consideration
IPC overhead is the classic criticism of microkernels. ThemeliOS will address this by:
- Keeping messages small (pointers to shared memory for bulk data)
- Using register-based fast-path for small messages
- Careful cache-aware scheduling of communicating processes
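To make the shape of this concrete, here is a design-phase sketch of a synchronous call (every name here is hypothetical; the real API arrives in Phase 2):

```rust
/// Hypothetical message layout: small payloads ride in registers,
/// bulk data travels as a shared-memory capability.
pub struct Message {
    pub label: u64,           // operation code, e.g. "read block"
    pub regs: [u64; 4],       // inline payload (register fast path)
    pub cap: Option<CapSlot>, // optional capability attached to the message
}

/// Index into the caller's capability space.
#[derive(Clone, Copy)]
pub struct CapSlot(pub usize);

/// Client side of synchronous IPC: send to an endpoint, block for the reply.
pub fn call(endpoint: CapSlot, msg: &Message) -> Message {
    // 1. The kernel checks the Send right on `endpoint`'s capability.
    // 2. The caller blocks; the receiver runs (ideally cache-warm on the same core).
    // 3. The reply is copied back and the caller resumes.
    let _ = (endpoint, msg);
    unimplemented!("syscall stub; sketch only")
}
```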
Immutability
The OS root filesystem is read-only. The entire OS image is a single artifact that is booted as-is.
- Updates: Swap the entire image. No package managers, no apt-get, no partial updates.
- Configuration: Injected at boot time via cloud-init-style metadata or the management API.
- Ephemeral state: Container images and runtime state live on a RAM-backed ephemeral layer that is lost on reboot.
This model treats nodes as cattle: if a node is unhealthy, replace it with a fresh one. No debugging on the node, no SSHing in, no manual fixes.
Target platforms
ThemeliOS is designed to run as a virtual machine, with bare-metal support as a secondary goal.
| Platform | Status | Notes |
|---|---|---|
| QEMU/KVM (x86_64) | Primary dev target | Used for all development and testing |
| QEMU (aarch64) | Secondary dev target | ARM64 support |
| AWS (EC2) | Future | Nitro hypervisor |
| GCP (Compute Engine) | Future | KVM-based |
| Azure (VMs) | Future | Hyper-V |
| Bare metal (headless) | Future | Server hardware, no GPU/display |
Capability System
This document details the design of ThemeliOS’s capability system — the core security mechanism of the kernel.
Status: Design phase. Implementation begins in Phase 2.
Overview
In ThemeliOS, every resource is accessed through capabilities. A capability is a kernel-managed, unforgeable token that encodes:
- Which resource (identified by a kernel object ID)
- What operations are permitted (a bitmask of rights)
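As a design illustration, such a token might be represented in the kernel roughly like this (hypothetical Rust, not the actual ThemeliOS types):

```rust
/// Identifies a kernel object (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct KernelObjectId(pub u64);

/// Bitmask of permitted operations (hypothetical encoding).
#[derive(Clone, Copy)]
pub struct Rights(pub u32);

impl Rights {
    pub const READ: Rights = Rights(1 << 0);
    pub const WRITE: Rights = Rights(1 << 1);
    pub const EXECUTE: Rights = Rights(1 << 2);
    pub const MAP: Rights = Rights(1 << 3);

    /// True if `self` includes every right in `required`.
    pub fn contains(self, required: Rights) -> bool {
        self.0 & required.0 == required.0
    }
}

/// The unforgeable token: which object, and what its holder may do.
#[derive(Clone, Copy)]
pub struct Capability {
    pub object: KernelObjectId,
    pub rights: Rights,
}

impl Capability {
    /// Derive a weaker capability: a bitwise AND means rights can only
    /// ever be reduced, never elevated.
    pub fn derive(self, subset: Rights) -> Capability {
        Capability { object: self.object, rights: Rights(self.rights.0 & subset.0) }
    }
}
```

The derive operation is what makes the Grant rule below (rights can only be reduced) cheap to enforce: the child’s rights are simply the intersection of the parent’s rights and the requested subset.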
Capability types
| Capability type | Resource | Example rights |
|---|---|---|
| MemoryCap | Physical memory region | Read, Write, Execute, Map |
| EndpointCap | IPC endpoint | Send, Receive |
| ThreadCap | Thread/process | Start, Stop, Suspend, Resume |
| DeviceCap | Hardware device (MMIO region) | Read, Write |
| IRQCap | Interrupt line | Acknowledge, Bind |
Capability spaces
Each process has a capability space (CSpace) — a table mapping local capability slots to kernel objects. A process refers to its capabilities by slot index, not by object ID. The kernel translates slot indices to objects on each syscall.
Process A's CSpace:
Slot 0 → MemoryCap(region=0x1000, rights=RW)
Slot 1 → EndpointCap(endpoint=#7, rights=Send)
Slot 2 → (empty)
Slot 3 → ThreadCap(thread=#12, rights=Start|Stop)
Process B's CSpace:
Slot 0 → EndpointCap(endpoint=#7, rights=Receive)
Slot 1 → MemoryCap(region=0x2000, rights=R)
Process A can send to endpoint #7 (slot 1), and Process B can receive from it (slot 0). Neither can access the other’s memory — they’d need explicit capabilities for that.
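Continuing the hypothetical sketch above, slot translation could look like this (again illustrative only):

```rust
/// Errors when resolving a capability slot (hypothetical).
pub enum CapError {
    EmptySlot,
    InsufficientRights,
}

/// A process's capability space: local slots mapping to kernel objects.
pub struct CSpace {
    slots: [Option<Capability>; 64], // fixed-size table for the sketch
}

impl CSpace {
    /// Translate a slot index into a capability, checking required rights.
    /// The kernel performs this lookup on every syscall that names a slot.
    pub fn lookup(&self, slot: usize, required: Rights) -> Result<Capability, CapError> {
        match self.slots.get(slot).copied().flatten() {
            Some(cap) if cap.rights.contains(required) => Ok(cap),
            Some(_) => Err(CapError::InsufficientRights),
            None => Err(CapError::EmptySlot),
        }
    }
}
```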
Capability operations
Grant
A parent process can grant a capability to a child process, optionally with reduced rights:
Parent has: MemoryCap(region=X, rights=RWX)
Parent grants child: MemoryCap(region=X, rights=R)
The child gets read-only access. Rights can only be reduced, never elevated.
Transfer via IPC
Capabilities can be attached to IPC messages. This is how services delegate access:
FileServer receives "open /config" request
FileServer replies with MemoryCap(region=file_data, rights=R)
Client now has read access to the file's memory region
Revoke
The kernel (or a process with the appropriate meta-capability) can revoke a capability, immediately invalidating it. Any future use of the revoked slot returns an error.
Container mapping
In ThemeliOS, a “container” is a group of processes sharing a common set of capabilities. The container’s capability set defines its sandbox:
- Memory: Only the memory regions granted to it
- Network: Only the network endpoints it has capabilities for
- Filesystem: Only the filesystem views it’s been granted
- IPC: Only the services it has endpoint capabilities for
A container cannot discover or access anything outside its capability set. Unlike Linux containers (where a kernel exploit can escape the namespace), escaping a capability sandbox requires forging a kernel object — which is impossible without a kernel memory corruption bug.
Comparison with Linux isolation
| Aspect | Linux (namespaces/cgroups) | ThemeliOS (capabilities) |
|---|---|---|
| Default | Access everything, restrict selectively | Access nothing, grant explicitly |
| Enforcement | Kernel checks on each syscall | No syscall exists without capability |
| Escape risk | Kernel bugs can bypass namespaces | Requires kernel memory corruption |
| Resource discovery | Can probe for resources | Can’t even address unknown resources |
| Granularity | Per-namespace | Per-object, per-right |
Memory Management
This document describes ThemeliOS’s memory management subsystem design.
Status: Design phase. Implementation begins in Phase 1.
Overview
The memory management (MM) subsystem is responsible for:
- Physical frame allocation — tracking which 4 KiB pages of physical RAM are free or in use
- Virtual memory — creating and managing page tables for each process
- Kernel heap — providing dynamic allocation (alloc-style) for kernel data structures
Physical memory
Boot-time discovery
The bootloader provides a memory map describing which physical address ranges are usable RAM, reserved by firmware, or used for MMIO. The frame allocator uses this map to initialize its free list.
Frame allocator
The frame allocator hands out 4 KiB physical memory frames. Initial implementation will use a bitmap allocator:
- One bit per physical frame (1 = allocated, 0 = free)
- Simple, predictable, easy to implement
- For 4 GiB of RAM: bitmap is 128 KiB (manageable)
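A minimal sketch of such a bitmap allocator (illustrative, not the final design):

```rust
/// One bit per 4 KiB frame: 1 = allocated, 0 = free.
pub struct BitmapFrameAllocator {
    bitmap: &'static mut [u64],
    base: usize, // physical address of frame 0
}

impl BitmapFrameAllocator {
    /// Find the first free frame, mark it allocated, return its physical address.
    pub fn alloc(&mut self) -> Option<usize> {
        for (i, word) in self.bitmap.iter_mut().enumerate() {
            if *word != u64::MAX {
                let bit = word.trailing_ones() as usize; // first clear bit
                *word |= 1 << bit;
                return Some(self.base + (i * 64 + bit) * 4096);
            }
        }
        None // physical memory exhausted
    }

    /// Mark a previously allocated frame as free again.
    pub fn free(&mut self, addr: usize) {
        let frame = (addr - self.base) / 4096;
        self.bitmap[frame / 64] &= !(1 << (frame % 64));
    }
}
```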
Later optimization: replace with a buddy allocator for efficient allocation of contiguous multi-frame regions (needed for DMA buffers, large pages).
Capability integration
Physical frames are resources protected by capabilities. When a process requests memory:
1. Kernel allocates a frame from the free pool
2. Kernel creates a MemoryCap for that frame
3. Kernel inserts the capability into the process’s CSpace
4. Process can now map the frame into its address space using the capability
A process cannot access physical memory it doesn’t have a capability for — the page tables are configured to reflect capability permissions.
Virtual memory
Address space layout (x86_64)
Lower half (user space, per-process):
0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF
Upper half (kernel space, shared across all processes):
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF
├── Physical memory direct map
├── Kernel code and data
├── Kernel heap
└── Per-CPU data
Page tables
x86_64 uses 4-level page tables (PML4 → PDPT → PD → PT), each with 512 entries. Each entry is 8 bytes and can point to:
- The next level table
- A large page (2 MiB at PD level, 1 GiB at PDPT level)
- A 4 KiB page (at PT level)
The kernel manages page tables for each process. When a context switch occurs, the CPU’s CR3 register is loaded with the new process’s PML4 physical address, instantly switching the entire address space.
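On x86_64 that switch boils down to one privileged register write. A sketch (the real kernel will wrap this in a safe abstraction):

```rust
/// Load a new top-level page table, switching the entire address space.
///
/// Safety: `pml4_phys` must be the physical address of a valid PML4 whose
/// kernel half matches the current kernel mappings, or the CPU will fault.
#[cfg(target_arch = "x86_64")]
pub unsafe fn switch_address_space(pml4_phys: u64) {
    core::arch::asm!(
        "mov cr3, {0}",
        in(reg) pml4_phys,
        options(nostack, preserves_flags),
    );
}
```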
aarch64 differences
aarch64 uses a similar 4-level translation table scheme but with different register names (TTBR0/TTBR1 instead of CR3) and different table entry formats. The architecture abstraction layer hides these differences from the rest of the kernel.
Kernel heap
The kernel needs dynamic allocation for data structures like:
- Process control blocks
- Capability tables
- IPC message buffers
- Driver state
We’ll use the linked_list_allocator crate initially (a simple free-list allocator suitable for #![no_std] kernels), backed by physical frames allocated from the frame allocator.
The kernel heap lives in the upper-half virtual address space and is shared across all contexts (but only accessible from kernel mode).
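Wiring that up might look roughly like this (assuming linked_list_allocator’s LockedHeap API; the init parameters are illustrative):

```rust
use linked_list_allocator::LockedHeap;

// The global allocator backing `alloc` in the kernel.
#[global_allocator]
static KERNEL_HEAP: LockedHeap = LockedHeap::empty();

/// Called once during boot, after `heap_size` bytes of physical frames
/// have been mapped at `heap_start` in the upper half.
pub fn init_heap(heap_start: usize, heap_size: usize) {
    unsafe {
        // Safety: the range must be mapped, writable, and otherwise unused.
        KERNEL_HEAP.lock().init(heap_start as *mut u8, heap_size);
    }
}
```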
Memory safety
Rust’s ownership model provides compile-time guarantees against:
- Use-after-free: The compiler prevents using a frame after it’s been freed
- Double-free: The compiler prevents freeing a frame twice
- Data races: Shared mutable access requires synchronization (Mutex, RefCell)
The unsafe keyword is required for raw pointer operations (hardware register access, page table manipulation) — these are confined to small, well-documented blocks.
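For example, a driver can confine its raw pointer use to one small typed wrapper (illustrative):

```rust
/// A typed MMIO register: the only `unsafe` lives here, behind a
/// documented contract, rather than scattered through driver code.
pub struct Mmio<T> {
    addr: *mut T,
}

impl<T> Mmio<T> {
    /// Safety: `addr` must point at a valid, mapped device register of type `T`.
    pub unsafe fn new(addr: usize) -> Self {
        Mmio { addr: addr as *mut T }
    }

    pub fn read(&self) -> T {
        // Volatile: the compiler must not elide or reorder device accesses.
        unsafe { core::ptr::read_volatile(self.addr) }
    }

    pub fn write(&mut self, value: T) {
        unsafe { core::ptr::write_volatile(self.addr, value) }
    }
}
```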
Milestones
ThemeliOS development is organized into phases. Each phase builds on the previous one and produces a working, testable artifact.
Phase 0 — Boot
Goal: Get the kernel booting on QEMU and printing to the serial console.
Deliverables:
- Bootloader integration (Limine or UEFI)
- Architecture-specific early init (x86_64 first)
- Serial console output (16550 UART on x86_64)
- “Hello from ThemeliOS” printed on boot
- cargo xtask run boots the kernel in QEMU end-to-end
What you’ll learn: Bare-metal Rust, the boot process, how hardware/QEMU works at the lowest level.
Phase 1 — Kernel basics
Goal: A kernel that can manage memory and schedule tasks.
Deliverables:
- Physical frame allocator (bitmap-based)
- Virtual memory manager (page table setup, higher-half kernel)
- Kernel heap allocator
- Interrupt handling (IDT on x86_64, GIC on aarch64)
- Timer-driven preemptive scheduler (round-robin)
- Basic kernel shell over serial (for debugging, will be removed later)
- aarch64 port of Phase 0 + Phase 1
Phase 2 — Isolation
Goal: Implement the capability system and process isolation.
Deliverables:
- Capability types and capability space (CSpace)
- Process creation with isolated address spaces
- Capability grant, transfer, and revocation
- Synchronous IPC (message passing between processes)
- First userspace process (init)
Phase 3 — Storage
Goal: Read from a virtual disk and present a filesystem.
Deliverables:
- VirtIO block driver (for QEMU’s virtual disk)
- Read-only filesystem (simple format, possibly custom or FAT)
- RAM-backed ephemeral writable layer
- Immutable root image creation tooling
Phase 4 — Networking
Goal: TCP/IP connectivity.
Deliverables:
- VirtIO network driver
- Ethernet, ARP, IPv4
- TCP and UDP
- Basic socket-like API via capabilities
- DHCP client
Phase 5 — Containers
Goal: Run OCI container images.
Deliverables:
- OCI image format parsing and layer unpacking
- Container lifecycle (create, start, stop, destroy)
- Container-to-capability mapping (each container gets a capability set)
- Container networking (virtual interfaces, isolation)
- Log streaming from containers
Phase 6 — Management
Goal: External API for managing the node.
Deliverables:
- HTTP or gRPC management API
- Container management endpoints (create, start, stop, list, logs)
- Node status and health reporting
- Configuration injection at boot time
- No SSH — API is the only interface
Future — Kubernetes
Goal: Serve as a K8s/K3s worker node.
Deliverables (rough):
- CRI-compatible container runtime
- kubelet (or custom equivalent)
- CNI plugin support
- Node registration with K8s control plane
- Pod lifecycle management
This phase is explicitly not v1 and will be scoped in detail after Phase 6 is complete.