ThemeliOS

ThemeliOS (from Greek θεμέλιο — “foundation”) is an experimental capability-based microkernel operating system written in Rust. It is designed from the ground up to do one thing well: run container workloads securely.

What is ThemeliOS?

ThemeliOS is a from-scratch kernel — it does not use or build on top of Linux. It implements its own memory management, process scheduling, inter-process communication, and security model.

The long-term vision is a minimal, immutable OS that:

  • Boots on virtual machines and bare metal
  • Runs OCI-compatible container images
  • Serves as a Kubernetes/K3s worker node
  • Provides hardware-enforced isolation between containers via capabilities
  • Has no SSH, no shell, and no way to “log in” — all management is via API

Why build a new kernel?

Existing container OSes (Bottlerocket, Talos Linux, Flatcar) all use the Linux kernel with a stripped-down userspace. This is practical, but it inherits Linux’s security model — namespaces and cgroups are opt-in isolation bolted onto a kernel designed for general-purpose computing.

ThemeliOS takes the opposite approach: isolation is the default. The capability-based security model means a process has zero access to anything unless explicitly granted. There’s nothing to escape from because there’s no ambient authority to escalate to.

Project status

ThemeliOS is in early development. See the Milestones page for the current roadmap.

License

MIT — Copyright (c) 2026 Rudi MK

Development Setup

This guide walks through setting up a development environment for ThemeliOS on macOS or Linux.

Prerequisites

1. Rust nightly toolchain

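ThemeliOS requires Rust nightly because the build relies on unstable Cargo and compiler features, such as -Zbuild-std. The bare-metal building blocks themselves (#![no_std], #![no_main], inline assembly, and custom global allocators) are available on stable Rust.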

The project pins the exact toolchain via rust-toolchain.toml, so you just need rustup installed — it will automatically download the correct nightly version.

Install rustup (if you don’t have it):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

After cloning the repo, the first cargo command will automatically install the pinned nightly toolchain plus the bare-metal targets (x86_64-unknown-none, aarch64-unknown-none).

You can verify with:

rustup show

You should see a nightly toolchain with the x86_64-unknown-none and aarch64-unknown-none targets listed.

2. QEMU

QEMU emulates the hardware that ThemeliOS runs on. You need qemu-system-x86_64 for the primary amd64 target and, optionally, qemu-system-aarch64 for arm64.

macOS (Homebrew):

brew install qemu

This installs all QEMU system emulators.

Ubuntu/Debian:

sudo apt install qemu-system-x86 qemu-system-arm

Fedora:

sudo dnf install qemu-system-x86 qemu-system-aarch64

Arch Linux:

sudo pacman -S qemu-full

Verify installation:

qemu-system-x86_64 --version
qemu-system-aarch64 --version

3. xorriso

xorriso creates bootable ISO images. The build pipeline uses it to package the kernel with the Limine bootloader into a hybrid BIOS+UEFI ISO.

macOS (Homebrew):

brew install xorriso

Ubuntu/Debian:

sudo apt install xorriso

Fedora:

sudo dnf install xorriso

4. C compiler and make (for the Limine CLI tool)

The first time the build pipeline creates an ISO (for example, on your first cargo xtask run), it downloads and builds the Limine bootloader’s CLI tool, a small C program compiled with make. This requires a C compiler and make.

  • macOS: Xcode Command Line Tools (xcode-select --install), which include both
  • Linux: gcc or clang plus make (usually pre-installed; on Ubuntu/Debian, the build-essential package provides both)

5. mdbook (optional, for building documentation)

cargo install mdbook

Building and running

All build and run commands go through the xtask tool. You never need to invoke cargo build for the kernel directly.

Build the kernel

cargo xtask build

This cross-compiles the kernel for x86_64-unknown-none (the default target).

For arm64:

cargo xtask build --arch arm64

Run in QEMU

cargo xtask run

This builds the kernel, creates a bootable ISO, and launches it in QEMU in headless mode: serial output is piped to your terminal, but no graphical window opens. Press Ctrl+A, then X, to exit QEMU.

For arm64 (not yet implemented):

cargo xtask run --arch arm64

Build ISO only (without launching QEMU)

cargo xtask iso

This builds the kernel and creates a bootable ISO at target/themelios.iso without launching QEMU. Useful when you want to run QEMU manually with custom flags.

Run with QEMU display window

To see the QEMU graphical window (shows the Limine bootloader screen and any framebuffer output):

cargo xtask run --display

This does everything cargo xtask run does but opens a QEMU window instead of running headless. Serial output still goes to your terminal.

Build documentation

cargo xtask docs

This builds both the mdbook (to docs/book/) and the rustdoc API docs.

Shorthand alias

The workspace defines a cargo xt alias, so these also work:

cargo xt build
cargo xt run
cargo xt docs

Project layout

themelios/
├── kernel/          # The kernel crate (#![no_std], bare-metal)
│   └── src/
│       ├── main.rs  # Kernel entry point, module declarations
│       ├── arch/    # Architecture-specific (x86_64, aarch64)
│       ├── mm/      # Memory management
│       ├── sched/   # Scheduler
│       ├── cap/     # Capability system
│       ├── ipc/     # Inter-process communication
│       ├── drivers/ # Device drivers (VirtIO, serial, etc.)
│       ├── fs/      # Filesystem
│       └── net/     # Networking
├── xtask/           # Build tooling (runs on host)
├── docs/            # mdbook documentation
├── .cargo/          # Cargo configuration
└── CLAUDE.md        # Project documentation for AI assistants

IDE setup

VS Code

Install the rust-analyzer extension. It should pick up the workspace configuration automatically.

If rust-analyzer struggles with the #![no_std] kernel crate, you may need to add this to .vscode/settings.json:

{
    "rust-analyzer.cargo.target": "x86_64-unknown-none",
    "rust-analyzer.cargo.buildScripts.enable": true
}

Other editors

Any editor with rust-analyzer LSP support should work. The key setting is ensuring the target is set to x86_64-unknown-none for the kernel crate.

Troubleshooting

“can’t find crate for core”

This means the bare-metal target isn’t installed. Run:

rustup target add x86_64-unknown-none aarch64-unknown-none

Or let rust-toolchain.toml handle it by running any cargo command in the project.

“error: -Zbuild-std is unstable”

You need to be on the nightly toolchain. Check with rustup show — the project’s rust-toolchain.toml should select nightly automatically.

QEMU not found

Make sure QEMU is installed and on your $PATH. See the QEMU installation section above.

Bootloader

ThemeliOS uses the Limine bootloader. This page explains why, how it works, and how it fits into the build pipeline.

Why Limine?

We evaluated several options for booting ThemeliOS:

| Option | Pros | Cons |
|---|---|---|
| Custom UEFI app | Full control | Massive effort, x86_64 UEFI only initially |
| Multiboot2 | Simple, widely supported (e.g., by GRUB) | No arm64; QEMU’s -kernel flag only loads Multiboot 1 kernels |
| bootloader crate | Very easy Rust integration | x86_64 only, no arm64 |
| Limine | BIOS + UEFI, x86_64 + arm64, well-maintained | External dependency |

Limine was chosen because:

  1. Multi-architecture: Supports x86_64 and aarch64 (and RISC-V, LoongArch). We need both for our cloud targets.
  2. Multi-firmware: Works on both BIOS (legacy) and UEFI (modern). Cloud platforms use UEFI; QEMU defaults to BIOS.
  3. Higher-half kernel: Limine sets up page tables that map our kernel at the higher-half address chosen in our linker script (0xffffffff80000000). This top-2-GiB placement is the conventional layout for x86_64 kernels.
  4. Clean protocol: The Limine boot protocol gives us a memory map, framebuffer, and other boot info without writing any assembly.
  5. Active maintenance: Regular releases, good documentation.

Cloud compatibility

Limine’s UEFI support means ThemeliOS can boot on:

  • AWS EC2 (Nitro): UEFI supported on most instance types
  • GCP Compute Engine: UEFI supported
  • Azure Gen2 VMs: UEFI
  • Bare metal: UEFI is standard on modern server hardware
  • QEMU/KVM: Both BIOS (default) and UEFI (via OVMF)

The same kernel binary (one per CPU architecture) works on all of these platforms. Only the firmware interface (BIOS or UEFI) differs, and Limine handles that.

How it works

Boot sequence

  1. Firmware (BIOS or UEFI) loads the Limine bootloader from the boot media
  2. Limine reads limine.conf to find the kernel path and boot protocol
  3. Limine loads the kernel ELF into memory at the addresses specified in the linker script
  4. Limine sets up:
    • 64-bit long mode (x86_64) or EL1 (aarch64)
    • 4-level page tables with identity + higher-half mappings
    • A valid stack
  5. Limine scans the kernel’s .requests ELF section for boot protocol requests
  6. Limine fills in the requests (memory map, framebuffer, etc.)
  7. Limine jumps to the kernel entry point (kmain)

Boot protocol requests

The kernel communicates with Limine through static data structures placed in a special ELF section. These are “requests” — the kernel declares what boot information it needs, and Limine fills in the responses.

use limine::BaseRevision;

// Placed in the .requests ELF section via the linker script
#[used]
#[link_section = ".requests"]
static BASE_REVISION: BaseRevision = BaseRevision::new();

The linker script places these between start/end markers so Limine knows where to scan:

.data : {
    ...
    KEEP(*(.requests_start_marker))
    KEEP(*(.requests))
    KEEP(*(.requests_end_marker))
}

Configuration file

limine.conf (in the project root) uses the v8 format:

timeout: 0

/ThemeliOS
    protocol: limine
    kernel_path: boot():/boot/themelios

  • timeout: 0 — boot immediately without showing a menu
  • /ThemeliOS — defines a boot entry
  • protocol: limine — use the Limine protocol (not Linux or Multiboot)
  • kernel_path: boot():/boot/themelios — load the kernel from the boot volume

Linker script

The linker script (kernel/linker-x86_64.ld) controls the kernel’s memory layout:

  • Entry point: ENTRY(kmain) — tells the ELF where execution begins
  • Load address: 0xffffffff80000000 — the higher-half virtual address
  • Sections: .text (code), .rodata (constants), .data (mutable data + Limine requests), .bss (zeroed data)

The kernel must be compiled with -Crelocation-model=static to produce a non-PIE executable with fixed addresses that match the linker script.

Build pipeline

The cargo xtask run command handles the full pipeline:

  1. Cross-compile the kernel for x86_64-unknown-none
  2. Download Limine (one-time: git clone of the v8.x-binary branch to target/limine/)
  3. Build Limine CLI (one-time: make compiles limine.c)
  4. Create ISO via xorriso:
    • Copies kernel, Limine files, and limine.conf into an ISO directory structure
    • Creates a hybrid BIOS+UEFI bootable ISO
    • Installs BIOS boot sectors via limine bios-install
  5. Launch QEMU with the ISO attached as a CD-ROM
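Step 4 can be sketched as an xtask helper that assembles the xorriso and limine bios-install invocations. This is a hypothetical helper, not the project’s actual xtask code. The function names are illustrative, the xorriso flags follow the pattern in Limine’s example build scripts, and the boot/limine/ location of the CD boot images inside the ISO directory is an assumption. The helpers only build std::process::Command values so they can be inspected; calling .status() on them would run the tools.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical xtask helper: builds (but does not run) an xorriso command
/// that turns `iso_root` into a hybrid BIOS+UEFI ISO at `output`.
/// ASSUMPTION: the Limine CD boot images were copied to
/// `iso_root/boot/limine/`; adjust the relative paths to the real layout.
fn xorriso_command(iso_root: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("xorriso");
    cmd.args(["-as", "mkisofs", "-R", "-r", "-J"])
        // BIOS El Torito boot image (path is relative to the ISO root).
        .args(["-b", "boot/limine/limine-bios-cd.bin"])
        .args(["-no-emul-boot", "-boot-load-size", "4", "-boot-info-table"])
        // UEFI boot image, also exposed as an EFI system partition.
        .args(["--efi-boot", "boot/limine/limine-uefi-cd.bin"])
        .args(["-efi-boot-part", "--efi-boot-image", "--protective-msdos-label"])
        .arg(iso_root)
        .arg("-o")
        .arg(output);
    cmd
}

/// Hypothetical helper: installs the BIOS boot sectors into the finished ISO
/// using the Limine CLI built from limine.c.
fn bios_install_command(limine_cli: &Path, iso: &Path) -> Command {
    let mut cmd = Command::new(limine_cli);
    cmd.arg("bios-install").arg(iso);
    cmd
}
```

The order matters: limine bios-install patches the image that xorriso produced, so it has to run second, against the same ISO path.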

Limine version

  • Bootloader: v8.x (binary distribution from v8.x-binary branch)
  • Rust crate: limine = "0.5" (boot protocol structures)

The bootloader binaries are cached in target/limine/ and not committed to git.

Architecture Overview

ThemeliOS is a capability-based microkernel. This page explains the high-level design and the reasoning behind key architectural decisions.

Microkernel vs monolithic

In a monolithic kernel (like Linux), drivers, filesystems, and networking all run inside the kernel with full hardware access. A bug in any driver can crash or compromise the entire system.

In a microkernel, only the absolute minimum runs in kernel space:

| Kernel space | Userspace |
|---|---|
| Memory management | Device drivers |
| Process scheduling | Filesystem |
| IPC (message passing) | Network stack |
| Capability enforcement | Container runtime |
| | Management API |

Everything else runs as isolated userspace processes that communicate via IPC. A buggy driver crashes its own process, not the kernel.

Why microkernel for ThemeliOS? Since we’re building an OS specifically for running untrusted container workloads, minimizing the trusted computing base (the code that can compromise the whole system) is critical. The smaller the kernel, the smaller the attack surface.

Capability-based security

ThemeliOS does not use Linux-style permissions (UID/GID, filesystem permissions) or Linux-style isolation (namespaces, cgroups). Instead, it uses capabilities.

What is a capability?

A capability is an unforgeable token that grants its holder specific permissions on a specific resource. For example:

  • “Read and write to memory region 0x1000–0x2000”
  • “Send messages to IPC endpoint #42”
  • “Access VirtIO block device at MMIO address 0xFE00”

Key properties

  1. No ambient authority: A newly created process has zero capabilities. It can’t do anything until its parent grants it capabilities.

  2. Unforgeable: Capabilities are managed by the kernel. Userspace can’t create them or guess valid ones.

  3. Transferable: Capabilities can be passed between processes via IPC, enabling controlled delegation.

  4. Revocable: A capability can be revoked, immediately cutting off access.

Why not namespaces?

Linux namespaces are “isolation after the fact” — processes start with broad access and namespaces restrict what they can see. Capabilities are “isolation by default” — processes start with nothing and are explicitly granted only what they need.

For a container OS, this means a compromised container literally cannot access resources it wasn’t given capabilities for. There’s no kernel interface to probe, no /proc to read, no syscall to escalate through — the authority simply doesn’t exist.

Inspiration

  • seL4: Formally verified capability microkernel. ThemeliOS borrows its capability model.
  • Fuchsia/Zircon: Google’s capability-based OS. Demonstrates the model works at scale.

Memory model

ThemeliOS uses hardware-enforced memory isolation:

  • Each process runs in its own virtual address space (page tables enforced by the MMU).
  • The kernel has its own address space that userspace cannot access.
  • Shared memory between processes requires explicit capabilities from both sides.

Physical memory management

A frame allocator tracks free physical memory pages (4 KiB). Frames are allocated to:

  • Process page tables
  • Kernel heap
  • Shared memory regions
  • DMA buffers for device drivers

Virtual memory layout

The virtual address space layout will be defined per-architecture, but the general structure is:

0x0000_0000_0000_0000  ┌──────────────────────┐
                       │  Userspace           │
                       │  (per-process)       │
0x0000_7FFF_FFFF_FFFF  └──────────────────────┘
                       ┌─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                          Non-canonical hole
                       └─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
0xFFFF_8000_0000_0000  ┌──────────────────────┐
                       │  Kernel space        │
                       │  (shared, all procs) │
0xFFFF_FFFF_FFFF_FFFF  └──────────────────────┘

(This is the x86_64 layout with 4-level paging, which uses 48-bit virtual addresses; aarch64 is similar but with different conventions.)
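The non-canonical hole follows from how x86_64 4-level paging uses only 48 address bits: bits 63..48 of a valid address must be copies of bit 47. Below is a minimal sketch of that rule and of the three regions in the layout. The function and type names are hypothetical.

```rust
/// True if `addr` is canonical under x86_64 4-level paging:
/// bits 63..48 must all equal bit 47 (sign extension of a 48-bit value).
fn is_canonical(addr: u64) -> bool {
    // Shift bit 47 up to bit 63, then arithmetic-shift back down.
    // Only a correctly sign-extended address survives the round trip.
    (((addr << 16) as i64) >> 16) as u64 == addr
}

#[derive(Debug, PartialEq)]
enum Region {
    User,
    NonCanonical,
    Kernel,
}

/// Classify an address into the regions of the x86_64 layout.
fn region(addr: u64) -> Region {
    if !is_canonical(addr) {
        Region::NonCanonical
    } else if addr <= 0x0000_7FFF_FFFF_FFFF {
        Region::User
    } else {
        Region::Kernel
    }
}
```

For example, region(0xFFFF_FFFF_8000_0000) (the kernel load address) is Kernel, while 0x0000_8000_0000_0000, one past the top of user space, falls into the hole.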

IPC

Inter-process communication is the backbone of the microkernel. Since drivers, filesystems, and networking all run in userspace, every system operation involves IPC.

Synchronous message passing

The primary mechanism: a client sends a message to a server and blocks until it gets a reply. This is used for request/response patterns like “read this file” or “send this network packet.”
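As a hosted analogy (not kernel code), the request/reply pattern can be modeled with std threads and channels. Each request carries a reply channel, and the client blocks on it until the server answers. The channels stand in for kernel endpoints, and the names call, serve, and spawn_server are hypothetical. In ThemeliOS, the kernel’s IPC path would enforce the blocking, and messages would travel through endpoint capabilities.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// A request carries its own reply channel, so the server knows where to answer.
struct Request {
    payload: String,
    reply_to: Sender<String>,
}

/// Client side: send the request, then block until the reply arrives.
fn call(endpoint: &Sender<Request>, payload: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    endpoint
        .send(Request { payload: payload.to_string(), reply_to: reply_tx })
        .expect("server endpoint closed");
    // Blocking here is what makes the exchange synchronous.
    reply_rx.recv().expect("server dropped the reply channel")
}

/// Server loop: receive a request, handle it, reply, repeat until all
/// client handles to the endpoint are gone.
fn serve(endpoint: Receiver<Request>) {
    for req in endpoint {
        let response = format!("handled: {}", req.payload);
        // The client may have gone away; ignore the send error in that case.
        let _ = req.reply_to.send(response);
    }
}

fn spawn_server() -> (Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || serve(rx));
    (tx, handle)
}
```

call(&endpoint, "read /config") returns only after the server thread has produced "handled: read /config".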

Performance consideration

IPC overhead is the classic criticism of microkernels. ThemeliOS will address this by:

  • Keeping messages small (pointers to shared memory for bulk data)
  • Using register-based fast-path for small messages
  • Careful cache-aware scheduling of communicating processes

Immutability

The OS root filesystem is read-only. The entire OS image is a single artifact that is booted as-is.

  • Updates: Swap the entire image. No package managers, no apt-get, no partial updates.
  • Configuration: Injected at boot time via cloud-init-style metadata or the management API.
  • Ephemeral state: Container images and runtime state live on a RAM-backed ephemeral layer that is lost on reboot.

This model treats nodes as cattle: if a node is unhealthy, replace it with a fresh one. No debugging on the node, no SSHing in, no manual fixes.

Target platforms

ThemeliOS is designed to run as a virtual machine, with bare-metal support as a secondary goal.

| Platform | Status | Notes |
|---|---|---|
| QEMU/KVM (x86_64) | Primary dev target | Used for all development and testing |
| QEMU (aarch64) | Secondary dev target | ARM64 support |
| AWS (EC2) | Future | Nitro hypervisor |
| GCP (Compute Engine) | Future | KVM-based |
| Azure (VMs) | Future | Hyper-V |
| Bare metal (headless) | Future | Server hardware, no GPU/display |

Capability System

This document details the design of ThemeliOS’s capability system — the core security mechanism of the kernel.

Status: Implementation was planned for Phase 2, which the Milestones page lists as complete.

Overview

In ThemeliOS, every resource is accessed through capabilities. A capability is a kernel-managed, unforgeable token that encodes:

  1. Which resource (identified by a kernel object ID)
  2. What operations are permitted (a bitmask of rights)

Capability types

| Capability type | Resource | Example rights |
|---|---|---|
| MemoryCap | Physical memory region | Read, Write, Execute, Map |
| EndpointCap | IPC endpoint | Send, Receive |
| ThreadCap | Thread/process | Start, Stop, Suspend, Resume |
| DeviceCap | Hardware device (MMIO region) | Read, Write |
| IRQCap | Interrupt line | Acknowledge, Bind |

Capability spaces

Each process has a capability space (CSpace) — a table mapping local capability slots to kernel objects. A process refers to its capabilities by slot index, not by object ID. The kernel translates slot indices to objects on each syscall.

Process A's CSpace:
  Slot 0 → MemoryCap(region=0x1000, rights=RW)
  Slot 1 → EndpointCap(endpoint=#7, rights=Send)
  Slot 2 → (empty)
  Slot 3 → ThreadCap(thread=#12, rights=Start|Stop)

Process B's CSpace:
  Slot 0 → EndpointCap(endpoint=#7, rights=Receive)
  Slot 1 → MemoryCap(region=0x2000, rights=R)

Process A can send to endpoint #7 (slot 1), and Process B can receive from it (slot 0). Neither can access the other’s memory — they’d need explicit capabilities for that.
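The slot-based lookup can be sketched as follows. This is a simplified model, and its specifics are assumptions: the rights bit values, the Object variants, and the error names are hypothetical, and a real CSpace would live in kernel memory rather than in a Vec. The lookup method plays the role of the per-syscall translation: a slot index goes in, and a capability with sufficient rights comes out.

```rust
// Hypothetical rights bits.
const READ: u8 = 1 << 0;
const WRITE: u8 = 1 << 1;
const SEND: u8 = 1 << 2;
const RECEIVE: u8 = 1 << 3;

/// The kernel object a capability refers to (simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Object {
    Memory { region: u64 },
    Endpoint { id: u32 },
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Capability {
    object: Object,
    rights: u8,
}

#[derive(Debug, PartialEq)]
enum CapError {
    InvalidSlot,
    EmptySlot,
    MissingRights,
}

/// A per-process capability space: slot index -> optional capability.
struct CSpace {
    slots: Vec<Option<Capability>>,
}

impl CSpace {
    fn new(size: usize) -> Self {
        CSpace { slots: vec![None; size] }
    }

    fn insert(&mut self, slot: usize, cap: Capability) {
        self.slots[slot] = Some(cap);
    }

    /// Translate a slot index and check that the capability grants `needed`.
    fn lookup(&self, slot: usize, needed: u8) -> Result<&Capability, CapError> {
        let cap = self
            .slots
            .get(slot)
            .ok_or(CapError::InvalidSlot)?
            .as_ref()
            .ok_or(CapError::EmptySlot)?;
        if cap.rights & needed == needed {
            Ok(cap)
        } else {
            Err(CapError::MissingRights)
        }
    }
}
```

With Process A’s table from the example, lookup(1, SEND) succeeds, while lookup(1, RECEIVE) fails with MissingRights because slot 1 carries only Send.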

Capability operations

Grant

A parent process can grant a capability to a child process, optionally with reduced rights:

Parent has: MemoryCap(region=X, rights=RWX)
Parent grants child: MemoryCap(region=X, rights=R)

The child gets read-only access. Rights can only be reduced, never elevated.
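Here is a minimal sketch of rights attenuation, assuming rights are a bitmask. The bit values and type names are illustrative. The derived capability may carry only a subset of the parent’s rights. A request for more is rejected rather than silently trimmed, which is one possible design choice.

```rust
/// Rights bitmask: R = read, W = write, X = execute (illustrative values).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rights(u8);

impl Rights {
    const R: Rights = Rights(0b001);
    const W: Rights = Rights(0b010);
    const X: Rights = Rights(0b100);
    const RWX: Rights = Rights(0b111);

    /// True if every right in `other` is also present in `self`.
    fn contains(self, other: Rights) -> bool {
        self.0 & other.0 == other.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MemoryCap {
    region: u64,
    rights: Rights,
}

#[derive(Debug, PartialEq, Eq)]
struct Escalation;

/// Derive a capability for a child. The requested rights must be a subset
/// of the parent's; anything else would be an escalation and is refused.
fn grant(parent: &MemoryCap, requested: Rights) -> Result<MemoryCap, Escalation> {
    if parent.rights.contains(requested) {
        Ok(MemoryCap { region: parent.region, rights: requested })
    } else {
        Err(Escalation)
    }
}
```

Granting R from an RWX parent succeeds, but the resulting read-only child cannot in turn grant R|W to anyone.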

Transfer via IPC

Capabilities can be attached to IPC messages. This is how services delegate access:

FileServer receives "open /config" request
FileServer replies with MemoryCap(region=file_data, rights=R)
Client now has read access to the file's memory region
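Capability transfer can be modeled in two steps. First, the sender attaches a capability from one of its slots to a message. Then the kernel installs it in a free slot of the receiver’s CSpace. The sketch assumes two hypothetical transfer modes. With Move, the sender gives up its slot. With Duplicate, the sender keeps its copy, as the file server in the example would. The names and the Message layout are illustrative.

```rust
/// Simplified capability: an object id plus a rights bitmask.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cap {
    object: u32,
    rights: u8,
}

/// An IPC message carrying data plus capabilities taken from the sender.
#[derive(Debug)]
struct Message {
    data: Vec<u8>,
    caps: Vec<Cap>,
}

enum Transfer {
    Move,      // sender's slot becomes empty
    Duplicate, // sender keeps the capability; receiver gets a copy
}

/// Sender side: take the capability in `slot` and attach it to `msg`.
fn attach(cspace: &mut [Option<Cap>], slot: usize, mode: Transfer, msg: &mut Message) -> Option<()> {
    let entry = cspace.get_mut(slot)?;
    let cap = match mode {
        Transfer::Move => entry.take()?,
        Transfer::Duplicate => (*entry)?,
    };
    msg.caps.push(cap);
    Some(())
}

/// Receiver side: install each capability into a free slot and return the
/// slot indices. If there is not enough room, hand the message back intact.
fn receive(cspace: &mut [Option<Cap>], msg: Message) -> Result<Vec<usize>, Message> {
    let free: Vec<usize> = cspace
        .iter()
        .enumerate()
        .filter(|(_, slot)| slot.is_none())
        .map(|(i, _)| i)
        .take(msg.caps.len())
        .collect();
    if free.len() < msg.caps.len() {
        return Err(msg);
    }
    for (&i, cap) in free.iter().zip(msg.caps) {
        cspace[i] = Some(cap);
    }
    Ok(free)
}
```

Returning the message on failure is a deliberate choice in this sketch: it avoids silently dropping capabilities when the receiver’s CSpace is full.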

Revoke

The kernel (or a process with the appropriate meta-capability) can revoke a capability, immediately invalidating it. Any future use of the revoked slot returns an error.
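One way to get the “immediately invalidating” behavior is a generation counter per kernel object. Each capability records the generation it was issued under, and revocation bumps the counter. As a result, every outstanding copy fails its next check, including copies already transferred to other processes. This is an illustrative technique with hypothetical names, not necessarily what ThemeliOS implements. seL4, for example, tracks capability derivations in a tree and revokes by walking it.

```rust
use std::collections::HashMap;

/// A capability remembers which generation of its object it was issued for.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cap {
    object: u32,
    generation: u64,
}

#[derive(Debug, PartialEq)]
enum UseError {
    NoSuchObject,
    Revoked,
}

/// Hypothetical kernel object table: object id -> current generation.
#[derive(Default)]
struct ObjectTable {
    generations: HashMap<u32, u64>,
}

impl ObjectTable {
    /// Issue a capability for `object` at its current generation.
    fn issue(&mut self, object: u32) -> Cap {
        let generation = *self.generations.entry(object).or_insert(0);
        Cap { object, generation }
    }

    /// Revoke: every capability issued before this call becomes stale.
    fn revoke(&mut self, object: u32) {
        if let Some(g) = self.generations.get_mut(&object) {
            *g += 1;
        }
    }

    /// Every use of a capability is checked against the current generation.
    fn check(&self, cap: Cap) -> Result<(), UseError> {
        match self.generations.get(&cap.object) {
            None => Err(UseError::NoSuchObject),
            Some(&g) if g == cap.generation => Ok(()),
            Some(_) => Err(UseError::Revoked),
        }
    }
}
```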

Container mapping

In ThemeliOS, a “container” is a group of processes sharing a common set of capabilities. The container’s capability set defines its sandbox:

  • Memory: Only the memory regions granted to it
  • Network: Only the network endpoints it has capabilities for
  • Filesystem: Only the filesystem views it’s been granted
  • IPC: Only the services it has endpoint capabilities for

A container cannot discover or access anything outside its capability set. Unlike Linux containers (where a kernel exploit can escape the namespace), escaping a capability sandbox requires forging a kernel object — which is impossible without a kernel memory corruption bug.

Comparison with Linux isolation

| Aspect | Linux (namespaces/cgroups) | ThemeliOS (capabilities) |
|---|---|---|
| Default | Access everything, restrict selectively | Access nothing, grant explicitly |
| Enforcement | Kernel checks on each syscall | No syscall exists without capability |
| Escape risk | Kernel bugs can bypass namespaces | Requires kernel memory corruption |
| Resource discovery | Can probe for resources | Can’t even address unknown resources |
| Granularity | Per-namespace | Per-object, per-right |

Memory Management

This document describes ThemeliOS’s memory management subsystem design.

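Status: Implementation began in Phase 1 (frame allocator, kernel heap), with per-process page tables in Phase 2. The Milestones page lists both phases as complete.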

Overview

The memory management (MM) subsystem is responsible for:

  1. Physical frame allocation: tracking which 4 KiB pages of physical RAM are free or in use
  2. Virtual memory: creating and managing page tables for each process
  3. Kernel heap: backing Rust’s alloc crate (Box, Vec, and similar types) for kernel data structures

Physical memory

Boot-time discovery

The bootloader provides a memory map describing which physical address ranges are usable RAM, reserved by firmware, or used for MMIO. The frame allocator uses this map to initialize its free list.

Frame allocator

The frame allocator hands out 4 KiB physical memory frames. Initial implementation will use a bitmap allocator:

  • One bit per physical frame (1 = allocated, 0 = free)
  • Simple, predictable, easy to implement
  • For 4 GiB of RAM: 4 GiB / 4 KiB = 1,048,576 frames, so the bitmap is 1,048,576 bits = 128 KiB (manageable)

Later optimization: replace with a buddy allocator for efficient allocation of contiguous multi-frame regions (needed for DMA buffers, large pages).
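The bitmap design can be sketched in hosted Rust under a few assumptions. The bitmap is a Vec<u64>, whereas in the kernel it would live in memory carved out at boot. Every frame starts out allocated. free_range stands in for walking the bootloader’s memory map and releasing usable regions. The type and method names are hypothetical.

```rust
const FRAME_SIZE: u64 = 4096;

/// Bitmap frame allocator sketch: one bit per 4 KiB frame, 1 = allocated.
struct BitmapAllocator {
    bits: Vec<u64>, // each u64 word tracks 64 frames
    frames: usize,
}

impl BitmapAllocator {
    /// Start with every frame marked allocated (including padding bits);
    /// usable RAM is released explicitly with `free_range`.
    fn new(total_bytes: u64) -> Self {
        let frames = (total_bytes / FRAME_SIZE) as usize;
        BitmapAllocator { bits: vec![u64::MAX; (frames + 63) / 64], frames }
    }

    /// Size of the bitmap itself, in bytes.
    fn bitmap_bytes(&self) -> usize {
        (self.frames + 7) / 8
    }

    fn set(&mut self, frame: usize, used: bool) {
        let (word, bit) = (frame / 64, frame % 64);
        if used {
            self.bits[word] |= 1u64 << bit;
        } else {
            self.bits[word] &= !(1u64 << bit);
        }
    }

    /// Mark [start, start + len) as usable; partial frames at the edges are skipped.
    fn free_range(&mut self, start: u64, len: u64) {
        let first = ((start + FRAME_SIZE - 1) / FRAME_SIZE) as usize;
        let end = ((start + len) / FRAME_SIZE) as usize;
        for frame in first..end.min(self.frames) {
            self.set(frame, false);
        }
    }

    /// Find the first clear bit, mark it allocated, return its physical address.
    fn allocate(&mut self) -> Option<u64> {
        let word = self.bits.iter().position(|&w| w != u64::MAX)?;
        let frame = word * 64 + self.bits[word].trailing_ones() as usize;
        if frame >= self.frames {
            return None;
        }
        self.set(frame, true);
        Some(frame as u64 * FRAME_SIZE)
    }

    fn deallocate(&mut self, addr: u64) {
        self.set((addr / FRAME_SIZE) as usize, false);
    }
}
```

For 4 GiB of RAM, bitmap_bytes() returns 131,072 (128 KiB), matching the estimate in the list. Allocation is a linear scan that hands out one frame at a time; contiguous multi-frame requests are what the buddy allocator is meant to handle.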

Capability integration

Physical frames are resources protected by capabilities. When a process requests memory:

  1. Kernel allocates a frame from the free pool
  2. Kernel creates a MemoryCap for that frame
  3. Kernel inserts the capability into the process’s CSpace
  4. Process can now map the frame into its address space using the capability

A process cannot access physical memory it doesn’t have a capability for — the page tables are configured to reflect capability permissions.

Virtual memory

Address space layout (x86_64)

 Lower half (user space, per-process):
   0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF

 Upper half (kernel space, shared across all processes):
   0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF
     ├── Physical memory direct map
     ├── Kernel code and data
     ├── Kernel heap
     └── Per-CPU data

Page tables

x86_64 uses 4-level page tables (PML4 → PDPT → PD → PT), each with 512 entries. Each entry is 8 bytes and can point to:

  • The next level table
  • A large page (2 MiB at PD level, 1 GiB at PDPT level)
  • A 4 KiB page (at PT level)

The kernel manages page tables for each process. When a context switch occurs, the CPU’s CR3 register is loaded with the new process’s PML4 physical address, instantly switching the entire address space.
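To make the 512-entry, 4-level structure concrete: each level is indexed by 9 bits of the virtual address, and the low 12 bits are the offset inside a 4 KiB page. A small helper with hypothetical names that splits an address:

```rust
/// Per-level indices for an x86_64 4-level page walk, plus the page offset.
#[derive(Debug, PartialEq)]
struct PageTableIndices {
    pml4: u16,
    pdpt: u16,
    pd: u16,
    pt: u16,
    offset: u16,
}

fn split(vaddr: u64) -> PageTableIndices {
    // 9 bits per level (512 entries), starting above the 12-bit page offset.
    let idx = |shift: u32| ((vaddr >> shift) & 0x1FF) as u16;
    PageTableIndices {
        pml4: idx(39),
        pdpt: idx(30),
        pd: idx(21),
        pt: idx(12),
        offset: (vaddr & 0xFFF) as u16,
    }
}
```

For the kernel load address 0xffffffff80000000, this gives PML4 index 511, PDPT index 510, and zero for the remaining levels, so the whole kernel image sits under the last PML4 entry.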

aarch64 differences

aarch64 uses a similar 4-level translation table scheme (with a 4 KiB granule) but with different registers and table entry formats. Instead of a single CR3, it has two base registers: TTBR0_EL1 for the lower (user) half and TTBR1_EL1 for the upper (kernel) half. The architecture abstraction layer hides these differences from the rest of the kernel.

Kernel heap

The kernel needs dynamic allocation for data structures like:

  • Process control blocks
  • Capability tables
  • IPC message buffers
  • Driver state

We’ll use the linked_list_allocator crate initially (a simple free-list allocator suitable for #![no_std] kernels), backed by physical frames allocated from the frame allocator.

The kernel heap lives in the upper-half virtual address space and is shared across all contexts (but only accessible from kernel mode).

Memory safety

Rust’s ownership model provides compile-time guarantees against:

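  • Use-after-free: When frames are represented by owned, non-Copy handle types, the compiler rejects any use of a frame after it has been passed to the deallocator
  • Double-free: For the same reason, the compiler rejects freeing a frame twice
  • Data races: Shared mutable state must be wrapped in a Sync synchronization type, such as a spinlock-based Mutex (RefCell provides only single-threaded interior mutability and cannot be shared across CPUs)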

The unsafe keyword is required for raw pointer operations (hardware register access, page table manipulation) — these are confined to small, well-documented blocks.
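The use-after-free and double-free guarantees depend on modeling frames as owned, non-Copy values. Below is a minimal sketch of that pattern. PhysFrame and FrameAllocator are hypothetical types, and the Vec-backed free list is purely for illustration.

```rust
/// A physical frame handle. Deliberately neither Copy nor Clone,
/// so each allocated frame has exactly one owner.
#[derive(Debug)]
struct PhysFrame {
    addr: u64,
}

struct FrameAllocator {
    free: Vec<u64>,
}

impl FrameAllocator {
    fn allocate(&mut self) -> Option<PhysFrame> {
        self.free.pop().map(|addr| PhysFrame { addr })
    }

    /// Takes the frame by value: after this call the caller no longer owns it.
    fn deallocate(&mut self, frame: PhysFrame) {
        self.free.push(frame.addr);
    }
}
```

Calling alloc.deallocate(f) twice on the same f fails to compile with error E0382 (use of moved value). The guarantee holds only as long as frames cannot be conjured from raw addresses. A real kernel would therefore keep the PhysFrame constructor private to the allocator module and keep the unsafe code that handles raw physical addresses small.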

Milestones

ThemeliOS development is organized into phases. Each phase builds on the previous one and produces a working, testable artifact.

| Phase | Goal | Status |
|---|---|---|
| 0 | Boot on QEMU, serial output | Complete |
| 1 | Memory allocator, scheduler, interrupts (x86_64) | Complete |
| 2 | Capability system, process isolation, IPC | Complete |
| 3 | VirtIO block driver, read-only filesystem | Not started |
| 4 | VirtIO net driver, TCP/IP stack | Not started |
| 5 | OCI container support | Not started |
| 6 | Management API (Docker-compatible) | Not started |
| 7 | aarch64 port | Not started |
| 8 | Hyperscaler support (AWS, GCP, Azure) | Not started |
| 9 | Testing and benchmarks | Not started |
| 10 | Kubernetes worker node | Not started |
| 11 | GPU support across clouds | Not started |
| 12 | Production operations (observability, updates) | Not started |

Phase 0 — Boot (Complete)

Goal: Get the kernel booting on QEMU and printing to the serial console.

Deliverables:

  • Bootloader integration (Limine)
  • Architecture-specific early init (x86_64 first)
  • Serial console output (16550 UART on x86_64)
  • “Hello from ThemeliOS” printed on boot
  • cargo xtask run boots the kernel in QEMU end-to-end

Phase 1 — Kernel basics (Complete)

Goal: A kernel that can manage memory and schedule tasks. x86_64 only — aarch64 is deferred to Phase 7.

Deliverables:

  • Physical frame allocator (bitmap-based)
  • Kernel heap allocator
  • Interrupt handling (GDT, IDT, 8259 PIC on x86_64)
  • Timer-driven preemptive scheduler (round-robin)
  • Basic kernel shell over serial (for debugging, will be removed later)
  • Automated test infrastructure (isa-debug-exit, cargo xtask test, GitHub Actions CI)
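QEMU’s isa-debug-exit device (attached with something like -device isa-debug-exit,iobase=0xf4,iosize=0x04) exits QEMU with status (value << 1) | 1 when the guest writes value to its port. The test runner therefore has to map QEMU’s exit status back to pass/fail. The sketch below assumes the kernel writes 0x10 for success and 0x11 for failure. That is a common convention, not something taken from the ThemeliOS source, and the function names are hypothetical.

```rust
/// Values the kernel is assumed to write to the isa-debug-exit port.
const KERNEL_SUCCESS: u32 = 0x10;
const KERNEL_FAILURE: u32 = 0x11;

/// isa-debug-exit terminates QEMU with status (value << 1) | 1.
fn qemu_status_for(value: u32) -> i32 {
    ((value << 1) | 1) as i32
}

#[derive(Debug, PartialEq)]
enum TestOutcome {
    Passed,
    Failed,
    Unexpected(i32),
}

/// How a test runner could interpret QEMU's exit status.
fn interpret(status: i32) -> TestOutcome {
    if status == qemu_status_for(KERNEL_SUCCESS) {
        TestOutcome::Passed
    } else if status == qemu_status_for(KERNEL_FAILURE) {
        TestOutcome::Failed
    } else {
        TestOutcome::Unexpected(status)
    }
}
```

Because the status is always odd, a passing run can never exit with 0, so the runner itself has to translate 33 into success.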

Phase 2 — Isolation (Complete)

Goal: Implement the capability system and process isolation.

Deliverables:

  • Custom page tables replacing Limine’s (required for per-process address spaces)
  • Capability types and capability space (CSpace)
  • Process creation with isolated address spaces
  • Capability grant, transfer, and revocation
  • Synchronous IPC (message passing between processes)
  • Audit logging (tamper-evident record of capability usage for compliance and security)
  • Reclaim bootloader-reclaimable memory (safe once we own GDT, page tables, and stack)
  • First userspace process (init)

Phase 3 — Storage (Not started)

Goal: Read from a virtual disk and present a filesystem.

Deliverables:

  • VirtIO block driver (for QEMU’s virtual disk)
  • Read-only filesystem (simple format, possibly custom or FAT)
  • RAM-backed ephemeral writable layer
  • Immutable root image creation tooling

Phase 4 — Networking (Not started)

Goal: TCP/IP connectivity.

Deliverables:

  • VirtIO network driver
  • Ethernet, ARP, IPv4
  • TCP and UDP
  • Basic socket-like API via capabilities
  • DHCP client

Phase 5 — Containers (Not started)

Goal: Run OCI container images.

Deliverables:

  • Linux syscall compatibility layer (translate Linux syscalls to capability-checked ThemeliOS operations)
  • OCI image format parsing and layer unpacking
  • Container lifecycle (create, start, stop, destroy)
  • Container exec (spawn processes inside a running container’s isolation boundary)
  • PTY support for interactive terminal sessions
  • Container-to-capability mapping (each container gets a capability set)
  • Container networking (virtual interfaces, isolation)
  • Log streaming from containers (stdout/stderr capture)
  • Resource limits (CPU, memory) enforced via capabilities
  • Container image registry support (Docker Hub, ECR, GCR, ACR)
  • Registry authentication, TLS, and cloud-specific credential helpers

Phase 6 — Management (Not started)

Goal: Docker-compatible management API for the node.

Deliverables:

  • Docker Engine API compatible subset (containers, exec, images, logs, networks)
  • Bidirectional streaming for interactive exec sessions (websocket)
  • Capability-based authorization (API clients mapped to capability sets)
  • TLS client certificate and API token authentication
  • Node status and health reporting
  • Configuration injection at boot time
  • No SSH — API is the only interface
  • Standard Docker tooling works out of the box (docker exec, docker ps, docker logs, etc.)

Phase 7 — aarch64 port (Not started)

Goal: Port all Phase 0 and Phase 1 functionality to aarch64 (ARM64), enabling ThemeliOS to run on ARM-based hardware and cloud instances (e.g., AWS Graviton).

Deliverables:

  • aarch64 boot via Limine (UEFI on ARM)
  • PL011 UART serial driver for debug output
  • GIC (Generic Interrupt Controller) initialization and exception handling
  • ARM generic timer for scheduler preemption
  • Physical frame allocator (same bitmap design, architecture-independent)
  • Kernel heap (architecture-independent, just works)
  • Scheduler and context switch for aarch64 (different register set, different calling convention)
  • Serial debug shell (architecture-independent, just works)
  • cargo xtask run --arch arm64 boots and passes all tests
  • Automated tests on aarch64 QEMU in CI

Phase 8 — Hyperscaler support (Not started)

Goal: Boot and run on AWS, GCP, and Azure.

Deliverables:

  • Instance metadata service (IMDS) clients for all three providers
  • Cloud-aware configuration injection at boot time
  • Machine image tooling (cargo xtask image --cloud aws/gcp/azure)
  • AMI creation for AWS (raw disk import via aws ec2 import-image)
  • GCP image creation (raw disk tarball + gcloud compute images create)
  • Azure VHD image creation
  • UEFI Secure Boot chain verification and kernel image signing
  • Measured boot (TPM support)
  • Boot validation on each provider’s compute instances
  • GitHub Actions workflow to build downloadable QEMU ISOs (x86_64, aarch64)
  • GitHub Actions workflows to build and publish cloud-specific machine images

Phase 9 — Testing and benchmarks (Not started)

Goal: Comprehensive test suite and performance benchmarks to validate the OS works correctly end-to-end.

Deliverables:

  • CI infrastructure (GitHub Actions with QEMU, isa-debug-exit device for pass/fail exit codes)
  • Boot smoke tests (kernel boots, reaches known-good state, no panic)
  • Kernel unit tests (allocator, scheduler, capability enforcement tested in isolation)
  • Kernel integration tests (spawn process + grant capability + IPC message + verify result)
  • Security and isolation tests (capability violations, unauthorized memory access, process escape attempts — all must fail cleanly)
  • Container runtime tests with standard images (alpine, busybox, nginx)
  • Custom test images (memory stress, network connectivity, filesystem I/O, multi-process isolation)
  • Container lifecycle tests (create, start, stop, restart, destroy, exec)
  • Multi-container isolation validation
  • Container networking tests
  • Resource limit enforcement tests
  • Cloud validation tests (boot on each hyperscaler, IMDS, networking, container workloads)
  • Benchmarks: boot time, context switch latency, IPC throughput, memory allocation speed, container cold-start time
  • Benchmark history tracking for regression detection

Phase 10 — Kubernetes (Not started)

Goal: Full drop-in K8s/K3s/RKE2 worker node. Any pod that runs on an Ubuntu or Flatcar node must run identically on ThemeliOS.

Deliverables:

  • Full Linux syscall coverage for real-world K8s workloads (databases, language runtimes, service meshes, logging agents, init systems)
  • CRI (Container Runtime Interface) gRPC API implementation
  • CNI (Container Network Interface) plugin support (Flannel, Calico, Cilium)
  • CSI (Container Storage Interface) driver support for persistent volumes
  • Pod semantics (groups of containers sharing network and storage namespaces)
  • kubelet (standard binary or compatible custom implementation)
  • kube-proxy equivalent for service networking and load balancing
  • Node registration, capacity reporting, and health conditions
  • kubectl exec -it with full interactive shell support
  • kubectl logs, kubectl cp, kubectl port-forward
  • Pod resource management (CPU/memory requests and limits, QoS classes)
  • DNS resolution for K8s service discovery

Phase 11 — GPU support (Not started)

Goal: GPU passthrough and accelerator support for containerized workloads across all major cloud providers.

Deliverables:

  • VFIO/IOMMU support for GPU device passthrough to containers
  • NVIDIA driver ioctl compatibility in the syscall layer
  • K8s device plugin API support for GPU resource scheduling
  • GPU resource requests and limits in pod specs
  • Validation on AWS GPU instances (P/G series)
  • Validation on GCP GPU instances (A2/G2 series)
  • Validation on Azure GPU instances (NC/ND series)
  • Cloud-specific accelerator support (AWS Inferentia/Trainium, GCP TPU, Azure AMD GPUs)

Phase 12 — Production operations (Not started)

Goal: Day-2 operational tooling for running ThemeliOS nodes in production.

Deliverables:

  • Metrics export in Prometheus format (node-exporter compatible)
  • Log forwarding to external collectors (CloudWatch, Stackdriver, Fluentd)
  • Health endpoints for load balancers and orchestrators
  • Distributed tracing support for container workloads
  • A/B partition scheme for whole-image OS updates
  • Automatic rollback on failed updates
  • Zero-downtime node upgrades (drain → swap image → rejoin cluster)
  • OS update tooling (cargo xtask image --update or equivalent)
  • Update coordination with K8s (respect PodDisruptionBudgets during upgrades)