ThemeliOS
ThemeliOS (from Greek θεμέλιο — “foundation”) is an experimental capability-based microkernel operating system written in Rust. It is designed from the ground up to do one thing well: run container workloads securely.
What is ThemeliOS?
ThemeliOS is a from-scratch kernel — it does not use or build on top of Linux. It implements its own memory management, process scheduling, inter-process communication, and security model.
The long-term vision is a minimal, immutable OS that:
- Boots on virtual machines and bare metal
- Runs OCI-compatible container images
- Serves as a Kubernetes/K3s worker node
- Provides hardware-enforced isolation between containers via capabilities
- Has no SSH, no shell, and no way to “log in” — all management is via API
Why build a new kernel?
Existing container OSes (Bottlerocket, Talos Linux, Flatcar) all use the Linux kernel with a stripped-down userspace. This is practical, but it inherits Linux’s security model — namespaces and cgroups are opt-in isolation bolted onto a kernel designed for general-purpose computing.
ThemeliOS takes the opposite approach: isolation is the default. The capability-based security model means a process has zero access to anything unless explicitly granted. There’s nothing to escape from because there’s no ambient authority to escalate to.
Project status
ThemeliOS is in early development. See the Milestones page for the current roadmap.
License
MIT — Copyright (c) 2026 Rudi MK
Development Setup
This guide walks through setting up a development environment for ThemeliOS on macOS or Linux.
Prerequisites
1. Rust nightly toolchain
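ThemeliOS requires Rust nightly. The #![no_std] and #![no_main] attributes, inline assembly, and custom global allocators are all available on stable Rust; the nightly requirement comes from unstable features and flags such as -Zbuild-std (see Troubleshooting).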
The project pins the exact toolchain via rust-toolchain.toml, so you just need rustup installed — it will automatically download the correct nightly version.
Install rustup (if you don’t have it):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
After cloning the repo, the first cargo command will automatically install the pinned nightly toolchain plus the bare-metal targets (x86_64-unknown-none, aarch64-unknown-none).
You can verify with:
rustup show
You should see a nightly toolchain with the x86_64-unknown-none and aarch64-unknown-none targets listed.
2. QEMU
QEMU emulates the hardware that ThemeliOS runs on. You need qemu-system-x86_64 for the primary amd64 target and optionally qemu-system-aarch64 for arm64.
macOS (Homebrew):
brew install qemu
This installs all QEMU system emulators.
Ubuntu/Debian:
sudo apt install qemu-system-x86 qemu-system-arm
Fedora:
sudo dnf install qemu-system-x86 qemu-system-aarch64
Arch Linux:
sudo pacman -S qemu-full
Verify installation:
qemu-system-x86_64 --version
qemu-system-aarch64 --version
3. xorriso
xorriso creates bootable ISO images. The build pipeline uses it to package the kernel with the Limine bootloader into a hybrid BIOS+UEFI ISO.
macOS (Homebrew):
brew install xorriso
Ubuntu/Debian:
sudo apt install xorriso
Fedora:
sudo dnf install xorriso
4. C compiler (for Limine CLI tool)
The first cargo xtask run downloads and builds the Limine bootloader’s CLI tool, which is a small C program. This requires a C compiler.
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: gcc or clang (usually pre-installed)
5. mdbook (optional, for building documentation)
cargo install mdbook
Building and running
All build and run commands go through the xtask tool. You never need to invoke cargo build for the kernel directly.
Build the kernel
cargo xtask build
This cross-compiles the kernel for x86_64-unknown-none (the default target).
For arm64:
cargo xtask build --arch arm64
Run in QEMU
cargo xtask run
This builds the kernel, creates a bootable ISO, and launches it in QEMU in headless mode — serial output is piped to your terminal, but no graphical window opens. Press Ctrl+A, X to exit QEMU.
For arm64 (not yet implemented):
cargo xtask run --arch arm64
Build ISO only (without launching QEMU)
cargo xtask iso
This builds the kernel and creates a bootable ISO at target/themelios.iso without launching QEMU. Useful when you want to run QEMU manually with custom flags.
Run with QEMU display window
To see the QEMU graphical window (shows the Limine bootloader screen and any framebuffer output):
cargo xtask run --display
This does everything cargo xtask run does but opens a QEMU window instead of running headless. Serial output still goes to your terminal.
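For readers who want to launch QEMU by hand, the sketch below shows one way the headless and windowed modes described above could map onto QEMU flags. The function name qemu_args is hypothetical and the real xtask may pass different options; -cdrom, -serial mon:stdio and -display none are standard QEMU flags, and mon:stdio is what makes Ctrl+A, X quit the emulator.

```rust
/// Hypothetical helper (not taken from the ThemeliOS xtask): builds an
/// argument list for `qemu-system-x86_64` from standard QEMU options.
pub fn qemu_args(iso_path: &str, display: bool) -> Vec<String> {
    let mut args = vec![
        // Attach the ISO as a CD-ROM.
        "-cdrom".to_string(),
        iso_path.to_string(),
        // Multiplex the serial port and the QEMU monitor onto the terminal.
        "-serial".to_string(),
        "mon:stdio".to_string(),
    ];
    if !display {
        // Headless mode: suppress the graphical window.
        args.push("-display".to_string());
        args.push("none".to_string());
    }
    args
}
```

Passing the resulting list to std::process::Command would start the emulator; with display set to true the list simply omits the -display none pair so QEMU opens its default window.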
Build documentation
cargo xtask docs
This builds both the mdbook (to docs/book/) and the rustdoc API docs.
Shorthand alias
The workspace defines a cargo xt alias, so these also work:
cargo xt build
cargo xt run
cargo xt docs
Project layout
themelios/
├── kernel/ # The kernel crate (#![no_std], bare-metal)
│ └── src/
│ ├── main.rs # Kernel entry point, module declarations
│ ├── arch/ # Architecture-specific (x86_64, aarch64)
│ ├── mm/ # Memory management
│ ├── sched/ # Scheduler
│ ├── cap/ # Capability system
│ ├── ipc/ # Inter-process communication
│ ├── drivers/ # Device drivers (VirtIO, serial, etc.)
│ ├── fs/ # Filesystem
│ └── net/ # Networking
├── xtask/ # Build tooling (runs on host)
├── docs/ # mdbook documentation
├── .cargo/ # Cargo configuration
└── CLAUDE.md # Project documentation for AI assistants
IDE setup
VS Code
Install the rust-analyzer extension. It should pick up the workspace configuration automatically.
If rust-analyzer struggles with the #![no_std] kernel crate, you may need to add this to .vscode/settings.json:
{
"rust-analyzer.cargo.target": "x86_64-unknown-none",
"rust-analyzer.cargo.buildScripts.enable": true
}
Other editors
Any editor with rust-analyzer LSP support should work. The key setting is ensuring the target is set to x86_64-unknown-none for the kernel crate.
Troubleshooting
“can’t find crate for core”
This means the bare-metal target isn’t installed. Run:
rustup target add x86_64-unknown-none aarch64-unknown-none
Or let rust-toolchain.toml handle it by running any cargo command in the project.
“error: -Zbuild-std is unstable”
You need to be on the nightly toolchain. Check with rustup show — the project’s rust-toolchain.toml should select nightly automatically.
QEMU not found
Make sure QEMU is installed and on your $PATH. See the QEMU installation section above.
Bootloader
ThemeliOS uses the Limine bootloader. This page explains why, how it works, and how it fits into the build pipeline.
Why Limine?
We evaluated several options for booting ThemeliOS:
| Option | Pros | Cons |
|---|---|---|
| Custom UEFI app | Full control | Massive effort, x86_64 UEFI only initially |
| Multiboot2 | Simple, QEMU -kernel flag | BIOS only, no arm64, no UEFI |
| bootloader crate | Very easy Rust integration | x86_64 only, no arm64 |
| Limine | BIOS + UEFI, x86_64 + arm64, well-maintained | External dependency |
Limine was chosen because:
- Multi-architecture: Supports x86_64 and aarch64 (and RISC-V, LoongArch). We need both for our cloud targets.
- Multi-firmware: Works on both BIOS (legacy) and UEFI (modern). Cloud platforms use UEFI; QEMU defaults to BIOS.
- Higher-half kernel: Limine sets up page tables that map our kernel at 0xffffffff80000000, which is the standard layout for 64-bit kernels.
- Clean protocol: The Limine boot protocol gives us a memory map, framebuffer, and other boot info without writing any assembly.
- Active maintenance: Regular releases, good documentation.
Cloud compatibility
Limine’s UEFI support means ThemeliOS can boot on:
- AWS EC2 (Nitro): UEFI supported on most instance types
- GCP Compute Engine: UEFI supported
- Azure Gen2 VMs: UEFI
- Bare metal: UEFI is standard on modern server hardware
- QEMU/KVM: Both BIOS (default) and UEFI (via OVMF)
The same kernel binary works on all platforms — only the bootloader firmware interface differs, and Limine handles that.
How it works
Boot sequence
- Firmware (BIOS or UEFI) loads the Limine bootloader from the boot media
- Limine reads limine.conf to find the kernel path and boot protocol
- Limine loads the kernel ELF into memory at the addresses specified in the linker script
- Limine sets up:
  - 64-bit long mode (x86_64) or EL1 (aarch64)
  - 4-level page tables with identity + higher-half mappings
  - A valid stack
- Limine scans the kernel’s .requests ELF section for boot protocol requests
- Limine fills in the requests (memory map, framebuffer, etc.)
- Limine jumps to the kernel entry point (kmain)
Boot protocol requests
The kernel communicates with Limine through static data structures placed in a special ELF section. These are “requests” — the kernel declares what boot information it needs, and Limine fills in the responses.
use limine::BaseRevision;

// Placed in the .requests ELF section via the linker script
#[used]
#[link_section = ".requests"]
static BASE_REVISION: BaseRevision = BaseRevision::new();
The linker script places these between start/end markers so Limine knows where to scan:
.data : {
...
KEEP(*(.requests_start_marker))
KEEP(*(.requests))
KEEP(*(.requests_end_marker))
}
Configuration file
limine.conf (in the project root) uses the v8 format:
timeout: 0
/ThemeliOS
protocol: limine
kernel_path: boot():/boot/themelios
- timeout: 0 (boot immediately without showing a menu)
- /ThemeliOS (defines a boot entry)
- protocol: limine (use the Limine protocol, not Linux or Multiboot)
- kernel_path: boot():/boot/themelios (load the kernel from the boot volume)
Linker script
The linker script (kernel/linker-x86_64.ld) controls the kernel’s memory layout:
- Entry point: ENTRY(kmain), which tells the ELF where execution begins
- Load address: 0xffffffff80000000, the higher-half virtual address
- Sections: .text (code), .rodata (constants), .data (mutable data + Limine requests), .bss (zeroed data)
The kernel must be compiled with -Crelocation-model=static to produce a non-PIE executable with fixed addresses that match the linker script.
Build pipeline
The cargo xtask run command handles the full pipeline:
- Cross-compile the kernel for x86_64-unknown-none
- Download Limine (one-time: git clone of the v8.x-binary branch to target/limine/)
- Build Limine CLI (one-time: make compiles limine.c)
- Create ISO via xorriso:
  - Copies kernel, Limine files, and limine.conf into an ISO directory structure
  - Creates a hybrid BIOS+UEFI bootable ISO
  - Installs BIOS boot sectors via limine bios-install
- Launch QEMU with the ISO attached as a CD-ROM
Limine version
- Bootloader: v8.x (binary distribution from v8.x-binary branch)
- Rust crate: limine = "0.5" (boot protocol structures)
The bootloader binaries are cached in target/limine/ and not committed to git.
Architecture Overview
ThemeliOS is a capability-based microkernel. This page explains the high-level design and the reasoning behind key architectural decisions.
Microkernel vs monolithic
In a monolithic kernel (like Linux), drivers, filesystems, and networking all run inside the kernel with full hardware access. A bug in any driver can crash or compromise the entire system.
In a microkernel, only the absolute minimum runs in kernel space:
| Kernel space | Userspace |
|---|---|
| Memory management | Device drivers |
| Process scheduling | Filesystem |
| IPC (message passing) | Network stack |
| Capability enforcement | Container runtime |
| | Management API |
Everything else runs as isolated userspace processes that communicate via IPC. A buggy driver crashes its own process, not the kernel.
Why microkernel for ThemeliOS? Since we’re building an OS specifically for running untrusted container workloads, minimizing the trusted computing base (the code that can compromise the whole system) is critical. The smaller the kernel, the smaller the attack surface.
Capability-based security
ThemeliOS does not use Linux-style permissions (UID/GID, filesystem permissions) or Linux-style isolation (namespaces, cgroups). Instead, it uses capabilities.
What is a capability?
A capability is an unforgeable token that grants its holder specific permissions on a specific resource. For example:
- “Read and write to memory region 0x1000–0x2000”
- “Send messages to IPC endpoint #42”
- “Access VirtIO block device at MMIO address 0xFE00”
Key properties
- No ambient authority: A newly created process has zero capabilities. It can’t do anything until its parent grants it capabilities.
- Unforgeable: Capabilities are managed by the kernel. Userspace can’t create them or guess valid ones.
- Transferable: Capabilities can be passed between processes via IPC, enabling controlled delegation.
- Revocable: A capability can be revoked, immediately cutting off access.
Why not namespaces?
Linux namespaces are “isolation after the fact” — processes start with broad access and namespaces restrict what they can see. Capabilities are “isolation by default” — processes start with nothing and are explicitly granted only what they need.
For a container OS, this means a compromised container literally cannot access resources it wasn’t given capabilities for. There’s no kernel interface to probe, no /proc to read, no syscall to escalate through — the authority simply doesn’t exist.
Inspiration
- seL4: Formally verified capability microkernel. ThemeliOS borrows its capability model.
- Fuchsia/Zircon: Google’s capability-based OS. Demonstrates the model works at scale.
Memory model
ThemeliOS uses hardware-enforced memory isolation:
- Each process runs in its own virtual address space (page tables enforced by the MMU).
- The kernel has its own address space that userspace cannot access.
- Shared memory between processes requires explicit capabilities from both sides.
Physical memory management
A frame allocator tracks free physical memory pages (4 KiB). Frames are allocated to:
- Process page tables
- Kernel heap
- Shared memory regions
- DMA buffers for device drivers
Virtual memory layout
The virtual address space layout will be defined per-architecture, but the general structure is:
0x0000_0000_0000_0000 ┌──────────────────────┐
                      │ Userspace            │
                      │ (per-process)        │
0x0000_7FFF_FFFF_FFFF └──────────────────────┘
                      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                        Non-canonical hole
                      └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
0xFFFF_8000_0000_0000 ┌──────────────────────┐
                      │ Kernel space         │
                      │ (shared, all procs)  │
0xFFFF_FFFF_FFFF_FFFF └──────────────────────┘
(This is the x86_64 layout; aarch64 is similar but with different conventions.)
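The three regions in the layout above can be told apart by looking at the top bits of an address. The sketch below assumes 48-bit virtual addresses (4-level paging on x86_64), where bits 63 through 47 must all be equal for an address to be canonical. The Region enum and classify function are illustrative names, not part of the ThemeliOS source.

```rust
/// Which part of the x86_64 virtual address space an address falls in.
#[derive(Debug, PartialEq, Eq)]
pub enum Region {
    User,
    Kernel,
    NonCanonical,
}

/// Classify an address, assuming 48-bit virtual addresses.
pub fn classify(addr: u64) -> Region {
    // Bits 63..=47 (17 bits) must be all zeros or all ones.
    match addr >> 47 {
        0 => Region::User,
        0x1_FFFF => Region::Kernel,
        _ => Region::NonCanonical,
    }
}
```

For example, classify(0x0000_7FFF_FFFF_FFFF) yields Region::User, classify(0xFFFF_8000_0000_0000) yields Region::Kernel, and any address in between falls into the non-canonical hole.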
IPC
Inter-process communication is the backbone of the microkernel. Since drivers, filesystems, and networking all run in userspace, every system operation involves IPC.
Synchronous message passing
The primary mechanism: a client sends a message to a server and blocks until it gets a reply. This is used for request/response patterns like “read this file” or “send this network packet.”
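The call-and-block pattern can be illustrated in ordinary hosted Rust. The sketch below is only an analogy built on std threads and channels, not the kernel's IPC implementation: the Request type, spawn_server and call are hypothetical names, and the "server" simply upper-cases whatever it receives.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A request carries its payload plus a channel on which to send the reply.
pub struct Request {
    pub payload: String,
    pub reply_to: Sender<String>,
}

/// Spawn a hypothetical server thread that replies with the upper-cased payload.
pub fn spawn_server() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            // Ignore send errors: the client may have gone away.
            let _ = req.reply_to.send(req.payload.to_uppercase());
        }
    });
    tx
}

/// Synchronous call: send the request, then block until the reply arrives.
pub fn call(server: &Sender<Request>, payload: &str) -> Option<String> {
    let (reply_tx, reply_rx) = channel();
    let request = Request {
        payload: payload.to_string(),
        reply_to: reply_tx,
    };
    server.send(request).ok()?;
    reply_rx.recv().ok()
}
```

The client is blocked inside recv() for the whole round trip, which is the property the synchronous model relies on: the caller cannot observe a half-finished request.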
Performance consideration
IPC overhead is the classic criticism of microkernels. ThemeliOS will address this by:
- Keeping messages small (pointers to shared memory for bulk data)
- Using register-based fast-path for small messages
- Careful cache-aware scheduling of communicating processes
Immutability
The OS root filesystem is read-only. The entire OS image is a single artifact that is booted as-is.
- Updates: Swap the entire image. No package managers, no apt-get, no partial updates.
- Configuration: Injected at boot time via cloud-init-style metadata or the management API.
- Ephemeral state: Container images and runtime state live on a RAM-backed ephemeral layer that is lost on reboot.
This model treats nodes as cattle: if a node is unhealthy, replace it with a fresh one. No debugging on the node, no SSHing in, no manual fixes.
Target platforms
ThemeliOS is designed to run as a virtual machine, with bare-metal support as a secondary goal.
| Platform | Status | Notes |
|---|---|---|
| QEMU/KVM (x86_64) | Primary dev target | Used for all development and testing |
| QEMU (aarch64) | Secondary dev target | ARM64 support |
| AWS (EC2) | Future | Nitro hypervisor |
| GCP (Compute Engine) | Future | KVM-based |
| Azure (VMs) | Future | Hyper-V |
| Bare metal (headless) | Future | Server hardware, no GPU/display |
Capability System
This document details the design of ThemeliOS’s capability system — the core security mechanism of the kernel.
Status: Design phase. Implementation begins in Phase 2.
Overview
In ThemeliOS, every resource is accessed through capabilities. A capability is a kernel-managed, unforgeable token that encodes:
- Which resource (identified by a kernel object ID)
- What operations are permitted (a bitmask of rights)
Capability types
| Capability type | Resource | Example rights |
|---|---|---|
| MemoryCap | Physical memory region | Read, Write, Execute, Map |
| EndpointCap | IPC endpoint | Send, Receive |
| ThreadCap | Thread/process | Start, Stop, Suspend, Resume |
| DeviceCap | Hardware device (MMIO region) | Read, Write |
| IRQCap | Interrupt line | Acknowledge, Bind |
Capability spaces
Each process has a capability space (CSpace) — a table mapping local capability slots to kernel objects. A process refers to its capabilities by slot index, not by object ID. The kernel translates slot indices to objects on each syscall.
Process A's CSpace:
Slot 0 → MemoryCap(region=0x1000, rights=RW)
Slot 1 → EndpointCap(endpoint=#7, rights=Send)
Slot 2 → (empty)
Slot 3 → ThreadCap(thread=#12, rights=Start|Stop)
Process B's CSpace:
Slot 0 → EndpointCap(endpoint=#7, rights=Receive)
Slot 1 → MemoryCap(region=0x2000, rights=R)
Process A can send to endpoint #7 (slot 1), and Process B can receive from it (slot 0). Neither can access the other’s memory — they’d need explicit capabilities for that.
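The slot-to-object translation can be sketched as a table lookup followed by a rights check. The code below is a simplified model that handles endpoint capabilities only; the type names, the bit values chosen for Send and Receive, and the error variants are assumptions made for illustration, not the kernel's actual definitions.

```rust
/// Rights bitmask (bit assignments are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rights(pub u8);

impl Rights {
    pub const SEND: Rights = Rights(0b01);
    pub const RECEIVE: Rights = Rights(0b10);

    /// True if every bit in `needed` is also set in `self`.
    pub fn contains(self, needed: Rights) -> bool {
        (self.0 & needed.0) == needed.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EndpointCap {
    pub endpoint: u32,
    pub rights: Rights,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    EmptySlot,
    InsufficientRights,
}

/// A per-process table mapping slot indices to capabilities.
pub struct CSpace {
    slots: Vec<Option<EndpointCap>>,
}

impl CSpace {
    pub fn new(slots: Vec<Option<EndpointCap>>) -> Self {
        CSpace { slots }
    }

    /// Translate a slot index to an endpoint ID, checking the required rights.
    /// Out-of-range slots are treated the same as empty ones.
    pub fn lookup(&self, slot: usize, needed: Rights) -> Result<u32, CapError> {
        let cap = self
            .slots
            .get(slot)
            .copied()
            .flatten()
            .ok_or(CapError::EmptySlot)?;
        if cap.rights.contains(needed) {
            Ok(cap.endpoint)
        } else {
            Err(CapError::InsufficientRights)
        }
    }
}
```

Because the process only ever names a slot index, it has no way to refer to a kernel object that is not already in its table.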
Capability operations
Grant
A parent process can grant a capability to a child process, optionally with reduced rights:
Parent has: MemoryCap(region=X, rights=RWX)
Parent grants child: MemoryCap(region=X, rights=R)
The child gets read-only access. Rights can only be reduced, never elevated.
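The "reduce only" rule amounts to a subset check on the rights bitmask. A minimal sketch follows, assuming rights are stored as bits in a u8; the constants and the derive method are hypothetical names rather than the kernel's API.

```rust
pub const READ: u8 = 0b001;
pub const WRITE: u8 = 0b010;
pub const EXECUTE: u8 = 0b100;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryCap {
    pub region: u64,
    pub rights: u8,
}

impl MemoryCap {
    /// Derive a capability to hand to a child. Returns `None` if `requested`
    /// contains any right that this capability does not hold.
    pub fn derive(&self, requested: u8) -> Option<MemoryCap> {
        if (requested & !self.rights) != 0 {
            return None;
        }
        Some(MemoryCap {
            region: self.region,
            rights: requested,
        })
    }
}
```

A parent holding READ | WRITE | EXECUTE can derive a READ-only capability, but the child cannot derive READ | WRITE from its READ-only copy, so rights never grow along a delegation chain.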
Transfer via IPC
Capabilities can be attached to IPC messages. This is how services delegate access:
FileServer receives "open /config" request
FileServer replies with MemoryCap(region=file_data, rights=R)
Client now has read access to the file's memory region
Revoke
The kernel (or a process with the appropriate meta-capability) can revoke a capability, immediately invalidating it. Any future use of the revoked slot returns an error.
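One common way to make revocation take effect immediately is a generation counter: each capability records the generation of its object at creation time, and revoking bumps the object's generation so every outstanding copy stops validating. This is a general technique offered as an illustration; the source does not say ThemeliOS uses it, and all names below are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cap {
    pub object: u32,
    pub generation: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UseError {
    Revoked,
    NoSuchObject,
}

/// Kernel-side table holding the current generation of each object.
#[derive(Default)]
pub struct ObjectTable {
    generations: HashMap<u32, u32>,
}

impl ObjectTable {
    /// Issue a capability stamped with the object's current generation.
    pub fn create(&mut self, object: u32) -> Cap {
        let generation = *self.generations.entry(object).or_insert(0);
        Cap { object, generation }
    }

    /// Invalidate every capability issued for `object` so far.
    pub fn revoke(&mut self, object: u32) {
        if let Some(generation) = self.generations.get_mut(&object) {
            *generation += 1;
        }
    }

    /// Check a capability before each use.
    pub fn check(&self, cap: Cap) -> Result<(), UseError> {
        match self.generations.get(&cap.object) {
            None => Err(UseError::NoSuchObject),
            Some(&current) if current == cap.generation => Ok(()),
            Some(_) => Err(UseError::Revoked),
        }
    }
}
```

The trade-off is that revocation here is all-or-nothing per object; revoking a single delegated copy would need a derivation tree or similar bookkeeping.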
Container mapping
In ThemeliOS, a “container” is a group of processes sharing a common set of capabilities. The container’s capability set defines its sandbox:
- Memory: Only the memory regions granted to it
- Network: Only the network endpoints it has capabilities for
- Filesystem: Only the filesystem views it’s been granted
- IPC: Only the services it has endpoint capabilities for
A container cannot discover or access anything outside its capability set. Unlike Linux containers (where a kernel exploit can escape the namespace), escaping a capability sandbox requires forging a kernel object — which is impossible without a kernel memory corruption bug.
Comparison with Linux isolation
| Aspect | Linux (namespaces/cgroups) | ThemeliOS (capabilities) |
|---|---|---|
| Default | Access everything, restrict selectively | Access nothing, grant explicitly |
| Enforcement | Kernel checks on each syscall | No syscall exists without capability |
| Escape risk | Kernel bugs can bypass namespaces | Requires kernel memory corruption |
| Resource discovery | Can probe for resources | Can’t even address unknown resources |
| Granularity | Per-namespace | Per-object, per-right |
Memory Management
This document describes ThemeliOS’s memory management subsystem design.
Status: Design phase. Implementation begins in Phase 1.
Overview
The memory management (MM) subsystem is responsible for:
- Physical frame allocation: tracking which 4 KiB pages of physical RAM are free or in use
- Virtual memory: creating and managing page tables for each process
- Kernel heap: providing dynamic allocation (alloc-style) for kernel data structures
Physical memory
Boot-time discovery
The bootloader provides a memory map describing which physical address ranges are usable RAM, reserved by firmware, or used for MMIO. The frame allocator uses this map to initialize its free list.
Frame allocator
The frame allocator hands out 4 KiB physical memory frames. Initial implementation will use a bitmap allocator:
- One bit per physical frame (1 = allocated, 0 = free)
- Simple, predictable, easy to implement
- For 4 GiB of RAM: bitmap is 128 KiB (manageable)
Later optimization: replace with a buddy allocator for efficient allocation of contiguous multi-frame regions (needed for DMA buffers, large pages).
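The bitmap design and the 128 KiB figure can be checked with a small model. The sketch below uses a Vec for the bitmap so it runs in a hosted environment; a real kernel would place the bitmap in memory taken from the boot memory map. The type and function names are illustrative.

```rust
pub const FRAME_SIZE: u64 = 4096;

/// Bytes of bitmap needed to track `ram_bytes` of RAM at one bit per frame.
pub const fn bitmap_bytes(ram_bytes: u64) -> u64 {
    let frames = ram_bytes / FRAME_SIZE;
    (frames + 7) / 8
}

/// One bit per frame: 1 = allocated, 0 = free.
pub struct BitmapAllocator {
    bits: Vec<u8>,
    frames: usize,
}

impl BitmapAllocator {
    pub fn new(frames: usize) -> Self {
        BitmapAllocator {
            bits: vec![0; (frames + 7) / 8],
            frames,
        }
    }

    /// Find the first free frame, mark it allocated, and return its index.
    pub fn alloc(&mut self) -> Option<usize> {
        for frame in 0..self.frames {
            let (byte, mask) = (frame / 8, 1u8 << (frame % 8));
            if (self.bits[byte] & mask) == 0 {
                self.bits[byte] |= mask;
                return Some(frame);
            }
        }
        None
    }

    /// Mark a frame free again. Returns false if it was not allocated.
    pub fn free(&mut self, frame: usize) -> bool {
        if frame >= self.frames {
            return false;
        }
        let (byte, mask) = (frame / 8, 1u8 << (frame % 8));
        let was_allocated = (self.bits[byte] & mask) != 0;
        self.bits[byte] &= !mask;
        was_allocated
    }
}
```

For 4 GiB of RAM there are 4 GiB / 4 KiB = 1,048,576 frames, and 1,048,576 bits / 8 = 131,072 bytes, which is the 128 KiB quoted above. The linear scan in alloc is the cost of the simplicity; it is also why contiguous multi-frame allocation is awkward with this structure.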
Capability integration
Physical frames are resources protected by capabilities. When a process requests memory:
- Kernel allocates a frame from the free pool
- Kernel creates a MemoryCap for that frame
- Kernel inserts the capability into the process’s CSpace
- Process can now map the frame into its address space using the capability
A process cannot access physical memory it doesn’t have a capability for — the page tables are configured to reflect capability permissions.
Virtual memory
Address space layout (x86_64)
Lower half (user space, per-process):
0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF
Upper half (kernel space, shared across all processes):
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF
├── Physical memory direct map
├── Kernel code and data
├── Kernel heap
└── Per-CPU data
Page tables
x86_64 uses 4-level page tables (PML4 → PDPT → PD → PT), each with 512 entries. Each entry is 8 bytes and can point to:
- The next level table
- A large page (2 MiB at PD level, 1 GiB at PDPT level)
- A 4 KiB page (at PT level)
The kernel manages page tables for each process. When a context switch occurs, the CPU’s CR3 register is loaded with the new process’s PML4 physical address, instantly switching the entire address space.
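Since each table has 512 entries, each level consumes 9 bits of the virtual address, with the low 12 bits left as the offset into a 4 KiB page. The sketch below shows that decomposition; the struct and function names are illustrative.

```rust
/// The four table indices and the page offset encoded in a virtual address.
#[derive(Debug, PartialEq, Eq)]
pub struct PageIndices {
    pub pml4: usize,
    pub pdpt: usize,
    pub pd: usize,
    pub pt: usize,
    pub offset: usize,
}

/// Split a virtual address for 4-level paging with 4 KiB pages.
pub fn split(vaddr: u64) -> PageIndices {
    // 512 entries per table means each index is 9 bits wide.
    let index = |shift: u32| ((vaddr >> shift) & 0x1FF) as usize;
    PageIndices {
        pml4: index(39),
        pdpt: index(30),
        pd: index(21),
        pt: index(12),
        offset: (vaddr & 0xFFF) as usize,
    }
}
```

Applied to the kernel's load address 0xffffffff80000000, this gives PML4 index 511 and PDPT index 510 with all lower indices zero, meaning the kernel sits in the top 2 GiB of the address space.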
aarch64 differences
aarch64 uses a similar 4-level translation table scheme but with different register names (TTBR0/TTBR1 instead of CR3) and different table entry formats. The architecture abstraction layer hides these differences from the rest of the kernel.
Kernel heap
The kernel needs dynamic allocation for data structures like:
- Process control blocks
- Capability tables
- IPC message buffers
- Driver state
We’ll use the linked_list_allocator crate initially (a simple free-list allocator suitable for #![no_std] kernels), backed by physical frames allocated from the frame allocator.
The kernel heap lives in the upper-half virtual address space and is shared across all contexts (but only accessible from kernel mode).
Memory safety
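Rust’s ownership model provides compile-time guarantees in safe code against:
- Use-after-free: A frame wrapped in an owned type cannot be used after it has been freed
- Double-free: An owned frame is consumed when it is freed, so it cannot be freed twice
- Data races: Shared mutable access across threads or CPUs requires a synchronization primitive (such as a Mutex or atomics); RefCell provides interior mutability but is not thread-safe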
The unsafe keyword is required for raw pointer operations (hardware register access, page table manipulation) — these are confined to small, well-documented blocks.
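These guarantees depend on how frames are represented. One possible approach, sketched below with hypothetical names, is to hand out frames as owned handles that are neither Clone nor Copy, and to have the free operation take the handle by value.

```rust
/// An owned handle to one physical frame. Deliberately not `Clone` or `Copy`.
#[derive(Debug, PartialEq, Eq)]
pub struct Frame {
    number: usize,
}

impl Frame {
    pub fn number(&self) -> usize {
        self.number
    }
}

/// A toy pool of free frame numbers (a stand-in for a real frame allocator).
pub struct FramePool {
    free_list: Vec<usize>,
}

impl FramePool {
    pub fn with_frames(count: usize) -> Self {
        // Reverse so that the lowest frame number is handed out first.
        FramePool {
            free_list: (0..count).rev().collect(),
        }
    }

    pub fn alloc(&mut self) -> Option<Frame> {
        self.free_list.pop().map(|number| Frame { number })
    }

    /// Takes the frame by value: the caller's handle is moved into this call
    /// and cannot be used or freed again afterwards.
    pub fn free(&mut self, frame: Frame) {
        self.free_list.push(frame.number);
    }

    pub fn available(&self) -> usize {
        self.free_list.len()
    }
}
```

After pool.free(frame), a second pool.free(frame) or a later frame.number() is rejected by the compiler as a use of a moved value. The protection holds only as long as handles are created exclusively by the pool.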
Milestones
ThemeliOS development is organized into phases. Each phase builds on the previous one and produces a working, testable artifact.
| Phase | Goal | Status |
|---|---|---|
| 0 | Boot on QEMU, serial output | Complete |
| 1 | Memory allocator, scheduler, interrupts (x86_64) | Complete |
| 2 | Capability system, process isolation, IPC | Complete |
| 3 | VirtIO block driver, read-only filesystem | Not started |
| 4 | VirtIO net driver, TCP/IP stack | Not started |
| 5 | OCI container support | Not started |
| 6 | Management API (Docker-compatible) | Not started |
| 7 | aarch64 port | Not started |
| 8 | Hyperscaler support (AWS, GCP, Azure) | Not started |
| 9 | Testing and benchmarks | Not started |
| 10 | Kubernetes worker node | Not started |
| 11 | GPU support across clouds | Not started |
| 12 | Production operations (observability, updates) | Not started |
Phase 0 — Boot (Complete)
Goal: Get the kernel booting on QEMU and printing to the serial console.
Deliverables:
- Bootloader integration (Limine or UEFI)
- Architecture-specific early init (x86_64 first)
- Serial console output (16550 UART on x86_64)
- “Hello from ThemeliOS” printed on boot
- cargo xtask run boots the kernel in QEMU end-to-end
Phase 1 — Kernel basics (Complete)
Goal: A kernel that can manage memory and schedule tasks. x86_64 only — aarch64 is deferred to Phase 7.
Deliverables:
- Physical frame allocator (bitmap-based)
- Kernel heap allocator
- Interrupt handling (GDT, IDT, 8259 PIC on x86_64)
- Timer-driven preemptive scheduler (round-robin)
- Basic kernel shell over serial (for debugging, will be removed later)
- Automated test infrastructure (isa-debug-exit, cargo xtask test, GitHub Actions CI)
Phase 2 — Isolation (Complete)
Goal: Implement the capability system and process isolation.
Deliverables:
- Custom page tables replacing Limine’s (required for per-process address spaces)
- Capability types and capability space (CSpace)
- Process creation with isolated address spaces
- Capability grant, transfer, and revocation
- Synchronous IPC (message passing between processes)
- Audit logging (tamper-evident record of capability usage for compliance and security)
- Reclaim bootloader-reclaimable memory (safe once we own GDT, page tables, and stack)
- First userspace process (init)
Phase 3 — Storage (Not started)
Goal: Read from a virtual disk and present a filesystem.
Deliverables:
- VirtIO block driver (for QEMU’s virtual disk)
- Read-only filesystem (simple format, possibly custom or FAT)
- RAM-backed ephemeral writable layer
- Immutable root image creation tooling
Phase 4 — Networking (Not started)
Goal: TCP/IP connectivity.
Deliverables:
- VirtIO network driver
- Ethernet, ARP, IPv4
- TCP and UDP
- Basic socket-like API via capabilities
- DHCP client
Phase 5 — Containers (Not started)
Goal: Run OCI container images.
Deliverables:
- Linux syscall compatibility layer (translate Linux syscalls to capability-checked ThemeliOS operations)
- OCI image format parsing and layer unpacking
- Container lifecycle (create, start, stop, destroy)
- Container exec (spawn processes inside a running container’s isolation boundary)
- PTY support for interactive terminal sessions
- Container-to-capability mapping (each container gets a capability set)
- Container networking (virtual interfaces, isolation)
- Log streaming from containers (stdout/stderr capture)
- Resource limits (CPU, memory) enforced via capabilities
- Container image registry support (Docker Hub, ECR, GCR, ACR)
- Registry authentication, TLS, and cloud-specific credential helpers
Phase 6 — Management (Not started)
Goal: Docker-compatible management API for the node.
Deliverables:
- Docker Engine API compatible subset (containers, exec, images, logs, networks)
- Bidirectional streaming for interactive exec sessions (websocket)
- Capability-based authorization (API clients mapped to capability sets)
- TLS client certificate and API token authentication
- Node status and health reporting
- Configuration injection at boot time
- No SSH — API is the only interface
- Standard Docker tooling works out of the box (docker exec, docker ps, docker logs, etc.)
Phase 7 — aarch64 port (Not started)
Goal: Port all Phase 0 and Phase 1 functionality to aarch64 (ARM64), enabling ThemeliOS to run on ARM-based hardware and cloud instances (e.g., AWS Graviton).
Deliverables:
- aarch64 boot via Limine (UEFI on ARM)
- PL011 UART serial driver for debug output
- GIC (Generic Interrupt Controller) initialization and exception handling
- ARM generic timer for scheduler preemption
- Physical frame allocator (same bitmap design, architecture-independent)
- Kernel heap (architecture-independent, just works)
- Scheduler and context switch for aarch64 (different register set, different calling convention)
- Serial debug shell (architecture-independent, just works)
- cargo xtask run --arch aarch64 boots and passes all tests
- Automated tests on aarch64 QEMU in CI
Phase 8 — Hyperscaler support (Not started)
Goal: Boot and run on AWS, GCP, and Azure.
Deliverables:
- Instance metadata service (IMDS) clients for all three providers
- Cloud-aware configuration injection at boot time
- Machine image tooling (cargo xtask image --cloud aws/gcp/azure)
- AMI creation for AWS (raw disk import via aws ec2 import-image)
- GCP image creation (raw disk tarball + gcloud compute images create)
- Azure VHD image creation
- UEFI Secure Boot chain verification and kernel image signing
- Measured boot (TPM support)
- Boot validation on each provider’s compute instances
- GitHub Actions workflow to build downloadable QEMU ISOs (x86_64, aarch64)
- GitHub Actions workflows to build and publish cloud-specific machine images
Phase 9 — Testing and benchmarks (Not started)
Goal: Comprehensive test suite and performance benchmarks to validate the OS works correctly end-to-end.
Deliverables:
- CI infrastructure (GitHub Actions with QEMU, isa-debug-exit device for pass/fail exit codes)
- Boot smoke tests (kernel boots, reaches known-good state, no panic)
- Kernel unit tests (allocator, scheduler, capability enforcement tested in isolation)
- Kernel integration tests (spawn process + grant capability + IPC message + verify result)
- Security and isolation tests (capability violations, unauthorized memory access, process escape attempts — all must fail cleanly)
- Container runtime tests with standard images (alpine, busybox, nginx)
- Custom test images (memory stress, network connectivity, filesystem I/O, multi-process isolation)
- Container lifecycle tests (create, start, stop, restart, destroy, exec)
- Multi-container isolation validation
- Container networking tests
- Resource limit enforcement tests
- Cloud validation tests (boot on each hyperscaler, IMDS, networking, container workloads)
- Benchmarks: boot time, context switch latency, IPC throughput, memory allocation speed, container cold-start time
- Benchmark history tracking for regression detection
Phase 10 — Kubernetes (Not started)
Goal: Full drop-in K8s/K3s/RKE2 worker node. Any pod that runs on an Ubuntu or Flatcar node must run identically on ThemeliOS.
Deliverables:
- Full Linux syscall coverage for real-world K8s workloads (databases, language runtimes, service meshes, logging agents, init systems)
- CRI (Container Runtime Interface) gRPC API implementation
- CNI (Container Network Interface) plugin support (Flannel, Calico, Cilium)
- CSI (Container Storage Interface) driver support for persistent volumes
- Pod semantics (groups of containers sharing network and storage namespaces)
- kubelet (standard binary or compatible custom implementation)
- kube-proxy equivalent for service networking and load balancing
- Node registration, capacity reporting, and health conditions
- kubectl exec -it with full interactive shell support
- kubectl logs, kubectl cp, kubectl port-forward
- Pod resource management (CPU/memory requests and limits, QoS classes)
- DNS resolution for K8s service discovery
Phase 11 — GPU support (Not started)
Goal: GPU passthrough and accelerator support for containerized workloads across all major cloud providers.
Deliverables:
- VFIO/IOMMU support for GPU device passthrough to containers
- NVIDIA driver ioctl compatibility in the syscall layer
- K8s device plugin API support for GPU resource scheduling
- GPU resource requests and limits in pod specs
- Validation on AWS GPU instances (P/G series)
- Validation on GCP GPU instances (A2/G2 series)
- Validation on Azure GPU instances (NC/ND series)
- Cloud-specific accelerator support (AWS Inferentia/Trainium, GCP TPU, Azure AMD GPUs)
Phase 12 — Production operations (Not started)
Goal: Day-2 operational tooling for running ThemeliOS nodes in production.
Deliverables:
- Metrics export in Prometheus format (node-exporter compatible)
- Log forwarding to external collectors (CloudWatch, Stackdriver, Fluentd)
- Health endpoints for load balancers and orchestrators
- Distributed tracing support for container workloads
- A/B partition scheme for whole-image OS updates
- Automatic rollback on failed updates
- Zero-downtime node upgrades (drain → swap image → rejoin cluster)
- OS update tooling (cargo xtask image --update or equivalent)
- Update coordination with K8s (respect PodDisruptionBudgets during upgrades)