Sprint 6 — Syscall Interface & Userspace Entry

Cross the Ring 0 / Ring 3 boundary.

✅ Complete

Overview

Sprint 6 ties the kernel subsystems together into a usable system by implementing the SYSCALL/SYSRET fast transition mechanism, an ELF loader, and the transition into Ring 3 (user mode). After this sprint, the kernel can load and run a userspace program.


SYSCALL/SYSRET

✅ Implemented — kernel/src/arch/x86_64/syscall.rs

What is SYSCALL/SYSRET?

The SYSCALL instruction is the fast path for entering the kernel from userspace on x86_64. Unlike software interrupts (int 0x80), SYSCALL doesn't push to the stack or read the IDT — it uses pre-configured MSRs (Model-Specific Registers) for maximum speed.

MSR Configuration

| MSR | Name | Purpose |
| --- | --- | --- |
| STAR | Segment Selectors | Bits 47:32 = kernel CS, Bits 63:48 = user CS base |
| LSTAR | Syscall Entry | 64-bit address of the syscall handler entry point |
| SFMASK | RFLAGS Mask | Flags to clear on syscall entry (disable interrupts) |
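
As a rough illustration of the table above, the sketch below packs a STAR value and derives the selectors SYSRET would load. The MSR addresses are architectural; `KERNEL_CS`, `USER_BASE`, and both helper functions are hypothetical names, and the selector values are assumptions chosen to match the 0x23/0x1B user selectors mentioned later on this page.

```rust
// Architectural MSR addresses for the SYSCALL/SYSRET mechanism.
pub const MSR_STAR: u32 = 0xC000_0081;
pub const MSR_LSTAR: u32 = 0xC000_0082;
pub const MSR_SFMASK: u32 = 0xC000_0084;

/// RFLAGS.IF (bit 9). Setting it in SFMASK clears IF on syscall entry.
pub const RFLAGS_IF: u64 = 1 << 9;

// ASSUMPTION: GDT layout, not taken from the kernel source.
pub const KERNEL_CS: u16 = 0x08;
pub const USER_BASE: u16 = 0x10;

/// Pack the STAR MSR: bits 47:32 = kernel CS, bits 63:48 = user CS base.
pub const fn star_value(kernel_cs: u16, user_base: u16) -> u64 {
    ((user_base as u64) << 48) | ((kernel_cs as u64) << 32)
}

/// Selectors a 64-bit SYSRET loads: CS = base + 16, SS = base + 8, RPL 3.
pub const fn sysret_selectors(user_base: u16) -> (u16, u16) {
    ((user_base + 16) | 3, (user_base + 8) | 3)
}
```

With these assumed selectors, `sysret_selectors(USER_BASE)` yields `(0x23, 0x1B)`. The value from `star_value` would be written to `MSR_STAR` with `wrmsr` during early boot.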

Syscall Entry Point

When userspace executes SYSCALL:

  1. CPU saves RIP in RCX, RFLAGS in R11
  2. CPU loads CS/SS from STAR MSR → kernel mode
  3. CPU masks RFLAGS with SFMASK → interrupts disabled
  4. CPU jumps to LSTAR → our entry point

Our handler then:

  1. Switch to the kernel stack (from TSS RSP0); SYSCALL itself leaves RSP pointing at the user stack
  2. Save all user registers to the thread's save area
  3. Dispatch based on RAX (syscall number)
  4. Execute the syscall handler
  5. Restore user registers
  6. SYSRET back to userspace
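
The CPU-side half of this sequence (the four entry steps and the final SYSRET) can be modeled in plain Rust. This is a simplified toy model for reasoning about the state changes, not kernel code; the `Cpu` and `Msrs` types are hypothetical, and the SYSRET flag handling is deliberately approximate.

```rust
pub const RFLAGS_IF: u64 = 1 << 9;

/// The slice of CPU state that SYSCALL/SYSRET touch (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cpu {
    pub rip: u64, // address of the instruction after SYSCALL
    pub rflags: u64,
    pub rcx: u64,
    pub r11: u64,
    pub cs: u16,
    pub ss: u16,
}

pub struct Msrs {
    pub star: u64,
    pub lstar: u64,
    pub sfmask: u64,
}

impl Cpu {
    /// Architectural effect of SYSCALL in 64-bit mode.
    pub fn syscall(&mut self, m: &Msrs) {
        self.rcx = self.rip; // 1. save return address
        self.r11 = self.rflags; //    and flags
        self.cs = ((m.star >> 32) & 0xFFFC) as u16; // 2. kernel CS
        self.ss = self.cs.wrapping_add(8); //    kernel SS
        self.rflags &= !m.sfmask; // 3. clear masked flags
        self.rip = m.lstar; // 4. jump to the entry point
    }

    /// Architectural effect of a 64-bit SYSRET (simplified: real hardware
    /// also clears a few reserved and system flag bits from R11).
    pub fn sysret(&mut self, m: &Msrs) {
        let base = (m.star >> 48) as u16;
        self.rip = self.rcx;
        self.rflags = self.r11 | 2;
        self.cs = base.wrapping_add(16) | 3;
        self.ss = base.wrapping_add(8) | 3;
    }
}
```

Note that neither operation touches RSP, which is why the handler has to switch stacks itself.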

Register Convention

| Register | Role |
| --- | --- |
| RAX | Syscall number (in) / return value (out) |
| RDI | Argument 1 |
| RSI | Argument 2 |
| RDX | Argument 3 |
| R10 | Argument 4 (RCX is clobbered by SYSCALL) |
| R8 | Argument 5 |
| R9 | Argument 6 |
| RCX | Saved RIP (by CPU) |
| R11 | Saved RFLAGS (by CPU) |
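
One way to picture this convention is as a saved-register frame with accessors for the syscall number and arguments. The sketch below is an assumption about shape only: the struct layout, field order, and method names are hypothetical, and the kernel's real save area may hold more registers.

```rust
/// Hypothetical saved-register frame covering only the registers
/// named in the convention above.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct SyscallFrame {
    pub rax: u64, // syscall number in, return value out
    pub rdi: u64, // argument 1
    pub rsi: u64, // argument 2
    pub rdx: u64, // argument 3
    pub r10: u64, // argument 4 (RCX is unavailable)
    pub r8: u64,  // argument 5
    pub r9: u64,  // argument 6
    pub rcx: u64, // user RIP saved by the CPU
    pub r11: u64, // user RFLAGS saved by the CPU
}

impl SyscallFrame {
    /// Syscall number requested by userspace.
    pub fn number(&self) -> u64 {
        self.rax
    }

    /// The six arguments in calling-convention order.
    pub fn args(&self) -> [u64; 6] {
        [self.rdi, self.rsi, self.rdx, self.r10, self.r8, self.r9]
    }

    /// Store the value userspace will see in RAX after SYSRET.
    pub fn set_return(&mut self, value: u64) {
        self.rax = value;
    }
}
```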

Syscall Dispatch Table

✅ Implemented — 12 syscalls dispatched via match on RAX

The kernel dispatches syscalls with a match on frame.rax in syscall_dispatch(). SYS_EXIT (RAX=0) is handled inline by calling thread_exit().

| RAX | Name | Arguments | Description |
| --- | --- | --- | --- |
| 0 | SYS_EXIT |  | Terminate calling thread (thread → Dead, schedule away) |
| 1 | SYS_SEND | slot, label, data0, data1 | IPC send on endpoint capability |
| 2 | SYS_RECV | slot | IPC receive, blocks until message arrives |
| 3 | SYS_PORT_OUT | slot, port, value, width | Write to I/O port via IoPort capability. R10 width: 0/1=byte, 4=dword |
| 4 | SYS_PORT_IN | slot, port, width | Read from I/O port via IoPort capability. R10 width: 0/1=byte (RDI=u8), 4=dword (RDI=u32) |
| 5 | SYS_WAIT_IRQ | slot | Block until hardware IRQ fires on IrqLine capability |
| 6 | SYS_SPAWN_PROCESS |  | Create empty child process, returns CNode slot of Process cap |
| 7 | SYS_ALLOC_MEMORY | alloc_slot, target_slot | Allocate physical frame via PmmAllocator, store MemoryFrame cap in target_slot |
| 8 | SYS_MAP_MEMORY | proc_slot, frame_slot, vaddr, flags | Map MemoryFrame into process VA. Flags: bit 0 = WRITABLE, bit 1 = EXECUTABLE |
| 9 | SYS_DELEGATE | proc_slot, src_slot, dst_slot | Copy capability from caller's CNode to child process's CNode |
| 10 | SYS_SPAWN_THREAD | proc_slot, user_rip, user_rsp | Create Ring 3 thread in target process, returns TID |
| 11 | SYS_DROP_CAP | slot | Remove capability from caller's CNode slot (frees for reuse) |
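
The numbering in this table can be captured as an enum with a decoding step, which is roughly what a match on RAX amounts to. This is a sketch, not the kernel's `syscall_dispatch()`: the `Syscall` enum and `from_rax` are hypothetical names, and how the real kernel treats an unknown number is not stated here, so the sketch simply returns `None`.

```rust
/// Syscall numbers from the dispatch table (enum name is hypothetical).
#[repr(u64)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Syscall {
    Exit = 0,
    Send = 1,
    Recv = 2,
    PortOut = 3,
    PortIn = 4,
    WaitIrq = 5,
    SpawnProcess = 6,
    AllocMemory = 7,
    MapMemory = 8,
    Delegate = 9,
    SpawnThread = 10,
    DropCap = 11,
}

impl Syscall {
    /// Decode the value userspace placed in RAX.
    pub fn from_rax(rax: u64) -> Option<Self> {
        use Syscall::*;
        Some(match rax {
            0 => Exit,
            1 => Send,
            2 => Recv,
            3 => PortOut,
            4 => PortIn,
            5 => WaitIrq,
            6 => SpawnProcess,
            7 => AllocMemory,
            8 => MapMemory,
            9 => Delegate,
            10 => SpawnThread,
            11 => DropCap,
            _ => return None, // ASSUMPTION: unknown numbers are rejected
        })
    }
}
```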

Error Convention

Every syscall returns a u64 in RAX. On success the value is 0, or a positive result such as a slot index or TID. Errors are sentinel values near u64::MAX:

| Value | Meaning |
| --- | --- |
| u64::MAX | Invalid slot index |
| u64::MAX - 1 | Insufficient rights |
| u64::MAX - 2 | Wrong capability type |
| u64::MAX - 3 | Endpoint/Process not found |
| u64::MAX - 4 | PMM out-of-memory / alignment error |
| u64::MAX - 5 | Process not found / already has waiter |
| u64::MAX - 6 | map_page failure |
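
On the userspace side, a caller might turn the raw RAX value into a `Result`. The sketch below assumes exactly the seven sentinels listed above; the `SysError` enum, its variant names, and `decode_return` are hypothetical, and any value outside the table is treated as success.

```rust
/// Error sentinels from the table above (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SysError {
    InvalidSlot,           // u64::MAX
    InsufficientRights,    // u64::MAX - 1
    WrongCapType,          // u64::MAX - 2
    EndpointNotFound,      // u64::MAX - 3
    PmmFailure,            // u64::MAX - 4
    ProcessNotFoundOrBusy, // u64::MAX - 5
    MapFailed,             // u64::MAX - 6
}

/// Split a raw syscall return value into success and error cases.
pub fn decode_return(rax: u64) -> Result<u64, SysError> {
    // Distance from u64::MAX; the subtraction cannot underflow.
    match u64::MAX - rax {
        0 => Err(SysError::InvalidSlot),
        1 => Err(SysError::InsufficientRights),
        2 => Err(SysError::WrongCapType),
        3 => Err(SysError::EndpointNotFound),
        4 => Err(SysError::PmmFailure),
        5 => Err(SysError::ProcessNotFoundOrBusy),
        6 => Err(SysError::MapFailed),
        _ => Ok(rax), // 0, a slot index, a TID, ...
    }
}
```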

ELF Loader

✅ Implemented — kernel/src/arch/x86_64/syscall.rs (load_elf_into_process)

What is ELF?

ELF (Executable and Linkable Format) is the standard binary format for executables on Linux and bare-metal systems. The kernel must parse ELF files to load userspace programs.

Loading Process

  1. Read ELF header — verify magic bytes, architecture (x86_64), type (executable)
  2. Parse program headers — each PT_LOAD segment describes a chunk to map:
    • Virtual address, file offset, file size, memory size
    • Permissions (Read, Write, Execute)
  3. Allocate pages — use PMM to allocate physical frames for each segment
  4. Map pages — use VMM to create mappings in the process's address space with correct permissions
  5. Copy data — copy segment contents from the ELF file into the mapped pages
  6. Zero BSS — if memory size > file size, zero the remaining bytes
  7. Set up user stack — allocate and map pages at the top of userspace (e.g., 0x7FFFFFFFE000)
  8. Return entry point — the ELF header contains the address where execution begins
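
Steps 1, 2, and 8 can be sketched as a small parser over a byte slice. This is a hosted illustration that uses `Vec`, not the kernel's `load_elf_into_process`; the offsets follow the ELF64 specification, while `parse_elf`, `Segment`, and `ElfError` are hypothetical names.

```rust
const ET_EXEC: u16 = 2; // e_type: executable file
const EM_X86_64: u16 = 62; // e_machine: x86_64
const PT_LOAD: u32 = 1; // p_type: loadable segment

#[derive(Debug, PartialEq)]
pub enum ElfError { Truncated, BadMagic, Unsupported }

#[derive(Debug, PartialEq)]
pub struct Segment {
    pub vaddr: u64,
    pub offset: u64,
    pub filesz: u64,
    pub memsz: u64,
    pub flags: u32, // PF_X = 1, PF_W = 2, PF_R = 4
}

impl Segment {
    /// Bytes past the file contents that must be zeroed (BSS).
    pub fn bss_len(&self) -> u64 {
        self.memsz.saturating_sub(self.filesz)
    }
}

// Bounds-checked fixed-size read.
fn read<const N: usize>(b: &[u8], off: usize) -> Result<[u8; N], ElfError> {
    let end = off.checked_add(N).ok_or(ElfError::Truncated)?;
    let mut out = [0u8; N];
    out.copy_from_slice(b.get(off..end).ok_or(ElfError::Truncated)?);
    Ok(out)
}
fn u16_at(b: &[u8], off: usize) -> Result<u16, ElfError> {
    Ok(u16::from_le_bytes(read::<2>(b, off)?))
}
fn u32_at(b: &[u8], off: usize) -> Result<u32, ElfError> {
    Ok(u32::from_le_bytes(read::<4>(b, off)?))
}
fn u64_at(b: &[u8], off: usize) -> Result<u64, ElfError> {
    Ok(u64::from_le_bytes(read::<8>(b, off)?))
}

/// Validate the header and return (entry point, PT_LOAD segments).
pub fn parse_elf(image: &[u8]) -> Result<(u64, Vec<Segment>), ElfError> {
    if !image.starts_with(b"\x7fELF") {
        return Err(ElfError::BadMagic);
    }
    let class = *image.get(4).ok_or(ElfError::Truncated)?; // 2 = 64-bit
    let data = *image.get(5).ok_or(ElfError::Truncated)?; // 1 = little-endian
    if class != 2 || data != 1
        || u16_at(image, 16)? != ET_EXEC
        || u16_at(image, 18)? != EM_X86_64
    {
        return Err(ElfError::Unsupported);
    }
    let entry = u64_at(image, 24)?;
    let phoff = u64_at(image, 32)? as usize;
    let phentsize = u16_at(image, 54)? as usize;
    let phnum = u16_at(image, 56)? as usize;

    let mut segments = Vec::new();
    for i in 0..phnum {
        let ph = i
            .checked_mul(phentsize)
            .and_then(|o| o.checked_add(phoff))
            .ok_or(ElfError::Truncated)?;
        if u32_at(image, ph)? != PT_LOAD {
            continue; // only PT_LOAD segments get mapped
        }
        segments.push(Segment {
            flags: u32_at(image, ph + 4)?,
            offset: u64_at(image, ph + 8)?,
            vaddr: u64_at(image, ph + 16)?,
            filesz: u64_at(image, ph + 32)?,
            memsz: u64_at(image, ph + 40)?,
        });
    }
    Ok((entry, segments))
}
```

A loader would then walk the returned segments, allocating and mapping pages for `memsz` bytes at `vaddr`, copying `filesz` bytes from `offset`, and zeroing the remaining `bss_len()` bytes.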

Address Space Layout (Userspace)

    block-beta
      columns 1
      block:stack["0x00007FFFFFFFFFFF"]
        A["User Stack (grows ↓)"]
      end
      block:guard["0x00007FFFFFFFE000"]
        B["Guard Page"]
      end
      block:heap[" "]
        C["Heap (grows ↑)"]
      end
      block:bss[" "]
        D[".bss    R+W"]
      end
      block:data[" "]
        E[".data   R+W"]
      end
      block:rodata[" "]
        F[".rodata R"]
      end
      block:text["0x0000000000400000 ← ELF base"]
        G[".text   R+X"]
      end
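
The per-section permissions in this layout follow from each segment's ELF flags. A minimal sketch of that translation, assuming the SYS_MAP_MEMORY flag bits listed earlier (bit 0 = WRITABLE, bit 1 = EXECUTABLE), is given here; the function names are hypothetical, and rejecting a writable-and-executable segment outright is one possible W^X policy rather than confirmed kernel behavior.

```rust
// ELF program-header permission bits (from the ELF specification).
pub const PF_X: u32 = 1;
pub const PF_W: u32 = 2;
pub const PF_R: u32 = 4;

// Mapping flag bits as described for SYS_MAP_MEMORY.
pub const MAP_WRITABLE: u64 = 1 << 0;
pub const MAP_EXECUTABLE: u64 = 1 << 1;

/// Highest address in the lower (userspace) canonical half.
pub const USER_MAX: u64 = 0x0000_7FFF_FFFF_FFFF;

/// Translate ELF segment flags to mapping flags.
/// ASSUMPTION: W+X segments are refused rather than silently downgraded.
pub fn map_flags_for_segment(p_flags: u32) -> Option<u64> {
    let writable = (p_flags & PF_W) != 0;
    let executable = (p_flags & PF_X) != 0;
    if writable && executable {
        return None; // violates W^X
    }
    let mut flags = 0;
    if writable {
        flags |= MAP_WRITABLE;
    }
    if executable {
        flags |= MAP_EXECUTABLE;
    }
    Some(flags)
}

/// True if the address lies in the userspace half shown in the diagram.
pub fn is_user_address(addr: u64) -> bool {
    addr <= USER_MAX
}
```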
    

Ring 3 Entry

✅ Implemented — ring3_entry() + jump_to_ring3()

Steps to Enter Userspace (Actual Implementation)

  1. Create process — allocate PML4, CNode (64 slots), kernel thread
  2. Load ELF — parse PT_LOAD segments, map with W^X permissions, zero BSS
  3. Set up user stack — map 4 pages at 0x7FFFFFFFE000 with User + Writable + NX
  4. Prepare initial capabilities — Init (PID 1) receives:
    • Slot 1: PmmAllocator — right to allocate physical frames
    • Slot 2: IoPort { base: 0x3F8, size: 8 } — COM1 serial
    • Slot 3: Process { pid: 1 } — self-reference for memory mapping
    • Slot 4: IoPort { base: 0xC000, size: 128 } — Virtio-Blk I/O BAR (dynamically discovered via PCI)
  5. Switch to user page tables — schedule() swaps CR3 to process PML4
  6. IRETQ: jump_to_ring3() performs swapgs, then iretq with user CS/SS (GDT selectors 0x23/0x1B), RFLAGS=0x202 (IF set), and entry at 0x400000
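
The final step hinges on the five-quadword frame that `iretq` pops. A sketch of that frame using the values quoted in step 6 is given here; the `IretqFrame` struct and `user_entry_frame` helper are hypothetical, and the stack pointer is left as a parameter because the exact initial user RSP is not specified on this page.

```rust
/// Stack frame consumed by `iretq`, lowest address first.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IretqFrame {
    pub rip: u64,
    pub cs: u64,
    pub rflags: u64,
    pub rsp: u64,
    pub ss: u64,
}

pub const USER_CS: u64 = 0x23; // user code selector, RPL 3
pub const USER_SS: u64 = 0x1B; // user data selector, RPL 3
pub const USER_RFLAGS: u64 = 0x202; // IF (bit 9) plus the always-set bit 1

/// Build the frame for the first entry into Ring 3.
pub fn user_entry_frame(entry: u64, stack_top: u64) -> IretqFrame {
    IretqFrame {
        rip: entry,
        cs: USER_CS,
        rflags: USER_RFLAGS,
        rsp: stack_top,
        ss: USER_SS,
    }
}
```

In the real path these five values would be pushed in reverse order (SS first, RIP last) immediately before the `iretq` instruction.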

Verification

Test that the syscall round-trip works correctly:

  1. Userspace calls SYSCALL → enters kernel
  2. Kernel processes the request
  3. Kernel returns via SYSRET → back in userspace
  4. Verify registers are preserved, return value is correct
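
The check in step 4 could be expressed as a comparison of register snapshots taken before and after the call. This is only a sketch of the idea: `UserRegs` and `round_trip_ok` are hypothetical, and it assumes every general-purpose register except RAX (return value), RCX, and R11 (both overwritten by SYSCALL) is expected to survive.

```rust
/// Snapshot of the user-visible general-purpose registers (hypothetical).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct UserRegs {
    pub rax: u64, pub rbx: u64, pub rcx: u64, pub rdx: u64,
    pub rsi: u64, pub rdi: u64, pub rbp: u64, pub rsp: u64,
    pub r8: u64, pub r9: u64, pub r10: u64, pub r11: u64,
    pub r12: u64, pub r13: u64, pub r14: u64, pub r15: u64,
}

/// True if the syscall returned `expected_ret` and left every register
/// other than RAX, RCX, and R11 unchanged.
pub fn round_trip_ok(before: &UserRegs, after: &UserRegs, expected_ret: u64) -> bool {
    // Blank out the registers the syscall is allowed to change.
    let mask = |mut r: UserRegs| {
        r.rax = 0;
        r.rcx = 0;
        r.r11 = 0;
        r
    };
    after.rax == expected_ret && mask(*before) == mask(*after)
}
```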

Security Considerations


Dependencies