Reading a kernel, Part 1: getting your bearings
A kernel is not a book you read front to back. It is a building you learn to navigate — and like any large building, it has a logic to its layout that becomes readable once you stop looking for the front door.
The first time I sat down with kernel source I made the mistake most people make: I tried to start at the beginning. There is no beginning. A kernel is not a narrative. It is a building, and the thing you need first is not a floor plan so much as an understanding of how the building is organised: which wings exist, what they contain, how they connect, and where the interesting things tend to happen.
This is a two-part series. Part one is about orientation. Part two is about finding the interesting bits once you know where you are. Both are about reading — not executing, not debugging, not fuzzing. Reading, which is underrated as a research method.
The directory structure is the argument
Start with the top-level directory layout. In the BSD family (FreeBSD, OpenBSD, NetBSD, and the BSD half of XNU underneath macOS) the structure follows a broadly consistent logic, usually rooted under sys/ in the source tree, or bsd/ in XNU. kern/ contains the core: process management, scheduling, system calls, the virtual filesystem interface. sys/ holds the headers that define the structures everything else uses. net/ and netinet/ contain the network stack. vm/ is the virtual memory subsystem in FreeBSD; OpenBSD and NetBSD call it uvm/, and XNU keeps its VM on the Mach side under osfmk/vm/. dev/ is device drivers.
These divisions are not arbitrary. They reflect the actual architecture of the system — the separation of concerns that the original designers imposed and that subsequent maintainers have mostly honoured. Reading the directory layout is reading a high-level design document. It tells you what the authors thought were the major subsystems, what they thought belonged together, and what they thought should be kept apart.
Linux is organised differently — arch/, drivers/, fs/, mm/, net/ — but the same principle applies. The structure is the argument. Read the structure before you read the code.
Finding the entry points
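The system call table is the front door. In BSD kernels it is typically kern/init_sysent.c, a file generated from syscalls.master (the neighbouring kern/syscalls.c, generated from the same source, holds only the call names). In Linux it is sys_call_table, built per architecture from a table file such as arch/x86/entry/syscalls/syscall_64.tbl. This table maps system call numbers to handler functions. It is the list of everything userspace can ask the kernel to do on its behalf, with the caveat that multiplexed calls such as ioctl(2) hide a great deal behind a single entry.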
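To make the shape of that table concrete, here is a minimal sketch of number-to-handler dispatch. Everything prefixed toy_ is hypothetical and invented for illustration; it is not taken from any real kernel, and real tables carry more per-entry metadata than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler signature: raw arguments in, 0 or an errno value out. */
typedef int (*toy_handler_t)(const long args[6], long *retval);

struct toy_sysent {
    int           narg;    /* number of arguments the handler expects */
    toy_handler_t handler; /* NULL marks an unused slot */
};

static int toy_getanswer(const long args[6], long *retval)
{
    (void)args;
    *retval = 42;
    return 0;
}

static int toy_add(const long args[6], long *retval)
{
    *retval = args[0] + args[1];
    return 0;
}

/* The table itself: the array index is the system call number. */
static const struct toy_sysent toy_table[] = {
    [0] = { 0, NULL },          /* reserved, never dispatched */
    [1] = { 0, toy_getanswer },
    [2] = { 2, toy_add },
};

#define TOY_NSYSCALL (sizeof(toy_table) / sizeof(toy_table[0]))

/* Dispatch: bounds-check the number before it is used as an index. */
int toy_syscall(long num, const long args[6], long *retval)
{
    assert(args != NULL && retval != NULL); /* contract of this sketch only */

    if (num < 0 || (unsigned long)num >= TOY_NSYSCALL)
        return ENOSYS;
    if (toy_table[num].handler == NULL)
        return ENOSYS;
    return toy_table[num].handler(args, retval);
}
```

Calling toy_syscall(2, args, &ret) with args beginning {3, 4} stores 7 in ret and returns 0; an out-of-range or reserved number returns ENOSYS. The bounds check on num is the first piece of input validation on the whole path, which is why it is worth finding in the real dispatcher too.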
Pick a system call you understand well from the user side — open(2), say, or mmap(2). Find its handler in the table. Follow it into the implementation. This is the most reliable orientation exercise I know: you start from something you already understand the semantics of, and you trace how those semantics are implemented. You learn the conventions — how arguments arrive, how errors are returned, how the kernel validates input before acting on it — in a context where you already know what the answer should look like.
The validation step is always worth reading carefully. It is where the kernel checks that userspace has not passed it something it cannot handle. It is also, historically, where things go wrong.
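The conventions described above (how arguments arrive, how errors come back, how input is checked before use) tend to follow a recognisable pattern. The sketch below is my own toy illustration of that pattern, not code from any kernel: toy_sys_setname, toy_copyin, and the limits are all hypothetical. It follows the BSD habit of returning a positive errno value; Linux handlers return a negated one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXFD   8      /* hypothetical per-process descriptor limit */
#define TOY_MAXNAME 16     /* hypothetical kernel-side buffer size */

static char toy_names[TOY_MAXFD][TOY_MAXNAME];

/*
 * Hypothetical stand-in for copyin(9) or copy_from_user(). A real kernel
 * would also verify that uaddr lies in user space and would handle faults.
 */
static int toy_copyin(const void *uaddr, void *kaddr, size_t len)
{
    if (uaddr == NULL)
        return EFAULT;
    memcpy(kaddr, uaddr, len);
    return 0;
}

/* Toy handler: attach a label to a descriptor. Returns 0 or an errno value. */
int toy_sys_setname(int fd, const char *uname, size_t len)
{
    char kbuf[TOY_MAXNAME];
    int error;

    /* 1. Range-check every scalar before it is used as an index or a size. */
    if (fd < 0 || fd >= TOY_MAXFD)
        return EBADF;
    if (len == 0 || len >= TOY_MAXNAME)
        return EINVAL;

    /* 2. Copy once into kernel memory, then work only on the copy. */
    error = toy_copyin(uname, kbuf, len);
    if (error != 0)
        return error;
    kbuf[len] = '\0';

    /* 3. Act only after validation has succeeded. */
    assert(strlen(kbuf) < TOY_MAXNAME); /* internal invariant, not input checking */
    memcpy(toy_names[fd], kbuf, strlen(kbuf) + 1);
    return 0;
}

/* Read back the stored label, or NULL for an out-of-range descriptor. */
const char *toy_getname(int fd)
{
    return (fd >= 0 && fd < TOY_MAXFD) ? toy_names[fd] : NULL;
}
```

When reading a real handler, I look for the same three steps in the same order. A size that is used before it is checked, or user memory that is read twice instead of copied once, is the kind of deviation worth stopping on.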
Subsystem mapping
Once you have found the entry points, the next task is mapping which subsystems a given path touches. Most interesting operations in a kernel cross subsystem boundaries: an open(2) call touches the VFS layer, the filesystem-specific code, the file descriptor table, and potentially the security framework. An mmap(2) call touches the VM subsystem, the filesystem, and the architecture-specific page table management.
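A toy version of the open(2) path shows what crossing those boundaries looks like in code. This is a sketch under heavy simplification: the toy_ names, the single in-memory filesystem, and the four-slot descriptor table are all hypothetical, and the security framework is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Filesystem-specific layer: each filesystem supplies its own operations. */
struct toy_fsops {
    int (*lookup)(const char *path);   /* returns 0 if the path exists */
};

static int toy_ramfs_lookup(const char *path)
{
    return strcmp(path, "/hello") == 0 ? 0 : ENOENT;
}

static const struct toy_fsops toy_ramfs_ops = { toy_ramfs_lookup };

/* VFS layer: generic code that only knows the operations table. */
static int toy_vfs_open(const struct toy_fsops *ops, const char *path)
{
    assert(ops != NULL && ops->lookup != NULL);
    return ops->lookup(path);
}

/* Descriptor table: maps small integers to open files. */
#define TOY_NFILES 4
static const char *toy_fdtable[TOY_NFILES];

static int toy_fdalloc(const char *path, int *fdp)
{
    for (int fd = 0; fd < TOY_NFILES; fd++) {
        if (toy_fdtable[fd] == NULL) {
            toy_fdtable[fd] = path;   /* lowest free slot wins */
            *fdp = fd;
            return 0;
        }
    }
    return EMFILE;
}

/* System call layer: one call, three subsystems. */
int toy_sys_open(const char *path, int *fdp)
{
    int error = toy_vfs_open(&toy_ramfs_ops, path);
    if (error != 0)
        return error;
    return toy_fdalloc(path, fdp);
}
```

The indirection through the operations table is the part to notice. In a real kernel that function pointer is where a generic path hands control to filesystem-specific code, so following it is how you find out which subsystem you have just walked into.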
The interfaces between subsystems are the most interesting places to read. They are where assumptions are made explicit — or where they are left implicit and become vulnerabilities. The function that one subsystem calls to ask another for something is a contract. Read contracts carefully: look at what the caller assumes the callee will check, and what the callee assumes the caller has already verified. The gap between those two assumptions is where the interesting things live.
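Here is a small sketch of such a contract, with hypothetical names throughout. The callee states its precondition and does not re-check it; the caller is the one holding untrusted input, so the caller does the checking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NSLOTS 4
static int toy_slots[TOY_NSLOTS] = { 10, 20, 30, 40 };

/*
 * Callee. Contract: the caller has already verified idx < TOY_NSLOTS.
 * The assert documents the precondition but vanishes under NDEBUG,
 * so it is not a defence against untrusted input.
 */
static int toy_slot_read(size_t idx)
{
    assert(idx < TOY_NSLOTS);
    return toy_slots[idx];
}

/*
 * Caller that honours the contract: it checks the untrusted index itself
 * rather than assuming toy_slot_read() will. A caller that passed uidx
 * straight through would compile cleanly and, with assertions compiled
 * out, read outside the array.
 */
int toy_sys_slot_read(long uidx, int *out)
{
    if (uidx < 0 || (size_t)uidx >= TOY_NSLOTS)
        return EINVAL;
    *out = toy_slot_read((size_t)uidx);
    return 0;
}
```

When I read a real interface I ask the same two questions of each side: does the callee's comment or assertion name a precondition, and can I find the line in every caller that establishes it? A precondition with no matching check in some caller is the gap.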
Part two will go there: the privilege transitions, the trust boundaries, the places where the kernel's model of what is safe to do changes based on who is asking. That is where the reading starts to pay off.
A kernel is a building, and buildings are read by walking them. You will not understand the whole on the first walk. You are not supposed to. What you are building, slowly, is a sense of which rooms connect to which other rooms — so that when something unexpected happens, you know which direction to walk.