Fuzzing: a week of noise
A week of fuzzing looks like this: mostly nothing, occasionally a crash, rarely something interesting. The ratio of noise to signal is the point, not a disappointment. What you learn from the noise is what the signal means.
Fuzzing has the reputation of being automated magic: you point a fuzzer at a target, leave it running, and findings emerge. This is not entirely wrong. Fuzzers do find things that manual review misses — particularly in parsing code, where the space of possible inputs is too large for any human to enumerate exhaustively. But the week of fuzzing I want to describe is less about the findings and more about reading the output, which is what makes fuzzing productive rather than merely busy.
The output of a fuzzer, before you have tuned it, is mostly noise: crashes caused by the fuzzer feeding the target input it is not equipped to handle at all — empty files where the target expects non-empty input, data truncated at structurally significant offsets, input types the target rejects outright before doing anything interesting. These are genuine crashes, technically, but they are not findings. They are evidence that the fuzzer's input generation needs constraining.
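What constraining looks like depends on the fuzzer, but the shape is easy to sketch. Below is a minimal libFuzzer-style harness — `parse_record` is a hypothetical stand-in for the real target, and the 8-byte fixed header is invented for the example — that filters out inputs too short to get past the trivial rejection path:

```cpp
// harness.cc — a minimal libFuzzer-style harness (illustrative).
// Build sketch, assuming clang with libFuzzer available:
//   clang++ -g -O1 -fsanitize=fuzzer,address harness.cc -o fuzz_target
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the real parser under test.
static bool parse_record(const uint8_t *data, size_t size) {
  uint32_t body_len;
  std::memcpy(&body_len, data + 4, sizeof(body_len));  // length field at offset 4
  return body_len <= size - 8;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Constrain the input space: anything shorter than the 8-byte fixed header
  // can only exercise the trivial rejection path, so skip it here rather
  // than let it pile up as noise in the crash bucket.
  if (size < 8) return 0;
  parse_record(data, size);
  return 0;
}
```

A seed corpus of well-formed files does the complementary job from the other direction, starting mutation from inputs that already parse.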
The first day
The first day of a fuzzing campaign should be spent not on the output but on the coverage. Modern coverage-guided fuzzers — AFL, libFuzzer, and their descendants — instrument the target binary and track which code paths each input exercises. Coverage is the measure of whether the fuzzer is reaching the interesting parts of the code. Crashes in code the fuzzer never reaches are not possible. Crashes in code paths you did not know existed are the ones you want.
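The mechanics of that instrumentation can be sketched with Clang's SanitizerCoverage hooks, which are roughly the layer libFuzzer builds on. The hook names below are the documented ones; the build line and the toy `main` are illustrative:

```cpp
// cov_demo.cc — the hooks behind coverage guidance, via Clang's
// SanitizerCoverage. Build sketch:
//   clang++ -fsanitize-coverage=trace-pc-guard cov_demo.cc -o cov_demo
#include <cstdint>
#include <cstdio>

// Called at startup with one guard slot per instrumented edge; number them.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t id = 0;
  if (start == stop || *start) return;  // already initialised
  for (uint32_t *g = start; g < stop; ++g) *g = ++id;
}

// Called every time an instrumented edge executes. A fuzzer's runtime bumps
// a counter here; an edge it has never seen before is what "new coverage" means.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;                  // this edge was already reported
  std::printf("new edge: %u\n", *guard);
  *guard = 0;                           // report each edge only once
}

int main(int argc, char **) {
  if (argc > 1) std::printf("branch taken\n");  // extra edge when an argument is passed
  return 0;
}
```

Run it with and without an argument and different edges fire. That difference is the entire feedback signal: an input that lights up a previously unseen edge is kept and mutated further; one that does not is discarded.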
I spend the first day looking at coverage reports and asking: is the fuzzer reaching the parsing code? Is it reaching the code paths that process fields I am interested in? If not, why not — is there a precondition I need to satisfy to reach them? Answering these questions transforms a generic fuzzing campaign into one targeted at the specific functionality that the research question is about.
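The most common answer to that "why not" is a precondition the mutator almost never satisfies by chance — a magic number, a checksum, a version field. The harness can satisfy it structurally. A sketch, with `parse_container` and the four-byte magic both invented for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a parser that rejects anything without its magic.
static bool parse_container(const uint8_t *data, size_t size) {
  return size >= 4 && std::memcmp(data, "MAGC", 4) == 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Prepend a fixed valid magic, so every generated input passes the check
  // the coverage report showed the fuzzer stuck at. The fuzzer controls
  // everything after it.
  std::vector<uint8_t> buf(4 + size);
  std::memcpy(buf.data(), "MAGC", 4);
  if (size) std::memcpy(buf.data() + 4, data, size);
  parse_container(buf.data(), buf.size());
  return 0;
}
```

Now every execution spends its time past the check, in the code the research question is actually about.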
Reading the crashes
By the third or fourth day, the fuzzer has typically settled into a rhythm. New coverage is arriving more slowly. The crash bucket is growing, but many crashes trace back to a small number of distinct root causes. Crash deduplication — most fuzzers do this automatically, grouping crashes by similar stack traces — is essential at this stage. What looks like fifty findings often reduces to three or four distinct issues.
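The grouping logic itself is simple enough to sketch. Assuming — hypothetically — that each crash comes with symbolised frame names, bucketing on the top few frames collapses variants of the same root cause:

```cpp
// Sketch of stack-trace bucketing, the idea behind automatic crash
// deduplication: crashes whose top frames match are treated as one issue.
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Bucket key: the top N frames, which usually identify the root cause even
// when the deeper call path varies between crashing inputs.
static size_t bucket_key(const std::vector<std::string> &frames,
                         size_t top_n = 3) {
  std::string joined;
  for (size_t i = 0; i < frames.size() && i < top_n; ++i)
    joined += frames[i] + "|";
  return std::hash<std::string>{}(joined);
}

int main() {
  // Hypothetical traces from three crashing inputs; the first two share
  // a root cause and differ only deeper in the call stack.
  std::vector<std::vector<std::string>> traces = {
      {"memcpy", "read_chunk", "parse_record", "main"},
      {"memcpy", "read_chunk", "parse_record", "fuzz_driver"},
      {"strlen", "read_name", "parse_header", "main"},
  };
  std::map<size_t, int> buckets;
  for (const auto &t : traces) buckets[bucket_key(t)]++;
  std::cout << buckets.size() << " distinct issues\n";  // prints: 2 distinct issues
  return 0;
}
```

Three frames is a heuristic, not a rule: too few frames merges distinct bugs into one bucket, too many splits one bug across several.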
For each distinct crash: reproduce it manually, outside the fuzzer, using the minimal crashing input the fuzzer has generated. Read the crash output carefully: the signal (SIGSEGV, SIGABRT), the stack trace, the memory address if available. Then read the code at the crash site. What was the function trying to do? What condition caused it to fail? Is the crash at the expected input validation point, or has the malformed input propagated further than it should have before causing a failure?
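A libFuzzer-built binary will replay a crash file passed as a command-line argument, which covers the reproduction step directly. For targets without that convenience, a standalone reproducer is a few lines; `parse_record` is again a hypothetical stand-in, and in practice you would link the real target and build with the same sanitizers as the fuzzing run so the crash report matches:

```cpp
// repro.cc — replay one crashing input outside the fuzzer.
#include <cstdint>
#include <cstdio>
#include <vector>

// Stub so this sketch compiles standalone; link the real target instead.
static bool parse_record(const uint8_t *, size_t) { return true; }

int main(int argc, char **argv) {
  if (argc != 2) {
    std::fprintf(stderr, "usage: %s <crash-file>\n", argv[0]);
    return 1;
  }
  std::FILE *f = std::fopen(argv[1], "rb");
  if (!f) {
    std::perror("fopen");
    return 1;
  }
  std::fseek(f, 0, SEEK_END);
  long len = std::ftell(f);
  std::fseek(f, 0, SEEK_SET);
  std::vector<uint8_t> buf(len > 0 ? static_cast<size_t>(len) : 0);
  if (!buf.empty() && std::fread(buf.data(), 1, buf.size(), f) != buf.size()) {
    std::fclose(f);
    std::fprintf(stderr, "short read\n");
    return 1;
  }
  std::fclose(f);
  parse_record(buf.data(), buf.size());  // if the repro is real, it crashes here
  return 0;
}
```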
What fuzzing finds that manual review misses
The classic answer is: parsing edge cases. The single input that triggers a length calculation to overflow. The specific combination of field values that causes a state machine to enter an undefined state. The input that is technically valid according to the parser's interpretation of the format but contains something that causes a downstream consumer to misbehave. Human reviewers read code with assumptions about what inputs will look like. Fuzzers operate without those assumptions.
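The length-overflow case rewards seeing it concretely. The pattern below is illustrative, not drawn from any particular codebase; everything is 32-bit, so the bounds check can be made to wrap:

```cpp
#include <cstdint>

// Broken: with offset = 16 and len = 0xFFFFFFF8, offset + len wraps to 8,
// the check passes, and the caller goes on to read len bytes out of bounds.
bool field_in_bounds_broken(uint32_t size, uint32_t offset, uint32_t len) {
  return offset + len <= size;  // BUG: the sum wraps modulo 2^32
}

// Fixed: rearranged so no addition can overflow.
bool field_in_bounds(uint32_t size, uint32_t offset, uint32_t len) {
  return offset <= size && len <= size - offset;
}
```

A reviewer reading `offset + len <= size` sees the intent and moves on. The fuzzer sees only the arithmetic, and nothing in its model says a length of 0xFFFFFFF8 is implausible.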
What manual review finds that fuzzing misses: logical issues. Authorisation bypasses. Race conditions that require specific timing, not just specific inputs. The finding that requires understanding what the code is supposed to do as well as what it actually does. A fuzzer has no model of the intended behaviour. It only observes when the behaviour becomes observably wrong.
The combination — reading the code, then fuzzing the surface, then reading the crashes — is more effective than either method alone. Fuzzing without prior reading produces crashes you cannot interpret. Reading without fuzzing misses the inputs you did not think to construct.
A week of fuzzing ends with a crash bucket, a coverage map, and a sharper model of where the interesting code lives. Mostly noise. Occasionally signal. The patience to sit with the noise long enough to hear the signal is the thing fuzzing teaches — which is to say, the same thing everything in this work teaches.