The map and the hunt
After thirty-five years, I stopped trusting my instinct for where to look — and started laying three boring public data sources on top of one another. What that changed, and what it did not.
The question I have been asking myself, on and off, for about three decades: where to look.
For most of those decades the answer was instinct. After enough years staring at the same kind of software you build up a feeling — this daemon is too quiet to be safe, that format parser was written by someone tired, that RFC has been read by more committee members than implementers. You learn to trust it. You ship findings. You move on.
The trouble is that instinct has a half-life. After thirty years it starts to feel reliable in a way that’s no longer earned. After thirty-five it starts to feel reliable in a way that ought to worry you.
So this spring I tried to stop hunting by smell.
What three boring sources say
There are three public data sources for vulnerability research that almost nobody triangulates, presumably because triangulating them is dull. The National Vulnerability Database publishes a row for every CVE assigned. The CISA Known Exploited Vulnerabilities catalogue publishes the much shorter list of bugs that someone has actually used in anger. The vendor’s own advisories, if you bother to walk the Wayback Machine, tell you what the vendor chose to ship a fix for and what it named the component, which the NVD descriptions usually do not.
Each source on its own is mildly useful. The interesting thing is what they say when you lay them on top of one another.
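Laying the sources on top of one another amounts to a join on the CVE identifier. A minimal sketch of that join, assuming the three sources have already been downloaded and reduced to plain Python collections: the function name, the record type and the placeholder identifiers below are all hypothetical, and nothing here talks to a real feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triangulated:
    cve_id: str
    in_nvd: bool               # has a row in the volume source
    exploited: bool            # appears on the exploited-in-the-wild list
    component: Optional[str]   # component name from the vendor advisory, if any


def triangulate(nvd_ids, kev_ids, advisory_components):
    """Join three sources on CVE ID.

    nvd_ids: iterable of CVE IDs that have an NVD row
    kev_ids: iterable of CVE IDs on the known-exploited list
    advisory_components: dict mapping CVE ID -> vendor's component name
    """
    nvd, kev = set(nvd_ids), set(kev_ids)
    all_ids = nvd | kev | set(advisory_components)
    return [
        Triangulated(cve, cve in nvd, cve in kev, advisory_components.get(cve))
        for cve in sorted(all_ids)
    ]


# Placeholder identifiers and component names, not real CVEs or products.
nvd_ids = ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"]
kev_ids = ["CVE-0000-0002"]
advisories = {"CVE-0000-0001": "ExampleKit", "CVE-0000-0002": "ExampleBrowser"}

rows = triangulate(nvd_ids, kev_ids, advisories)
for row in rows:
    print(row)
```

The rows worth a second look are the ones where the three sources disagree: present in one, missing or unnamed in another.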
Apple is the case I happen to know. The same shape would apply to Microsoft, to the BSDs, to any vendor that publishes advisories and any government that publishes an exploited-in-the-wild list. The method is portable. The example is not the point.
The divergence
Lay the volume rank against the exploitation rank and a gap opens up that I had not seen so clearly before.
There is one class of bug — race conditions in privilege-checking code, broadly — that has consistently large NVD volume. Six months, dozens of CVEs, every release cycle. Researchers love them. I have written about them. There is a small literature.
In the exploited-in-the-wild list, the same class is almost absent. Years of data. One entry. One.
Browser memory corruption is the opposite shape. Moderate NVD volume, dominant in the wild. Year after year. The fixed cost of weaponising it has gone up since the major sandboxes were redesigned, and you can see that in the slope, but it is still where the people who do this for money spend their money.
I have spent more research time on the first class than on the second. The data suggests, gently, that this time was more interesting than it was useful.
There is a separate quiet category — log-redaction failures and information leaks in diagnostic plumbing — that nobody in particular is excited about, which sits in the middle of both lists, and which is producing fixes month after month. Boring is also a signal.
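The comparison in this section is nothing more than two rankings and a subtraction. A minimal sketch, assuming each CVE has already been assigned to a bug class by hand: the class labels and the counts are invented to mirror the shape described above (only the single exploited entry for the race class comes from the text), and the function names are hypothetical.

```python
def ranks(counts):
    """Rank classes by count, 1 = largest. Ties are broken by name."""
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {name: i + 1 for i, name in enumerate(ordered)}


def divergence(nvd_counts, kev_counts):
    """Return (class, volume_rank, exploited_rank, gap), largest gap first.

    gap = volume_rank - exploited_rank.
    A positive gap means the class is exploited more than its volume suggests;
    a negative gap means many CVEs and little observed exploitation.
    """
    classes = set(nvd_counts) | set(kev_counts)
    vol = ranks({c: nvd_counts.get(c, 0) for c in classes})
    expl = ranks({c: kev_counts.get(c, 0) for c in classes})
    rows = [(c, vol[c], expl[c], vol[c] - expl[c]) for c in classes]
    return sorted(rows, key=lambda r: (-r[3], r[0]))


# Illustrative counts only, not measured data.
nvd_counts = {
    "privilege-check races": 40,
    "diagnostic info leaks": 20,
    "browser memory corruption": 15,
}
kev_counts = {
    "privilege-check races": 1,
    "diagnostic info leaks": 4,
    "browser memory corruption": 12,
}

table = divergence(nvd_counts, kev_counts)
for name, vol_rank, expl_rank, gap in table:
    print(f"{name:28s} volume #{vol_rank}  exploited #{expl_rank}  gap {gap:+d}")
```

Ranks are used instead of raw counts because the two lists differ in size by orders of magnitude; the ordering is the part that carries over.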
A maths trick that does not read code
The other thing I have been playing with is a screening pass that looks at the shape of a binary’s call graph, not at what the calls actually do.
The technique borrows from random matrix theory — a corner of statistics that asks what an ordinary, unremarkable matrix looks like in bulk, so you can tell when one is not ordinary. Build a matrix from the call graph. Take its spectrum. Compare against a noise null model. Score on a few axes. The binaries whose spectrum sits well outside the noise band rise to the top of a list.
Crucially, it does not know what the code does. It does not know what a function name means. It does not know what an entitlement is. It just notices that this particular program is wired together in a way that does not look like the others in its peer group.
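A toy version of the idea, to make the steps concrete. This is a sketch under loud assumptions, not the screen described here: it uses an undirected 0/1 adjacency matrix, looks only at the largest eigenvalue instead of the whole spectrum, and takes as its noise null model random graphs with the same number of nodes and edges. The function names and the example graph are hypothetical.

```python
import math
import random
import statistics


def symmetric_adjacency(n, edges):
    """Undirected 0/1 adjacency matrix from (caller, callee) index pairs."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            a[u][v] = a[v][u] = 1.0
    return a


def top_eigenvalue(a, iters=200):
    """Largest eigenvalue of a symmetric non-negative matrix, by power iteration.

    Iterates with A + I so the top eigenvalue stays strictly dominant even on
    bipartite graphs, then subtracts the shift at the end.
    """
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n
    lam = 1.0
    for _ in range(iters):
        y = [x[i] + sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(v * v for v in y))  # norm of (A + I) x, with x unit length
        x = [v / lam for v in y]
    return lam - 1.0


def spectral_z_score(n, edges, samples=50, seed=0):
    """Z-score of the observed top eigenvalue against same-size random graphs."""
    rng = random.Random(seed)
    a = symmetric_adjacency(n, edges)
    m = int(sum(a[i][j] for i in range(n) for j in range(i + 1, n)))
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    null = [
        top_eigenvalue(symmetric_adjacency(n, rng.sample(pairs, m)))
        for _ in range(samples)
    ]
    observed = top_eigenvalue(a)
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical call graph: function 0 calls nine small helpers (a star).
star_edges = [(0, i) for i in range(1, 10)]
z = spectral_z_score(10, star_edges)
print(f"z-score of top eigenvalue: {z:.2f}")
```

The star graph scores as unusual against its null even though nothing about it is suspicious. The score measures shape, not safety.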
I find it useful for the same reason I find a chess engine’s evaluation bar useful when I am tired. It does not play the game for me. It says, gently, “look there now.”
What it does not do
It does not find bugs. None of this finds bugs.
What it does is reduce the number of places I am prepared to spend a week of attention on, from a long list of plausible candidates to a short list of unusually shaped ones. The next step — the only step that ever finds anything — is still loading the function in a disassembler, going through it carefully, and asking what would happen if a particular call could be made to happen out of order, or with a particular value, or in a particular state.
That part is irreducibly slow and there is no maths that helps. The maths only helps you decide which slow walk to take next.
It also does not, for the avoidance of doubt, produce a finding that I would file with anyone. A z-score is not a CVE. A peculiarly shaped call graph is not a vulnerability. The screen has a false-positive rate I can estimate but not eliminate. Two of every three things it flags will, on inspection, turn out to be perfectly innocent code that was written by an unusually thorough engineer and has the call graph of a hedgehog.
The one case worth pursuing is the value. The other two are the price.
What changes when you stop hunting
The honest answer is: less than I expected, and more than I expected, in different ways.
Less, because I still spend most of my days in the same way — reading code slowly, writing notes, drinking coffee that is too cold to be enjoyable. The work has not become a dashboard exercise. I would mistrust myself instantly if it did.
More, because the part of my head that used to spend energy on “am I looking in the right place?” has gone quiet. Not silent, because it should not be silent, but quieter. It is a different temperature of attention. Calmer. Less tempted by the shiny.
Nobody’s asking me to do this. Nobody’s paying me to do this. Nobody will be particularly surprised if I stop. The point isn’t to industrialise the practice. The point is to stop being wrong about where to start.
Today the screen has a function I wouldn’t have picked yesterday. Most likely the call graph was unusual because someone careful wrote a lot of small helpers, and the function is fine. That’s the work.