Penfold — the Python toolkit extracted from a 13-day OpenBSD vulnerability-research campaign

Five subpackages (recon, verify, disclose, harness, orchestrator), including the pre-send hallucination filter from the Calculator Discipline paper. BSD-2-Clause. Released to keep the methodology honest.

Stuart Thomas

Independent Security Research (TriageForge) — Whitby, North Yorkshire, United Kingdom

26 May 2026 · Status: Research-grade / 0.1.2 · Language: Python ≥ 3.10 · Licence: BSD-2-Clause · Source: github.com/jetnoir/penfold

1. Summary

Penfold is a small Python package that bundles the recon, verify, and disclosure-discipline tools I built and validated during a 13-day OpenBSD vulnerability-research campaign in May 2026. That campaign produced four landed disclosures (OSPFD-001, OSPF6D-001, SNMPD-001, and EIGRPD-001) and a body of methodology lessons. Penfold collects the parts of the toolchain that survived the “walk-back-anything-that-doesn’t-replicate” discipline applied at the end of the campaign.

The name is a Danger Mouse reference. Penfold is the hamster: modest, helpful, scope-aware, occasionally indispensable, prone to saying “oh crumbs” when things go sideways. That is roughly the right mental model for these tools.

The repository sits at github.com/jetnoir/penfold under the BSD 2-Clause Licence. It is research-grade software: the components are individually robust and have been used in anger, but the package is not a turnkey scanner.

2. Why this exists

Three reasons, in roughly increasing order of importance.

One. Finding the four disclosed bugs took a small set of tools. Some of those tools were one-off scripts; some were worth keeping. Penfold is the keeping pile. Gathering it into a coherent package was easier than letting it rot as a directory of unrelated .py files that only worked on my machine.

Two. The methodology paper that accompanies the campaign — The Calculator Discipline — documents a pre-send filter (hallucination_check.py) that catches the mechanical classes of AI-assisted disclosure failures (bug-shape fabrication, evidence fabrication). The filter is only useful if it ships somewhere people can pick it up. Penfold is the somewhere.

Three. Wave 10 of the campaign was a hard methodology correction. Five structural ranking signals turned out not to transfer at scale; one ranker survived. Publishing penfold with the survivors clearly separated from the deprecated experiments — the latter live in the repo’s deprecated/ directory with per-tool post-mortems — is the honest way to make the lessons available to other researchers. A paper saying “walk things back” means more when the walked-back artefacts are in the same repository.

3. The shape of the package

Penfold is structured as five Python subpackages, mirroring the five stages of the workflow it supports:

3.1 — `penfold.recon`

Statistical ranking of candidate functions in a libclang-extracted corpus. The production primary signal is rmt_null_test, which compares the call-graph spectrum of each function against a Marchenko-Pastur null distribution sampled by Monte Carlo. Functions whose λ₂ survives the null are flagged for human audit. cg_dist_score is a modestly useful secondary signal based on call-graph distance to known-bad fingerprints. libclang_extractor is the AST front-end that produces the corpus. hunt_rmt_null is a fast empirical-null variant for larger corpora where Monte Carlo is too expensive.
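As a rough illustration of the “does λ₂ survive a null?” idea, here is a minimal sketch. It compares the second-largest eigenvalue of a small call graph’s adjacency matrix against a Monte Carlo null built from random graphs with the same node and edge counts. It assumes an undirected, unweighted adjacency matrix. It also deliberately swaps in a G(n, m) random-graph null for the Marchenko-Pastur null that rmt_null_test uses. It shows the shape of the test, not penfold’s implementation, and the function names are hypothetical.

```python
import numpy as np


def second_eigenvalue(adj: np.ndarray) -> float:
    """Second-largest eigenvalue (lambda_2) of a symmetric adjacency matrix."""
    eigvals = np.linalg.eigvalsh(adj)  # returned in ascending order
    return float(eigvals[-2])


def random_gnm_adjacency(n: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """Adjacency matrix of a random simple undirected graph with n nodes and m edges."""
    rows, cols = np.triu_indices(n, k=1)  # every possible undirected edge, once
    picked = rng.choice(rows.size, size=m, replace=False)
    adj = np.zeros((n, n))
    adj[rows[picked], cols[picked]] = 1.0
    return adj + adj.T  # symmetrise


def lambda2_null_test(adj: np.ndarray, samples: int = 500, seed: int = 0) -> tuple[float, float]:
    """Return (observed lambda_2, empirical p-value) against a G(n, m) Monte Carlo null.

    Hypothetical helper: penfold's rmt_null_test uses a Marchenko-Pastur null instead.
    """
    n = adj.shape[0]
    m = int(np.triu(adj, k=1).sum())
    observed = second_eigenvalue(adj)
    rng = np.random.default_rng(seed)
    null = np.array([second_eigenvalue(random_gnm_adjacency(n, m, rng)) for _ in range(samples)])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + int(np.sum(null >= observed))) / (samples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Two 8-node cliques joined by one edge: a strong two-block structure.
    block = np.ones((8, 8)) - np.eye(8)
    adj = np.zeros((16, 16))
    adj[:8, :8] = block
    adj[8:, 8:] = block
    adj[7, 8] = adj[8, 7] = 1.0
    lam2, p = lambda2_null_test(adj)
    print(f"lambda_2 = {lam2:.3f}, p = {p:.4f}")
```

A low p-value means random graphs of the same size and density rarely produce a λ₂ that large. That is the condition under which a function would be flagged for human audit.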

3.2 — `penfold.verify`

Two binary-side gates. frame_check reads an objdump of a shipped binary at a candidate sink and returns a verdict on whether the OpenBSD stack-canary scheme (canary at [rbp-8], mixed with the return address on arm64 via PAC) would defang a hypothetical OOB write at the cited offset. fingerprint_locate resolves source function names to addresses in stripped binaries by matching characteristic byte sequences, so that the frame check can be run against the actual shipped binary rather than the build-tree object file.
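Both gates reduce to simple arithmetic and byte matching. The sketch below is hypothetical: the names, the `__guard_local` symbol used as evidence of a canary in objdump text, and the `??` wildcard syntax are all assumptions. It is not frame_check or fingerprint_locate’s actual code. It models a contiguous overflow running upward from the end of an rbp-relative buffer, with the canary at [rbp-8, rbp). For the `--buf-offset -80 --buf-size 16 --oob-size 16` case shown later, the OOB bytes land at [rbp-64, rbp-48) and never reach the canary.

```python
import re
from dataclasses import dataclass


@dataclass
class FrameVerdict:
    canary_present: bool  # did the disassembly mention the guard symbol at all?
    canary_hit: bool      # does the OOB byte range overlap the canary slot?
    detail: str


def frame_verdict(objdump_text: str, buf_offset: int, buf_size: int, oob_size: int,
                  canary_offset: int = -8, canary_size: int = 8,
                  guard_symbol: str = "__guard_local") -> FrameVerdict:
    """Lexical canary check for a contiguous overflow past the end of a frame buffer.

    Offsets are rbp-relative (negative means below rbp). Assumption: the guard
    symbol name appearing in the objdump text indicates a canary-protected frame.
    """
    canary_present = guard_symbol in objdump_text
    oob_start = buf_offset + buf_size  # first byte past the buffer
    oob_end = oob_start + oob_size     # one past the last OOB byte
    canary_start, canary_end = canary_offset, canary_offset + canary_size
    overlaps = oob_start < canary_end and canary_start < oob_end
    if not canary_present:
        detail = "no guard reference found: frame may not be canary-protected"
    elif overlaps:
        detail = "OOB range reaches the canary: corruption detected on return"
    else:
        detail = (f"OOB range [{oob_start}, {oob_end}) misses canary slot "
                  f"[{canary_start}, {canary_end}): adjacent locals corruptible, no detection")
    return FrameVerdict(canary_present, canary_present and overlaps, detail)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '55 48 89 e5 ?? 53' into a bytes regex; '??' matches any single byte."""
    parts = [b"." if tok == "??" else re.escape(bytes.fromhex(tok)) for tok in pattern.split()]
    return re.compile(b"".join(parts), re.DOTALL)


def locate(blob: bytes, pattern: str) -> list[int]:
    """File offsets where a byte fingerprint matches a (possibly stripped) binary."""
    return [m.start() for m in pattern_to_regex(pattern).finditer(blob)]
```

A file offset from `locate` still has to be mapped to a virtual address through the binary’s section headers before it can be lined up with objdump output.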

3.3 — `penfold.disclose`

The pre-send discipline layer. caller_bounds_detector is a structural check that flags drafts using size_t arithmetic with memcpy/memmove but no caller-bounds-analysis section. disclosure_template is a gate-enforced scaffold for new disclosure drafts — it refuses to render a final draft unless every gate it knows about has been satisfied. hallucination_check is the headline pre-send filter from the Calculator Discipline paper — ten verifiers covering file:line resolution, version-tag matching, fabricated-evidence detection, PoC-existence checks, and the four extended verifiers added in the paper’s §6.
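Two of these checks can be approximated with a few regular expressions. This sketch is hypothetical. Its heuristics are assumptions, not the rules that caller_bounds_detector or hallucination_check actually apply:

- a bare `size_t` mention plus a `memcpy(`/`memmove(` call;
- a heading line starting “caller bounds”;
- citations written as `path.c:line`.

```python
import re

# Assumed citation shape: repository-relative C source or header, colon, line number.
CITATION = re.compile(r"\b([\w./-]+\.[ch]):(\d+)\b")
# Assumed heading shape: optional markup or numbering, then "caller bounds"/"caller-bounds".
CALLER_BOUNDS_HEADING = re.compile(r"^[\W\d]*caller[- ]bounds", re.IGNORECASE | re.MULTILINE)


def needs_caller_bounds_section(draft: str) -> bool:
    """Flag drafts that mention size_t and memcpy/memmove but lack a caller-bounds section."""
    uses_size_t = re.search(r"\bsize_t\b", draft) is not None
    uses_copy = re.search(r"\bmem(?:cpy|move)\s*\(", draft) is not None
    has_section = CALLER_BOUNDS_HEADING.search(draft) is not None
    return uses_size_t and uses_copy and not has_section


def unresolved_citations(draft: str, sources: dict[str, str]) -> list[str]:
    """Return every 'path.c:line' citation whose file is missing or whose line is out of range.

    `sources` maps repository-relative paths to file contents.
    """
    bad = []
    for path, line in CITATION.findall(draft):
        text = sources.get(path)
        if text is None or not 1 <= int(line) <= len(text.splitlines()):
            bad.append(f"{path}:{line}")
    return bad
```

The file:line check only proves a citation points somewhere real. Whether the cited line says what the draft claims is still a human call.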

3.4 — `penfold.harness`

Skeleton generators and BSD-side network primitives for live testing. harness_gen emits a *_verify.c skeleton that calls the candidate sink with attacker-controlled inputs; the researcher fills in the TODOs, links against the project’s real headers, and runs the result under ASAN. harness.bsd_pwn is a small Python module for sending raw packets over the wire to BSD test targets, with checksum, socket, recv, and packet utilities; it was used for the live amd64 DoS validation of EIGRPD-001.
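The routing protocols targeted here (OSPF, EIGRP) carry header checksums computed with the standard Internet checksum; OSPFv3 also covers an IPv6 pseudo-header. A raw-packet tool has to fill that field in itself. Here is a minimal sketch of the RFC 1071 algorithm. It illustrates the kind of checksum utility described above; it is not harness.bsd_pwn’s code, and the function name is hypothetical.

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

To use it, zero the checksum field (and, for OSPFv2, the authentication field it excludes). Then compute the checksum over the packet and write it back in network byte order. A receiver that runs the same function over the finished packet gets 0.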

3.5 — `penfold.orchestrator`

mkii_run is a thin pipeline driver that stitches the four upstream subpackages together using a JSON state file. It is not magic; it just tracks which gates have been satisfied for a given bug ID and refuses to advance past gates that are still OPEN. Each command (screen, audit, harness, verify, disclose) maps to a step in the workflow. Researchers can run any step manually; the orchestrator just records what has been done.
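The gate-tracking idea fits in a few lines. This is a hypothetical sketch, not mkii_run’s code. The state layout (`{bug_id: {step: "OPEN" | "CLOSED"}}`), the function names, and the rule that a step may close only after every earlier step has closed are assumptions that follow the description above.

```python
import json

STEPS = ("screen", "audit", "harness", "verify", "disclose")  # workflow order


class GateError(RuntimeError):
    """Raised when a step is attempted while an earlier gate is still OPEN."""


def close_gate(state: dict, bug_id: str, step: str) -> dict:
    """Mark `step` CLOSED for `bug_id`, refusing if any earlier step is still OPEN."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    gates = state.setdefault(bug_id, {s: "OPEN" for s in STEPS})
    for earlier in STEPS[: STEPS.index(step)]:
        if gates[earlier] == "OPEN":
            raise GateError(f"{bug_id}: cannot close {step!r}; {earlier!r} is still OPEN")
    gates[step] = "CLOSED"
    return state


def save(state: dict, path: str) -> None:
    """Persist the gate state as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)


def load(path: str) -> dict:
    """Load the gate state; a missing file means every gate starts OPEN."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}
```

Keeping the state in plain JSON means a researcher who ran a step by hand can inspect or edit the record directly.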

4. Using it

Install:

```shell
git clone https://github.com/jetnoir/penfold
cd penfold
pip install -e .
```

A minimal hunt looks like:

```shell
# 1. Score every function in a libclang-extracted corpus
python -m penfold.recon.hunt_rmt_null my_corpus.pkl ranked.json

# 2. Pick a candidate; verify its stack frame against the shipped binary
python -m penfold.verify.frame_check /usr/sbin/ospf6d \
    --arch x86_64 --function lsa_check \
    --buf-offset -80 --buf-size 16 --oob-size 16

# 3. Generate a *_verify.c skeleton; fill in the TODOs; build and run under ASAN
python -m penfold.harness.harness_gen --function lsa_check --target ospf6d
```
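Step 3 emits C, but the generator behind it can be small. The following is a hypothetical sketch of a skeleton generator, for illustration only. The template text, the `render_skeleton` name, and the 256-byte input buffer are all assumptions; harness_gen’s real output will differ.

```python
from string import Template

# Hypothetical skeleton layout; not harness_gen's actual template.
SKELETON = Template("""\
/* ${function}_verify.c: harness skeleton for ${target} */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TODO: include the ${target} headers that declare ${function}(). */

int
main(void)
{
    uint8_t buf[256];    /* attacker-controlled input */
    size_t len = sizeof(buf);

    memset(buf, 0x41, len);
    /* TODO: build any state ${function}() expects, then call it with (buf, len). */
    return 0;
}
""")


def render_skeleton(function: str, target: str) -> tuple[str, str]:
    """Return (filename, C source) for a *_verify.c skeleton."""
    return f"{function}_verify.c", SKELETON.substitute(function=function, target=target)


if __name__ == "__main__":
    name, source = render_skeleton("lsa_check", "ospf6d")
    print(name)
    print(source)
```

`string.Template` is used instead of an f-string so the C braces need no escaping. After the TODOs are filled in, the file is compiled with `-fsanitize=address` so that an out-of-bounds access aborts with a report.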

For the full workflow including the orchestrator and the pre-send gates, see docs/OPERATIONS.md in the repository.

5. Scope and limits

Honest about what these tools do and do not do.

Target class. OpenBSD-shaped C — kernel and userland daemons that follow the OpenBSD compiler-and-libc posture (stack canary at [rbp-8], PAC on arm64, imsg-style IPC, event-loop dispatch). Most of the tools generalise to other BSDs and to Linux C with minor knob changes; the canary pattern in frame_check is the one place this assumption is hard-coded.

Not a fuzzer. Penfold does not generate inputs or run targets. The harness skeleton stage produces something a fuzzer can chew on, but the fuzzing itself is the researcher’s job.

Not a symbolic executor. Penfold does not reason about path conditions. The validation step in frame_check is purely lexical against the disassembly; it tells you whether the canary scheme would catch a hypothetical OOB at the cited offset, not whether the OOB is reachable.

Not a bug finder by itself. Every stage of penfold defaults to “needs human review”. The ranker surfaces candidates; the verifier returns a verdict that a human reads; the disclosure scaffold refuses to render until a human ticks the gates. Penfold is a discipline framework, not an oracle.

6. What got walked back

Five ranking signals, plus a family of score-fusion experiments, were built, evaluated, and walked back when the validation corpus grew. They live in deprecated/ with per-tool post-mortems:

rmt_score_nb — non-backtracking matrix variant of the spectral ranker. Failed to transfer across three corpus sizes; collapsed at N=11,415.
dom_score — dominator-tree depth ranking. No signal on validation.
tda_score — persistent-homology features. No signal on validation.
vig_lambda2 (as anti-predictive signal) — an N=5 fluke; the anti-predictive claim was withdrawn. λ₂ still appears as a feature inside rmt_null_test.
cpg_ranker — built but never validated as load-bearing during the campaign.
bayes_fuse v1/v2/v3 — score-fusion experiments; all three dropped at Wave 10.

Publishing these alongside the survivors is the part I care about most. A paper saying “walk things back” means more when the walked-back artefacts ship in the same repository.

Legal note

Penfold is released under the BSD 2-Clause Licence. The author makes no warranty as to fitness for any particular purpose. Users are responsible for ensuring that their use of the tool complies with applicable law, including the Computer Misuse Act 1990 (England and Wales) and equivalent legislation elsewhere. The tool is intended for use against systems the user owns or has explicit written authorisation to test.

The third-party libraries penfold depends on — libclang (LLVM Project), NumPy and SciPy (NumFOCUS), NetworkX (NetworkX developers), and pyserial — are the work of their respective authors and are used under their own licences. Their inclusion as dependencies should not be taken as endorsement of penfold by their maintainers.

The OpenBSD maintainers named in the README (Claudio Jeker, Martijn van Duren, Theo de Raadt, Theo Buehler) are credited in their public capacity as the committers of the fixes for the four disclosures that informed penfold’s design. None of them endorses this package or has any involvement with it; they are credited for the fixes that made the methodology lessons possible.