Metis — automated binary vulnerability triage for macOS, Linux and Windows

A five-stage research pipeline: backbone-fraction path prioritisation, Random-Matrix-Theory anomaly screening, SSA dataflow templates, symbolic taint analysis with angr and Z3, and on-device dynamic validation.

Stuart Thomas

Independent Security Research — Whitby, North Yorkshire, United Kingdom

24 May 2026 · Status: Research-grade / experimental · Language: Python 98.3%, C 1.6% · Licence: MIT-style · Source: github.com/jetnoir/metis

1. Summary

Metis is a research toolchain for triaging compiled binaries against a library of vulnerability templates, with a deliberately staged pipeline that throws cheap techniques at the problem before reaching for expensive ones. Five components: C1 path prioritisation by constraint hardness, C2 spectral anomaly detection on the call graph, C3 template-driven SSA dataflow matching, C6 symbolic taint analysis with proof-of-concept synthesis, and C7 on-device validation. The whole thing is Python, sits on top of angr and Z3, and runs cross-platform against macOS, Linux and Windows binaries.

The repository sits at github.com/jetnoir/metis under an MIT-style licence. It is research code — useful in the contexts it has been used, not industrial-grade triage infrastructure.

2. Why I built it

I was tired of writing the same triage script for every new vulnerability lead. A finding would come out of a source-review pass, I would want to know whether the equivalent defect existed in a dozen other binaries I had on disk, and I would end up doing the same five things by hand: enumerate plausible call paths, work out which were worth symbolic execution, set up the symbolic state, write the taint specification, and validate whatever the solver claimed against the real binary on the real OS.

The five-component layout in Metis is exactly that workflow, broken into stages where each stage’s output is small enough to be the input to the next. Cheap and statistical first; expensive and exact later. The interesting parts are not the underlying engines — angr and Z3 do the heavy lifting and are well-trodden — but the pruning steps that get the working set down to something a solver can actually finish on.

3. The pipeline

3.1 — C1: backbone-fraction path prioritisation

C1 scores candidate paths through the binary by how much of the path’s constraint set is “hard” in the SAT sense. The intuition is that paths dominated by trivially-satisfiable branches rarely contain the interesting bugs — the hard branches are where the parser logic, length arithmetic, and state-machine transitions live. Sorting by backbone fraction means the symbolic engine spends its budget on those paths first. The reported figure in the repository is a 60% reduction in peak active states on mixed-difficulty binaries, which matches what I see in practice.

3.2 — C2: spectral anomaly detection via Random Matrix Theory

C2 takes the call graph as an adjacency matrix and looks at its eigenvalue spectrum. RMT predicts what the spectrum of a random graph of the same density should look like; deviations from that prediction localise to functions whose connectivity pattern differs sharply from the surrounding code. In practice this surfaces things like recently-added handler functions, hand-written assembly stubs, and statically-linked third-party libraries that have been bolted onto an otherwise organic codebase. Not every spectral anomaly is a vulnerability, but a high fraction of vulnerabilities are spectral anomalies.

3.3 — C3: template-driven SSA dataflow matching

C3 lifts the binary to SSA form, computes a memory-taint relation, and matches against a library of templates that describe known defect patterns (missing length checks, sign-confusion arithmetic, unchecked allocator return values, the standard set). Templates are written declaratively. Matches are the input to C6.

3.4 — C6: symbolic taint with PoC synthesis

For each C3 hit that survives a sanity filter, C6 sets up a symbolic state at a controlled entry point and asks angr to drive execution through the candidate sink. Where the solver finds a satisfying assignment of attacker-controlled inputs, Z3 hands back a concrete proof-of-concept payload. That payload is what gets passed to C7.

3.5 — C7: on-device validation

C7 takes the C6 PoC and runs it against the real binary on the real operating system, under one of three drivers depending on what is available: subprocess (target accepts input on stdin or argv), LLDB (need to set a breakpoint and inspect register state at the alleged sink), or DTrace (need to observe a syscall pattern without modifying the target). The output is a yes/no on whether the source-level finding is exploitable on the shipped binary — a question that, as previous work documented in detail, source review alone cannot answer.

4. Using it

A minimal pipeline invocation looks like:

from metis.c2_rmt import C2RMTAnalysis
from metis.c6_taint import C6Analysis
import angr

proj = angr.Project(BINARY, auto_load_libs=False)
c2_result = C2RMTAnalysis.from_project(proj).run()
# C6 takes the anomalies that survive C3 and drives them symbolically

The repository ships operational manuals, case studies, and worked examples that cover the full pipeline end-to-end on real binaries. The Python requirement is 3.11 or 3.13; 3.12 is currently unsupported because of an upstream issue in one of the dependencies.

5. Status

Metis is research-grade. It has been used in anger on several of the disclosures listed elsewhere on this site, and the components are individually robust, but the pipeline is not packaged as a one-command tool that a non-author can drop onto an unfamiliar binary and trust the output of. The README is the canonical reference. The Operations Manual in-repo documents the assumptions each component makes.

I am happy to take patches, issues and (within the terms of my engagement constraints) discussion of methodology. I am not in a position to provide commercial support.

6. What it does not do

It bears stating, in the spirit of being honest about what tools do:

It does not find bugs that source review could not have found. It accelerates triage; it does not invent novel detection theory.
It does not produce CVEs. The output is a candidate finding that still needs a researcher to write the disclosure, coordinate with the vendor, and judge the severity.
It does not handle obfuscated or packed binaries with any grace. Most real-world targets are not packed; if yours is, you have other problems.
It does not run unattended. The validation step in C7 expects a person to be paying attention.

Within those limits it has been useful enough to be worth maintaining.

Legal note

Metis is released under an MIT-style licence (see the repository for the exact wording). The author makes no warranty as to fitness for any particular purpose. Users are responsible for ensuring that their use of the tool complies with applicable law, including the Computer Misuse Act 1990 (England and Wales) and equivalent legislation elsewhere. The tool is intended for use against systems the user owns or has explicit written authorisation to test.

The third-party tools that Metis depends on — angr, Z3, numpy, scipy, LLDB, DTrace — are the work of their respective authors and are used under their own licences. Their inclusion as dependencies should not be taken as endorsement of Metis by their maintainers.