Security Engineering · Autonomous AI Systems

Matthew Bowman

I build the security, cryptographic provenance, and audit infrastructure that agentic AI systems need to be trusted — backed by 15 years of keeping production alive when things break.

Austin, TX → relocating to NYC · incident response · endpoint & multi-cloud · 15+ years

About

I'm a security and systems engineer with 15+ years across enterprise IT, multi-cloud architecture, and security operations. My day-to-day is keeping production systems healthy and defensible across AWS, GCP, and Azure; my nights are spent building the autonomous security tooling shown below.

Hands-on with EDR-driven incident response (SentinelOne across 100+ environments), cloud security hardening, and high-tempo production incident work. Deep operator history in the gaming and media industries. Former U.S. federal Confidential clearance. I like problems where security, automation, and scale meet.

Focus
Incident response · detection · security automation
Cloud
AWS · GCP · Azure · Kubernetes
Security
SentinelOne EDR · IAM · PKI · log analysis
Code
Python · Bash · PowerShell · Go
Certs
CompTIA Security+ · Network+
Based
Austin, TX → relocating to NYC

Selected Work

Independent security R&D — original systems I designed and built. Concept-level; no client data, targets, or findings.

Autonomous Security Research

Meridian

A containerized pipeline that chains reconnaissance → vulnerability analysis → exploit validation, built to understand how automated adversaries operate at scale.

Meridian operations console: recon → hunt → verify → report pipeline with live service status
Meridian findings pipeline: candidate findings queued for human triage (target hostnames and counts redacted)
Problem
Modern attack surfaces are too large to assess by hand, and defenders rarely see how an automated attacker actually prioritizes and moves.
Approach
A multi-stage, WAF-aware pipeline with CVE-first prioritization and breadth-then-depth heuristics that decide when to pivot vs. go deep — with evidence capture and structured reporting built in.
Impact
Turns days of manual recon into continuous, prioritized signal, and doubles as a defender's lens on attacker tooling, tempo, and decision-making.
Python · Docker Compose · 30+ services · orchestration · recon / vuln tooling · LLM-assisted triage
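
The CVE-first, breadth-then-depth ordering described above can be illustrated with a small priority queue. This is a rough sketch under stated assumptions, not Meridian's code: the `WorkItem` type, the `score` function, and its weights are all hypothetical, chosen only to show known-CVE matches outranking everything else while deeper work waits behind shallow work.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class WorkItem:
    # Only `priority` takes part in ordering; lower values are handled first.
    priority: float
    host: str = field(compare=False)
    known_cves: int = field(compare=False, default=0)
    depth: int = field(compare=False, default=0)


def score(known_cves: int, depth: int) -> float:
    """Hypothetical weights: each CVE match pulls an item forward by 10,
    each level of depth pushes it back by 1 (breadth before depth)."""
    return depth * 1.0 - known_cves * 10.0


def build_queue(items):
    """Build a min-heap from (host, known_cves, depth) tuples."""
    heap = []
    for host, cves, depth in items:
        heapq.heappush(heap, WorkItem(score(cves, depth), host, cves, depth))
    return heap


def drain(heap):
    """Pop every item and return hosts in the order they would be assessed."""
    order = []
    while heap:
        order.append(heapq.heappop(heap).host)
    return order


ORDER = drain(build_queue([
    ("a.example", 0, 0),  # shallow, no CVE match
    ("b.example", 2, 1),  # two CVE matches: goes first despite being deeper
    ("c.example", 0, 3),  # deep, no CVE match: goes last
]))
```

A real pivot-versus-go-deep decision would presumably weigh more signals than these two; the point here is only that the ordering is an explicit, inspectable function rather than an ad hoc choice.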

AI Agent Security · Cryptography

Seal

Cryptographic provenance for AI-agent prompts — replacing brittle "injection detection" with signatures that fail closed.

Problem
Prompt-injection defenses based on reading language are guesswork; an attacker only has to phrase it differently.
Approach
Every prompt carries an Ed25519-signed Verified Prompt Envelope proving who authorized it, its scope, and that it wasn't tampered with. Turns an NLP problem into key management.
Impact
A defense-in-depth primitive for agent systems that rejects unauthorized instructions by construction, not by vibes.
Python · Ed25519 · HMAC-SHA256 · protocol design
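
A minimal sketch of a fail-closed envelope check, not Seal's actual wire format. To stay standard-library-only it swaps in HMAC-SHA256 (a shared-secret MAC) where the real envelope uses an Ed25519 signature. The field names and the `sign_envelope` / `verify_envelope` helpers are hypothetical.

```python
import hashlib
import hmac
import json
import time


def _canonical(body: dict) -> bytes:
    # Deterministic serialization so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_envelope(key: bytes, issuer: str, scope: str, prompt: str, now=None) -> dict:
    """Wrap a prompt with who authorized it, its scope, and an integrity tag.
    ASSUMPTION: HMAC-SHA256 stands in for the Ed25519 signature."""
    body = {
        "issuer": issuer,
        "scope": scope,
        "prompt": prompt,
        "issued_at": int(time.time() if now is None else now),
    }
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_envelope(key: bytes, envelope: dict, required_scope: str) -> bool:
    """Return True only if the tag checks out AND the scope matches.
    Any malformed or missing field yields False (fail closed)."""
    try:
        body = envelope["body"]
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, envelope["tag"]):
            return False
        return body["scope"] == required_scope
    except (KeyError, TypeError, ValueError):
        return False
```

With Ed25519 the verifier would hold only a public key, so an agent that can check envelopes could not mint them; a shared-secret MAC does not give that separation.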

Agent Infrastructure · Audit

Division

A hierarchical multi-agent system with durable episodic memory and a full audit trail of autonomous work.

Problem
Multi-agent systems lose context across sessions and leave no record of who did what, when, or why.
Approach
A coordination layer (lead → supervisor → specialized agents) over four-level episodic memory, with an HTTP API that checkpoints every task and outcome.
Impact
Cross-session memory plus a forensic, replayable audit trail — observability and accountability for agents.
Python · HTTP API · episodic memory · bi-temporal records
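
One way to picture a bi-temporal checkpoint record is below. This is an illustrative sketch, not Division's schema or API: the `Checkpoint` fields, the `AuditLog` class, and the `as_known_at` query are hypothetical. Each record keeps two timestamps, when the work happened and when the log learned of it, so the trail can be replayed as it looked at any past moment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Checkpoint:
    agent: str
    task: str
    outcome: str
    valid_at: datetime     # when the task outcome actually occurred
    recorded_at: datetime  # when the audit log recorded it


class AuditLog:
    """Append-only, in-memory log (hypothetical stand-in for a durable store)."""

    def __init__(self):
        self._records = []

    def append(self, agent, task, outcome, valid_at, recorded_at=None):
        rec = Checkpoint(
            agent, task, outcome, valid_at,
            recorded_at or datetime.now(timezone.utc),
        )
        self._records.append(rec)
        return rec

    def as_known_at(self, when):
        """Replay: return only what the log had recorded by `when`."""
        return [r for r in self._records if r.recorded_at <= when]
```

Keeping records immutable and querying by `recorded_at` is what makes the trail replayable: a late-arriving record never rewrites what an earlier replay would have shown.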

Autonomous Decision Systems

Midas

An autonomous research-to-decision engine that reads primary-source filings, forms structured theses, and routes every candidate through hard risk gates before anything acts — all core test suites green (session_loop 18/18, risk gates 96/96), graded PASS.

Midas operations dashboard: engine health, risk gates, open positions, and learning loop (demo data)
Problem
Automated decision systems optimize for being right and forget to optimize for surviving being wrong — a single bad sizing call ends the game.
Approach
A multi-model pipeline — cheap models for extraction and numeric work, a frontier model for the final conviction call — behind a state machine, a ten-gate risk layer, paper-trade execution, and a live operations dashboard. Every decision is logged and replayable; most candidates are rejected by design. Full coverage: session_loop 18/18, risk gates 96/96, infrastructure bugfix verified (2241 tests in full suite).
Impact
Capital-preservation-first automation: it does nothing unless conviction and risk both clear — 'no decision' is the default, not a failure. Trading pipeline stabilized — PASS grade across all core subsystems.
Python · multi-model routing · risk-gate state machine · FastAPI ops dashboard · paper-trade execution
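
The "no decision is the default" behavior can be sketched as a chain of gates that must all pass. This is an assumption-laden illustration, not Midas's gate layer: only two gates are shown instead of ten, and the gate names, thresholds, and the `TradeCandidate` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TradeCandidate:
    symbol: str
    conviction: float         # 0.0 to 1.0, from the final model call
    position_fraction: float  # proposed share of capital


Gate = Tuple[str, Callable[[TradeCandidate], bool]]

# Hypothetical gates and thresholds, for illustration only.
GATES: List[Gate] = [
    ("min_conviction", lambda c: c.conviction >= 0.8),
    ("max_position_size", lambda c: 0 < c.position_fraction <= 0.02),
]


def decide(candidate: TradeCandidate, gates: List[Gate] = GATES) -> Tuple[str, Optional[str]]:
    """Return ("PAPER_TRADE", None) only if every gate passes.
    The first failing gate (or a gate that raises) yields
    ("NO_DECISION", gate_name), so rejection is the default outcome."""
    for name, check in gates:
        try:
            passed = bool(check(candidate))
        except Exception:
            passed = False  # a broken gate counts as a failed gate
        if not passed:
            return ("NO_DECISION", name)
    return ("PAPER_TRADE", None)
```

Returning the name of the failing gate alongside the verdict keeps every rejection explainable, which fits the logged-and-replayable design described above.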

Threat Intelligence · Attack Surface

Sentinel Engine

Certificate-Transparency monitoring that surfaces new and anomalous infrastructure from internet-scale CT noise.

Problem
New subdomains, certs, and look-alike infrastructure appear constantly — phishing and shadow assets hide in the volume.
Approach
Continuously ingest public CT logs, extract and normalize domains, correlate against tracked roots, and surface only the new or anomalous as actionable intel.
Impact
Early warning on phishing infrastructure, subdomain sprawl, and shadow assets — attack-surface monitoring that runs unattended.
Python · Certificate Transparency · streaming correlation · OSINT
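
The extract, normalize, and correlate steps might look roughly like this. It is a simplified sketch, not Sentinel Engine's code: it takes names already pulled from certificates as plain strings, the function names are hypothetical, and it covers only exact subdomain matching against tracked roots, not look-alike or anomaly detection.

```python
def normalize(name: str) -> str:
    """Lowercase, trim whitespace and the trailing dot, drop a leading wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def matches_root(domain: str, roots: set) -> bool:
    """True if the domain is a tracked root or a subdomain of one."""
    return any(domain == r or domain.endswith("." + r) for r in roots)


def new_domains(cert_names, tracked_roots: set, seen: set) -> list:
    """Return names under a tracked root that have not been seen before.
    `seen` is updated in place so repeat sightings stay quiet."""
    fresh = []
    for raw in cert_names:
        domain = normalize(raw)
        if domain and matches_root(domain, tracked_roots) and domain not in seen:
            seen.add(domain)
            fresh.append(domain)
    return fresh
```

Matching on `"." + root` rather than a bare suffix keeps `evil-example.com` from being mistaken for a subdomain of `example.com`; catching that kind of look-alike would need a separate similarity check.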

Autonomous Security Research

Black Box

Ephemeral, multi-provider C2 infrastructure — auto-deployed, auto-rotated, burn-detect self-heal, graded PASS (150 tests across burn detection and infrastructure layers).

Problem
C2 infrastructure has a short shelf life: a single scan or takedown burns it. Manual rotation is slow, error-prone, and skips the window before defenders pivot.
Approach
Oblique-Relay (Cloudflare Worker edge redirector) validates and routes implant traffic; Red-Baron Terraform modules deploy disposable backends across AWS, Azure, DO, GCP, or Linode; a 7-signal burn detector (cert validity, DNS health, backend/cert-fingerprint/ping checks) runs every 5 minutes and drives auto-migration when a burn is confirmed — terraform destroy wipes everything on rotation. Infrastructure layer validated via BATS acceptance tests (17/17) and tfenv-pinned Terraform 0.11.15 for Red-Baron compatibility.
Impact
Turns C2 lifecycle management from a daily babysitting chore into a set-and-forget automation: deploy once, the system self-heals on burn, operator gets notified only when something needs a decision. Full critical-path test coverage (133 unit tests + 17 BATS integration tests) means every rotation, every signal, every error path is verified and auditable.
Python · Cloudflare Workers · Terraform · Sliver · BATS · burn detection

AI Security Evaluation

Assay

A fully-wired AI security evaluator — all four engines (seed/jailbreak, garak probes, defense delta scoring, results dashboard) integrated and tested, 150/150 tests passing, graded PASS.

Problem
Most AI evaluation tools score a model's vulnerability, but few measure whether a defense middleware actually helps or by how much; you get a baseline and a prayer.
Approach
Point at any Ollama-hosted model, run all four engines in sequence (deterministic seed probes → NVIDIA garak probes → inline defense re-scoring → delta-driven report), then compare baseline vs. defended scores as a first-class CLI primitive: 'assay delta baseline defended.' Ships a premium HTML report and a multi-run dashboard. All four engine paths are fully tested (150/150 PASS) — the complete evaluate-delta-dashboard pipeline is auditable end to end.
Impact
Turns 'is it secure?' from vibes to a letter grade, and 'does the defense help?' from guesswork to a measured percentage-point lift. The delta is the wedge: it measures what a defense actually adds, which OSS AI-security tools that stop at a baseline score do not.
Python · Ollama · garak · jailbreak evaluation · defense delta · deterministic scoring
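
The percentage-point lift is simple arithmetic, sketched here under assumptions: each probe result is reduced to a boolean (True means the model resisted the probe), and the `pass_rate` and `defense_delta` helpers are hypothetical names, not Assay's internals or its scoring scheme.

```python
def pass_rate(results) -> float:
    """Percentage of probes the model resisted (True = resisted)."""
    if not results:
        raise ValueError("no probe results to score")
    return 100.0 * sum(results) / len(results)


def defense_delta(baseline, defended) -> float:
    """Percentage-point change from the baseline run to the defended run.
    Positive means the defense helped; negative means it made things worse."""
    return round(pass_rate(defended) - pass_rate(baseline), 1)


# Example: 6 of 10 probes resisted without the defense, 9 of 10 with it.
baseline_run = [True] * 6 + [False] * 4
defended_run = [True] * 9 + [False] * 1
lift = defense_delta(baseline_run, defended_run)
```

Reporting the difference in percentage points, rather than a ratio, keeps the number comparable across models with very different baselines.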

Mechanism Design · Protocol Security

Grommet

A boundary investigation of extraction-resistant sequencing — 4-cycle adversarial design proving that content-blind safety mechanisms cannot simultaneously bound attacker extraction and pass legitimate throughput under market stress.

Problem
Every permissionless blockchain suffers MEV/front-running. Proposed defenses claim extraction resistance, but none are systematically tested under adversarial stress. The space has no framework for auditing a mechanism's boundary conditions before deployment.
Approach
Rigorous iterated adversarial mechanism design across four independent cycles: each proposes a hypothesis, simulates it (7 canonical sims, Python stdlib-only, SEED=42 reproducible), subjects it to adversarial review, and falsifies or refines it. Output: the Closure Law (safety predicate survives stress iff exogenous-only), the Cross-Batch Rate-Bound Theorem, and the Closure ⊥ Utility Impossibility (1,874×–28,115× gap) — three formal results, a 13-element dead-end catalog, a 21-question audit checklist, and an honest shippable spec (CoW + Shutter on Gnosis).
Impact
The constraint framework is the product — a general design methodology for any protocol claiming extraction-resistant sequencing. Turns 'is it MEV-resistant?' from marketing copy into a falsifiable audit. The monetary-base extension (MONETARY_CLOSURE_V1.md) applies the Closure Law as the minting rule for an engine-backed currency, where the NO-GO impossibility does not bind.
Python (stdlib-only sims) · MEV research · adversarial mechanism design · formal impossibility proof · protocol security audit

News

2026-06-17

Midas risk-gate engine ships 96/96 gate tests — autonomous decision pipeline graded PASS

Midas, the autonomous research-to-decision engine, now has full test coverage across its core subsystems: session loop 18/18 and risk gates 96/96, with the full suite (2241 tests) still progressing. The grade flip to PASS means the trading pipeline is stabilized: all core test suites are green and the infrastructure bugfix is verified (commit 3bce10609). Three external credential blockers remain and are tracked separately.

2026-06-17

Black Box C2 infrastructure tests grade PASS — 17/17 BATS + P4.3 Terraform compatibility layer validated

The Black Box C2 infrastructure project's deployment pipeline (Red-Baron Terraform modules, tfenv compatibility layer, Terraform 0.11.15 pinning) now has full BATS acceptance test coverage: 17 tests covering argument parsing, credential-presence error paths, and multi-cloud deployment scripts — all passing in <5s. Combined with the burn detection module's 133/133 unit tests, the complete Black Box critical path is now test-verified (150 total tests, all PASS). P4.3 prerequisite work (tfenv + Terraform 0.11.15) is complete, unblocking Red-Baron module validation. 4 credential-blocked tasks remain (Cloudflare + DigitalOcean tokens queued).

2026-06-17

Grommet research cycle concludes — graded PASS with 3 formal impossibility results and a 21-question audit checklist

Grommet, a 4-cycle adversarial mechanism-design investigation into extraction-resistant transaction sequencing (MEV), has concluded with a terminal verdict. It produced three formal theorems (Closure Law, Cross-Batch Rate-Bound Theorem, Closure ⊥ Utility Impossibility), a 13-entry dead-end catalog, 7 reproducible simulations (SEED=42, stdlib-only), and a 21-question audit checklist for any protocol claiming MEV resistance. The grade flip to PASS means the full lifecycle (hypothesis → simulation → adversarial review → terminal documentation) is complete and auditable. A monetary-base spin-off extends the Closure Law as a minting rule for engine-backed currencies, where the NO-GO impossibility does not bind.

Archive · 5 earlier updates

2026-06-14

Assay ships 150/150 tests — security evaluator graded PASS with delta CLI and multi-run dashboard

Assay, the AI security evaluator that scores jailbreak and injection resistance, now has full test coverage across both eval engines (deterministic seed probes and NVIDIA garak probes), the delta and dashboard CLI interfaces, and the inline-defense integration loop — 150 tests passing in total, all green. The grade flip to PASS means the complete evaluate-delta-dashboard pipeline is covered by automated tests, making Assay the only audited OSS tool for measuring defense lift.

2026-06-13

Black Box burn detection ships 133/133 tests — critical-path destructive automation graded PASS

The Black Box C2 infrastructure project's burn detection module (7 signals, runs every 5 minutes, can trigger terraform destroy on confirmed burn) now has full unit test coverage: 133 tests passing across all signal types, dry-run and live paths, threshold logic, and migration integration. The grade flip to PASS means the entire detection-to-rotation pipeline's critical path is covered by automated tests.

2026-06-10

Seal grows to a three-axis trust layer, with Assay as the evaluator

Seal now defends all three agent-security axes — prompt provenance, injection detection, and signed memory-trust — behind a one-command install and CLI. Assay, the paired evaluator, scores a target across all three and measures the lift the defense actually adds.

2026-06-09

Live operator dashboards for Meridian & Midas

Two of the autonomous systems now ship real operator consoles — Meridian's recon → hunt → verify → report pipeline, and Midas's risk-gated decision engine with a ten-gate safety layer. Captures are above (run on local models; targets and live data redacted).

2026-05-30

Seal: cryptographic provenance for agent prompts

Shipped the Verified Prompt Envelope — Ed25519-signed authorization that lets an agent reject unauthorized instructions by construction, turning prompt-injection defense from guesswork into key management.