Security Engineering · Autonomous AI Systems

Matthew Bowman

I build the security, cryptographic provenance, and audit infrastructure that agentic AI systems need to be trusted — backed by 15 years of keeping production alive when things break.

Austin, TX → relocating to NYC · incident response · endpoint & multi-cloud · 15+ years

About

I'm a security and systems engineer with 15+ years across enterprise IT, multi-cloud architecture, and security operations. My day-to-day is keeping production systems healthy and defensible across AWS, GCP, and Azure; my nights are spent building the autonomous security tooling shown below.

Hands-on with EDR-driven incident response (SentinelOne across 100+ environments), cloud security hardening, and high-tempo production incident work. Deep operator history in the gaming and media industry. Former U.S. federal Confidential clearance. I like problems where security, automation, and scale meet.

Focus
Incident response · detection · security automation
Cloud
AWS · GCP · Azure · Kubernetes
Security
SentinelOne EDR · IAM · PKI · log analysis
Code
Python · Bash · PowerShell · Go
Certs
CompTIA Security+ · Network+
Based
Austin, TX → relocating to NYC

Selected Work

Independent security R&D — original systems I designed and built. Concept-level; no client data, targets, or findings.

Autonomous Security Research

Meridian

A containerized pipeline that chains reconnaissance → vulnerability analysis → exploit validation, built to understand how automated adversaries operate at scale.

Figure: Meridian operations console, showing the recon → hunt → verify → report pipeline with live service status
Figure: Meridian findings pipeline, with candidate findings queued for human triage (target hostnames and counts redacted)
Problem
Modern attack surfaces are too large to assess by hand, and defenders rarely see how an automated attacker actually prioritizes and moves.
Approach
A multi-stage, WAF-aware pipeline with CVE-first prioritization and breadth-then-depth heuristics that decide when to pivot vs. go deep — with evidence capture and structured reporting built in.
Impact
Turns days of manual recon into continuous, prioritized signal, and doubles as a defender's lens on attacker tooling, tempo, and decision-making.
Python · Docker Compose (30+ services) · orchestration · recon / vuln tooling · LLM-assisted triage
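To make "CVE-first prioritization" concrete, here is a minimal sketch of how a triage queue could be ordered. It is an illustration under stated assumptions, not Meridian's code: the Finding fields, the 0 to 10 severity scale, and the placeholder CVE IDs are all hypothetical, and the breadth-then-depth heuristics are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape of a candidate finding awaiting human triage.
    title: str
    cve_id: Optional[str]  # None when no known CVE is attached
    severity: float        # assumed 0.0 to 10.0 scale


def triage_key(finding: Finding):
    # CVE-backed findings sort first, then higher severity, then title
    # so that ties are broken deterministically.
    return (0 if finding.cve_id else 1, -finding.severity, finding.title)


def triage_order(findings: List[Finding]) -> List[Finding]:
    return sorted(findings, key=triage_key)


# Placeholder data: the CVE IDs below are made up for the example.
queue = triage_order([
    Finding("Verbose error page", None, 5.3),
    Finding("Outdated TLS library", "CVE-0000-0001", 7.5),
    Finding("Open admin panel", None, 9.1),
    Finding("Old CMS plugin", "CVE-0000-0002", 9.8),
])
```

Sorting on a tuple keeps the policy in one place: changing what "first" means is a one-line edit to triage_key.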

AI Agent Security · Cryptography

Seal

Cryptographic provenance for AI-agent prompts — replacing brittle "injection detection" with signatures that fail closed.

Problem
Prompt-injection defenses based on reading language are guesswork; an attacker only has to phrase it differently.
Approach
Every prompt carries an Ed25519-signed Verified Prompt Envelope proving who authorized it, its scope, and that it wasn't tampered with. Turns an NLP problem into key management.
Impact
A defense-in-depth primitive for agent systems that rejects unauthorized instructions by construction, not by vibes.
Python · Ed25519 · HMAC-SHA256 · protocol design
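The fail-closed idea can be sketched in a few lines. This is not Seal's implementation: the envelope field names are hypothetical, and the sketch swaps in standard-library HMAC-SHA256 (a shared-secret MAC) for Ed25519 so that it runs without third-party packages. A real deployment built on Ed25519 would verify with a public key instead of a shared secret.

```python
import hashlib
import hmac
import json
import time


def _canonical(body: dict) -> bytes:
    # Stable serialization so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_envelope(key: bytes, issuer: str, scope: str, prompt: str,
                  ttl_s: int = 300, now=None) -> dict:
    # Field names here are assumptions made for illustration.
    now = time.time() if now is None else now
    body = {"issuer": issuer, "scope": scope, "prompt": prompt,
            "expires": now + ttl_s}
    sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_envelope(key: bytes, envelope: dict, required_scope: str, now=None):
    """Return the prompt only if every check passes; otherwise None."""
    try:
        body = envelope["body"]
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, envelope["sig"]):
            return None  # tampered body or wrong key
        if body["scope"] != required_scope:
            return None  # authorized, but not for this action
        now = time.time() if now is None else now
        if now > body["expires"]:
            return None  # stale authorization
        return body["prompt"]
    except (KeyError, TypeError, ValueError):
        return None  # malformed input is rejected, never guessed at
```

The verifier never inspects the prompt's wording. Anything that is unsigned, altered, out of scope, expired, or malformed takes the same rejection path.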

Agent Infrastructure · Audit

Division

A hierarchical multi-agent system with durable episodic memory and a full audit trail of autonomous work.

Problem
Multi-agent systems lose context across sessions and leave no record of who did what, when, or why.
Approach
A coordination layer (lead → supervisor → specialized agents) over four-level episodic memory, with an HTTP API that checkpoints every task and outcome.
Impact
Cross-session memory plus a forensic, replayable audit trail — observability and accountability for agents.
Python · HTTP API · episodic memory · bi-temporal records
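A bi-temporal record keeps two timestamps: when something happened and when the system recorded it. The sketch below shows why that makes an audit trail replayable. It is an assumption-laden illustration rather than Division's schema: the Checkpoint fields, the AuditLog class, and the as_of query are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical record shape for one task outcome.
    task_id: str
    agent: str
    outcome: str
    valid_at: float     # when the event happened
    recorded_at: float  # when the log learned about it


class AuditLog:
    """Append-only log: corrections are new records, never overwrites."""

    def __init__(self) -> None:
        self._records: List[Checkpoint] = []

    def append(self, checkpoint: Checkpoint) -> None:
        self._records.append(checkpoint)

    def as_of(self, task_id: str, valid_at: float,
              known_at: float) -> Optional[Checkpoint]:
        # Answer "what did we believe at known_at about the task's state
        # at valid_at?" by ignoring anything recorded after known_at.
        candidates = [
            r for r in self._records
            if r.task_id == task_id
            and r.valid_at <= valid_at
            and r.recorded_at <= known_at
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: (r.valid_at, r.recorded_at))


log = AuditLog()
log.append(Checkpoint("t1", "scanner", "started", valid_at=10, recorded_at=11))
log.append(Checkpoint("t1", "scanner", "failed", valid_at=20, recorded_at=21))
# A later correction about the same moment in time.
log.append(Checkpoint("t1", "supervisor", "succeeded", valid_at=20, recorded_at=30))
```

Querying with known_at=25 returns the "failed" record, while known_at=35 returns the supervisor's correction, so both the original belief and the revision stay inspectable.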

Autonomous Decision Systems

Midas

An autonomous research-to-decision engine that reads primary-source filings, forms structured theses, and routes every candidate through hard risk gates before anything acts.

Figure: Midas operations dashboard, showing engine health, risk gates, open positions, and learning loop (demo data)
Problem
Automated decision systems optimize for being right and forget to optimize for surviving being wrong — a single bad sizing call ends the game.
Approach
A multi-model pipeline — cheap models for extraction and numeric work, a frontier model for the final conviction call — behind a state machine, a ten-gate risk layer, paper-trade execution, and a live operations dashboard. Every decision is logged and replayable; most candidates are rejected by design.
Impact
Capital-preservation-first automation: it does nothing unless conviction and risk both clear — 'no decision' is the default, not a failure.
Python · multi-model routing · risk-gate state machine · FastAPI ops dashboard · paper-trade execution
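The "no decision is the default" behavior can be sketched as a chain of gates in which any failure, including a gate that crashes, rejects the candidate. This is not the Midas risk layer: the three gates, their thresholds, and the Candidate fields are made up for illustration and stand in for the ten real gates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields; scales and units are assumptions.
    symbol: str
    conviction: float      # 0.0 to 1.0
    position_pct: float    # proposed size as a percent of the portfolio
    daily_loss_pct: float  # loss already taken today, in percent


Gate = Tuple[str, Callable[[Candidate], bool]]

# Illustrative gates with invented thresholds.
GATES: List[Gate] = [
    ("min_conviction", lambda c: c.conviction >= 0.8),
    ("max_position_size", lambda c: c.position_pct <= 2.0),
    ("daily_loss_limit", lambda c: c.daily_loss_pct < 3.0),
]


def evaluate(candidate: Candidate,
             gates: List[Gate] = GATES) -> Tuple[str, Optional[str]]:
    """Return ("PAPER_TRADE", None) only if every gate passes."""
    for name, check in gates:
        try:
            passed = bool(check(candidate))
        except Exception:
            passed = False  # a broken gate counts as a failed gate
        if not passed:
            return ("NO_DECISION", name)  # name is kept for the audit log
    return ("PAPER_TRADE", None)
```

Returning the name of the first failed gate gives every rejection a logged reason, which is what makes the rejected majority replayable.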

Threat Intelligence · Attack Surface

Sentinel Engine

Certificate-Transparency monitoring that surfaces new and anomalous infrastructure from internet-scale CT noise.

Problem
New subdomains, certs, and look-alike infrastructure appear constantly — phishing and shadow assets hide in the volume.
Approach
Continuously ingest public CT logs, extract and normalize domains, correlate against tracked roots, and surface only the new or anomalous as actionable intel.
Impact
Early warning on phishing infrastructure, subdomain sprawl, and shadow assets — attack-surface monitoring that runs unattended.
Python · Certificate Transparency · streaming correlation · OSINT
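The normalize-and-correlate step might look like the sketch below. It assumes the certificate names have already been pulled out of CT log entries, and the function names are hypothetical. It covers only exact matching against tracked roots; look-alike detection and anomaly scoring are out of scope here.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


def normalize(name: str) -> Optional[str]:
    """Lowercase, drop a trailing dot and a leading wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    if not name or " " in name or "." not in name:
        return None  # not a usable hostname
    return name


def matching_root(domain: str, roots: Set[str]) -> Optional[str]:
    for root in roots:
        # Require a label boundary so "notexample.com" does not match
        # the tracked root "example.com".
        if domain == root or domain.endswith("." + root):
            return root
    return None


def new_hosts(names: Iterable[str], roots: Set[str],
              seen: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (host, root) for hosts under a tracked root not seen before."""
    for raw in names:
        domain = normalize(raw)
        if domain is None or domain in seen:
            continue
        root = matching_root(domain, roots)
        if root is None:
            continue
        seen.add(domain)
        yield (domain, root)


seen_hosts = {"www.example.com"}
stream = ["*.Example.com.", "www.example.com", "vpn.example.com",
          "example.com.evil.net", "notexample.com", "vpn.example.com"]
alerts = list(new_hosts(stream, {"example.com"}, seen_hosts))
```

Because new_hosts is a generator over any iterable, the same function works on a batch file or a live stream of names.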

Autonomous Security Research

Black Box

Ephemeral, multi-provider C2 infrastructure — auto-deployed, auto-rotated, with 7-signal burn detection and autonomous migration on compromise.

Problem
C2 infrastructure has a short shelf life: a single scan or takedown burns it. Manual rotation is slow and error-prone, and it misses the window before defenders pivot.
Approach
Oblique-Relay (Cloudflare Worker edge redirector) validates and routes implant traffic; Red-Baron Terraform modules deploy disposable backends across AWS, Azure, DO, GCP, or Linode; a 7-signal burn detector (cert validity, DNS health, backend/cert-fingerprint/ping checks) runs every 5 minutes and drives auto-migration when a burn is confirmed — terraform destroy wipes everything on rotation.
Impact
Turns C2 lifecycle management from a daily babysitting chore into set-and-forget automation: deploy once, the system self-heals on burn, and the operator is notified only when something needs a decision.
Python · Cloudflare Workers · Terraform · Sliver · burn detection

AI Security Evaluation

Assay

An AI security evaluator that scores jailbreak and injection resistance, with a first-class with/without-defense delta — the only tool that measures the lift a defense actually adds.

Problem
All AI evaluation tools score a model's vulnerability, but none measure whether a defense middleware actually helps or by how much — you get a baseline and a prayer.
Approach
Point at any Ollama-hosted model, run a multi-engine battery (deterministic seed probes + NVIDIA garak probes), then attach a defense inline and re-score. The delta is a first-class CLI primitive: 'assay delta baseline defended'. Ships a premium HTML report and a multi-run dashboard. Two complete eval-engine iterations have already proven the loop by finding and closing a real blind spot in Seal.
Impact
Turns 'is it secure?' from vibes to a letter grade, and 'does the defense help?' from guesswork to a measured percentage-point lift. The delta wedge makes it the only honest defense-evaluation tool in the OSS AI-security space.
Python · Ollama · garak · jailbreak evaluation · defense delta · deterministic scoring
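The with/without-defense delta reduces to simple arithmetic over per-probe results. The sketch below is an assumption about the shape of that computation, not the code behind the assay CLI: the result format (probe ID mapped to True when the attack was resisted) and the function names are hypothetical.

```python
from typing import Dict, List, Union


def resistance_rate(results: Dict[str, bool]) -> float:
    """Percent of probes resisted (True means the attack was blocked)."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(results.values()) / len(results)


def delta(baseline: Dict[str, bool],
          defended: Dict[str, bool]) -> Dict[str, Union[float, List[str]]]:
    # Compare only probes present in both runs so the two rates are
    # measured on the same battery.
    shared = sorted(baseline.keys() & defended.keys())
    if not shared:
        raise ValueError("runs share no probes")
    base = resistance_rate({p: baseline[p] for p in shared})
    with_defense = resistance_rate({p: defended[p] for p in shared})
    return {
        "baseline_pct": base,
        "defended_pct": with_defense,
        "lift_pp": with_defense - base,  # percentage points, not percent
        # Probes the bare model resisted but the defended one did not.
        "regressions": [p for p in shared if baseline[p] and not defended[p]],
    }


report = delta(
    {"p1": False, "p2": False, "p3": True, "p4": True},
    {"p1": True, "p2": True, "p3": True, "p4": False},
)
```

Reporting regressions next to the lift matters: here the defense raises resistance from 50% to 75%, yet it also breaks one probe (p4) that the bare model handled.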

News

2026-06-14

Assay ships 150/150 tests — security evaluator graded PASS with delta CLI and multi-run dashboard

Assay, the AI security evaluator that scores jailbreak and injection resistance, now has full test coverage across both eval engines (deterministic seed probes and NVIDIA garak probes), the delta and dashboard CLI interfaces, and the inline-defense integration loop: 150 tests, all passing. The grade flip to PASS means the complete evaluate-delta-dashboard pipeline is covered by automated tests, making Assay the only audited OSS tool for measuring defense lift.

2026-06-13

Black Box burn detection ships 133/133 tests — critical-path destructive automation graded PASS

The Black Box C2 infrastructure project's burn detection module (7 signals, runs every 5 minutes, can trigger terraform destroy on confirmed burn) now has full unit test coverage: 133 tests passing across all signal types, dry-run and live paths, threshold logic, and migration integration. The grade flip to PASS means the entire detection-to-rotation pipeline's critical path is covered by automated tests.

2026-06-10

Seal grows to a three-axis trust layer, with Assay as the evaluator

Seal now defends all three agent-security axes — prompt provenance, injection detection, and signed memory-trust — behind a one-command install and CLI. Assay, the paired evaluator, scores a target across all three and measures the lift the defense actually adds.

Archive · 2 earlier updates

2026-06-09

Live operator dashboards for Meridian & Midas

Two of the autonomous systems now ship real operator consoles — Meridian's recon → hunt → verify → report pipeline, and Midas's risk-gated decision engine with a ten-gate safety layer. Captures are above (run on local models; targets and live data redacted).

2026-05-30

Seal: cryptographic provenance for agent prompts

Shipped the Verified Prompt Envelope — Ed25519-signed authorization that lets an agent reject unauthorized instructions by construction, turning prompt-injection defense from guesswork into key management.