A vendor-neutral benchmark harness that makes the “CPU bottleneck in agentic AI” measurable.
It simulates multi-agent loops with configurable mixes of:
- planning / routing overhead (CPU-bound string + JSON work)
- tool-call orchestration (RPC-like waits + serialization)
- memory lookups (KV-style access patterns)
- I/O-like behavior (disk + sqlite mocks)
It produces a compact table with p50/p95 step latency, a CPU utilization proxy, and an orchestration-vs-work breakdown.
This is not a model benchmark. It is a control-plane / agent-runtime benchmark.
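To make the mix above concrete, here is a minimal sketch of what one simulated agent step could look like. It is not the harness's actual implementation: `simulated_step`, its parameters, and the returned keys are hypothetical names chosen for illustration, and the tool call is approximated with a plain `time.sleep`.

```python
import json
import time


def simulated_step(payload_items: int = 200, tool_wait_s: float = 0.005) -> dict:
    """Run one hypothetical agent step and report where the time went."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    # Planning / routing: CPU-bound string + JSON work.
    plan = {"route": "tool_a", "args": [f"item-{i}" for i in range(payload_items)]}
    encoded = json.dumps(plan)
    decoded = json.loads(encoded)

    # Memory lookup: KV-style access against an in-process dict.
    memory = {f"item-{i}": i for i in range(payload_items)}
    hits = sum(1 for key in decoded["args"] if key in memory)

    # Tool call: an RPC-like wait, simulated with sleep (consumes no CPU time).
    wait_start = time.perf_counter()
    time.sleep(tool_wait_s)
    io_wait_ms = (time.perf_counter() - wait_start) * 1000

    return {
        "wall_ms": (time.perf_counter() - wall_start) * 1000,
        "cpu_ms": (time.process_time() - cpu_start) * 1000,
        "io_wait_ms": io_wait_ms,
        "json_kb": len(encoded) / 1024,
        "memory_hits": hits,
    }


if __name__ == "__main__":
    print(simulated_step())
```

Separating wall time from process CPU time is what lets a step be attributed to either work (planning, parsing) or waiting (tool calls, I/O).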
Set up a virtual environment and run the default benchmark (Windows paths shown):

    python -m venv .venv
    .\.venv\Scripts\python.exe -m pip install -r requirements.txt
    .\.venv\Scripts\python.exe .\bench.py

Run a single scenario:

    .\.venv\Scripts\python.exe .\run_bench.py --workload workloads\mixed.json --agents 32 --steps 200

See docs/Output.md for real benchmark output tables and fan-out runs.
The benchmark prints a table with the following columns:
- p50_ms, p95_ms: step latency distribution
- cpu_util: process CPU time / wall time (rough proxy for CPU saturation)
- cpu_ms: CPU time spent per step
- io_wait_ms: simulated tool/I/O wait per step
- json_kb, parse_ms: serialization + parsing costs
- optional: context switches, if psutil is available
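As a rough illustration of how metrics like these can be derived, the sketch below computes p50/p95 from per-step latencies and a CPU utilization proxy as process CPU time divided by wall time. The `summarize` helper is hypothetical and only mirrors the column names listed above; the harness may compute its percentiles differently.

```python
import statistics
import time


def summarize(step_ms: list[float], cpu_s: float, wall_s: float) -> dict:
    """Summarize per-step latencies plus a CPU utilization proxy."""
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(step_ms, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "cpu_util": cpu_s / wall_s if wall_s > 0 else 0.0,
    }


if __name__ == "__main__":
    latencies = []
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(50):
        step_start = time.perf_counter()
        sum(i * i for i in range(20_000))  # stand-in for CPU-bound planning
        time.sleep(0.001)                  # stand-in for a tool/I/O wait
        latencies.append((time.perf_counter() - step_start) * 1000)
    print(summarize(
        latencies,
        cpu_s=time.process_time() - cpu_start,
        wall_s=time.perf_counter() - wall_start,
    ))
```

A `cpu_util` near 1.0 suggests the process spent nearly all wall time on CPU; values well below that indicate time spent waiting.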
Bundled workload profiles:
- workloads/reasoning_heavy.json: few tool calls, more CPU planning + parsing
- workloads/action_heavy.json: frequent tool calls and higher orchestration overhead
- workloads/mixed.json: typical “agent” blend
You can add your own JSON workload; see workloads/schema.json.
Common patterns:
- High p95 with low cpu_util → coordination / waits dominate (tool calls, I/O, blocking)
- High cpu_util with rising p50/p95 → CPU saturated (planning/parsing dominates)
- Large parse_ms / json_kb → serialization overhead is a real tax; consider batching / binary codecs
- Tail latency (p95) grows as agents are added → lock contention + event loop overhead (agent fan-out)
See: docs/INTERPRETING_OUTPUT.md
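The single-run patterns above can be turned into a quick triage helper. The sketch below is illustrative only: `diagnose` is a hypothetical function, the thresholds are arbitrary assumptions rather than values from this project, and it assumes `parse_ms` is reported per step so it can be compared against `p50_ms`.

```python
def diagnose(
    row: dict,
    cpu_hi: float = 0.8,       # assumed cutoff for "high" cpu_util
    cpu_lo: float = 0.3,       # assumed cutoff for "low" cpu_util
    tail_ratio: float = 3.0,   # assumed p95/p50 ratio that counts as a heavy tail
    parse_share: float = 0.25, # assumed share of p50 that counts as a large parse cost
) -> list[str]:
    """Map one result row to the patterns it matches."""
    findings = []
    tail_heavy = row["p95_ms"] >= tail_ratio * row["p50_ms"]

    # High p95 with low cpu_util: coordination / waits dominate.
    if tail_heavy and row["cpu_util"] <= cpu_lo:
        findings.append("waits dominate")

    # High cpu_util: CPU saturated (planning / parsing dominates).
    if row["cpu_util"] >= cpu_hi:
        findings.append("cpu saturated")

    # Large parse_ms relative to step latency: serialization overhead.
    if row["parse_ms"] >= parse_share * row["p50_ms"]:
        findings.append("serialization tax")

    return findings


if __name__ == "__main__":
    print(diagnose({"p50_ms": 10, "p95_ms": 50, "cpu_util": 0.1, "parse_ms": 1}))
```

The fan-out pattern is left out on purpose: it needs a comparison across runs with different agent counts, not a single row.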
Repository layout:
- bench.py: one-command benchmark table
- run_bench.py: run a single workload with CLI flags
- agentic_bench/: simulator + metrics
- workloads/: workload profiles (JSON)
- tools/: optional helpers
- .github/workflows/test.yml: CI sanity run
MIT — see LICENSE.