Troubleshooting
Broker reachability
briefing failed against <url>: ECONNREFUSED
The runner couldn’t reach the broker. The most common cause is that the broker isn’t running; start it:
ac7 serve # in a separate terminal
If the broker is meant to live elsewhere, check --url and
$AC7_URL. The runner falls back to http://127.0.0.1:8717 when
neither is set.
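The fallback order above can be sketched as a small shell function (`resolve_url` is an illustrative name, not part of the CLI; the precedence flag > env > default is as documented):

```shell
# Mirrors the runner's URL fallback: --url flag, then $AC7_URL,
# then the built-in default. resolve_url is illustrative, not a real command.
resolve_url() {
  flag_url="$1"   # value of --url, or empty if the flag wasn't passed
  if [ -n "$flag_url" ]; then
    echo "$flag_url"
  elif [ -n "$AC7_URL" ]; then
    echo "$AC7_URL"
  else
    echo "http://127.0.0.1:8717"   # documented default
  fi
}

resolve_url ""   # prints the default when neither --url nor AC7_URL is set
```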
failed to reach broker at <url>: fetch failed
Network / DNS / TLS issue. Verify with:
curl <url>/healthz
For self-signed certs (LAN deployments) you’ll need to use the
broker’s HTTPS port and either trust the cert in your system
store or run with NODE_TLS_REJECT_UNAUTHORIZED=0 set in the CLI’s
own env (this is separate from the runner’s --unsafe-tls,
which targets the agent child).
--token or AC7_TOKEN is required
No bearer token resolved. Three sources, in order:
- --token <secret> flag
- $AC7_TOKEN env var
- Saved entry in ~/.config/ac7/auth.json for the resolved URL
Run ac7 connect to enroll the device, or pass the token
explicitly. If auth.json has an entry but the CLI doesn’t
find it, the URL is probably mismatched — the lookup is
exact-match (no trailing-slash normalization). Check what URL
you connected with versus what URL you’re using now.
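Because the lookup is a plain string comparison, even a trailing slash creates a distinct key. A quick way to convince yourself the two URLs differ:

```shell
# auth.json keys are compared verbatim — no trailing-slash normalization.
connected_with="http://127.0.0.1:8717"
using_now="http://127.0.0.1:8717/"
if [ "$connected_with" = "$using_now" ]; then
  echo "same entry"
else
  echo "different entries: re-run ac7 connect with the URL you actually use"
fi
```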
--doctor failures
[FAIL] claude binary — failed to locate claude binary
Either Claude Code isn’t installed, or it’s not on $PATH. Set
$CLAUDE_PATH to the absolute path:
export CLAUDE_PATH=/path/to/claude
For codex, set $CODEX_PATH instead. Both runners check the env
var first, then fall back to which.
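The lookup the doctor performs can be approximated in shell (`find_agent` is an illustrative name; `command -v` is the portable equivalent of `which`):

```shell
# Approximates the doctor's check: an explicit env var wins,
# otherwise search $PATH. find_agent is illustrative, not a real command.
find_agent() {
  env_path="$1"   # e.g. "$CLAUDE_PATH" or "$CODEX_PATH"
  bin_name="$2"   # e.g. "claude" or "codex"
  if [ -n "$env_path" ]; then
    echo "$env_path"
  else
    command -v "$bin_name"
  fi
}

find_agent "$CLAUDE_PATH" claude || echo "failed to locate claude binary"
```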
[FAIL] $TMPDIR writable
The runner writes its CA cert PEM to $TMPDIR at 0o600. Common
causes:
- $TMPDIR is set to a path that doesn’t exist
- Filesystem mounted read-only
- Permission issue on the directory
Check with:
echo "$TMPDIR"
ls -ld "$TMPDIR"
Typically just unset $TMPDIR (so it falls back to /tmp) or
point it at a writable location.
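The checks above can be combined into one probe. This sketch simulates the runner’s write (a throwaway file at mode 600) against the effective temp dir:

```shell
# Probe the same conditions the doctor checks: does the effective
# temp dir exist, and can we create a file in it?
tmp="${TMPDIR:-/tmp}"
probe="$tmp/.ac7-doctor-probe.$$"
if [ -d "$tmp" ] && touch "$probe" 2>/dev/null; then
  chmod 600 "$probe"
  rm -f "$probe"
  echo "ok: $tmp is writable"
else
  echo "not writable: $tmp (unset TMPDIR or point it elsewhere)"
fi
```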
[FAIL] loopback proxy bindable
The runner couldn’t listen() on 127.0.0.1:0. Sandboxed
environments without loopback networking trigger this; it’s
uncommon, but some restrictive CI runners do it.
If you can’t get loopback bind to work, you can run with
--no-trace to skip the proxy entirely. You lose trace capture
but everything else works.
[FAIL] trace CA + leaf cert generation
A node-forge / Node crypto runtime issue. Usually a Node version mismatch with native modules. Reinstall the CLI:
npm uninstall -g @agentc7/cli @agentc7/ac7
npm install -g @agentc7/ac7
Port conflicts
EADDRINUSE: address already in use 0.0.0.0:8717
Something else is already on port 8717. Either find and kill it, or change ports:
ac7 serve --port 8718
# or
AC7_PORT=8718 ac7 serve
Then point your runners and CLI invocations at the new port:
ac7 claude-code --url http://127.0.0.1:8718
# or
export AC7_URL=http://127.0.0.1:8718
.mcp.json problems
.mcp.json not restored after a crash
The runner restores on every exit path including
uncaughtException / unhandledRejection. If you find an
unrestored copy after a hard crash (kill -9, OOM, system reboot
mid-session), check the backup directory:
ls "${TMPDIR:-/tmp}"/*-mcp-*.json
Each backup is named <pid>-mcp-<nonce>.json. Pick the right
one (timestamp + cwd matching) and copy it back manually. If
multiple exist from different runs, the most recent is usually
the one you want.
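A sketch for picking out the newest backup by mtime (inspect it before restoring; the `cp` is deliberately left commented out):

```shell
# Newest backup by modification time. Verify its contents match
# your project before copying it back over .mcp.json.
backup_dir="${TMPDIR:-/tmp}"
latest=$(ls -t "$backup_dir"/*-mcp-*.json 2>/dev/null | head -n 1)
if [ -n "$latest" ]; then
  echo "candidate: $latest"
  # cp "$latest" .mcp.json   # uncomment once you've inspected it
else
  echo "no backups found in $backup_dir"
fi
```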
.mcp.json has a stale ac7 entry
Same scenario — a previous runner died without cleaning up.
Either replace it with the runner’s restore (see above), or
edit by hand to remove the ac7 server entry. Other entries
should be untouched.
Empty traces
Captured traces panel is empty for a completed objective
Three common causes:
- The agent used HTTP/2. We don’t parse HPACK yet, so HTTP/2 exchanges produce no llm_exchange events. The Anthropic SDK defaults to HTTP/1.1 for /v1/messages, so claude-code is usually fine — but a forced HTTP/2 binding will go uncaptured. Workaround: use --no-trace and rely on the agent’s own logging.
- The agent bypassed HTTPS_PROXY. Some wrapper scripts filter env vars; some agents have hard-coded proxy settings that override env. Verify by checking the runner’s session log — successful proxy traffic shows up as trace: llm_exchange queued lines.
- The agent pinned cert fingerprints. Our MITM leaf won’t match a pinned cert. Claude Code v2 doesn’t currently pin; if it does in the future you’ll see TLS handshake failures in the agent’s stderr. There’s no workaround at the runner layer.
Codex traces show as opaque_http instead of llm_exchange
This is expected today — the typed parser only recognizes
Anthropic’s /v1/messages shape. Adding an OpenAI parser is a
follow-up. Codex traces are still captured; they just render as
opaque records (host, method, URL, status, header / body
previews) instead of typed exchanges.
Activity DB grows unbounded
ac7 prune-traces --older-than 30d
Deletes activity rows older than the cutoff. Set up a daily cron:
0 3 * * * ac7 prune-traces --older-than 30d --yes
If the DB has already grown to many GB and you want to compact
the file after deleting, run a manual VACUUM on the activity
DB while the broker is offline:
sqlite3 ./ac7-activity.db 'VACUUM;'
WAL mode means online deletes don’t shrink the file; vacuum does, but it takes an exclusive lock.
KEK and encrypted fields
KekResolutionError: cannot decrypt totpSecret
The KEK file (<config>.kek) is missing or wrong, OR
$AC7_KEK is set to the wrong key. Two recovery options:
- Restore the original KEK file from backup if you have one. Without it, encrypted fields can’t be recovered.
- Reset the encrypted fields: regenerate the VAPID keypair (web push will need re-subscribing); re-enroll TOTP for every member (ac7 enroll --member <name> for each).
To avoid this in production, inject $AC7_KEK from a real
secrets manager rather than relying on the auto-generated file.
Lost the team config but still have the KEK
You can reconstruct the team from scratch (ac7 setup), but
you’ll need to re-issue every bearer token (the hashes are gone
with the config) and re-enroll every TOTP. The KEK on its own
isn’t useful without the encrypted ciphertexts.
TOTP rate limits
429 Too Many Requests on /session/totp
Per-member: 5 failures / 15 minutes. Global (codeless login, where the server iterates members to find a match): 10 failures / 15 minutes.
Wait the window out. If you’re locked out and need to bypass:
ac7 enroll --member <name>
re-enrolls the member with a fresh secret. The bearer token in the team config is the recovery capability — whoever can read the config can re-enroll.
Token rotation
Suspecting a token leak
ac7 rotate --member <name>
Revokes every active token for the named member and mints a fresh one. The new token is printed exactly once; save it immediately.
Every existing process holding the old token starts failing on
its next request. Re-enroll affected devices with ac7 connect.
--reveal-token printed something then everything broke
ac7 roster --reveal-token --member <name> is an alias over
ac7 rotate. It rotates as a side effect. Any process using the
previous token (CI runners, scripts) is now broken — they need
the new value. The CLI print is your only chance; if you missed
it, run rotate again.
ac7 setup refuses to run
setup: a config already exists at ./ac7.json
Setup refuses to overwrite — running it would invalidate every existing token. Two options:
- You really want to reset: rm ac7.json and run setup again. Every existing bearer token is invalidated; every TOTP enrollment is gone; you’ll re-onboard everyone.
- You meant to add a member: use ac7 member create instead.
Codex-specific
ac7 codex: no codex auth.json found in ~/.codex
Codex isn’t logged in. Run:
codex login
The runner symlinks ~/.codex/auth.json into the ephemeral
CODEX_HOME — it needs the real file to exist.
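A quick presence check before launching (`check_codex_auth` is an illustrative name, not an ac7 command):

```shell
# Does the real codex credential file exist where the runner expects it?
check_codex_auth() {
  if [ -f "$HOME/.codex/auth.json" ]; then
    echo "codex auth present"
  else
    echo "missing ~/.codex/auth.json; run: codex login"
  fi
}

check_codex_auth
```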
Refusing to create helper binaries under temporary dir
This warning would fire if the runner’s CODEX_HOME parent dir
were under $TMPDIR. We use ~/.cache/agentc7/codex/ instead
specifically to avoid this — if you’re seeing it anyway, check
$XDG_CACHE_HOME and make sure it doesn’t point at a tmpfs.
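To see where the helper dir will land, resolve the XDG fallback yourself and flag it if it ends up under the temp dir (the `~/.cache` fallback is the standard XDG convention, assumed here):

```shell
# Resolve the cache path the way XDG tools do, then warn if it
# lands under the effective temp dir.
cache="${XDG_CACHE_HOME:-$HOME/.cache}/agentc7/codex"
case "$cache" in
  "${TMPDIR:-/tmp}"/*) echo "warning: $cache resolves under the temp dir" ;;
  *)                   echo "ok: $cache" ;;
esac
```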
Runner / IPC
ac7 mcp-bridge: AC7_RUNNER_SOCKET is required
You ran ac7 mcp-bridge directly. It isn’t meant to be invoked
manually; the runner writes the bridge invocation into the agent’s
MCP config (.mcp.json or CODEX_HOME/config.toml) with
AC7_RUNNER_SOCKET already set in its env.
Multiple runners stepping on each other
Each runner binds a unique socket at
/tmp/.ac7-runner-<pid>.sock. They don’t interfere by path.
The single-bridge constraint is per-runner: if two MCP bridges
connect to the same runner, the older one is dropped.
To run multiple agents simultaneously, run multiple runner processes in different terminals.
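Since the pid is embedded in the socket filename, leftover sockets from dead runners are easy to spot (a sketch; note `kill -0` only tests whether the pid exists, not that it’s still an ac7 runner):

```shell
# Classify runner sockets by whether their owning pid is still alive.
for sock in /tmp/.ac7-runner-*.sock; do
  [ -e "$sock" ] || continue          # glob matched nothing
  pid=${sock#/tmp/.ac7-runner-}       # strip prefix ...
  pid=${pid%.sock}                    # ... and suffix, leaving the pid
  if kill -0 "$pid" 2>/dev/null; then
    echo "live:  $sock"
  else
    echo "stale: $sock (safe to remove)"
  fi
done
```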
Where to look for more detail
- Session log: ~/.cache/agentc7/session-<component>-<pid>.log — structured JSON per event. tail -f it during a run.
- Server log: stderr of ac7 serve. Structured JSON; redirect to a file in production.
- --doctor: covers the most common environmental failures before they bite you.
If a problem isn’t covered here, the logs almost always have enough to pin it down — the runner deliberately over-logs at state transitions because the alternative is unhelpful “agent just stopped working” reports.