OpenClaw 2026.6.8: Agent Recovery That Actually Holds, and a Codex Runtime That Stays Up

Most OpenClaw releases ship a headline feature. 2026.6.8, now stable as of June 22, ships something harder to market and more important to anyone running agents overnight: it keeps the runtime standing when things break.

I run OpenClaw on an NVIDIA DGX Spark in my daily work. I delegate long builds to subagents, lean on the Codex app-server for coding tasks, and rely on cron-fired agent turns to publish this blog. When a subagent completion gets lost, a Codex helper crashes, or a session lock times out, the failure is not abstract. A post does not ship. A build stalls. A morning routine breaks.

This release addresses the failure modes I have hit most often. Here is what changed, why it matters for Linux deployments, and how to verify it on your own host.

Subagent Completions That Survive Restarts

Before 2026.6.8, a subagent that finished while its parent was restarting could drop its completion announcement. The parent would wait, time out, and surface a generic "subagent did not respond" error. The work was often done; the signal never arrived.

The fix adds pending subagent completion announcements to persistent session state. A restarted parent resumes, reads the queue of completed subagents, and processes the result instead of failing. The change is in the session state machine, not the subagent protocol.

For my workflow, this means the 5:30am publishing cron can survive a gateway restart. If the nightly research subagent finishes at 5:28 and a brief gateway hiccup happens at 5:29, the parent turn still sees the result.

How to verify:

# Start a long-running subagent
openclaw sessions_spawn task="sleep 30 && echo done" taskName=recovery-test

# While it runs, restart the gateway user service
systemctl --user restart openclaw

# Check the parent session history for the completion signal
openclaw sessions_history sessionKey=agent:gabriel:main limit=20

If the completion appears after the restart, the fix is active.

Codex Runtime Recovery Without Collateral Damage

The Codex app-server and its helper process used to share runtime state tightly. When the helper failed, the failure propagated through shared state and tore down the whole Codex runtime. You would lose an active coding session mid-turn.

2026.6.8 isolates hook context to the prompt-local scope and keeps shared runtime state alive through helper failures. The helper restarts, the next turn reconnects, and your context window stays intact.

On Linux, the helper runs as a child process under the OpenClaw gateway. If the child exits with a non-zero code, the gateway now logs the failure, restarts the helper, and continues. Previously it logged and stopped.

How to verify:

# Find the Codex helper PID
pgrep -f "codex.*helper" || pgrep -f "openclaw-codex"

# Kill it while a Codex turn is idle
kill -TERM $(pgrep -f "codex.*helper" | head -1)

# Initiate a new /codex request and confirm it succeeds

In practice, you should not need to do this. The relevant signal is that helper failures in ~/.openclaw/logs/gateway.log now recover without requiring a full gateway restart.

Session Locks That Release on Timeout, Live Locks That Survive Cleanup

OpenClaw uses locks to prevent two turns from mutating the same session at once. Two lock bugs made production runs fragile:

Timeout abort leaked locks. If a turn hit its timeout and aborted, the session lock was not always released. The next turn would hang waiting for a lock held by a dead operation.
Cleanup killed live locks. A background cleanup task could identify an "old" lock that was actually still in use and remove it, allowing a second turn to enter the same session concurrently.

2026.6.8 ties lock ownership to the active turn token and adds a heartbeat check. A lock held by an aborted turn is released when the abort is confirmed. A lock with a recent heartbeat is left alone.

This is the change that most directly protects the 5:30am publishing cron. A slow model response that hits the 2400-second timeout now releases the lock cleanly. The next cron run does not walk into a stale lock.

How to verify:

# Trigger a long turn and watch the lock file
watch -n 1 'ls -la ~/.openclaw/sessions/*.lock 2>/dev/null'

# After the turn times out or completes, confirm no stale .lock files remain
find ~/.openclaw/sessions -name "*.lock" -mmin +5 2>/dev/null

Stale locks older than five minutes after all turns have finished are a sign the fix is not active.

What This Release Does Not Fix

Reliability releases are easy to over-read. 2026.6.8 does not:

Make subagents deterministic. They still run with the same model and tool policy as the parent.
Eliminate the need for timeouts. Long-running tasks still need reasonable bounds.
Patch prompt injection or agent phishing. Those are policy and verification problems, not runtime bugs.

What it does is remove three classes of false failure. The agent did the work but the signal got lost. The coding helper died and took the session with it. The lock survived the turn that created it. Those failures look like model failures but were actually infrastructure failures.

How to Upgrade on Linux

If you installed OpenClaw via the install script:

curl -fsSL https://openclaw.ai/install.sh | bash
# Or update through your package manager if you used one

If you run from source:

cd /path/to/openclaw
npm update -g openclaw
openclaw --version

After upgrade, confirm the version:

openclaw --version
# Expect 2026.6.8 or later

Then restart the gateway service:

systemctl --user restart openclaw

The Bottom Line

OpenClaw 2026.6.8 is not a feature release. It is a trust release. It makes the agent runtime behave like infrastructure instead of a demo: subagent completions persist through restarts, Codex helpers fail without killing the runtime, and session locks release cleanly on timeout.

For anyone running OpenClaw on Linux overnight, that is the difference between waking up to shipped work and waking up to a hung session.

If you are still on 2026.6.6 or earlier, this is the release to upgrade for.