Beyond Session JSON: Durable State and Runtime Design for Long-Running AI Coding Agents

Chaitanya Mishra

Abstract

Long-running AI coding agents are better modeled as stateful workflows than as chat sessions. They plan, invoke tools, wait for humans and external services, retry after failure, and resume across hours or days. Once an agent behaves this way, persistence stops being a serialization problem and becomes a systems problem.

The central claim of this paper is that the key design object is the recoverable state envelope: the minimum durable state required to resume execution safely without duplicating externally visible effects. From that perspective, plain JSON remains useful for interchange, export, and bounded snapshots, but it is usually too weak to serve as the primary system of record once concurrency, crash recovery, schema evolution, queryable history, and audit become first-class concerns.

The paper develops a four-plane model of agent state (conversational, execution, memory, and recovery) with audit treated as a cross-cutting concern, and uses that model to compare plain JSON files, SQLite, and PostgreSQL. SQLite is often the best default for local or single-node agents because it provides transactions, write-ahead logging, constraints, and queryability without the operational burden of a client/server database. PostgreSQL becomes the stronger choice when state is shared among multiple writers, accessed concurrently, subject to audit, or enterprise-managed.

The paper then compares Elixir, Erlang, Ruby, Python, TypeScript/Node.js, Go, Rust, and Java as control-plane runtimes. BEAM-based systems have real operational advantages because supervision, isolation, and failure handling are runtime primitives, but those advantages do not eliminate the need for sound durable-state design.

Keywords

AI coding agents, durable execution, SQLite, PostgreSQL, BEAM, Elixir, Erlang, JSON, long-running workflows, agent infrastructure