CRDTs Are Not Enough When Your Coworker Is an AI Agent
by
Kadhir Mani
(6.5 minutes)
<section data-section-id='5969f96e-e470-4a5c-8f27-ddc8d6ea1de4'><h2 id='5969f96e-e470-4a5c-8f27-ddc8d6ea1de4'>Problem</h2><p><b><strong style="white-space: pre-wrap;">CRDTs converge; they do not coordinate.</strong></b><span style="white-space: pre-wrap;"> That distinction becomes critical the moment an AI agent joins the editor.</span></p><p><br></p><p><span style="white-space: pre-wrap;">We hit this while building a multiplayer editor where humans and agents could both modify the same document. A CRDT guarantees that replicas eventually reach the same state, but it says nothing about whether the agent should write right now while a human is active in the same section, or where its output lands when its context is stale.</span></p><p><br></p><p><span style="white-space: pre-wrap;">Human-only multiplayer assumes every keystroke reflects fresh, local intent. An agent breaks that assumption: it operates on a snapshot taken before inference begins, may rewrite large spans at once, and has no inherent presence awareness.</span></p><p><br></p></section> <section data-section-id='d4169e3e-a334-4db4-9e84-735750da7315'><h2 id='d4169e3e-a334-4db4-9e84-735750da7315'>Why the Naive Solution Fails</h2><p><span style="white-space: pre-wrap;">The naive solution is to treat the agent as just another peer on the CRDT graph, a faster user. Let the CRDT merge concurrent edits and converge on a consistent state. Problem solved.</span></p><p><br></p><p><span style="white-space: pre-wrap;">It turned out not to be that simple. CRDTs guarantee convergence; they do not guarantee that an edit should have been made in the first place. In practice, agents repeatedly overwrote sections that humans were actively editing. 
With several people in the same document, each with their own agent running concurrently, the collisions compounded.</span></p><p><br></p><p><span style="white-space: pre-wrap;">Agents exposed the gap in three specific ways:</span></p><ul><li value="1"><b><strong style="white-space: pre-wrap;">Stale context.</strong></b><span style="white-space: pre-wrap;"> The agent reads at T₀, infers, then writes at T₁. Human edits made between those moments are invisible to its prompt. The CRDT merges output that was reasoned against a state that no longer exists.</span></li><li value="2"><b><strong style="white-space: pre-wrap;">Large-span rewrites.</strong></b><span style="white-space: pre-wrap;"> Humans edit words; agents rewrite paragraphs. A wider edit span raises collision probability and can silently invalidate comment anchors and structured blocks.</span></li><li value="3"><b><strong style="white-space: pre-wrap;">No presence awareness.</strong></b><span style="white-space: pre-wrap;"> Human collaborators spot a cursor in a section and adjust their own editing accordingly. Agents have no equivalent signal. Without it, an agent writes into an actively edited section, and the CRDT faithfully merges the result.</span></li></ul><p><br></p></section> <section data-section-id='fd135455-2113-40b4-9040-4f52fac64bdd'><h2 id='fd135455-2113-40b4-9040-4f52fac64bdd'>Practical Solution Shape</h2><p><span style="white-space: pre-wrap;">We went down a different route. 
We added a </span><b><strong style="white-space: pre-wrap;">coordination layer</strong></b><span style="white-space: pre-wrap;"> that gates every write by a non-human participant to shared state and persistence.</span></p><p><br></p><mermaid data-height="400" data-card="true">
flowchart TD
H[Human Editor] --&gt; C["Coordination Layer (Presence, Locks, Approvals)"]
A[AI Agent] --&gt; C
R[Reviewer] --&gt; C
C --&gt; S[Collaborative Document State]
S --&gt; P[Durable Persistence]
</mermaid><p><br></p></section> <section data-section-id='cec9faae-06bf-417f-9c1a-6efcace8d73c'><p><span style="white-space: pre-wrap;">The coordination layer sits above CRDT convergence and answers three questions before the agent writes:</span></p><ul><li value="1"><span style="white-space: pre-wrap;">Is the target section free?</span></li><li value="2"><span style="white-space: pre-wrap;">Is a human actively focused there?</span></li><li value="3"><span style="white-space: pre-wrap;">Does the agent have a current snapshot of document state?</span></li></ul><p><br></p><p><span style="white-space: pre-wrap;">Once cleared, the agent takes a </span><b><strong style="white-space: pre-wrap;">narrow, expiring section-level lease</strong></b><span style="white-space: pre-wrap;">, not a whole-document lock, acquired against a stable snapshot. While the agent holds that lease, humans cannot edit the section, which avoids unpredictable collisions. A human can, however, override the lease and kick the agent out at any time.</span></p><p><br></p><p><span style="white-space: pre-wrap;">This system allows many humans and agents to edit the shared canvas with generally fewer conflicts. 
Edits to different sections can also proceed in parallel.</span></p><p><br></p><p><span style="white-space: pre-wrap;">Four invariants keep the system safe:</span></p><ul><li value="1"><b><strong style="white-space: pre-wrap;">No stale writes</strong></b><span style="white-space: pre-wrap;">: the section version is checked at lease time; a mismatch forces a re-read.</span></li><li value="2"><b><strong style="white-space: pre-wrap;">No unbounded locks</strong></b><span style="white-space: pre-wrap;">: leases carry a TTL and auto-expire on crash or stall.</span></li><li value="3"><b><strong style="white-space: pre-wrap;">Approval on conflict</strong></b><span style="white-space: pre-wrap;">: if a human is actively in the target section, the agent’s write is gated on an explicit approval signal before the lease is granted.</span></li><li value="4"><b><strong style="white-space: pre-wrap;">No live-state/storage coupling</strong></b><span style="white-space: pre-wrap;">: reconnect reconciles against server state, not a local event replay.</span></li></ul><p><br></p><p><span style="white-space: pre-wrap;">The trade-off is added latency and protocol surface area in exchange for safety and a more predictable user experience. TTLs must be calibrated to inference latency: too short and a lease expires while the agent is still working; too long and a stalled agent blocks the section. 
Large rewrites that span anchored comments need a coherence pass after publishing; lease scope alone does not protect anchor references.</span></p><p><br></p><mermaid data-height="380" data-card="true">
flowchart LR
N1(Request Edit) --&gt; N2(Check Presence/Focus)
N2 --&gt; N3{Human Active?}
N3 --&gt;|Yes| N4(Request Approval)
N4 --&gt;|Approved| N5(Acquire Expiring Lease)
N3 --&gt;|No| N5
N5 --&gt; N6(Edit Stable Snapshot)
N6 --&gt; N7(Publish Collaborative Event)
N7 --&gt; N8(Release Lease)
classDef p fill:#3E63DD,stroke:#263c85,color:#fff
classDef d fill:#F59E0B,stroke:#b47408,color:#fff
class N1,N2,N4,N5,N6,N7,N8 p
class N3 d
</mermaid></section> <section data-section-id='fed7b334-5bc1-4088-8ba9-8adcfb6bfbe2'><h2 id='fed7b334-5bc1-4088-8ba9-8adcfb6bfbe2'>A Small Protocol</h2><p><span style="white-space: pre-wrap;">A minimal protocol for agent edits can be sketched without reference to any specific framework. The core steps are always the same:</span></p><p><br></p><pre spellcheck="false" data-language="python" data-highlight-language="python"><span style="white-space: pre-wrap;">def </span><span style="white-space: pre-wrap;">agent_edit</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">,</span><span style="white-space: pre-wrap;"> base_version</span><span style="white-space: pre-wrap;">)</span><span style="white-space: pre-wrap;">:</span><span style="white-space: pre-wrap;">  # helper functions are placeholders, not a real API</span><br><span style="white-space: pre-wrap;">    state </span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;"> </span><span style="white-space: pre-wrap;">read_presence</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">)</span><br><span style="white-space: pre-wrap;">    </span><span style="white-space: pre-wrap;">if</span><span style="white-space: pre-wrap;"> state</span><span style="white-space: pre-wrap;">.</span><span style="white-space: pre-wrap;">human_active</span><span style="white-space: pre-wrap;">:</span><br><span style="white-space: pre-wrap;">        approval </span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;"> </span><span style="white-space: pre-wrap;">request_approval</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">,</span><span style="white-space: pre-wrap;"> ttl</span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;">45</span><span style="white-space: pre-wrap;">)</span><span style="white-space: pre-wrap;">  # seconds</span><br><span style="white-space: pre-wrap;">        </span><span style="white-space: pre-wrap;">if</span><span style="white-space: pre-wrap;"> not approval</span><span style="white-space: pre-wrap;">.</span><span style="white-space: pre-wrap;">granted</span><span style="white-space: pre-wrap;">:</span><br><span style="white-space: pre-wrap;">            </span><span style="white-space: pre-wrap;">return</span><span style="white-space: pre-wrap;"> "conflict"</span><br><br><span style="white-space: pre-wrap;">    lease </span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;"> </span><span style="white-space: pre-wrap;">acquire_lease</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">,</span><span style="white-space: pre-wrap;"> base_version</span><span style="white-space: pre-wrap;">,</span><span style="white-space: pre-wrap;"> ttl</span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;">90</span><span style="white-space: pre-wrap;">)</span><span style="white-space: pre-wrap;">  # seconds; fails on a version mismatch</span><br><span style="white-space: pre-wrap;">    </span><span style="white-space: pre-wrap;">if</span><span style="white-space: pre-wrap;"> not lease</span><span style="white-space: pre-wrap;">.</span><span style="white-space: pre-wrap;">ok</span><span style="white-space: pre-wrap;">:</span><br><span style="white-space: pre-wrap;">        </span><span style="white-space: pre-wrap;">return</span><span style="white-space: pre-wrap;"> "retry_with_fresh_snapshot"</span><br><br><span style="white-space: pre-wrap;">    snapshot </span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;"> </span><span style="white-space: pre-wrap;">read_section</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">)</span><br><span style="white-space: pre-wrap;">    update </span><span style="white-space: pre-wrap;">=</span><span style="white-space: pre-wrap;"> </span><span style="white-space: pre-wrap;">generate_edit</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">snapshot</span><span style="white-space: pre-wrap;">)</span><br><span style="white-space: pre-wrap;">    </span><span style="white-space: pre-wrap;">publish_collaborative_update</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">,</span><span style="white-space: pre-wrap;"> update</span><span style="white-space: pre-wrap;">)</span><br><span style="white-space: pre-wrap;">    </span><span style="white-space: pre-wrap;">release_lease</span><span style="white-space: pre-wrap;">(</span><span style="white-space: pre-wrap;">section_id</span><span style="white-space: pre-wrap;">)</span></pre><p><br></p><p><span style="white-space: pre-wrap;">The critical distinction is what this protocol is </span><i><em style="white-space: pre-wrap;">not</em></i><span style="white-space: pre-wrap;"> doing. The CRDT handles whether two updates can be merged into a consistent state; it resolves the mathematical question of convergence. 
This protocol handles a prior question: should the agent be allowed to produce an update at all, given what is currently happening in the document? Both layers are necessary. Removing the coordination protocol and relying on CRDT convergence alone means the agent will write, the CRDT will merge, and the result will be technically consistent but can be semantically wrong.</span></p><p><br></p><p><span style="white-space: pre-wrap;">The lease TTL is the main tuning surface. Inference latency varies widely across model calls; a TTL that works for a short edit may frequently expire on a longer rewrite. The safer approach is a short initial TTL with a single extension path, rather than a long default that holds the section unnecessarily.</span></p><p><br></p><p><i><em style="white-space: pre-wrap;">Quick callout about line 13 above, the generate_edit call.</em></i></p><p><span style="white-space: pre-wrap;">The agent edits that snapshot and then publishes the changes as ordinary collaborative events. The trickiest part of this publish step was expressing the agent's edits as CRDT events in a purely backend operation, especially given all the complex nodes we want to support.</span></p><p><br></p><p><span style="white-space: pre-wrap;">It was easy enough to get text with some light formatting working, but it became a real headache when the same system needed to support mermaid diagrams, XYZ flow charts, feedback nodes, and so much more. The system and tooling we built to make this work became quite complex over time. 
We'll make a separate post on that topic later.</span></p><p><br></p></section> <section data-section-id='fba2ac2e-c8fb-436f-a771-04aa37726ecd'><h2 id='fba2ac2e-c8fb-436f-a771-04aa37726ecd'>Edge Cases</h2><p><span style="white-space: pre-wrap;">Several failure modes require explicit handling beyond the happy path.</span></p><p><br></p><ul><li value="1"><b><strong style="white-space: pre-wrap;">Stale read, then human edit, then agent write.</strong></b><span style="white-space: pre-wrap;"> The agent reads at T₀, a human edits at T₁, and the agent writes at T₂ with a prompt grounded in T₀. The version check at lease acquisition catches this if the human’s edit incremented the section version. If it did not (e.g., the edit was to a different field), the race may still produce incoherent output. Version granularity should match the edit granularity.</span></li><li value="2"><b><strong style="white-space: pre-wrap;">Orphaned lease after model timeout or crash.</strong></b><span style="white-space: pre-wrap;"> If the agent process dies mid-inference, the lease must self-expire via TTL. Without automatic expiry, a crashed agent holds the section indefinitely. Monitor lease age and alert on leases approaching the TTL ceiling.</span></li><li value="3"><b><strong style="white-space: pre-wrap;">Reconnect storms after offline editing.</strong></b><span style="white-space: pre-wrap;"> When a client reconnects after an extended offline period, replaying a local event queue against a diverged server state can produce a burst of conflicting updates. Reconcile against the server snapshot at reconnect time rather than replaying local events in order.</span></li><li value="4"><b><strong style="white-space: pre-wrap;">Multiple tabs for the same user.</strong></b><span style="white-space: pre-wrap;"> Presence tracked per-session rather than per-user can report the same human as active in multiple sections simultaneously. 
The coordination layer must deduplicate presence by user identity, not by connection, or approval requests will fire when the “other” editor is the same person in a different tab.</span></li></ul><p><br></p></section> <section data-section-id='961f552d-cea7-419c-808a-284fba9ad566'><h2 id='961f552d-cea7-419c-808a-284fba9ad566'>Lessons Learned</h2><ul><li value="1"><b><strong style="white-space: pre-wrap;">Convergence ≠ coordination.</strong></b><span style="white-space: pre-wrap;"> CRDTs guarantee merge; they don’t decide whether an agent should write. That layer must be built separately.</span></li><li value="2"><b><strong style="white-space: pre-wrap;">Test stale-context and reconnect paths first.</strong></b><span style="white-space: pre-wrap;"> They are the most likely failure modes and the hardest to retrofit.</span></li><li value="3"><b><strong style="white-space: pre-wrap;">Prefer expiring section-level soft locks.</strong></b><span style="white-space: pre-wrap;"> Narrow scope, automatic expiry, and an approval fallback — not whole-document hard locks.</span></li><li value="4"><b><strong style="white-space: pre-wrap;">Make presence and transport state explicit.</strong></b><span style="white-space: pre-wrap;"> Agents must know when they are offline or behind; silence is not a safe assumption.</span></li><li value="5"><b><strong style="white-space: pre-wrap;">Agent writes are multiplayer events, not background jobs.</strong></b><span style="white-space: pre-wrap;"> Same presence broadcast, same lock protocol, same conflict surface.</span></li></ul><p><br></p><p><span style="white-space: pre-wrap;">The hard part of AI document editing is not making the model write text. It is making the model behave inside a multiplayer system where humans are already thinking, typing, reviewing, disconnecting, reconnecting, and changing their minds. CRDTs are still necessary. They are just not enough.</span></p><p><br></p></section>
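<section><p><span style="white-space: pre-wrap;">As a companion to the protocol and invariants above, here is a minimal Python sketch of the lease bookkeeping a coordination layer might keep. It is not the implementation described in this post: the SectionLeases class, its method names, and the in-memory dictionaries are all hypothetical, chosen only to illustrate the version check at lease time, TTL expiry, and the human override.</span></p>

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    holder: str        # id of the agent holding the section
    expires_at: float  # clock value after which the lease is void


class SectionLeases:
    """Hypothetical in-memory lease table (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                   # injectable so tests can control time
        self._versions: Dict[str, int] = {}   # section_id -> current version
        self._leases: Dict[str, Lease] = {}   # section_id -> active lease

    def version(self, section_id: str) -> int:
        return self._versions.get(section_id, 0)

    def bump_version(self, section_id: str) -> None:
        """Record that the section changed (any human or agent edit)."""
        self._versions[section_id] = self.version(section_id) + 1

    def _active(self, section_id: str) -> Optional[Lease]:
        lease = self._leases.get(section_id)
        if lease is not None and lease.expires_at <= self._clock():
            del self._leases[section_id]  # TTL elapsed: no unbounded locks
            return None
        return lease

    def acquire(self, section_id: str, holder: str,
                base_version: int, ttl: float) -> bool:
        if base_version != self.version(section_id):
            return False  # stale snapshot: caller must re-read first
        if self._active(section_id) is not None:
            return False  # another lease is still live
        self._leases[section_id] = Lease(holder, self._clock() + ttl)
        return True

    def release(self, section_id: str, holder: str) -> None:
        lease = self._leases.get(section_id)
        if lease is not None and lease.holder == holder:
            del self._leases[section_id]  # only the holder may release

    def human_override(self, section_id: str) -> None:
        """A human can evict whichever agent holds the section."""
        self._leases.pop(section_id, None)
```

<p><span style="white-space: pre-wrap;">Injecting the clock keeps expiry testable without sleeping. A real deployment would presumably keep this state in a store shared by every server rather than in process memory, and would call bump_version whenever a section actually changes.</span></p></section>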