Symphony: OpenAI Ships a Spec, Not a Library — and Tells AI to Build It
Created: 2026-03-16
TL;DR
OpenAI released Symphony, an open-source specification for a long-running daemon that reads issues from Linear, creates isolated workspaces, and runs coding agents autonomously until the work is done. The architecture is deliberately simple: in-memory state, no database, tracker-driven recovery. But the real story is the distribution model - OpenAI ships a detailed spec and tells you to "have your favorite coding agent build it." They are dogfooding the exact workflow Symphony enables.
What Symphony Actually Does
Symphony is a scheduler/runner that turns an issue tracker into an autonomous coding pipeline. It polls Linear for eligible issues, claims them, creates per-issue workspace directories, and launches a coding agent (Codex) as a subprocess. The agent works the issue, and Symphony handles retries, stall detection, reconciliation, and cleanup.
The key constraint: Symphony only reads from the tracker. All mutations (state transitions, comments, PR creation) are performed by the coding agent itself. Symphony is the orchestration layer, not the execution layer.
This is the natural evolution of what we described in The Evolution of Continuous Delivery: Embracing Agentic Workflows - except Symphony makes the agent loop a first-class daemon rather than a CI step.
WORKFLOW.md: Config and Prompt in One File
The most interesting design decision is WORKFLOW.md: a single version-controlled Markdown file that combines YAML frontmatter (runtime configuration) with a Markdown body (the agent prompt template).
The YAML frontmatter configures:
| Key | What It Controls |
|---|---|
| `tracker` | Linear team/project, state mappings |
| `polling` | Interval, batch size |
| `workspace` | Root directory, lifecycle hooks |
| `agent` | Max turns, concurrency limits, stall timeout |
| `codex` | Command, approval mode, sandbox settings |
The Markdown body is a Liquid-compatible template rendered per-issue with issue and attempt variables. This means the agent's instructions are version-controlled alongside the code it modifies, so changes to workflow policy go through the same PR review as code changes.
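To make the shape concrete, here is a hypothetical WORKFLOW.md sketch. The top-level keys (`tracker`, `polling`, `workspace`, `agent`) come from the spec's table above, but the nested key names, state names, and hook paths are illustrative, and the `issue`/`attempt` template variables are used with made-up field names:

```markdown
---
tracker:
  team: ENG                      # hypothetical sub-keys
  states:
    todo: "Todo"
    done: "Done"
polling:
  interval_seconds: 30
workspace:
  root: ./workspaces
  before_run: ./hooks/reset.sh   # lifecycle hook; fully trusted per the spec
agent:
  max_concurrent_agents: 10
---
You are working on issue {{ issue.identifier }}: {{ issue.title }}.
This is attempt {{ attempt }}. Open a PR when the work is complete.
```

Because the prompt body lives below the frontmatter in the same file, a change to agent instructions and a change to, say, the polling interval land in the same reviewable diff.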
Dynamic reload is mandatory: Symphony watches for changes to WORKFLOW.md and re-applies config and prompt without restart. Invalid reloads keep the last known good config, so a bad push doesn't crash the daemon.
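A minimal sketch of the reload behavior, assuming nothing about the reference implementation: parse the file into frontmatter and body, and fall back to the last known good config when parsing fails. Real frontmatter parsing would use a YAML library; here it is stubbed with simple line splitting.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    config: dict
    prompt: str


def parse_workflow(text: str) -> Workflow:
    """Split WORKFLOW.md into YAML frontmatter and Markdown body.

    Frontmatter parsing is stubbed (each "key: value" line becomes a
    config entry); a real implementation would use a YAML parser.
    """
    parts = text.split("---\n")
    if len(parts) < 3 or not parts[1].strip():
        raise ValueError("missing or empty frontmatter")
    config = dict(
        line.split(":", 1) for line in parts[1].splitlines() if ":" in line
    )
    return Workflow(config=config, prompt="---\n".join(parts[2:]).strip())


def reload_workflow(text: str, last_good: Workflow) -> Workflow:
    """On file change, re-parse; keep the last known good config on failure."""
    try:
        return parse_workflow(text)
    except ValueError:
        return last_good
```

The important property is in `reload_workflow`: a bad push degrades to "keep running with yesterday's config" rather than a daemon crash.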
The Orchestrator State Machine
Symphony's orchestrator is deliberately simple: all state lives in memory.
Each poll tick: reconcile active runs, validate config, fetch candidates, sort by priority then age, dispatch eligible issues. On normal worker exit, Symphony schedules a 1-second continuation retry to re-check if the issue still needs work. Abnormal exits use exponential backoff (10s x 2^(attempt-1), capped at 5 minutes).
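The retry schedule described above is small enough to state directly. A sketch of that schedule (parameter names are mine, not the spec's):

```python
def retry_delay_seconds(attempt: int, abnormal: bool) -> float:
    """Delay before re-dispatching an issue after a worker exits.

    Normal exit: fixed 1s continuation retry to re-check tracker state.
    Abnormal exit: exponential backoff 10s * 2^(attempt - 1), capped at 5 min.
    """
    if not abnormal:
        return 1.0
    return min(10.0 * 2 ** (attempt - 1), 300.0)
```

So an issue that keeps crashing its worker waits 10s, 20s, 40s, ... and settles at 5 minutes, while a clean exit is re-checked almost immediately.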
Concurrency is controlled at two levels: a global cap (max_concurrent_agents, default 10) and per-state limits (max_concurrent_agents_by_state). Issues in Todo state with non-terminal blockers are skipped.
No Database, Tracker-Driven Recovery
There is no persistent state store. After a restart, Symphony:
- Cleans up stale workspace directories for terminal issues
- Polls the tracker for active issues
- Re-dispatches eligible work
This trades durability for simplicity. The issue tracker is the source of truth. If Symphony crashes, no state is lost because the tracker already has it. This is a pragmatic choice: for a system that manages coding agents working on tracked issues, the tracker already provides the persistence layer.
The "Spec as Product" Distribution Model
This is where Symphony gets interesting beyond its architecture. The README says:
> Tell your favorite coding agent to build Symphony.
OpenAI ships a detailed specification (SPEC.md) and an Elixir reference implementation, but the spec is the actual product. The implementation is just proof that the spec works. The intended distribution path is: read the spec, feed it to a coding agent, get a working implementation in your language of choice.
This is dogfooding at its most recursive. Symphony is a system for running coding agents on issues. The first issue you give it is: "build Symphony from this spec." If that works, the system has validated itself.
How Symphony Compares to Other Orchestration Approaches
Symphony isn't the only way to run coding agents. Here's how the main approaches stack up:
| | Symphony (Daemon) | CI-Triggered (e.g., GitHub Actions) | Agent Frameworks (CrewAI, AutoGen) |
|---|---|---|---|
| Trigger | Polls tracker on interval | Webhook/event | Programmatic API call |
| State management | In-memory, tracker-driven | Stateless per run | In-memory or custom |
| Multi-turn on same issue | Native (loop with re-check) | Requires external state | Native |
| Failure recovery | Exponential backoff + reconciliation | Re-trigger manually or on next event | Framework-dependent |
| Concurrency control | Built-in (global + per-state caps) | Runner pool limits | Manual |
| Isolation | Per-issue workspace dirs | Per-run container | None by default |
| Stall detection | Built-in (kills after timeout) | Job timeout only | None by default |
The daemon model's advantage is continuity. CI-triggered agents are fire-and-forget - if the agent partially solves an issue, there is no built-in mechanism to resume where it left off. Symphony's retry loop with tracker state re-checking handles partial progress natively. Agent frameworks like CrewAI give you multi-agent composition but leave scheduling, isolation, and lifecycle management as your problem.
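The "retry loop with tracker state re-checking" can be sketched in a few lines. This is my simplification, not the spec's control flow: `fetch_state` and `run_agent` are caller-supplied stubs for "read the issue from the tracker" and "run one agent session", and state names are illustrative.

```python
TERMINAL_STATES = {"Done", "Canceled"}  # illustrative state names


def work_issue(fetch_state, run_agent, max_attempts: int = 5) -> str:
    """Continuation loop: after each agent exit, re-check the tracker
    and only re-dispatch if the issue still needs work.

    All mutations happen inside run_agent (the agent writes to the
    tracker itself); this loop only reads.
    """
    for _attempt in range(1, max_attempts + 1):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state        # tracker says we're done; nothing to do
        run_agent()             # one agent session against the workspace
        # (a real daemon would wait ~1s here before re-checking)
    return fetch_state()
```

Partial progress is handled for free: if the agent moved the issue to a terminal state, the next re-check exits; if not, a fresh session picks the issue up again.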
The tradeoff: Symphony is tightly coupled to a specific tracker (Linear) and agent (Codex). The frameworks are more flexible but require you to build everything Symphony gives you for free.
The Multi-Turn Reliability Problem
Symphony's retry loop assumes that given enough turns, the agent will converge on a solution. Recent research challenges this assumption. A Microsoft study testing 15 LLMs across 200,000+ simulated conversations found that all models, including frontier models like GPT-4.1 and Gemini 2.5 Pro, show an average 39% performance drop in multi-turn settings compared to single-turn. The degradation is primarily driven by unreliability: models make premature assumptions in early turns, over-rely on their own incorrect previous responses, and fail to course-correct when given new information.
Symphony's architecture actually has some built-in mitigations for this. Each "turn" in the Codex subprocess can be a fresh prompt with updated context from the tracker, and the 1-second continuation retry re-checks issue state before continuing. But the fundamental problem remains: if the agent takes a wrong approach early in a complex issue, Symphony's retry mechanism will keep spawning new sessions that may repeat the same mistakes unless the workspace is cleaned between attempts.
The after_run and before_run lifecycle hooks provide a possible escape hatch; you could use them to reset workspace state or inject corrective context between retries. But the spec doesn't prescribe this, and most teams would need to discover the pattern through painful experience.
The Linear Lock-In Question
Symphony's integration layer is a Linear API adapter; there is no abstraction over multiple trackers. If your team uses Jira, GitHub Issues, or Shortcut, you cannot use Symphony as-is.
The spec does separate the Integration layer from the Orchestrator, so the architecture supports swapping trackers. What you would need to build:
- Issue fetcher: Poll for eligible issues, map states to Symphony's internal model (active vs. terminal vs. non-active)
- State resolver: Translate your tracker's workflow states into dispatch eligibility rules
- Field mapper: Provide `id`, `identifier`, `title`, `state`, `priority`, and `blockers` in a normalized format
The hard part is not the API calls; it is the state mapping. Linear has a clean, opinionated state model. Jira workflows are arbitrarily complex with custom statuses, transitions, and conditions. GitHub Issues have labels-as-states, which are even less structured. Each tracker would need its own eligibility logic, and the WORKFLOW.md state configuration would need to become significantly more flexible.
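The three pieces above amount to an adapter interface. The spec does not define this exact API; this is a hypothetical sketch of what a swappable tracker boundary could look like, using the normalized fields the spec does name:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class NormalizedIssue:
    """The normalized fields Symphony's orchestrator expects."""
    id: str
    identifier: str
    title: str
    state: str
    priority: int
    blockers: list[str] = field(default_factory=list)


class TrackerAdapter(Protocol):
    """Hypothetical integration-layer interface (not in the spec)."""

    def fetch_candidates(self, batch_size: int) -> list[NormalizedIssue]:
        """Poll for eligible issues, already mapped to normalized fields."""
        ...

    def is_dispatchable(self, issue: NormalizedIssue) -> bool:
        """Translate tracker workflow state into eligibility — the hard
        part for Jira's custom workflows or GitHub's labels-as-states."""
        ...
```

A Jira or GitHub Issues port would implement this boundary; the orchestrator and WORKFLOW.md machinery above it would ideally not change.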
This is likely intentional. OpenAI uses Linear internally, Symphony solves their problem, and the "spec as product" model means someone else can build the Jira adapter. But for most enterprise teams, this is the biggest adoption blocker.
What This Means for Engineering Teams
The daemon model beats CI-triggered agents. Running agents as a long-lived service with proper concurrency control, stall detection, and reconciliation is more robust than triggering agent runs from CI webhooks. Symphony handles the failure modes that one-shot CI runs cannot: stalled sessions, partial progress, multi-turn work on the same issue.
WORKFLOW.md is a pattern worth stealing. Even if you don't use Symphony, the idea of a single file that combines runtime config and agent prompt, version-controlled in-repo with dynamic reload, is a good pattern for any agentic workflow. It makes agent behavior auditable and reviewable through the same PR process as code.
The gap between "coding agent" and "autonomous engineering team" is orchestration. Individual coding agents can solve individual tasks. Symphony provides the missing layer: scheduling, isolation, retry logic, and lifecycle management. It turns "supervising AI coders" into "managing a project board."
Trust boundaries remain the hard problem. Symphony explicitly punts on security: "each implementation defines its own trust boundary." Workspace isolation and path validation are baseline controls, but the spec acknowledges that hook scripts are fully trusted and recommends OS/container isolation as a hardening step. For production use, you would need to add sandboxing beyond what the spec provides.
References
- Symphony GitHub Repository - Original source
- Symphony SPEC.md - Full specification
- Codex App-Server Protocol - Agent communication protocol
- LLMs Get Lost In Multi-Turn Conversation - Laban et al. (2025), Microsoft Research / Salesforce
- The Evolution of Continuous Delivery: Embracing Agentic Workflows - Daita blog