
Get Shit Done: The Context Engineering Layer That Makes Claude Code Actually Reliable

Created: 2026-02-26

TL;DR

Claude Code degrades as your context window fills up, a problem called context rot. Get Shit Done (GSD) fixes this by splitting projects into phases, spawning fresh subagent contexts per task (keeping the main window at 30-40%), and committing atomically. Five slash commands give you a full loop: define → discuss → plan → execute → verify. If you're on Claude Code and tired of quality tanking mid-session, install it with npx get-shit-done-cc@latest and start building.

If you've used Claude Code for anything beyond a quick script, you've hit the wall. You start a session, things are great, Claude is nailing it. Two hours later, the context window is stuffed with stale code, old errors, and half-forgotten instructions. Quality tanks. Claude starts hallucinating file paths. You start a new session and lose all the momentum.

This is context rot - and it's the single biggest problem with AI-assisted development right now.

Get Shit Done (GSD) fixes it. And it does it in the most obvious way possible: by not letting the context window fill up in the first place.

What GSD Actually Is

GSD is a set of Claude Code commands (slash commands like /gsd:new-project, /gsd:plan-phase, /gsd:execute-phase) that turn your vibe-coding session into a structured, spec-driven build pipeline. It installs in one line:

npx get-shit-done-cc@latest

It's not a framework. It's not a SaaS. It's a collection of prompts and orchestration logic that sits inside your .claude/ directory and makes Claude Code behave like a disciplined engineering team instead of a caffeinated intern.

20k+ GitHub stars in two months. Created by a solo developer who goes by TÂCHES. MIT licensed. Works with Claude Code, OpenCode, Gemini CLI, and Codex.

The Core Problem It Solves

Here's what normally happens when you build something with Claude Code:

  1. You describe your project
  2. Claude starts coding
  3. Context fills up with file reads, errors, retries, old plans
  4. Quality degrades - Claude starts ignoring instructions, repeating mistakes, losing track of what's done
  5. You restart, lose context, repeat

GSD breaks this cycle by splitting your project into phases, each phase into plans, and executing each plan in a fresh subagent context. The main context window stays at 30-40%. Each executor gets a clean 200k tokens purely for implementation.
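The pattern is easy to sketch. The orchestrator keeps only lightweight state, and every plan runs in a throwaway context that sees just the spec, its one plan, and the files that plan touches. A minimal illustration (all names here are hypothetical, not GSD's actual internals):

```python
# Hypothetical sketch of the fresh-context-per-plan pattern.
# None of these names are GSD's real internals.

def run_phase(phase, spawn_subagent):
    """Execute each plan in its own clean context window."""
    results = []
    for plan in phase["plans"]:
        # Each subagent starts empty: it sees only the spec,
        # its single plan, and the files that plan touches.
        context = {
            "spec": phase["spec"],
            "plan": plan,
            "files": plan["files"],
        }
        results.append(spawn_subagent(context))
        # The main window never accumulates the subagent's
        # file reads, errors, or retries -- only the result.
    return results

# Usage with a stub subagent standing in for a real executor:
phase = {
    "spec": "auth service",
    "plans": [
        {"name": "login endpoint", "files": ["route.ts"]},
        {"name": "logout endpoint", "files": ["logout.ts"]},
    ],
}
done = run_phase(phase, lambda ctx: f"done: {ctx['plan']['name']}")
```

The point of the sketch: the accumulation happens inside `spawn_subagent` and is discarded; only the result crosses back into the main context.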

The Workflow

The whole system is five commands in a loop:

1. /gsd:new-project

You describe your idea. GSD asks questions until it understands what you're building - goals, constraints, tech preferences, edge cases. Then it spawns parallel research agents to investigate the domain, extracts requirements (v1 vs v2 vs out of scope), and creates a phased roadmap.

Output: PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md

2. /gsd:discuss-phase 1

This is the underrated step. Your roadmap has one-line descriptions per phase. That's not enough for Claude to build what you actually want. This command identifies gray areas and asks you about them - layout preferences, API response formats, error handling behavior, empty states.

The output feeds directly into research and planning. The more you put in here, the closer the result matches your vision.

3. /gsd:plan-phase 1

Research agents investigate how to implement the phase. A planner creates 2-3 atomic task plans with XML structure. A checker verifies the plans against requirements. Loop until they pass.

Each plan is small enough to execute in a fresh context window. This is where the magic happens - instead of one massive prompt that tries to do everything, you get surgical task definitions:

```xml
<task type="auto">
  <name>Create login endpoint</name>
  <files>src/app/api/auth/login/route.ts</files>
  <action>
    Use jose for JWT (not jsonwebtoken - CommonJS issues).
    Validate credentials against users table.
    Return httpOnly cookie on success.
  </action>
  <verify>curl -X POST localhost:3000/api/auth/login returns 200 + Set-Cookie</verify>
</task>
```

4. /gsd:execute-phase 1

Plans are grouped into dependency waves. Independent plans run in parallel, dependent plans wait. Each executor gets a fresh context, implements its task, and commits atomically.

You walk away, come back to completed work with clean git history. Each task gets its own commit. You can git bisect to the exact failing task.
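Wave grouping is just repeated topological layering: a plan is ready when everything it depends on is done. A small sketch of the idea (illustrative only, not GSD's code):

```python
# Hypothetical sketch of grouping plans into dependency waves.
# Plans with no unmet dependencies run together in one wave;
# each wave waits for the previous one to finish.

def group_into_waves(plans):
    """plans: dict mapping plan name -> set of dependency names."""
    waves, done = [], set()
    remaining = dict(plans)
    while remaining:
        # A plan is ready when all of its dependencies are done.
        ready = [n for n, deps in remaining.items() if deps <= done]
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(sorted(ready))
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves

plans = {
    "schema": set(),
    "login": {"schema"},
    "logout": {"schema"},
    "ui": {"login", "logout"},
}
print(group_into_waves(plans))
# [['schema'], ['login', 'logout'], ['ui']]
```

Here `login` and `logout` land in the same wave and can execute in parallel, while `ui` waits for both, which is exactly the behavior described above.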

5. /gsd:verify-work 1

The system extracts testable deliverables and walks you through them one at a time. "Can you log in with email?" If something's broken, it spawns debug agents to find root causes and creates fix plans ready for re-execution.

Then you loop: discuss → plan → execute → verify for the next phase.

Why It Actually Works

Three things make GSD effective where other tools fail:

Fresh context per task. This is the killer feature. Each executor subagent starts with a clean 200k context loaded only with the project spec, the specific plan, and relevant file contents. No accumulated garbage. No stale error messages. No "let me be more concise" degradation.

XML-structured plans. Claude performs significantly better with structured XML prompts than with freeform instructions. Each plan has explicit file targets, specific actions, and built-in verification steps. No ambiguity, no guessing.

The discuss step. Most spec-driven tools go straight from requirements to implementation. GSD adds a conversation layer where you shape the implementation before any code is written. This eliminates the "that's not what I meant" loop that eats entire sessions.

Configuration That Matters

GSD has model profiles that control quality vs cost:

| Profile  | Planning | Execution | Verification |
|----------|----------|-----------|--------------|
| quality  | Opus     | Opus      | Sonnet       |
| balanced | Opus     | Sonnet    | Sonnet       |
| budget   | Sonnet   | Sonnet    | Haiku        |

Switch with /gsd:set-profile budget. You can also toggle research agents, plan checkers, and verifiers on/off depending on how thorough you need the session to be.

There's also a /gsd:quick mode for ad-hoc tasks - bug fixes, small features, config changes - that skips the full planning ceremony but keeps atomic commits and state tracking.

What I'd Change

It's not perfect. A few things to watch out for:

  • The --dangerously-skip-permissions recommendation - GSD works best when Claude can run commands freely, and they recommend skipping the permission system entirely. I get why, but it makes me twitch. The granular permissions alternative they provide is better practice.
  • Token cost - All those research agents, plan checkers, and verifiers add up. The quality profile with Opus everywhere will burn through your API budget fast. Start with balanced or budget.
  • Overkill for small projects - If you're building a single-file script or a quick prototype, the full ceremony (discuss → plan → execute → verify) is too much. Use /gsd:quick or just raw Claude Code.
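For reference, Claude Code's granular permission rules live in .claude/settings.json. A minimal example of the safer alternative might look like this (the specific patterns you allow are your call, and the commands shown are just illustrations):

```json
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(npm run test:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
```

This gets you most of the "run commands freely" benefit for the commands GSD actually needs, without handing over the whole shell.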

How GSD Compares to Spec-Kit

GSD isn't the only game in town. Spec-Kit - built by GitHub, 72k+ stars - takes a similar spec-driven approach but with a very different philosophy.

|               | GSD                                   | Spec-Kit                                                          |
|---------------|---------------------------------------|-------------------------------------------------------------------|
| Creator       | TÂCHES (solo dev)                     | GitHub (influenced by John Lam)                                   |
| Stars         | 20k+                                  | 72k+                                                              |
| Install       | npx get-shit-done-cc@latest           | uv tool install specify-cli (Python/uv)                           |
| Philosophy    | "No enterprise roleplay. Just build." | "Specifications become executable"                                |
| Agent support | Claude Code, OpenCode, Gemini, Codex  | 20+ agents (Claude, Copilot, Cursor, Gemini, Windsurf, Amp, etc.) |

The workflow difference

Spec-Kit follows a traditional spec-first discipline: write a constitution (project principles), then a functional spec (tech-agnostic - the what, not the how), clarify gaps with structured Q&A, then bring in the tech stack for planning, break into tasks, implement.

GSD compresses this. One command (/gsd:new-project) does questions + research + requirements + roadmap. Then you loop through phases: discuss → plan → execute → verify. Less ceremony, faster feedback.

Where GSD wins

Context freshness. This is the big one. GSD spawns fresh subagent contexts for each task, keeping the main window at 30-40%. Spec-Kit runs everything in one session - same context rot problem as raw Claude Code.

Parallel execution. GSD groups tasks into dependency waves and runs independent plans simultaneously. Spec-Kit marks parallelizable tasks with [P] but execution is still sequential within one agent.

Built-in verification. /gsd:verify-work walks you through testable deliverables and auto-spawns debug agents when something fails. Spec-Kit leaves testing to you.

Session continuity. /gsd:pause-work and /gsd:resume-work for mid-session handoffs. Spec-Kit uses branch-per-feature via shell scripts.

Where Spec-Kit wins

Agent breadth. Spec-Kit supports 20+ agents - Cursor, Copilot, Windsurf, Amp, Roo Code, you name it. If you're not on Claude Code, Spec-Kit is the better choice.

Spec rigor. The separation between functional spec (tech-agnostic) and implementation plan (tech-specific) forces you to think about what before how. GSD's "discuss" step blends the two. Spec-Kit's constitution concept - project-wide governing principles - is also something GSD lacks.

Cross-artifact validation. /speckit.analyze checks consistency across your spec, plan, and tasks. /speckit.checklist generates quality checklists that act like "unit tests for English." GSD verifies plans against requirements but doesn't have the same depth of spec-level analysis.

Clarification workflow. /speckit.clarify does structured, coverage-based questioning to find gaps in your spec before planning. GSD's /gsd:discuss-phase is similar but happens per-phase rather than upfront.

When to use which

Use GSD if you're on Claude Code and care most about execution quality - fresh contexts, parallel builds, atomic commits, automated verification. It's the better choice for solo developers and small teams who want to move fast without ceremony.

Use Spec-Kit if you work with multiple AI agents, want rigorous spec-first discipline, or need enterprise-style governance. It's also the safer bet if you're not sure which AI coding tool you'll be using six months from now.

Both are MIT licensed. Both are free. Both are better than raw vibing.

The Bottom Line

GSD doesn't make Claude Code smarter. It makes Claude Code consistent. By keeping context fresh, structuring tasks precisely, and verifying results automatically, it turns an unreliable vibes-based workflow into something you can actually trust for production code.

The insight is simple: the problem was never Claude's intelligence - it was context management. GSD solves context management. Everything else follows.

npx get-shit-done-cc@latest

GitHub · User Guide · Discord


References

  1. GSD Repository - github.com/gsd-build/get-shit-done - MIT licensed, 20k+ stars as of Feb 2026
  2. GSD User Guide - docs/USER-GUIDE.md
    • Full configuration reference, model profiles, workflow toggles
  3. GSD npm package - npmjs.com/package/get-shit-done-cc
    • One-line installer for Claude Code, OpenCode, Gemini CLI, Codex
  4. Claude Code Documentation - code.claude.com/docs
    • Official docs for the CLI tool GSD extends
  5. Claude Code: Best Practices for Agentic Coding - code.claude.com/docs - Anthropic's guide on subagent patterns, context management, and prompt engineering for Claude Code
  6. Context Engineering for AI Agents - simonwillison.net/2025/Jun/27/context-engineering
    • Simon Willison on why "context engineering" is replacing "prompt engineering" as the key discipline
  7. Lost in the Middle: How Language Models Use Long Contexts - arxiv.org/abs/2307.03172 - Nelson Liu et al. (2023). The research paper that demonstrated LLMs degrade when relevant info is in the middle of long contexts, the academic basis for "context rot"
  8. Spec-Kit - github.com/github/spec-kit - GitHub's spec-driven development toolkit, 72k+ stars, supports 20+ AI agents
  9. BMAD Method - github.com/bmad-code-org/BMAD-METHOD
    • Another spec-driven approach with more enterprise ceremony, also referenced in GSD's motivation
  10. Conventional Commits - conventionalcommits.org - The commit message spec GSD follows for its atomic commits
