cc-reflection: teaching Claude Code to reflect

A framework to reflect on working with Claude Code so you can continuously improve both your work (project) and your workflow (process).

Claude Code has an EDITOR hook - Ctrl-G. It opens a temp file in your editor, you write, save, close, and whatever you wrote gets injected back into Claude’s input box. The hook doesn’t care what happens between open and close. It just watches the file.
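To make "the hook doesn't care" concrete, here is a minimal sketch of a program that could stand in for the editor. It assumes the hook launches whatever EDITOR points to and passes the temp file path as the last argument, which is the usual EDITOR convention. The names tidyPrompt, runAsEditor, and main are hypothetical and not part of cc-reflection.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical transform: whatever should happen "between open and close".
// Here it strips trailing whitespace and leaves exactly one final newline.
function tidyPrompt(draft: string): string {
  const lines = draft.split("\n").map((line) => line.trimEnd());
  return lines.join("\n").trimEnd() + "\n";
}

// Behaves like an editor: read the temp file, rewrite it in place, return.
// Once the process exits, the hook picks up whatever the file now contains.
function runAsEditor(promptPath: string): void {
  const draft = readFileSync(promptPath, "utf8");
  writeFileSync(promptPath, tidyPrompt(draft), "utf8");
}

// Assumption: the file to edit arrives as the last command-line argument.
function main(argv: string[]): void {
  const promptPath = argv[argv.length - 1];
  if (promptPath === undefined) {
    throw new Error("usage: editor <file>");
  }
  runAsEditor(promptPath);
}
```

To try it, you would call main(process.argv) at the bottom of the file and point EDITOR at the compiled script. Anything that rewrites the file and then exits qualifies as an "editor" under this assumption, which is what leaves room for a menu.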

That gap is a black box, and I filled it with an fzf menu. Now Ctrl-G opens a control surface where any operation can run: edit your prompt in your IDE, enhance it with an agent, browse and expand reflection seeds, and collapse the result back into the prompt file.

cc-reflection is what grew from within that black box. It’s not one system but several, wired together: a reflection skill that teaches Claude three lenses for self-examination, a dice accumulator that builds pressure over turns until it triggers organically, a seed store that captures observations, and a thought agent that expands seeds into codebase-grounded artifacts. The EDITOR hook is the connective tissue, and the menu is the container. Each piece works independently, but the hook gives them a shared surface.

Modeling how we actually reflect

Reflection is always after the fact. You observe what happened, not what is happening. Something in the work catches your attention and you go: huh. That’s the whole event. A small noticing.

Most of the time you’re mid-feature. You don’t want to pivot the session over a hunch. So you let it go. It dissolves. Whatever you half-noticed rejoins the noise and you never think about it again.

cc-reflection gives that moment a place to land. When a /reflection triggers, the agent assesses whether there’s anything worth surfacing using a designer skill. Sometimes there isn’t, and that’s fine. When there is, you get three choices: fix it now, dismiss it, or create a seed.

A seed is the interesting one. It captures the observation with a proposed artifact - what concrete action this thought might lead to. Then you move on. The session continues. The seed sits in the background, in the framework and in the back of your head, and it grows.

Actually engaging with a thought is expensive. It requires context-switching, holding multiple perspectives, stepping outside the work to look at it. You rarely do it in the moment. You come back later, when you have headspace, and open the menu. Your seeds are there, sorted by freshness. You pick one and engage. This is where the thought agent comes in. It gets a full interactive session: it reads the seed, pulls up the relevant code, reconstructs the context. You steer it. You add thoughts you’ve had since. It expands the seed into a detailed, codebase-grounded analysis with a clear next step.

That expansion can become a prompt you feed back to your main agent, like a sophisticated plan mode grown from a single observation. Or you let the thought agent make the changes directly. There’s freedom in how you use it.

The core design bet: noticing and analyzing compete for the same attention. If you try to do both in the moment, you either derail execution with second-guessing or suppress the observation to maintain flow. Seeds let you notice without analyzing. Expansions let you analyze without rushing.

The three examinations

The /reflection skill teaches Claude three lenses, borrowed from Confucius (吾日三省吾身 - I reflect upon myself thrice daily):

  • 一省: “Am I building this correctly?” - architecture, security, complexity, engineering patterns
  • 二省: “Am I building the right thing?” - UX, product, business logic, user mental models
  • 三省: “How am I working?” - process, conventions, workflow, skills and hooks

These aren’t categories for organizing notes. They’re lenses the agent applies in sequence when the skill fires. For each, it generates seedlings internally, tests them (“is this rooted in something I actually felt during the session?”), and discards the ones that don’t survive. A seedling that merely restates what’s already known, or describes a bug that was just fixed, gets rejected. Most reflections produce nothing.

That’s by design. 今日無省 - nothing to examine today. The skill treats silence as clarity, not failure. Fabricating insight to fill the space would pollute the seed store with noise. The framework’s most important output is often no output at all.
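The filtering logic can be sketched as plain data flow. In the real skill these judgments are made by the agent in natural language, not by boolean flags, so the Seedling shape and the function names below are hypothetical stand-ins used only to show the order of operations: apply the lenses in sequence, keep what survives the test, and treat an empty result as a valid outcome.

```typescript
type Lens = "一省" | "二省" | "三省";

// The three lenses, in the order they are applied.
const LENSES: ReadonlyArray<{ id: Lens; question: string }> = [
  { id: "一省", question: "Am I building this correctly?" },
  { id: "二省", question: "Am I building the right thing?" },
  { id: "三省", question: "How am I working?" },
];

// Hypothetical shape: each flag stands in for a judgment the agent makes.
interface Seedling {
  lens: Lens;
  observation: string;
  rootedInSession: boolean; // grounded in something that happened this session
  restatesKnown: boolean;   // merely repeats what is already known
  alreadyFixed: boolean;    // describes a bug that was just fixed
}

function survives(s: Seedling): boolean {
  return s.rootedInSession && !s.restatesKnown && !s.alreadyFixed;
}

// Walk the lenses in sequence and collect the seedlings that survive.
function examine(candidates: Seedling[]): Seedling[] {
  const survivors: Seedling[] = [];
  for (const lens of LENSES) {
    for (const s of candidates) {
      if (s.lens === lens.id && survives(s)) survivors.push(s);
    }
  }
  return survivors;
}

// Silence is a valid result, not an error.
function summarize(survivors: Seedling[]): string {
  if (survivors.length === 0) return "今日無省 - nothing to examine today";
  return survivors.map((s) => `${s.lens}: ${s.observation}`).join("\n");
}
```

The point of the sketch is the empty branch in summarize: nothing forces the reflection to produce a seed.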

When thoughts surface

You can’t schedule insight. Real reflection happens because enough unexamined experience has accumulated that something pushes through. The longer the session, the more likely it is to happen.

The dice accumulator models this. After 7 turns without reflection (the default), the system starts rolling 1d20 behind the scenes at every turn. After 14 turns, 2d20. After 21, 3d20. Any natural 20 triggers an explicit nudge for the agent to reflect and resets the count. Short sessions are never interrupted. Long sessions eventually get a nudge, but exactly when is unpredictable.
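The accumulator described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: it assumes one extra die for every 7 turns with no cap beyond 3d20, that the first roll happens on turn 7 itself, and that a trigger resets the turn count to zero. All names are hypothetical.

```typescript
type Rng = () => number; // returns a float in [0, 1), like Math.random

interface DiceState {
  turnsSinceReflection: number;
}

const DEFAULT_THRESHOLD = 7;

// Assumption: one die per full threshold of unreflected turns, uncapped.
function diceForTurn(turns: number, threshold: number = DEFAULT_THRESHOLD): number {
  return Math.floor(turns / threshold);
}

function rollD20(rng: Rng): number {
  return Math.floor(rng() * 20) + 1;
}

// Advance one turn, roll the dice earned so far, and reset on any natural 20.
function nextTurn(
  state: DiceState,
  rng: Rng = Math.random,
  threshold: number = DEFAULT_THRESHOLD,
): { state: DiceState; reflect: boolean } {
  const turns = state.turnsSinceReflection + 1;
  const dice = diceForTurn(turns, threshold);
  let natural20 = false;
  for (let i = 0; i < dice; i++) {
    if (rollD20(rng) === 20) natural20 = true;
  }
  return natural20
    ? { state: { turnsSinceReflection: 0 }, reflect: true }
    : { state: { turnsSinceReflection: turns }, reflect: false };
}

// Chance that at least one of `dice` d20s shows a 20.
function triggerChance(dice: number): number {
  return 1 - Math.pow(19 / 20, dice);
}
```

Under these assumptions the per-turn chance of a nudge is 5% with one die, about 9.75% with two, and about 14.26% with three, so the pressure rises with session length without ever becoming a fixed schedule.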

The dice decide when to reflect. The skill knows what to look for: moments where the session’s trajectory changed. The agent assumed something and was corrected. A standard approach failed twice and forced a creative workaround. The user kept steering in a different direction. Something took longer than it should have. These state transitions are where the interesting observations live, and the skill is trained to anchor there.

Inside the menu

When you press Ctrl-G, the hall of mirrors opens:

[Screenshot: the Ctrl-G menu - seeds, editor integrations, prompt enhancer, settings]

  • Edit prompt in your editor - vi, VS Code, Cursor, Zed, Windsurf, Antigravity. Opens the current prompt, you edit, save, close, it’s injected back into Claude’s input box.
  • Enhance your prompt - hands your draft prompt to a separate agent living in a tmux window, which rewrites it for clarity: it adds structure, verifies that every file path exists, and adds acceptance criteria before sending the result back to your input box. Available in interactive (chat with the enhancer) or auto (fire-and-forget) mode.
  • Browse seeds - all project-rooted seeds are listed with freshness indicators (🌱 fresh, 💭 growing, 💤 stale, 📦 archived). The preview pane shows the in-the-moment rationale and expansion history. Select one to expand with a thought agent.
  • Expand seeds - select a seed and the thought agent takes over. It reads the relevant code, reconstructs the context, and you steer it interactively. It won’t speculate about code it hasn’t read. The expansion and its conclusion get attached to the seed permanently.
  • Settings - toggle model (opus/sonnet/haiku), expansion mode, permissions, filter, context window depth. All persistent.
  • Gardening - curate and archive stale seeds, purge old ones, delete individual seeds.
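As a rough illustration of how a seed and its freshness indicator might be represented, here is a sketch. The field names and the age thresholds (under 1 day is fresh, under 7 days is growing) are assumptions made for this example. Only the four indicator labels and their icons come from the menu described above.

```typescript
type Freshness = "fresh" | "growing" | "stale" | "archived";

const ICONS: Record<Freshness, string> = {
  fresh: "🌱",
  growing: "💭",
  stale: "💤",
  archived: "📦",
};

// Hypothetical seed record; the real JSON layout may differ.
interface Seed {
  id: string;
  observation: string;      // what was noticed in the moment
  proposedArtifact: string; // the concrete action it might lead to
  createdAt: string;        // ISO 8601 timestamp
  archived: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const FRESH_DAYS = 1;   // assumption, not taken from the project
const GROWING_DAYS = 7; // assumption, not taken from the project

function freshnessOf(seed: Seed, now: Date): Freshness {
  if (seed.archived) return "archived";
  const ageDays = (now.getTime() - Date.parse(seed.createdAt)) / DAY_MS;
  if (ageDays < FRESH_DAYS) return "fresh";
  if (ageDays < GROWING_DAYS) return "growing";
  return "stale";
}

// Newest first; returns a copy so the caller's array is left untouched.
function sortByFreshness(seeds: Seed[]): Seed[] {
  return [...seeds].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt),
  );
}

function menuLine(seed: Seed, now: Date): string {
  return `${ICONS[freshnessOf(seed, now)]} ${seed.observation}`;
}
```

Passing now in explicitly keeps the classification deterministic, which makes the logic easy to test.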

Reflection is first-person

My early version of cc-reflection had a background mode where the agent could delegate reflection to a subagent who would read the transcript instead. It never triggered.

Part of this is practical friction: subagents in Claude Code don’t inherit the parent’s full context. They get a fresh window with a task prompt. The best they can do is read the transcript file as a third party. In practice, the main agent never opted for this.

There’s a subtler thing, too, and I’m claiming this from direct experience: even with perfect context transfer, the executing agent has the session “hot”. Recent mistakes, abandoned approaches, and half-formed concerns carry disproportionate weight in the attention window. A subagent reading the same transcript cold treats everything with equal weight.

I call this “dark matter” in the project - context that’s technically present in the transcript yet practically lost in any handoff. Foreground reflection preserves it. Delegation strips it.

A recent paper (Dadfar 2026, “When Models Examine Themselves”) provides mechanistic evidence for this. When LLMs examine their own processing, the vocabulary they produce tracks their concurrent activation dynamics (r=0.44), but the same words in non-self-referential contexts show zero correspondence (r=0.05), even at 9x higher frequency. The paper studied models answering “what are you?”, but the principle appears to generalize: self-referential output during processing captures something that post-hoc description loses.

What it actually catches

[Screenshot: cc-reflection in action - the skill discards tactical observations and surfaces a real architectural insight]

When a reflection triggers and a seedling is chosen, you get three options: create seed, fix now, or dismiss. Many catches get fixed immediately and leave no trace. The seeds are what survived, observations that deserve their own space to develop. Here’s how they break down in my own usage:

32% are meta observations: insights about process, learning, and how to work with AI. They become additions to CLAUDE.md, hooks, and, most importantly, skills. Another 24% are split between engineering and architecture concerns, which later became refactors and abstractions. The rest scatter across product decisions, UX patterns, type safety, complaints about external libraries, and performance concerns. This didn’t turn into a ticket tracker, as I initially worried it would (though you can absolutely build one on top), but into a self-observation framework. The seeds capture patterns that only become visible during the work.

vs. Claude Code’s built-in /insights

Claude Code already has a function to generate a retrospective report from your recent transcripts. It’s directionally similar but limited to a third-person review and lacks the depth that cc-reflection provides.

/insights can only see what was said: tool calls, messages, friction events. It can tell you “you had 59 wrong-approach events” but not how the approach was wrong in the active context. It partially covers the third examination (三省 - “how am I working?”) and has no lens for the first two: whether the architecture is sound, or whether you’re building the right thing. It doesn’t keep its observations rooted in the project, either. What you get is a profile of you as a user, not an understanding of the space you’re in.

Seeds are repo-scoped and moment-scoped. They capture what a transcript can’t: the hesitation before a design choice, the implicit assumption that made a shortcut feel safe, the architectural pattern that only becomes visible 30 turns in. That’s the dark matter: first-person context that doesn’t survive into the commit, the summary, or the retrospective.

Final words

cc-reflection isn’t a 10x magic power. It slows you down, deliberately, but you end up somewhere you actually needed to be. It is a product of deeply engaging with each coding session while resisting the urge to abstract into something grander than what we can truly tame. After all, what a blessing is already upon us.

Technical bits

  • GitHub repo
  • Built for Claude Code
  • Bash + TypeScript, fzf menu
  • All state in ~/.claude/reflections/ - flat JSON
  • ~500 hermetic tests
  • MIT licensed