I asked Claude why my files kept growing. It said: ‘that is boil the frog’

TL;DR: Vibe coding is great for PoCs and miserable for real projects. I had Claude write 55,000 lines of code for me in about eight weeks and learned that skills and claude.md are not sufficient. At the bottom of this post there’s a plugin that packages the method I developed. It gives traceable, fully documented implementations. Add the plugin with two commands and it’s in your project.

How this started

Early this year I heard about OpenClaw. It was skyrocketing, and Peter Steinberger got famous “in a minute”. Obviously the right thing at the right time. Well deserved, I guess. And then everything started to move at light speed. Demos everywhere, people building apps in twenty minutes, and I was sitting there thinking that if I didn’t figure this out soon, I’d miss whatever was happening. I needed to get my hands dirty with something that had real stakes, something I could actually learn from.

The hypothesis was simple. Everything was about AI. With all the streams and virtual assistants doing great things, what did I need? Ticket to PR. An agent that reads a ticket, understands it, changes the code and finally opens a pull request. Controlled implementations that hand the easy and moderately complex tasks over to an AI. What does it take to set this up?

Trying to move fast while hitting walls

I bought Claude Max. I considered 110 euros a month pretty expensive, but for one month at least? I let Claude implement it. I wanted to see whether Claude is really able to do it autonomously, so I didn’t write a line myself. I didn’t want to “speed up by not knowing”. And I’m not telling the “AI takes over all developer jobs by the end of the year” story. I didn’t believe in it anyway; this was my trial balloon to prove it.

So I let Claude do the job.

I used Zed, JetBrains and VS Code as IDEs and finally stuck with VS Code. It has the same problems as all the others anyway. Sometimes it “just gives up” or stops responding. When I have spent a long time explaining my next feature to Claude, losing that context costs a lot of time. Starting all over again after restarting the IDE was annoying. Really annoying.

Another thing I missed was some kind of structure. I needed to tell Claude the folder structure and how code is separated into files, so it knows where to put what and how to split things. Do it SOLID, DRY and “tell, don’t ask”.

So I did what everybody else did, I guess. Add a CLAUDE.md with instructions and a coding-principles.md with the rules. That should do it, I thought on the first run. And on the second.

Of course, it didn’t work out.

This is not good enough

When feature follows feature, how does Claude know where everything is? And how do I know what is actually there, so I understand what is already in place?

Spend enough tokens and he’ll find it and can tell me. That does not convince me as a solution. Sure, skills and coding principles help. After some features I asked Claude about it. We have these rules in the coding principles:

120 lines of code max per file
20 lines of code max per method
only one type per file (interface, class, enum,…)

“Claude, please calculate all file sizes and let me know where sizes exceed the limit.” I did this multiple times and the result was the same every time: files exceeded 500 lines of code.
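A check like this can also be scripted instead of asked for each time. The following is a minimal sketch, not the tooling used in this project: it applies only the 120-lines-per-file rule quoted above, and the function names, the choice to skip blank lines and the file suffixes passed in are all assumptions.

```python
from pathlib import Path

MAX_LINES_PER_FILE = 120  # the per-file limit from the coding principles above


def count_code_lines(text: str) -> int:
    """Count non-blank lines. Not counting blank lines as code is an assumption."""
    return sum(1 for line in text.splitlines() if line.strip())


def find_oversized_files(
    root: Path, suffixes: tuple[str, ...], limit: int = MAX_LINES_PER_FILE
) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every matching file above the limit, largest first."""
    oversized = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            count = count_code_lines(text)
            if count > limit:
                oversized.append((path, count))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling find_oversized_files(Path("."), (".py",)) from the project root returns the offenders, largest first. Run as a pre-commit or CI step, a check like this would flag growth when it happens rather than at the next manual audit.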

I asked Claude why and he answered “that is boil the frog”. Things keep getting added and the files grow. This is a real difference from how I program. I don’t just add. If something exceeds a certain degree of complexity, I change my plan. One reason why Claude will not directly replace everybody, I guess.

So there are now regular refactoring sessions to split the code up until it matches the conventions again.

But anyway, I needed some kind of written-down plan. Talking to Claude and letting him “just do something” always ends in undocumented somethings.

It is much easier to search in an index than in the whole book, right?

So where is my plan to control the flow and structure it for my AI? On the one hand I’m trying to tame the beast, on the other I still have no idea how to handle it.

The phase, the context and the reasoning

The structure I ended up with wasn’t designed. It evolved.

At first I just had too many features, and working on them in parallel meant juggling multiple Claude sessions, each with its own memory of what we were doing. I found that switching context between Claude sessions is pretty exhausting, even when I don’t write the code. I didn’t expect this.

Anyway, I needed plans. I discussed them with Claude and let him write down what we were going to do. Just Markdown, like he wanted. Then came a context.md. This context only holds a summary of what the program is about and which plans are active, done or in planning. I didn’t call them plans, but phases. The context is read right away via the claude.md instructions; the full phase information only when needed.

Phases got long and therefore expensive. I didn’t notice this at first. Once I had 70 plans with 120,000 tokens, they had become a challenge rather than an advantage. Again, letting Claude read all the phases consumed too many tokens and got slow.

Anyway, I didn’t like these phases. Lots of explanation and even code samples. Why should this be a benefit? I don’t read phase documents anyway, Claude does. So let’s do “key=value”: YAML with a schema. Claude reads YAML faster than prose, and I can validate it. Claude consumes text differently than a human does.

And while we are talking about phases and optimization: decisions and reasoning usually happen while defining the phases and making the plan. Whenever I got stuck on a complex piece of code of a certain age, I always asked for the “why?”. I certainly don’t find it in the code, maybe in the developers’ minds. Claude can automate this.

Three things that actually worked

After 90+ phases it came down to three artifacts:

The phase. Short and structured. A summarized, AI-readable artifact that tells the complete story of the next thing to be done. It follows a schema so that phases look comparable: goal, decisions, steps.
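To make “a schema that can be validated” concrete, here is a minimal sketch of such a check. It assumes the phase YAML has already been parsed into a plain dict by a YAML library. Only the three field names goal, decisions and steps come from this post; the field types, the rule that decisions may be empty and the validate_phase helper are hypothetical, and the real schema in the repository may look different.

```python
# Field names are taken from the post; the expected types are assumptions.
REQUIRED_FIELDS = {"goal": str, "decisions": list, "steps": list}
# Assumption: a phase may start without decisions, but never without goal or steps.
MUST_NOT_BE_EMPTY = {"goal", "steps"}


def validate_phase(phase: dict) -> list[str]:
    """Return a list of problems; an empty list means the phase passes the check."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in phase:
            problems.append(f"missing field: {name}")
        elif not isinstance(phase[name], expected_type):
            problems.append(f"{name} must be a {expected_type.__name__}")
        elif name in MUST_NOT_BE_EMPTY and not phase[name]:
            problems.append(f"{name} must not be empty")
    return problems
```

Returning a list of problems instead of raising on the first one means a single pass reports everything wrong with a phase, which helps when many phases have to be migrated at once.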

The context. A short context.yaml at the project root. A summarized picture of the architecture, the stack and the current state of the software in terms of phases. Again a YAML file that follows a schema. The agent reads it before every session. With this, Claude gets an overview of the software in fewer than 1,000 tokens.
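The idea of carrying only a summary of the phases in the context can be sketched as follows. This is an illustration under assumptions, not the actual layout of context.yaml: the three states active, done and planning are the ones mentioned earlier in this post, while the keys id, status, architecture, stack and phases, and both function names, are hypothetical.

```python
STATUSES = ("active", "done", "planning")  # the three phase states named in the post


def summarize_phases(phases: list[dict]) -> dict[str, list[str]]:
    """Group phase ids by status, so the context holds ids only and no phase bodies."""
    summary = {status: [] for status in STATUSES}
    for phase in phases:
        status = phase["status"]
        if status not in summary:
            raise ValueError(f"unknown status: {status}")
        summary[status].append(phase["id"])
    return summary


def build_context(architecture: str, stack: list[str], phases: list[dict]) -> dict:
    """Assemble the small structure that would be written out as the context file."""
    return {
        "architecture": architecture,
        "stack": stack,
        "phases": summarize_phases(phases),
    }
```

Because the context stores ids only, its size grows with the number of phases and not with their length, which is what keeps it cheap to read at the start of every session.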

The reasoning. Claude is forced to write the architectural choices of each phase to decisions.md. This is the “why”. Since an AI, unlike most developers including me, will not complain about the time documentation takes, documenting the why is easy. I have never had reasoning alongside code that made its decision tree this easy to understand.
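An append-only decision log is simple to sketch. In the version below only the file name decisions.md comes from this post; the entry layout (a dated heading followed by a “Why” line), the parameters and the function names are assumptions made for illustration, and the plugin may format its entries differently.

```python
from datetime import date
from pathlib import Path


def format_decision(phase_id: str, decision: str, reason: str, on: date) -> str:
    """Render one log entry. The heading-plus-why layout is a hypothetical format."""
    return f"## {on.isoformat()} {phase_id}: {decision}\n\nWhy: {reason}\n\n"


def log_decision(path: Path, phase_id: str, decision: str, reason: str, on: date) -> None:
    """Append the entry; earlier decisions are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(format_decision(phase_id, decision, reason, on))
```

Appending instead of rewriting keeps the history intact, so a later decision that overturns an earlier one sits below it in the file and the sequence of “whys” stays readable.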

The Idea

I now have 90+ phases in my own implementation. At some point I realized it makes no sense to keep the method buried in this project, so I extracted it.

It got its own GitHub repository, and I added a Claude Code plugin for easy usage. Bootstrapping a project, phase management, decision logging and methodology updates are part of the skill set and run automatically. Two commands to install:

/plugin marketplace add holgerleichsenring/specification-first-agentic-development
/plugin install spec-first@specification-first-agentic-development

If you want more details, have a look here:

GitHub: Specification First Agentic Development.
Agent the implementation was extracted from: Agent Smith
Blog post: Next Level Vibe Coding

codingsoul