It’s been a year since I started working heavily with coding agents: first chat-based, then Cline, then Kiro IDE, then Kiro CLI and Claude Code, and those last two have remained constants to this date.
As time went by, my relationship with the agents has evolved, focusing on testing and mocking [LLM9 and LLM11], communication via documents [LLM10 and LLM12], and keeping myself grounded [LLM7 and LLM13].
However, after some recent conversations I found myself having a more detached relationship with the work I was producing, and that has pushed me back toward being more engaged and spending more time directly in the code, not just reviewing what agents produce.
In today’s article, I’m going to borrow from the SWOT analysis (which we have been running in the family for ages now): we will talk about the internal causes that lead us to lose engagement with the work, how I’ve moved to a more «driving-seat» approach, and a warning about the external threats that might, without us realizing, be pushing us out of the loop.
Internal causes
It’s just a document
I mostly write text for a living; code is just one of the kinds of text I produce. As I mentioned in LLM12, for some documents I took a more passive approach to editing, steering the agent and providing feedback.
The problem with providing feedback is that the content improves, but you don’t retain it the same way. You didn’t write it, so it doesn’t stick.
The other problem with becoming detached is that the process is gradual and subtle, and one day you realize you are sharing a document that has your fingerprints all over it, that is technically solid, concise and to the point, but whose details you don’t have in your head.
It then felt natural for me, without even realizing it, to drop the «editor» role for text: stop using my Markdown viewer, go back to Obsidian, make my changes in the document, rewrite entire sections, and then ask the agent to apply my comments and repeat some patterns I had set across the document.
Writing the words myself, even with the agent spell-checking and tone-correcting the crap out of it, made a significant difference in my relationship with the work, and made me more proud of what I’d built. The question is: why didn’t that happen with the code?
Review roles and the long loops
Once I placed myself in the review position, the other problem I ran into was that, as time went by, I found myself offloading more reasoning to the agent and trusting it would self-correct, and it did. As a result, the work presented to us is complete, passing tests if we closed our loop correctly, but it arrives minutes after we last touched it.
In those minutes I’d probably have spun up another agent and had it doing its own thing, and a third, and a fourth. I don’t really believe I can manage more than 7±2 agents, but not all agents demand the same attention: a research agent can be left to its own devices (pun intended), while an implementation agent is changing code I’m responsible for.
Once the problem comes back (because it always does), I have spun up more agents, so the context-switch tax is steep: I need to reload the context into my flesh-based working memory, understand the code that was produced, the changes that were made, and the reason that triggered those changes, and I end up wasting so much time that it can eat up the benefits of agents. So how can we leverage the technology without going insane? My answer, at least for now, is to interrupt the agent and bring the human back into the loop.
Back to the human in the loop
The indicator that something is going in the wrong direction is subtle, but it can be detected. In my case it was a conversation with an engineer who was sharing his workflow: he mentioned that he, specifically, built the data structures and the test cases by hand and let the agent fill the gaps, shortening the feedback loops. That minor comment, part of a larger conversation, resonated with me a lot.
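A minimal sketch of that division of labor, with hypothetical names (not from any real project): the data structure and the test are the parts the human types, and the function body is the gap the agent fills.

```typescript
// Hand-written by the human: the shape of the domain.
type Transaction = { amount: number; kind: "income" | "expense" };

// Hand-written by the human: what "correct" means.
const sample: Transaction[] = [
  { amount: 100, kind: "income" },
  { amount: 40, kind: "expense" },
];

// The gap the agent fills in (shown completed here so the sketch runs).
function balance(transactions: Transaction[]): number {
  return transactions.reduce(
    (total, t) => total + (t.kind === "income" ? t.amount : -t.amount),
    0,
  );
}

console.assert(balance(sample) === 60, "income minus expenses");
```

The point isn’t the code itself: it’s that the two decisions that matter, the shape of the data and the definition of done, never leave the human’s hands.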
The models have become so good that, given a prompt like «build me a spec for a todo list», I can create the spec, the test plan, the linters, the steering documents, the Claude.md file, the integration tests and the GitHub Actions without looking at the code until the very end, letting the agent make a lot of decisions on its own. As of today those decisions aren’t inherently wrong, but reviewing a decision and making a decision are not the same thing.
With that in mind, I’ve started to add friction points to my interactions with the agents, while still letting them cook on other things, and the best way to try that out is with a scale model.
I’ve created yet another app, a personal finance one. I started with a couple of documents, a UI definition of the interface and the data structure, and let the agent implement some of it; then I would go into the code and basically rewrite part of it, add comments, delete sections, and replace them with a comment asking to create or extract a function.
Because I can rely on the agent for the syntax, my comments and my changes can be syntactically wrong, but semantically they guide the application in the direction I want to take it. I’m not doing it for all the components: the piping and the infrastructure to create a Tauri application the agent can pretty much handle on its own.
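To make that concrete, here’s a hypothetical sketch (the names are illustrative, not from the real app) of the kind of edit I leave behind: delete a block, replace it with an instruction comment, and let the agent own the syntax. The snippet is shown after the agent’s pass so it actually runs.

```typescript
type Txn = { month: string; amount: number };

// The human's edit was to delete the inline aggregation and leave this:
//   "AGENT: extract the per-month sums into a groupByMonth helper and
//    reuse it in every monthly view."
// The helper below is what the agent produced from that comment.
function groupByMonth(txns: Txn[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of txns) {
    totals.set(t.month, (totals.get(t.month) ?? 0) + t.amount);
  }
  return totals;
}

const demo: Txn[] = [
  { month: "2024-01", amount: 25 },
  { month: "2024-01", amount: 75 },
  { month: "2024-02", amount: 10 },
];
console.assert(groupByMonth(demo).get("2024-01") === 100);
console.assert(groupByMonth(demo).get("2024-02") === 10);
```

The edit itself doesn’t need to compile; what matters is that the intent (what to extract, where to reuse it) is stated by the human, in place, at the exact spot it applies.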
My gain is simple: I know how the app works, I’ve probably burned fewer tokens because the amount of correction has been lower, and I’ve actually gone faster. The only thing that changed is how engaged I am with the work.
What about spec-driven development?
SDD is a growing paradigm for agent-driven development that front-loads the focus onto the spec: the specification document that defines what the application is meant to do. There might come a time when I move away from looking at the code and the focus shifts to the spec; the time between now and then will tell how well this post has aged.
In this case, I’m treating the spec as I treat this article: the words are mine, and I let the agent critique it, but I do the work and I get the learnings. This is important because, as we are about to see, the expectations of the work we produce do change, and with them, the incentives to multitask.
How did we get here?
Expectations of the role
As I’ve grown in my tech career, the problems have shifted from a function, to a component, to a system, to multiple systems, to programs spanning multiple systems that either exist, need to be created, or need to be removed. The trick is that you don’t move permanently: you shift up and down, like a manual car, over the span of a week (sometimes even the span of a day).
With an ever-more-fragmented calendar, the incentive is to multitask and have a more detached relationship with the work I do so that I can «produce» more of it, but the consequence, as I’ve seen, is that it hinders my ability to focus on a single hard problem that can have a meaningful impact.
The voices and the tooling
The rest of the world seems to be doing just fine: Claude Code and Codex are on mobile, OpenClaw is a thing, and you can now produce work by talking to the machine; typing is not a thing anymore. On top of that, established professionals from the industry are coming out of retirement to build these tools, claiming that «code is solved».
The incentives are something to be skeptical about: vendors are looking for ways to capitalize on token consumption, so a model that self-corrects over a 30-hour loop will get the job done while consuming a significant amount of tokens. The tools work, that’s undeniable, but what’s still a spectrum is the level of interaction with them.
The world ahead
This reflection doesn’t mean that my whole workflow suddenly needs to revert to the IDE. I do a lot of exploration and prototyping, and not all code changes need this level of engagement, but they all need some level of it, and I do need to leverage every opportunity to keep my skills sharp.
The problem with losing grip, as Avril Lavigne would say, is that I didn’t notice until something slipped: I was in a meeting defending an architecture decision and couldn’t recall the specifics; I was asked about a critical path in another system I had contributed to and couldn’t give an on-the-spot answer.
These things happen naturally as we grow in our careers, but over-indexing on AI accelerates the detachment before we get hold of the skills we need for the next level. Keeping myself in the loop and engaged with the work, whether that’s a narrative, a spec, an implementation detail or an operational manual, is key to understanding it and, most importantly, to standing behind it and defending it.
What about you, have you noticed the shift?
