My way of interacting with agents hasn’t changed that much since I moved from Cline to Kiro and Claude, so I treated all the news and the fuzz about openclaw, ironclaw, and all the other claws with skepticism when not absolute denial (I still don’t think orchestration works for me), but in recent days after talking to some of you I realized that I might be missing something so I gave it a try, and let’s say I learned a thing or two.
In today’s article you’ll read me ranting loudly about the perils of automatic memory management and how I ended up fighting the machine more than benefitting my workflow, and the upmost question: «what problem are we really trying to solve here»?
As a disclaimer, this is my personal experience, and I don’t have the time, the patience, and the energy to perfectly tune my system, so it might be a skill issue, as the youngsters say these days. Another main callout is that my work, specially lately, involves a lot of systems and processes that are not related to each other, so I might not be the market fit for these tools. You have been warned, here we go.
Cruise control
I’ve ran this experiment of moving all my agent interaction to a claw system, and for the first few days thing were trending very good, I could continue conversations, and I could link concepts, so, admitedly, I started to pay less attention to the context window, and stopped treating each conversation as its own separated «cinematic universe».
Before, I would go for large conversations, specialy when working on large documents, just to compact them when I felt the context window was reaching 60-70% (On an Opus 4.6 with 1M context window) and continue for days. I would be mindful about the MCPs I load and the stuff I put in the context window.
With this claws, even as I had the indicators of the context window, you can feel there’s a tendency of assuming the system manages that for you (people stopped talking about context window size).
That’s the problem with cruise control, I stopped paying attention to part of the process, so I ended up stop paying attention at a larger part of this process, something I had already suffered a few months ago, just in a different shape. Did you know that Cruise Control (the old one, without sensors and auto-brakes) was linked to 12% increase of crash probability?. Things have improved, and sensors do help, but you get the analogy here.
To make things worse, this was right after my Human in the loop piece, so I was even more engaged in the work and less in the system, and let a few things slip (or slop, to take the pun).
Context poisoning
As I mentioned in the disclaimer, lately my job involves digging in different unrelated systems, make sense of them, and finding out if they work for my usecase or how do I need to bend them (or my use case) to make a perfect match.
There are three problems with this type of work and the memory management tools (I’m using claws but this rant applies to any self-updating aggressive memory management tool).
The first problem is that the LLM doesn’t really know what matters, so it will make up relationships across things, will store them and the next session will read them as part of the prompt and treat them as gospel. This compounds the problems of non-determinism, because we are adding wrong information to the context window, information the agent has no reason to question, and by the end of the session it might add another piece of information, and the story goes on.
The second problem, and it is somehow related with the first, is that the LLMs are stubborn, and whatever goes in the beginning of the prompt will stick. My usual solution for this stubborness involved dumping the context in a markdown file, editing out the broken piece, clear the context window and start again. With memory management tools, the problem doesn’t become visible until very late, specially if we don’t get used to seeing the thought process of the LLM (And who wants to spend their time watching the machine think).
The third problem with context poisoning is that is expensive, it causes our sessions to be longer, and we end up burning tokens for the sake of pulling data that we might not need, this increases the context window, which increases the risk of hallucinations, which makes for more inefficient workflows, specialy if we need to pay for those tokens.
Remember the point of all this
My wait a minute problem became clear when data from unrelated investigations started showing up in my current thread, and when I started seeing the model trying to apply learnings from other threads into this one, when they didn’t apply. I ended up spending some time fighting the tool instead of producing work.
These tools are expected to make us more productive, to get things done faster, to spend less time on the tedious parts, but it just doesn’t work if the facts the tool works with are not properly linked.
These tools also suffer from the same perils that any successful productivity tool. There are hundreds of hours of Youtube videos on how to customize your Notion, Asana, Obsidian, Trello, or your Bullet Journal. Not losing the focus is key here because the customization is infinite, and we use these tools for a reason, we want them to get a specific outcome, and sometimes the FOMO and the marketing prevents us from doing the very thing these tools are expected to help with.
That made me think that the way, at least for me, was not pervasive memory nor an obsessive level of tool customization, so maybe the learning here was to harness the right type of knowledge with the tools I’ve already used.
Developing new Skills
From this journey I take a renovated interest in short bursts of energy pointing in the right direction. Smaller sessions focused on one problem at a time, with the context I decide that is important and the sources I think matter. I might lose the serendipity that all these tools promise, but at least I keep the amount of voices in the head of the agent to one, mine, and to one moment, now.
With that in mind I’ve started paying more attention to skills (I know, I’m late, but considering OpenClaw is not even one year old, I’m on my standard lag), which I’m starting to see as something more useful than this idea of the ever-present memory. The Skills allow you to have pockets of knowledge and wicked ways that you can load on demand, and be mindful about when to load them and when not to load.
Keeping the sessions (and skills) separated also enables adversarial analysis. I can have an agent to do a research and go wild, and then I can have a second agent fact-check the findings. The fact that those two live in their separate context window is a feature of the system, not a bug, and it’s something to exploit, not to bury, which is another way of seeing the OODA loop.
This becomes a way more interesting value proposition, and I have started using it for a few projects, specially when I need the agent to be brutally honest with my work. I am exploring having skills that critique my work so that it matches certain expectations but to not lose my voice, and I’m looking at ways to make skills work not only for me but for my team, and that’s going to be an interesting journey. This is a work in progress and something I am looking forward to write more about.
Conclusions
There’s a massive market for OpenClaw and alikes (Microsoft just announced native support on Windows, Amazon runs it on Lightsail) and there’s a lot of benefits to these tools. They make AI more accessible to more people, and for certain workflows that have a strong correlation they might be the assistant that rules them all.
I do believe that these systems have pushed the industry in the right direction, showing us the way on what agents can do, but I’m starting to see it more like the clothes from fashion week or the prototypes in car shows. They show the future, but that future takes a couple iterations to land in a position that works.
They are hammers, and they are very good hammers, and very well crafted hammers, but not everything is a nail. We might just need a command line, a problem and a few skills. I’m still figuring it out, stay tuned.

Deja un comentario