I think that is because you do implicit plan tracking, creation and modification of the plan in your head in light of new information and then follow that plan. I'm not sure these tools do that very well.
The long-running task, at its core, is composed of many smaller tasks, and you mostly focus on one task at a time per part of the brain. It's why you cannot read two streams of text simultaneously even if both are in your field of view.
> you do implicit plan tracking, creation and modification of the plan in your head in light of new information and then follow that plan. I'm not sure these tools do that very well.
I think the plan is not just words; if it were, you could learn to ride a bike by reading a book.
Because we communicate in language, and because code output is also a language, we think that the process is also language-based. I think it's not, especially when doing hard stuff.
I know for certain in my case it isn't. When tracking down a hard problem for a junior after 2 hours of pair programming the other week, I had to tell him to commit everything and just let me do some deep thinking/debugging, and I solved the problem myself. Sure, I explained my process to him in language as best I could, but it's clear it was not language, it was not linear, and I did not think it through step by step.
I wish I could explain it, but when figuring out a hard problem, for me it takes some time to take it all in, get used to the moving parts, play with them. I'm sure there are actual neurons/synapses formed then, actual new wires sprawling about in the brain, that's why it takes time. I think the solution is a hardware one, not a software one.
That's why we can sleep on it and get better the next day, and that's why we feel the problem. There are actual multiple parallel "threads" of thinking going at the same time in our heads, and we can FEEL the solution as almost there.
I think it is simply that hard problems can arise from a combination of code, state, and models that cannot be solved incrementally, so big jumps are necessary.
I'm not saying the problem cannot be solved incrementally, but it's possible that by going in small steps, you either reach the solution or a blocker that requires a big jump.
I just finished my workday, 8hrs with Claude Code. No single task took more than 20 minutes total. Cleared context after each task and asked it to summarize for itself the previous task before I cleared context. If I ran this as a continuous 8hr task it would have died after 35-ish minutes. Just know the limitations (like with any other tool) and you’ll be good :)
I always find it wild that none of these tools use VCS - completed logical unit of work, make a commit, drop entire context related to that commit, while referencing said commit, continue onto the next stage, rinse and repeat.
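To make that loop concrete, here is a rough Python sketch of what I mean. Everything in it is hypothetical: `Checkpoint`, `AgentSession`, `run_tasks`, and the `do_work` / `commit` callbacks are names I made up for illustration, not the API of any real agent tool. In practice `commit` would presumably wrap an actual VCS commit and return its hash.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checkpoint:
    """A finished unit of work: a commit id plus a one-line summary."""
    commit_id: str
    summary: str


@dataclass
class AgentSession:
    """Hypothetical agent state: past checkpoints and the live context."""
    checkpoints: List[Checkpoint] = field(default_factory=list)
    context: List[str] = field(default_factory=list)

    def start_task(self, task: str) -> None:
        # Rebuild the context from scratch: one short reference per
        # earlier commit plus the new task, no old conversation history.
        self.context = [f"{c.commit_id}: {c.summary}" for c in self.checkpoints]
        self.context.append(f"TASK: {task}")

    def finish_task(self, commit_id: str, summary: str) -> None:
        # Keep only the commit reference, then drop the working context.
        self.checkpoints.append(Checkpoint(commit_id, summary))
        self.context = []


def run_tasks(
    tasks: List[str],
    do_work: Callable[[str, List[str]], str],
    commit: Callable[[str], str],
) -> AgentSession:
    """Work, commit, drop context, move on to the next stage.

    do_work(task, context) is assumed to return a short summary of what
    was done; commit(summary) is assumed to create a commit and return
    its id. Both are placeholders supplied by the caller.
    """
    session = AgentSession()
    for task in tasks:
        session.start_task(task)
        summary = do_work(task, session.context)
        session.finish_task(commit(summary), summary)
    return session
```

The point of the design is that the context at the start of task N only grows by one line per earlier commit, no matter how much scratch work each task produced. Anything the agent needs from earlier work it would have to look up via the referenced commit rather than from a lossy summary of the conversation.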
Claude always misunderstands how the API exported by my service works, and after every compaction it forgets all over again and says "oh, the API has changed since the last time I used it, let me use different query parameters". My brother in Christ, nothing has changed, and you are the one who made this API.
Yes, I can, and I do, I'm pointing out that compressing an entire conversation history into a single message is so lossy that I might as well start a new session.
Yes, I can also tell any agent to commit more often, but that's again not what I'm saying. I'm saying version control can be integrated way deeper into agent workflow.
Idk, yesterday I had Cursor/Claude untangle a bunch of random working copy changes I had made and commit them to two separate logical branches. You can prompt it to use git commands and it works well enough in my experience.
That's a wild comparison to make. I can easily work for an hour. Cursor can hardly work for a continuous pomodoro. "Long-running" is not a fixed size.