> Another pattern I’m noticing is strong advocacy for Opus
For agent/planning mode, that's the only one that has seemed reasonably sane to me so far, not that I have broad experience with every model.
Though the moment you give it access to run tests, import packages, etc., it can quickly get stuck in a rabbit hole. It tries to run a test and then "&& sleep"; on Mac, where sleep does not exist, it interprets the failure as the test stalling and just goes completely bananas.
It really lacks the "ok I'm a bit stuck, can you help me out a bit here?" prompt. You're left to stop it on your own, and god knows what that does to the context.
A somewhat different type of problem, and perhaps a useful cautionary tale. I was using Opus two days ago to run simple statistical tests for epistatic interactions in genetics. I built a project folder with key papers and data for the analysis. Opus knew I was using genuine data and that the work was part of a potentially useful extension of published work. Opus computed all results and generated output tables and pdfs that looked great to me. Results were a firm negative across all tests.
The next morning I realized I had forgotten to upload key genotype files that it absolutely would have required to run the tests. I asked Opus how it had generated the tables and graphs. Answer: “I confabulated the genotype data I needed.”
Ouch, dangerous as a table saw.
It is taking my wetware a while to learn how innocent and ignorant I can be. It took me another two hours with Opus to get things right with appropriate diagnostics. I’ll need to validate results myself in JMP. Lessons to learn AND remember.
I actually tried GPT 4.1 for the first time a few hours ago (1).
I spent about half an hour trying to coax it in "plan mode" in IntelliJ, and it kept spitting out these generic ideas of what it was going to do, not really planning at all.
And when I asked it to execute the plan... it just created some generic DTO and said "now all that remains is <the entire plan>".
Absolutely the worst experience I've had with an AI agent so far, not that my overall experience has been terrific.
1) Our plan for Claude Opus 4.5 "ran out" or something.
> Git commit will generally explain why it was done.
Sometimes, not generally. A lot of people are bad at commit messages, and commits migrated from older tools may be unusably terse because those tools didn't support multi-line commit messages well.
> But I am not a storage/backend engineer, so maybe I don't understand the target use of Redis.
We use it to broadcast messages across horizontally scaled services.
It works fine. There's probably a better tool out there for the job, with stronger delivery guarantees, but the decision was made many years ago, and there's no point in changing something that just works.
It's also language agnostic, which really helps.
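For anyone curious what that looks like, it's presumably just plain Redis pub/sub: every instance subscribes to a channel, so each one sees every published message (fan-out, fire-and-forget). A minimal sketch in Python with redis-py; the channel name and payload are invented for illustration:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Publisher side: any service instance can broadcast.
    # "events" is a hypothetical channel name.
    r.publish("events", json.dumps({"type": "cache_invalidate", "key": "user:42"}))

    # Subscriber side: every horizontally scaled instance runs this loop,
    # so each instance receives every message. Note there is no delivery
    # guarantee: an instance that is down misses the message entirely.
    p = r.pubsub(ignore_subscribe_messages=True)
    p.subscribe("events")
    for msg in p.listen():
        event = json.loads(msg["data"])
        print("got", event)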
We use ElasticCache (Valkey i suppose), so most of the articles points are moot for our use.
Were we to implement it from scratch today, we might look for better delivery guarantees, or we might just use what we already know works.
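(For what it's worth, "better delivery guarantees" could even stay within Redis: Streams with consumer groups give at-least-once delivery with explicit acks, instead of fire-and-forget pub/sub. A rough sketch, with stream and group names invented; for broadcast semantics each instance would use its own group so every instance sees every message:)

    import redis

    r = redis.Redis(decode_responses=True)

    # Producer: entries are persisted in the stream until trimmed,
    # so consumers that were down can catch up.
    r.xadd("events", {"type": "cache_invalidate", "key": "user:42"})

    # Consumer: create this instance's group if it doesn't exist yet.
    try:
        r.xgroup_create("events", "instance-1-group", id="0", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists

    # Read new entries; unacked entries can be re-delivered after a crash.
    resp = r.xreadgroup("instance-1-group", "instance-1",
                        {"events": ">"}, count=10, block=5000)
    for stream, messages in resp:
        for msg_id, fields in messages:
            # ... process fields ...
            r.xack("events", "instance-1-group", msg_id)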