Hi Jeroen,
Great observations — and yes, model selection is absolutely becoming a core skill. Your instinct to split "thinking" from "execution" is exactly the right mental model. Here's how I approach it:
The tiered model approach
I follow a similar split but think of it in three tiers rather than two:
- Architect tier (Opus-level) — complex reasoning, ambiguous problem scoping, architecture decisions, anything where getting the direction wrong is expensive
- Execution tier (Sonnet-level) — writing code, drafting content, iterating on a known solution, structured transformations
- Autocomplete tier (Haiku/Codex-level) — inline suggestions, boilerplate, repetitive edits where latency matters more than depth
The key insight is that the cost of a wrong answer should drive model choice, not just task complexity. A fast wrong answer in the architect tier wastes far more time than a slow right one.
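Concretely, here is a minimal sketch of what that routing logic could look like. The model names and the pick_model helper are purely illustrative placeholders, not any real provider's API:

```python
# Minimal sketch of tier-based model routing.
# Model names are placeholders; swap in whatever your provider offers.
ARCHITECT = "opus-class-model"      # expensive, slow, best reasoning
EXECUTION = "sonnet-class-model"    # balanced cost/quality for code and drafts
AUTOCOMPLETE = "haiku-class-model"  # cheap and fast, for latency-sensitive work

def pick_model(task_type: str, wrong_answer_cost: str) -> str:
    """Route by the cost of a wrong answer first, task type second."""
    if wrong_answer_cost == "high":   # architecture, scoping, irreversible decisions
        return ARCHITECT
    if task_type in {"inline_suggestion", "boilerplate", "repetitive_edit"}:
        return AUTOCOMPLETE
    return EXECUTION                  # default: known solution, just execute it

print(pick_model("architecture_decision", "high"))  # -> opus-class-model
print(pick_model("boilerplate", "low"))             # -> haiku-class-model
```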
Prompt structure matters as much as model choice
A well-structured prompt on a lighter model often beats a vague prompt on a powerful one. What consistently speeds things up for me (worked example after the list):
- Give the model its role upfront: "You are reviewing this as a senior backend architect"
- Constrain the output format explicitly: "Respond only with bullet points, no explanations"
- For coding tasks, include the error message, the relevant code snippet, and the expected behavior in one block — no back-and-forth needed
- For complex problems, ask the model to restate the problem before solving it — this catches misunderstandings early and saves iterations
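For illustration, here is roughly how I pack all of that into a single message for a coding task. The function, error text, and wording are made up for the example:

```python
# Illustrative single-shot coding prompt: role, output constraint, error,
# code, and expected behavior all in one message.
prompt = """You are reviewing this as a senior backend engineer.
Respond only with the corrected function, then one bullet explaining the fix.

Error:
    AssertionError: expected 10, got 7 for [{"qty": 2, "price": 5}]

Code:
    def order_total(items):
        return sum(item["qty"] + item["price"] for item in items)

Expected behavior:
    Return the sum of qty * price across all items.
"""
```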
Splitting thinking from execution in practice
A workflow I use regularly for feature work:
- Use a reasoning model to produce a structured implementation plan with clear acceptance criteria
- Feed that plan directly as context to the execution model — it now has the "why" and produces much tighter output
- Use a fast model for review loops and small fixes
This chaining approach means the expensive model runs once, and the cheaper model does the heavy lifting with good context.
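A rough sketch of that chain, assuming a generic call_model(model, prompt) helper as a stand-in for whatever client library you actually use:

```python
# Rough sketch of the plan-then-execute chain. Model names are placeholders.
def call_model(model: str, prompt: str) -> str:
    # Stand-in for your provider's chat API; returns a canned string here
    # so the sketch runs without credentials.
    return f"[{model} output for prompt of {len(prompt)} chars]"

def plan_then_execute(feature_request: str) -> str:
    # 1. Expensive reasoning model runs once: plan with acceptance criteria.
    plan = call_model(
        "reasoning-model",
        f"Produce a step-by-step implementation plan with clear acceptance "
        f"criteria for:\n{feature_request}",
    )
    # 2. Cheaper execution model does the heavy lifting, with the plan as context.
    code = call_model(
        "execution-model",
        f"Implement the following plan. Stay within its acceptance criteria.\n\n{plan}",
    )
    # 3. Fast model handles the review loop and small fixes.
    return call_model(
        "fast-model",
        f"Review this implementation against the plan and list small fixes only."
        f"\n\nPlan:\n{plan}\n\nCode:\n{code}",
    )

print(plan_then_execute("Add rate limiting to the public API"))
```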
One pattern worth trying
When you're stuck on a task or the output quality is dropping, the problem is almost always the prompt, not the model. Switching to a more powerful model without fixing the prompt rarely helps. Reframing the problem statement and retrying on the same model usually does.