Choosing AI models effectively: how do you optimize for speed and quality?

I’ve been using GitHub Copilot more and more in my daily work, and I’m starting to feel that choosing the right model is becoming a skill in itself.

For example, I tend to use different models depending on the task:

  • More advanced models (like Opus-level reasoning) for planning, architecture, and breaking down complex problems

  • Faster / lighter models (like Sonnet, Codex-style models) for execution, coding, and iteration

This approach seems to significantly improve both speed and output quality. I also notice that when prompts are structured well, tasks finish much faster than I would expect.

What I’m curious about:

  • How do you choose which AI model to use for a given task?

  • Do you deliberately split work between “thinking” models and “execution” models?

  • Have you found specific prompts, workflows, or tricks that consistently speed things up?

  • Are there patterns you follow to get better or faster results?

Curious to hear how others are approaching this.

Tags:
AI, Kentico, Copilot, Code generation, Software development

Answers

Hi Jeroen,

Great observations — and yes, model selection is absolutely becoming a core skill. Your instinct to split "thinking" from "execution" is exactly the right mental model. Here's how I approach it:

The tiered model approach

I follow a similar split but think of it in three tiers rather than two:

  • Architect tier (Opus-level) — complex reasoning, ambiguous problem scoping, architecture decisions, anything where getting the direction wrong is expensive
  • Execution tier (Sonnet-level) — writing code, drafting content, iterating on a known solution, structured transformations
  • Autocomplete tier (Haiku/Codex-level) — inline suggestions, boilerplate, repetitive edits where latency matters more than depth

The key insight is that the cost of a wrong answer should drive model choice, not just task complexity. A fast wrong answer in the architect tier wastes far more time than a slow right one.
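
To make that concrete, here is a minimal sketch of the tiering as a routing function. The tier labels, the `Task` fields, and the `pick_model` helper are all illustrative assumptions, not a real Copilot or vendor API; the point is simply that the cost of a wrong answer is an explicit input, not an afterthought.

```python
# Illustrative sketch only: tier names and task attributes are placeholders,
# not real model identifiers or a real API.

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: str            # "low" | "medium" | "high"
    cost_of_wrong_answer: str  # "low" | "medium" | "high"

def pick_model(task: Task) -> str:
    """Route a task to a model tier, driven by how expensive a wrong answer is."""
    if task.cost_of_wrong_answer == "high":
        return "architect-tier-model"    # e.g. an Opus-level reasoning model
    if task.complexity == "high":
        return "execution-tier-model"    # e.g. a Sonnet-level model
    return "autocomplete-tier-model"     # e.g. a Haiku/Codex-level model

# An architecture decision goes to the architect tier even though the
# prompt itself is short and the task sounds "medium".
task = Task("Choose between event sourcing and CRUD for orders",
            complexity="medium", cost_of_wrong_answer="high")
print(pick_model(task))  # -> "architect-tier-model"
```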

Prompt structure matters as much as model choice

A well-structured prompt on a lighter model often beats a vague prompt on a powerful one. What consistently speeds things up for me:

  • Give the model its role upfront: "You are reviewing this as a senior backend architect"
  • Constrain the output format explicitly: "Respond only with bullet points, no explanations"
  • For coding tasks, include the error message, the relevant code snippet, and the expected behavior in one block — no back-and-forth needed
  • For complex problems, ask the model to restate the problem before solving it — this catches misunderstandings early and saves iterations
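
As a rough sketch, a structured single-shot prompt for a coding task might look like the template below. The role, format constraint, error message, and code snippet are invented placeholders; adapt them to your own stack.

```python
# Illustrative prompt template for a single-shot coding request.
# Role, output constraint, and bug details are placeholder examples.

prompt = """You are reviewing this as a senior backend architect.

Restate the problem in one sentence before solving it.
Respond only with bullet points, no explanations, then the corrected code.

Error message:
    KeyError: 'user_id'

Relevant code:
    def handler(event):
        return lookup(event["user_id"])

Expected behavior:
    A missing 'user_id' should return a 400 response, not raise.
"""

# Everything the model needs is in one block, so no back-and-forth is needed;
# send it to whichever execution-tier model you selected.
print(prompt)
```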

Splitting thinking from execution in practice

A workflow I use regularly for feature work:

  1. Use a reasoning model to produce a structured implementation plan with clear acceptance criteria
  2. Feed that plan directly as context to the execution model — it now has the "why" and produces much tighter output
  3. Use a fast model for review loops and small fixes

This chaining approach means the expensive model runs once, and the cheaper model does the heavy lifting with good context.
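
A minimal sketch of that chain, assuming a generic `complete(model, prompt)` helper standing in for whatever chat-completion client you use (the helper and the model names are placeholders, not a specific vendor's API):

```python
# Sketch of the plan -> execute -> review chain. The `complete` stub and the
# model names are hypothetical; swap in your actual client and models.

def complete(model: str, prompt: str) -> str:
    """Placeholder for a call to your chat-completion client of choice."""
    return f"[{model} output for: {prompt[:40]}...]"

feature_request = "Add rate limiting to the public API"

# 1. The expensive reasoning model runs once to produce a structured plan.
plan = complete(
    model="reasoning-model",
    prompt=f"Produce a step-by-step implementation plan with acceptance "
           f"criteria for: {feature_request}",
)

# 2. The plan is fed verbatim as context to the cheaper execution model,
#    so it has the "why" and produces much tighter output.
code = complete(
    model="execution-model",
    prompt=f"Implement the following plan exactly. Plan:\n{plan}",
)

# 3. A fast model handles the review loop and small fixes.
review = complete(
    model="fast-model",
    prompt=f"Review this change against the plan and list concrete fixes:\n{code}",
)
print(review)
```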

One pattern worth trying

When a task feels stuck or the output quality is dropping, the problem is almost always the prompt, not the model. Switching to a more powerful model without fixing the prompt rarely helps. Reframing the problem statement and retrying on the same model usually does.

There are three main attributes I consider when choosing an AI model:

  • Desired output: Am I generating code or text such as analysis, proposals, or explanations?

  • Context size: Will I work with just a few files or a large codebase, possibly using MCPs and similar tools?

  • Task complexity: How difficult and ambiguous is the task?

I use Cursor, and my typical workflow is to start with Plan mode and then move to Agent mode for execution. Plan mode is especially useful for refining prompts and aligning on the approach before implementation starts.

How I choose models

  • For text-heavy work such as analysis, writing, or planning with large context and high complexity, I usually use Opus 4.6 in both Plan and Agent mode. In my experience, lower-tier models often do not meet the quality bar I expect for this type of work.

  • For code tasks with large codebase context and high complexity, I typically use GPT 5.4 in both Plan and Agent mode. If the task is very complex, I sometimes use Opus 4.6 as a reviewer to validate the solution.

  • For code tasks with large context but lower complexity, I use GPT 5.4 for planning and Composer 2 Fast for execution.

  • For smaller codebase scope with high complexity, I stick with GPT 5.4 for both Plan and Agent mode, or occasionally use Composer 2 Fast for execution if the plan is already very clear.

  • For small scope and low to medium complexity, I often rely on Auto mode or Composer 2 Fast for both planning and execution.

In general, reasoning models are excellent for planning and for any non-trivial task. Faster execution models like Composer 2 Fast work best when the task is well defined and the plan is already solid.
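
Summarizing that decision matrix as a lookup table (the attribute keys and the `choose` helper are my own illustrative structure, not a Cursor feature; the model names come from the list above):

```python
# Rough sketch of the decision matrix above as a lookup table.
# Keys are (output, context, complexity); values are (plan model, execution model).
# The structure is illustrative only.

MODEL_MATRIX = {
    ("text", "large", "high"): ("Opus 4.6", "Opus 4.6"),
    ("code", "large", "high"): ("GPT 5.4", "GPT 5.4"),  # + Opus 4.6 as reviewer if very complex
    ("code", "large", "low"):  ("GPT 5.4", "Composer 2 Fast"),
    ("code", "small", "high"): ("GPT 5.4", "GPT 5.4"),
    ("code", "small", "low"):  ("Auto", "Composer 2 Fast"),
}

def choose(output: str, context: str, complexity: str) -> tuple[str, str]:
    """Return (plan model, execution model), defaulting to Auto mode."""
    return MODEL_MATRIX.get((output, context, complexity), ("Auto", "Auto"))

print(choose("code", "large", "low"))  # -> ('GPT 5.4', 'Composer 2 Fast')
```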

A few tips to improve speed and quality

  • Always use Plan mode when possible. It takes more time upfront, but significantly reduces rework and iteration later.

  • When I am not sure which model a task calls for, choosing the higher-tier model usually pays off: it saves time on later iterations.

  • Composer 2 Fast (Cursor specific) is very fast for straightforward execution tasks.

  • Create skills for recurring tasks. It saves time and keeps outputs consistent.
