© 2009-2026 Rock Solid Knowledge
Monday, 02 March 2026
Alex Jones
8 minute read
Context engineering is the purposeful design of everything that goes into the context window of a Large Language Model (LLM). The aim is to get as much consistency out of code generation as possible. LLMs are inherently non-deterministic, meaning their output varies across runs. We can push towards more consistent results by providing LLMs with good context and following strong, well-defined processes.
If your LLM is a CPU, context is the RAM
Imagine you’re using an LLM (like Claude Sonnet or GPT Codex) to generate some code. Your LLM is responsible for processing your request and giving you some output, but it only has so much capacity for memory, so you've got to put the right stuff in there. Just like with RAM, we don’t aim for 100% utilisation. In fact, the closer we drift towards 100% utilisation, the further we drift away from code quality.
LLMs are stateless functions; they rely purely on their inputs to produce their outputs. The only other factor affecting output quality is the model's training and tuning. Since we're unlikely to be training the model ourselves any time soon, the only way to improve what comes out of an LLM is to improve what goes into it.
Context is everything you load into the RAM:
Below is a screenshot from VS Code Copilot Chat showing the makeup of a context window. For the session, I've used 47.8k tokens of the 128k window, made up of various pieces of context: system instructions, tool definitions, messages, files, and tool results.
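To make the RAM analogy concrete, here's a quick sketch of the utilisation arithmetic. The overall figures match the screenshot above, but the per-category breakdown is illustrative:

```python
# Rough sketch: context window utilisation, using the figures from the
# screenshot above (47.8k tokens used out of a 128k window).
CONTEXT_WINDOW = 128_000

# Hypothetical per-category token counts summing to 47.8k; the real
# breakdown will vary per session.
context = {
    "system_instructions": 2_300,
    "tool_definitions": 9_500,
    "messages": 14_000,
    "files": 17_000,
    "tool_results": 5_000,
}

used = sum(context.values())
utilisation = used / CONTEXT_WINDOW
print(f"{used:,} tokens used ({utilisation:.0%} of the window)")
# → 47,800 tokens used (37% of the window)
```

The point of tracking this is not to fill the window but to leave headroom: as utilisation creeps towards 100%, output quality degrades before you hit a hard limit.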

Context can fail in four main ways. These failures were described by Drew Breunig in a blog post titled How Long Contexts Fail (2025). The post pulls information from a few sources to arrive at four categories of failure, summarised below:

- Context poisoning: a hallucination or error makes it into the context and is then repeatedly referenced.
- Context distraction: the context grows so long that the model over-focuses on it rather than drawing on what it learned in training.
- Context confusion: superfluous content in the context influences the response.
- Context clash: pieces of information in the context conflict with one another.
Some of these failures were observed on models with 1M+ token context windows. A bigger context window helps, but the underlying issue is context failure, not context window size.
Below is a chart showing the results from a 2025 study, NoLiMa: Long-Context Evaluation Beyond Literal Matching:

| Model | Context Window (K) |
| --- | --- |
| GPT-4o | 128 |
| GPT-4o mini | 128 |
| Llama 3.3 70B | 128 |
| Llama 3.1 405B | 128 |
| Gemini 1.5 Pro | 2,000 |
| Gemini 1.5 Flash | 1,000 |
| Claude 3.5 Sonnet | 200 |
The study explains that LLMs are now built to handle very long documents, but many benchmarks don’t really measure deep understanding. The tests typically performed are known as ‘needle in a haystack’ tests: a key piece of information (the needle) is inserted into a long document (the haystack), and the LLM is tasked with retrieving it. In these cases, the LLM is performing relatively simple pattern matching.
This study addresses this by hiding important information in the context in ways that don’t match the wording of the question asked of the LLM, forcing models to think more abstractly.
When 13 popular models were tested, they performed well on short texts but became much less accurate as texts grew longer and clues became less obvious.
Resolving these failures can be approached in four main ways:
Engineers are responsible for building the context: selecting files and tools, and organising instructions relevant to the task at hand. This means we contribute to context failures, and we must employ techniques for ‘focusing’ the agent. The main technique we're going to focus on is intentional compaction.
Breaking development into three parts (research, plan, implement) helps the agent: it acts as intentional compaction, at points you decide, honing the input to your agent.
It also helps you!
Creating prompts for each phase gives you control over what is generated at each phase and provides consistency in both input and output. For example, if you’re trying to fix a bug, you might break it down like so:

1. Research: investigate the reported bug and produce a document describing the root cause.
2. Plan: turn the research into a phased plan of the changes to make.
3. Implement: carry out the plan, one phase at a time.
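As a minimal sketch, a reusable prompt file for the research phase might look like the following. The file name, frontmatter fields, and tool names here are illustrative; check your editor's prompt-file documentation for the exact schema it supports:

```markdown
---
description: Research a reported bug and produce a research document
tools: ['codebase', 'search']
---
Investigate the bug described by the user. Do not write any code.
Produce a markdown research document containing:
- The research question
- The files and code paths involved
- Your current understanding of the root cause
- Open questions for the reviewer to resolve
```

Because the prompt is version-controlled alongside your code, every engineer on the team runs the research phase the same way, and the output document has a predictable shape.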
Prompting an agent to research and build a document for you to review keeps the agent focused. After reviewing the output, you can carry on the conversation with the agent to refine or expand any research. Once the document has been completed, you should be able to delete the short-term memory of the agent, i.e. clear the conversation history, or start a new session.


The research now effectively represents the problem you're looking to solve, including distillations of the conversation that you had with the agent. You should, at this point, have enough information in the research document to develop a plan.
Example Research Output:
## Research Question
A customer reports that when registering a custom `IScimValidator<User>` via the 3-parameter `AddResource<User, UserStore, ScimUserValidator>()` overload, the application throws at startup
Now that there is a research document, a new agent can come along and create a plan of action. The same principles apply here: conversationally create a document whose final output is a distillation of everything that needs to be covered during implementation.
This plan output can follow a structure that you define in your prompts. It's advisable to create plans that are broken up into phases. This will help you clearly understand what will be produced. You'll be able to commit code at the end of a phase, test a specific phase, and pick up where you left off at another time.
Example ‘Plan’ output:
## Overview
Fix two dependency injection registration bugs that cause runtime failures when users call `AddResource` with 3 type parameters or use `AddExtensionValidator`:
A note on the 'Plan' agent in VS Code and the like
This is just a prompt backing an agent. You and your organisation should, ideally, own this prompt. This gives you the finest control over how the agent behaves and avoids unexpected behaviour caused by an update. A sample of the prompt VS Code uses can be found on our GitHub.
In a standard chat, each question and piece of research consumes valuable space in the LLM's context window, potentially leading to context failure. Subagents solve this by starting with a fresh context for every task. When a subagent finishes, its temporary context is discarded. The main agent receives only the distilled outcome.
Define your task in detail (set standards, behaviour patterns) and specifically request that your main agents use subagents. The main agent will delegate the task to the required subagent instances, passing in any necessary context. The subagent will do the work in its own context window and return a summary of its outcome to the main agent. At this point, the subagent's context has been discarded, whilst the main agent has all the context it needs to continue its next task.
It's worth noting that subagents aren't for creating specific types of agents, e.g., a 'UI agent' or a 'Backend agent,' but for executing specific, possibly context-intensive tasks. Examples include using a subagent to find existing testing patterns in a solution, or to research documentation on the internet.
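The delegation pattern above can be sketched in a few lines of Python. `run_model` is a hypothetical stand-in for whatever LLM call your framework provides, not a real API:

```python
# Sketch of the subagent pattern: each subagent works in its own throwaway
# context; only its distilled summary is kept by the main agent.

def run_model(messages):
    """Hypothetical stand-in for an LLM call; returns a short summary string."""
    return f"summary of: {messages[-1]['content']}"

def run_subagent(task, shared_context):
    # Fresh context for every task: only the instructions it actually needs.
    messages = [
        {"role": "system", "content": "You are a focused research subagent."},
        {"role": "user", "content": f"{shared_context}\n\nTask: {task}"},
    ]
    summary = run_model(messages)
    # `messages` goes out of scope here -- the subagent's context is discarded.
    return summary

# The main agent keeps a small context, appending only the distilled outcomes.
main_context = [{"role": "system", "content": "You are the main coding agent."}]
for task in ["find existing testing patterns", "research library docs"]:
    outcome = run_subagent(task, shared_context="repo: example-project")
    main_context.append({"role": "user", "content": outcome})

print(len(main_context))  # grew by one short message per task, not per step
```

The design choice worth noting is that the main agent never sees the subagent's intermediate work: however many tokens the subagent burns exploring, the main context grows only by the size of the returned summary.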
The IDE you use determines the tools available to your agent. VS Code offers three kinds of tools in chat:
Agent Skills are self-contained skill packages that include instructions and bundled resources to help AI agents perform specialised tasks. Unlike simple instruction files, skills can include supporting assets such as scripts, code samples, or reference data, making them ideal for complex or repeatable engineering workflows. Skills also support progressive disclosure, ensuring they are loaded only for relevant tasks rather than unnecessarily occupying context.
A great starting point for incorporating skills into your workflow, or writing your own, is the Awesome Copilot GitHub repository.
Since the tool set is sent to the LLM in the context window, tool selection matters. Putting too many tools into the context window can flood the LLM with tokens and lead to confusion, where the LLM has so many tools to choose from that it picks the wrong one. Prompt files can be annotated with the tools they require, helping to mitigate flooding of the context window.
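One way to picture tool selection, as a hedged sketch: rather than sending every available tool with every request, send only the subset the current prompt declares it needs. All tool names below are made up for illustration:

```python
# Sketch: prune the tool set before building the request, so the model only
# sees tools relevant to the task at hand. All names are illustrative.

ALL_TOOLS = {
    "read_file":   {"description": "Read a file from the workspace"},
    "run_tests":   {"description": "Run the test suite"},
    "search_code": {"description": "Search the codebase"},
    "browse_web":  {"description": "Fetch a page from the internet"},
    "edit_file":   {"description": "Apply an edit to a file"},
}

def select_tools(required):
    """Keep only the tools a prompt file declares it needs."""
    return {name: ALL_TOOLS[name] for name in required if name in ALL_TOOLS}

# A bug-fix research prompt might declare just these two:
tools_for_request = select_tools(["read_file", "search_code"])
print(sorted(tools_for_request))  # → ['read_file', 'search_code']
```

Fewer tool definitions in the request means fewer tokens spent before the conversation even starts, and less opportunity for the model to pick the wrong tool.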
Thanks to progressive disclosure, the same overloading does not apply to skills: they are loaded just-in-time, helping to reduce context window utilisation.
As software engineers, the real advantage of context engineering comes from treating an LLM’s context window as something you intentionally design rather than a dump of information. Throughout this article, we explored how the quality of code generation depends on what you choose to include in context, how failures like poisoning, distraction, confusion, and clash emerge when context is noisy, and why larger windows do not solve these issues on their own.
The Research, Plan, Implement workflow provides a structured approach to intentionally compressing and refining context at each stage, giving the model only the minimum necessary information to succeed. Subagents help isolate complex tasks so that each one works with a clean slate, keeping the main agent focused while avoiding token bloat.
For engineers building with or alongside AI, simply put, you get better results when you treat context as an architectural responsibility, deliberately shaping the inputs that guide your tools rather than letting them accumulate unchecked.
Last updated: Friday, 06 March 2026
Alex is Head of Application Development and a full-stack software developer at Rock Solid Knowledge. He works with Angular, SQLServer, and C# on a daily basis.