© 2009-2026 Rock Solid Knowledge
Monday, 02 March 2026
Alex Jones
8 minute read
Context engineering is the purposeful design of everything that goes into the context window of a Large Language Model (LLM). The aim is to get as much consistency out of code generation as possible. LLMs are inherently non-deterministic, meaning their output varies across runs. We can push towards more consistent results by providing LLMs with good context and following strong, well-defined processes.
If your LLM is a CPU, context is the RAM
Imagine you’re using an LLM (like Claude Sonnet or GPT Codex) to generate some code. Your LLM is responsible for processing your request and giving you some output, but it only has so much capacity for memory, so you've got to put the right stuff in there. Just like with RAM, we don’t aim for 100% utilisation. In fact, the closer we drift towards 100% utilisation, the further we drift away from code quality.
LLMs are stateless functions; they rely purely on their inputs to produce their outputs. The only other factor affecting output quality is the model's training and tuning. Since we're unlikely to be training the model ourselves any time soon, the only way to improve what comes out of an LLM is to improve what goes into it.
Context is everything you load into the RAM:
Below is a screenshot from VS Code Copilot Chat showing the makeup of a context window. For the session, I've used 47.8k tokens of the 128k window, made up of various pieces of context: system instructions, tool definitions, messages, files, and tool results.
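To make the RAM analogy concrete, here's a quick sketch of the utilisation arithmetic. The overall figures match the screenshot above, but the per-category breakdown is illustrative:

```python
# Rough sketch: context window utilisation, using the figures from the
# screenshot above (47.8k tokens used out of a 128k window).
CONTEXT_WINDOW = 128_000

# Hypothetical per-category token counts summing to 47.8k; the real
# breakdown will vary per session.
context = {
    "system_instructions": 2_300,
    "tool_definitions": 9_500,
    "messages": 14_000,
    "files": 17_000,
    "tool_results": 5_000,
}

used = sum(context.values())
utilisation = used / CONTEXT_WINDOW
print(f"{used:,} tokens used ({utilisation:.0%} of the window)")
# → 47,800 tokens used (37% of the window)
```

The point of tracking this is not to fill the window but to leave headroom: as utilisation creeps towards 100%, output quality degrades before you hit a hard limit.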

Context can fail in four main ways. These failures were described by Drew Breunig in a blog post titled How Long Contexts Fail (2025). The post pulls information from a few sources to arrive at four categories of failure, summarised below:

- Context poisoning: a hallucination or error makes it into the context and is then repeatedly referenced.
- Context distraction: the context grows so long that the model over-focuses on it rather than drawing on what it learned in training.
- Context confusion: superfluous content in the context influences the response.
- Context clash: pieces of information in the context conflict with one another.
Some of these failures were observed on models with 1M+ token context windows. A bigger context window helps, but the underlying issue is context failure, not context window size.
Below is a chart showing the results from a 2025 study, NoLiMa: Long-Context Evaluation Beyond Literal Matching:

| Model | Context Window (K) |
| --- | --- |
| GPT-4o | 128 |
| GPT-4o mini | 128 |
| Llama 3.3 70B | 128 |
| Llama 3.1 405B | 128 |
| Gemini 1.5 Pro | 2,000 |
| Gemini 1.5 Flash | 1,000 |
| Claude 3.5 Sonnet | 200 |
The study explains that LLMs are now built to handle very long documents, but many benchmarks don’t really measure deep understanding. The tests typically performed are known as ‘needle in a haystack’ tests: a key piece of information (the needle) is inserted into a long document (the haystack), and the LLM is tasked with retrieving it. In these cases, the LLM is performing relatively simple pattern matching.
This study addresses this by hiding important information in the context in ways that don’t match the wording of the question asked of the LLM, forcing models to think more abstractly.
When 13 popular models were tested, they performed well on short texts but became much less accurate as texts grew longer and clues became less obvious.
Resolving these failures can be approached in four main ways:
Engineers are responsible for building the context: selecting files and tools, and organising instructions relevant to the task at hand. This means we contribute to context failures, and we must employ techniques for ‘focusing’ the agent. The main technique we're going to focus on is intentional compaction.
Breaking development into three parts (research, plan, implement) helps the agent: it acts as intentional compaction, at points you decide, honing the input to your agent.
It also helps you!
Creating prompts for each phase gives you control over what is generated at each phase and provides consistency in both input and output. For example, if you’re trying to fix a bug, you might break it down like so:

1. Research: investigate the reported bug and produce a document describing the root cause.
2. Plan: turn the research into a phased plan of the changes to make.
3. Implement: carry out the plan, one phase at a time.
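As a minimal sketch, a reusable prompt file for the research phase might look like the following. The file name, frontmatter fields, and tool names here are illustrative; check your editor's prompt-file documentation for the exact schema it supports:

```markdown
---
description: Research a reported bug and produce a research document
tools: ['codebase', 'search']
---
Investigate the bug described by the user. Do not write any code.
Produce a markdown research document containing:
- The research question
- The files and code paths involved
- Your current understanding of the root cause
- Open questions for the reviewer to resolve
```

Because the prompt is version-controlled alongside your code, every engineer on the team runs the research phase the same way, and the output document has a predictable shape.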
Prompting an agent to research and build a document for you to review keeps the agent focused. After reviewing the output, you can carry on the conversation with the agent to refine or expand any research. Once the document has been completed, you should be able to delete the short-term memory of the agent, i.e. clear the conversation history, or start a new session.


The research now effectively represents the problem you're looking to solve, including distillations of the conversation that you had with the agent. You should, at this point, have enough information in the research document to develop a plan.
Example Research Output:
## Research Question
A customer reports that when registering a custom `IScimValidator<User>` via the 3-parameter `AddResource<User, UserStore, ScimUserValidator>()` overload, the application throws at startup
Now that there is a research document, a new agent can come along and create a plan of action. The same principles apply here: conversationally create a document whose final output is a distillation of everything that needs to be covered during implementation.
This plan output can follow a structure that you define in your prompts. It's advisable to create plans that are broken up into phases. This will help you clearly understand what will be produced. You'll be able to commit code at the end of a phase, test a specific phase, and pick up where you left off at another time.
Example ‘Plan’ output:
## Overview
Fix two dependency injection registration bugs that cause runtime failures when users call `AddResource` with 3 type parameters or use `AddExtensionValidator`:
A note on the 'Plan' agent in VS Code and the like
This is just a prompt backing an agent. You and your organisation should, ideally, own this prompt. This gives you the finest control over how the agent behaves and avoids unexpected behaviour caused by an update. A sample of the prompt VS Code uses can be found on our GitHub.
In a standard chat, each question and piece of research consumes valuable space in the LLM's context window, potentially leading to context failure. Subagents solve this by starting with a fresh context for every task. When a subagent finishes, its temporary context is discarded. The main agent receives only the distilled outcome.
Define your task in detail (set standards, behaviour patterns) and specifically request that your main agents use subagents. The main agent will delegate the task to the required subagent instances, passing in any necessary context. The subagent will do the work in its own context window and return a summary of its outcome to the main agent. At this point, the subagent's context has been discarded, whilst the main agent has all the context it needs to continue its next task.
It's worth noting that subagents aren't for creating specific types of agents, e.g., a 'UI agent' or a 'Backend agent,' but for executing specific, possibly context-intensive tasks. Examples include using a subagent to find existing testing patterns in a solution, or to research documentation on the internet.
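The delegation pattern above can be sketched in a few lines of Python. `run_model` is a hypothetical stand-in for whatever LLM call your framework provides, not a real API:

```python
# Sketch of the subagent pattern: each subagent works in its own throwaway
# context; only its distilled summary is kept by the main agent.

def run_model(messages):
    """Hypothetical stand-in for an LLM call; returns a short summary string."""
    return f"summary of: {messages[-1]['content']}"

def run_subagent(task, shared_context):
    # Fresh context for every task: only the instructions it actually needs.
    messages = [
        {"role": "system", "content": "You are a focused research subagent."},
        {"role": "user", "content": f"{shared_context}\n\nTask: {task}"},
    ]
    summary = run_model(messages)
    # `messages` goes out of scope here -- the subagent's context is discarded.
    return summary

# The main agent keeps a small context, appending only the distilled outcomes.
main_context = [{"role": "system", "content": "You are the main coding agent."}]
for task in ["find existing testing patterns", "research library docs"]:
    outcome = run_subagent(task, shared_context="repo: example-project")
    main_context.append({"role": "user", "content": outcome})

print(len(main_context))  # grew by one short message per task, not per step
```

The design choice worth noting is that the main agent never sees the subagent's intermediate work: however many tokens the subagent burns exploring, the main context grows only by the size of the returned summary.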
The IDE you use determines the tools available to your agent. VS Code offers three kinds of tools in chat:
Agent Skills are self-contained skill packages that include instructions and bundled resources to help AI agents perform specialised tasks. Unlike simple instruction files, skills can include supporting assets such as scripts, code samples, or reference data, making them ideal for complex or repeatable engineering workflows. Skills also support progressive disclosure, ensuring they are loaded only for relevant tasks rather than unnecessarily occupying context.
A great starting point for incorporating skills into your workflow, or writing your own, is the Awesome Copilot GitHub repository.
Since the tool set is sent to the LLM in the context window, tool selection matters. Putting too many tools into the context window can flood the LLM with tokens and lead to confusion, where the LLM has so many tools to choose from that it picks the wrong one. Prompt files can be annotated with the tools they require, helping to mitigate flooding of the context window.
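One way to picture tool selection, as a hedged sketch: rather than sending every available tool with every request, send only the subset the current prompt declares it needs. All tool names below are made up for illustration:

```python
# Sketch: prune the tool set before building the request, so the model only
# sees tools relevant to the task at hand. All names are illustrative.

ALL_TOOLS = {
    "read_file":   {"description": "Read a file from the workspace"},
    "run_tests":   {"description": "Run the test suite"},
    "search_code": {"description": "Search the codebase"},
    "browse_web":  {"description": "Fetch a page from the internet"},
    "edit_file":   {"description": "Apply an edit to a file"},
}

def select_tools(required):
    """Keep only the tools a prompt file declares it needs."""
    return {name: ALL_TOOLS[name] for name in required if name in ALL_TOOLS}

# A bug-fix research prompt might declare just these two:
tools_for_request = select_tools(["read_file", "search_code"])
print(sorted(tools_for_request))  # → ['read_file', 'search_code']
```

Fewer tool definitions in the request means fewer tokens spent before the conversation even starts, and less opportunity for the model to pick the wrong tool.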
Thanks to progressive disclosure, the same overloading does not apply to skills: they are loaded just-in-time, helping to reduce context window utilisation.
As software engineers, the real advantage of context engineering comes from treating an LLM’s context window as something you intentionally design rather than a dump of information. Throughout this article, we explored how the quality of code generation depends on what you choose to include in context, how failures like poisoning, distraction, confusion, and clash emerge when context is noisy, and why larger windows do not solve these issues on their own.
The Research, Plan, Implement workflow provides a structured approach to intentionally compressing and refining context at each stage, giving the model only the minimum necessary information to succeed. Subagents help isolate complex tasks so that each one works with a clean slate, keeping the main agent focused while avoiding token bloat.
For engineers building with or alongside AI, simply put, you get better results when you treat context as an architectural responsibility, deliberately shaping the inputs that guide your tools rather than letting them accumulate unchecked.
Last updated: Friday, 06 March 2026
Alex is Head of Application Development and a full-stack software developer at Rock Solid Knowledge. He works with Angular, SQLServer, and C# on a daily basis.