Mastering the AI Context Window: How to Stop Hallucinations

Matan Shaviro / January 2, 2026

Introduction

We have all been there. You start a session with an AI coding tool like Cursor or Composer, and the first few answers are amazing. But 30 messages later? The quality drops, the code breaks, and the model seems to have forgotten the rules you set at the start.

The culprit is usually the Context Window.

In this post, we will explore how the context window actually works, why "truncation" kills your code quality, and the specific strategies you need to use to keep your AI acting like a Senior Engineer.

What is the Context Window?

Imagine an LLM as a giant black box with a fixed capacity for input, measured in tokens (e.g., 200,000 tokens).

When you chat with an LLM, it doesn't "remember" the conversation like a human does. Instead, every time you submit a prompt, the tool takes the entire conversation history so far, appends your new prompt, and feeds it all back into the model.
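
To make that concrete, here is a minimal sketch of what a chat tool does under the hood. The Message type and sendToModel function are stand-ins for whatever API your tool actually uses; the point is that the full transcript is rebuilt and resent on every turn.

// Minimal sketch: the tool has no memory, it just replays the transcript.
type Message = { role: "system" | "user" | "assistant"; content: string };

const history: Message[] = [
  { role: "system", content: "You are a senior engineer. Stack: Node.js CLI." },
];

async function ask(
  userPrompt: string,
  sendToModel: (msgs: Message[]) => Promise<string>, // stand-in for the real API call
): Promise<string> {
  history.push({ role: "user", content: userPrompt });
  // The ENTIRE history goes back to the model on every single turn.
  const reply = await sendToModel(history);
  history.push({ role: "assistant", content: reply });
  return reply;
}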

The Feedback Loop

This creates a loop where your input keeps growing.

  1. You send a prompt (Context size: Small).
  2. LLM replies.
  3. You reply again. The tool sends: [Old Prompt + Old Reply + New Prompt].
  4. The context window grows larger.

Eventually, you hit the limit. When you do, the system performs Truncation. It simply deletes the oldest part of the conversation to make room for the new stuff.
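
Conceptually, truncation looks something like the sketch below. The ~4 characters per token figure is a crude approximation rather than a real tokenizer, and the exact eviction policy varies by tool.

// Conceptual sketch of truncation: drop the oldest turns until the history fits.
type Message = { role: string; content: string };

const MAX_TOKENS = 200_000;

function estimateTokens(msgs: Message[]): number {
  // Rough approximation: ~4 characters per token.
  return msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function truncate(history: Message[]): Message[] {
  const kept = [...history];
  // Delete from the front -- which is exactly where your setup instructions
  // (tech stack, project structure, rules) usually live.
  while (estimateTokens(kept) > MAX_TOKENS && kept.length > 1) {
    kept.shift();
  }
  return kept;
}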

The Problem: Losing "Intellectual Control"

When truncation happens, the LLM loses the "setup" instructions you gave it at the start—things like your tech stack definitions, project structure, or documentation.

Furthermore, if you don't manage context manually, the LLM will try to guess what files are relevant by running tools like grep over your codebase. This often fills your limited window with garbage (irrelevant files, dist folders, package.json), confusing the model further.
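
The file sizes below are made up, but they illustrate the scale of the problem: a couple of relevant source files cost almost nothing, while the junk swept up by an unfiltered scan can exceed the entire window on its own.

// Made-up file sizes, just to show the scale (~4 characters per token).
const scannedFiles: Record<string, number> = {
  "src/commands.js": 3_000,   // bytes -- actually relevant
  "src/util.js": 1_200,       // relevant
  "dist/bundle.js": 900_000,  // build output -- irrelevant
  "package.json": 2_000,      // config -- irrelevant here
};

for (const [file, bytes] of Object.entries(scannedFiles)) {
  console.log(`${file}: ~${Math.ceil(bytes / 4)} tokens`);
}
// The two relevant files cost ~1,000 tokens; dist/bundle.js alone costs
// ~225,000 -- more than the whole 200k window.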

Strategies for Better AI Context

Here is how to stay in the driver's seat and manage your tokens effectively.

Strategy 1: The "New Feature, New Chat" Rule

Stop treating the chat like a never-ending thread. As your context window grows, the model becomes slower and less accurate.

The Fix: Get into the habit of making a brand new chat window whenever you switch tasks.

  • Adding a new feature? New Chat.
  • Fixing an unrelated bug? New Chat.
  • Asking a general question? New Chat.

Strategy 2: Manual Context Selection

Instead of saying "Fix the bug" and letting the AI guess where the bug is, you must provide the map. You know your codebase best.

Let's look at an example. If we want to add a pal (power) command to a CLI tool, we shouldn't just ask for it. We should explicitly tag the relevant files.

Bad Prompt vs. Good Prompt
// ❌ Bad Prompt (Relies on AI guessing/grep)
"Create a new command called pal that calculates the power of a number."

// The AI might scan your entire repo, wasting tokens and grabbing
// irrelevant files like tests or config files.

// ✅ Good Prompt (Manual Context)
"I am adding a feature to @commands.js. Please create a new command called
'pal' that performs a power operation. Reference @util.js for helper functions."

// You explicitly provided the files. The context is small, focused, and accurate.
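
For context, here is a hypothetical sketch of what the good prompt might produce. The real @commands.js and @util.js aren't shown here, so the names and shapes below are illustrative assumptions, not the actual project code.

// Hypothetical result of the "good prompt" -- names and structure are assumptions.

// Helper you might find (or add) in util.js
function power(base: number, exponent: number): number {
  return base ** exponent;
}

// The new 'pal' command, as it might be registered in commands.js
function pal(args: string[]): void {
  const [base, exponent] = args.map(Number);
  console.log(`${base}^${exponent} = ${power(base, exponent)}`);
}

pal(["2", "10"]); // prints "2^10 = 1024"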

Strategy 3: Leverage Recency Bias

LLMs suffer from Recency Bias. They give significantly more weight to the information provided at the very end of the context window.

This means your prompt (the instruction) should always be the last thing in the window, with the context (files, docs) coming before it. Most tools handle this chronologically, but it's important to remember: if you dump a massive documentation file after your question, the model might get confused.
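
In practice, that means assembling the window roughly like this sketch: attached files first, the instruction last. The file contents are placeholders; the ordering is the point.

// Sketch: context first, instruction last, so the instruction sits at the
// end of the window where the model weights it most heavily.
function buildPrompt(contextFiles: Record<string, string>, instruction: string): string {
  const contextBlock = Object.entries(contextFiles)
    .map(([name, content]) => `// File: ${name}\n${content}`)
    .join("\n\n");

  return `${contextBlock}\n\n---\n\n${instruction}`;
}

const prompt = buildPrompt(
  { "commands.js": "/* ...contents... */", "util.js": "/* ...helpers... */" },
  "Create a new command called 'pal' that performs a power operation.",
);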

Strategy 4: Auto-Compacting vs. Intentional Reset

Tools like Cursor and Claude Code often have an "Auto Compact" feature. When the limit is reached, they summarize the history to save space.

How Compacting Works
{
  "previous_state": "Full conversation history (200k tokens)",
  "action": "Auto-Compact Triggered",
  "new_state": "Summary of discussion: User added a CLI tool using Node.js. (500 tokens)"
}
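
A rough sketch of the idea (not Cursor's or Claude Code's actual implementation): once the history crosses a threshold, everything except the last few turns is replaced by a short summary produced by a separate model call.

// Conceptual sketch of auto-compacting; `summarize` stands in for a separate LLM call.
type Message = { role: "system" | "user" | "assistant"; content: string };

const COMPACT_THRESHOLD = 180_000; // leave headroom under the 200k limit

async function maybeCompact(
  history: Message[],
  summarize: (msgs: Message[]) => Promise<string>,
): Promise<Message[]> {
  const tokens = history.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  if (tokens <= COMPACT_THRESHOLD) return history;

  const recent = history.slice(-4);                       // keep the last few turns verbatim
  const summary = await summarize(history.slice(0, -4));  // compress everything older
  return [{ role: "system", content: `Summary of earlier discussion: ${summary}` }, ...recent];
}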

While useful, a summary is never as good as the raw data. Don't rely on it for complex tasks. If you see the "Compacting" warning, take it as a sign to open a new tab and drag in only the files you need.

Conclusion

The secret to getting high-quality code from LLMs isn't just about writing better prompts—it's about managing what the LLM sees.

By keeping your context window clean, manually selecting relevant files, and resetting your chat frequently, you save tokens, save money, and most importantly, you stop the AI from hallucinating. You stay in the driver's seat.
