GitHub Copilot optimizes tokens and routes models automatically
New improvements in prompt caching and intelligent model selection promise longer and more efficient sessions in VS Code
June 23, 2026 · 4 min read
TL;DR: GitHub Copilot introduces prompt caching and tool search to reduce token usage, plus an Auto system that chooses the best-suited model for each task. Together they enable longer, more complex sessions at lower cost.
GitHub has announced significant improvements in context handling and model routing for GitHub Copilot, especially in its VS Code integration. These optimizations, detailed on the official GitHub blog, center on two technical innovations: prompt caching and tool search (lazy loading of tool definitions). GitHub is also expanding its Auto system, which automatically selects the most appropriate language model based on the task and real-time model health. The goal is to make token usage more efficient in long, complex sessions where Copilot acts as an autonomous agent.
To understand the context, recall that GitHub Copilot, launched in 2021 as an autocomplete assistant, has evolved into an agent capable of planning, editing, debugging, and reviewing code. However, as sessions grow longer and more tools are integrated (MCP servers, the terminal, file operations, workspace search), token consumption rises sharply. Until now, each interaction sent the full context to the model: conversation history, tool definitions, and instructions. This was costly and limited how long a session could run. With prompt caching, the provider reuses the computation already done for a prompt prefix that repeats from one turn to the next, so unchanged history and instructions are not reprocessed from scratch. With tool search, tool definitions are loaded only when the model needs them, instead of every full schema being sent on every turn. Together, these changes allow longer and more complex sessions without a matching spike in token usage.
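The two mechanisms can be illustrated with a toy model. The Python sketch below is not Copilot's implementation, and every name in it is hypothetical: `count_tokens` is a crude word counter standing in for a real tokenizer, `PrefixCache` only counts how many tokens would have to be computed fresh when a prompt shares a prefix with an earlier one, and the tool names and schemas in `TOOLS`, along with the substring-matching `tool_search` function, are invented for illustration.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class PrefixCache:
    """Toy prompt cache keyed on prompt prefixes, one segment at a time."""
    seen_prefixes: set = field(default_factory=set)

    def fresh_tokens(self, segments: list[str]) -> int:
        """Return how many tokens must be computed from scratch for this prompt."""
        fresh = 0
        still_cached = True
        for i, segment in enumerate(segments):
            prefix = tuple(segments[: i + 1])
            if still_cached and prefix in self.seen_prefixes:
                continue  # work for this prefix is reused from an earlier turn
            still_cached = False  # everything after the first miss is recomputed
            fresh += count_tokens(segment)
            self.seen_prefixes.add(prefix)
        return fresh


# Hypothetical full tool schemas; a real agent would hold JSON schemas here.
TOOLS = {
    "run_in_terminal": "Run a shell command. params: command string, cwd string, timeout_seconds int",
    "read_file": "Read part of a file. params: path string, start_line int, end_line int",
    "search_workspace": "Search the workspace. params: query string, include_glob string, max_results int",
}


def eager_tool_tokens() -> int:
    # Without tool search: every full schema is sent on every turn.
    return sum(count_tokens(schema) for schema in TOOLS.values())


def tool_search(query: str) -> dict[str, str]:
    # With tool search (toy version): load only schemas whose name contains a query word.
    words = query.lower().split()
    return {name: schema for name, schema in TOOLS.items()
            if any(word in name for word in words)}


if __name__ == "__main__":
    cache = PrefixCache()
    system, user1 = "You are a coding agent in VS Code.", "Explain utils.py"
    print(cache.fresh_tokens([system, user1]))  # first turn: everything is new
    reply1, user2 = "utils.py defines helpers.", "Now add type hints"
    print(cache.fresh_tokens([system, user1, reply1, user2]))  # only the new turn counts
    loaded = tool_search("terminal")
    print(sum(count_tokens(s) for s in loaded.values()), "vs", eager_tool_tokens())
```

In this toy, as with prefix caching generally, reuse stops at the first segment that differs, which is why stable content such as instructions tends to be placed at the start of a prompt.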
The Auto system solves a practical dilemma: which model to use for each request? According to GitHub, no single model is best for all tasks. Auto combines the task intent (quick explanation, targeted edit, multi-file change) with the current model health to choose the most efficient model that can achieve the same result. If the task requires deep reasoning, it routes to more powerful models; otherwise, it uses lightweight models. The goal is not to sacrifice quality for cost, but to use the model that best fits the job. This is particularly relevant in a market where models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro coexist, each with different strengths. The expansion of Auto to other Copilot surfaces (such as GitHub Mobile or the web) suggests GitHub aims for a unified and optimized experience.
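A routing policy of this kind can be sketched as "pick the cheapest healthy model whose capability covers the request." The Python below is a minimal illustration under loud assumptions: the intent labels, capability tiers, relative costs, model names, and the 5% error-rate health threshold are all hypothetical and say nothing about how GitHub actually scores models.

```python
from dataclasses import dataclass

# Hypothetical minimum capability tier needed for each kind of request.
REQUIRED_TIER = {"quick_explanation": 1, "targeted_edit": 2, "multi_file_change": 3}

# Hypothetical health threshold: models failing more often than this are skipped.
MAX_ERROR_RATE = 0.05


@dataclass(frozen=True)
class Model:
    name: str
    tier: int          # 1 = lightweight ... 3 = deep reasoning
    cost: float        # relative cost per request, illustrative units
    error_rate: float  # recent failure rate, standing in for "model health"

    def is_healthy(self) -> bool:
        return self.error_rate <= MAX_ERROR_RATE


def route(intent: str, models: list[Model]) -> Model:
    """Return the cheapest healthy model that meets the intent's required tier."""
    required = REQUIRED_TIER[intent]
    candidates = [m for m in models if m.is_healthy() and m.tier >= required]
    if not candidates:
        raise RuntimeError(f"no healthy model can handle {intent!r}")
    # Prefer lower cost; on a cost tie, prefer the more capable model.
    return min(candidates, key=lambda m: (m.cost, -m.tier))


# Hypothetical model catalog.
MODELS = [
    Model("light-model", tier=1, cost=1.0, error_rate=0.01),
    Model("mid-model", tier=2, cost=3.0, error_rate=0.02),
    Model("reasoning-model", tier=3, cost=10.0, error_rate=0.01),
]

if __name__ == "__main__":
    for intent in REQUIRED_TIER:
        print(intent, "->", route(intent, MODELS).name)
```

When the cheap model is degraded, this sketch falls back to a stronger one, trading some cost for availability; a real router could also weigh latency or load.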
These improvements have direct implications for developers and companies. First, costs fall: by caching repeated context and deferring tool definitions, each session processes fewer tokens, which makes Copilot cheaper to run, especially in organizations with many developers. For example, a company with 500 developers on Copilot Business (at $19 per user per month, or $9,500 per month in total) could get more productive sessions out of the same licenses without any rise in token costs. Second, greater autonomy brings Copilot closer to being an autonomous programming agent that can handle complex tasks without constant intervention. This matters for workflows such as debugging legacy code or refactoring large projects, which previously required many back-and-forth interactions. Third, the user experience improves because developers no longer have to pick a model by hand; the system does it for them, balancing speed and quality. Finally, the impact on competitors such as Cursor, Amazon Q Developer (formerly CodeWhisperer), and Tabnine is significant. These tools will need to respond with similar innovations in token efficiency and intelligent routing to keep up. Cursor, for instance, already offers an agent system but lacks such sophisticated routing; Amazon Q Developer integrates with AWS but does not have a comparable Auto system.
From a market perspective, this evolution consolidates Copilot as an AI-assisted development platform rather than just an autocomplete tool. The ability to handle long, complex sessions with efficient token usage marks a turning point for enterprise adoption of code assistants. According to GitHub data, Copilot is already used by over 1.3 million developers and 50,000 companies. With these improvements, adoption could accelerate, especially among companies that hesitated because of cost or the limits of long sessions. Moreover, the expansion of Auto to other surfaces suggests that GitHub plans to bring these capabilities to its entire ecosystem, including GitHub Actions and Codespaces.
These improvements are already available in GitHub Copilot for VS Code, and the Auto system is being expanded to other Copilot surfaces. This is not a new language model but an optimization of the orchestration layer (the harness) that gets more out of every token. For a more detailed technical analysis, GitHub points to a VS Code article that explains how prompt caching, cache checkpoints, and provider-specific tool search are implemented. At TheVortiq, we believe this evolution is a key step toward more autonomous and efficient code assistants, and we recommend that developers try these features to evaluate their impact on productivity.