TheVortiq
Artificial Intelligence

The End of the Waste Era: Companies Ration AI Tokens

Faced with excessive use of language models for trivial tasks, companies like Salesforce and JPMorgan impose token caps to contain costs.

June 27, 2026 · 4 min read


TL;DR: Companies are rationing AI tokens because employees were using them for trivial tasks and exhausting AI budgets. The move ends the era of unlimited access and forces companies to optimize how they use language models.

What Happened?

According to a TechCrunch report, numerous companies are moving to limit employees' AI token consumption after finding that much of it goes to minor tasks such as summarizing emails or generating memes. This behavior, dubbed 'tokenmaxxing', has drained AI budgets quickly and forced organizations to ration access to large language models (LLMs). The pattern has precedents: during the dot-com boom, companies such as Enron and WorldCom also saw overuse of technology resources like bandwidth and storage lead to rationing. The key difference is that AI token costs are variable and come on top of existing infrastructure spending, which makes the problem worse.

TechCrunch reports that Salesforce, JPMorgan, and Microsoft have already implemented caps. Salesforce, according to internal sources, set a monthly limit of 100,000 tokens per user for its Einstein GPT assistant. JPMorgan restricted ChatGPT usage to tasks pre-approved by supervisors, and Microsoft cut the maximum on Azure OpenAI Service from 8,000 to 4,000 tokens per request in some plans.
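The two kinds of limits described above, a monthly per-user budget like Salesforce's and a per-request ceiling like the reduced Azure OpenAI limit, can be enforced with a simple counter. The following is a minimal Python sketch, not any vendor's implementation: `TokenQuota`, `QuotaExceeded` and `charge` are hypothetical names, the defaults just reuse the figures reported in the article, and it assumes the caller already knows how many tokens a request uses (in practice that count comes from the model's tokenizer or the provider's usage figures).

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative defaults reusing the limits reported in the article (assumption).
MONTHLY_CAP_PER_USER = 100_000   # reported Einstein GPT monthly cap per user
MAX_TOKENS_PER_REQUEST = 4_000   # reduced per-request ceiling reported for some plans


class QuotaExceeded(Exception):
    """Raised when a request would break the per-request or monthly limit."""


@dataclass
class TokenQuota:
    monthly_cap: int = MONTHLY_CAP_PER_USER
    per_request_cap: int = MAX_TOKENS_PER_REQUEST
    # (user, "YYYY-MM") -> tokens already consumed in that month
    usage: defaultdict = field(default_factory=lambda: defaultdict(int))

    def charge(self, user: str, month: str, tokens: int) -> int:
        """Record a request and return the user's remaining tokens for the month.

        Rejected requests raise QuotaExceeded and leave the usage counter unchanged.
        """
        if tokens > self.per_request_cap:
            raise QuotaExceeded(
                f"request of {tokens} tokens exceeds the {self.per_request_cap}-token limit"
            )
        used = self.usage[(user, month)]
        if used + tokens > self.monthly_cap:
            raise QuotaExceeded(
                f"{user} has only {self.monthly_cap - used} tokens left in {month}"
            )
        self.usage[(user, month)] = used + tokens
        return self.monthly_cap - self.usage[(user, month)]
```

For example, `TokenQuota().charge("alice", "2026-06", 3_500)` returns 96,500, while a single 5,000-token request raises `QuotaExceeded` under the 4,000-token ceiling. Keying usage by month means budgets reset automatically when a new month starts.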

Why Is This Important?

This shift reflects the maturing of the enterprise AI market. During the initial adoption phase, many companies offered unlimited access to tools like ChatGPT or internal assistants and underestimated the real cost per token. Now, with pay-per-use pricing and expensive APIs, they are trying to maximize return on investment. Rationing affects productivity, but it also changes how the value of AI in the workplace is assessed.

According to Gartner data, global enterprise AI spending was expected to reach $150 billion in 2025, with up to 30% of that spending wasted on non-productive uses. Tokenmaxxing is a symptom of missing usage policies. Just as 'shadow IT' during cloud adoption in the 2010s led to similar controls, 'shadow AI' now requires governance. Rationing could also stifle innovation: a McKinsey study suggests that 60% of employees who use AI for creative tasks report performance improvements, and strict limits could erase that benefit.
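Under pay-per-use pricing, small per-request costs multiply quickly across a workforce. The sketch below shows the arithmetic, assuming input and output tokens are billed at separate rates (a common API billing scheme). The prices, the function names and the workload are hypothetical placeholders, not figures from the article or from any provider.

```python
# Hypothetical placeholder prices in USD per million tokens.
# They are NOT any vendor's actual rates; substitute your contract's figures.
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billing input and output tokens separately."""
    return (
        input_tokens * PRICE_PER_MILLION_INPUT
        + output_tokens * PRICE_PER_MILLION_OUTPUT
    ) / 1_000_000


def monthly_cost(users: int, requests_per_day: int, input_tokens: int,
                 output_tokens: int, workdays: int = 21) -> float:
    """Projected monthly spend if every user sends the same kind of request daily."""
    return request_cost(input_tokens, output_tokens) * requests_per_day * workdays * users


if __name__ == "__main__":
    # Hypothetical workload: 10,000 employees each summarizing 20 emails a day.
    print(f"${monthly_cost(10_000, 20, 1_500, 300):,.2f} per month")
```

With these placeholder numbers, one request with 1,500 input and 300 output tokens costs $0.009, and 10,000 employees sending 20 such requests on each of 21 workdays add up to about $37,800 a month.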

Consequences and Recommendations

Experts expect this trend to drive the development of more specialized and efficient AI tools, as well as clearer usage policies. For readers, the key point is that unlimited AI access was a transitional phase. Companies will need to invest in training so employees can tell tasks that truly require an LLM from those that traditional methods can handle. A boom in token budgeting and usage monitoring solutions is also expected: startups like Tokeet and BudgetAI already offer dashboards that show consumption in real time and send alerts as users approach their limits.

At the market level, this could fragment the ecosystem, with generalist LLMs (GPT-4, Claude) losing ground to smaller, specialized models, such as those hosted on Hugging Face or fine-tuned open-source models. Companies like Salesforce are already developing more efficient models for specific tasks. Companies are also expected to renegotiate contracts with cloud providers, demanding volume discounts or flat rates.
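The real-time dashboards mentioned above come down to comparing consumption against a budget and alerting once per threshold. Here is a minimal Python sketch; `UsageMonitor` is a hypothetical class, and the 80% and 100% alert levels are assumptions rather than settings of any product named in the article.

```python
from dataclasses import dataclass, field


@dataclass
class UsageMonitor:
    """Tracks token consumption against a budget and reports crossed thresholds."""
    budget: int
    # Hypothetical alert levels, in percent of the budget.
    thresholds: tuple[int, ...] = (80, 100)
    consumed: int = 0
    fired: set[int] = field(default_factory=set)

    def record(self, tokens: int) -> list[str]:
        """Add consumption and return one message per threshold crossed for the first time."""
        self.consumed += tokens
        alerts = []
        for pct in sorted(self.thresholds):
            # Integer comparison avoids floating-point rounding at the boundary.
            if pct not in self.fired and self.consumed * 100 >= pct * self.budget:
                self.fired.add(pct)
                alerts.append(
                    f"{pct}% of budget reached ({self.consumed:,}/{self.budget:,} tokens)"
                )
        return alerts
```

For example, with a 100,000-token budget, recording 50,000 tokens returns no alerts, a further 35,000 returns a single 80% alert, and the 100% alert fires only once consumption reaches the full budget. Remembering which thresholds have fired keeps the monitor from repeating the same warning on every request.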

“Tokenmaxxing was brief. Now we enter the era of token rationing,” notes TechCrunch in its analysis.

Affected Companies

  • Salesforce: implemented monthly per-user caps (100,000 tokens) for Einstein GPT and is developing a lighter internal model for CRM tasks.
  • JPMorgan: restricted ChatGPT usage to approved tasks and set up a committee to review and approve new use cases.
  • Microsoft: adjusted token limits on Azure OpenAI Service, reducing the maximum from 8,000 to 4,000 tokens per request in some plans, and launched a monitoring tool called AI Usage Dashboard.
  • Amazon: is testing a department-level token allocation system for its internal CodeWhisperer assistant, according to Reuters sources.
  • Google: has limited employees' use of Bard for non-work projects and is experimenting with smaller models such as PaLM 2 Lite.
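The article does not say how Amazon's reported department-level allocation divides tokens, so the rule below is an assumption: a shared token pool split in proportion to headcount. The hypothetical `allocate_by_headcount` function uses the largest-remainder method, so rounding never creates or loses tokens.

```python
def allocate_by_headcount(total_tokens: int, headcount: dict[str, int]) -> dict[str, int]:
    """Split a token pool across departments in proportion to headcount.

    Uses the largest-remainder method so the shares always sum to total_tokens.
    """
    staff = sum(headcount.values())
    if staff <= 0:
        raise ValueError("headcount must include at least one employee")
    # Floor of each department's exact proportional share.
    shares = {dept: total_tokens * n // staff for dept, n in headcount.items()}
    # Tokens lost to rounding down; always fewer than the number of departments.
    leftover = total_tokens - sum(shares.values())
    # Give one extra token to the departments with the largest fractional parts
    # (ties keep their original order because Python's sort is stable).
    by_remainder = sorted(
        headcount, key=lambda dept: (total_tokens * headcount[dept]) % staff, reverse=True
    )
    for dept in by_remainder[:leftover]:
        shares[dept] += 1
    return shares
```

For example, splitting 1,000,000 tokens across engineering (120 people), sales (45) and legal (10) gives 685,714, 257,143 and 57,143 tokens, which sum exactly to the pool. Other weightings, such as past usage or approved use cases, would only change the proportions fed in.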

In conclusion, token rationing is a sign that generative AI is moving from experimentation to real integration into workflows. Companies that manage to balance cost control with productivity will gain competitive advantages. For individual users, the recommendation is to prioritize AI use for high-value tasks and learn to identify when an LLM is truly necessary. The market for token monitoring and optimization tools will grow rapidly, and we will see a consolidation of providers offering more efficient models. Tokenmaxxing was a brief excess; maturity brings discipline.
