AWS quintuples AgentCore quotas to scale AI agents

TL;DR: AWS has raised the default Amazon Bedrock AgentCore quotas and quintupled concurrent sessions, so most teams no longer need to request increases to scale AI agents in production. The new limits allow up to 5,000 concurrent sessions (in US East and US West) and 200 tokens per second per agent, easing the transition from pilots to enterprise deployments.

What happened?

AWS has announced a significant increase in execution quotas for Amazon Bedrock AgentCore, its service for orchestrating AI agents. The changes include:

Concurrent active sessions: from 1,000 to 5,000 in US East (N. Virginia) and US West (Oregon), and from 500 to 2,500 in other regions.
Tokens per second per agent: from 25 to 200 tokens/second in all regions.
Session creation rate for container deployments: from 100 to 400 sessions per minute.
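A few lines of arithmetic show that the increases are not uniform. Only the concurrent-session quotas are exactly 5x; the token rate rises 8x and the session creation rate 4x. The minimal Python sketch below encodes the old and new defaults as listed above and also estimates how long it would take to open a full 5,000 sessions at the new creation rate. The dictionary and function names are illustrative assumptions, not an AWS API, and the ramp estimate assumes sessions stay open and creation rate is the only constraint.

```python
# Old and new default AgentCore quotas as reported in this article.
# The names and layout are hypothetical, chosen for illustration only.
QUOTAS = {
    "concurrent_sessions_us_east_west": (1_000, 5_000),
    "concurrent_sessions_other_regions": (500, 2_500),
    "tokens_per_second_per_agent": (25, 200),
    "session_creations_per_minute": (100, 400),
}


def multipliers(quotas):
    """Return the new/old ratio for each quota."""
    return {name: new / old for name, (old, new) in quotas.items()}


def ramp_minutes(target_sessions, creations_per_minute):
    """Minutes needed to open target_sessions at a fixed creation rate."""
    return target_sessions / creations_per_minute


if __name__ == "__main__":
    for name, ratio in multipliers(QUOTAS).items():
        print(f"{name}: {ratio:g}x")
    # Filling 5,000 sessions at 400 creations per minute.
    print(f"{ramp_minutes(5_000, 400):g} minutes")
```

At the old defaults (1,000 sessions, 100 per minute) the same ramp took 10 minutes; at the new defaults a full 5,000-session ramp takes 12.5 minutes, because the creation rate grew less than the concurrency ceiling.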

These new default quotas remove the need to request increases for many workloads, a process that analysts say can take days or weeks and stall production deployments. AWS documented the change in its release notes without prior notice or a transition period, which suggests it saw demand as pressing.

Why is this important?

AWS's move responds to a clear trend: companies are moving from experimenting with AI agents to deploying them in production for many users. According to Charlie Dai, principal analyst at Forrester, “the biggest change is not the number of agents, but the shift from single-task copilots to production agents serving larger user populations.” That shift brings higher concurrency, long-running agents, and more complex orchestration patterns, all of which could exceed the previous default quotas.

Ashish Banerjee, senior analyst at Gartner, notes that the new quotas reduce operational friction when scaling AI agents from pilots to production. Meanwhile, Amit Chandak, director of analytics at Kanerika, explains that “a quota increase request in an enterprise environment means a support ticket, a business justification, and a review cycle. It's days or weeks of overhead on something that shouldn't block a deployment.”

Historically, AWS has been cautious with default quotas to prevent abuse and protect service stability. Competitive pressure from Google Cloud and Microsoft Azure, which had already adjusted similar quotas in their AI services, likely pushed AWS to be more aggressive. The 5x jump in concurrent sessions is one of the larger default-quota increases in recent cloud history, comparable to AWS Lambda's 2023 change that let functions scale concurrency up to 12 times faster.

Consequences and context

The quota increase has direct implications for system architecture: teams no longer need to design around restrictive limits, which encourages multi-agent patterns and deeper integration with enterprise systems. However, AWS cautions that more capacity means higher consumption of underlying compute and runtime resources, and therefore potentially higher operational costs. Because AWS bills for actual usage rather than for quota, companies should monitor token and session usage closely to avoid unexpected bills.
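Because billing follows consumption, a rough upper bound on token volume helps when sizing budgets. The minimal Python sketch below multiplies the per-agent token rate by agent count and duration, and keeps a running tally that flags when a team-defined budget is crossed. The function and class names and the budget value are hypothetical; real usage figures would come from AWS billing or monitoring tools, which this sketch does not call.

```python
from dataclasses import dataclass


def max_tokens(agents: int, tokens_per_second: int, seconds: int) -> int:
    """Upper bound on tokens if every agent ran at its full rate the whole time."""
    return agents * tokens_per_second * seconds


@dataclass
class TokenBudget:
    """Hypothetical running tally of token usage against a fixed budget."""

    limit: int
    used: int = 0

    def record(self, tokens: int) -> bool:
        """Add observed tokens; return True once the budget is exceeded."""
        self.used += tokens
        return self.used > self.limit


if __name__ == "__main__":
    # One agent at the new 200 tokens/s default for one hour.
    print(max_tokens(agents=1, tokens_per_second=200, seconds=3_600))  # 720000
    budget = TokenBudget(limit=1_000_000)  # illustrative budget, not an AWS value
    print(budget.record(720_000))  # False: 720,000 is under the limit
    print(budget.record(720_000))  # True: 1,440,000 exceeds the limit
```

At the old 25 tokens-per-second default, the same one-hour ceiling was 90,000 tokens, so the potential consumption of a single fully loaded agent is now eight times higher.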

This announcement adds to a broader trend of hyperscalers removing barriers to generative AI adoption. Google Cloud increased Vertex AI Agent Builder quotas by 3x in early 2024, while Azure AI Agent Service allowed up to 3,000 concurrent sessions by default. AWS's new 5,000-session default in US East and US West exceeds that Azure figure, although its 2,500-session default in other regions does not, and the move may pressure competitors to respond.

From a market perspective, this change reduces friction for startups and mid-sized companies that lack the commercial relationships with AWS needed to expedite quota increases. According to Gartner data, 60% of companies experimenting with AI agents cite quota limits as an obstacle to moving to production. With the new quotas, adoption of AI agents could accelerate in sectors such as customer service, process automation, and data analysis.

What should readers know?

The new limits took effect when they were announced and require no additional configuration.
Higher quotas apply only to Amazon Bedrock AgentCore; other AWS services may have different limits.
Companies that have already requested custom quota increases will not be negatively affected; the new default values are a floor, not a ceiling.
Monitor consumption to avoid billing surprises: AWS charges by usage, not quota.
For extreme use cases (seasonal spikes, events), it is still possible to request additional increases, but the baseline is now much higher.
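A team planning for a seasonal spike can compare its projected peak against the new defaults before deciding whether a quota-increase request is still warranted. The Python sketch below is hypothetical: the function names and the 20% safety margin are assumptions, only the concurrent-session quota is considered, and an already approved custom quota can be passed in to override the default.

```python
from typing import Optional

# Default concurrent-session quotas as reported in this article.
# us-east-1 is US East (N. Virginia); us-west-2 is US West (Oregon).
HIGH_TIER_REGIONS = {"us-east-1", "us-west-2"}
DEFAULT_SESSIONS_HIGH = 5_000
DEFAULT_SESSIONS_OTHER = 2_500


def default_session_quota(region: str) -> int:
    """Return the new default concurrent-session quota for a region."""
    if region in HIGH_TIER_REGIONS:
        return DEFAULT_SESSIONS_HIGH
    return DEFAULT_SESSIONS_OTHER


def needs_increase(
    region: str,
    projected_peak: int,
    headroom: float = 0.2,
    approved_quota: Optional[int] = None,
) -> bool:
    """True if the projected peak plus a safety margin exceeds the quota."""
    quota = approved_quota if approved_quota is not None else default_session_quota(region)
    return projected_peak * (1 + headroom) > quota


if __name__ == "__main__":
    print(needs_increase("us-east-1", 4_000))  # False: 4,800 fits under 5,000
    print(needs_increase("eu-west-1", 4_000))  # True: 4,800 exceeds 2,500
```

Keeping a margin rather than planning right up to the limit leaves room for retries and for sessions that are slow to close during a spike.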

“Higher quotas change what teams are willing to try without triggering an exception process, and that shapes architectural decisions, not just cost.” — Amit Chandak, Kanerika

In summary, AWS has removed a key bottleneck for mass adoption of AI agents. Companies should leverage this capacity to innovate, but also manage associated costs. The next move from hyperscalers will likely be in runtime cost optimization, not just quotas.