Gemini Limits Get More Flexible as Google Stops Heavy Prompts From Burning Through Quota So Fast

Google is easing up on Gemini’s usage limits after a wave of complaints from users who felt their access was being drained too quickly. The biggest change is simple but important: one heavy prompt should no longer burn through a large share of a user’s quota.

That shift matters because the newer quota system was widely seen as too easy to exhaust. After I/O 2026, Gemini moved from a prompt-count model to five-hour and weekly quotas, but many users still found themselves hitting those limits far too quickly.

Heavier tasks are being treated more carefully

The main pressure came from users running complex jobs, especially with Gemini 3.1 Pro. Large prompts and big files require much more computation than ordinary text requests, and those heavier workloads were consuming quota at a pace many users considered unreasonable.

Google now says it will cap how much of the allowance any single prompt can consume, so one request can no longer take such a large slice of it. That should help usage last across more sessions.

Another major change affects failed attempts. If Gemini returns an error, that interaction will no longer count against the quota; only successful responses will be deducted. For users working with demanding prompts, that removes one of the most frustrating parts of the old setup.

A quota bug hit video generation hardest

Josh Woodward, Google's Vice President of Google AI Studio, Gemini, and Google Labs, said on X that the company has also fixed an Omni bug that was causing some users' Gemini quotas to drain far faster than they should have. The issue was especially noticeable in video generation.

According to the update, a single video-generation prompt could use up the entire five-hour limit. Google says it is working to keep that kind of quota drain from happening again.

More visibility into what actually uses the quota

Google is also working on more detailed usage breakdowns. That matters because features such as Deep Research and other heavy tasks use far more tokens and computation than a standard text prompt.

With more granular usage data and clearer notifications, users should have a better view of what is consuming the five-hour quota and the weekly limit. That added transparency comes after many users complained that their allowance disappeared without enough explanation.

Flash-Lite gets a different treatment

One of the more notable adjustments concerns Gemini 3.1 Flash-Lite. Woodward said prompts run on that model will not count toward the usage limit.

That gives users a lighter fallback when their access to other models is constrained or while they wait for their quotas to reset. It also leaves more room for work that does not require a heavier model.

The app should feel more consistent day to day

The Gemini app is also getting small but practical changes. After a user selects a specific Gemini model, the app will now remember that choice and make it the default for the next session.

Gemini will still switch to a lighter model once a usage limit is reached. Outside of that, the app should now feel less random and more predictable in everyday use.

Google's latest adjustments are not the first since the company moved to compute-based quotas. Since the new system launched, Google has already raised the usage limits in two separate rounds, for a reported threefold increase overall.

That suggests Google is still tuning the balance between heavy AI workloads and user experience. For active Gemini users, the changes go beyond a simple quota increase: they also cover how quota is deducted, how errors are handled, how much a single prompt can consume, how Flash-Lite is treated, and how usage is surfaced for heavier tasks.

Source: www.androidpolice.com