Google’s newest coding-focused model is facing an awkward comparison. Gemini 3.5 Flash has landed behind older rivals on Android Bench while also emerging as the most expensive option in the lineup.
The result stands out because the Flash label has long suggested speed and lower cost. In this case, the latest model did not break into the top five on the Android development leaderboard.
Performance that does not match the price
Android Bench measures how well AI models handle Android development tasks. In the latest ranking, OpenAI GPT 5.5 took first place with a score of 74, setting the benchmark for the group.
GPT 5.4 and Gemini 3.1 Pro Preview followed closely behind, each scoring 72.4. Claude Opus also ranked above Gemini 3.5 Flash, adding to the pressure on Google’s newest release.
Gemini 3.5 Flash finished in sixth place with a score of 63.7. That left it more than 10 points behind GPT 5.5 and well outside the leading tier, even though Google is positioning it as a premium new model.
The pricing data is even harder for Google to justify. According to benchmark data cited by Google, Gemini 3.5 Flash used an average of 355.9 total tokens per run.
That translated into an average cost of $147.1 per run, making it the most expensive model in the ranking. For developers weighing price against output, the numbers point in an uncomfortable direction.
Google’s claims run into a different reality
Google introduced Gemini 3.5 Flash at Google I/O 2026 and described it as the most powerful Flash model it had ever built. The company also said it delivered stronger coding performance and better support for AI agents and complex workflows.
In the same presentation, Google said Gemini 3.5 Flash outperformed Gemini 3.1 Pro on several internal benchmarks. It also claimed the model could produce output up to four times faster than competing frontier models.
The Android Bench result does not fully match that picture. On this specific Android development test, the newer model did not show the advantage expected of a premium release.
That contrast is not unusual in the AI industry, where internal benchmarks and task-based public evaluations often tell different stories. But when the benchmark is focused on Android and comes from Google’s own ecosystem, the result draws even more attention.
The older model looks more practical
One of the clearest comparisons is Gemini 3.1 Pro Preview. The older Google model scored higher, and according to 9to5Google it also cost roughly one-third as much as Gemini 3.5 Flash.
For Android developers, that makes the older option look more efficient on both performance and budget. The newer model’s position is harder to defend when a previous generation delivers a better balance of quality and cost.
The rankings add pressure from competitors as well. GPT 5.5 leads the board, GPT 5.4 ties Gemini 3.1 Pro Preview at 72.4, and Claude Opus also sits ahead of Gemini 3.5 Flash.
What the leaderboard means for Android developers
For teams building Android apps, the benchmark may influence which model gets adopted in day-to-day work. A model that costs more but performs worse can be a difficult sell until clear improvements arrive.
At the same time, the result does not mean Gemini 3.5 Flash is weak across every task. It only shows that, on the Android development workload measured by Android Bench, the model has not yet met expectations.
Google may still improve the model through later updates, and attention is now shifting toward Gemini 3.5 Pro. For the moment, however, the clearest takeaway is simple: the latest Flash model is neither the strongest performer on this leaderboard nor the cheapest.
