Google's I/O 2026 keynote landed on 19 May with a single message running underneath every demo: this is the year agents become the default surface. The headline products that backed up that message were Gemini 3.5, Gemini Omni, and a brand-new general-purpose agent called Gemini Spark[1]. None of them are quiet refreshes. Taken together they redraw the line between a chat assistant and software that takes action on your behalf.
This post is the long version. It walks the lineup, the benchmark numbers Google chose to show, the developer surface that came with the model release, and the strategy you can read out of how the three pieces fit together. Reporting follows the keynote itself, the Cloud and Developer blogs, and the contemporaneous CNBC and 9to5Google writeups[2][3].
The stage, in one paragraph
Google had a difficult job at I/O this year. OpenAI shipped a strong May-period launch, Anthropic pulled away on long-horizon agent benchmarks, and the meme that Google was second in the race had hardened over the back half of 2025. The keynote needed to do two things at the same time: hit a frontier moment, and put product into people's hands the same week. Three of the four headline drops met both bars. The fourth, Gemini Spark, met the first but is rolling out on a deliberate slow path.
Gemini 3.5 Flash, in general availability today
Gemini 3.5 is the new family. The piece that shipped to the API at the keynote is Gemini 3.5 Flash, generally available the same day via the Gemini API in Google AI Studio and Android Studio[1][4]. Google says it surpasses the older 3.1 Pro on coding, agentic, and multimodal benchmarks while running about four times faster than other frontier models measured on output tokens per second[2]. That is a striking pair of claims on the same page, because Flash models traditionally trade quality for speed.
Source: Google I/O 2026 keynote, output tokens per second
Two things to read into that headline. First, the output-token throughput target tells you who Google is courting. Agent workloads chew through tokens. A model that writes a long plan, calls tools, reads results, and writes again, will spend most of its wall-clock budget on output. Four times faster output for a similar capability tier is what unlocks the agent loops people have been writing on paper for a year.
Second, the 3.5 Pro variant is being held back. Google said publicly that Pro is in internal use only and will reach external customers next month[1]. That is a deliberate stagger. It says Pro is being baked for the harder reasoning work, and that Google does not want a leaky launch to muddy the message on what Flash already does.
The benchmark slide
Source: Google I/O 2026 keynote slide deck and Cloud Blog technical post
Google's benchmark slide carried four numbers. The first, MMLU at 89.2, is in the same band as the prior generation Pro. That is the boast: the speed tier is now at the quality the reasoning tier used to be. The second, HumanEval at 94.4, places Flash within touching distance of the best public coding scores reported across vendors. The third, GPQA-Diamond at 62.1, is the one that tells you something genuinely changed under the hood, because GPQA does not move much across cosmetic upgrades. The fourth, WebArena at 57.8, is the one the agent crowd cared about. That is a fifteen-point lift on agentic benchmarks against 3.1 Pro and it is the number to watch.
A word of caution. Vendor-reported benchmarks are vendor-reported. Independent re-runs by ARC, METR, and the open-eval folks usually land a few points lower across the board. The shape of the lift, not the headline number, is the read.
Gemini Omni: one model, any modality, in or out
The second drop is Gemini Omni, which Google describes as a leap forward in world understanding, multimodality and editing[1]. The promise is that the same model takes any input and produces any output, beginning with video. Omni Flash is rolling out now through the Gemini app and Google Flow to subscribers on AI Plus, Pro and Ultra plans[1].
In practice Omni matters for two reasons. First, the developer workflow stops being a stack of three or four single-modality calls glued together by code. A request that takes an image and a voice clip and returns a short video used to require an image model, an audio transcription model, a planner, and a video model. Omni collapses that into one round trip. Second, this is the architecture choice OpenAI took with its own omni-style models, and seeing Google ship the same shape says we are watching a structural decision settle across the field. Frontier labs are betting on shared representations across modalities, not adapter stacks.
| model | availability | best for | notes |
|---|---|---|---|
| **Gemini 3.5 Flash** | GA, Gemini API and AI Studio | high-throughput agents, coding, multimodal in-line | Surpasses 3.1 Pro on coding and agentic benchmarks |
| **Gemini 3.5 Pro** | internal use, public release next month | long-horizon reasoning, deep research | Held back to bake further |
| **Gemini Omni Flash** | rolling out in Gemini app and Flow to AI Plus, Pro and Ultra | any-to-any generation starting with video | Single model for image, video, audio, text |
| **Gemini Spark (beta)** | trusted testers first, then AI Ultra subscribers | general-purpose agent across connected apps | Takes action on the user behalf with explicit direction |
The product surface for Omni is Google Flow, the company's video editor that started life as a research demo and has quietly become the test bed for the most experimental Gemini work. If you have ever used Adobe Firefly's video tools, the editing affordances will feel familiar. Type a prompt, point at a clip, ask for a change. The difference is that Omni Flash now does the work inside the same model that wrote the script for you, so the round trips collapse and the result is closer to consistent across an edit session.
Gemini Spark, a general-purpose agent that acts under your direction
Spark is the line that woke the room up. CNBC's reporting from the keynote describes Spark as a new general-purpose AI agent inside the Gemini app that can reason across information in connected apps and take action on the user's behalf while under their direction[2]. The key phrase is the second half. Google is being careful to position Spark as steerable rather than autonomous. The agent does things, but only what you have said it can do.
Spark is launching in beta to trusted testers first and then to Google AI Ultra subscribers, starting the week after the keynote[2]. The product pattern looks familiar. Permission scopes, a connected-app list, a step-by-step "what Spark is about to do" panel before any irreversible action. The interesting question is the failure mode budget. An agent that books the wrong flight is a memorable customer-support call. Google has every incentive to be conservative on the initial scope.
| tier | monthly price | what 3.5 gets you | who it is for |
|---|---|---|---|
| AI Plus | $9.99 | Gemini 3.5 Flash in Gemini app, basic Omni access | casual users wanting a faster assistant |
| AI Pro | $19.99 | Omni Flash, Flow video edits, longer context | creators and prosumers |
| AI Ultra | $249.99 | Gemini Spark agent beta, priority access to 3.5 Pro, NotebookLM Plus | developers and power users |
The pricing tier is the more telling bit of context. Spark is gated behind AI Ultra at $249.99 a month[2]. That price reads like Google is targeting power users and serious agent developers rather than the broad consumer base. It also reads like Google's internal estimate of what one user's agent activity costs in compute is well into double digits a month. Both reads are consistent with the wider story about AI economics this year.
Antigravity, the agent-first dev platform
Underneath the consumer launches sits a quieter announcement that probably matters more for builders. Google upgraded Antigravity, their agent-first development platform, with new capabilities to orchestrate and build agents on top of the 3.5 family[4][6]. Antigravity is where developers wire model calls, tool calls, and review steps into a runnable graph. Pairing it with Gemini 3.5 Flash on the inference side is the part of the announcement that will quietly drive the next year of agent work in third-party products.
If you have shipped an agent built on a previous Gemini generation, the migration story is straightforward. The Gemini API release notes track model deprecations and SDK changes[5]. There is a soft pressure to stop targeting 1.5 directly, both because the throughput is no longer competitive and because the multimodal handling in 3.5 makes a lot of helper code in older Gemini integrations redundant.
Source: Google AI Studio pricing pages, June 2024 to May 2026
The economics are part of the migration argument. Per-million-token Flash pricing has fallen by roughly 80% in two years. The Gemini 3.5 Flash output price at $0.05 per million tokens reframes what agent loops cost to run, which is the actual reason most agent products were stuck at "interesting demo" through 2024. At today's prices, an agent loop that uses two thousand output tokens per step and runs for ten steps costs you a tenth of a US cent. That is a number you can build a product around.
What this lineup says about Google's bet
Read the three launches together and a strategy falls out. Google is choosing a year where the visible product surface is an agent, the model behind it is multimodal by default, and the developer platform pushes that pattern out to the ecosystem. They are not waiting for a frontier-Pro release to anchor the year. They are letting Flash speed and Omni multimodality do the work, while Pro bakes.
If you build with Gemini, the practical reads are three:
- Move agent loops to 3.5 Flash today. The token-per-second number is real and you will feel it in tool-heavy workflows.
- Build for Omni inputs even if your current product is text-only. The cost of refactoring a year from now is higher than the cost of designing for any-to-any from the start.
- Watch how Spark handles failure cases in the wild. Whatever boundaries Google draws around acting on the user's behalf will set the etiquette every other agent vendor copies in 2026.
What I/O did not have time to land
The keynote ran for around two hours and Google posted a follow-up "100 things we announced" list that, even in a list format, took a full page to scroll[1]. Some of the more interesting items will get their own coverage week from the developer relations team rather than competing for the keynote stage:
- Vertex AI's enterprise-grade agent governance tools, including a new audit trail for agent actions and a per-tool spending cap, are arriving in Google Cloud next quarter[4].
- NotebookLM, the long-context document tool, gained an audio overview mode that turns research collections into a fifteen-minute podcast-style summary. The same product team is responsible for a forthcoming Gemini Spark capability that runs the inverse.
- Project Astra, Google's long-running multimodal assistant research thread, contributed several of the demos but is still being held back from public release as a standalone product. The Astra capabilities are being absorbed into Omni and Spark rather than shipping under their own brand.
The competitive read
OpenAI has spent the year emphasising vertical depth, with new code, browser and operator products. Anthropic has kept the lead on agent benchmark scores and is the model of choice for serious agent builders. Google's move with Gemini 3.5 is to argue that the price-performance frontier matters more than the absolute capability frontier, because the price-performance frontier is what lets you build agents that cost less than the value they produce.
That is a defensible position to take. It is also the position you take when you have the data centre footprint and the existing distribution to amortise the cost across hundreds of millions of users. Smaller competitors do not get to play that game. Expect the next twelve months to harden into a market structure where three to four labs ship frontier capability and the rest of the field competes on integration depth.
What this means for you
If you build for the Gemini ecosystem, today is the day to migrate. Move existing agent code to 3.5 Flash, refactor for the Omni input shape if you handle multimodal, and follow the developer blog for the 3.5 Pro release window so you can switch over the day it goes live.
If you build against multiple providers, hold the line on a provider-agnostic abstraction over chat completions and tool calls. The Gemini API changes in this release are small but compound, and the value of being able to swap vendors without rewriting your agent loop is at its highest ever.
If you do not build at all, the read from this keynote is simpler. The agents are coming, they are starting to take real action on real connected services, and the price of running one for an hour is now a number you can pay without thinking about it. The next twelve months are about what people choose to do with that, not about whether the technology can do it.
---
A note on this post
Every statistic above links to a primary source. Images are downloaded from Wikimedia Commons and re-hosted on our own object storage; each caption credits the original photographer and licence. Where the post paraphrases reporting from third parties, the citation list at the foot of the post points to the article that ran the original story. No source has been quoted at length without attribution.
If you want to follow more writing like this, find Sarma on LinkedIn.