Google I/O 2026: Gemini Omni and 3.5 Flash Mark the Future of AI

TL;DR: At Google I/O 2026, Google announced Gemini Omni, a natively multimodal model that unifies text, image, audio, and video, and Gemini 3.5 Flash, a lightweight model built for speed and low cost. Together they make Google's AI more capable and more accessible.

What happened?

At Google I/O 2026, held on May 20 at the Shoreline Amphitheatre, Google introduced Gemini Omni, its most advanced artificial intelligence model to date. According to Google's official blog, Gemini Omni is a natively multimodal model that can process and generate text, images, audio, and video at the same time, with latency under 200 milliseconds for most tasks. Unlike earlier models that combined a separate module for each modality, Gemini Omni handles all processing in a single architecture, which Google says improves coherence and speed.

Google also announced Gemini 3.5 Flash, a lightweight, fast version of the model that offers near-instant responses at reduced cost. According to the blog, Gemini 3.5 Flash is optimized for inference on mobile devices and at the edge, with a model size 40% smaller than its predecessor, Gemini 1.5 Flash.

Why is it important?

Google presents Gemini Omni as a qualitative leap in modality integration. OpenAI's GPT-4o, introduced in May 2024, is also multimodal, and OpenAI describes it as a single model trained end to end across text, vision, and audio, so the unified approach is not unique to Google. What Google claims for Gemini Omni is a unified transformer architecture trained from scratch on multimodal data that extends to video generation and delivers deeper contextual understanding. That opens the door to assistants that see, hear, and speak at the same time, and to real-time multimedia content generation. During the demo, for example, Gemini Omni analyzed a whiteboard full of equations, narrated a live video, and translated audio into several languages simultaneously. Gemini 3.5 Flash, meanwhile, broadens access to high-performance AI: at $0.15 per million input tokens and $0.60 per million output tokens, it is 70% cheaper than Gemini 1.5 Flash, according to Google's blog. That makes it viable for high-volume applications such as customer service chatbots, interactive educational tools, and real-time voice assistants.
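To make the quoted pricing concrete, here is a minimal cost estimate. The two prices are the figures the article attributes to Google's blog; the function name and the monthly token volumes are hypothetical, chosen only for illustration.

```python
# Prices per million tokens, as quoted in the article for Gemini 3.5 Flash.
INPUT_PRICE_PER_MILLION_USD = 0.15
OUTPUT_PRICE_PER_MILLION_USD = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated bill in USD for a given token volume."""
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION_USD
        + output_tokens * OUTPUT_PRICE_PER_MILLION_USD
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a chatbot handling 50M input and 10M output
    # tokens per month.
    monthly_cost = estimate_cost_usd(50_000_000, 10_000_000)
    print(f"Estimated monthly cost: ${monthly_cost:.2f}")  # -> $13.50
```

Under these assumptions the output side, though only a fifth of the token volume, accounts for almost half of the bill, because output tokens are priced four times higher than input tokens.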

Consequences for the market and users

These launches intensify competition with OpenAI, which until now led in multimodality with GPT-4o. Google is betting on deep integration with its own ecosystem (Android, Google Workspace, Google Cloud) to offer a seamless experience. Gemini Omni will be integrated into Google Assistant for more natural interactions, into Google Photos for advanced search and editing, and into Google Workspace to generate documents, presentations, and spreadsheets from multimodal commands. For developers, the Gemini 3.5 Flash API will make it possible to build faster and cheaper applications. According to the blog, the API supports real-time audio and video streaming, which opens up possibilities in telemedicine, distance education, and entertainment. End users will see more natural and capable assistants, although concerns about privacy and the ethical use of generative AI persist. Google has announced that both models include improved safety filters and watermarking tools for generated content, but organizations such as the Electronic Frontier Foundation have said that stronger transparency guarantees are needed.

What readers should know

Gemini Omni is available on a limited basis through Google AI Studio and Vertex AI, with a free quota of 10 requests per minute. A gradual rollout to enterprise customers will begin in July 2026.
Gemini 3.5 Flash is already available to developers in preview, at prices 70% lower than the previous model. It supports 25 languages, including Spanish, and context windows of up to 1 million tokens.
Both models reinforce Google's strategy of building AI into all its products, from Search to Google Photos. Gemini Omni is expected to power the new visual search feature in Google Lens.
These advances are expected to accelerate AI adoption in sectors such as healthcare (diagnostic imaging), education (virtual tutors), and entertainment (automated content creation).
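For developers trying the free tier, the quota of 10 requests per minute mentioned above can be respected on the client side with a sliding-window throttle. This is a minimal sketch: the class and method names are hypothetical, no real Google API is called, and how the service itself counts requests is an assumption.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window limiter (hypothetical helper)."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests inside the current window

    def seconds_until_allowed(self) -> float:
        """Return how long to wait before the next request fits the quota."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The window is full: wait until the oldest request expires.
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._sent.append(self.clock())


def send_with_throttle(throttle: RequestThrottle, send_request):
    """Wait if needed, then call send_request (any zero-argument callable)."""
    delay = throttle.seconds_until_allowed()
    if delay > 0:
        time.sleep(delay)
    throttle.record()
    return send_request()
```

A caller would create one `RequestThrottle` per API key and route every call through `send_with_throttle`, passing whatever function actually performs the request.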

Context and comparisons

This announcement follows OpenAI's introduction of GPT-4o in May 2024, which also offered multimodality. Google presents Gemini Omni's unified architecture and its integration with the Google ecosystem as the main differentiators. Google has also announced that Gemini Omni outperforms GPT-4o on benchmarks such as MMLU (90.2% vs 88.7%) and on visual reasoning tasks (VQA v2.0). The race for multimodal AI is intensifying, and both companies are trying to set the standard: OpenAI through its integration with Microsoft, Google through its dominance in search, Android, and the cloud. Anthropic's Claude 3 is also a relevant comparison: it accepts image input alongside text but does not generate images, audio, or video, and it offers competitive text performance. Grand View Research estimates the multimodal AI market at $2.6 billion in 2026 and expects it to grow at a compound annual rate of 35% through 2030.
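The growth forecast can be turned into yearly figures with the standard compound-growth formula, value = base * (1 + rate) ** years. The sketch below assumes 2026 is the base year and that the 35% rate compounds annually through 2030; the function name is hypothetical, and the intermediate values are derived here, not quoted from Grand View Research.

```python
BASE_YEAR = 2026
BASE_MARKET_USD_BILLIONS = 2.6  # market size quoted for 2026
ANNUAL_GROWTH_RATE = 0.35       # compound annual growth rate quoted in the article


def projected_market_usd_billions(year: int) -> float:
    """Project the market size for a given year under constant compound growth."""
    years_elapsed = year - BASE_YEAR
    return BASE_MARKET_USD_BILLIONS * (1 + ANNUAL_GROWTH_RATE) ** years_elapsed


if __name__ == "__main__":
    for year in range(BASE_YEAR, 2031):
        print(f"{year}: ${projected_market_usd_billions(year):.2f} billion")
```

Under these assumptions, four years of 35% growth multiply the market by about 3.3, which puts the 2030 figure at roughly $8.6 billion.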

"Gemini Omni is not just a model, it's a platform for the next generation of human-machine interaction," said Sundar Pichai during the event.

In summary, Google I/O 2026 consolidates Google's vision of ubiquitous, fast, and accessible AI. The coming months will be crucial to see how these models are deployed and what impact they have on the market and daily life. With Gemini Omni and Gemini 3.5 Flash, Google not only matches OpenAI in multimodality but sets a new standard in integration and efficiency.
