The tech giant released the first model, Gemini 2.0 Flash, to global developers on December 11 through the Gemini API in Google AI Studio and Vertex AI. Users can expect Gemini 2.0 to reach Google Search and AI Overviews, with limited testing starting next week. A broader rollout is scheduled for early 2025.
With Gemini 2.0, developers can access multimodal input and text output, while early-access partners can test text-to-speech and native image generation. The Gemini app will be updated “soon” with Gemini 2.0 Flash, Google said in a press release.
General availability, along with additional model sizes beyond the base Gemini 2.0 model, is expected to follow in January.
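For developers, that access comes through the Gemini API. Below is a minimal sketch of what a Gemini 2.0 Flash call could look like with the Python google-generativeai SDK; the experimental model name gemini-2.0-flash-exp and the placeholder API key are assumptions for illustration, not details confirmed in Google's announcement.

```python
# Minimal sketch: calling Gemini 2.0 Flash through the Gemini API (Python SDK).
# The model name "gemini-2.0-flash-exp" and the API key placeholder are
# assumptions for illustration.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key created in Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Multimodal input, text output -- the combination available to all developers
# at launch (images, audio, and video can also be passed as input parts).
response = model.generate_content(
    "Summarize the key changes in Gemini 2.0 Flash in three bullets."
)
print(response.text)
```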
What is Gemini 2.0?
Gemini 2.0 is a multimodal generative AI model running on Google’s Trillium hardware. It’s designed to make online tasks easier and more intuitive by helping you summarize information, search the web, and even interact more naturally with tools or apps.
Google notes that Gemini 2.0 Flash runs at twice the speed of Gemini 1.5 Pro and outperforms it on AI benchmarks such as MMLU-Pro and LiveCodeBench.
“If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it more useful,” Google CEO Sundar Pichai said in a statement.
What sets Gemini 2.0 apart are its agentic capabilities. Pichai describes these capabilities as enabling the model to “understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”
Google further emphasizes that Gemini 2.0 distinguishes itself by:
- Multimodal processing.
- Long-context understanding, such as entire books or large sections of the web.
- Function calling.
- Native tool use.
- Complex instruction following and planning.
Native tool use allows the AI to invoke tools such as Google Search and code execution to act autonomously. In practical terms, this looks something like Google’s Project Astra, an Android app now in testing that uses the phone’s camera and Gemini’s reasoning to answer questions about the world in real time. Project Astra can analyze up to 10 minutes of video at a time.
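To make the tool-use idea concrete, here is a minimal sketch of function calling with the same Python SDK; the get_weather helper and the model name are hypothetical stand-ins, and built-in tools such as Google Search grounding and code execution are configured separately.

```python
# Minimal sketch of function calling via the Gemini API (Python SDK).
# The get_weather helper and the model name are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for this example)."""
    return f"Sunny and 21 C in {city}"

# Passing Python functions as tools lets the model decide when to call them;
# automatic function calling runs the chosen function and feeds the result back.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

response = chat.send_message("Should I bring an umbrella in Paris today?")
print(response.text)
```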
Google also announced several additional projects and prototypes.
Project Mariner
Another proof of concept is Project Mariner, an experimental Chrome extension that showcases Google’s work on letting Gemini read and reason about what is on the browser screen. Users can ask it to summarize web pages or make purchases.
“It’s still early days, but Project Mariner shows that it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and is slow to complete tasks today, something that will improve rapidly over time,” Google DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu wrote in a press release.
WATCH: Google revealed new image and video generation AI models in early December.
Deep Research
Deep Research, available with a Gemini Advanced subscription, is an experimental, web-connected research assistant designed for graduate students, scientists, and entrepreneurs to create research reports and outlines. The tool searches the web on a topic of your choice, submits a research plan for your approval or modification, and then analyzes existing work.
Jules Developer Assistant
Google also announced a new developer tool called Jules, a coding assistant powered by Gemini 2.0 Flash. Jules sits inside GitHub and can write code, fix bugs, and create and execute multi-step plans. Jules is available today to a limited pool of testers. Google expects widespread availability in early 2025.
Google is gearing up to tackle cyber threats
Google also acknowledged that Project Mariner, in particular, could be a rich target for prompt injection attacks, in which attackers hide instructions for the AI in emails, websites, or documents. The company said it is working on guardrails against these kinds of phishing and fraud attempts.