Google Gemini's New Agent Mode, Go, and Immersive View Leak in Testing—Here's What They Do

Google is quietly testing three new experimental modes inside the Gemini web app that could change how users interact with the AI assistant. Code sleuths at TestingCatalog discovered the modes (Agent Mode, Gemini Go, and Immersive View) in Gemini's mode selector, complete with unique icons and descriptive labels. The features, which haven't been formally announced, align with Google's broader push toward agentic AI and visual productivity tools, as previewed at Google I/O 2025 and in recent Android Studio updates.

The appearance of these modes signals a deliberate shift from single-turn chat to a platform that can autonomously plan and execute tasks, collaborate on creative ideation, and provide rich visual explanations. Combined with the Agent Mode already documented for Android Studio, the discoveries paint a picture of a coordinated rollout—one that will let Gemini act as a delegate, a design partner, and a multimedia tutor.

Agent Mode: From Chatbot to Autonomous Executor

The most consequential of the three, Agent Mode, carries the label "perform autonomous exploration, planning and execution." That's not a casual addition. It presages a Gemini that can accept a high-level goal—book a trip, research a topic and compile a report, or automate a multi-step web workflow—and then break it down into actions, execute them, and course-correct along the way. The dedicated icon in the selector (distinct from temporary testing toggles) suggests Google may keep this as a permanent, easily discoverable feature rather than merging it into the main chat interface.

This consumer-facing Agent Mode mirrors what Google is building for developers. According to the Android Developers Blog, Agent Mode in Android Studio—listed as "coming soon"—is designed to "handle complex, multi-stage development tasks that go beyond typical AI assistant capabilities." It can formulate an execution plan across multiple project files, add dependencies, edit code, run builds, and fix errors iteratively. The parallel design suggests a unified agentic engine under the hood, adapted for different contexts.
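The plan, execute, and fix cycle described above can be illustrated with a minimal Python sketch. This is not Google's implementation, and nothing here reflects a real Gemini or Android Studio API; `run_plan`, `execute`, `fix`, and `max_attempts` are hypothetical names chosen for illustration. It only shows the general shape of an agent loop that retries a failing step with a revised version.

```python
from typing import Callable, List, Optional


def run_plan(
    steps: List[str],
    execute: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> List[str]:
    """Execute each step; on error, ask `fix` for a revised step and retry.

    `execute` returns None on success or an error message on failure.
    `fix` takes the failing step and its error and returns a revised step.
    Raises RuntimeError if a step still fails after `max_attempts` tries.
    """
    completed = []
    for step in steps:
        current = step
        for _ in range(max_attempts):
            error = execute(current)
            if error is None:
                completed.append(current)
                break
            # Hypothetical repair hook: revise the step using the error text.
            current = fix(current, error)
        else:
            # The inner loop never hit `break`, so every attempt failed.
            raise RuntimeError(f"giving up on step: {step!r}")
    return completed
```

Capping retries with `max_attempts` is one simple way to keep such a loop from running indefinitely when a step cannot be repaired.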

At Google I/O 2025, the company demoed Project Mariner, an agent that can navigate websites on its own—filling forms, clicking buttons, and orchestrating multi-step research tasks. The TestingCatalog find suggests that Mariner-like capabilities are heading to the Gemini consumer app. For users, this could mean delegating mundane web chores to an assistant that understands their accounts and preferences. However, it also introduces serious safety and governance questions. An agent that logs into services, makes purchases, or posts on social media must operate within strict permission frameworks to avoid misuse or accidental damage. Google is likely to gate this feature carefully, with early access for paying subscribers and developer previews.

Gemini Go: Ideation and Rapid Prototyping

The Gemini Go mode is described simply as "explore ideas together," hinting at a collaborative, Canvas-centric workspace. Google has been building out Canvas—a shared multimodal surface where users can sketch, drop images, and co-edit with AI-powered generation—and a dedicated mode would streamline creative sessions. Imagine a design team launching a Go session to brainstorm product concepts: they could sketch a rough wireframe, prompt Gemini for variations, and iterate on generated assets, all within a persistent project space that saves history.

This aligns with recent Canvas expansions that allow uploading mockups, transforming UI elements with natural language, and even generating code from screenshots. In Android Studio, features like "Transform UI with Gemini" and "Compose preview generation" already let developers describe visual changes in plain language. Gemini Go would extend that fluid ideation to non-developers, bridging the gap between inspiration and a tangible prototype. Marketers drafting campaign visuals, product managers mapping user flows, or students brainstorming presentation materials could all benefit from a mode that reduces friction between asking and creating.

Immersive View: Visual Answers for Complex Questions

Immersive View promises "visual answers to your questions." That could mean step-by-step illustrated guides, annotated screenshots, or even short video explanations synthesized on the fly. Google has been laying the groundwork with Gemini Live's visual guidance, which analyzes camera input and highlights on-screen elements. Immersive View appears to be the next leap: generating visuals proactively rather than merely overlaying pointers on what the camera sees.

For troubleshooting, a user might ask, "How do I change my printer settings?" and receive an annotated visual walkthrough with arrows and highlights. In education, a student studying human anatomy could ask for a diagram of the circulatory system and get a labeled 3D rendering. The mode would turn Gemini into an on-demand visual tutor, useful for support teams, trainers, and anyone who learns better from pictures than paragraphs. Combined with the other modes, Immersive View could become the go-to tool for explaining complex concepts quickly.

A Coordinated Push Across Consumer and Developer Products

These three modes didn't appear in isolation. Google is simultaneously rolling out agentic and visual AI features in Android Studio, Workspace, and the Gemini app, suggesting a concerted strategy. The Android Developers Blog details AI features that fill in the picture: Journeys, which lets QA teams describe test scenarios in natural language and have Gemini execute them; crash analysis that suggests code fixes; Compose preview generation from images; and file-contextual prompts. All of these feed into a mission to make AI a native productivity layer, not just a side panel.

The same philosophy applies to the consumer modes. Agent Mode will likely lean on integrations with Gmail, Calendar, Docs, and Search to accomplish cross-app tasks. Gemini Go could pull in brand assets from Drive or team folders. Immersive View might draw from YouTube's millions of demonstration videos to generate bespoke visual guides. The end result is an AI workspace where users think, prototype, and act, often without leaving the Gemini interface.

Google's approach to testing these as distinct toggles is telling. Historically, the company has floated experimental capabilities as named modes before either merging them into the main UX or discarding them. Agent Mode's dedicated icon signals a higher probability of sticking around as a separate feature, especially since agentic workflows have their own interaction patterns and safety requirements. Gemini Go and Immersive View could eventually become contextually surfaced tools, but for now they offer focused sandboxes to refine the experiences.

Security, Privacy, and the Hard Parts of Agentic AI

Giving an AI the reins to act autonomously on the web or within your accounts raises the stakes dramatically. Google must address authentication scopes, consent dialogs, and error recovery. For instance, if an agent books an expensive flight by mistake, the user needs a clear undo path. Enterprises will demand data residency controls and transparency about how agent interactions are used for model training. Workspace image-editing tools recently drew criticism for lacking centralized admin toggles at launch—a reminder that governance often lags behind feature rollout.

Organizations that plan to adopt these modes should prepare now: tighten OAuth scopes, enforce two-factor authentication, define human-in-the-loop confirmation points for high-stakes actions, and test workflows in sandboxed accounts. Google's staged release pattern—subscriber gating, developer previews, and opt-in experiments—will buy some time, but the trajectory is clear.

Competitive Landscape and What It Means for Windows Users

Agentic AI is the current battleground. Microsoft's Copilot has been weaving autonomous capabilities into Office and Windows, and OpenAI's Operator and Anthropic's computer-use APIs are pushing similar boundaries. Gemini's advantage lies in its deep integration with the Google ecosystem and its multimodal foundation. For Windows users, the Gemini web app (and potential PWAs) can bridge Chrome and Edge workflows alongside Microsoft's own tools, making the agentic assistant increasingly platform-agnostic. The competition will force all players to improve safety and user experience rapidly.

Risks and Unknowns

TestingCatalog's discovery confirms active development, but no public timelines exist. Features could change significantly before launch, and some modes may never graduate from experimental status. The precise capabilities—whether agents can make financial transactions, write emails, or post on social media—remain undefined. Pricing and availability tiers are speculation. Until official documentation lands, treat the leaks as directional rather than finalized.

Practical Steps to Prepare

Audit which Google services and third-party sites an agent might access.
Review OAuth scopes and enforce strong authentication.
Set up human-approval gates for irreversible actions.
Test in a sandboxed environment using preview accounts.
Clarify data handling and model training policies with your organization.
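
As one way to picture the human-approval gate mentioned above, here is a minimal Python sketch. It assumes a hypothetical `AgentAction` record and a hypothetical `run_with_approval` helper; neither corresponds to any published Gemini interface, and a real deployment would replace the logging with actual calls and a proper confirmation dialog.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentAction:
    """One step an agent proposes to take (hypothetical structure)."""
    description: str
    irreversible: bool  # e.g. a purchase, a sent email, a public post


def run_with_approval(
    actions: List[AgentAction],
    approve: Callable[[AgentAction], bool],
) -> List[str]:
    """Run actions in order, asking `approve` before any irreversible one.

    Returns one log line per action so the run can be audited afterwards.
    """
    log = []
    for action in actions:
        if action.irreversible and not approve(action):
            log.append(f"BLOCKED: {action.description}")
            continue
        # A real agent would perform the action here; this sketch only logs it.
        log.append(f"DONE: {action.description}")
    return log


if __name__ == "__main__":
    plan = [
        AgentAction("Search flights to Lisbon", irreversible=False),
        AgentAction("Purchase ticket", irreversible=True),
    ]
    # Deny everything irreversible, as a sandboxed dry run might.
    for line in run_with_approval(plan, approve=lambda a: False):
        print(line)
```

Passing the approval check in as a callable keeps the gate testable: a sandbox run can deny every irreversible action, while production can route the same check to a human reviewer.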

Conclusion

Agent Mode, Gemini Go, and Immersive View are more than UI experiments: they're visible steps in Google's plan to turn Gemini from a conversational AI into a productive, autonomous, and visual workspace. The agentic leap offers clear efficiency gains but demands rigorous governance. Joining the dots between the leaked consumer modes and the documented developer tools suggests that Google is building a cohesive AI platform where the same agentic engine could power everything from code generation to travel planning. Users and organizations should track these developments closely, tighten their AI policies, and prepare for a future where an AI assistant doesn't just answer questions but acts on your behalf.