Microsoft has started rolling out a significant update to Microsoft 365 Copilot Chat that enables the AI assistant to analyze and ground its answers in images embedded within Word documents, PowerPoint presentations, and PDF files. The rollout began in June 2026 and is gradually reaching desktop users worldwide, marking a major step toward truly multimodal AI assistance in everyday productivity tools.

Until now, Copilot Chat could only process text content within Office files. With this new capability, users can ask questions about charts, diagrams, photographs, screenshots, and other visuals directly through the chat interface. For example, a financial analyst could query a bar chart inside a quarterly report without manually extracting the data, or a student could ask Copilot to summarize the key elements of an embedded historical map in a research paper.

How the feature works under the hood

The update leverages advanced multimodal language models—likely a variant of OpenAI’s GPT-4o or Microsoft’s proprietary Phi-4 Vision—integrated into the Microsoft 365 Copilot runtime. When a user asks a question about a document with embedded images, Copilot Chat not only processes the document’s text but also extracts visual features from the images, analyzes them, and uses the combined insights to generate a response.

Microsoft describes the grounding as bidirectional: users point Copilot at a file by uploading or selecting it, and the model in turn cites specific image elements in its answer. This means Copilot can now “see” what’s inside your documents, which adds a new dimension to search, summarization, and insight generation.

Critically, this functionality is available exclusively in Copilot Chat for Enterprise—the chat-based AI assistant that comes with certain Microsoft 365 licenses—and not in the premium Microsoft 365 Copilot (formerly known as Copilot for Microsoft 365). The rollout is phased: desktop users running the Current Channel of Microsoft 365 Apps for enterprise are receiving the update through June and July 2026, with web and mobile support expected later in the year.

The broader shift to multimodal productivity

Microsoft has been signalling its intent to make Copilot truly multimodal since the early previews of GPT-4V. In the months leading up to this release, the company integrated image analysis into Copilot for the Edge browser and Windows itself. Extending the same capability to Office documents closes a long-standing gap. Now, business reports, academic papers, and legal contracts that rely heavily on embedded charts or scanned images become fully queryable.

This aligns with an industry trend where standalone AI tools such as Google Gemini, ChatGPT, and Claude have already demonstrated visual understanding. By baking it directly into the flow of document review, Microsoft aims to reduce context-switching. Instead of screenshotting a chart and pasting it into a chat window, users can simply reference the file or even ask a question while viewing the document in Word or PowerPoint.

Real-world scenarios that become possible

The practical implications are vast. Consider a project manager reviewing a complex PowerPoint deck filled with Gantt charts and org diagrams. They could ask Copilot, “What tasks are scheduled for the week of June 15th?” and receive an answer drawn directly from the embedded images. An HR professional sifting through benefit plan PDFs could inquire, “Which plan has the lowest deductible?” based on comparison tables saved as images. Researchers analyzing published papers can extract findings from graphs without manually digitizing the data.

For regulated industries, the feature’s grounding mechanism is especially important. Because Copilot Chat respects Microsoft 365’s existing data boundaries and compliance controls, the image analysis occurs within the tenant’s trust boundary. The AI does not store or train on customer images, and answers are anchored to the document content, which reduces the risk of hallucination. Administrators can also disable the feature entirely through the Microsoft 365 admin center if their organization’s policies forbid visual data processing by AI.

Comparing with the competition

Gemini for Google Workspace (formerly Duet AI) has offered image analysis in Google Docs and Slides for months. However, Microsoft’s implementation goes deeper by grounding responses specifically in the known content of the file, not just publicly uploaded images. ChatGPT Enterprise also supports document image analysis, but it requires users to upload files manually; Copilot Chat’s advantage is its integration with SharePoint and OneDrive, which removes the friction of file exports.

Apple’s on-device intelligence offers similar visual understanding in the Notes and Mail apps but is limited to Apple’s ecosystem. Microsoft’s cross-platform footprint gives this update a broader addressable market, especially among enterprises already committed to the Microsoft 365 stack.

Performance and quality trade-offs

Early adopters in the Microsoft 365 Insiders program have reported generally high accuracy when analyzing standard charts, clearly labelled diagrams, and crisp screenshots. However, limitations remain. Copilot Chat may struggle with low-resolution images, noisy scans, or highly complex visuals like circuit diagrams with tiny text. In such cases, the assistant will typically flag the uncertainty rather than guessing.

Microsoft recommends embedding images at a resolution of at least 150 DPI and ensuring that text within images is clearly legible. For scanned documents, pre-processing with OCR (optical character recognition) improves results, though the model itself attempts to extract text from images automatically. The feature supports common image formats (JPEG, PNG, GIF, and BMP) as well as embedded vector graphics where rasterised versions exist.
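
The format and resolution guidance above can be checked before a file is shared. The sketch below is a minimal illustration and not part of any Microsoft tooling. It relies on the fact that .docx and .pptx files are ZIP archives that store embedded pictures under word/media/ and ppt/media/. The helper names (list_embedded_images, effective_dpi, meets_recommendation) and the set of file extensions are assumptions made for this example.

```python
import zipfile
from pathlib import PurePosixPath

# Formats named in the article; the extension spellings are an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}
# Office Open XML packages keep embedded pictures in these folders.
MEDIA_PREFIXES = ("word/media/", "ppt/media/")
MIN_DPI = 150  # recommendation quoted in the article


def list_embedded_images(package):
    """Return (archive_path, is_supported_format) for each embedded media file.

    `package` is a path or binary file-like object for a .docx or .pptx file.
    """
    results = []
    with zipfile.ZipFile(package) as archive:
        for name in archive.namelist():
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/"):
                extension = PurePosixPath(name).suffix.lower()
                results.append((name, extension in SUPPORTED_EXTENSIONS))
    return results


def effective_dpi(pixel_width, display_width_inches):
    """Effective resolution of an image once it is scaled on the page."""
    if display_width_inches <= 0:
        raise ValueError("display width must be positive")
    return pixel_width / display_width_inches


def meets_recommendation(pixel_width, display_width_inches):
    """True when the scaled image reaches the recommended minimum DPI."""
    return effective_dpi(pixel_width, display_width_inches) >= MIN_DPI
```

As a worked example, a chart 900 pixels wide that is displayed 6 inches wide comes to 150 DPI and just meets the recommendation, while the same image stretched to 8 inches drops to 112.5 DPI.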

Security, privacy, and governance considerations

Security-conscious IT administrators will appreciate that image analysis respects the same data loss prevention (DLP) policies and sensitivity labels applied to the parent document. If a file is labelled “Confidential” and that label restricts AI access, Copilot Chat will still answer questions about the file but will not visually process its images. The feature is also automatically disabled for documents protected by Azure Rights Management with content extraction blocked.
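
The rules in this paragraph amount to a small decision table. The sketch below models them in Python purely for illustration. The DocumentPolicy fields and the image_analysis_decision function are hypothetical names, not a Microsoft API, and actual enforcement happens inside the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentPolicy:
    """Hypothetical summary of the protections applied to one file."""
    label_restricts_ai_access: bool = False  # sensitivity label restricts AI access
    rms_blocks_extraction: bool = False      # Rights Management, content extraction blocked


def image_analysis_decision(policy):
    """Return 'feature_disabled', 'no_image_processing', or 'images_processed'."""
    if policy.rms_blocks_extraction:
        # The article says the feature is automatically disabled for these files.
        return "feature_disabled"
    if policy.label_restricts_ai_access:
        # Questions are still answered, but images are not visually processed.
        return "no_image_processing"
    return "images_processed"
```

Checking the Rights Management case first reflects an assumption that the stricter protection wins when both apply; the article does not say how the two interact.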

Usage is logged in the Microsoft Purview compliance portal (formerly the Microsoft 365 compliance center), giving organizations audit trails of when and how visual data was queried. This level of governance sets it apart from consumer-grade AI tools, which often lack enterprise-grade transparency.

What this means for the future of Microsoft 365

The image-reading capability is more than just a feature update; it’s a signal of where Microsoft is heading. Multimodal AI is rapidly becoming the default interface for productivity software. Future iterations are expected to handle video frames, 3D models, and even handwriting recognition inside OneNote. Leaked Microsoft 365 Roadmap items for late 2026 hint at Copilot being able to generate entire presentations from a handwritten whiteboard photo—closing the loop between physical brainstorming and digital output.

For now, the immediate benefit is a substantial reduction in manual data extraction. Analysts who once spent hours copying numbers from a chart image into Excel can now get an instant summary. This doesn’t replace data analysts but augments them, freeing time for higher-value tasks.

User reaction and early feedback

Reaction across the Microsoft community has been largely positive, with some reservations. On the Microsoft 365 Insider forums and X (formerly Twitter), users have highlighted the time savings in reviewing investor pitch decks and academic papers. Some have noted that the feature occasionally misinterprets visual hierarchy in org charts, mixing up reporting lines. Microsoft has acknowledged these edge cases and promises continuous model fine-tuning through its flighting ring system.

There is also a lively debate about whether the feature should be extended to email attachments in Outlook. Microsoft product managers have indicated that Outlook integration is on the backlog, with a possible preview by year-end.

How to start using it today

The feature is turned on by default for Microsoft 365 E3, E5, Business Standard, and Business Premium subscribers with Copilot Chat enabled. Users need to be running Version 2309 (Build 16827.20166) or later of Microsoft 365 Apps for enterprise. To check for the update, go to File > Account > Update Options > Update Now in any Office app. Once updated, simply open a Word document, PowerPoint deck, or PDF that contains images, click the Copilot Chat icon, and ask your question.
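
The build requirement can also be checked in a script, for instance when auditing a set of machines. The sketch below assumes the version text has the form shown under File > Account, such as "Version 2309 (Build 16827.20166)". The function names are illustrative, and the minimum build is simply the one quoted above.

```python
import re

MINIMUM_BUILD = (16827, 20166)  # build quoted in the article for Version 2309

_BUILD_PATTERN = re.compile(r"Build\s+(\d+)\.(\d+)")


def parse_build(version_text):
    """Extract (build, revision) from text like 'Version 2309 (Build 16827.20166)'."""
    match = _BUILD_PATTERN.search(version_text)
    if match is None:
        raise ValueError(f"no build number found in {version_text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum_build(version_text, minimum=MINIMUM_BUILD):
    """True when the installed build is at or above the minimum."""
    # Tuples compare element by element, so the build is checked before the revision.
    return parse_build(version_text) >= minimum
```

Comparing the two numbers as integers avoids the pitfall of plain string comparison, where "9999" would sort after "16827".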

If the feature isn’t available, verify that your IT admin hasn’t disabled it via the Microsoft 365 Apps admin center. Detailed documentation is now live on Microsoft Learn, and a dedicated adoption kit is available for organizations planning a rollout.

Looking ahead: the multimodal assistant in your pocket

As Copilot evolves, the line between document content and conversation will blur further. The ability to “talk” to images inside files democratizes data access—no SQL queries or BI tools required. It also raises the bar for what users expect from an AI assistant. Microsoft’s strategy is clearly to make Copilot the default reasoning layer across all digital artefacts, not just text.

The June 2026 update is another building block in that vision. For millions of knowledge workers, the most immediate change will be a subtle but profound one: documents that truly see and answer.