Microsoft has patched a critical flaw in Microsoft 365 Copilot that allowed attackers to access and summarize enterprise files without leaving any trace in audit logs—and did so without notifying customers or assigning a CVE. The silent server-side fix, deployed on August 17, 2025, arrived amid a growing governance crisis that also includes policy enforcement failures and a sandbox privilege-escalation vulnerability, collectively undermining core enterprise security assurances.
Security researcher Zack Korman, CTO of Pistachio, discovered the audit-log evasion on July 4. By simply instructing Copilot not to include a reference link in its response, an attacker could cause the AI to retrieve and summarize a file while suppressing the AccessedResources attribute in the corresponding Purview audit record. This meant security teams monitoring for suspicious data access would see no evidence of the interaction. Korman reported it to Microsoft’s Security Response Center immediately, and the company classified it as “important” but chose not to issue a CVE or tenant advisory.
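To make the blind spot concrete, here is a minimal Python sketch of a detection rule keyed on the file references in a Copilot audit record. The record layout (`Operation`, `CopilotEventData`, `AccessedResources`, `SiteUrl`) is an assumption based on how exported Purview records are commonly shaped, and the `contoso` URL prefix is purely hypothetical. The point is only that a rule of this kind returns nothing when the resource list is empty, so a second check for Copilot interactions with no listed resources is worth keeping.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical prefix for a sensitive site; replace with your own.
SENSITIVE_PREFIXES: Tuple[str, ...] = (
    "https://contoso.sharepoint.com/sites/finance",
)


def accessed_urls(record: Dict[str, Any]) -> List[str]:
    """Collect resource URLs from a Copilot audit record (assumed schema)."""
    event = record.get("CopilotEventData") or {}
    resources = event.get("AccessedResources") or []
    urls = []
    for resource in resources:
        if isinstance(resource, dict) and resource.get("SiteUrl"):
            urls.append(resource["SiteUrl"])
    return urls


def touches_sensitive(record: Dict[str, Any],
                      prefixes: Tuple[str, ...] = SENSITIVE_PREFIXES) -> bool:
    """Typical correlation rule: fires only if a listed URL is sensitive."""
    return any(url.startswith(prefixes) for url in accessed_urls(record))


def needs_review(record: Dict[str, Any]) -> bool:
    """Fallback: a Copilot interaction that lists no resources at all."""
    return (record.get("Operation") == "CopilotInteraction"
            and not accessed_urls(record))
```

An empty resource list is also normal for prompts that never touch a file, so `needs_review` is a triage signal rather than proof of evasion.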
The audit gap has concrete consequences. Microsoft Purview audit logs are the canonical source for forensic investigations and compliance reporting around Copilot interactions. A missing file reference can break SIEM correlation rules, blind automated detection playbooks, and leave timelines too incomplete to withstand legal discovery. The vulnerability had already been demonstrated publicly at Black Hat 2024 by Zenity CTO Michael Bargury, who used a jailbreak technique to bypass Copilot’s security controls, yet it remained exploitable until the August patch.
The audit flaw is not an isolated incident. Multiple independent reports have surfaced showing that tenant-level agent governance policies are also not reliably enforced. Administrators who configured “No users can access Agent” settings found that certain agents—particularly Microsoft-published prebuilt agents and some third-party agents—remained discoverable or installable by end users. This behavior was observed after the May 2025 Copilot update wave that broadly expanded agent visibility across Microsoft 365 surfaces.
While one investigative report claimed that 107 Copilot Agents were deployed into tenants during that rollout, that specific count is not corroborated by Microsoft’s public release notes and should be treated as unverified. However, the underlying enforcement failure is real and has forced organizations into manual per-agent revocation and PowerShell blocking as compensating controls, creating significant operational overhead.
The likely root causes of policy bypass include desynchronization between the agent inventory in the admin center and the enforcement decision path, privileged provisioning flows for Microsoft-published agents that may skirt tenant-level scoping, and policy semantics that operate at the UI layer rather than as hard-deny authorization checks across all product surfaces (Teams, Outlook, web, mobile). The net result: enterprises that believed agents were blocked could still have those agents accessing sensitive SharePoint, OneDrive, or Exchange content without administrator awareness.
A third class of failure emerged from Copilot’s live Python/Jupyter environment. Researchers at Eye Security published a proof-of-concept showing that a writable directory appearing early in the process $PATH allowed an uploaded malicious binary named pgrep to be executed by a root-run watchdog script that did not use an absolute path. The attack yielded root privileges inside the container. While root in a container does not automatically mean host escape, it grants full control over that container’s filesystem and processes—a powerful foothold for lateral movement, credential harvesting, or tampering with local telemetry. Microsoft patched the environment in late July after responsible disclosure.
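The mechanism is ordinary PATH search order, and it can be checked defensively. The following Python sketch is not Eye Security's proof-of-concept; it is a small audit helper, with function names invented for illustration, that assumes a POSIX system. It shows which file a bare command name would resolve to and which writable directories sit ahead of the trusted copy on a given search path.

```python
import os
import shutil
from typing import List, Optional


def resolve(command: str, path_value: str) -> Optional[str]:
    """Return the file a bare command name resolves to on this PATH."""
    return shutil.which(command, path=path_value)


def writable_dirs_shadowing(command: str, path_value: str) -> List[str]:
    """List writable PATH directories searched before a non-writable
    directory that already provides `command`.

    Any directory returned here is a place where a planted file with the
    same name would be picked up first by a script using a bare name.
    """
    risky = []
    for directory in path_value.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        if os.access(directory, os.W_OK):
            risky.append(directory)
        elif os.access(os.path.join(directory, command), os.X_OK):
            # Reached a trusted copy; later entries cannot shadow it.
            break
    return risky
```

Calling `writable_dirs_shadowing("pgrep", os.environ["PATH"])` as the account that runs a privileged script gives a quick list of directories to remove from the path or lock down. The more robust fix is the one the disclosure implies: have the script invoke the binary by absolute path.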
The combination of these three issues (policy bypass, audit log gaps, and sandbox escalation) points to systemic weaknesses in how Microsoft’s Copilot governance layer interacts with its rapidly expanding agent ecosystem. For defenders, the practical consequences are immediate. Four measures are now essential: validating the agent inventory, testing enforcement from representative user accounts, cross-validating Purview logs against Graph activity and SharePoint read counters, and restricting file upload features.
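The cross-validation step could be sketched roughly as follows in Python. Everything about the data shape is an assumption: each Copilot event and each file read is taken to be already normalized into a dictionary with `user`, `time`, and either `resources` or `url`, and the two-minute window is an arbitrary starting point. The sketch also assumes that a secondary source records the read at all, which should be verified per tenant.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def unlisted_reads(copilot_events: List[Dict[str, Any]],
                   file_reads: List[Dict[str, Any]],
                   window: timedelta = timedelta(minutes=2)) -> List[Dict[str, Any]]:
    """Return file reads that occurred close to a Copilot interaction by
    the same user but are not listed in any of those interactions.

    copilot_events: dicts with "user", "time" (datetime), "resources" (set of URLs)
    file_reads:     dicts with "user", "time" (datetime), "url"
    """
    flagged = []
    for read in file_reads:
        nearby = [
            event for event in copilot_events
            if event["user"] == read["user"]
            and abs(read["time"] - event["time"]) <= window
        ]
        # Flag only when Copilot was active nearby and none of the
        # nearby interactions accounts for this file.
        if nearby and not any(read["url"] in event["resources"] for event in nearby):
            flagged.append(read)
    return flagged
```

Results from a join like this are leads for an analyst, not findings: a user can open a file by hand seconds after chatting with Copilot.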
Microsoft’s decision to fix the audit flaw silently, without a public advisory or CVE, has drawn sharp criticism. The fix was server-side and required no tenant action, but the flaw’s historical impact remains: audit records written before the August 17 patch may be incomplete. Organizations with high regulatory or legal requirements must now assume that Copilot audit telemetry is not fully trustworthy without secondary validation. The lack of transparency around cloud-side telemetry changes makes compliance attestation and incident reconstruction more fragile.
The sandbox root escalation also reinforces long-standing DevSecOps principles: privileged scripts must use absolute paths and drop privileges, upload interfaces must strictly validate filenames, and container isolation must assume breach. Eye Security noted that the container had limited sensitive data in their tests and that Microsoft had already fixed known breakout vectors, but the vulnerability’s existence shows that sandbox design assumptions can be brittle.
For enterprise CISOs and IT leadership, the immediate mandate is clear. Begin by exporting your Copilot Agent Inventory from the Microsoft 365 admin center and reconciling it against approved agents. Test policy enforcement using non-admin accounts to confirm that restrictions actually hide agents in all product surfaces. Where gaps are found, implement manual blocking via PowerShell and document it as a compensating control. Harden detection by augmenting Purview with secondary signals: Graph activity checks, OneDrive/SharePoint access logs, and Entra ID sign-in anomalies. Restrict who can publish agents from Copilot Studio and require approval workflows. Treat public Copilot agents as potential exposure vectors for sensitive content and enforce data classification gates for AI-bound queries.
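The first of these steps, reconciling the exported inventory against an approved list, is simple enough to script. The Python sketch below assumes the export is a CSV with `AgentId` and `Name` columns; those column names are hypothetical and should be adjusted to whatever the actual export contains.

```python
import csv
import io
from typing import Dict, List, Set


def reconcile(inventory_csv: str, approved_ids: Set[str]) -> Dict[str, List[str]]:
    """Compare an exported agent inventory with an approved list.

    Returns agent IDs present in the tenant but not approved, and
    approved IDs that no longer appear in the export.
    """
    reader = csv.DictReader(io.StringIO(inventory_csv))
    # "AgentId" is an assumed column name, not a documented one.
    deployed = {row["AgentId"] for row in reader}
    return {
        "unapproved": sorted(deployed - approved_ids),
        "missing": sorted(approved_ids - deployed),
    }
```

The `unapproved` list is the starting set for enforcement testing from a non-admin account and, where the policy does not hold, for manual blocking.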
Longer term, organizations should architect high-sensitivity workloads away from multi-tenant agent services where possible, preferring private LLM deployments. Microsoft must also be pushed to improve transparency: when a server-side fix changes telemetry semantics or historical integrity, customers deserve notification and at least a CVE-like identifier so they can assess their exposure window.
The Copilot governance failures of mid-2025 serve as a critical reminder that feature velocity in enterprise AI must be matched by robust, cross-surface governance enforcement. Productivity gains are undeniable, but they are inseparable from the trust layer that underlies them. Without decisive corrective action and stronger platform guarantees, organizations will continue to bear the compliance and security costs of manual workarounds and uncertain telemetry. The immediate technical holes have been patched, but the structural lesson endures: governance must be as distributed and resilient as the features it protects.