SURF Downgrades but Doesn't Resolve Two High-Risk Privacy Gaps in Microsoft 365 Copilot

Two of the four high privacy risks flagged in Microsoft 365 Copilot’s initial Data Protection Impact Assessment (DPIA) remain only partially addressed, the Dutch education and research network SURF has confirmed. The organization’s June 26, 2025, update reveals that while Microsoft has made concessions, risks around hallucinated personal data and opaque diagnostic telemetry persist at a medium–high level. That verdict leaves schools and universities across Europe in a holding pattern: Copilot’s productivity promises are still shadowed by compliance and trust deficits.

The DPIA’s Origins and SURF’s Role

SURF, the cooperative body for Dutch education and research institutions, commissioned the original DPIA in 2024 after Copilot’s rapid integration into Office apps. The assessment, conducted with external privacy experts, scrutinized data flows, security measures, and GDPR alignment. Published in December 2024, it identified four high-risk areas that rendered the service unsuitable for broad use in environments where personal, sensitive, and research-grade data intermix daily.

The initial findings prompted SURF to advise its members against general deployment. In parallel, SURF and the Dutch government’s Strategic Supplier Management office (SLM, after its Dutch name, Strategisch Leveranciersmanagement) began a dialogue with Microsoft, seeking technical and contractual clarifications. Microsoft provided additional documentation and commitments, which SURF evaluated in its latest update. While two of the four high risks were fully resolved, the remaining pair was only downgraded to medium–high, a status SURF uses to signal that material concerns linger.

The Two Persistent Risks: What They Mean in Practice

1. Inaccurate or Incomplete Personal Data: Hallucinations Meet Automation Bias

Generative AI systems like Copilot synthesize responses probabilistically. When processing institutional documents, they may invent plausible-sounding but false personal information — misattributed quotes, merged biographies, or fictional roles. The DPIA warns that users, particularly non-technical staff, will over-trust these outputs. That automation bias is especially dangerous in evaluative contexts: hiring, grading, research ethics reviews, or student discipline.

Key technical vectors amplify the threat:
- Copilot blends internal content with generative text without always surfacing confidence levels or exact sources.
- The default Office UI presents AI answers inline, making them appear as authoritative as a manually created spreadsheet or document.
- No consistent provenance metadata accompanies generated summaries, so recipients cannot easily verify claims.

SURF’s assessment is blunt: even a single hallucinated data point in a student record or grant proposal could cause reputational harm, unfair decisions, or regulatory penalties. Mitigation, the DPIA stresses, requires both technical controls and institutional policies that mandate human verification.

2. Diagnostic and Telemetry Data: Unknown Retention and Reidentification Risks

Diagnostic logs and telemetry are essential for debugging, but they often contain usage fingerprints, timestamps, hashed user identifiers, and document IDs. SURF’s second remaining medium–high risk centers on three unresolved questions:
- Retention duration: Microsoft’s disclosed policies are not specific or short enough for GDPR compliance in the education sector.
- Log content: The exact fields captured remain unclear, raising reidentification concerns when telemetry is combined with other data sources.
- Data subject rights barriers: Institutions may struggle to exercise deletion or access requests on behalf of users because of how telemetry is collected and stored.

In research settings — where cohorts are small and datasets unique — even pseudonymized logs can be combined with auxiliary information to single out individuals. That risk, SURF argues, is incompatible with the “data minimization” and “storage limitation” principles of GDPR.
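
The singling-out problem can be made concrete with a minimal sketch. It counts how many pseudonymized log records share each combination of quasi-identifiers; a group of size one is a record that auxiliary information could tie to a single person. The field names (`department`, `hour`, `doc_type`) and the sample records are hypothetical and chosen only for illustration. They do not describe Microsoft’s actual telemetry schema.

```python
from collections import Counter

# Hypothetical pseudonymized telemetry records; field names are assumptions
# for illustration, not a description of any vendor's actual log schema.
records = [
    {"user_hash": "a1", "department": "Physics", "hour": 9, "doc_type": "grant"},
    {"user_hash": "b2", "department": "Physics", "hour": 9, "doc_type": "grant"},
    {"user_hash": "c3", "department": "Ethics", "hour": 23, "doc_type": "review"},
    {"user_hash": "d4", "department": "Physics", "hour": 14, "doc_type": "thesis"},
]

QUASI_IDENTIFIERS = ("department", "hour", "doc_type")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def singled_out(rows, keys=QUASI_IDENTIFIERS):
    """Return the rows whose quasi-identifier combination is unique."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """Smallest group size; a value of 1 means at least one row is unique."""
    return min(group_sizes(rows, keys).values())


unique_rows = singled_out(records)
print(k_anonymity(records), [r["user_hash"] for r in unique_rows])
```

In this toy cohort, two of the four records are unique on the chosen fields even though the user identifier is hashed, which is the pattern the DPIA’s reidentification concern describes.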

Microsoft’s Position and the Broader Trust Context

Microsoft has consistently stated that customer content in Microsoft 365 is not used to train its large language models. That assurance helps for core data, but it does not address telemetry pipelines or retention windows. Moreover, the company’s recent privacy controversies, such as the now-retooled “Recall” feature for Copilot+ PCs, have left a residue of skepticism among privacy regulators. SURF’s June 2025 update notes that Microsoft has supplied new mitigation information and that the process is “going in the right direction,” but the network withholds a full green light until independent verification occurs.

The DPIA’s broader findings also spotlight a lack of transparency: Microsoft’s public and contractual explanations about personal data collected by Copilot were judged incomplete and difficult to interpret. That ambiguity undermines institutions’ ability to fulfill their own transparency obligations toward data subjects.

Legal and Compliance Implications Across Europe

Under GDPR, organizations acting as data controllers must be able to demonstrate lawful basis, purpose limitation, and transparent processing. Copilot’s opacity around diagnostic data makes such demonstrations challenging. Specific touchpoints include:
- Transparency obligations: Staff, students, and researchers are entitled to know what data is processed and why. SURF found Microsoft’s communications insufficiently clear.
- Data subject rights: The rights to rectification and erasure become hollow if diagnostic logs live in vendor-managed systems with ambiguous deletion practices.
- Data protection by design: Embedding AI into productivity apps demands provable minimization; the DPIA found gaps.

For institutions that ignore these signals, supervisory authorities could impose fines or order that processing be halted. The Netherlands’ Autoriteit Persoonsgegevens and other EU watchdogs have made AI-powered data processing a priority area.

Sector-Specific Vulnerabilities

SURF’s membership spans universities, research institutes, and vocational colleges. The DPIA deliberately limited its scope to adult students and employees because Microsoft’s paid education licenses were not available for minors at the time. Using generative AI with minors introduces separate legal and ethical duties, which SURF stresses should be evaluated independently. Even with adults, the intermixing of HR files, medical research datasets, and administrative records in shared tenancies raises the stakes of any error or unauthorized disclosure.

Practical Steps for Schools and Research Institutions

SURF’s recommendation remains clear: limit Copilot to controlled pilots until mitigations prove effective. For organizations that proceed, the DPIA outlines four layers of precaution:

1. Policy and Governance

- Create a written AI usage policy enumerating allowed and forbidden use cases.
- Require formal approval for any department-level pilot.
- Mandate human review before acting on Copilot-generated personal data.

2. Technical Configuration

- Use tenant-level controls to restrict access to specific users or units.
- Disable risky connectors and external sharing by default.
- Log Copilot activity in the institution’s own SIEM for independent audit trails.

3. Data Minimization

- Avoid feeding sensitive or special-category data into prompts.
- Prefer on-tenant retrieval with redaction before using documents as prompt context.
- Demand provenance tokens or traceability metadata from the vendor.
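
As a rough illustration of redaction before a document is used as prompt context, the sketch below replaces a few kinds of identifiers with typed placeholders. The patterns, the placeholder labels, and the student ID format (`s` followed by seven digits) are assumptions made for this example. Real deployments would likely need locale-aware detection and review by a privacy officer, and regular expressions alone will miss names and free-text identifiers.

```python
import re

# Illustrative patterns only; the labels and the student ID format are
# hypothetical and would need adapting to each institution's data.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "STUDENT_ID": re.compile(r"\bs\d{7}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text):
    """Replace each match with a typed placeholder and count replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


sample = "Contact j.devries@example.edu (s1234567) or call +31 6 1234 5678."
clean, counts = redact(sample)
print(clean)   # Contact [EMAIL] ([STUDENT_ID]) or call [PHONE].
print(counts)  # {'EMAIL': 1, 'STUDENT_ID': 1, 'PHONE': 1}
```

Returning the replacement counts alongside the text gives the institution something to log, so an auditor can later see how much personal data was stripped from each document.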

4. Contractual and Audit Clauses

- Negotiate clear retention periods for telemetry and mechanisms for data subject rights.
- Require Microsoft to supply technical descriptions and allow third-party audits.
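
Once a retention period is agreed, an institution could check exported log entries against it. The sketch below is one possible way to do that. The 30-day window, the entry structure, and the `created` field are hypothetical. The DPIA and Microsoft’s documentation, as described here, do not specify such values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contractual retention window; the 30-day figure is an
# assumption for illustration only.
RETENTION = timedelta(days=30)


def overdue(entries, now, retention=RETENTION):
    """Return entries whose 'created' timestamp is older than the window."""
    cutoff = now - retention
    return [e for e in entries if e["created"] < cutoff]


now = datetime(2025, 6, 26, tzinfo=timezone.utc)
entries = [
    {"id": "t1", "created": datetime(2025, 6, 20, tzinfo=timezone.utc)},
    {"id": "t2", "created": datetime(2025, 4, 1, tzinfo=timezone.utc)},
]
print([e["id"] for e in overdue(entries, now)])  # ['t2']
```

Any entry this check returns is evidence that the agreed retention period is not being honored and can be raised in an audit.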

Technical Mitigations That Could Close the Gap

The DPIA suggests several privacy-by-design measures that vendors like Microsoft could implement:
- Provenance-first outputs: Attach machine-readable metadata showing source documents and confidence scores.
- Explicit, short telemetry windows: Define strict retention periods for diagnostic data in EU/EEA regions, with institutional deletion options.
- Local-only processing modes: For sensitive workflows, offer inference that does not transmit payload data to global endpoints.
- Granular admin controls: Block specific data classes (e.g., student dossiers, HR files) from inclusion in prompt contexts.
- Independent attestations: Provide regular SOC/ISO-style reports specific to Copilot’s telemetry, retention, and provenance practices.
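
To make the provenance-first idea more tangible, here is a minimal sketch of what machine-readable provenance metadata might look like. The `ProvenanceRecord` and `SourceRef` types, their fields, and the 0.8 review threshold are all hypothetical. The article gives no indication that Copilot exposes metadata in this form.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceRef:
    document_id: str  # hypothetical identifier of a tenant document
    excerpt: str      # passage the generated claim is based on


@dataclass
class ProvenanceRecord:
    claim: str
    confidence: float  # assumed score between 0.0 and 1.0
    sources: list = field(default_factory=list)

    def needs_human_review(self, threshold=0.8):
        """Flag claims with no source or with confidence below the threshold."""
        return not self.sources or self.confidence < threshold


record = ProvenanceRecord(
    claim="Dr. Jansen chaired the 2023 ethics board.",
    confidence=0.55,
    sources=[SourceRef("doc-001", "Minutes list Jansen as an attendee.")],
)

# Serialize to JSON so downstream tools and auditors can read the same record.
print(json.dumps(asdict(record), indent=2))
print(record.needs_human_review())  # True
```

A record like this would let a reviewer see that the cited excerpt supports attendance only, not chairmanship, which is the kind of hallucinated personal detail the DPIA’s first medium–high risk describes.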

Remaining Concerns Even After the Update

While the downgrade from high to medium–high signals progress, experts point to persistent fault lines:
- Documentation ambiguity: Technical descriptions, especially around diagnostic log content, remain incomplete.
- Implementation drift: Commitments on paper require follow-up audits. SURF’s insistence on re-evaluation is a hedge against promises that go unfulfilled.
- Human factors: Automation bias can only be countered through institutional culture and training — no vendor patch can eliminate the tendency to trust a convincing-looking answer.
- Regulatory momentum: European data protection authorities are sharpening their focus on AI-driven processing. Institutions ignoring SURF’s guidance court enforcement risks.

A Template for Responsible AI Adoption

SURF’s approach — public, methodical, and sector-specific — offers a replicable model for educational consortia and enterprises worldwide. The DPIA moves beyond “AI is risky” to identify operational gaps, and its iterative dialogue with Microsoft demonstrates that vendor engagement can yield meaningful, if not complete, remediation. For CIOs and privacy officers, the lesson is to treat generative AI add-ons not as productivity trinkets but as high-impact data processors that demand the same rigor as any enterprise software. Until independent verification closes the two remaining gaps, caution is the only defensible posture.