Microsoft's Copilot just gained the ability to juggle multiple files in a single chat—a feature users have clamored for—but a real-world test shows the assistant still stumbles when it needs to tell one graphics card from another. In a hands-on experiment by PC Guide, Copilot confidently misidentified an RTX 5080 Founders Edition as an RTX 3090, and confused a ROG Astral card with a ROG Strix, exposing gaps in its visual reasoning that enterprises and power users should not ignore.
The multi-file upload feature, now rolling out across app and browser versions of consumer Copilot, allows users to attach up to three files (documents or images) and ask the AI to compare, contrast, or draw insights across them. It's a practical upgrade that mimics how people work: piling contracts, offer letters, or design references into one workspace and asking a single question. But the PC Guide experiment, which tasked Copilot with spotting the “odd one out” from a handful of GPU photos, reveals that the assistant's usefulness remains broad but brittle—great for high-level comparisons, unreliable for fine-grained product identification.
What the multi-file upload actually delivers
For everyday productivity, the new capability is a straightforward win. Instead of feeding files one by one and stitching context manually, users can upload, say, three versions of a contract, a couple of moodboard screenshots, or a set of textbook pages, and ask Copilot to summarize differences or create a unified plan. Microsoft announced the feature on its Copilot Discord server with the line “You asked, we shipped!”, emphasizing that the AI now “reasons across” files rather than treating them as isolated inputs. The interface adds a simple plus icon in the chat composer for file selection.
The official documentation for consumer Copilot specifies a 50 MB per-file size limit and notes that uploaded files are automatically deleted after 30 days—a retention policy that matters for privacy-conscious users. However, these limits are not uniform across Microsoft's Copilot ecosystem. Copilot Studio, designed for enterprise agent creation, accepts files up to 512 MB for knowledge ingestion, and OneDrive's Copilot-powered “Compare Files” tool explicitly handles up to five documents at once. This fragmentation means users and admins must carefully verify which product and tier they're using before relying on multi-file workflows.
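Because the limits differ by product, a small pre-upload check can catch a batch that will not fit before anyone starts a chat. The sketch below is illustrative only: the surface names, the `UploadLimits` type, and the `preflight` function are hypothetical, the figures are the ones quoted in this article, and treating 1 MB as 1024 × 1024 bytes is an assumption. Verify the numbers against Microsoft's current documentation before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass(frozen=True)
class UploadLimits:
    max_files: Optional[int]       # None = not stated in this article
    max_file_bytes: Optional[int]  # None = not stated in this article


# Figures as quoted in the article; the keys are hypothetical labels.
LIMITS = {
    "consumer_copilot": UploadLimits(max_files=3, max_file_bytes=50 * MB),
    "onedrive_compare": UploadLimits(max_files=5, max_file_bytes=None),
    "copilot_studio": UploadLimits(max_files=None, max_file_bytes=512 * MB),
}


def preflight(surface: str, file_sizes: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the batch fits."""
    limits = LIMITS[surface]
    problems = []
    if limits.max_files is not None and len(file_sizes) > limits.max_files:
        problems.append(
            f"{len(file_sizes)} files exceeds the {limits.max_files}-file limit"
        )
    if limits.max_file_bytes is not None:
        for name, size in file_sizes.items():
            if size > limits.max_file_bytes:
                problems.append(f"{name} is over the per-file size limit")
    return problems
```

Calling `preflight("consumer_copilot", {...})` with four files, one of them 60 MB, would report both the file-count and the file-size problem, while the same 60 MB file passes the Copilot Studio size check.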
The GPU test: a teachable moment
PC Guide's test exposed the divide between general vision smarts and exacting domain knowledge. Uploading photos of several graphics cards—including an Nvidia RTX 5080 Founders Edition, various partner cards, and the ROG Astral model—the journalist asked Copilot to identify the odd one out. The assistant correctly flagged the Founders Edition card as distinct from the partner models, displaying a decent grasp of visual categories. But then it called that card an RTX 3090, despite the 5080's uniquely redesigned aesthetic and clear branding. It also misnamed the ROG Astral as a ROG Strix, a different product line.
The errors occurred in both Quick Response and Smart (GPT-5) modes, ruling out a simple quality toggle issue. They underscore a deeper limitation: Copilot's vision pipeline can separate broad classes (FE vs. partner card) but lacks the granularity to read subtle design cues, badge text, and model-specific markings when those details haven't been heavily reinforced in its training data. For tasks like inventory auditing, quality control, or tech journalism, such confident mislabeling could cause real damage if outputs are taken at face value.
Why Copilot fumbles on specific hardware details
Understanding why this happens requires a peek under the hood. Multimodal AI systems like Copilot rely on a combination of image understanding models and large language models (LLMs). The vision component might detect generic objects (a GPU, a fan shroud, PCIe connectors), but distinguishing between GPU generations often hinges on tiny visual clues: a serial number, a slightly different fan or shroud shape, or a different typeface on the product badge. Unless the model has been fine-tuned on a massive, labeled dataset of tech hardware with these variations, it tends to default to the most statistically plausible label for “graphics card by Nvidia” rather than the correct sub-model.
Additionally, the grounding step—where visual features are translated into words that the LLM can reason about—may introduce noise. If the vision model is uncertain, it might output a generic description (“silver Nvidia card”), and the language model will backfill with a specific but incorrect name pulled from its training data about common GPUs. The PC Guide test results suggest that the RTX 3090, being a recognizable older flagship, is a frequent guess when the system cannot confirm a newer model's identity. This is not a Copilot-specific flaw; it's a known challenge in multimodal AI, but one that users must internalize.
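The backfill effect can be pictured with a toy calculation. This is not how Copilot is implemented; it is a minimal sketch that assumes the final label is whichever name maximizes a prior (how familiar the name is) multiplied by how well the visual evidence fits. All numbers and the `most_plausible` function are invented for illustration.

```python
# Made-up prior: how familiar each name might be from training text.
PRIOR = {"RTX 3090": 0.5, "RTX 4090": 0.3, "RTX 5080": 0.2}


def most_plausible(evidence: dict[str, float]) -> str:
    """Pick the label that maximizes prior * evidence (unnormalized)."""
    scores = {name: PRIOR[name] * evidence.get(name, 0.0) for name in PRIOR}
    return max(scores, key=scores.get)


# A vague description ("silver Nvidia card") fits every label equally well,
# so the prior alone decides the answer.
vague = {"RTX 3090": 1.0, "RTX 4090": 1.0, "RTX 5080": 1.0}

# A legible badge strongly favors one label and overrides the prior.
badge_read = {"RTX 3090": 0.05, "RTX 4090": 0.05, "RTX 5080": 0.9}

print(most_plausible(vague))       # RTX 3090
print(most_plausible(badge_read))  # RTX 5080
```

When the evidence is uninformative, the familiar older flagship wins by default; only a clearly read detail shifts the answer to the newer card.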
Where the feature shines and why it matters
Despite the GPU gaffe, multi-file uploads represent a meaningful step toward Microsoft's vision of a single workspace assistant. For knowledge workers, the ability to cross-reference a table in a PDF with a graph in a PowerPoint and then summarize both in a few seconds can shave hours off research. Creatives can drop in multiple mood images and get a synthesized vision board suggestion. Students can feed lecture notes and textbook excerpts to generate quiz questions. The feature shortens the interaction loop and retains context across files in a way that older, single-file processing never could.
On Copilot+ PCs equipped with neural processing units (NPUs), the semantic indexing and some inference run locally, bolstering both speed and privacy. This on-device capability is a big differentiator for Windows hardware, and Microsoft has been weaving it deeper into search and recommendations. Multi-file reasoning on a local NPU could eventually allow enterprises to keep sensitive documents off the cloud entirely while still using AI analysis.
Microsoft is also integrating the multi-file concept across its ecosystem. OneDrive's Compare Files feature, for example, now supports up to five files for side-by-side analysis, and Copilot Studio lets organizations ingest dozens of files into a knowledge base for agent-based Q&A. These complementary approaches suggest that Microsoft sees multi-document reasoning as a core competency, not just a chat add-on.
The trust gap: when Copilot needs a fact-checker
Accuracy is paramount when AI outputs influence business decisions. The GPU test is a microcosm of a larger trust problem: Copilot will often deliver fluent, confident answers that are factually wrong in subtle but critical ways. For a finance team comparing two earnings reports, a misread percentage could skew a strategic decision. For an IT asset manager cataloging hardware, mislabeled components could lead to incorrect inventory records. The assistant's lack of a built-in confidence indicator or provenance display (highlighting which part of the image led to its conclusion) exacerbates the issue. Without a way to quickly verify claims, users must either double-check everything manually—defeating the purpose—or blindly trust the output.
Microsoft’s current Copilot offerings also create confusion. The consumer version has one set of file size and retention rules, while Microsoft 365 Copilot, which taps into organizational data, operates under different compliance and throttling constraints. Some enterprise users report daily upload caps or file-count limits that aren't clearly documented. Admins are left guessing whether a Copilot chat session will suddenly restrict file uploads mid-workflow.
Practical advice: how to harness multi-file Copilot safely
For individuals and teams ready to experiment, a few guidelines can help avoid the pitfalls demonstrated by the PC Guide test.
- Know your Copilot flavor. Check which product page applies: consumer Copilot limits uploads to 50 MB per file with a 30-day retention policy, while OneDrive's Compare Files tool handles up to five documents at once. Copilot Studio's knowledge ingestion is a different beast entirely, accepting files up to 512 MB. Always read the support article for your exact license.
- Pilot with non-sensitive data. Before feeding anything confidential, test with mock documents or public-domain images that mimic your real use case. This reveals hallucination tendencies and helps you gauge when Copilot’s output can be trusted.
- Pair with deterministic verification. For tasks like hardware identification or contract review, follow up Copilot's summary with a source check: a GPU-Z readout or a GPU specification database query, a serial number lookup, or a manual read of key clauses. Use Copilot to accelerate the first pass, not to replace human judgment.
- Leverage the right tool for the job. Quick comparisons across a handful of files? Use consumer Copilot or OneDrive Compare. Building a persistent knowledge base for a department? Use Copilot Studio with Dataverse. Avoid forcing consumer features into enterprise scenarios they weren’t designed for.
- Audit privacy settings proactively. Even though Microsoft says uploaded files are not used for training and are deleted after 30 days, verify that your tenant’s compliance settings align. For sensitive corporate documents, consider using Copilot+ devices with on-device processing to minimize cloud exposure.
- For IT admins: gate deployment with Intune. Control which devices can access Copilot file features; test data loss prevention (DLP) interactions; monitor Copilot telemetry through your SIEM; and have a rollback plan if unexpected data flows are detected.
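The "deterministic verification" guideline above can be sketched as a simple cross-check: an AI-suggested model name is accepted only if it matches a trusted record keyed by serial number. The `verify_label` function, the record format, and the sample serials are all hypothetical; a real asset system would supply its own lookup.

```python
def _normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def verify_label(ai_label: str, serial: str, asset_records: dict[str, str]) -> str:
    """Compare an AI-suggested model name against a trusted record.

    Returns "confirmed", "mismatch", or "unverified" (no record found).
    """
    recorded = asset_records.get(serial)
    if recorded is None:
        return "unverified"
    return "confirmed" if _normalize(ai_label) == _normalize(recorded) else "mismatch"


# Hypothetical trusted records, e.g. exported from an asset database.
records = {"SN-0001": "RTX 5080 Founders Edition", "SN-0002": "ROG Astral"}

print(verify_label("RTX 3090", "SN-0001", records))    # mismatch
print(verify_label("rog  astral", "SN-0002", records))  # confirmed
print(verify_label("RTX 5080", "SN-9999", records))    # unverified
```

Anything other than "confirmed" goes to a person for review, which keeps Copilot in the role of a fast first pass rather than the final authority.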
The roadmap: what Microsoft must fix
The GPU misidentification is not just a quirky bug; it points to areas where Microsoft must invest to make Copilot a trusted enterprise tool.
- Improve fine-grained vision models. Hardware recognition is a valuable domain. Microsoft should consider training specialized visual classifiers on high-stakes categories—GPUs, medical equipment, vehicle parts—where precision matters. Partnerships with retailers or manufacturers could provide labeled image datasets.
- Clarify limits and unify the message. A transparent, easily accessible matrix showing file size, count, retention, and compliance details for every Copilot surface (consumer, M365, OneDrive, Studio) would reduce friction and boost enterprise confidence.
- Show the work. When Copilot identifies a specific model or extracts a critical figure from a document, it should highlight the visual region or text snippet that supports its claim. A confidence score, even a simple percentage, would help users calibrate trust.
- Enable tenant-level corrections. Allowing users to flag and correct misidentifications within a managed organization, and then feeding those corrections back into a tenant-specific model adapter, could turn errors into continuous improvement—without exposing data to public training.
- Build enterprise controls for NPU usage. As Copilot+ devices proliferate, IT needs admin tools to audit what data gets indexed locally, whether local indices are encrypted, and how erasure requests are handled.
Multi-file upload in Copilot is a welcome, user-driven improvement that accelerates everyday tasks and opens up creative workflows. But the PC Guide GPU test is a stark reminder that usefulness and reliability are not the same. Copilot can spot broad patterns, but it cannot yet be trusted to name the thing in front of its digital eyes. For now, treat it as a smart junior assistant—fast, helpful, but in need of a more experienced eye for the final nod of approval.