Microsoft has delivered a long-awaited expansion of its local AI capabilities, allowing Windows 11 PCs with supported Nvidia GPUs to run on-device language models without requiring a dedicated neural processing unit (NPU). The update, part of the Windows App SDK 2.2 Experimental 9 release in June 2026, opens the door for a much broader range of devices to tap into Windows’ built‑in AI features, starting with GeForce RTX 30‑series graphics cards and newer.
For the first time, developers can leverage the same local AI APIs that were previously exclusive to Copilot+ PCs—machines with specialized NPUs like the Qualcomm Snapdragon X Elite or Intel Core Ultra—on systems with discrete Nvidia GPUs. The move signals a major shift in Microsoft’s AI strategy, prioritizing hardware flexibility over proprietary silicon.
Breaking the NPU Lock
When Copilot+ PCs launched in 2024, they introduced a new class of AI‑accelerated experiences: real‑time captioning, Studio Effects, and, most notably, local language models like Phi Silica. These features relied on a dedicated NPU capable of at least 40 TOPS (trillion operations per second). Without it, the APIs simply fell back to cloud processing—if they worked at all.
The NPU requirement frustrated many enthusiasts and developers. High‑end gaming PCs with RTX 4090 GPUs boasting over 1,300 TOPS of AI compute were left running AI workloads on the CPU or in the cloud, while modest laptops with entry‑level NPUs got first‑class treatment. Microsoft’s rationale was power efficiency and a consistent user experience, but the result was an artificial barrier that the new SDK release tears down.
What the Windows App SDK 2.2 Brings
The Windows App SDK is the foundational set of libraries that modern Windows apps use for UI, runtime, and now AI. The Experimental 9 release of version 2.2 adds a new execution provider for the LocalLanguageModel API suite that routes inference requests directly to Nvidia GPUs via DirectML. Applications that integrate with Windows’ AI stack can therefore use the parallel compute of GeForce RTX graphics cards without backend‑specific code.
Key highlights include:
- Phi‑based models run locally: The same Phi‑3 and Phi‑3.5 models that power Copilot+ features now execute on GPU hardware.
- Unified API surface: Developers call the same LanguageModel and EmbeddingModel classes regardless of whether the backend is an NPU, GPU, or CPU—Windows selects the best available device automatically.
- DirectML acceleration: Under the hood, the framework uses DirectML’s optimized operators for Nvidia GPUs, achieving token rates competitive with or exceeding NPU performance on high‑end cards.
- Experimental status: The feature is opt‑in, requiring a manifest flag and the installation of the experimental SDK package. Microsoft warns that APIs may change before general availability.
For users, this means AI‑powered apps like text summarizers, content generators, or real‑time translators can now run entirely offline on a much wider range of hardware. No cloud subscription, no data leaving the device.
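The "unified API surface" idea can be illustrated with a small Python sketch. It is not the Windows App SDK API: `LocalModelFacade`, `StubBackend`, `select_backend`, and the NPU, GPU, CPU preference order are all hypothetical, since the article does not say how Windows ranks devices. The point is only that calling code stays the same whichever backend is picked.

```python
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """Anything that can turn a prompt into text."""
    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Stand-in for a real execution provider; it only echoes the prompt."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Assumed preference order; the real selection logic is not documented here.
PREFERENCE = ("npu", "gpu", "cpu")


def select_backend(available: dict[str, Backend]) -> Backend:
    """Return the first available backend in PREFERENCE order."""
    for kind in PREFERENCE:
        if kind in available:
            return available[kind]
    raise RuntimeError("no inference backend available")


class LocalModelFacade:
    """Hypothetical facade: callers never name the device they run on."""

    def __init__(self, available: dict[str, Backend]) -> None:
        self._backend = select_backend(available)

    @property
    def backend_name(self) -> str:
        return self._backend.name

    def generate(self, prompt: str) -> str:
        return self._backend.generate(prompt)
```

In this sketch, a gaming rig exposing only `gpu` and `cpu` backends ends up on the GPU, while a machine that also exposes `npu` ends up on the NPU, and the `generate` call is identical in both cases.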
Performance and Hardware Requirements
Early benchmarks from developers who’ve tested the experimental release show impressive results. An RTX 3060 can run Phi‑3‑medium at over 30 tokens per second—plenty for interactive chat or document analysis. Higher‑end cards like the RTX 4080 push past 80 t/s, rivaling cloud API services.
But there are catches. The feature only works on Windows 11 version 24H2 or later with a recent Nvidia Game Ready or Studio driver (r560 branch or newer). GPU compute capability must be 8.6 or higher, which effectively limits support to the RTX 30 series and above; older GTX cards are excluded. Laptops with discrete RTX GPUs may see reduced battery life during sustained AI tasks, a trade‑off that matters only when running on battery.
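The three requirements can be written down as a simple pre-flight check. The Python sketch below takes its thresholds from this article; the `SystemInfo` type and the function names are hypothetical, and how an app would actually query the driver branch or compute capability is outside its scope.

```python
from dataclasses import dataclass

MIN_COMPUTE_CAPABILITY = (8, 6)  # from the article: 8.6 or higher
MIN_DRIVER_BRANCH = 560          # from the article: r560 branch or newer
MIN_WINDOWS_RELEASE = (24, 2)    # Windows 11 24H2, parsed as (year, half)


@dataclass(frozen=True)
class SystemInfo:
    """Hypothetical snapshot of the values the requirements depend on."""
    compute_capability: tuple[int, int]  # e.g. (8, 6) for 8.6
    driver_branch: int                   # e.g. 560
    windows_release: str                 # e.g. "24H2"


def parse_release(release: str) -> tuple[int, int]:
    """Turn a release label such as '24H2' into the tuple (24, 2)."""
    year, half = release.upper().split("H")
    return int(year), int(half)


def unmet_requirements(info: SystemInfo) -> list[str]:
    """List every requirement the system fails; empty means eligible."""
    problems = []
    if info.compute_capability < MIN_COMPUTE_CAPABILITY:
        problems.append("compute capability below 8.6")
    if info.driver_branch < MIN_DRIVER_BRANCH:
        problems.append("driver older than r560")
    if parse_release(info.windows_release) < MIN_WINDOWS_RELEASE:
        problems.append("Windows 11 older than 24H2")
    return problems
```

Comparing compute capability as a `(major, minor)` tuple avoids floating-point comparisons. A GTX 1660 (7.5) on a current driver and 24H2 would fail only the first check.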
| GPU Model | Compute Capability | Supported? | Expected tokens/s (Phi‑3‑medium) |
|---|---|---|---|
| RTX 3060 | 8.6 | Yes | 30–40 |
| RTX 3070 | 8.6 | Yes | 40–55 |
| RTX 3080 | 8.6 | Yes | 50–65 |
| RTX 4090 | 8.9 | Yes | 100+ |
| GTX 1660 | 7.5 | No | – |
For comparison, a Snapdragon X Elite NPU typically delivers 20–30 t/s on the same model, showing just how much headroom even mid‑range GPUs have.
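To put those token rates in perspective, generation time is roughly the number of output tokens divided by the rate. The Python sketch below applies that to the ranges quoted above; the 600-token response length is a hypothetical figure chosen for round numbers, and the estimate ignores prompt processing and model load time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# (low, high) tokens/s for Phi-3-medium, as quoted in this article.
QUOTED_RATES = {
    "Snapdragon X Elite NPU": (20, 30),
    "RTX 3060": (30, 40),
    "RTX 3070": (40, 55),
    "RTX 3080": (50, 65),
}

OUTPUT_TOKENS = 600  # hypothetical response length, not from the article


def time_range(device: str, output_tokens: int = OUTPUT_TOKENS) -> tuple[float, float]:
    """Return (fastest, slowest) seconds for a device's quoted rate range."""
    low, high = QUOTED_RATES[device]
    return (
        generation_seconds(output_tokens, high),
        generation_seconds(output_tokens, low),
    )
```

Under these assumptions, a 600-token reply takes 20 to 30 seconds at the quoted NPU rates and 15 to 20 seconds at the quoted RTX 3060 rates.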
Developer Adoption and the Experimental Channel
Because the feature lives in an experimental SDK, it won’t appear in production apps overnight. Developers must download the special SDK package, update their project references, and declare the gpuAiExperimental capability in their app manifest. Microsoft has provided extensive C++/WinRT and C# samples on the Windows Developer Blog to ease the transition.
“We’re hearing from ISVs that the ability to ship a single app that lights up both on Copilot+ PCs and on existing gaming rigs is a game‑changer,” said a Microsoft program manager in a related GitHub discussion. “The goal is to make AI a ubiquitous part of the Windows platform, regardless of silicon choice.”
Still, experimental code carries risk. APIs could change, performance isn’t guaranteed, and certification for the Microsoft Store requires special approval. Microsoft cautions against shipping consumer‑facing products on the experimental track.
Community Reaction: Cautious Optimism
Across developer forums and Reddit threads, the response has been largely positive, though tinged with practical concerns. Many celebrated the end of the “NPU exclusivity,” with one popular comment summing it up: “Finally, my 4090 gets to do something useful beyond gaming.” Others worried about thermal management on laptops, noting that sustained GPU‑accelerated AI could lead to loud fan noise and hot keyboards.
Power consumption is another sore spot: a Copilot+ PC can run a local model for hours on battery without a noticeable drain, while a gaming laptop might deplete its battery in under an hour. Some developers have already started implementing hybrid modes that toggle between GPU (plugged in) and NPU/CPU (on battery), keeping the experience smooth regardless of power source.
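A hybrid mode of this kind comes down to a small policy function. The Python sketch below is one possible shape for it, not code from any of those developers: the `Device` enum, `pick_device`, and the exact fallback orders are assumptions, and the caller is expected to supply the power state from whatever platform API it uses.

```python
from enum import Enum


class Device(Enum):
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Assumed orders: fastest device first on mains, most frugal first on battery.
PLUGGED_IN_ORDER = (Device.GPU, Device.NPU, Device.CPU)
ON_BATTERY_ORDER = (Device.NPU, Device.CPU, Device.GPU)


def pick_device(on_ac_power: bool, available: set[Device]) -> Device:
    """Choose an inference device based on the current power source."""
    order = PLUGGED_IN_ORDER if on_ac_power else ON_BATTERY_ORDER
    for device in order:
        if device in available:
            return device
    raise ValueError("no inference device available")
```

On a gaming laptop with a GPU and CPU but no NPU, this policy uses the GPU when plugged in and drops to the CPU on battery. Re-evaluating it whenever the power source changes would give the toggling behavior described above.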
A recurring question is AMD and Intel GPU support. The current release is Nvidia‑only, likely due to DirectML’s current optimization state. Microsoft hasn’t committed to a timeline for other vendors, but the underlying DirectML path should eventually work on any DX12 GPU with sufficient compute performance. For now, AMD Radeon and Intel Arc users remain on the sidelines.
Privacy and the Offline AI Promise
One of the biggest wins for end users is privacy. Running language models entirely on‑device means sensitive data—legal documents, personal photos, voice recordings—never leaves the PC. The SDK integrates with Windows’ existing AI permissions model, so apps must explicitly declare their intent to use AI and can be audited via the Privacy Dashboard.
This addresses a major criticism of cloud‑based AI services, which often require sending data to servers for processing. With GPU‑accelerated local models, even large enterprises can enforce data residency policies without sacrificing AI capabilities.
What’s Next for Windows AI
The June 2026 experimental release is a clear signal that Microsoft intends to make AI a core OS feature, not a premium add‑on. As the SDK matures, expect:
- Broader GPU support: AMD RDNA 3 and Intel Xe architectures are likely candidates for the next experimental update.
- Finer‑grained power management: Windows will probably add a system setting that lets users choose between “performance,” “balanced,” and “battery‑saver” AI modes.
- ML‑backed app distribution: The Microsoft Store could start listing AI features as a system requirement, much like RAM or DirectX level.
- Integration with Copilot stack: The same APIs might eventually power the system‑level Copilot assistant, reducing cloud dependency even for the digital companion.
For now, developers and enthusiasts can grab the experimental SDK from the Windows Insider Dev Channel and start testing. It’s a pivotal moment for Windows AI—finally, the hardware is no longer the bottleneck.
Microsoft has yet to announce a date for general availability, but given the rapid pace of experimental updates, a production‑ready release could land before the end of 2026. In the meantime, the message is clear: your trusty gaming GPU is now a serious AI accelerator, no NPU required.