Phi Silica Breaks Free: Windows App SDK 2.2.2 Brings Local AI to RTX GPUs

Microsoft today released a landmark experimental update to its Windows App SDK, unlocking the Phi Silica local AI model for developers on non-Copilot+ Windows 11 PCs equipped with Nvidia GeForce RTX 30-series or newer GPUs. The June 2026 Windows App SDK 2.2.2 experimental release tears down the previous NPU requirement, letting the same language-model APIs that power AI experiences on Copilot+ laptops run on a wide range of existing gaming and workstation desktops. Developers can now integrate local, on-device natural language processing directly into their Windows apps without needing dedicated neural processing hardware.

The move marks a significant expansion of Microsoft’s on-device AI strategy. Since its debut alongside Copilot+ PCs, Phi Silica has been tightly coupled to the Qualcomm Snapdragon X Elite’s Hexagon NPU, later expanding to AMD Ryzen AI 300 and Intel Core Ultra 200V NPUs. By porting the runtime to leverage Nvidia’s CUDA cores and tensor cores, Microsoft is acknowledging the vast installed base of discrete GPUs and the growing demand for AI capabilities in desktop applications that never touch the cloud.

What Is Phi Silica?

Phi Silica is a compact, highly optimized language model derived from Microsoft Research’s Phi family. Unlike the Phi-3 models Microsoft serves from Azure, Phi Silica is designed to run entirely on the device, which keeps data private, reduces latency, and allows offline operation. It powers features such as text summarization, smart reply suggestions, and contextual rewriting in apps like Paint, Photos, and the Windows Copilot sidebar on Copilot+ PCs.

The model has roughly 3.3 billion parameters and uses aggressive 4-bit quantization, which lets it fit into as little as 2–3 GB of memory. That efficiency made it a natural candidate for NPUs, where tight power budgets also limit memory bandwidth. The same small footprint, however, suits discrete GPUs just as well: even an entry-level RTX 3060 with 8 GB of VRAM has ample headroom.
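
The 2–3 GB figure is easy to sanity-check. At 4 bits per weight, each parameter takes half a byte, so the weights alone come to about 1.65 GB; the rest of the runtime budget presumably covers quantization metadata, activations, and the KV cache. The short C# sketch below does the arithmetic. Storing every parameter at one uniform bit width is a simplifying assumption, not a description of Phi Silica's actual storage format.

```csharp
using System;

// Back-of-the-envelope size of model weights at a given bit width.
// Simplification: every parameter uses the same bit width. Real 4-bit formats
// add per-block scale factors, and inference also needs memory for
// activations and the KV cache, so treat the result as a lower bound.
double WeightGigabytes(double parameterCount, int bitsPerWeight) =>
    parameterCount * bitsPerWeight / 8.0 / 1e9; // bits -> bytes -> decimal GB

double fp16 = WeightGigabytes(3.3e9, 16);
double int4 = WeightGigabytes(3.3e9, 4);

Console.WriteLine(FormattableString.Invariant($"FP16 weights:  {fp16:F2} GB")); // 6.60 GB
Console.WriteLine(FormattableString.Invariant($"4-bit weights: {int4:F2} GB")); // 1.65 GB
```

The comparison with 16-bit weights shows why quantization matters here: the same model would need roughly four times the memory for its weights alone.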

The Windows App SDK 2.2.2 Experimental Release

The experimental build, version 2.2.2, introduces a new set of Windows Runtime (WinRT) APIs that mirror the existing Phi Silica AI namespace found in the Windows Copilot Runtime. These APIs—LanguageModel, TextEmbedding, and PromptTemplate—allow developers to load the model, pass in prompts, and retrieve completions without managing memory or execution schedules manually.

Key aspects of the release include:

GPU Offloading: The runtime automatically detects a compatible Nvidia GPU and offloads inference to it, using DirectML as the backend. Developers do not need to write CUDA code; the SDK abstracts hardware selection.
Model Caching: Once downloaded, the model files (about 2.5 GB) are stored in a shared package folder, so any app using the SDK can reuse the cached model, avoiding redundant downloads.
Experimental Status: The APIs are marked as Experimental and may change in future releases. Microsoft warns that performance and behavior are not yet finalized, and production apps should not ship with a dependency on this experimental layer until it stabilizes.
Target Audience: The release is aimed squarely at developers building .NET or C++ Windows desktop applications. End users will only benefit once apps integrate the capability.

Hardware Requirements and Compatibility

To take advantage of the new APIs, a system must run Windows 11 version 24H2 or later and include an Nvidia GeForce RTX 30-series, 40-series, or newer GPU. The pre-release notes do not explicitly document a minimum VRAM figure, but 8 GB of dedicated VRAM is the practical baseline: tests by early adopters indicate that the 3.3B-parameter model uses roughly 2.8–3.2 GB at runtime, leaving comfortable headroom on 8 GB cards. The RTX 3060, RTX 4060, and higher-tier cards all meet this bar.

Laptop RTX 30-series GPUs with less memory, such as the RTX 3050 (4 GB of VRAM) or the 6 GB RTX 3060 Laptop GPU, may struggle. The SDK includes a dynamic offload mechanism that can fall back to CPU inference using ONNX Runtime, but CPU inference is significantly slower and is not recommended for real-time interactions.

Importantly, AMD Radeon and Intel Arc GPUs are not currently supported, though Microsoft’s documentation hints that future builds may expand support using the same DirectML abstraction layer. For now, the Nvidia ecosystem—by far the dominant discrete GPU platform among Windows developers—gets the first look.
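
The hardware rules in this section can be summarized as a small decision function. This is a hypothetical sketch: the function name, its parameters, the 8 GB threshold taken from early-adopter reports, the preference for an existing Copilot+ NPU, and routing every non-qualifying device to the CPU path are all assumptions made for illustration. A real app should defer to the SDK's own LanguageModel.IsAvailableAsync() check rather than re-implementing it. Build 26100 corresponds to Windows 11 version 24H2.

```csharp
using System;

// Hypothetical backend chooser reflecting the requirements described above.
// Not an SDK API; the routing rules are illustrative assumptions.
string ChooseBackend(bool hasCopilotPlusNpu, string gpuVendor, int rtxSeries, double vramGb, int windowsBuild)
{
    const int Windows11Version24H2Build = 26100;
    if (windowsBuild < Windows11Version24H2Build) return "Unsupported";

    if (hasCopilotPlusNpu) return "NPU"; // existing Copilot+ path

    // rtxSeries: 30 for an RTX 3060, 40 for an RTX 4060, 0 for non-RTX cards.
    bool isNvidiaRtx30OrNewer =
        string.Equals(gpuVendor, "NVIDIA", StringComparison.OrdinalIgnoreCase) && rtxSeries >= 30;
    if (isNvidiaRtx30OrNewer && vramGb >= 8) return "GPU"; // DirectML on the RTX card

    return "CPU"; // ONNX Runtime fallback: works, but too slow for real-time use
}

Console.WriteLine(ChooseBackend(false, "NVIDIA", 40, 8, 26100)); // GPU
Console.WriteLine(ChooseBackend(false, "NVIDIA", 30, 4, 26100)); // CPU (e.g. a 4 GB RTX 3050 laptop)
Console.WriteLine(ChooseBackend(false, "AMD", 0, 16, 26100));    // CPU (Radeon not yet supported)
```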

How Developers Can Get Started

Interested developers can obtain the experimental Windows App SDK 2.2.2 from the official GitHub releases page or via a dedicated NuGet package. The package is versioned as Microsoft.WindowsAppSDK.Experimental.1.7 (not to be confused with the stable 1.7 release), and it installs side-by-side with stable SDKs without conflict.

After installation, the model is fetched on first use. The SDK provides a setup guide that walks through:

Adding the experimental NuGet reference to the project.
Checking for GPU compatibility via LanguageModel.IsAvailableAsync().
Creating a LanguageModel instance and loading the default Phi Silica model.
Sending prompts and receiving completions asynchronously.

A minimal C# example looks like this:

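using System;
using Windows.AI.Experimental;
// Check GPU compatibility first (IsAvailableAsync is assumed to return a bool).
if (!await LanguageModel.IsAvailableAsync()) { Console.WriteLine("Phi Silica is unavailable on this PC."); return; }
var model = await LanguageModel.CreateAsync(); // loads the default Phi Silica model
var result = await model.GenerateResponseAsync("Write a haiku about Windows.");
Console.WriteLine(result); // if the call returns a response object, print its text property instead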

Developers can also customize generation parameters such as temperature, top-p, and max tokens. The SDK manages the underlying DirectML pipeline itself, so no manual GPU setup is required.
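
The article does not name the options type used to set these parameters, so the sketch below calls no SDK API. Instead it illustrates, on a toy three-token vocabulary, the standard technique the two main knobs refer to: temperature rescales the logits before the softmax, and top-p (nucleus) sampling draws only from the smallest set of tokens whose combined probability reaches p. It is a textbook implementation, not Phi Silica's decoder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Textbook temperature + top-p (nucleus) sampling over a toy vocabulary.
// Illustrative only: this is not how the SDK exposes or implements these options.
int SampleToken(double[] logits, double temperature, double topP, Random rng)
{
    if (temperature <= 0) throw new ArgumentOutOfRangeException(nameof(temperature));

    // Temperature: scale logits, then softmax. Values below 1 sharpen the
    // distribution; values above 1 flatten it.
    double[] scaled = logits.Select(l => l / temperature).ToArray();
    double max = scaled.Max(); // subtract the max for numerical stability
    double[] exps = scaled.Select(s => Math.Exp(s - max)).ToArray();
    double sum = exps.Sum();
    double[] probs = exps.Select(e => e / sum).ToArray();

    // Top-p: walk tokens from most to least likely and stop once the
    // cumulative probability reaches topP.
    var nucleus = new List<int>();
    double cumulative = 0;
    foreach (int i in Enumerable.Range(0, probs.Length).OrderByDescending(k => probs[k]))
    {
        nucleus.Add(i);
        cumulative += probs[i];
        if (cumulative >= topP) break;
    }

    // Sample within the nucleus; drawing from [0, cumulative) is equivalent
    // to renormalizing the kept probabilities.
    double r = rng.NextDouble() * cumulative;
    foreach (int i in nucleus)
    {
        r -= probs[i];
        if (r < 0) return i;
    }
    return nucleus[^1]; // guard against floating-point round-off
}

var random = new Random(42);
double[] toyLogits = { 1.0, 3.0, 2.0 };
Console.WriteLine(SampleToken(toyLogits, 0.7, 0.9, random)); // index of the sampled token
```

A very small top-p collapses sampling to the single most likely token, while lower temperatures make the output more deterministic without removing any candidates outright.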

Implications for the Windows AI Ecosystem

This experimental release signals a deliberate strategy shift. By decoupling Phi Silica from NPU-exclusive hardware, Microsoft opens the door for a wave of AI-enhanced desktop applications that previously couldn’t justify targeting only Copilot+ systems. Categories like code editors, creative tools, gaming launchers, and productivity suites can now add local intelligence without sacrificing their user base.

Consider a game development studio that ships a level editor on Windows. With Phi Silica running on the same RTX GPU already rendering complex scenes, the editor could offer context-aware scripting assistance, natural language search within massive asset libraries, or dynamic tutorial generation—all without phoning home. For enterprises, document summarization, email drafting assistants, and line-of-business workflow automation become feasible on standard fleet machines without cloud dependencies.

The move also reduces fragmentation. Until now, developers targeting both Copilot+ and traditional PCs had to maintain separate code paths: one for NPU-accelerated features and another for fallback or cloud-dependent logic. With GPU acceleration, a single API can serve the majority of Windows 11 devices with discrete graphics. IDC data from early 2026 estimates that over 40% of active Windows 11 systems include a DirectX 12-capable discrete GPU with 8 GB or more VRAM—a substantial addressable market.

Performance and Practical Considerations

In benchmarks shared by early testers, an RTX 4090 processes a 500-token prompt in under 200 milliseconds, rivaling the speed of the Snapdragon X Elite’s NPU. An RTX 3080 achieves similar latency, while an RTX 3060 lands around 400–500 milliseconds, still perfectly acceptable for interactive assistants. These figures depend on batch size and quantization level, but they show that even mid-range gaming GPUs can deliver a snappy experience.
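
Because these numbers come from early testers rather than a published methodology, developers may want to measure latency on their own hardware. The hypothetical harness below times any asynchronous call with Stopwatch, discards one warm-up run so one-time model loading is not counted, and reports the median. The second helper converts a latency figure into throughput: 500 tokens in 200 ms works out to 2,500 tokens per second, and the RTX 3060's 400–500 ms to roughly 1,000–1,250.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical benchmarking helper: times an async operation several times
// and returns the median latency in milliseconds.
async Task<double> MedianLatencyMsAsync(Func<Task> operation, int runs = 5)
{
    await operation(); // warm-up run, so one-time model loading is not measured

    var samples = new double[runs];
    for (int i = 0; i < runs; i++)
    {
        var stopwatch = Stopwatch.StartNew();
        await operation();
        samples[i] = stopwatch.Elapsed.TotalMilliseconds;
    }
    Array.Sort(samples);
    return samples[runs / 2]; // median for odd run counts; robust to one-off stalls
}

// Throughput implied by a latency figure, e.g. 500 tokens in 200 ms.
double TokensPerSecond(int tokens, double latencyMs) => tokens / (latencyMs / 1000.0);

// Stand-in workload; in an app this would wrap the model call instead.
double median = await MedianLatencyMsAsync(() => Task.Delay(50));
Console.WriteLine($"Median latency: {median:F0} ms");
Console.WriteLine(TokensPerSecond(500, 200)); // 2500
```

In an app, the operation would be a lambda such as `async () => await model.GenerateResponseAsync(prompt)`, using the call from the minimal example earlier in this article.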

Power draw is a trade-off. An NPU might consume 5–10 watts during inference, whereas a discrete GPU can easily draw 50–150 watts. For laptop users on battery, this makes Phi Silica on an RTX GPU a less attractive option for always-on background features. Microsoft acknowledges this and recommends that developers query battery status and system power profiles before enabling GPU-accelerated AI.
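
Microsoft's advice to check battery status and power profiles can be expressed as a simple policy. The sketch below is hypothetical: the function, its inputs, and the 50% threshold are illustrative choices, not Microsoft guidance. In a real app, the inputs might be read from the Windows App SDK's PowerManager class (Microsoft.Windows.System.Power), which reports the power source, remaining charge, and energy-saver status.

```csharp
using System;

// Hypothetical power policy for GPU-accelerated inference.
// The 50% threshold is an arbitrary example, not a Microsoft recommendation.
bool ShouldUseGpuInference(bool onBattery, int batteryPercent, bool energySaverOn, bool isBackgroundFeature)
{
    if (!onBattery) return true;           // on AC power, a 50-150 W GPU draw is acceptable
    if (energySaverOn) return false;       // respect the user's energy-saver setting
    if (isBackgroundFeature) return false; // no always-on GPU work while on battery
    return batteryPercent >= 50;           // user-initiated requests only with enough charge
}

Console.WriteLine(ShouldUseGpuInference(onBattery: true, batteryPercent: 80,
    energySaverOn: false, isBackgroundFeature: false)); // True
```

Separating background features from user-initiated requests reflects the trade-off described above: an occasional on-demand summary costs little battery, while an always-on feature keeps the GPU awake.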

Memory management is another consideration. When the model loads into GPU memory, it competes with other GPU-intensive workloads. For example, a game that already saturates VRAM with textures and geometry may force the model to spill into shared system memory, degrading performance. The LanguageModel API includes events to signal memory pressure, allowing apps to gracefully release the model when necessary.
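
The release-under-pressure pattern can be sketched with a lazily loaded, disposable model handle. The article does not name the SDK's memory-pressure event, so OnMemoryPressure here is a hypothetical handler you would wire to it, LoadModel stands in for LanguageModel.CreateAsync() (kept synchronous for brevity), and a MemoryStream serves only as a placeholder IDisposable. The code is not thread-safe; production code would guard the cached handle with a lock.

```csharp
using System;
using System.IO;

// Hypothetical lazy-load/release pattern for a GPU-resident model.
// LoadModel stands in for LanguageModel.CreateAsync(), and OnMemoryPressure
// for a handler wired to the SDK's (unnamed) memory-pressure event.
IDisposable? cachedModel = null;
int loadCount = 0;

IDisposable LoadModel()
{
    loadCount++;
    return new MemoryStream(); // placeholder for the real model object
}

// Load on first use, then reuse the same instance.
IDisposable GetModel() => cachedModel ??= LoadModel();

void OnMemoryPressure()
{
    cachedModel?.Dispose(); // release VRAM for the workload that needs it
    cachedModel = null;     // the next GetModel() call reloads the model
}

var first = GetModel();
var second = GetModel(); // reuses the cached instance
OnMemoryPressure();      // simulate the SDK signaling memory pressure
var third = GetModel();  // reloads after the release
Console.WriteLine(loadCount); // 2
```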

Community Reaction and Early Feedback

While the official discussion channels are still filling with first impressions, early reactions from Windows developer forums and GitHub issues highlight a mix of excitement and cautious optimism. Many developers have long requested a Microsoft-backed local LLM API for Windows, frustrated by the fragmentation of third-party solutions like llama.cpp or Transformers.NET. An official, supported API with hardware acceleration is seen as a major quality-of-life improvement.

Some voices express concern over the exclusive Nvidia support, calling for AMD and Intel GPU compatibility sooner rather than later. Others note that the 8 GB VRAM floor leaves out many popular productivity laptops that ship with 4 GB or 6 GB GPUs. Microsoft’s response to these issues will likely shape adoption in the coming months.

A recurring question is whether Microsoft will eventually allow developers to swap in fine-tuned variants or entirely different models (e.g., Meta’s Llama). The current experimental APIs hard-code the Phi Silica model path, but the underlying DirectML pipeline is model-agnostic. Industry speculation suggests that future releases could expose a pluggable model interface, turning Windows into a more flexible local AI platform.

The Road Ahead

Microsoft has not committed to a timeline for graduating the Phi Silica GPU support from experimental to stable. The Windows App SDK 2.2 series will see several more experimental builds throughout 2026, with the goal of refining per-GPU performance profiles, expanding VRAM detection, and integrating with the Windows Copilot Runtime to enable hybrid scenarios where the OS chooses the best available hardware at runtime.

Looking further ahead, the company’s broader AI investments suggest that local model support will only deepen: OS-level AI features developed under the codename "Hudson Valley", collaboration with Nvidia on DirectML optimizations, and the push for a unified hybrid AI developer platform. The day when a Windows user can install any app and have it leverage on-device AI regardless of the underlying silicon is inching closer.

For developers today, the experimental build is a tangible milestone. It proves that Microsoft’s AI ambitions are not locked to a single hardware platform and that the PC’s existing GPU muscle can be harnessed for intelligent, privacy-preserving experiences. The road from experiment to production may be long, but the direction is clear: local AI is coming to every Windows 11 PC that can run it.