Nvidia RTX GPUs can now run Microsoft's on-device AI engine

The Windows 11 Dev Preview lifts the NPU lock, letting developers experiment with Phi Silica using their existing GeForce cards.

Microsoft has unlocked a significant expansion for Windows 11’s local AI capabilities: developers can now run the Phi Silica language model on PCs equipped with Nvidia GeForce RTX 30-series or newer GPUs, as long as they have at least 6GB of dedicated video memory. This update, delivered as part of a Dev Preview for the Windows Copilot Runtime, eliminates the strict requirement for a dedicated Neural Processing Unit (NPU) that previously limited Phi Silica to Copilot+ PCs. The move opens the door for millions of Windows users with capable graphics cards to test and build AI-powered applications without waiting for specialist hardware.

The announcement follows through on a goal Microsoft has been quietly pursuing: to weave AI deeply into the Windows ecosystem, not just through cloud services, but with robust on-device inference that runs locally, privately, and responsively. Phi Silica, a small language model fine-tuned for Windows devices, is the centerpiece of that vision. Until now, it was accessible only on a Copilot+ PC whose processor includes an NPU, such as Qualcomm’s Snapdragon X Elite or Intel’s Core Ultra (Lunar Lake). By adding support for popular Nvidia GPUs, Microsoft is effectively giving a legion of developers and early adopters a sandbox to play in.

Breaking the NPU Barrier

When Copilot+ PCs launched in mid-2024, Microsoft heavily emphasized the role of the NPU. This dedicated accelerator was painted as the secret sauce for running AI models efficiently, providing 40 or more TOPS of performance while sipping minimal power. Phi Silica, a trimmed-down variant of the Phi family of models from Microsoft Research, was showcased as the local engine behind features like Recall, Click to Do, and AI-enhanced creativity tools in Paint and Photos.

But the exclusive NPU dependency left out a huge segment of the Windows user base: those with high-end gaming or workstation GPUs. Nvidia’s RTX cards pack massive parallel processing power, often exceeding an NPU’s raw throughput by a considerable margin. However, until now, the Windows Copilot Runtime didn’t provide a direct path to utilize that power for Phi Silica inference. The Dev Preview changes that.

Microsoft announced the update through its Windows Developer Blog, stating that the new local Language Model APIs—part of the Windows App SDK—now accept GPU devices via DirectML, the low-level machine learning acceleration API built into DirectX. Developers can target any GPU that supports DirectML with sufficient VRAM, but Microsoft’s initial validation focuses on Nvidia GeForce RTX 30-series and above. The 6GB VRAM floor is practical: the Phi Silica model weighs in at around 3.3 billion parameters and requires a few gigabytes of memory for weights and activations. Cards with less VRAM might struggle with context lengths or background tasks.
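
The arithmetic behind the 6GB floor can be sketched in a few lines. The snippet below is a back-of-the-envelope estimate, written in Python for brevity: the 3.3 billion parameter count comes from the paragraph above, while the storage precisions are illustrative assumptions, since the precision Phi Silica actually ships in is not stated here.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# The 3.3 billion figure is quoted in the article; the precisions below are
# assumptions chosen for illustration, not published Phi Silica details.

PARAMS = 3.3e9  # parameter count quoted above

# Bytes needed to store one parameter at some common precisions.
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_gb(PARAMS, size):.2f} GB of weights")
```

At 16 bits per parameter the weights alone would come to roughly 6.6 GB, more than the 6GB floor, so a "few gigabytes" footprint suggests some form of reduced-precision storage. Activations and the context cache add to whichever figure applies.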

What Is Phi Silica?

Phi Silica is not just another LLM variant. It’s been carefully distilled and optimized for Windows devices, balancing model size, inference speed, and quality. Microsoft designed it to handle a range of tasks: text summarization, rewriting, information extraction, and even programming assistance—all while running entirely on-device. Unlike cloud-based copilots, on-device inference means no data leaves the machine, a key selling point for privacy-conscious users and enterprises.

The model shares its lineage with the larger Phi-3 and Phi-4 series, but the “Silica” designation signals the on-device focus. It integrates tightly with Windows APIs, allowing apps to call it through a simple interface without worrying about tokenizers, memory management, or batching. Early benchmarks from Copilot+ devices show Phi Silica delivering competent, if not earth-shattering, performance for everyday productivity tasks. For developers, it’s a testbed for exploring how local AI can augment their apps.

Now, with GPU support, that testbed extends to much more common hardware. An RTX 3060 with 12GB VRAM, for example, can comfortably run Phi Silica alongside other workloads. Even an RTX 3050 Ti mobile GPU with 6GB might scrape by, though multi-tasking could become tight.

GPU Requirements and DirectML

The Dev Preview explicitly lists Nvidia GeForce RTX 30-series and 40-series cards as supported, with the 50-series presumably to follow once available. The underlying technology, DirectML, makes the API hardware-agnostic in theory. In practice, Nvidia’s mature driver stack and extensive DirectML support make its cards first-class citizens. AMD Radeon RX 6000 series and newer also support DirectML, but Microsoft hasn’t announced official support yet, likely due to validation and performance tuning priorities.

DirectML is the key enabler. It provides a unified interface for GPU-accelerated machine learning operations, abstracting away hardware differences. Developers using the Windows Copilot Runtime’s local Language Model APIs need only specify the GPU as the device; DirectML handles the kernel launches and data movement. The API itself is part of the broader Windows App SDK, so developers can integrate it into any Win32 or UWP app, including traditional desktop software and modern Microsoft Store applications.

Microsoft recommends developers test thoroughly on the target hardware, as inference speed can vary widely. An RTX 4090 will churn through tokens faster than an RTX 3060, but both should work. The company has not provided hard performance numbers for GPU-accelerated Phi Silica, but early community experimentation suggests performance tiers that follow PC gaming logic:

  • RTX 3050 (6GB VRAM): Minimal spec, suitable for basic prompts with limited context length. Likely on par or slightly slower than a typical NPU.
  • RTX 3060/4060 (8–12GB): Good balance, handles most workloads comfortably.
  • RTX 3070/3080/4070+: Fast inference, can sustain multiple concurrent sessions.
  • RTX 4090: Overkill, potentially 5–10x faster than a high-end NPU.

These are rough estimates; actual speeds depend on prompt complexity, batching, and system load.

Developer Access and APIs

To get started, developers need to join the Windows Insider Program, specifically the Dev Channel, for the latest SDK and runtime binaries. The local Language Model APIs are exposed through the Windows.AI.LanguageModels namespace. A typical workflow involves creating a language model session, loading the Phi Silica model, and then submitting prompts. The API handles the rest.

Here’s a conceptual snippet from Microsoft’s documentation:

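// Conceptual only: names follow the article and may change between Dev builds.
using Windows.AI.LanguageModels;

var modelPath = @"C:\Program Files\WindowsApps\Microsoft.Windows.AI.LanguageModels_...";
var model = await LanguageModel.LoadAsync(modelPath);   // load Phi Silica
var session = await model.CreateSessionAsync();         // open an inference session
var response = await session.ExecuteAsync("Summarize this text: ...");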

The model files themselves are part of the Windows Copilot Runtime, and developers don’t need to manage separate model downloads—they’re provisioned with the OS update. On Copilot+ PCs, the runtime defaults to the NPU. On systems with an eligible GPU and no NPU, it falls back to DirectML and the GPU. Microsoft has indicated that future updates may allow explicit device selection for power users and mixed scenarios (like running low-priority inference on the NPU while the GPU handles graphics).
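
The fallback order described above can be sketched as a small decision function. This is an illustration of the stated behavior, not the runtime's code: the `Hardware` type, the `pick_device` function, and the returned labels are all hypothetical, and eligibility is reduced to the 6GB VRAM floor for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

MIN_VRAM_GB = 6  # VRAM floor quoted in the article


@dataclass
class Hardware:
    """Hypothetical description of a machine; not a Windows API type."""
    has_npu: bool
    gpu_vram_gb: Optional[float] = None  # None means no discrete GPU


def pick_device(hw: Hardware) -> str:
    """Mirror the described order: NPU first, then an eligible GPU."""
    if hw.has_npu:
        return "npu"
    if hw.gpu_vram_gb is not None and hw.gpu_vram_gb >= MIN_VRAM_GB:
        return "gpu-directml"
    return "unsupported"
```

A Copilot+ PC that also has a 12GB RTX card would still resolve to the NPU under this ordering, which is why explicit device selection matters for the mixed scenarios mentioned above.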

Developer Implications in Depth

This expansion changes the calculus for independent software vendors (ISVs) and enterprise developers. Previously, building AI features that relied on Phi Silica meant targeting an audience with Copilot+ PCs—a tiny fraction of the market. Now, any user with a mid-range gaming laptop or desktop from the past three years becomes a potential customer. Developers can add local smart reply suggestions in email clients, summarize documents before uploading, extract key phrases from meeting notes, or build a code assistant that respects intellectual property by keeping source code on the machine.

The toolchain integrates seamlessly with Visual Studio and the Windows App SDK 1.6-experimental (or newer). Microsoft has published extensive documentation and sample projects to accelerate adoption. Debugging is straightforward: the API behaves identically whether running on NPU or GPU, so developers can write once and test on their existing RTX system.

Monetization opportunities are also emerging. Apps that leverage on-device AI can market privacy as a premium feature, charging users a one-time fee or subscription without incurring per-token cloud costs. For enterprises, local AI means sensitive data never leaves the device, easing compliance with regulations like GDPR or HIPAA.

Performance and Practical Use Cases

It’s important to temper expectations: GPU-accelerated Phi Silica in this Dev Preview is not meant to replace cloud AI services or even run production-grade workloads. Instead, it’s a development vehicle. Developers can prototype AI features, test prompt engineering, and build application logic without sending every query to Azure OpenAI. For ISVs and enterprise teams, that means faster iteration, lower costs, and the ability to work offline.

Because the model runs locally, latency is more predictable and unaffected by internet congestion, though it still varies with system load. An RTX 3080 can generate tokens at a pace that feels responsive for interactive chat or inline text editing. However, the RTX 30-series minimum and 6GB VRAM requirement will exclude many ultrabooks and older desktops with integrated graphics or lower-end discrete GPUs. For those machines, the NPU remains the aspirational hardware, and Microsoft continues to push Copilot+ features through the NPU pipeline. The GPU path is explicitly a developer preview; it’s not yet targeting broad consumer scenarios.

Online, early adopters have already shared demos of local document summarizers and code assistants powered by Phi Silica on their RTX machines. Community feedback has been largely positive, with developers praising the drop-in API simplicity, though many note that GPU memory management can be finicky under heavy load. Microsoft has acknowledged these rough edges and plans to refine scheduling and multi-app support in upcoming Dev builds.

The Bigger Picture: On-Device AI

This update fits into a larger industry trend: the move toward hybrid AI, where processing is split between the cloud and the edge. Apple has been aggressively pushing on-device intelligence with its Neural Engine in M-series and A-series chips. Google is optimizing Gemini models for on-device use on Android and ChromeOS. Microsoft’s response is the Copilot Runtime and now the expanded hardware support.

By opening Phi Silica to Nvidia GPUs, Microsoft signals that the Windows platform is not locking AI innovation to a single OEM or chipmaker. It’s an inclusive approach that acknowledges the installed base—hundreds of millions of Windows devices ship with discrete Nvidia GPUs, many in the hands of developers and prosumers who are eager to experiment with AI.

Moreover, it gives Microsoft a chance to gather telemetry and feedback on a wider scale. The Dev Preview serves as a real-world stress test for the runtime, API design, and model performance across diverse hardware configurations. That feedback will be crucial as the company refines the experience for a broader launch, likely tied to the next Windows 11 feature update (version 24H2 or later).

How to Get Started

For developers itching to try Phi Silica on their RTX GPU, the steps are straightforward:

  1. Join the Windows Insider Dev Channel on a machine with a supported Nvidia GPU and 6GB+ VRAM.
  2. Update to the latest Dev build and install the latest GPU driver from Nvidia (Game Ready or Studio driver, version 555.xx or later recommended).
  3. Install the Windows App SDK (1.6-experimental or newer) from the official Microsoft download page.
  4. Create a new project or open an existing one, and add a reference to the Microsoft.Windows.AI.LanguageModels package.
  5. Follow the official documentation for initializing and calling the model.
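
The hardware requirements in steps 1 and 2 can be turned into a quick preflight check. The sketch below assumes the caller already knows the GPU name, VRAM size, and driver version string. How those values are obtained is left out, and the function names are hypothetical.

```python
import re

MIN_SERIES = 30          # RTX 30-series or newer
MIN_VRAM_GB = 6          # 6GB+ VRAM
MIN_DRIVER_MAJOR = 555   # 555.xx or later recommended


def rtx_series(gpu_name: str):
    """Extract the series (30, 40, ...) from a name like 'GeForce RTX 3060'."""
    match = re.search(r"RTX\s*(\d{2})\d{2}", gpu_name, re.IGNORECASE)
    return int(match.group(1)) if match else None


def preflight(gpu_name: str, vram_gb: float, driver_version: str) -> list:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    series = rtx_series(gpu_name)
    if series is None or series < MIN_SERIES:
        problems.append(f"GPU '{gpu_name}' is not an RTX 30-series or newer card")
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb} GB VRAM is below the {MIN_VRAM_GB} GB floor")
    # Assumes a 'major.minor' driver string such as '555.85'.
    major = int(driver_version.split(".")[0])
    if major < MIN_DRIVER_MAJOR:
        problems.append(f"driver {driver_version} is older than {MIN_DRIVER_MAJOR}.xx")
    return problems
```

Returning a list of problems instead of a single boolean lets a setup script report every unmet requirement at once. The name pattern only recognizes four-digit GeForce RTX model numbers, so other product lines would be flagged as unsupported.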

Microsoft cautions that the Dev Preview has known issues: some GPU memory allocations may fail if the system is under heavy load, and multi-app concurrency is not yet fully supported. Expect occasional crashes and API changes as the team iterates.
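
One common way to cope with transient allocation failures of this kind is to retry with a growing delay. The sketch below is a generic pattern and not something Microsoft prescribes: `run_with_retry` is a hypothetical helper, and Python's built-in `MemoryError` stands in for whatever error the runtime actually reports.

```python
import time


def run_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on MemoryError, wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except MemoryError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the delay can be swapped out in tests. In an app, the wrapped operation would be the call that creates a session or submits a prompt.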

Looking Ahead

The Dev Preview is clearly a stepping stone. Once the APIs stabilize and DirectML’s GPU scheduling matures, Microsoft is likely to extend official support to AMD and Intel discrete GPUs. There’s also speculation that integrated GPUs with sufficient shared memory (e.g., those in AMD’s Strix Point APUs or Intel’s Arrow Lake-H) might eventually make the cut, though they would need careful memory management to handle the model’s footprint.

In the long run, Microsoft wants every Windows PC to be AI-capable. The combination of NPUs in new silicon and DirectML-accelerated GPUs in existing hardware creates two complementary paths to that goal. Phi Silica is just the first model; the Copilot Runtime is designed to host multiple models optimized for different tasks—code generation, image understanding, and more. By opening the door to GPUs now, Microsoft ensures that when those models arrive, a ready developer ecosystem will be waiting to adopt them.

For Windows enthusiasts and developers alike, this Dev Preview is a tangible step toward a more intelligent, responsive, and private computing experience. It’s not ready for primetime, but it’s an invitation to start building the next generation of Windows apps—today.