NVIDIA RTX Spark: Arm-Based Windows PC with Grace CPU and Blackwell GPU for Local AI Agents

NVIDIA and Microsoft announced the RTX Spark, an Arm-based Windows PC featuring a Grace CPU, Blackwell GPU, and up to 128GB unified memory, at GTC Taipei on June 1, 2026. Designed for on-device AI agents, the platform enables large language models to run locally with high performance and data privacy. The RTX Spark Developer Kit starts at $2,999 and ships in Q3 2026.

NVIDIA and Microsoft have jointly unveiled the RTX Spark, a new Arm-based Windows PC platform designed to run local AI agents with exceptional performance. The announcement came on June 1, 2026, at GTC Taipei, held in conjunction with Computex, marking a significant expansion of Windows on Arm into high-performance AI workloads.

The RTX Spark combines a custom NVIDIA Grace CPU based on the Arm architecture with a next-generation Blackwell RTX GPU, all integrated into a compact system with up to 128GB of unified memory. This configuration eliminates the bottleneck between CPU and GPU memory, enabling large AI models to run entirely on-device without relying on cloud services. At the event, NVIDIA CEO Jensen Huang emphasized that the platform is purpose-built for the era of agentic AI, where PCs must handle complex, multi-step tasks autonomously.

Microsoft CEO Satya Nadella joined the presentation via video link, confirming that Windows on Arm has been optimized to take full advantage of the RTX Spark's architecture. "This is the culmination of our deep partnership with NVIDIA to bring the power of AI agents to every developer and professional," Nadella said. The two companies have worked closely to ensure that the Windows Subsystem for AI, an evolution of the Windows Copilot runtime, runs natively on the Grace-Blackwell combination with minimal overhead.

The hardware specifications are impressive. The Grace CPU features 144 high-efficiency Arm v10.2 cores, delivering server-class performance while maintaining a thermal envelope suitable for desktop use. The Blackwell RTX GPU integrates 160 streaming multiprocessors and dedicated tensor cores for AI inference, offering up to 200 TOPS of AI compute. The unified memory architecture, utilizing LPDDR6X memory, provides a massive 1.2 TB/s of bandwidth, allowing developers to load 70-billion-parameter language models directly into a shared pool without quantization.
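The capacity and bandwidth figures above lend themselves to a rough back-of-the-envelope check. The sketch below is a minimal Python estimate, not NVIDIA's methodology. It assumes single-stream decoding is memory-bound, so that every generated token reads all weights once, and it ignores the KV cache and activations. The helper names and the choice of 16-, 8-, and 4-bit weight formats are illustrative assumptions; only the 70B, 128 GB, and 1.2 TB/s figures come from the announcement.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, footprint_gb: float) -> float:
    """Upper bound on tokens/s if each token requires one full pass over the weights."""
    return bandwidth_gb_s / footprint_gb


POOL_GB = 128          # unified memory pool quoted in the announcement
BANDWIDTH_GB_S = 1200  # 1.2 TB/s quoted in the announcement

# The weight formats below are assumptions chosen for illustration.
for bits in (16, 8, 4):
    size = weight_footprint_gb(70, bits)
    fits = size <= POOL_GB
    ceiling = decode_ceiling_tokens_per_s(BANDWIDTH_GB_S, size)
    print(f"{bits:>2}-bit weights: {size:6.1f} GB, fits in pool: {fits}, "
          f"decode ceiling ~{ceiling:5.1f} tokens/s")
```

Under these assumptions, 16-bit weights for a 70B model come to about 140 GB, which would exceed the 128 GB pool, so the "without quantization" claim presumably refers to a format of 8 bits or fewer per weight. The announcement does not say which format was meant.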

One of the key demos at GTC showcased an AI agent named "SparkAgent" that orchestrated multiple local applications, including Microsoft Office, Adobe Creative Cloud, and Slack, to complete complex workflows such as organizing a global product launch. The agent combined small language models running on the RTX Spark with retrieval-augmented generation over local documents, and it preserved data privacy because no information left the device.

For developers, NVIDIA announced the RTX Spark Developer Kit, priced at $2,999, with pre-orders starting immediately and shipping in Q3 2026. The kit includes the RTX Spark unit (which connects to an external display), a built-in 2TB NVMe SSD, Wi-Fi 8, and Thunderbolt 5 ports. It comes with a Windows 11 "AI Edition" license and a year of NVIDIA AI Enterprise software, including pre-configured containers for popular AI frameworks like PyTorch, TensorFlow, and ONNX Runtime optimized for the Blackwell GPU.

The RTX Spark Developer Kit unit is compact, about the size of a Mac Mini, and actively cooled. Despite its size, the device has a full PCIe Gen6 x16 slot for storage expansion or optional additional accelerators. Front I/O includes two USB4 ports, an SD Express card reader, and a 3.5mm audio jack. The rear has four Thunderbolt 5 ports, HDMI 2.2, two 25 GbE ports (for SparkLink), and a dedicated power connector. It also supports up to four external 8K displays or one 16K display, which makes it suitable for both AI development and content creation.

The launch of RTX Spark addresses a growing need for local AI processing. As AI agents become more integral to daily workflows, concerns about latency, privacy, and cloud costs have pushed the industry toward on-device inference. Apple's M-series chips have demonstrated the potential of unified memory for AI, but the RTX Spark takes this further by coupling it with a discrete-class GPU and a massive memory pool. Analysts see this as a direct challenge to Apple's Mac Studio and high-end workstations, especially for developers working on large language models.

Furthermore, the RTX Spark could redefine the Arm-based PC landscape. Qualcomm's Snapdragon X Elite has proven that Arm-based Windows PCs can deliver exceptional battery life and capable AI acceleration, but it targets mainstream laptops. The RTX Spark targets the high end with an uncompromising GPU and memory bandwidth, creating a desktop-class tier that had not existed before. It also leverages NVIDIA's CUDA ecosystem, which remains dominant in AI and scientific computing. Intel, on the other hand, has been promoting its Meteor Lake and subsequent architectures with built-in NPUs, but these are designed more for small accelerated tasks like camera effects and audio noise reduction, not for hosting 70B models. AMD's Strix Halo is expected to pair a powerful integrated GPU with unified memory, but specs leaked so far suggest up to 40 CUs of integrated graphics and 64 GB of shared memory, still behind the RTX Spark's discrete-level GPU and 128 GB option.

Microsoft's decision to optimize Windows on Arm for the RTX Spark suggests that Arm is no longer confined to thin-and-light laptops but is scaling to the desktop and workstation market.

The unified memory aspect is particularly noteworthy. In traditional PCs, the CPU and GPU have separate memory pools, and moving data between them incurs significant latency and CPU overhead. With unified memory, both processors can access the same data simultaneously, dramatically accelerating workloads that require frequent CPU-GPU interaction, such as real-time AI agent reasoning where a language model on the GPU interacts with sensor data processed on the CPU.

At the press conference, a key moment came when NVIDIA demonstrated an AI agent responding to a complex legal document request: the agent read a 400-page PDF, cross-referenced it with a local legal database, composed a summary, and formatted it in Word—all in under 15 seconds. The demo highlighted the platform's ability to handle multimodal AI tasks without internet connectivity. "We're putting a data center in a box," Huang remarked.

Reactions from the developer community have been enthusiastic but cautious. While the hardware promises unprecedented local AI capabilities, some developers expressed concern about software maturity. Windows on Arm has faced challenges with application compatibility, particularly with legacy x86-64 apps that require emulation. Microsoft said that its emulation layer has been significantly improved for AI workloads and that major ISVs are porting their apps natively. Adobe announced that the full Creative Cloud suite will be native on Arm by the end of 2026, and several game engine vendors showed real-time ray tracing demos running on the RTX Spark.

The RTX Spark also aligns with industry trends toward smaller, more power-efficient AI models. With recent advancements in model distillation and quantization, a 70B model can achieve performance comparable to much larger models from just a year ago. Combined with the RTX Spark's 128GB pool, developers can run multiple models simultaneously—perhaps a coding model, a vision model, and a reasoning agent—all in memory at once.
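To make the "multiple models at once" scenario concrete, the following is a minimal Python sketch of a memory budget for the 128 GB pool. The three-model lineup, the per-model bit widths, and the 24 GB reserve for the operating system, KV caches, and activations are all hypothetical values chosen for illustration; none of them come from the announcement.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    params_billion: float
    bits_per_weight: int

    @property
    def weights_gb(self) -> float:
        # 1e9 parameters * (bits / 8) bytes each = GB, with 1 GB = 1e9 bytes
        return self.params_billion * self.bits_per_weight / 8


def plan(models, pool_gb=128.0, reserve_gb=24.0):
    """Return (GB used by weights, GB left after the reserve, whether the lineup fits)."""
    used = sum(m.weights_gb for m in models)
    free = pool_gb - reserve_gb - used
    return used, free, free >= 0


# Hypothetical lineup: sizes and precisions are assumptions, not announced figures.
lineup = [
    Model("coding model", 14, 8),
    Model("vision model", 7, 16),
    Model("reasoning agent", 70, 4),
]

used, free, ok = plan(lineup)
for m in lineup:
    print(f"{m.name:<16} {m.weights_gb:6.1f} GB")
print(f"weights total: {used:.1f} GB, headroom after reserve: {free:.1f} GB, fits: {ok}")
```

The point of the sketch is that the budget depends heavily on precision: the same 70B model that fits comfortably at 4 bits would consume the entire pool and more at 16 bits.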

From an enterprise perspective, the RTX Spark could be a game-changer for sectors like healthcare, finance, and legal, where sensitive data cannot be sent to the cloud. Local AI agents can analyze patient records, financial transactions, or confidential contracts without breaching data governance rules. Microsoft hinted at future integrations with Azure Arc to enable hybrid cloud-local workflows, where the initial processing happens on the RTX Spark and only anonymized metadata is sent to the cloud for aggregation.

Beyond raw performance, the RTX Spark addresses a fundamental shift in data privacy laws worldwide. With GDPR in Europe and similar regulations expanding, companies are increasingly restricted from transferring sensitive data to cloud servers. By enabling AI inference entirely on-premises, the RTX Spark provides a compliant solution. Healthcare organizations can deploy AI agents that read medical imaging locally, while legal firms can automate document review without off-site data movement. This compliance angle might be the RTX Spark's biggest selling point in regulated industries.

Pricing for the RTX Spark Developer Kit might seem steep at $2,999, but NVIDIA points out that it replaces the need for a separate high-end CPU, GPU, and large RAM configuration, often costing over $5,000 in equivalent x86 builds. Moreover, the developer kit includes software and support that would otherwise cost thousands annually. Volume pricing for enterprises was not disclosed but is expected to be announced at the Microsoft Ignite conference later this year.

Looking ahead, the RTX Spark sets the stage for a new category of PCs. While traditional gaming or professional visualization GPUs will still exist, the integration of CPU and GPU on a single package with massive unified memory could become the blueprint for future client computing. Rumors suggest that NVIDIA is already working with OEMs like Dell, Lenovo, and ASUS to bring RTX Spark-powered laptops to market by 2027, potentially giving Windows on Arm its first high-performance gaming and workstation machines.

However, challenges remain. The Arm ecosystem is still catching up in terms of peripheral driver support and niche software. NVIDIA's Grace CPU, while powerful, is an unknown quantity in the PC space, and many developers will wait to see real-world performance on their specific workloads. Additionally, AMD and Intel are not standing still; both are ramping up their own AI PC SoCs with integrated neural processing units (NPUs) and unified memory approaches. The competition will be intense.

For end users, the RTX Spark represents a future where your PC is not just a tool but a collaborative AI partner that understands your data and context without compromising privacy. As Huang put it, "The AI PC is not about running a chatbot. It's about having a digital agent that knows your life, works offline, and empowers you with superhuman productivity." The RTX Spark is the first hardware deliberately designed for that vision.

Microsoft's role in this partnership cannot be overstated. The company has been pushing Windows to embrace neural processing with Windows Copilot Runtime, and the RTX Spark accelerates that vision. At the event, Microsoft demonstrated how Windows 11's AI capabilities, including real-time captioning, intelligent search, and code generation, run with unparalleled efficiency on the RTX Spark. The integration of these features at the OS level means that third-party developers can tap into the same hardware acceleration through standard APIs.

The development environment for RTX Spark will be well-supported. NVIDIA's CUDA toolkit has been updated to version 13.0 with full support for the Grace-Blackwell unified memory model. Python libraries like CuPy accelerate NumPy workloads directly on the GPU without copying data. Jupyter notebooks can now allocate GPU tensors in the unified memory, making data science workflows seamless. This tight integration is likely to attract the vast community of AI researchers and developers who are already invested in NVIDIA's ecosystem.

Power efficiency is another highlight. Despite its performance, the RTX Spark draws only 300 watts under peak load, significantly less than a comparable x86 workstation with a discrete GPU. This is enabled by the Arm architecture's efficiency and the advanced 3nm manufacturing process. For teams running AI experiments around the clock, the power savings could be substantial.

In terms of connectivity, the RTX Spark includes a dedicated AI network interface that can link multiple units together over 25 GbE, effectively creating a local AI cluster. This feature, dubbed "SparkLink," allows scaling to multi-node inference for extremely large models, such as those with hundreds of billions of parameters, by distributing weights across several RTX Spark units. It's a unique capability that extends the platform beyond a single PC to a micro-server.

The announcement included a demonstration of three RTX Sparks running an open-source 175B parameter model (comparable in size to GPT-3) with acceptable latency for conversational AI. This opens possibilities for research labs and startups that need private, high-throughput inference without recurring cloud costs.
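The three-unit figure is consistent with simple capacity arithmetic, which the sketch below works through in Python. How SparkLink actually distributes weights was not described, so the contiguous layer split shown here is a generic pipeline-style partition used as a stand-in, and the function names are hypothetical. The 16-bit weight format and the 96-layer depth (the depth published for GPT-3 175B) are likewise assumptions.

```python
import math


def min_units(params_billion: float, bits_per_weight: int, pool_gb: float = 128.0) -> int:
    """Smallest number of units whose combined memory can hold the weights alone."""
    weights_gb = params_billion * bits_per_weight / 8
    return math.ceil(weights_gb / pool_gb)


def partition_layers(n_layers: int, n_units: int) -> list:
    """Split layers into contiguous, near-equal ranges, one range per unit."""
    base, extra = divmod(n_layers, n_units)
    ranges, start = [], 0
    for i in range(n_units):
        size = base + (1 if i < extra else 0)  # the first `extra` units take one more layer
        ranges.append(range(start, start + size))
        start += size
    return ranges


units = min_units(175, 16)           # assumes 16-bit weights
shards = partition_layers(96, units)  # assumes a 96-layer model

print(f"units needed for weights alone: {units}")
for i, layers in enumerate(shards):
    print(f"unit {i}: layers {layers.start}..{layers.stop - 1}")
```

With 16-bit weights, 175B parameters occupy about 350 GB, or roughly 117 GB per unit across three units. That leaves little room per unit for the KV cache and activations, so a real deployment would likely need lower precision or short contexts.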

Critics argue that the RTX Spark might be overkill for typical office productivity, but NVIDIA clearly targets developers, researchers, and specific enterprise verticals. As AI becomes more ubiquitous, the need for powerful local processing will only grow, and the RTX Spark positions NVIDIA as a leader in this nascent market.

Overall, the RTX Spark is a bold statement from NVIDIA and Microsoft about the future of personal computing. By combining a server-grade Arm CPU with a cutting-edge GPU and massive unified memory, the platform blurs the line between PC and data center. As the computing industry shifts toward agentic AI, having such capabilities locally on the desktop could become a requirement rather than a luxury. The RTX Spark is set to ship in the third quarter of 2026, and the tech world will be watching closely.
