Dell, Microsoft, and AMD used the stage at Dell Technologies World 2026 in Las Vegas this week to announce a tightly integrated enterprise AI stack that puts hybrid infrastructure, SQL Server–driven AI, and aggressive cost control at the center of the conversation. The three companies laid out a roadmap that lets businesses run AI workloads where their data lives—often on-premises—while unifying management through Azure Arc and slashing the price tag with CPU-first inference.

A Hybrid Control Plane with Azure Local and Dell PowerEdge

Microsoft confirmed that Azure Local, the next evolution of Azure Stack HCI, now runs on AMD EPYC-based Dell PowerEdge servers that have been validated for AI workloads. Azure Local brings Azure services to an organization’s own data center, enabling a single control plane across cloud and on-premises resources. Dell’s latest 16th-generation PowerEdge servers, including the R7625 and R6615, pair with 5th Gen AMD EPYC (Turin) CPUs, which embed AI inference acceleration directly into the processor.

These configurations ship pre-loaded with Azure Local, ready to join an Azure Arc management fabric. Administrators see their on-prem servers in the Azure portal alongside cloud VMs, using the same Role-Based Access Control (RBAC) and policy engine. The combination allows enterprises to develop AI agents in Visual Studio Code or Azure AI Studio, then deploy them locally—on the same servers that host sensitive databases—without ever moving customer data to a public cloud endpoint.

AMD EPYC Turin: AI That Doesn’t Need a GPU

A key pillar of the cost argument is AMD’s latest EPYC 9005 series, codenamed Turin, which packs up to 192 Zen 5c cores and integrates the XDNA 2 neural processing unit. This NPU block accelerates quantization-aware inference on common ONNX and DirectML models, making it feasible to run a small- to mid-size language model entirely on CPU without a discrete GPU. AMD demonstrated a 7-billion-parameter Mistral variant running on a two-socket EPYC 9575F system, delivering 30 tokens per second while consuming under 400 watts—performance that rivals entry-level accelerators at a fraction of the capital expense.
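As a rough illustration of why quantization matters for CPU-only inference, the sketch below estimates the weight memory of a 7-billion-parameter model at several precisions and derives energy per token from the demo figures (30 tokens per second, with 400 watts treated as an upper bound). The bit widths are assumptions chosen for illustration; the announcement did not say which precision AMD used.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only (no KV cache or activations)."""
    return num_params * bits_per_weight / 8 / 1e9


def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token; a watt is one joule per second."""
    return watts / tokens_per_second


if __name__ == "__main__":
    params = 7e9  # 7-billion-parameter model, as in the demo
    for bits in (16, 8, 4):  # assumed precisions, not stated in the article
        print(f"{bits}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
    # 400 W is the quoted ceiling, so this is a worst-case figure.
    print(f"Energy per token: {joules_per_token(400, 30):.1f} J")
```

Halving the bits per weight halves the memory the CPU has to stream for every generated token, which is one reason lower-precision models are attractive on hardware without a discrete GPU.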

For workloads that do need a GPU, Dell’s PowerEdge XE9680 with AMD Instinct MI350X accelerators remains the high-end option. But the companies stressed that many enterprise AI tasks—retrieval-augmented generation (RAG) over SQL data, document summarization, classification, and anomaly detection—fit comfortably within the new EPYC’s capabilities. That lets IT departments defer GPU purchases, reducing both hardware acquisition cost and ongoing power and cooling overhead.

SQL Server Becomes an AI Engine

Perhaps the most consequential announcement was the deep integration of AI into SQL Server. Microsoft previewed SQL Server 2025 with native vector indexes, embedding generation, and a dedicated RAG operator that runs inside the database engine. Instead of pulling data into a separate AI service, users can issue a single T-SQL query that generates embeddings for a text column, performs a cosine-similarity search across millions of rows, and returns results in milliseconds, all within the same transactional context.
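For readers unfamiliar with cosine-similarity search, here is a minimal, database-free sketch of the idea in Python. It is not SQL Server's implementation: the function names and the in-memory `rows` dictionary are hypothetical, and the brute-force scan shown here is exactly what a vector index is meant to avoid at scale.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], rows: Dict[int, List[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings" keyed by row ID; real embeddings have far more dimensions.
    rows = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
    for row_id, score in top_k([1.0, 0.1], rows, k=2):
        print(row_id, round(score, 3))
```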

Dell demonstrated a real-world scenario: a global retailer using SQL Server 2025 on a two-socket PowerEdge R7625 with EPYC 9575F processors. The retailer ran a nightly pipeline that re-embeds product descriptions as they change, then serves customer-facing semantic search through a local REST endpoint. The entire stack (database, vector store, and embedding model) lived on the same machine, eliminating data movement to a cloud AI endpoint. Total query latency dropped by 70% compared with the previous architecture, which called Azure OpenAI for embeddings and used a separate vector database.
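One plausible way to build a pipeline that re-embeds only what has changed is to store a content hash next to each embedding and skip rows whose hash still matches. The sketch below assumes that approach; the retailer's actual design was not described, and `toy_embed` is a deterministic placeholder, not a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Embedding = List[float]


def content_hash(text: str) -> str:
    """Fingerprint of a description, used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_embed(text: str) -> Embedding:
    """Placeholder embedding: NOT a real model, just deterministic numbers."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:4]]


def refresh_embeddings(
    products: Dict[int, str],
    store: Dict[int, Tuple[str, Embedding]],
    embed: Callable[[str], Embedding] = toy_embed,
) -> List[int]:
    """Re-embed only rows whose description changed; return the updated IDs."""
    updated = []
    for product_id, description in products.items():
        digest = content_hash(description)
        cached = store.get(product_id)
        if cached is None or cached[0] != digest:
            store[product_id] = (digest, embed(description))
            updated.append(product_id)
    return updated


if __name__ == "__main__":
    store: Dict[int, Tuple[str, Embedding]] = {}
    catalog = {1: "red running shoe", 2: "blue wool hat"}
    print(refresh_embeddings(catalog, store))  # first run embeds everything
    catalog[2] = "navy wool hat"
    print(refresh_embeddings(catalog, store))  # second run embeds only the edit
```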

Cost Control at Every Layer

The three partners hammered on total cost of ownership numbers. Dell quoted a 45% lower three-year TCO for a hybrid AI inference cluster compared with a cloud-only setup when running on-prem SQL Server RAG workloads. Microsoft introduced a new consumption-based billing model for Azure Local AI services: customers pay a per-core-hour rate for vCPUs that run AI operators, similar to Azure’s pay-as-you-go logic but on their own hardware. Unused cores don’t incur charges, avoiding the stranded-cost problem common with dedicated cloud AI instances.
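To make the per-core-hour idea concrete, the following sketch compares a consumption charge with a flat charge for the same provisioned cores. The rate and the usage pattern are invented for illustration; Microsoft's actual pricing was not given.

```python
from typing import Iterable, Tuple


def consumption_charge(
    usage: Iterable[Tuple[int, float]], rate_per_core_hour: float
) -> float:
    """Sum core-hours over (active_cores, hours) intervals and apply the rate."""
    core_hours = sum(cores * hours for cores, hours in usage)
    return core_hours * rate_per_core_hour


def dedicated_charge(
    provisioned_cores: int, hours: float, rate_per_core_hour: float
) -> float:
    """Cost when every provisioned core is billed for the full period."""
    return provisioned_cores * hours * rate_per_core_hour


if __name__ == "__main__":
    rate = 0.05  # hypothetical USD per core-hour
    # Hypothetical day: 32 cores busy for 8 hours, 8 cores busy for 16 hours.
    day = [(32, 8.0), (8, 16.0)]
    print(f"consumption: ${consumption_charge(day, rate):.2f}")
    print(f"dedicated:   ${dedicated_charge(32, 24.0, rate):.2f}")
```

The gap between the two figures is the "stranded cost" of cores that are provisioned but idle.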

AMD contributed power-efficiency data: the EPYC 9575F delivered 2.8x the inference throughput per watt of a competing Xeon 6 processor in the same workload. When multiplied across a rack of eight PowerEdge servers, the annual electricity savings could exceed $15,000 per rack at U.S. commercial utility rates. Dell added that its OpenManage software now provides AI-specific telemetry—power consumption of NPU engines, GPU utilization, and thermal headroom—so administrators can fine-tune placement of AI jobs for minimal energy use.
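The per-rack savings figure depends on inputs the companies did not publish, such as baseline power draw, utilization, and the electricity rate. The sketch below shows how such an estimate could be assembled at equal throughput; every numeric input in the example is a placeholder, and the optional `pue` multiplier for cooling overhead is an added assumption.

```python
HOURS_PER_YEAR = 24 * 365


def annual_savings_usd(
    baseline_watts: float,
    perf_per_watt_ratio: float,
    servers: int,
    usd_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Yearly electricity savings at equal throughput.

    A system with `perf_per_watt_ratio` times the throughput per watt needs
    1 / ratio of the baseline power to do the same work.
    """
    efficient_watts = baseline_watts / perf_per_watt_ratio
    saved_kw = (baseline_watts - efficient_watts) * servers * pue / 1000.0
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh


if __name__ == "__main__":
    # Placeholder inputs: 1 kW baseline per server, 8 servers, $0.13 per kWh.
    estimate = annual_savings_usd(1000.0, 2.8, servers=8, usd_per_kwh=0.13)
    print(f"Estimated annual savings: ${estimate:,.0f}")
```

Because the result scales linearly with baseline draw, rate, and cooling overhead, the same 2.8x ratio can yield very different dollar figures from one site to another.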

Real-World Use: Manufacturing Quality Control

To ground the announcement, a customer from the automotive manufacturing sector shared their pilot. The factory floor generates terabytes of sensor data stored in SQL Server. Previously, quality engineers would query the database for out-of-spec measurements, then manually cross-reference images from inspection cameras stored in a separate data lake. With SQL Server 2025’s native RAG, the same T-SQL query can now return both the numerical anomaly and the semantically similar past images, which are retrieved through vector search on image embeddings. The entire pipeline runs on a single Dell PowerEdge R7625 sitting on the factory LAN, keeping proprietary process data inside the plant while still leveraging AI.
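The query pattern described here, a numeric filter followed by a nearest-neighbor lookup on image embeddings, can be sketched outside the database as follows. The record layout, tolerance limits, and file names are hypothetical, and Euclidean distance stands in for whatever metric the pilot actually used.

```python
import math
from typing import Dict, List, NamedTuple


class Measurement(NamedTuple):
    part_id: int
    value_mm: float
    image_embedding: List[float]


def out_of_spec(
    rows: List[Measurement], lower: float, upper: float
) -> List[Measurement]:
    """Keep only measurements outside the [lower, upper] tolerance band."""
    return [r for r in rows if not (lower <= r.value_mm <= upper)]


def similar_past_images(
    anomaly: Measurement, archive: Dict[str, List[float]], k: int
) -> List[str]:
    """Rank archived images by Euclidean distance to the anomaly's embedding."""
    ranked = sorted(
        archive, key=lambda name: math.dist(anomaly.image_embedding, archive[name])
    )
    return ranked[:k]


def inspect(
    rows: List[Measurement],
    archive: Dict[str, List[float]],
    lower: float,
    upper: float,
    k: int = 2,
) -> Dict[int, List[str]]:
    """Map each out-of-spec part to its k most similar archived images."""
    return {
        a.part_id: similar_past_images(a, archive, k)
        for a in out_of_spec(rows, lower, upper)
    }


if __name__ == "__main__":
    rows = [Measurement(1, 10.0, [0.0, 0.0]), Measurement(2, 10.9, [1.0, 1.0])]
    archive = {"scratch.png": [0.9, 1.1], "dent.png": [0.0, 0.1], "burr.png": [1.0, 0.8]}
    print(inspect(rows, archive, lower=9.5, upper=10.5))
```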

Developer Workflow: From Laptop to On-Prem Agent

Microsoft highlighted the toolchain: developers write a LangChain agent in VS Code, test it against Azure SQL Database in the cloud, then publish the complete agent to an Azure Local instance using Azure Arc. The agent can call local SQL Server stored procedures that generate embeddings, run RAG, and reply to user prompts—all without crossing the firewall. Dell’s validated solution catalog now includes a “SQL AI Stack” reference architecture that provisions the entire software stack, including the operating system, database, and AI libraries, in under 15 minutes using Dell OpenManage integrations with Azure Arc.
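Stripped of any framework, the agent loop described here reduces to three steps: retrieve, build a prompt, generate. The sketch below shows that shape with plain callables; it does not use LangChain, and `retrieve` and `generate` are hypothetical stand-ins for a call into local SQL Server and a locally hosted model.

```python
from typing import Callable, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve supporting rows, then ask the model to answer from them."""
    passages = retrieve(question)
    if not passages:
        return "No relevant records found."
    return generate(build_prompt(question, passages))


if __name__ == "__main__":
    # Stubs for illustration only; a real agent would query the database
    # and call a local model here.
    def fake_retrieve(question: str) -> List[str]:
        return ["Order 42 shipped on Monday."]

    def fake_generate(prompt: str) -> str:
        return f"(model output for a {len(prompt)}-character prompt)"

    print(answer("When did order 42 ship?", fake_retrieve, fake_generate))
```

Passing the retrieval and generation steps in as functions means the same agent code can be tested against one backend and deployed against another, which matches the cloud-to-on-prem handoff described above.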

Enterprise Security and Sovereignty

For regulated industries, the trio emphasized that every data path remains encrypted and inspectable. Azure Local enforces Azure Policy to ensure that only approved models are loaded onto NPUs, and SQL Server row-level security integrates with Microsoft Entra ID for fine-grained access to row data even during AI operations. The architecture supports air-gapped deployment, letting defense and government entities run AI without any internet connectivity, a feature that has been requested since the first wave of LLMs.
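Conceptually, an "approved models only" rule amounts to an allowlist check before load time. The sketch below illustrates that idea with SHA-256 digests; it is not how Azure Policy is implemented, and all function names are hypothetical.

```python
import hashlib
from typing import Set


def model_digest(model_bytes: bytes) -> str:
    """SHA-256 fingerprint of a model file's contents."""
    return hashlib.sha256(model_bytes).hexdigest()


def is_approved(model_bytes: bytes, approved_digests: Set[str]) -> bool:
    """True only if the model's digest is on the allowlist."""
    return model_digest(model_bytes) in approved_digests


def load_model(model_bytes: bytes, approved_digests: Set[str]) -> bytes:
    """Refuse to load any model whose digest is not on the allowlist."""
    if not is_approved(model_bytes, approved_digests):
        raise PermissionError("model digest is not on the approved list")
    return model_bytes  # a real loader would hand the bytes to a runtime here


if __name__ == "__main__":
    approved = {model_digest(b"vetted-model-weights")}
    load_model(b"vetted-model-weights", approved)
    try:
        load_model(b"tampered-model-weights", approved)
    except PermissionError as err:
        print(f"blocked: {err}")
```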

What It Means for Windows Enthusiasts

While targeted squarely at enterprise IT, the ripple effects reach power users and developers in the Windows ecosystem. The same ONNX and DirectML runtime that powers AMD NPU acceleration on servers is available on Windows 11 Copilot+ PCs. Skills developed in building local SQL Server AI tools translate directly to the same stack running on a workstation or even a high-end laptop, making the hybrid model a continuum from device to data center.

The Takeaway

Dell, Microsoft, and AMD have drawn a line: enterprise AI does not need to be a cloud-only affair that racks up data egress fees and sacrifices data sovereignty. By pairing Azure Local with EPYC Turin’s CPU-based AI and baking vector intelligence into SQL Server 2025, they’ve created a stack that lets organizations keep AI close to the data while controlling costs at a granular level. The message to the C-suite is clear: the fastest path to ROI may be the server room down the hall, not a hyperscale data center 1,000 miles away.