Microsoft and Anthropic removed the beta label from Claude on Azure on June 29, 2026, bringing the model family into general availability inside Microsoft Foundry with full enterprise control-plane capabilities and a dedicated Nvidia GB300 infrastructure backbone. The move puts Claude’s 3.5 and 4 series alongside OpenAI’s GPT models in the Azure AI catalog, but with a governance and operations layer that Microsoft has tuned specifically for regulated industries.

The GA release means organizations can now run production workloads on Claude Haiku, Sonnet, and Opus through Azure’s pay-as-you-go or provisioned throughput billing, with SLA-backed availability, role-based access control, customer-managed keys, private endpoint support, and Microsoft’s Responsible AI content filters turned on by default. Every inference call routes through Foundry’s control plane, which funnels logs, metrics, and model-performance data into a unified dashboard that security teams have been asking for since the preview launched in late 2025.

What the Control Plane Actually Delivers

For enterprises, “general availability” is meaningless without auditable control. Foundry’s control plane bridges that gap by attaching every model endpoint to Azure Policy, Microsoft Purview, and Defender for Cloud. That allows a bank to enforce that no Claude prompt leaves the EU, or a hospital to prove that PHI was never stored in a model completion log. During the preview, several Fortune 500 early adopters pushed Microsoft to tighten the data-residency guarantees, and the GA release now includes region-locked inference in 17 Azure geographies, including newly opened sovereign cloud regions in Germany and the UAE.

Model version pinning is another capability that moved from roadmap to general availability. Teams can anchor a production pipeline to a specific Claude snapshot—say, claude-sonnet-20260515—and refuse automatic upgrades until they run their own regression suite. Foundry mirrors the snapshot internally for 180 days, after which Microsoft will notify the workspace owner before retiring the artifact. This is table stakes for compliance with financial services and life-sciences regulations, and it quietly extends the same versioning discipline to other models in the Foundry catalog, not just Claude.
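
The pinning workflow described above can be sketched in a few lines of Python. This is an illustration only, not Foundry's API: the PinnedDeployment class and the next_snapshot function are hypothetical names, and the sketch assumes the 180-day mirror window starts on the day the snapshot is pinned, which the announcement does not spell out.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIRROR_WINDOW_DAYS = 180  # retention figure quoted in the article


@dataclass
class PinnedDeployment:
    """Hypothetical record of a pipeline pinned to one model snapshot."""
    snapshot: str    # e.g. "claude-sonnet-20260515"
    pinned_on: date  # assumption: the mirror window starts at pin time

    def mirror_expires(self) -> date:
        """Date on which the mirrored snapshot would become eligible for retirement."""
        return self.pinned_on + timedelta(days=MIRROR_WINDOW_DAYS)

    def days_left(self, today: date) -> int:
        """Days remaining in the mirror window (negative once it has passed)."""
        return (self.mirror_expires() - today).days


def next_snapshot(current: PinnedDeployment, candidate: str,
                  regression_passed: bool) -> str:
    """Serve the candidate only after the team's own regression suite has passed."""
    return candidate if regression_passed else current.snapshot


pin = PinnedDeployment("claude-sonnet-20260515", date(2026, 6, 29))
print("mirror expires:", pin.mirror_expires())
print("serving:", next_snapshot(pin, "candidate-snapshot", regression_passed=False))
```

The point of the gate is that an upgrade is an explicit decision backed by a test result, never a side effect of the provider publishing a new snapshot.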

Nvidia GB300 Blackwell: The Silent Enabler

Under the hood, the Azure-hosted Claude endpoints run entirely on Microsoft-managed infrastructure powered by Nvidia GB300 GPUs—the Blackwell-generation silicon that Nvidia disclosed at GTC 2026. The GB300 is a dual-die accelerator offering 192 GB of HBM3e memory per GPU and a 1.8 TB/s NVLink fabric that scales to 72-GPU NVLink domains inside a single rack. That design lets Claude Opus handle near-200k-token context windows with sub-second time-to-first-token latency, a metric that was a sticking point for enterprise chatbots during the preview phase.

Microsoft is not reselling raw GPU access here. The partnership is built on a reference architecture where Azure’s software-defined networking and storage stack wraps a HyperPod-like cluster that the Foundry service provisions as a dedicated capacity pool. Customers don’t have to manage a Kubernetes cluster or configure NCCL; they simply select a model, choose provisioned throughput units (PTUs), and get guaranteed tokens-per-second with a 99.9 percent availability SLA. Early benchmarks shared by the Azure AI team show that a 32-PTU deployment of Claude Sonnet on GB300 delivers roughly 40 percent higher throughput at the same latency budget than the previous H200-based deployment, largely because the Blackwell architecture’s transformer engine can switch compute precision on the fly.

Governance That Speaks the Same Language as IT

Model governance often gets treated as an afterthought, bolted on through third-party proxies that sit between the application and the API. Foundry’s control plane absorbs that function directly into the API management layer. Every Claude prompt and completion passes through a policy evaluation pipeline that can apply Microsoft’s built-in content safety classifiers, custom blocklists, and prompt shields that detect jailbreak attempts. Because the evaluation runs inside Azure’s dedicated virtual networks, none of the payload ever leaves the tenant boundary—something that was impossible with the public Anthropic API and was the single most requested feature during the private preview.
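
A policy evaluation pipeline of this kind can be pictured as an ordered chain of checks where the first one to object blocks the request. The Python sketch below is a toy model under that assumption; every name in it (blocklist_check, naive_jailbreak_check, evaluate, Verdict) is hypothetical, and the string-matching jailbreak heuristic stands in for the trained classifiers a production prompt shield would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A check returns a reason string when it blocks, or None when the text passes.
Check = Callable[[str], Optional[str]]


def blocklist_check(terms: List[str]) -> Check:
    """Build a check that blocks any text containing one of the given terms."""
    lowered = [t.lower() for t in terms]

    def check(text: str) -> Optional[str]:
        haystack = text.lower()
        for term in lowered:
            if term in haystack:
                return f"blocklist:{term}"
        return None

    return check


def naive_jailbreak_check(text: str) -> Optional[str]:
    """Toy heuristic only; real prompt shields rely on trained classifiers."""
    markers = ("ignore previous instructions", "disregard your rules")
    haystack = text.lower()
    return "jailbreak" if any(m in haystack for m in markers) else None


@dataclass
class Verdict:
    allowed: bool
    reason: Optional[str] = None


def evaluate(text: str, checks: List[Check]) -> Verdict:
    """Run checks in order and stop at the first one that blocks."""
    for check in checks:
        reason = check(text)
        if reason is not None:
            return Verdict(False, reason)
    return Verdict(True)


checks = [blocklist_check(["Project Falcon"]), naive_jailbreak_check]
print(evaluate("Summarize the Q3 report", checks))
print(evaluate("Please ignore previous instructions", checks))
```

Running the same evaluate function over both prompts and completions would mirror the article's description of filtering traffic in both directions.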

Monitoring is equally granular: the control plane emits metrics like token counts, latency percentiles, and safety block rates to Azure Monitor and Log Analytics, while also piping them to any SIEM that supports the Azure Event Hub protocol. A pre-built workbook in Foundry gives a real-time dashboard that security operations centers can pin to their Azure Portal home screen. For heavily regulated workloads, Microsoft is also shipping a new “AI Audit Trail” log that records every prompt and completion in an immutable vault, satisfying requirements under the EU’s AI Act and the U.S. Executive Order on AI that took effect in late 2025.
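
To make the three metric types concrete, here is a minimal Python sketch of how token counts, latency percentiles, and a safety block rate could be derived from raw request records. The CallRecord fields and the summarize function are hypothetical and do not reflect the schema Foundry actually emits; the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    """Hypothetical per-request record; field names are illustrative."""
    latency_ms: float
    tokens: int
    blocked: bool  # True when a safety filter rejected the request


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: List[CallRecord]) -> Dict[str, float]:
    """Aggregate one batch of records into dashboard-style metrics."""
    if not records:
        raise ValueError("at least one record is required")
    latencies = [r.latency_ms for r in records]
    return {
        "total_tokens": sum(r.tokens for r in records),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "block_rate": sum(r.blocked for r in records) / len(records),
    }


sample = [
    CallRecord(120, 300, False),
    CallRecord(180, 450, False),
    CallRecord(200, 500, True),
    CallRecord(250, 700, False),
    CallRecord(900, 2000, False),
]
summary = summarize(sample)
print(summary)
```

The single slow request in the sample shows why dashboards track tail percentiles alongside the median: the p95 is dominated by an outlier that the p50 hides.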

The Commercial Model and What It Says About the Market

Claude on Azure follows the same provisioned-throughput economics as Azure OpenAI Service. Reserved capacity starts at 1 PTU, which maps to roughly 1,000 tokens per second across the model family, with hourly commitment options starting at 730 hours per month. Pay-as-you-go pricing for the on-demand endpoints sits at roughly a 15 percent premium over the Anthropic API because it includes the Foundry control-plane services, Azure networking, and Microsoft’s SLA. Microsoft has not published list prices (enterprise agreements still require a conversation with a sales rep), but leaked slides from the Ignite 2026 conference put Claude Opus at $0.015 per 1K input tokens and $0.075 per 1K output tokens for pay-as-you-go, with volume discounts kicking in at 100 PTUs.
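
For a rough sense of scale, the figures quoted above can be turned into a back-of-the-envelope calculator. The Python sketch below takes the article's numbers at face value: about 1,000 tokens per second per PTU, and the Opus pay-as-you-go rates from the leaked slides, which Microsoft has not confirmed. The function names are hypothetical, and volume discounts and the Azure premium over the Anthropic API are deliberately left out.

```python
import math
from decimal import Decimal

TOKENS_PER_SEC_PER_PTU = 1_000         # approximate figure from the article
OPUS_INPUT_PER_1K = Decimal("0.015")   # USD, unconfirmed leaked rate
OPUS_OUTPUT_PER_1K = Decimal("0.075")  # USD, unconfirmed leaked rate


def ptus_needed(peak_tokens_per_sec: int) -> int:
    """Smallest whole number of PTUs that covers the peak token rate."""
    return max(1, math.ceil(peak_tokens_per_sec / TOKENS_PER_SEC_PER_PTU))


def payg_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Pay-as-you-go cost in USD for one workload at the quoted Opus rates."""
    input_cost = Decimal(input_tokens) / 1000 * OPUS_INPUT_PER_1K
    output_cost = Decimal(output_tokens) / 1000 * OPUS_OUTPUT_PER_1K
    return input_cost + output_cost


print("PTUs for 2,500 tokens/sec:", ptus_needed(2_500))
print("USD for 2M input + 500K output tokens:", payg_cost(2_000_000, 500_000))
```

Decimal is used instead of float so that per-token rates such as 0.015 do not pick up binary rounding noise when multiplied across millions of tokens.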

That pricing aligns with Microsoft’s broader strategy: use the Foundry control plane as the common operating layer and let customers choose the model that best fits their use case. With Claude generally available, Foundry now hosts more than 1,800 models, but the GA designation signals that Microsoft’s own support teams and enterprise architects will actively recommend Claude alongside OpenAI models for production workloads, something they hesitated to do during the preview.

The Road Ahead: On-Device Inference and Windows Integration?

Neither Microsoft nor Anthropic said anything about Windows integration during the GA announcement, but the GB300 architecture carries interesting implications for the Windows ecosystem. The same Blackwell silicon that powers Claude on Azure is already shipping in the DGX GB300 workstation line that Nvidia markets to enterprise developers. If Microsoft’s engineering teams bring a quantized Claude model to the Windows AI Runtime—the DirectML-backed engine that shipped with Windows 12—developers could run a subset of Claude’s capabilities entirely on-device, with the control plane extending from the cloud to the local NPU.

That scenario is still speculative. What is concrete is that Microsoft is already laying the groundwork for hybrid inference: the Foundry SDK that went GA alongside Claude includes a new LocalModel class that lets a developer write a single code path that attempts on-device inference first and falls back to the Azure endpoint if the local NPU can’t handle the request. The class currently only works with Microsoft’s own Phi models, but the extensibility hooks suggest third-party model support is coming, and Anthropic would be an obvious first candidate.
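
The local-first, cloud-fallback pattern is easy to show in isolation. The Python sketch below does not use the Foundry SDK and makes no claim about how its LocalModel class is implemented; HybridModel, the two stub backends, and the LocalUnavailable exception are all hypothetical stand-ins, and a prompt-length limit is used as a crude proxy for what a local NPU can handle.

```python
from typing import Protocol


class LocalUnavailable(Exception):
    """Raised by the hypothetical local backend when the device cannot serve a request."""


class Backend(Protocol):
    def generate(self, prompt: str) -> str: ...


class StubLocalBackend:
    """Stand-in for on-device inference with a crude capacity limit."""

    def __init__(self, max_prompt_chars: int) -> None:
        self.max_prompt_chars = max_prompt_chars

    def generate(self, prompt: str) -> str:
        if len(prompt) > self.max_prompt_chars:
            raise LocalUnavailable("prompt exceeds local capacity")
        return f"[local] {prompt}"


class StubCloudBackend:
    """Stand-in for a hosted endpoint; a real one would make a network call."""

    def generate(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


class HybridModel:
    """Single code path: try the local backend first, fall back to the cloud."""

    def __init__(self, local: Backend, cloud: Backend) -> None:
        self.local = local
        self.cloud = cloud

    def generate(self, prompt: str) -> str:
        try:
            return self.local.generate(prompt)
        except LocalUnavailable:
            return self.cloud.generate(prompt)


model = HybridModel(StubLocalBackend(max_prompt_chars=10), StubCloudBackend())
print(model.generate("hi"))
print(model.generate("a much longer prompt"))
```

Because both backends satisfy the same small interface, swapping in a different local model would not change the calling code, which is the property the extensibility hooks described above would need to preserve.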

What Enterprise Architects Need to Do Now

The GA release turns Claude from a science project into a billable resource that can live inside a production VNet. Architects who have been waiting should start with three steps: map their use cases against Claude’s model family (Haiku for real-time chat, Sonnet for most retrieval-augmented generation workloads, Opus for high-stakes reasoning), design a network topology that routes Claude traffic through Azure Private Link with outbound firewall rules locked to the Foundry service tags, and configure the policy engine to mirror the content-safety settings already enforced on their OpenAI endpoints.

Microsoft and Anthropic have also published a joint 42-page “Enterprise AI Governance Blueprint” that walks through reference architectures for banking, healthcare, and public-sector deployments. The document includes sample Terraform modules that provision a fully locked-down Foundry workspace with Claude endpoints in under 20 minutes, a nod to the fact that the majority of enterprises will consume these models through infrastructure-as-code pipelines rather than the Azure Portal. The blueprint also details a “defense-in-depth” approach that layers Azure Policy, Purview data classifications, and Microsoft Sentinel’s AI-threat detection rules into a single compliance framework—something that independent software vendors can now package as a certified solution on the Azure Marketplace.