Microsoft: Red Sea Cable Cuts Cause Azure Latency Spike, Rerouting Underway

Microsoft has confirmed that multiple undersea fiber-optic cable cuts in the Red Sea are causing higher-than-normal latency for Azure cloud customers, forcing traffic onto longer alternate routes while repair operations are coordinated. In a Service Health advisory, the company warned that users with data traversing the Middle East corridor could experience “higher latency on some traffic” and that engineers are actively rebalancing and optimizing routing to minimize disruption.

The Incident in Brief

The global internet relies on a dense mesh of submarine cables, with the Red Sea serving as a critical choke point connecting Asia, the Middle East, and Europe. When several high-capacity segments in this corridor were damaged, the sudden loss of capacity rippled through transit providers and cloud networks alike. Microsoft’s update specifically noted that traffic previously routed through the Middle East would see increased round-trip times, while flows not dependent on that path remain unaffected.

This is not the first time the region has experienced cable faults; similar events in recent years have highlighted the fragility of this key transit area. However, the current incident underscores how even the most resilient cloud architectures are ultimately dependent on physical infrastructure that can take days or weeks to repair.

Anatomy of a Subsea Cable Cut and Cloud Disruption

When a submarine cable is severed, whether by a ship’s anchor, geological activity, or other causes, the immediate effect is the loss of that link’s capacity. Routing protocols such as BGP detect the failure and shift traffic onto alternative paths. For cloud platforms, these reroutes can mean longer physical distances, more hops, and congestion on the surviving links, all of which translate into elevated latency and jitter for applications.
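
As a rough illustration of why a longer detour raises round-trip time, the sketch below estimates propagation delay alone. It assumes light travels through fiber at roughly two-thirds of its vacuum speed (about 200,000 km/s). The route lengths are hypothetical figures chosen for the example, not measurements of any real Azure path, and the estimate ignores queuing and per-hop processing delay.

```python
# Approximate speed of light in optical fiber (about 2/3 of c); an assumption
FIBER_KM_PER_MS = 200.0  # 200,000 km/s equals 200 km per millisecond


def rtt_ms(route_km: float) -> float:
    """Round-trip propagation delay in ms for a one-way fiber route length."""
    return 2 * route_km / FIBER_KM_PER_MS


# Hypothetical one-way route lengths, for illustration only
direct_km = 10_000
detour_km = 16_000

added_ms = rtt_ms(detour_km) - rtt_ms(direct_km)
print(f"direct: {rtt_ms(direct_km):.0f} ms, "
      f"detour: {rtt_ms(detour_km):.0f} ms, added: {added_ms:.0f} ms")
```

Under these assumed distances, the detour adds about 60 ms of round-trip time before any congestion is taken into account.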

Azure’s control plane and data plane both rely on extensive global backbones, but physical diversity is finite. Even with redundant routes, the Red Sea corridor concentrates so much east-west capacity that when multiple cables are damaged, the remaining paths cannot fully absorb the load without performance degradation. Microsoft’s advisory captured this precisely: “We do expect higher latency on some traffic that previously traversed through the Middle East.”

Which Azure Services Feel the Pain?

Not all workloads are equally sensitive to latency spikes. The most visible symptoms are likely to appear in:

Synchronous cross-region replication: Database mirroring, storage sync, and real-time analytics rely on tight latency bounds; increased RTT can cause transaction delays or timeouts.
Chatty APIs and microservices: Services that make many sequential calls between regions will see overall response times stretch dramatically.
Large data transfers: Backups, migrations, and media streaming between Asia and Europe may slow noticeably.
Latency-sensitive applications: VoIP, video conferencing, and gaming workloads can suffer from jitter and packet loss.
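
The effect on chatty services can be approximated with simple arithmetic: each sequential call pays the full round trip, so any increase is multiplied by the number of calls. The following is a minimal sketch of that model. The call count, RTT values, and per-call service time are hypothetical, and the batched variant assumes the API can accept all requests in a single round trip.

```python
def sequential_total_ms(calls: int, rtt_ms: float, service_ms: float = 5.0) -> float:
    """Total time when each call must finish before the next one starts."""
    return calls * (rtt_ms + service_ms)


def batched_total_ms(calls: int, rtt_ms: float, service_ms: float = 5.0) -> float:
    """Total time when all calls share one round trip; server work stays serial."""
    return rtt_ms + calls * service_ms


# Hypothetical RTTs before and after a reroute
for rtt in (120.0, 200.0):
    print(rtt, sequential_total_ms(20, rtt), batched_total_ms(20, rtt))
```

With these assumed numbers, an 80 ms rise in RTT adds 1.6 seconds to the 20-call sequential case but only 80 ms to the batched one, which is why reducing round trips is usually the first lever to pull.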

Control-plane operations like provisioning and VM management are generally less latency-sensitive and may remain responsive. Microsoft’s ExpressRoute private connectivity service may also be affected where its physical transit relies on the impacted corridor.

Why Repairs Take Days, Not Hours

Fixing a broken subsea cable is a major engineering feat constrained by three hard realities:

Specialized repair ships: The global fleet of cable-laying and repair vessels is limited; scheduling a ship to reach the fault site often takes several days, even in ideal conditions.
Permitting and security: Working in national or contested waters requires clearance from local authorities. In the Red Sea, geopolitical tensions can add weeks of delays.
Harsh marine environment: Weather, sea conditions, and the depth of the break can complicate splicing new cable sections.

Because of these factors, full restoration of physical capacity is measured in days to weeks, not hours. That is why cloud providers immediately turn to traffic engineering and rerouting rather than waiting for repairs.

Microsoft’s Mitigation Playbook

In response to the cable cuts, Microsoft says its engineers are rerouting and rebalancing traffic. The standard industry playbook for incidents like this includes:

Dynamic rerouting: Adjusting BGP policies to push traffic onto remaining viable paths, even if they are longer.
Capacity leasing: Temporarily buying transit capacity from partner carriers to relieve congestion on saturated links.
Traffic rebalancing: Optimizing internal backbone flows and peering to avoid hotspots.
Increased monitoring and communication: Providing daily Service Health updates and direct notifications to affected subscriptions.

These measures reduce the risk of a complete outage but cannot eliminate latency increases entirely. As Microsoft put it, “Undersea fibre cuts can take time to repair, as such we will continuously monitor, rebalance, and optimise routing.”

A Timeline of Uncertainty

Short-term (hours to days): Performance will be uneven. Some regions and customer paths may see near-normal latency, while others experience markedly higher round-trip times. Temporary traffic engineering should stabilize most flows, but baseline latency will remain elevated.

Medium-term (days to weeks): The key variable is how quickly repair ships can be mobilized. If multiple cable systems require restoration, the timeline extends. Historical patterns suggest a repair window of one to four weeks, but geopolitical factors could lengthen it.

Long-term (months): Even after physical repairs, carriers may keep traffic on alternative paths for days while they test restored segments, leading to a tail of residual latency. The industry is likely to renew calls for greater repair capacity and streamlined permitting.

What IT Teams Can Do Right Now

Azure administrators and cloud architects should take these immediate steps:

Check Service Health: Log into the Azure portal and review any incident advisories linked to your subscriptions. Subscribe to alerts for real-time updates.
Identify affected regions: Map your critical workloads to see which cross-region flows traverse the Middle East. Use tools like Network Watcher or third-party monitoring to quantify latency changes.
Tune application timeouts: Increase client-side and server-side timeouts, and implement exponential backoff with jitter to prevent retry storms.
Reschedule bulk transfers: Defer non-urgent backups, data migrations, and large batch jobs until the network stabilizes.
Leverage caching and CDNs: Serve static content from edge nodes closer to users to reduce transcontinental calls.
Run a failover drill: Simulate degraded network conditions to see how your applications behave under increased latency. Validate that automatic failover to other regions works as expected.
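
The timeout-tuning step above mentions exponential backoff with jitter. One common variant, often called "full jitter", can be sketched as follows. The function name, default values, and the set of retryable exceptions are illustrative assumptions, not part of any Azure SDK; most Azure client libraries ship their own configurable retry policies, which should be preferred where available.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation(), retrying transient errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential ceiling: base_delay * 2^attempt, capped at max_delay
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients do not retry in lockstep and create a retry storm
            sleep(random.uniform(0, ceiling))


# Usage with a hypothetical operation that fails twice, then succeeds
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated slow cross-region call")
    return "ok"


result = call_with_backoff(flaky, sleep=lambda s: None)  # skip real sleeping in the demo
print(result, attempts["n"])
```

The `sleep` parameter is injected only so the demo and any tests can run without waiting; production callers would leave it at the default.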

In the medium term, consider migrating latency-critical workloads to regions outside the affected corridor, if data sovereignty and architectural constraints permit.

Systemic Weaknesses Exposed

This incident highlights several structural vulnerabilities in the cloud’s physical backbone:

Correlated path risk: Logical redundancy (multi-region deployment) does not guarantee physical diversity. Many supposedly independent routes may share the same submarine cable chokepoint, leading to correlated failures.
Repair ship scarcity: The limited number of specialized vessels means that even well-funded operators are subject to a backlog of repairs when multiple faults occur.
Geopolitical friction: Permitting delays in contested waters add a layer of unpredictability that cannot be eliminated by technology alone.
Application brittleness: Too many services are built with an assumption of low, stable latency. Hardened retry logic, circuit breakers, and eventual consistency patterns should be standard practice.
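
To make the circuit-breaker pattern mentioned above concrete, here is a minimal single-threaded sketch. The class name, thresholds, and cool-down period are illustrative assumptions. Production systems would typically use an established resilience library and add thread safety and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a slow or failing dependency
                raise CircuitOpenError("skipping call; dependency recently failed")
            # Cool-down elapsed: fall through and allow one trial call (half-open)
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each cross-region request in `breaker.call(...)` and serve a cached or degraded response when `CircuitOpenError` is raised, rather than stacking up requests behind a slow link.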

Beyond the Immediate Fix

While Microsoft’s traffic engineering will blunt the worst effects, the only permanent cure is physical repair. Until then, enterprises must treat latency as a managed risk. This means continuously validating the true physical diversity of your cloud architecture, investing in active-active multi-region designs, and maintaining clear escalation paths with your cloud provider.
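
Validating physical diversity comes down to checking whether supposedly independent routes share any underlying segment. The sketch below shows the idea with a hypothetical route-to-segment mapping. The route names and segment labels are invented for illustration; real data would have to come from carrier or provider route documentation, which is not always available to customers.

```python
from itertools import combinations

# Hypothetical mapping of logical routes to the physical segments they use
routes = {
    "primary": {"cable-A", "landing-X", "cable-C"},
    "backup-1": {"cable-B", "landing-X", "cable-D"},
    "backup-2": {"cable-E", "landing-Y", "cable-F"},
}


def shared_segments(routes):
    """Return {(route_a, route_b): shared segments} for every overlapping pair."""
    overlaps = {}
    for (a, segs_a), (b, segs_b) in combinations(sorted(routes.items()), 2):
        common = segs_a & segs_b
        if common:
            overlaps[(a, b)] = common
    return overlaps


print(shared_segments(routes))
```

In this invented example, "primary" and "backup-1" both pass through "landing-X", so a single fault there would take out both, while "backup-2" is genuinely independent.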

Industry and policymakers also have roles to play. Incentivizing construction of new repair ships, streamlining international permitting processes, and encouraging the deployment of additional cable systems along diverse routes would all reduce the recurrence of such disruptions.

The cloud is software abstracting away hardware, but the hardware remains a web of glass fibers at the bottom of the sea. When those fibers break, the abstraction leaks, and the only answer is a combination of engineering ingenuity and logistical sweat. For now, Azure customers should brace for intermittent sluggishness, adjust their operations accordingly, and keep a close eye on Microsoft’s daily Service Health updates.