Windows Server 2025 KB5062553 Update Sparks IRQL BSOD on AMD EPYC Servers, HPE DL325 Hit

Server administrators managing HPE ProLiant DL325 systems are reporting critical system crashes after applying the July 2025 cumulative update for Windows Server 2025. The update, cataloged as KB5062553 (OS Build 26100.4652), triggers a reproducible Blue Screen of Death with the stop code IRQL_NOT_LESS_OR_EQUAL, pointing to ntoskrnl.exe as the failing module. The crash strikes within minutes of login, effectively knocking affected servers offline and disrupting production workloads. Independent reports from Microsoft Q&A, Reddit, and industry blogs confirm the issue extends to other AMD EPYC-based platforms, including Supermicro H12 boards, turning an isolated hardware quirk into a broader compatibility warning for data center operators.

The July 2025 Update and Its Immediate Fallout

The July 8, 2025 Patch Tuesday release for Windows Server 2025 delivered the usual mix of security fixes and quality improvements. KB5062553, a cumulative update, updates the OS to build 26100.4652 and is designed to be serviced via Windows Update, WSUS, or direct catalog download. Microsoft also shipped an out-of-band update, KB5064489, on July 13 to address a narrow boot issue affecting certain Azure VM configurations. However, this OOB fix does not remediate the IRQL crash reported on bare-metal AMD EPYC servers.

On affected HPE ProLiant DL325 Gen10 (EPYC 7402P) and DL325 Gen10 Plus v2 (EPYC 7443P) systems, the sequence is grimly consistent: after installing KB5062553, the server boots normally, presents the login screen, and then – within two to three minutes of user activity – throws an IRQL_NOT_LESS_OR_EQUAL bugcheck in ntoskrnl.exe. Reinstalling the OS and reapplying only the July update reproduces the crash, ruling out transient file corruption or conflicting third-party patches. The out-of-band KB5064489, when layered on top, changes nothing; the servers still blue-screen post-login.

IRQL_NOT_LESS_OR_EQUAL: A Kernel-Level Memory Access Violation

For experienced Windows administrators, STOP 0x0000000A is a familiar but always unwelcome visitor. The error means a driver or kernel component attempted to access pageable memory at an elevated Interrupt Request Level (IRQL) where such access is forbidden, or referenced an invalid memory address entirely. When ntoskrnl.exe is flagged as the faulting module, it often signals that a third-party driver made an illegal call into the kernel, exposing a latent incompatibility. Common culprits include faulty network or storage drivers, outdated firmware microcode, or race conditions triggered by OS scheduler changes.

The July 2025 update likely altered kernel behavior – perhaps in memory management or interrupt handling – in a way that existing AMD EPYC platform drivers or firmware were unprepared for. The fact that the crash occurs only after login suggests that user-mode services or drivers loaded during that phase are involved. This hypothesis aligns with the broader pattern: BIOS versions, chipset drivers, and device-specific firmware become critical variables that, when out of sync with the latest OS kernel, can precipitate a crash.

Impact: Production Servers at Risk

The practical consequences are severe. A post-login BSOD means the server never reaches a stable state where remote management tools can run reliably. Unattended remediation scripts are useless; administrators must physically console into the machine or rely on out-of-band management (iLO) to intervene. For fleets of identical hardware, the July update effectively acts as a self-destruct mechanism, barring those servers from production until the patch is rolled back.

This is not a VM-specific glitch – the incidents affect physical hosts, which typically run mission-critical workloads. The inability to apply security updates without causing downtime creates a painful trade-off between stability and vulnerability exposure. Organizations that have already deployed KB5062553 to their DL325 estates may find themselves in an emergency rollback situation, especially if the hardware is not easily isolated from the network.

Immediate Mitigation: Roll Back and Block the Update

The only reliable workaround confirmed by affected administrators is to uninstall KB5062553 and defer it indefinitely. This can be done using the command:

wusa /uninstall /kb:5062553

Followed by a reboot. Systems that stabilise after rollback must have the update blocked in WSUS, Windows Update for Business, or Group Policy to prevent automatic reinstallation. If the server is already in a crash loop, booting into Safe Mode (which loads only essential drivers) may provide a window to uninstall the update.

Diagnostic Checklist for Affected Environments

Administrators should treat any crashed server as a forensic opportunity. Collect the following before rolling back, if possible:

Memory dumps: Configure for complete dumps (System Properties > Advanced > Startup and Recovery) and reproduce the crash. Archive the .dmp file.
System event logs: Extract Application, System, and Security logs immediately after a crash.
Driver inventory: Run driverquery /v to list all kernel-mode drivers and their versions.
Firmware details: Record BIOS, iLO, RAID controller, and NIC firmware revisions. On HPE hardware, use the iLO web interface or hpasmcli.
Hardware configuration: CPU model, DIMM population, and PCIe cards can all be relevant.

The memory dump is the single most valuable artifact. Analysing it with WinDbg (!analyze -v) will reveal the exact instruction pointer, the call stack, and any third-party driver referenced by the faulting address. This information is crucial when opening a support case with Microsoft and HPE.

Vendor Response and the Search for a Root Cause

Microsoft’s Windows Server 2025 release-health dashboard acknowledges several July-related issues, and KB5062553’s support page provides installation instructions – but neither currently lists this specific IRQL regression as a known issue. The official KB documentation however confirms the update is cumulative and may require specific installation order when using MSU files, a detail that does not bear on the crash but underscores the complexity of the servicing stack.

Community and blog reports, including a detailed thread on Microsoft Q&A and an article on BornCity, have become the primary aggregators of affected user experiences. Microsoft and HPE have responded to individual cases by requesting memory dumps and firmware inventories, a standard process for OEM-specific crashes. Publicly, no root cause has been announced. Speculation centres on AMD chipset drivers and microcode, but until engineering teams publish a root cause analysis, that remains unverified.

Long-Term Strategies: Patching in the Real World

This incident reinforces several timeless data center disciplines:

Canary testing is mandatory: Mirrored staging hardware must receive updates and run production-like loads for at least 48–72 hours before wider deployment.
Centralised update control: Tools like WSUS, Microsoft Endpoint Configuration Manager, or Azure Update Manager should enforce phased rollouts with automatic pause capabilities.
Vendor intelligence: Subscribe to HPE’s proactive notifications and Microsoft’s release-health RSS. The relationship between OS updates, driver versions, and firmware has never been more fragile.
Automated validation: Synthetic transaction checks and health probes can catch a BSOD-prone patch before it reaches the majority of servers.

For shops that cannot tolerate any downtime, the immediate decision is to keep production on the June 2025 baseline and isolate those systems from the network. Meanwhile, collaboration with HPE and Microsoft support should intensify, providing dumps and configuration data to accelerate a fix.

What Administrators Should Do Now

Pause July 2025 update deployment on all AMD EPYC-based HPE DL325 and related systems. Extend the hold to any Supermicro H12 or similar EPYC hosts if reports emerge internally.
If a server is already crashing, boot into Safe Mode, uninstall KB5062553, and block it. Do not re-apply the patch until a fix is validated.
Gather evidence before rolling back: dumps, logs, firmware versions, and driver lists. Open a premier support case with Microsoft and a case with HPE, attaching all artifacts.
Update firmware and drivers in a lab environment first. HPE may release Model-specific advisories; check the HPE Support Center daily for new DL325 Gen10 firmware bundles.
Monitor the Windows Server 2025 release-health page and the BornCity follow-up article for any official mitigation or root cause disclosure.

The July 2025 cumulative update cycle has once again demonstrated that even routine Patch Tuesday releases can carry hidden hardware-specific time bombs. While the industry waits for a verified solution, server administrators must exercise prudent caution, balancing the urgency of security updates against the certainty of production stability. The machines will not patch themselves, but a downed server patches no vulnerabilities at all.