Hyper-V Cluster CSV Event ID 5120 in 2018... Wait, what?

Submitted by Orsiris de Jong on Fri 13/04/2018 - 12:07

Some months ago I inherited a Hyper-V infrastructure in pretty bad shape. I had to reinstall the hypervisors one by one, add FC cards for redundancy, add network card redundancy, etc.

Ever since I took over that infrastructure, I kept hitting the following errors while performing backups:

  • System Error 5120: Cluster Shared Volume "CSV Name" has entered a paused state because of '(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.
  • Unexpected failure. Error code: 48F@01000003
  • 5120: CSV entered paused state (c000020c)
  • 5142: CSV no longer accessible (1460)
  • NTFS: disk has been surprise removed
  • 1069: cluster resource failed, error 0x3 ('The system cannot find the path specified.')
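When these errors pile up, it helps to count which CSV pauses with which status code. The sketch below is a minimal, hypothetical helper (`count_pauses` is not part of any Windows tool). It assumes you already have the 5120 message texts as strings, for instance copied from Event Viewer or collected with `Get-WinEvent`, and that they follow the shape quoted above. The code-to-name table uses the standard NTSTATUS names for the codes that appear in this post.

```python
import re
from collections import Counter

# Standard NTSTATUS symbolic names for the codes quoted in this post.
KNOWN_CODES = {
    "c00000b5": "STATUS_IO_TIMEOUT",
    "c0000128": "STATUS_FILE_CLOSED",
    "c000020c": "STATUS_CONNECTION_DISCONNECTED",
}

# Assumed message shape, modelled on the event 5120 text quoted above:
#   Cluster Shared Volume "<name>" ... paused state because of '(<code>)'
# The volume name may be wrapped in double or single quotes.
PAUSED_RE = re.compile(
    r"Cluster Shared Volume [\"'](?P<csv>[^\"']+)[\"']"
    r".*?paused state because of '\((?P<code>[0-9a-fA-F]{8})\)'"
)


def count_pauses(messages):
    """Count (CSV name, status name) pairs across event 5120 messages."""
    counts = Counter()
    for msg in messages:
        m = PAUSED_RE.search(msg)
        if m:
            code = m.group("code").lower()
            # Fall back to the raw hex code when it is not in the table.
            counts[(m.group("csv"), KNOWN_CODES.get(code, code))] += 1
    return counts
```

A volume that keeps showing up with `STATUS_IO_TIMEOUT` is a good first candidate to compare against the volumes that never pause.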

Having ruled out almost every possible physical issue, I went through a lot of threads from people who had experienced the same problem. Most of them dated from around 2013 and suggested installing patches that my Windows Server 2012 R2 hosts already included, along with disabling ODX via a registry key (set HKLM\system\currentcontrolset\control\filesystem\FilterSupportedFeaturesMode to 1).
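To check which hosts actually have ODX disabled before and after such a change, a small read-only check can help. This is a sketch under assumptions: the helper names are hypothetical, it follows the text above in treating a value of 1 as "ODX disabled", and it treats any other value or a missing value as the default (ODX enabled). It only reads the key through Python's standard `winreg` module and does nothing off Windows.

```python
import sys

# Registry location quoted in the text above (under HKEY_LOCAL_MACHINE).
ODX_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
ODX_VALUE = "FilterSupportedFeaturesMode"


def odx_state(mode):
    """Interpret FilterSupportedFeaturesMode: 1 means ODX is disabled.

    Assumption: any other value, or no value at all (None), is treated
    as the default, i.e. ODX enabled.
    """
    return "disabled" if mode == 1 else "enabled"


def read_filter_mode():
    """Read the value on the local host; None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ODX_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, ODX_VALUE)
            return value
    except OSError:  # key or value missing, or access denied
        return None
```

Running `print(odx_state(read_filter_mode()))` on each hypervisor gives a quick inventory without touching the registry.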

I also disabled parallel backups, which made the issues somewhat less frequent.

In the end, nothing got rid of these issues. As a side effect, most of the Unix machines on that Hyper-V infrastructure froze randomly during backups.

After going through every possible cause I could think of, I noticed that some LUNs never got those nasty c00000b5 / c0000128 errors.

Tracking down the differences between those LUNs, I noticed that every failing LUN hosted VM disk and configuration files belonging to VMs whose other files were spread across other LUNs.
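The observation above boils down to a simple rule: a LUN is suspicious if it holds files of at least one VM that also has files on another LUN. The sketch below expresses that rule, assuming you already know which volumes each VM touches. `luns_at_risk` and the VM and volume names in the example are hypothetical.

```python
from collections import defaultdict


def luns_at_risk(vm_volumes):
    """Map each volume hosting a split VM to the sorted names of those VMs.

    vm_volumes: {vm_name: set of volume names holding that VM's files}.
    A VM counts as "split" when its files live on more than one volume.
    """
    at_risk = defaultdict(set)
    for vm, volumes in vm_volumes.items():
        if len(volumes) > 1:
            for vol in volumes:
                at_risk[vol].add(vm)
    return {vol: sorted(vms) for vol, vms in at_risk.items()}
```

For example, with `web01` entirely on `Volume1`, `db01` on `Volume2` and `Volume3`, and `unix01` on `Volume1` and `Volume3`, every volume is flagged, but `Volume1` only because of `unix01`; moving that one VM would clear it.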

From an operational point of view, this is far from Hyper-V best practices.

I identified the disk and configuration file paths of every VM with the following PowerShell command:

Get-VM | Sort-Object Name | Format-List Name, Path, ConfigurationLocation, SnapshotFileLocation, @{Label="Disks"; Expression={$_.HardDrives.Path}}
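`Format-List` output is meant for reading, not for further processing. If there are many VMs, one option is to export the same properties as JSON, for example with `Get-VM | Select-Object Name, Path, ConfigurationLocation, SnapshotFileLocation, @{Name="Disks"; Expression={$_.HardDrives.Path}} | ConvertTo-Json | Set-Content -Encoding UTF8 vms.json`, and let a script list the VMs whose files span several CSVs. The Python sketch below does that. Its helper names are hypothetical, and it assumes CSVs are mounted under the usual `C:\ClusterStorage\<Volume>` path.

```python
import json
from pathlib import PureWindowsPath


def csv_volume(path):
    # Paths under C:\ClusterStorage\<VolumeN>\... map to "<VolumeN>";
    # anything else (e.g. a local D:\ path) maps to its drive anchor.
    parts = PureWindowsPath(path).parts
    lowered = [p.lower() for p in parts]
    if "clusterstorage" in lowered:
        i = lowered.index("clusterstorage")
        if i + 1 < len(parts):
            return parts[i + 1]
    return PureWindowsPath(path).anchor


def load_vms(text):
    # ConvertTo-Json emits a bare object, not a list, when there is one VM.
    data = json.loads(text)
    return [data] if isinstance(data, dict) else data


def split_vms(vms):
    # Return {VM name: sorted volumes} for every VM spread over >1 volume.
    result = {}
    for vm in vms:
        disks = vm.get("Disks") or []
        if isinstance(disks, str):  # a single disk is exported as a string
            disks = [disks]
        paths = [vm.get("Path"), vm.get("ConfigurationLocation"),
                 vm.get("SnapshotFileLocation"), *disks]
        volumes = {csv_volume(p) for p in paths if p}
        if len(volumes) > 1:
            result[vm["Name"]] = sorted(volumes)
    return result
```

Reading the exported file with `open("vms.json", encoding="utf-8-sig")` handles the byte-order mark that Windows PowerShell writes with `-Encoding UTF8`. Then `split_vms(load_vms(...))` lists the VMs to consolidate.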

After manually moving all the files of each VM onto a single LUN... TADA!
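The files were moved by hand here. Hyper-V's `Move-VMStorage` cmdlet can also move them, but either way you first have to pick a destination LUN for each split VM. The sketch below shows one possible heuristic. It is an assumption, not what was done here: keep each VM on the volume that already holds the most bytes of its files, so the least data has to move. Ties are broken alphabetically. `plan_moves` and the sizes in the example are hypothetical.

```python
from collections import Counter


def plan_moves(vm_files):
    """Pick a target volume for each VM whose files span several volumes.

    vm_files: {vm_name: [(volume, size_in_bytes), ...]} for every file of
    the VM. Returns {vm_name: target_volume} for split VMs only.
    """
    plan = {}
    for vm, files in vm_files.items():
        bytes_per_volume = Counter()
        for volume, size in files:
            bytes_per_volume[volume] += size
        if len(bytes_per_volume) > 1:
            # Most bytes already in place wins; name breaks ties.
            target = min(bytes_per_volume,
                         key=lambda v: (-bytes_per_volume[v], v))
            plan[vm] = target
    return plan
```

Weighting by size rather than by file count matters because a VM's configuration files are tiny compared with its virtual disks. Before committing to a plan, check free space on each target CSV.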

All of those nasty errors disappeared!

By the way, a quick backup tip: make sure the heartbeat link isn't saturated during backups, since I/O data goes through the network while backing up. I upgraded the heartbeat from a single 1 Gbit/s link to a redundant 10 Gbit/s link, which made a real difference in backup speed and, of course, in live migrations.
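A back-of-the-envelope calculation shows why link speed matters so much when backup I/O crosses the network. The sketch below computes the ideal transfer time for a given amount of data. It assumes decimal gigabytes and a single stream. It ignores protocol overhead and the heartbeat traffic sharing the link, although the optional `efficiency` factor can approximate overhead. The 100 GB figure in the example is hypothetical.

```python
def transfer_seconds(gigabytes, link_gbit_per_s, efficiency=1.0):
    """Ideal time to move `gigabytes` (decimal GB) over a link.

    efficiency < 1.0 is a crude way to account for protocol overhead.
    """
    bits = gigabytes * 8e9
    return bits / (link_gbit_per_s * 1e9 * efficiency)


# 100 GB of backup I/O: about 800 s at 1 Gbit/s versus about 80 s at 10 Gbit/s.
one_gig = transfer_seconds(100, 1)
ten_gig = transfer_seconds(100, 10)
```

Real numbers will be worse than these lower bounds, but the tenfold ratio between the two links is what shows up in backup windows and live migration times.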