Tuesday, November 5, 2013

Case of ClusterStorage.000




Recently I worked on an issue: after one of the cluster nodes was rebooted, virtual machines could no longer migrate back to that node. The cluster event log contained errors like these:

Cluster resource 'SCVMM pxe Configuration' of type 'Virtual Machine Configuration' in clustered role 'pxe' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

The Cluster service failed to bring clustered service or application 'pxe' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
 
I checked the ClusterStorage folder on the node, and it turned out there were three ClusterStorage folders: the original plus two more with the suffixes .000 and .001.

It all looked like a good reason to dig into the cluster log:

00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] Cluster Shared Volume Root is C:\ClusterStorage
00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] UpdateClusDiskMembership(enter): nodeSet (1 2 3)
00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] CsvFs Listener already started...
00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] CsvFlt Listener already started...
00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] NFlt Listener already started...
00000cfc.000017c8::2013/11/01-20:00:36.081 INFO  [DCM] DeleteCsvShare: remove csv blockstream C:\ClusterStorage:{db19d832-b034-46ed-a6c5-61e0ebe370d1}
00000cfc.000017c8::2013/11/01-20:00:36.081 WARN  [DCM] Failed to delete csv share CSV$ status 2310
00000cfc.000017c8::2013/11/01-20:00:36.097 WARN  [DCM] rename attempt C:\ClusterStorage => C:\ClusterStorage.000, status 183
00000cfc.000017c8::2013/11/01-20:00:36.113 WARN  [DCM] Renamed existing C:\ClusterStorage to C:\ClusterStorage.001
00000cfc.000017c8::2013/11/01-20:00:36.128 INFO  [DCM] CreateRootDirectory: keeping open handle HDL( bb4 ) to CSV root
00000cfc.000017c8::2013/11/01-20:00:36.128 INFO  [DCM] create CSV stream file C:\ClusterStorage:{db19d832-b034-46ed-a6c5-61e0ebe370d1}
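To spot this pattern quickly in a long cluster log, the rename lines can be filtered out with a small script. The following is only a sketch: the regular expressions are written against the two WARN lines in the excerpt above and may not match other log formats, and the function name and sample constant are made up for illustration. The status names are the standard Windows meanings of those codes (183 is ERROR_ALREADY_EXISTS, 2310 is NERR_NetNameNotFound).

```python
import re

# Patterns modelled on the two [DCM] WARN lines in the log excerpt above.
# Assumption: other Windows versions may word these lines differently.
RENAME_ATTEMPT = re.compile(
    r"\[DCM\] rename attempt (?P<src>\S+) => (?P<dst>\S+), status (?P<status>\d+)"
)
RENAMED = re.compile(r"\[DCM\] Renamed existing (?P<src>\S+) to (?P<dst>\S+)")

# Standard Windows meanings of the status codes seen in the excerpt.
KNOWN_STATUS = {
    183: "ERROR_ALREADY_EXISTS",   # target folder name is already taken
    2310: "NERR_NetNameNotFound",  # the share to delete does not exist
}

# Two lines copied from the excerpt, used as sample input.
SAMPLE_LINES = [
    r"00000cfc.000017c8::2013/11/01-20:00:36.097 WARN  [DCM] rename attempt "
    r"C:\ClusterStorage => C:\ClusterStorage.000, status 183",
    r"00000cfc.000017c8::2013/11/01-20:00:36.113 WARN  [DCM] Renamed existing "
    r"C:\ClusterStorage to C:\ClusterStorage.001",
]


def find_csv_root_renames(lines):
    """Return (kind, source, target, status) tuples for CSV root renames."""
    events = []
    for line in lines:
        match = RENAME_ATTEMPT.search(line)
        if match:
            events.append(
                ("attempt", match.group("src"), match.group("dst"),
                 int(match.group("status")))
            )
            continue
        match = RENAMED.search(line)
        if match:
            events.append(("renamed", match.group("src"), match.group("dst"), None))
    return events


if __name__ == "__main__":
    for kind, src, dst, status in find_csv_root_renames(SAMPLE_LINES):
        name = KNOWN_STATUS.get(status, "")
        print(kind, src, "->", dst, status if status is not None else "", name)
```

Any hit means the cluster service found an existing C:\ClusterStorage at startup and moved it aside, which is where the numbered folders come from.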

Then I checked EMC PowerPath, and it contained some dead paths to our old SAN array. I deleted them, stopped the cluster service on the node, and deleted the ClusterStorage.000 and ClusterStorage.001 folders. Then I started the cluster service again. Issue resolved!

Another, quite similar issue once happened with our file cluster. Again, the culprit was an old CSV record that had not been deleted correctly.

So, if you face a similar issue, all you need to do is delete the unnecessary ClusterStorage folders while the cluster service is stopped, and delete the obsolete paths to the old array in your multipath software so that the folders won't be accidentally recreated.
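Before deleting anything, it helps to list which leftover folders actually exist on the node. Here is a minimal sketch that only reports them and deletes nothing. It assumes the leftovers always carry a three-digit suffix, as .000 and .001 did in my case, and the function names are made up for illustration.

```python
import os
import re

# Assumption: leftover CSV roots are named ClusterStorage.NNN with three digits,
# as seen with ClusterStorage.000 and ClusterStorage.001.
LEFTOVER_NAME = re.compile(r"^ClusterStorage\.\d{3}$", re.IGNORECASE)


def leftover_csv_roots(names):
    """Filter a list of folder names down to the numbered leftovers."""
    return sorted(name for name in names if LEFTOVER_NAME.match(name))


def list_leftovers_on_disk(drive_root):
    """Report leftover folders under drive_root without touching them."""
    names = [entry.name for entry in os.scandir(drive_root) if entry.is_dir()]
    return [os.path.join(drive_root, name) for name in leftover_csv_roots(names)]
```

On the affected node this would be called as list_leftovers_on_disk("C:\\"). The live C:\ClusterStorage folder is never reported, since it has no numeric suffix.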

Hope that this will be helpful for you.

4 comments:

  1. Thank you for your post, really helpful.

  2. This process worked for me.

    In summary:
    (1) Migrate Roles and Storage off the Node with errors.
    (2) Stop the Cluster Service
    (3) Delete or rename the C:\ClusterStorage folder on the Node with errors.
    (4) Restart the Cluster Service
    (5) Migrate Roles and Storage back to the Node with errors.
