Issue:
In a five-node Hyper-V 2012 R2 cluster, VM backups suddenly started failing on only one node (HOST2), i.e., the backup team was unable to back up any VM hosted on HOST2.
Observation:
- When the backup team starts a VM-level backup on HOST2, the backup terminates with a VSS snapshot error.
- If the VM is migrated to another node, the backup of the same VM succeeds.
- The issue is not specific to any VM or Cluster Shared Volume -> it occurs only when the VM is hosted on HOST2.
Troubleshooting:
- As the issue is specific to HOST2, tested a VM backup with the native Windows backup tool -> the backup failed, terminating while creating the VSS snapshot.
- Created a new VM on the local D: drive and tested it with the Windows backup tool -> the backup succeeds when the VM is hosted on a local drive and fails only when the VM is on Cluster Shared Volume storage.
- As the issue is specific to one server and to the CSV writer on HOST2 -> started troubleshooting from the CSV writer side.
- Analyzed the event logs in detail -> they point to the CSV writer being unregistered (see the screenshot below).
- Ran the command “vssadmin list providers” on HOST2 and compared the output with the other servers -> the provider “Microsoft CSV Shadow Copy Provider” is missing on HOST2 (screenshot attached).
- As the CSV provider was missing on the problematic HOST2 -> fixed the issue by exporting the provider's CLSID registry entry from a working server and importing it on HOST2 (see the screenshot below).
- After the import, ran “vssadmin list providers” again -> the provider list now matches the working servers.
- Backups work fine after the fix.
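The provider comparison in the steps above can also be scripted. The following is a minimal Python sketch, assuming the output of “vssadmin list providers” has been saved as text on each host and that every provider block contains a "Provider name: '...'" line followed by a "Provider Id: {...}" line. The function names parse_providers and missing_providers are illustrative only and are not part of any Microsoft tool.

```python
import re

# Assumed layout of `vssadmin list providers` output: each provider block has a
# "Provider name: '<name>'" line followed by a "Provider Id: {<guid>}" line.
_NAME_RE = re.compile(r"Provider name:\s*'([^']*)'")
_ID_RE = re.compile(r"Provider Id:\s*(\{[0-9A-Fa-f-]+\})")


def parse_providers(vssadmin_output):
    """Return {provider name: provider id in lowercase} parsed from the output."""
    providers = {}
    current_name = None
    for line in vssadmin_output.splitlines():
        name_match = _NAME_RE.search(line)
        if name_match:
            # Remember the name until the matching "Provider Id" line appears.
            current_name = name_match.group(1)
            continue
        id_match = _ID_RE.search(line)
        if id_match and current_name is not None:
            providers[current_name] = id_match.group(1).lower()
            current_name = None
    return providers


def missing_providers(reference_output, suspect_output):
    """Return providers present on the reference host but absent on the suspect host."""
    reference = parse_providers(reference_output)
    suspect_ids = set(parse_providers(suspect_output).values())
    return {name: pid for name, pid in reference.items() if pid not in suspect_ids}
```

Passing the saved output of a working server (for example HOST3) as reference_output and the output of HOST2 as suspect_output should return only the providers that HOST2 lacks. An empty result means the two provider lists match by provider ID.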
Error Screenshots
Volume Shadow Copy Service (VSS) provides the ability to create a point-in-time image (shadow copy) that can be used to perform backups. In our environment, the backup of any VM hosted on the HOST2 node failed immediately after it reached the “Snapshot Processing” stage, which means the snapshot operation was not taking place. The provider ID (400a2ff4-5eb1-44b0-8a05-1fcac0bcf9ff) shown in the Event Viewer logs belongs to the Microsoft CSV Shadow Copy Provider, which was not present in the registry on HOST2, possibly because it had been unregistered.
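Matching a provider ID from an event message against the registered providers can be done by hand, but a small helper makes the check repeatable. The sketch below is a hedged illustration in Python: it assumes the event text has been copied out of Event Viewer as a plain string and that the registered provider IDs have been collected separately (for example from “vssadmin list providers”). The names normalize_guid and unregistered_provider_ids are hypothetical. Event messages may also contain GUIDs that are not provider IDs, so the result is a list of candidates to verify, not a diagnosis.

```python
import re

# Matches a GUID with or without surrounding braces.
_GUID_RE = re.compile(
    r"\{?([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-"
    r"[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})\}?"
)


def normalize_guid(guid):
    """Lowercase a GUID and strip braces so different spellings compare equal."""
    return guid.strip().strip("{}").lower()


def unregistered_provider_ids(event_text, registered_ids):
    """Return GUIDs mentioned in event_text that are not in registered_ids."""
    registered = {normalize_guid(g) for g in registered_ids}
    mentioned = {m.group(1).lower() for m in _GUID_RE.finditer(event_text)}
    return sorted(mentioned - registered)
```

For the case described here, the ID 400a2ff4-5eb1-44b0-8a05-1fcac0bcf9ff would be reported as long as it is absent from the list of providers registered on HOST2, and would no longer be reported once the provider is registered again.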
Working server (HOST3)
Not working (HOST2) -> CLSID is missing
Final Screenshot