Monday, October 30, 2017

Updating a vSAN Cluster using VUM

I wanted to shed some light on what some may think is a mysterious or difficult task: how do I update a vSAN cluster using vSphere Update Manager (VUM)? The quick and simple answer: easily! The great thing about vSAN is that you already know how to upgrade and patch it, since it is built into the ESXi hosts and managed from the vCenter Web Client that you already use daily. I will take you through the process of upgrading a host in my home lab. Remember, as with any upgrade, make sure that your hardware is supported and listed on the VMware Hardware Compatibility List (HCL) before proceeding with ANY upgrade.

The first step is to make sure you have a baseline attached to your cluster and/or hosts. You can do this from the Update Manager tab in the vCenter Web Client. A baseline is a group of patches or upgrades that you set up and then attach to physical ESXi hosts or clusters. You can see here that we have baselines created, and VUM tells us that the highlighted host is not compliant with this baseline and needs to be patched to become compliant. We will go ahead and create a baseline for this host so you can see that process.

For this example, I am going to create a baseline for critical host patches, so I will check that box when creating the baseline. Because Critical Host Patches is a predefined baseline, I don't need to do anything further to assign it to my hosts. Another way to attach a baseline is to right-click a cluster or a host and choose to attach a baseline from there.
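To make the baseline idea concrete, here is a minimal Python sketch of how a scan reports compliance, assuming a baseline is simply a set of required patch IDs and that a baseline attached to a cluster applies to every host in it. The classes, host names, and the patch ID `PATCH-A` are hypothetical; this models the concept and is not VUM's API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class Baseline:
    # A named group of patches, loosely modeled on a VUM patch baseline
    name: str
    patch_ids: FrozenSet[str]


@dataclass
class Host:
    name: str
    installed_patches: Set[str] = field(default_factory=set)


@dataclass
class Cluster:
    name: str
    hosts: List[Host]
    # In this model, baselines attached at the cluster level apply to every host in it
    baselines: List[Baseline] = field(default_factory=list)


def scan(cluster: Cluster) -> Dict[str, List[str]]:
    """Return the patches each host is missing; an empty list means compliant."""
    report: Dict[str, List[str]] = {}
    for host in cluster.hosts:
        missing: Set[str] = set()
        for baseline in cluster.baselines:
            missing |= set(baseline.patch_ids) - host.installed_patches
        report[host.name] = sorted(missing)
    return report


# Hypothetical patch ID and host names, for illustration only
critical = Baseline("Critical Host Patches", frozenset({"PATCH-A"}))
lab = Cluster("lab", hosts=[Host("esx01"), Host("esx02", {"PATCH-A"})], baselines=[critical])
print(scan(lab))  # {'esx01': ['PATCH-A'], 'esx02': []}
```

Here `esx01` plays the role of the non-compliant host; once the patch is installed and the host is rescanned, its list of missing patches comes back empty.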

Here you can see that the host's current patch level is ESXi 6.5 U1, build 5969303, in its non-compliant state.

Now that we have created a baseline, told VUM to scan for updates, and confirmed that the host is not in compliance, we can start the upgrade process. The first thing I like to do is put the host into maintenance mode. VUM will do this for you, but I prefer to do this step manually.

As we enter maintenance mode, we are presented with a choice of what to do with the vSAN data that resides on this host while it is in maintenance mode. One option is to evacuate all data to other hosts in the cluster; a very nice addition in vSAN 6.6 is the ability to see how much data would need to be moved. Per VMware recommendations, there is no need to do a full data migration for just a patch, since that can take hours or even days depending on how much data is on the vSAN datastore. Ensure accessibility is sufficient in this case, as the host will only be down briefly. If the host stays down longer because of a hardware or software issue, then depending on your storage policies (SPBM) and your vSAN node design, you should still have at least one copy of the data. If the host doesn't come back within 60 minutes, vSAN will self-heal back to full redundancy on the remaining hosts, provided you have the minimum needed for self-healing: at least 4 nodes for RAID 1 or 5 nodes for RAID 5.
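The data-placement rules above come down to a bit of arithmetic. The following Python sketch is a simplified model, not a vSAN API. It uses the usual policy minimums, (2 × FTT) + 1 hosts for RAID 1 mirroring, 4 for RAID 5 and 6 for RAID 6, plus one spare host for self-healing, which gives the 4-node and 5-node figures mentioned above. It also uses the 60-minute repair delay. The mode strings mirror the names of the vSAN decommission modes in the vSphere API. The `choose_decommission_mode` rule, which compares expected downtime with the repair delay, is only an illustrative heuristic and not an official VMware rule.

```python
# Default time vSAN waits before rebuilding components from an absent host
REPAIR_DELAY_MINUTES = 60


def min_hosts(raid: str, ftt: int) -> int:
    """Minimum hosts a vSAN storage policy needs to be satisfied."""
    if raid == "RAID-1":
        return 2 * ftt + 1  # FTT+1 replicas plus FTT witness components
    if raid == "RAID-5" and ftt == 1:
        return 4  # 3 data + 1 parity
    if raid == "RAID-6" and ftt == 2:
        return 6  # 4 data + 2 parity
    raise ValueError(f"unsupported policy: {raid} with FTT={ftt}")


def can_self_heal(host_count: int, raid: str, ftt: int) -> bool:
    """Rebuilding after a host failure needs one spare host beyond the minimum."""
    return host_count >= min_hosts(raid, ftt) + 1


def rebuild_starts(minutes_down: float, delay: float = REPAIR_DELAY_MINUTES) -> bool:
    """True once an absent host has been gone longer than the repair delay."""
    return minutes_down > delay


def choose_decommission_mode(expected_downtime_minutes: float) -> str:
    """Illustrative heuristic: short patch windows only need accessibility."""
    if expected_downtime_minutes < REPAIR_DELAY_MINUTES:
        return "ensureObjectAccessibility"
    return "evacuateAllData"


print(min_hosts("RAID-1", 1), can_self_heal(4, "RAID-1", 1))  # 3 True
print(can_self_heal(4, "RAID-5", 1))  # False: RAID-5 needs 5 hosts to self-heal
print(choose_decommission_mode(15))  # ensureObjectAccessibility
```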

With the host set to go into maintenance mode, migrate all VMs off of it with vMotion. Once the host has fully entered maintenance mode, click the Remediate button to start the patching process and select the critical host patches baseline. In my case, the same patch was also listed as a vSAN recommended patch. Click Next.
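The order of operations matters here: remediation should only start once the host has fully entered maintenance mode, and that cannot complete while VMs are still running on it. This small Python state model sketches that rule. The class and VM names are hypothetical, and the "vMotion" here is just removing a name from a list.

```python
from enum import Enum
from typing import List


class HostState(Enum):
    CONNECTED = "connected"
    ENTERING_MAINTENANCE = "entering maintenance mode"
    MAINTENANCE = "in maintenance mode"


class LabHost:
    def __init__(self, name: str, running_vms: List[str]) -> None:
        self.name = name
        self.running_vms = list(running_vms)
        self.state = HostState.CONNECTED

    def request_maintenance(self) -> None:
        self.state = HostState.ENTERING_MAINTENANCE
        self._refresh()

    def vmotion_off(self, vm: str) -> None:
        # Stand-in for migrating a VM to another host in the cluster
        self.running_vms.remove(vm)
        self._refresh()

    def _refresh(self) -> None:
        # Maintenance mode only completes once no VMs are left running here
        if self.state is HostState.ENTERING_MAINTENANCE and not self.running_vms:
            self.state = HostState.MAINTENANCE

    def can_remediate(self) -> bool:
        return self.state is HostState.MAINTENANCE


host = LabHost("esx01", ["vm-a", "vm-b"])
host.request_maintenance()
print(host.can_remediate())  # False: VMs are still running
for vm in list(host.running_vms):
    host.vmotion_off(vm)
print(host.state.value, host.can_remediate())  # in maintenance mode True
```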

Here we can see how many patches this baseline will apply; in this case, it is only one patch to get to the latest version. Click Next.

Here we can see the name of the patch to be applied, the product it is for, and its release date.

On the next step, we have the option to run this patch installation as a scheduled task. There is also an ignore warnings checkbox. I have always run my patches right away, because I want to keep an eye on everything and make sure things come back up OK.

You can choose your VM options on this screen. Again, I make sure that ALL VMs are vMotioned off before I get to this point, simply for more control, but this is the screen where you tell the host to vMotion VMs off and enter maintenance mode if you prefer to let VUM do it. In a vSAN environment, I would recommend performing these steps manually as well. It is great that the system can do it; I just prefer to see each step through. Also, on each of the following screens you will see the option to save the parameters you chose on that page as the default host remediation options going forward, so you do not have to choose them every time. This is a handy time-saving feature that I recommend taking advantage of.

This screen shows the cluster-level options. We need to disable DPM, FT, and HA admission control during the remediation. If you do not disable them in this step, you will get a warning that they are still on in a later step, and it will prevent the remediation process from going any further.
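These features have to be turned off for remediation and turned back on afterwards; I re-enable HA admission control by hand at the end of the process. That disable-then-restore pattern can be sketched as a Python context manager. This is a hypothetical model of the cluster settings, not a vSphere API call. The `finally` block restores the saved settings even if patching fails partway.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ClusterSettings:
    dpm_enabled: bool
    ft_enabled: bool
    ha_admission_control: bool


FEATURES = ("dpm_enabled", "ft_enabled", "ha_admission_control")


@contextmanager
def remediation_window(settings: ClusterSettings) -> Iterator[ClusterSettings]:
    """Turn the blocking features off while patching, then restore them."""
    saved = {name: getattr(settings, name) for name in FEATURES}
    for name in FEATURES:
        setattr(settings, name, False)
    try:
        yield settings
    finally:
        # Restore exactly what was enabled before, even if patching fails
        for name, value in saved.items():
            setattr(settings, name, value)


lab = ClusterSettings(dpm_enabled=False, ft_enabled=False, ha_admission_control=True)
with remediation_window(lab) as cluster:
    print(cluster.ha_admission_control)  # False while the host is remediated
print(lab.ha_admission_control)  # True again afterwards
```

Only features that were on beforehand get turned back on, so a cluster that never used DPM stays that way.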

We are almost ready to click Finish, but there is one more step at the top that I always do as a precaution: run the Pre-Check wizard. It will warn you if a service is enabled that will prevent remediation from occurring.

Here is what the pre-check report looks like with HA admission control still turned on in the cluster. It warns you that admission control is on and recommends an action to resolve the issue so that you can continue with the patching.
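Conceptually, the pre-check looks for cluster features that would block remediation and pairs each one with a recommended fix. Here is a minimal Python sketch of that idea. The feature keys and recommendation text are hypothetical and will not match the wording of the real report.

```python
from typing import Dict, List

# Hypothetical wording; the real pre-check report phrases things differently
RECOMMENDATIONS = {
    "dpm_enabled": "Disable DPM on the cluster before remediating.",
    "ft_enabled": "Turn off Fault Tolerance before remediating.",
    "ha_admission_control": "Disable HA admission control before remediating.",
}


def pre_check(settings: Dict[str, bool]) -> List[str]:
    """Return one recommendation per blocking feature that is still enabled."""
    return [advice for feature, advice in RECOMMENDATIONS.items() if settings.get(feature, False)]


report = pre_check({"dpm_enabled": False, "ft_enabled": False, "ha_admission_control": True})
print(report)  # ['Disable HA admission control before remediating.']
print("ready to remediate" if not report else "fix the issues above first")
```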

From here on out, the process is automatic. You will see the host do what is essentially a live image boot onto the new ESXi version, download the installation script from Update Manager, and install the new version. This takes a while, and then the host reboots a second time and boots from the installed version. You can now see that the version shows as build 6765664. In my case, it also took a while for vSAN to reinitialize the SSDs before the host finished booting.
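If you also want to confirm the new build from the ESXi shell, `vmware -v` prints a version string such as `VMware ESXi 6.5.0 build-6765664`. The Python sketch below parses that string and checks it against the target build. It assumes exactly that output format, and it assumes that build numbers only increase with newer patches within a release.

```python
import re
from typing import Tuple

# Matches version strings such as "VMware ESXi 6.5.0 build-6765664"
VERSION_RE = re.compile(r"VMware ESXi (\d+\.\d+\.\d+) build-(\d+)")


def parse_esxi_version(text: str) -> Tuple[str, int]:
    """Split an ESXi version string into (release, build number)."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized ESXi version string: {text!r}")
    return match.group(1), int(match.group(2))


def is_patched(text: str, target_build: int) -> bool:
    """True if the host reports the target build or a later one."""
    _, build = parse_esxi_version(text)
    return build >= target_build


before = "VMware ESXi 6.5.0 build-5969303"
after = "VMware ESXi 6.5.0 build-6765664"
print(is_patched(before, 6765664), is_patched(after, 6765664))  # False True
```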

At this point, the host will reappear in vCenter. You can now exit maintenance mode, and the cluster will return to normal operation. If you go to the Monitor tab, click vSAN, and then open the Resyncing Components section, you will see the vSAN disks resyncing the data that changed on the cluster while this host was offline. This shouldn't take long to complete, and you can watch the progress here: it shows which VM components are resyncing and how much data is left to resync for each component.
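The Resyncing Components view is essentially a per-component tally of data left to resync. As a rough model in Python, overall progress can be computed as follows. The field names and sample sizes are hypothetical and do not reflect the actual vSAN data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple

GiB = 1024 ** 3


@dataclass
class ResyncComponent:
    vm: str  # VM that owns the component
    total_bytes: int  # data this component has to resync in total
    remaining_bytes: int  # data still left to resync


def resync_progress(components: List[ResyncComponent]) -> Tuple[int, float]:
    """Return (bytes left to resync, percent complete) across all components."""
    total = sum(c.total_bytes for c in components)
    remaining = sum(c.remaining_bytes for c in components)
    percent = 100.0 if total == 0 else 100.0 * (total - remaining) / total
    return remaining, percent


components = [
    ResyncComponent("vm-a", total_bytes=8 * GiB, remaining_bytes=2 * GiB),
    ResyncComponent("vm-b", total_bytes=4 * GiB, remaining_bytes=0),
]
left, done = resync_progress(components)
print(f"{left / GiB:.1f} GiB left, {done:.1f}% done")  # 2.0 GiB left, 83.3% done
```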

We can go back to the Update Manager page in the vCenter Web Client, run a Scan for Updates check, and see that the host is now compliant with the patch baseline it wasn't compliant with before the patch cycle. One other thing: I also make sure to turn HA admission control back on for the cluster, since it was turned off during the remediation. I hope this helps answer some questions about the process and shows how easy it is to maintain patch levels on your vSAN clusters. Until next time. Cheers!
