When critical applications run inside our virtualized environment, we cannot afford to let them fail. With that in mind, vSphere HA alone is not enough: after a failure it powers the VM off and powers it on again on another node, and that means “downtime”. Thankfully, VMware offers a continuous availability option called Fault Tolerance (FT).
VMware Fault Tolerance provides continuous availability for virtual machines by creating and maintaining a Secondary VM that is identical to, and continuously available to replace, the Primary VM in the event of a failover situation. Both VMs exchange heartbeats to monitor each other’s status.
With this said, it sounds like the best option to keep our applications running forever, but at what cost? Well, first we need to know the requirements for enabling this feature.
Cluster Requirements for Fault Tolerance
- Fault Tolerance logging and vMotion networking configured (very important!).
- vSphere HA cluster created and enabled.
- vSphere HA must be enabled before you can power on fault-tolerant virtual machines or add a host to a cluster that already supports fault tolerant virtual machines.
Host Requirements for Fault Tolerance
- Hosts must use supported processors.
- Hosts must be licensed for Fault Tolerance.
- Hosts must be certified for Fault Tolerance.
- The configuration for each host must have Hardware Virtualization (HV) enabled in the BIOS.
Virtual Machine Requirements for Fault Tolerance
- No unsupported devices attached to the virtual machine.
- Incompatible features must not be running with the fault tolerant virtual machines.
- Virtual machine files (except for the VMDK files) must be stored on shared storage.
Acceptable shared storage solutions include Fibre Channel, (hardware and software) iSCSI, vSAN, NFS, and NAS.
Other Configuration Recommendations
- If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
- The memory reservation of a fault tolerant virtual machine is set to the VM’s memory size when Fault Tolerance is turned on. Ensure that a resource pool containing fault tolerant VMs has memory resources above the memory size of the virtual machines. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
- To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
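The memory and host-count recommendations above boil down to two simple checks. Here is a minimal sketch of them in Python. The function names and the sample numbers are my own assumptions for illustration; nothing here talks to vCenter, it only models the arithmetic.

```python
def pool_has_memory_headroom(pool_memory_mb, ft_vm_memory_mb):
    """Return True if the resource pool has memory left after FT reservations.

    Assumption: each fault tolerant VM reserves its full configured memory,
    so the pool needs strictly more than the sum of those sizes to leave
    room for overhead memory.
    """
    reserved_mb = sum(ft_vm_memory_mb)
    return pool_memory_mb > reserved_mb


def cluster_can_recreate_secondary(host_count):
    """Return True if the cluster meets the three-host recommendation."""
    return host_count >= 3


# Hypothetical pool of 16 GB holding two FT VMs of 4 GB and 8 GB.
print(pool_has_memory_headroom(16384, [4096, 8192]))  # True
print(cluster_can_recreate_secondary(2))              # False
```

How much headroom is actually enough depends on the overhead memory of your VMs, so treat the strict "greater than" comparison as a lower bound rather than a sizing rule.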
Important: The failover of fault tolerant virtual machines is independent of vCenter Server, but you must use vCenter Server to set up your Fault Tolerance clusters.
More information is available in VMware’s documentation.
The maximum number of fault tolerant VMs allowed on a host in the cluster is 4 by default. Both Primary VMs and Secondary VMs count toward this limit.
It can be increased with the advanced setting das.maxftvmsperhost.
The maximum number of vCPUs aggregated across all fault tolerant VMs on a host is 8 by default. vCPUs from both Primary VMs and Secondary VMs count toward this limit.
It can be increased with the advanced setting das.maxftvcpusperhost.
Important: Make sure you have enough resources available in your cluster before changing these settings and enabling FT on more VMs.
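To make the two per-host limits concrete, here is a minimal sketch that checks whether a host could take one more fault tolerant VM. The FtVm class and host_can_accept function are hypothetical names of my own; only the two default values (4 VMs and 8 vCPUs) come from the settings described above.

```python
from dataclasses import dataclass

# Defaults of das.maxftvmsperhost and das.maxftvcpusperhost.
DEFAULT_MAX_FT_VMS_PER_HOST = 4
DEFAULT_MAX_FT_VCPUS_PER_HOST = 8


@dataclass
class FtVm:
    """A Primary or Secondary FT VM; both kinds count toward the limits."""
    name: str
    vcpus: int


def host_can_accept(existing, candidate,
                    max_vms=DEFAULT_MAX_FT_VMS_PER_HOST,
                    max_vcpus=DEFAULT_MAX_FT_VCPUS_PER_HOST):
    """Return True if adding candidate keeps the host within both limits."""
    vm_count = len(existing) + 1
    vcpu_total = sum(vm.vcpus for vm in existing) + candidate.vcpus
    return vm_count <= max_vms and vcpu_total <= max_vcpus


# Hypothetical host already running three 2-vCPU FT VMs.
on_host = [FtVm("vm-a", 2), FtVm("vm-b", 2), FtVm("vm-c", 2)]
print(host_can_accept(on_host, FtVm("vm-d", 2)))                # True
print(host_can_accept(on_host, FtVm("vm-e", 4)))                # False
print(host_can_accept(on_host, FtVm("vm-e", 4), max_vcpus=12))  # True
```

The last call mimics raising das.maxftvcpusperhost to 12. The check says nothing about whether the host really has the CPU and memory to back that up, which is exactly why the warning above matters.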
The number of vCPUs supported by a single fault tolerant VM is limited by the level of licensing that you have purchased for vSphere. Fault Tolerance is supported as follows:
- vSphere Standard and Enterprise: up to 2 vCPUs.
- vSphere Enterprise Plus: up to 8 vCPUs.
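The licensing limits can be expressed as a small lookup, which is handy if you want to pre-check a list of VMs before turning FT on. This is only a sketch: the dictionary keys and the function name are my own, and the limits are the ones listed above.

```python
# Per-VM vCPU limit for Fault Tolerance by vSphere edition (from the list above).
MAX_FT_VCPUS_BY_EDITION = {
    "standard": 2,
    "enterprise": 2,
    "enterprise plus": 8,
}


def ft_vcpus_allowed(edition, vm_vcpus):
    """Return True if a VM with vm_vcpus fits the FT limit of the edition.

    Raises KeyError for an edition that is not in the table.
    """
    limit = MAX_FT_VCPUS_BY_EDITION[edition.strip().lower()]
    return vm_vcpus <= limit


print(ft_vcpus_allowed("Standard", 2))         # True
print(ft_vcpus_allowed("Enterprise", 4))       # False
print(ft_vcpus_allowed("Enterprise Plus", 8))  # True
```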
Alright, once we have made sure all requirements are met, it is time to configure and enable FT.
For this demonstration, I am going to use the VMware Hands-on Lab HOL-1910-01-SDC – Virtualization 101: Introduction to vSphere.
Required: enable HA and FT logging (I enabled FT logging on vmk0). The FT test will be performed on the TinyLinux-01 VM.
Enable vSphere HA on the cluster
- Select Cluster Site A and click on the Configure tab.
- Select vSphere Availability and click on Edit.
- Click on Turn ON vSphere HA and then OK.
Enable Fault Tolerance logging on vmk0
- Select esx-01a.corp.local and click on the Configure tab.
- Go to VMkernel adapters and edit vmk0.
- Click on Fault Tolerance logging.
- Repeat the steps on each node.
Enable FT on TinyLinux-01
- Right-click on the TinyLinux-01 VM.
- Go to Fault Tolerance.
- Click on Turn On Fault Tolerance.
- Select the ds-iscsi01 datastore.
- Select the secondary host.
- Review and finish the wizard.
Important: Note the warning saying that the Primary VM lives on the same datastore where the Secondary VM will live. In a real-life scenario, make sure to place the Secondary VM on another datastore that is visible to both nodes.
The FT configuration will start on the TinyLinux-01 VM.
Once finished, you will see that the TinyLinux-01 VM has a darker color and its Fault Tolerance status has changed to Protected.
And how do we know for sure that this works? Well, go ahead and fail the node (just kidding). Instead of causing a node failure, you can simulate one by performing a failover test.
The TinyLinux-01 primary VM is running on esx-01a.corp.local. By inducing this failover, the TinyLinux-01 secondary VM on esx-02a.corp.local will become the primary VM.
Here is the Test Failover procedure:
- In the vSphere Web Client, browse to the Primary VM for which you want to test failover.
- Right-click the TinyLinux-01 VM and select Fault Tolerance > Test Failover.
- View details about the failover in the Task Console.
I ran the ping command from the TinyLinux-01 VM to the jump box’s IP and did not miss a single ICMP request.
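If you save the ping output to a file, you do not have to eyeball it for gaps. The sketch below scans the output for sequence numbers and reports any that are missing. It assumes the usual "icmp_seq=N" (or BusyBox-style "seq=N") reply lines, the function name is my own, and the address 192.0.2.10 is just a placeholder, not the lab’s jump box IP.

```python
import re

# Matches "icmp_seq=12" as well as the BusyBox-style "seq=12".
SEQ_PATTERN = re.compile(r"\b(?:icmp_)?seq=(\d+)")


def missing_sequences(ping_output):
    """Return the sequence numbers absent between the first and last reply."""
    seen = {int(match.group(1)) for match in SEQ_PATTERN.finditer(ping_output)}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Made-up sample output with the reply for sequence 3 missing.
sample = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=0.39 ms
64 bytes from 192.0.2.10: icmp_seq=4 ttl=64 time=0.44 ms
"""
print(missing_sequences(sample))  # [3]
```

An empty list means every reply between the first and the last one arrived, which is what I saw during the failover test.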
Now the primary VM lives on esx-02a.corp.local and the secondary VM lives on esx-01a.corp.local.
And now you have highly available VMs that are able to remain up during host failures.
Hope you enjoyed this post and don’t forget to share and comment.