History of a High Availability / Fault Tolerance solution

About eleven years ago I started my IT career as an OS/2 system engineer. Shortly after that I had my first experience with a high availability solution, used to protect a mail system hosted on OS/2 Warp Server (version 2, I believe).

The product was called Vinca Co-StandbyServer. Years went by, and in the meantime I switched from OS/2 to Windows NT/2000 and to virtualization with VMware Workstation 4 and GSX. I even tried ESX 1.0 at the time, but because of its SCSI dependencies and the planned use for it (a home lab) I put that on hold.

Later on I moved fully to virtualization and started working with VI3. This was when I bumped into high availability solutions again, in the form of VMware High Availability. What a surprise: VMware HA seemed to be “Legato Co-StandbyServer”, formerly known as Vinca!

I just took a minute to google these companies, and guess what: both domain names redirect to EMC (still the parent company of VMware).

The Vinca domain now points to EMC AutoStart, and www.legato.com has some links to the new locations of the different Legato products.

Looking at the concept of High Availability, not much has changed over the years. You still need two servers, a network connection between them (the heartbeat) and either some sort of online/real-time replication or shared storage. Back in the early days, setting up such a system was definitely something you wanted to be prepared for. And getting a buzz from the beeper at night, telling you the cluster had failed over and had not come up properly, was not something to look forward to.
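To make the heartbeat idea concrete, here is a minimal sketch in Python of how a standby server might decide to take over. It is not taken from Vinca, Legato or VMware; the class name `StandbyMonitor`, the `timeout` value and the explicit timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Standby-side view of the primary, driven by heartbeat arrival times."""

    timeout: float               # seconds of silence tolerated (assumed value)
    last_heartbeat: float = 0.0  # time the last heartbeat was seen
    active: bool = False         # True once the standby has taken over

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat from the primary arrived over the heartbeat link.
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        # Called periodically. Promote the standby when the primary has
        # been silent for longer than the timeout; there is no automatic
        # failback in this sketch.
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True
        return self.active


monitor = StandbyMonitor(timeout=3.0)
monitor.on_heartbeat(0.0)
print(monitor.tick(2.0))  # primary heard from 2 s ago: standby stays passive
print(monitor.tick(4.0))  # 4 s of silence exceeds the timeout: standby takes over
```

Timestamps are passed in rather than read from the clock so the behaviour is easy to follow. A real cluster also has to deal with a broken heartbeat link while both servers are still alive (split brain), which this sketch ignores.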

The focus of availability has shifted a little. It used to be at application level, with replication of data (and it still can be, with solutions like Microsoft Clustering on top of central storage, which is commonplace nowadays); now its focus is on the (virtual) server level.

If I had told you back then that in the future all you would have to do is tick a checkbox in a server's properties to have it running in fault tolerance mode, you would have declared me insane. But hear, hear: Technology with a capital T did it again and automated the setup of a fault tolerance cluster environment at hardware level, virtual hardware in this case. VMware introduced VMware Fault Tolerance.

VMware Fault Tolerance, or FT for short, completely duplicates a virtual machine by cloning it (asynchronously) and by duplicating instructions (synchronously), so that they are processed by two VMs at the same time (one of them invisible, running on another host). The result is two completely identical machines that are protected against (physical) hardware failures. It is important to know that there is no application awareness. If the first VM runs into a Windows blue screen or a system halt, the second one, its counterpart in the FT team, suffers from that as well.
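The consequence of duplicating instructions can be shown with a toy model in Python. This is only an illustration of the idea described above, not how VMware FT is implemented; `ToyVM`, `LockstepPair` and the "negative number halts the guest" rule are invented for the example.

```python
class ToyVM:
    """A deterministic state machine standing in for a virtual machine."""

    def __init__(self) -> None:
        self.state = 0
        self.crashed = False

    def execute(self, instruction: int) -> None:
        if self.crashed:
            return
        if instruction < 0:
            # Hypothetical "bad" instruction: stands in for a blue screen
            # or system halt inside the guest.
            self.crashed = True
            return
        self.state += instruction


class LockstepPair:
    """Feeds every instruction to both copies of the VM."""

    def __init__(self) -> None:
        self.primary = ToyVM()
        self.secondary = ToyVM()
        self.primary_host_up = True

    def execute(self, instruction: int) -> None:
        if self.primary_host_up:
            self.primary.execute(instruction)
        self.secondary.execute(instruction)

    def fail_primary_host(self) -> None:
        # Physical failure of the host running the primary copy.
        self.primary_host_up = False

    def visible(self) -> ToyVM:
        # The copy that users are actually talking to.
        return self.primary if self.primary_host_up else self.secondary


# A host failure is survived: the secondary carries on with identical state.
pair = LockstepPair()
for instruction in (1, 2, 3):
    pair.execute(instruction)
pair.fail_primary_host()
pair.execute(4)
print(pair.visible().state, pair.visible().crashed)

# A guest-level crash is not: both copies run the same bad instruction.
pair = LockstepPair()
pair.execute(-1)
print(pair.primary.crashed, pair.secondary.crashed)
```

Because both copies are deterministic and see the same input, they cannot diverge. That is exactly why a fault inside the guest hits both of them at once.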

I haven’t been able to test it in real life yet because of the lack of (compatible) hardware in my home lab. The environments I currently work with do not meet the requirements of FT either. So I will have to be creative…

More on this later…

1 Comment

  1. Identical servers are just the beginning of what you’ll need to implement VMware FT. VMware has a checklist of about 25 non-trivial config requirements that must be met before FT can be used. And once you have it, you’ll be limited to running applications on a single core per server, because no software-based fault-tolerant product can support SMP. Not too many true mission-critical apps can live with such a limitation. The SMP shortcoming will be overcome, and it won’t be VMware bringing it to market.
