June 11, 2021

How reliably does my high-availability solution work in an emergency?

Author: Hans-Joachim Krüger

Protecting software systems and data against failure is widely accepted as essential for business operations, whether through simple backup solutions or high-end protection in the form of high-availability solutions (HA solutions).

But in an emergency, how do I know that my HA solution will actually work?

When do HA solutions fail?

Here are our top five reasons why a high-availability solution fails in an emergency:

TOP1: The HA solution has never worked
The implementation has had problems and errors from the beginning, but no one ever noticed or fixed them. This makes the whole solution useless.

TOP2: A new database version was implemented
The HA solution was not adapted to the new database version, so it no longer works after the changeover.

TOP3: New file systems were installed
Because of growing data volumes, new, larger file systems were set up. Again, the HA solution was not adapted and no longer works.

TOP4: Additional nightly backup was set up
Setting up an additional backup system is a good idea, but it must take the existing backup solutions into account, and this is often overlooked. The nightly backup probably works, but the HA solution no longer does.

TOP5: An operating system update has been applied
Such an update also affects the HA solution. If the HA solution is not adapted, it no longer works in an emergency.
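Four of the five reasons above follow the same pattern: something changed after the HA solution was last known to work. As a minimal sketch, assuming you keep a simple list of such changes and the date of the last successful test switchover, the following flags every change that has not yet been verified. The names `Change` and `unverified_changes` are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One change to the environment (hypothetical record type)."""
    kind: str          # e.g. "database upgrade", "new file system", "OS update"
    applied_on: date


def unverified_changes(changes: List[Change],
                       last_successful_drill: Optional[date]) -> List[Change]:
    """Return the changes applied after the last successful test switchover.

    If there has never been a successful switchover (None), every change
    counts as unverified.
    """
    if last_successful_drill is None:
        return list(changes)
    return [c for c in changes if c.applied_on > last_successful_drill]
```

An empty result only means that nothing on the list changed since the last successful switchover. It says nothing about changes that were never recorded.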

Fire drills for prevention

The only way to find out whether my HA solution really works in an emergency is to test it regularly. Fire drills have proven to be a good way to do this. Such drills should take place once or twice a year. You can either run them yourself in your IT department or get help from your IT service provider. At a time when activity is low, e.g. on a Saturday morning, a test switchover of the HA solution to the failover system takes place.
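The rhythm described here can be checked with very little code. The sketch below assumes a maximum interval of 183 days between drills (roughly twice a year) and proposes the next Saturday as the drill date. Both the interval and the function names are assumptions made for illustration, not fixed rules.

```python
from datetime import date, timedelta
from typing import Optional


def drill_overdue(last_drill: Optional[date], today: date,
                  max_interval: timedelta = timedelta(days=183)) -> bool:
    """True if no drill has ever taken place or the last one is too long ago."""
    if last_drill is None:
        return True
    return today - last_drill > max_interval


def next_saturday(today: date) -> date:
    """Return the next Saturday on or after `today`."""
    # date.weekday(): Monday is 0, Saturday is 5
    days_ahead = (5 - today.weekday()) % 7
    return today + timedelta(days=days_ahead)
```

For example, `drill_overdue(date(2020, 11, 1), date(2021, 6, 11))` reports that a drill is due, and `next_saturday(date(2021, 6, 11))` suggests June 12, 2021 as a candidate date.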

Once the switchover has succeeded, the applications, e.g. SAP, are also tested to see whether they run on the failover system. If there are problems with the switchover or with the application, improvements, corrections, or optimizations must be made. If the switchover and the application tests succeed, you can be confident that your HA solution works.

For all fire drills, document the test switchover for management, internal audit, or whoever might ask. At the conclusion of the exercise, you will need to reset the HA solution.
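The drill sequence (switchover, application test, reset) and its documentation can be sketched as a small script. This is only an illustration under stated assumptions: `switch_over`, `check_application`, and `reset` are hypothetical callables that you would supply for your own environment, each returning `True` on success. They do not correspond to any particular HA product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class DrillReport:
    """Record of one fire drill, suitable for handing to management or audit."""
    started_at: datetime
    steps: List[Tuple[str, bool]] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The drill passes only if all three steps ran and each succeeded.
        return len(self.steps) == 3 and all(ok for _, ok in self.steps)


def run_fire_drill(switch_over: Callable[[], bool],
                   check_application: Callable[[], bool],
                   reset: Callable[[], bool]) -> DrillReport:
    """Run the drill steps in order and record the outcome of each."""
    report = DrillReport(started_at=datetime.now())

    switched = switch_over()
    report.steps.append(("switchover to failover system", switched))

    # Testing the application only makes sense after a successful switchover.
    if switched:
        report.steps.append(
            ("application test on failover system", check_application()))

    # The HA solution is reset at the end of every exercise.
    report.steps.append(("reset of HA solution", reset()))
    return report
```

The `steps` list then serves as the written record of the exercise: one line per step with its result, plus the start time of the drill.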

Also relevant for other backup types

We recommend the fire drill described here for an HA solution for daily backup and disaster recovery solutions as well, because nothing is as constant as change. Such exercises help you keep pace with change and give your IT department confidence.
