April 20, 2022

Libelle IT Glossary Part 8: High Availability - Unrestricted Operation Despite System Failure

Author: Daniel Krüger

High availability is gaining more and more attention, especially within the IT industry. A system failure during ongoing operations can quickly become expensive, so ensuring the availability of their systems and system components is a "MUST" for companies, and establishing a high availability concept is urgently required. Ideally, a failure should cause no interruption at all, or only a short one. But how do companies define high availability?

Definition of high availability

High availability (HA) describes the permanent availability of a system or component in a company's (IT) infrastructure, even in the event of a failure.
For this purpose, companies often use specialized high availability software solutions, which protect not only against the consequences of hardware and application errors, but also against the consequences of natural hazards, sabotage, or data loss due to human error.

Is high availability measurable?

To get a rough idea of a company's current level of availability, people often resort to this familiar formula:

Availability (%) = (minutes per month - minutes of downtime) / minutes per month * 100
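
To make the formula concrete, here is a minimal Python sketch. It divides the uptime by the total minutes in the month and multiplies by 100 to get a percentage. It assumes a 30-day month (43,200 minutes); the function names are hypothetical and only serve as an illustration. The second function inverts the formula to show how much downtime a given availability target leaves.

```python
def availability_percent(minutes_per_month: float, downtime_minutes: float) -> float:
    """Availability in percent: uptime divided by total time, times 100."""
    if minutes_per_month <= 0:
        raise ValueError("minutes_per_month must be positive")
    if not 0 <= downtime_minutes <= minutes_per_month:
        raise ValueError("downtime_minutes must lie between 0 and minutes_per_month")
    return (minutes_per_month - downtime_minutes) / minutes_per_month * 100


def allowed_downtime_minutes(minutes_per_month: float, target_percent: float) -> float:
    """Inverse of the formula: the downtime budget for a target availability."""
    return minutes_per_month * (1 - target_percent / 100)


# Assumption: a 30-day month, i.e. 30 * 24 * 60 = 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60

for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(MINUTES_PER_MONTH, target)
    print(f"{target}% availability allows {budget:.1f} minutes of downtime per month")
```

Under this assumption, 99% availability still allows 432 minutes (7.2 hours) of downtime per month, 99.9% allows 43.2 minutes, and 99.99% only about 4.3 minutes, which shows why every additional "nine" gets considerably harder to achieve.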

But do companies have to guarantee 100% availability, or is a partial solution sufficient? And how can they achieve it? In the eighth part of the Libelle IT Glossary, we answer precisely these questions.

AEC: How much high availability does a company need?

Especially when it comes to costs, IT managers and administrators ask themselves how highly available their systems really need to be and what downtime is tolerable on a daily basis.

With the help of the Availability Environment (AE) model, which was developed by analysts at the Harvard Research Group, companies can classify their "availability environments". This scheme is also referred to as the Availability Environment Classification (AEC). Each class describes the impact that a failure of the corresponding services and systems has on the company and on end users / customers:

AE-0: Operations can be interrupted without this becoming business-critical, since the availability of data is not critical to ongoing operations. For end users, however, a failure means that data could be lost or damaged.

AE-1: Business functions in this class may be interrupted, but the system must continue to guarantee data availability. From the user's point of view, failures cause unforeseen interruptions of work and uncontrolled shutdowns, yet data integrity is preserved: the data is available as a backup copy on redundant storage. File systems with journaling or comparable log-based functions are often used to detect incomplete transactions or to restore the data from the backup.

AE-2: At this level, business functions may be interrupted only minimally, within a precisely defined time frame. Users experience a brief interruption, after which they can log on again immediately. In individual cases, however, some transactions may have to be rerun from the log files, and users may notice a loss of performance.

AE-3: This class covers business functions that must run without interruption within a precisely defined time window, usually certain hours of the day on most days of the week throughout the fiscal year. This way, everyone can work continuously and without interruption. A transaction may still have to be repeated, but users will not notice this as a service interruption, at most as a loss of performance.

AE-4: All business functions require continuous operation of IT systems and services. Any errors that occur must therefore be completely transparent to users, that is, users must not notice them at all. Systems must ensure 24x7 operation and uninterrupted work. Companies often use software that mirrors data and systems for this purpose.
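
The log-based recovery mentioned for AE-1, and the rerunning of transactions from log files in AE-2, can be illustrated with a simplified redo log. This is a minimal sketch under stated assumptions: the log record format (BEGIN, WRITE, COMMIT entries) and the function names are hypothetical, and real journaling file systems and database logs are considerably more involved. Recovery starts from the backup copy, redoes only the writes of transactions that reached COMMIT, and reports transactions that began but never committed as incomplete.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class LogRecord:
    """One entry in a hypothetical write-ahead log."""
    txid: int
    kind: str        # "BEGIN", "WRITE" or "COMMIT"
    key: str = ""
    value: str = ""


def recover(backup: Dict[str, str], log: List[LogRecord]) -> Tuple[Dict[str, str], Set[int]]:
    """Rebuild state from a backup plus the log.

    Writes are redone only for transactions that have a COMMIT record;
    transactions that began but never committed are returned as incomplete.
    """
    committed = {r.txid for r in log if r.kind == "COMMIT"}
    started = {r.txid for r in log if r.kind == "BEGIN"}
    state = dict(backup)  # work on a copy so the backup itself stays untouched
    for record in log:    # replay in log order
        if record.kind == "WRITE" and record.txid in committed:
            state[record.key] = record.value
    return state, started - committed


# Example: transaction 2 was interrupted by a crash before it could commit.
log = [
    LogRecord(1, "BEGIN"),
    LogRecord(1, "WRITE", "order:42", "shipped"),
    LogRecord(1, "COMMIT"),
    LogRecord(2, "BEGIN"),
    LogRecord(2, "WRITE", "order:43", "paid"),
]
state, incomplete = recover({"order:42": "paid"}, log)
print(state, incomplete)
```

Here the committed change to `order:42` is restored, while transaction 2 is flagged as incomplete so that it can be rerun, which corresponds to the situation described for AE-2.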

How to achieve high system availability

To minimize or avoid business interruptions, a highly available system should be able to recover quickly from any type of failure. The following measures help ensure high availability in your business:

  • Identify and eliminate single points of failure, such as individual system nodes
  • Ensure that all systems and data can be recovered easily
  • Use load balancing to distribute application and network traffic across servers or other hardware components
  • Continuously monitor the health of backend servers
  • Distribute resources geographically to guard against power outages or natural disasters
  • Implement reliable crossover or failover solutions for storage
  • Establish a system that detects failures immediately
  • Design system components for high availability and verify their functionality through regular testing
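
Several of the measures above (load balancing, health monitoring of backend servers, and immediate failure detection) can be combined in a small sketch. The class below is hypothetical and assumes a simple heartbeat scheme: a backend counts as healthy if it has reported in within the last `timeout` seconds, and requests are distributed round-robin across healthy backends only. Production setups would typically rely on dedicated load balancers or cluster software instead.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


class HealthCheckedBalancer:
    """Round-robin load balancer that skips backends with stale heartbeats."""

    def __init__(self, backends: List[str], timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.timeout = timeout
        self.clock = clock  # injectable so the behaviour can be tested
        now = clock()
        self.last_heartbeat: Dict[str, float] = {b: now for b in self.backends}
        self._cycle = itertools.cycle(self.backends)

    def heartbeat(self, backend: str) -> None:
        """Record that a backend reported itself alive."""
        self.last_heartbeat[backend] = self.clock()

    def is_healthy(self, backend: str) -> bool:
        """A backend is healthy if its last heartbeat is recent enough."""
        return self.clock() - self.last_heartbeat[backend] <= self.timeout

    def healthy_backends(self) -> List[str]:
        return [b for b in self.backends if self.is_healthy(b)]

    def next_backend(self) -> Optional[str]:
        """Return the next healthy backend in round-robin order, or None."""
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        return None  # every backend has failed: this is where an alert belongs


# Example with a simulated clock instead of real time.
fake_now = [0.0]
lb = HealthCheckedBalancer(["app-a", "app-b"], timeout=5.0, clock=lambda: fake_now[0])
first = [lb.next_backend() for _ in range(4)]          # both backends share the load
fake_now[0] = 10.0
lb.heartbeat("app-b")                                  # app-a has stopped reporting
after_failure = [lb.next_backend() for _ in range(3)]  # traffic fails over to app-b
print(first, after_failure)
```

Injecting the clock is a design choice that keeps the failover logic testable without waiting for real timeouts.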

Prevent the emergency with Libelle BusinessShadow®

With our Libelle BusinessShadow® solution for high availability and disaster recovery, you can mirror SAP® landscapes and other application systems on a time-shift basis. Your company is thus protected not only against the consequences of hardware and application errors, but also against the consequences of natural hazards, sabotage or data loss due to human error.
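
The benefit of a time shift against human error can be illustrated with a generic, hypothetical sketch; it is not how Libelle BusinessShadow® is implemented. It assumes that changes from the primary system arrive at the mirror immediately but are applied only after a fixed delay, so an accidental change on the primary can be discarded before it reaches the mirror. The class and method names are made up for this illustration, and changes are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class TimeShiftedMirror:
    """Hypothetical mirror that applies received changes only after a fixed delay."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        # Queue of (timestamp, key, value); assumes changes arrive in timestamp order.
        self.pending: Deque[Tuple[float, str, str]] = deque()
        self.data: Dict[str, str] = {}

    def receive(self, timestamp: float, key: str, value: str) -> None:
        """Called as soon as the primary ships a change; nothing is applied yet."""
        self.pending.append((timestamp, key, value))

    def apply_due(self, now: float) -> None:
        """Apply every queued change that is at least `delay` seconds old."""
        while self.pending and now - self.pending[0][0] >= self.delay:
            _, key, value = self.pending.popleft()
            self.data[key] = value

    def discard_from(self, timestamp: float) -> None:
        """Drop queued changes made at or after `timestamp` (the faulty ones)."""
        self.pending = deque(c for c in self.pending if c[0] < timestamp)


# Example with a one-hour time shift (timestamps in seconds).
mirror = TimeShiftedMirror(delay=3600)
mirror.receive(0, "customer:7", "active")
mirror.receive(100, "customer:7", "DELETED")  # human error on the primary
mirror.apply_due(now=3650)   # only the first change is old enough to be applied
mirror.discard_from(100)     # error noticed in time: stop it reaching the mirror
mirror.apply_due(now=7200)
print(mirror.data)
```

The mirror keeps the last correct state of `customer:7`, while a mirror without a delay would already have replicated the erroneous change.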

Would you like to learn more about IT terms, for example what exactly business continuity means or what the difference between production, development, and QA systems is? Then visit our Libelle IT Glossary or follow us on LinkedIn.
