Building a culture of high availability

By January 18, 2018

IT Operations, Software Defined Storage Technology, High Availability, Mainframe Skills

What customer doesn’t expect the applications they use to work all the time?

What business doesn’t want high availability from its IT systems?

In today’s world, both customers and businesses have high expectations. Customers want to bank and shop when it’s convenient for them, and business leaders want the systems providing these services to be always available. But even with the latest advances in technology, high availability can remain elusive, because technology alone can’t provide it.

Yes, the mean time between failures (MTBF) of some IT components can now be measured in decades, but we should never forget the adage that “eventually all hardware will break, and eventually all software will work.” Businesses invest in reliable technology, but some stop there, and the expected reliability of that technology becomes their availability plan. Customers come to depend on those services and expect them to always be there, so what happens when something does break? Is your business ready for that?
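A rough calculation shows why long component lifetimes still don't add up to an availability plan. This is a minimal sketch that assumes, purely for illustration, identical components with independent, exponentially distributed failures (a constant failure rate of 1 / MTBF). The 25-year MTBF and the fleet size of 100 are hypothetical numbers, not measurements.

```python
import math


def expected_failures_per_year(component_count: int, mtbf_years: float) -> float:
    """Expected failures per year across identical, independent components,
    assuming a constant failure rate of 1 / MTBF for each one."""
    return component_count / mtbf_years


def prob_any_failure(component_count: int, mtbf_years: float, years: float = 1.0) -> float:
    """Probability that at least one component fails within `years`,
    assuming independent, exponentially distributed failures."""
    fleet_rate = component_count / mtbf_years  # failures per year across the fleet
    return 1.0 - math.exp(-fleet_rate * years)


# Hypothetical numbers: each component has a 25-year MTBF.
print(expected_failures_per_year(100, 25))  # 4.0
print(round(prob_any_failure(1, 25), 4))    # 0.0392 for a single component
print(round(prob_any_failure(100, 25), 4))  # 0.9817 for a fleet of 100
```

Under these assumptions, a single component has roughly a 4 percent chance of failing in a given year, but a fleet of 100 such components can expect about four failures a year. Something will break; the question is whether the business is ready when it does.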

If simply buying better technology isn’t the answer, what is? It’s simple, really: assume everything can fail, and some of it will. Then create an IT culture in which everyone keeps asking (and answering) these questions:

  • What can I do to minimize failures?
  • What will happen if something does fail?
  • How do I minimize the impact when it fails?

High availability requires the right mix of technology, people and processes, built around a culture that not only supports high availability but strives for it. Without that culture, IT organizations will forever be putting a bandage on their outages.

How to create a culture of high availability

A culture of high availability has to start at the top of the business, with organizational objectives that support the goal of near-zero downtime. Everyone in IT, from architecture and design to application development, system administration and operations, must be driven toward that goal. An application team that’s driven only to roll out new features won’t focus on exploiting the resiliency built into its technology. A system administration team that isn’t given maintenance windows won’t be able to keep firmware current. An operations team that only monitors individual components may not discover a broken service until a customer calls.

This doesn’t mean everyone in IT owns service availability. There should be a dedicated role within the IT organization for that, and whoever holds it needs to focus proactively on building, maintaining and delivering highly available services.

Organizations must make the proper investment to achieve zero downtime. Not all applications are created equal, and they don’t all require the same investment, but the business and IT must share a clear understanding of the value each service brings and how much to invest in each to deliver what’s expected. Does the business side of your company understand the cost of downtime for the services it deems critical? If not, how can it be sure it has made the right investment? Services that the business expects to be “always on” must be funded accordingly.
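One way to start the cost-of-downtime conversation is to translate an availability target into a yearly downtime budget and put a price on it. The sketch below is a back-of-the-envelope model under stated assumptions: a 365-day year, and a flat, hypothetical cost of $5,000 per minute of downtime. Real downtime costs are rarely linear, so a business would substitute its own figures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Yearly downtime allowed by an availability target such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def yearly_downtime_cost(availability: float, cost_per_minute: float) -> float:
    """Cost of using the full downtime budget at a flat cost per minute."""
    return downtime_budget_minutes(availability) * cost_per_minute


# Hypothetical cost of downtime for a critical service: $5,000 per minute.
COST_PER_MINUTE = 5_000.0

for target in (0.99, 0.999, 0.9999, 0.99999):
    minutes = downtime_budget_minutes(target)
    cost = yearly_downtime_cost(target, COST_PER_MINUTE)
    print(f"{target:.3%} -> {minutes:,.1f} min/year, about ${cost:,.0f}")
```

Each additional "nine" cuts the downtime budget by a factor of ten, from about 5,256 minutes a year at 99 percent to about 5 minutes at 99.999 percent. Comparing the resulting cost figures with the cost of the technology and processes needed to reach each target is one way to judge whether a service is getting the right investment.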

Continuous improvement

A strong service management framework can help set the stage for continuous improvement; however, truly achieving it will depend on the culture of the organization. Does your organization have objectives that lead to continuous improvement? Do those targets change over time? Are all failures (even those that don’t trigger a service outage) inspected to see what can be improved? Does your business have the right metrics in place to warn you that things are heading in the wrong direction before an outage occurs?

Experience has shown that many IT outages can be traced back to process errors. In some cases, process errors account for as much as 50 percent of outages. Ignoring the process errors or gaps you have already experienced, or failing to fix them, can only lead to their recurring.

Several service management frameworks exist; one commonly seen in IT today is the Information Technology Infrastructure Library (ITIL). ITIL may not provide all the answers, but adopting a service management framework, backed by proper education and strong management support, can help speed the creation of the right culture for achieving high availability.

Where to start

One way to gauge where your business stands is to have an independent team review your technology landscape and service management framework. This can help you identify gaps and single points of failure and determine which actions will close those gaps.
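A simple availability model shows why single points of failure get so much attention in such a review. This sketch relies on the textbook simplification that components fail independently: availabilities of components in series (all must be up) multiply, while a redundant pair in parallel is down only when both copies are down. The three-tier service and its 99.9 percent figures are hypothetical. `math.prod` requires Python 3.8 or later.

```python
from math import prod


def series_availability(*availabilities: float) -> float:
    """All components must be up (each is a single point of failure),
    so their availabilities multiply."""
    return prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """Redundant components: the service is down only if every copy is down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical three-tier service, each tier 99.9% available.
chain = series_availability(0.999, 0.999, 0.999)
# Same service with the middle tier duplicated, removing one single point of failure.
redundant_middle = series_availability(0.999, parallel_availability(0.999, 0.999), 0.999)

print(f"{chain:.4%}")             # 99.7003%
print(f"{redundant_middle:.4%}")  # 99.8000%
```

In this model, removing one single point of failure recovers most of the availability lost to that tier. Real failures are often correlated (shared power, shared software defects, shared process errors), so the model is an upper bound rather than a promise.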

The High Availability Center of Competency (HACoC) in IBM Systems Lab Services is a team built exactly for this purpose. To contact us, please send us an email.
