Reality Check: Five Nines: A Telecom Myth
The coveted “five nines” (99.999 percent uptime) is a telecom myth. (Isenberg, David. “The Myth of Five Nines.”) No data, study, facts, or evidence even remotely suggest that any residential telecom service has ever performed to such a level.
A telecom service is a complete service – such as a telephone call – that includes the extension of that service from the subscriber to another subscriber, a Web server, etc. Five nines is an admirable ideal, but it is not a practical goal for a residential service because most of that first or last mile to and from the subscriber is vulnerable to any number of service-impacting events.
Any claims to such availability are misleading. Far from five nines, telcos do not even monitor subscriber lines for voice service interruptions. Instead, they rely on subscriber reporting.
Telcos do not even have a reasonable means of intelligently detecting a service outage affecting a customer. When telcos make such claims, the claims may be based on the availability of some subsystem of their service, such as E911 availability. Even then, a single subscriber outage of 6 minutes would mean that the service was not universally five nines for all of a telco’s subscribers.
The truth is that five nines is a standard that the telcos like to envision themselves meeting. Five nines of availability measured over a one-year period would mean no more than about 5 minutes and 15 seconds of downtime. (See Table 1.)
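The downtime figure follows directly from the availability percentage. The short sketch below works through that arithmetic for several availability levels; the function name and the 365-day year are assumptions made for illustration, not part of any telco's method.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Maximum downtime per year, in minutes, allowed at a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Print the yearly downtime budget for a few common availability targets.
for pct in (99.0, 99.9, 99.95, 99.99, 99.999):
    budget = downtime_budget_minutes(pct)
    print(f"{pct:>7}% -> {budget:9.2f} minutes per year")

# A single 6-minute outage already exceeds the five-nines budget.
print(6 > downtime_budget_minutes(99.999))
```

At 99.999 percent the budget works out to about 5.26 minutes, which is why the 6-minute outage mentioned above is enough to break five nines for that subscriber.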
Precisely because it is an unreasonable goal, the myth of five nines contributes to poor service. Only with realistic goals can operators strive to improve service.
Residential telephony service operates more typically in the 98 percent to 99.95 percent range, depending on geography and weather conditions. A difficult but reasonable goal would be 99.95 percent average across all subscribers nationwide. In some areas, a higher goal of 99.95 percent to 99.99 percent may be possible.
Measuring availability of subsystems can also be useful. In particular, subsystem availability can often be more easily attributed to the efforts of a team within an operator’s organization. This ownership makes it possible to set an objective goal, measure performance against it and achieve it.
Among the MSO Web sites tracked in calendar year 2007 (see http://royal.pingdom.com), Comcast’s was one of only a few nationally to maintain five nines or better availability. By contrast, Google had about 7 minutes of downtime that same year, placing it just short of five nines, in the reasonable range of four to five nines.
Increasing availability is the same as decreasing outages or downtime. That means reducing system failures and planned outages. The least costly method of increasing availability is to reduce the time to repair.
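The effect of repair time can be illustrated with the standard steady-state relationship, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. The sketch below is illustrative only: the failure interval and repair times are hypothetical figures, not measurements from any operator, and it shows the effect on availability rather than the cost.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: average uptime divided by average total time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: one failure every 2,000 hours on average.
mtbf = 2000.0

# Same failure rate, progressively faster repair.
for mttr in (8.0, 4.0, 1.0):
    pct = availability(mtbf, mttr) * 100
    print(f"MTTR {mttr:4.1f} h -> availability {pct:.3f}%")
```

With the failure rate held constant, every reduction in repair time raises availability, without any change to the equipment itself.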
Increasing system reliability (that is, reducing the probability of a failure) is another element of increasing system availability. The system is composed of all of the subsystems required to make a service work. Complex systems can employ many functions to reduce failure probability for the system, while still allowing for the reasonable failure of components, subsystems and even human error.
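A rough sketch of how subsystem figures combine into a system figure follows. It assumes subsystem failures are independent, which real networks only approximate, and the three availability values are hypothetical placeholders rather than data for any actual plant.

```python
from math import prod


def series(availabilities):
    """Every subsystem must work, so the availabilities multiply."""
    return prod(availabilities)


def redundant(availabilities):
    """Any one working copy is enough: 1 minus the product of unavailabilities."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical chain of three subsystems that a service depends on.
chain = [0.9995, 0.9999, 0.999]
print(f"series chain:           {series(chain):.6f}")

# Same chain with the weakest subsystem duplicated (two independent copies).
protected = [0.9995, 0.9999, redundant([0.999, 0.999])]
print(f"with redundant element: {series(protected):.6f}")
```

A chain in series is always less available than its weakest subsystem, while duplicating that weak subsystem lets the system tolerate a component failure, as described above.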
Holding engineers and operators responsible for system availability is reasonable if the goals are reasonable and quantifiable. Most operators could improve their service availability by beginning with separate but parallel efforts to improve subsystem availability.
With no industry-wide data or studies available, it is not reasonable to compare service with the telcos’ myth. As our peers at Comcast have demonstrated, cable operators can lead the way by demonstrating our commitment to service in ways that can be objectively measured.
Victor Blake is an independent consultant. Reach him at [email protected].