June 1, 2005
Creating a NOC from Scratch
Susquehanna Shows How It's Done
By Art Cole
Jeffrey Tate had a problem.
As senior vice president of engineering and technology at Susquehanna Communications (SusCom), a mid-sized operator headquartered in central Pennsylvania with operations in six states, Tate was charged with bringing several recently acquired systems into his existing network. This was headache enough, but one of the new systems, serving Carmel, PA, had a circuit-switched voice operation up and running, which meant he faced the prospect of adding an entirely new level of monitoring and control.
Tate's solution? Build an entirely new network operations center (NOC).
"The trigger for the NOC was most definitely the acquisition of the Carmel system," he says. "We were of a mind that we simply could not operate a telephone service with the system we had in place."
SusCom's existing monitoring and control facilities consisted largely of independent Cheetah systems located within each regional headend. Not only was this arrangement costly to maintain, but because of the manpower needed to monitor each network independently, it provided reasonably reliable coverage for only about two-thirds of subscribers.
The first order of business, therefore, was to design a new facility that would not only enhance coverage but also reduce costs. The easy answer was to centralize monitoring and control, but exactly how this would be executed remained to be determined.
As with any new facility, location was a prime consideration. An existing headend could have fit the bill, but it would have meant cramming even more equipment into already tight real estate. The company decided that a standalone building was in order. And because Susquehanna now had voice services to consider (not only the newly acquired circuit-switched system, but planned VoIP rollouts as well), durability was just as important.
"We chose a former bank. It's a pretty sturdy building," says John Faulkner, NOC manager at SusCom, who oversaw construction of the York, PA, facility and now runs the day-to-day operations at what he prefers to call the "Network Reliability Center" (NRC).
To begin with, a robust and fault-proof power supply was a must. "The NRC has everything exclusively," Faulkner says. "It has its own power, heating and air conditioning, as well as its own network connectivity." Redundant servers in York and Williamsport, PA, also ensure continuity in case of a major failure.
With network reliability taken care of, the next step was to create a facility that would reduce costs and provide greater service to customers. "We were looking to change the way we deal with customers entirely," Faulkner says.
"In the old days, we relied on customers to alert us to outages. The philosophy behind the NRC was to become more preventive and proactive."
A key component was the alarm management software. SusCom selected the Spectrum service assurance software from Aprisma, now a unit of Computer Associates. What impressed SusCom most about Spectrum was its ability to handle both Internet Control Message Protocol (ICMP) data and the more intelligent Simple Network Management Protocol (SNMP) data. This allowed operators to begin communicating immediately with the large installed base of ICMP devices on the network while leaving room to upgrade easily to more advanced equipment.
"ICMP is a simple network management protocol," Faulkner explains. "ICMP sends out a (data) package looking for a particular device, and if it doesn't respond, an alarm is sounded. SNMP utilizes intelligent devices to transmit information back, so we can see where high bandwidth utilization is and where disc space is available. And if something does go wrong with a device, we can see exactly when it went down and what steps occurred just before it happened."
Trent Waterhouse, vice president of marketing for Spectrum, says Aprisma developed the software as a means to cut the cost of network operations.
"Lots of tools can tell that you have a problem, but not what caused it or who is impacted by it," he says. "But if you know who is affected, you can prioritize response efforts."
It's common knowledge that truck rolls eat into profits, so anything that can pinpoint faults, and even correct them remotely, shores up the bottom line.
"We've seen as much as 40 percent reductions in the number of truck rolls through the use of Spectrum," Waterhouse says.
Now that the monitoring and alarm systems were in place, the next step was to establish the processes and procedures to provide an effective response. For help with that, the company turned to Greenwich Technology Partners (GTP), of Parsippany, NJ.
GTP devised a thorough process. A problem is detected, preferably by the Spectrum software rather than by a customer complaint, which triggers an alarm and assigns a tracking number. The alarm or alarms are then analyzed, and a trouble ticket is issued.
A key piece of software for this stage is The Magic Trouble Ticketing System, developed by Remedy, a unit of BMC Software. It's a highly intuitive package that keeps track of the status of service calls to ensure that nothing falls through the cracks. It also sets up a series of priority definitions based on the urgency of the problem—from a Priority 1 network outage to a Priority 5 nondetectable error that may lead to service degradation if not checked.
Once an alarm is sounded, the next step is to investigate and resolve the problem. The Spectrum software provides root-cause analysis and assists technical staff in isolating the problem. It is also vital to set up notification procedures to ensure that the status of the problem is communicated to the proper groups or individuals, from customer care to the engineering staff to senior management, if necessary.
The final step is to verify that the problem has been resolved and to close the trouble ticket. The most crucial point at this stage is solid communication with the customer to ensure that the problem has been resolved satisfactorily.
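The workflow described above, from detection through closing the ticket, can be sketched roughly as follows. This is an illustrative model only, not the Magic or Spectrum data model: the stage names, the `Ticket` class and the `work_queue` helper are assumptions, and the article defines only Priority 1 and Priority 5, so the levels in between are left unnamed.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import List


class Stage(Enum):
    """Assumed stage names for the detect-to-close workflow."""
    DETECTED = 1
    TICKETED = 2
    INVESTIGATING = 3
    RESOLVED = 4
    CLOSED = 5      # reached only after the fix has been verified


_tracking_numbers = count(1)  # each detected problem gets the next number


@dataclass
class Ticket:
    description: str
    priority: int   # 1 = network outage ... 5 = may degrade service later
    tracking_number: int = field(default_factory=lambda: next(_tracking_numbers))
    stage: Stage = Stage.DETECTED
    notified: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.priority <= 5:
            raise ValueError("priority must be between 1 and 5")

    def advance(self) -> Stage:
        """Move to the next stage; a closed ticket cannot move further."""
        if self.stage is Stage.CLOSED:
            raise ValueError("ticket is already closed")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def notify(self, group: str) -> None:
        """Record which group has been told about the ticket's status."""
        self.notified.append(group)


def work_queue(tickets: List[Ticket]) -> List[Ticket]:
    """Open tickets, most urgent first, oldest first within a priority."""
    open_tickets = [t for t in tickets if t.stage is not Stage.CLOSED]
    return sorted(open_tickets, key=lambda t: (t.priority, t.tracking_number))
```

Sorting on priority before tracking number means a Priority 1 outage opened later still jumps ahead of an older Priority 5 ticket, which is the point of defining the priorities in the first place.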
"The overall objectives are the same across any organization," says Dan Stavola, practice director for enterprise management at GTP. "You get an event, and you need to handle that event and minimize the impact to the business. The systems that were being used to capture events and manage information at SusCom were not the same tools that the Help Desk was using, so we had to customize the process and create a workflow and synergy between the Help Desk and the NRC. You don't want to design a process around tools. You want to design the process and then enable the tools."
One of the kinks in SusCom's case was the presence of the voice service in Carmel. Not only did it require a higher level of performance than video and data, it also added a third layer of digital information to the mix. In the end, the most feasible approach was to funnel all voice, video and data services into one level for monitoring purposes. Once the system determines what type of data is affected, responses can be prioritized.
"With voice, we react as soon as an individual alarm sounds," Faulkner says. "With data and video, some things will reset themselves, so we give them a little time to react."
Faulkner says the new facility has greatly improved communications between the engineering and maintenance staff, as well as with the customer. Technicians in the center can now conduct real-time fault isolation and diagnostics so as to proactively identify problems before the subscriber even notices. When complaints do surface, the fault can be quickly identified, and customer service personnel can provide more information as to what the problem is and how and when it will be repaired—a great improvement over the traditional "we'll look into it" approach.
Diagnostic tools within the Spectrum package are also much stronger than those previously available to Susquehanna. Real-time fault isolation gives technicians a greater ability to get malfunctioning routers back online remotely. But even if repairs are needed in the field, problems can be located much more quickly.
"When we send a tech out, instead of him searching up and down the network to find out what's wrong, the goal is to have the problem isolated with the right tools," Faulkner says. "We can also look at problems after the fact. We conduct a post mortem to determine if there is any commonality between faults, so as to narrow down the root cause of the problem and eliminate it."
Faulkner says the most important aspect of building a NOC from scratch is planning. Solid knowledge of what your capabilities are and how you would like to improve on them is crucial.
"This isn't a job to be completed within a month or two," he says. "For us, it was a year-long project. Be methodical and meticulous. There is a lot about your own organization you will learn."
It's that learning curve, however, and the knowledge that the technical staff gains from it, that is most valuable in the long run.
Art Cole is a contributor to Communications Technology. Reach him at email@example.com.