Predictive Alerting
By Chris Bastian, Senior Vice President/CTO, SCTE/ISBE
Proactive Network Maintenance (PNM) technology is making great strides in locating faults and defects in the MSO’s network. As an example, with full bandwidth capture, network operators are now able to see the entire RF spectrum from the CPE’s point of view and detect defects and faults such as ingress, excessive tilt, spectrum suck-outs, and standing waves. An area of the network that was once invisible to the network manager can now be monitored in granular detail.
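To make one of these detections concrete, here is a minimal sketch of how excessive tilt might be flagged from a full-band capture. It assumes the capture has already been reduced to per-channel levels; the function names, the dBmV/MHz units, and the 6 dB limit are illustrative assumptions, not values taken from any PNM specification.

```python
from statistics import mean

def estimate_tilt_db(freqs_mhz, levels_dbmv):
    """Fit a least-squares line to level vs. frequency and return the
    level change (in dB) that line implies across the captured span."""
    f_mean = mean(freqs_mhz)
    l_mean = mean(levels_dbmv)
    num = sum((f - f_mean) * (l - l_mean)
              for f, l in zip(freqs_mhz, levels_dbmv))
    den = sum((f - f_mean) ** 2 for f in freqs_mhz)
    slope_db_per_mhz = num / den
    return slope_db_per_mhz * (max(freqs_mhz) - min(freqs_mhz))

def has_excessive_tilt(freqs_mhz, levels_dbmv, limit_db=6.0):
    """limit_db is a hypothetical threshold chosen for illustration."""
    return abs(estimate_tilt_db(freqs_mhz, levels_dbmv)) > limit_db
```

A negative result means the level falls as frequency rises; a positive one means it climbs. A real tool would also need to handle missing channels and measurement noise.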
Wouldn’t it be great if a network operator could take the next step and predict, with high accuracy, when a fault is likely to occur? What if we could gracefully take equipment out of service and replace it before it causes a customer-affecting event? Customer satisfaction scores would soar.
Other industries, including the military and transportation systems, have already implemented such predictive analysis. It is commonplace in certain hardware-specific industries, such as network attached storage, where predicting spinning-disk failure is critical to preventing data loss and avoiding business continuity issues. Even in telecommunications, there are papers dating back to the 1970s arguing that such a predictive alerting system is plausible. The idea is not new; however, the cable industry has been slow to put it into practice. SCTE/ISBE Cable-Tec Expo 2016 will zoom in on PNM and its significance in late September in Philadelphia.
The diagram below shows the fundamental steps in the predictive alerting process:
Data Collection: Gathering network management information from the typical network elements, such as routers, CMTSes, fiber nodes, and cable modems, as well as from other sources such as customer service tickets, field technician tickets, and social media.
Data Filtering and Analysis: Filtering out the vast quantity of unimportant data and focusing on the events that truly contribute to equipment and service reliability and availability. Developing rules for how events are linked or “chained,” such as correlating a network alarm with a customer service ticket.
Prediction: The forecast function, where real-time events and conditions are compared against time-proven signatures of “healthy” versus “unhealthy” sequences. Each signature can be characteristic of a specific set of network anomalies.
Proactive Control: Acting on a high-confidence prediction by routing traffic away from the affected area and restoring network health, either by restarting or replacing the impacted network element.
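The chaining and prediction steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any operator's system: the `Event` fields, the one-hour chaining window, and the signature format (an ordered tuple of event kinds) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary epoch
    element: str      # hypothetical network element ID, e.g. a fiber node
    kind: str         # hypothetical event type, e.g. "snmp_trap", "ticket"

def chain_events(events, window_s=3600.0):
    """Filtering and analysis: link events on the same element when each
    one occurs within window_s seconds of the previous one."""
    chains = []
    open_chain = {}  # element -> most recent chain for that element
    for ev in sorted(events, key=lambda e: e.timestamp):
        chain = open_chain.get(ev.element)
        if chain is not None and ev.timestamp - chain[-1].timestamp <= window_s:
            chain.append(ev)
        else:
            chain = [ev]
            chains.append(chain)
            open_chain[ev.element] = chain
    return chains

def contains_subsequence(kinds, signature):
    """True if the signature's kinds appear in order within kinds."""
    it = iter(kinds)
    return all(step in it for step in signature)

def predict(chain, signatures):
    """Prediction: names of the 'unhealthy' signatures this chain matches."""
    kinds = [ev.kind for ev in chain]
    return [name for name, sig in signatures.items()
            if contains_subsequence(kinds, sig)]

def proactive_actions(chains, signatures):
    """Proactive control: (element, signature name) pairs to act on."""
    return [(chain[0].element, name)
            for chain in chains
            for name in predict(chain, signatures)]
```

In practice the signatures would be derived from historical data rather than written by hand, and each action would feed a workflow that reroutes traffic or schedules a restart or replacement.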
Most network operators have already built the traditional data collection and analysis functions into their network management systems. Operators and vendors are now developing and fine-tuning the prediction and proactive control functions. By looking at every available source that indicates network health (from SNMP traps to customer tickets to chat sessions) and comparing it against known baselines, operators can establish a cyclic, closed-loop predictive alerting process.
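As a rough illustration of comparing a health indicator against a known baseline, the sketch below flags a reading that sits more than a chosen number of standard deviations from the baseline mean. The z-score approach and the default limit of 3.0 are assumptions made for this example; they are one simple option among many, not a method the article prescribes.

```python
from statistics import mean, stdev

def deviates_from_baseline(baseline, current, z_limit=3.0):
    """Return True if current lies more than z_limit sample standard
    deviations from the mean of baseline (needs at least two samples)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_limit
```

The same check could be applied per source, for example to hourly SNMP trap counts or daily ticket volumes, with the result fed back into the collection step to close the loop.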
It is high time that telecommunications networks fully harness this valuable methodology.