Voice, video and data services offered by cable operators are continually increasing in complexity, and service assurance systems are struggling to keep pace. Such complexity derives from the scaling of video on demand (VOD) systems, extension of VOD system functionality and deployment of switched digital video (SDV) systems.
Video assurance addresses this challenge by enabling the SDV system to monitor and measure all operations. A video assurance system can detect and characterize numerous issues affecting SDV services, such as bandwidth exhaustion, video tiling, channel change response time and underlying plant issues. It also can allow operations personnel to understand a problem quickly, to remedy the situation, and then to take steps to avoid future instances of the problem.
Video assurance for SDV encompasses control plane interaction; video plane monitoring; configuration management; trend analysis and capacity planning; and device interoperability. Each of these areas warrants its own set of assurance tools for use by customer service representatives (CSRs), field technicians, network operations center (NOC) personnel and deployment engineers, but note that all tools must be tightly integrated to afford rapid coordinated action.
Use cases
CSRs must be able to quickly translate subscriber account information into inventory data regarding customer premises equipment (CPE). The CSR then uses CPE detail to interrogate the video assurance system to establish the locality of the service-impacting event.
NOC personnel become involved in cases where the service is interrupted or degraded across a group of subscribers. In these situations, the NOC technician must use the video assurance system to investigate patterns in the outage data and deduce correlations. Also, device and service misconfigurations often compromise services in a way that initially may be indistinguishable from a device failure.
Field engineers validate installations and troubleshoot video plane problems. In the first case, field technicians are able to record device settings at the time of equipment installation, after the service has been enabled and successfully accepted by the subscriber. The second use case involves control and/or video plane issues in which the field technician, CSR and NOC engineer are able to work together to diagnose a problem of ambiguous origin that is reproducible at one or more subscriber locations in close proximity.
The last set of use cases involves design engineers who look at usage trends in order to plan capacity enhancements and who often handle service escalations when NOC personnel are unable to identify the cause of a service event.
SDV tiling
A particularly insidious form of service degradation is video tiling, especially where it is accompanied by audio issues. Video tiling problems may arise from improper statistical multiplexing of video streams as well as from poor HFC plant hygiene. However, SDV introduces further potential sources for video quality issues.
Consider the following practical example: A digital cable subscriber calls customer service and reports video tiling across some number of channels. A possible diagnosis follows:
1. The CSR talks with the subscriber to determine whether the channels experiencing tiling are broadcast channels or channels that are subject to switching. The CSR determines that broadcast channels are unaffected and only some of the switched channels appear impacted.
2. The CSR reports the issue to the NOC to determine if others have observed the problem. If the problem was previously reported and a fix is in progress, the CSR reports this to the subscriber. The NOC uses the information reported by the CSR to attempt to correlate the problem.
3. The NOC determines that the problem relates to a particular SDV service group and a specific edge quadrature amplitude modulation (QAM) modulator. The NOC uses the SDV management console to make this determination.
4. The NOC uses the edge QAM management console to look at status for the implicated edge QAM modulator. The NOC engineer determines that the device is alarming because of an overload on one of its QAM channels. The NOC engineer determines which QAM channel.
5. The NOC engineer uses the SDV management console again to retrieve the list of video channels being fed through the QAM modulator.
6. The NOC compares the list of channels reported by the SDV management system with that ascertained directly from the edge QAM management console to see if a one-to-one correspondence exists.
7. The NOC engineer determines that the edge resource manager (ERM) responsible for edge QAM video channel allocation and the edge QAM modulator itself have gotten out of synchronization.
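The comparison in steps 6 and 7 amounts to a set difference between two views of the same QAM modulator. The following Python sketch illustrates the idea under stated assumptions: the `ChannelBinding` record, its field names and the two input lists are hypothetical stand-ins for whatever the SDV management console and the edge QAM management console actually export.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ChannelBinding:
    """One video channel as placed on a QAM carrier (illustrative fields only)."""
    source_id: int          # hypothetical identifier of the switched program
    qam_frequency_hz: int   # carrier frequency of the QAM channel
    mpeg_program: int       # MPEG program number within the transport stream


def find_sync_mismatches(erm_view, qam_view):
    """Compare the ERM's channel list with the edge QAM modulator's own list.

    Returns the bindings the ERM believes are allocated but the modulator
    does not carry, and the bindings the modulator carries without the
    ERM's knowledge.
    """
    erm_set, qam_set = set(erm_view), set(qam_view)
    return {
        "missing_on_qam": sorted(erm_set - qam_set),
        "orphaned_on_qam": sorted(qam_set - erm_set),
    }


def in_sync(erm_view, qam_view):
    """True when a one-to-one correspondence exists between the two views."""
    result = find_sync_mismatches(erm_view, qam_view)
    return not result["missing_on_qam"] and not result["orphaned_on_qam"]
```

Orphaned bindings are the likelier cause of an overloaded QAM channel, since they consume bandwidth that the ERM considers free and may allocate again.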
A possible solution is that the NOC engineer first uses the SDV management console to resynchronize the ERM with the edge QAM modulator; second, verifies through the edge QAM management console that the edge QAM modulator is now in synchronization with the ERM; and third, uses the SDV and edge QAM management console event histories to investigate past video channel allocations and de-allocations and to determine at what point the ERM and the edge QAM modulator became unsynchronized.
Lessons
From the preceding use cases, one may make several important observations about the nature of video assurance.
Monitoring is "multi-dimensional". Two fundamental types of monitoring inputs exist: device-level and service-level. The edge QAM management console in the troubleshooting example embodies device-level monitoring. The device-level management console may also take the form of a Web-based interface that allows similar operations to be performed. The SDV management console represents service-level monitoring that allows logical functions to be performed, such as reporting of service group bandwidth spanning multiple physical devices, and monitoring and resynchronization of video channels. These service-level monitoring and management aspects may also be performed using simple network management protocol (SNMP) or Web-based interfaces.
Two basic types of monitoring outputs exist: alarms and drill-down interfaces. Alarm outputs generally take the form of SNMP or Internet protocol (IP) detail record (IPDR) interfaces that devices or the service assurance system may use to report important error conditions. Drill-down menus, provided through a management console, offer an effective means to characterize and diagnose service-affecting events. NOC personnel may be trained to check the management console periodically, during lulls in trouble ticket response, to investigate and resolve low priority service issues that, if neglected, may lead to more severe problems later. Moreover, drill-down menus may incorporate the tools necessary to establish correlations and dependencies between alarm information.
Service-level monitoring and management must be seamless. Video assurance management consoles must provide an immediate ability to invoke operations to resolve issues that have been diagnosed. In addition, the console must provide instantaneous feedback on the effects of the resolution in order to determine if the intended outcome occurred.
Components, interfaces and architecture
The challenges associated with video service assurance can be summarized in three areas of complexity: (1) the number of components participating in service delivery; (2) the intricacies of the interfaces between the numerous components; and (3) variations between system architectures.
Much has been written about SDV components, but in general the control plane elements may be enumerated and described as follows:
Set-top box SDV client: This device and resident application is responsible for translating from key presses on the subscriber remote control and program selections from the electronic program guide (EPG) to channel change request messages sent across the interactive network to the digital cable headend. Upon receipt of a channel change response from the headend, the SDV client tunes the set-top box device to the appropriate video channel.
Interactive network: This set of devices provides the control channel for communications between the digital cable headend and customer premises. The set of devices includes legacy equipment, including forward path modulators and reverse path demodulators, as well as a network gateway and digital controller. Ultimately, DOCSIS cable modem termination systems (CMTSs) and cable modems will come to replace this legacy equipment.
SDV session manager: This application resides in the cable headend and scales with the number of SDV clients, responding to channel change requests from the SDV client. The SDV session manager keeps track of which SDV clients are tuned to specific video channels.
ERM: This application also resides in the headend and scales with the number of QAM channels. The ERM responds to requests from the SDV session manager to allocate and de-allocate video channels on the edge device. Allocation occurs when a subscriber is the first in the service group to tune to the video channel. Conversely, de-allocation may occur, subject to bandwidth policy considerations, when a subscriber is the last in the service group to tune away from the video channel.
Edge device: The edge device or edge QAM modulator is responsible for responding to ERM requests for channel allocation and de-allocation. The edge QAM modulator executes Internet group management protocol, version 3 (IGMPv3) joins and leaves on its "upstream" Ethernet interface and routes the incoming, IP-encapsulated MPEG-2 transport streams to a particular QAM channel frequency and MPEG program number.
IP switch: The IP switch responds to the IGMPv3 joins and leaves from the edge device, allowing downstream transport of the video only when an edge device is actively routing the video channel.
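The allocation rule described for the ERM (allocate when the first subscriber in a service group tunes to a channel, de-allocate when the last one tunes away) can be sketched as viewer counting per channel. This is a minimal illustration, not the behavior of any particular vendor's session manager: the class name, method names and action tuples are hypothetical, and the sketch de-allocates immediately, whereas the text notes that real de-allocation is subject to bandwidth policy.

```python
from collections import defaultdict


class ServiceGroupState:
    """Tracks which SDV clients in one service group watch which channel."""

    def __init__(self):
        self._viewers = defaultdict(set)   # channel -> set of client ids
        self._tuned = {}                   # client id -> current channel

    def tune(self, client_id, channel):
        """Record a channel change and return the ERM requests it implies.

        Each request is a hypothetical ("allocate" | "deallocate", channel)
        tuple that a session manager might send to the ERM.
        """
        actions = []
        previous = self._tuned.get(client_id)
        if previous == channel:
            return actions                 # no change, nothing to request
        if previous is not None:
            viewers = self._viewers[previous]
            viewers.discard(client_id)
            if not viewers:                # last viewer tuned away
                del self._viewers[previous]
                actions.append(("deallocate", previous))
        if not self._viewers[channel]:     # first viewer in the service group
            actions.append(("allocate", channel))
        self._viewers[channel].add(client_id)
        self._tuned[client_id] = channel
        return actions

    def active_channels(self):
        """Channels that currently have at least one viewer."""
        return {ch for ch, viewers in self._viewers.items() if viewers}
```

For example, two set-top boxes tuning to the same channel produce a single allocation, and the channel is released only after both have tuned away.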
In addition to the control plane elements, several video plane elements must be monitored and managed. These include the digital satellite receiver, groomer/demultiplexer, encoder, splicer and bulk encryptor.
At a base level, all of these components, whether hardware devices or application processes running on such devices, must be managed and monitored by the video assurance system.
In addition to monitoring device and process health and managing associated hardware, software and service configuration information, the service assurance apparatus must track all exception conditions thrown on the various interfaces between the application components. They include:
Channel change interface between SDV client and SDV session manager: This interface also supports a forced tune capability to move inactive SDV clients off a video channel so that the channel may be reclaimed and the associated bandwidth used for other purposes.
Mini carousel interface, also between SDV client and SDV session manager: This "one-way" interface delivers channel map information, specific for each service group, to the SDV clients. Depending on the service delivery design, the purpose of the carousel is to provide improved channel change performance through fast updates to the channel map in cases where channel change response over the interactive network slows under heavy load, and/or to provide a backup mechanism when the interactive network fails.
SDV session manager to ERM interface: This is responsible for requesting new channels into a service group and requesting the removal of unwatched channels from a service group.
ERM to edge device interface: This is responsible for allocating and de-allocating QAM channels according to resource availability.
Edge device to IP switch: This takes the form of IGMPv3.
Aside from the control plane interfaces, another set of interfaces connects the video plane elements, namely receiver to multiplexer/encoder, multiplexer/encoder to switch, switch to splicer, splicer to encryptor, encryptor to edge device, and edge device to set-top box.
A description of the manner by which video assurance systems collect exception conditions, recorded on the various control and video plane interfaces, is provided in the following section.
SDV system architecture, like VOD architecture, is marked by a variety of specifications. Two primary architectures exist today, one developed by Time Warner Cable and one developed by Comcast.
Further complicating matters, these architectures approach somewhat differently the implementation of QAM resource sharing between VOD and SDV services.
Evolution
Service delivery platforms are often designed with "best case" scenarios in mind and retro-fitted with exception handling only after engineers learn hard-earned lessons in the field.
Two notable characteristics of most service assurance systems are that they are designed well after the fact by a group of engineers who did not participate in the initial service delivery design, and that they typically collect needed information inefficiently, by scraping log files and querying databases.
Even in the best cases, designers of service delivery platforms expend little effort at putting service assurance "hooks" in place beyond UNIX syslog-style interfaces.
Several concrete steps can be taken to re-factor the existing video service assurance framework to achieve greater scalability, increased performance and improved responsiveness.
Operational metrics: The first step is to define derived operational metrics required for historical reporting and trend analysis. Three metrics useful for measuring SDV system performance are bandwidth utilization, channel change latency and video quality. Video quality may be measured using a number of quantitative approaches. Bandwidth utilization trends can be used to predict needed capacity enhancements on both video delivery and interactive networks and to determine service denial probabilities on heavily loaded systems. Channel change latencies can be used to infer data traffic on heavily loaded interactive networks as well as data loss on noisy interactive networks.
Business metrics: The second step is to define derived business metrics. For SDV, these metrics may include channel change frequency, peak unique programs viewed on a service group, program audience share across all viewers, advertisements viewed and skipped, and many others. Other information, such as advertising traffic and billing verification logs, can be factored into the calculations as well.
Cached service metrics: The third step is to design the SDV session manager to serve as the nexus for service-level assurance data collection. The SDV session manager must be able to record every atomic operation it handles and export these records to the SDV management console. In turn, the SDV management console must be able to cache the records until the data can be reduced to derived metrics for export to the video assurance system. This allows for reliable and efficient collection of all data.
Device metrics: The final step is to instrument the SDV control plane and video plane devices for open, standard SNMP and IPDR monitoring and the failsafe, syslog-style proprietary logging interfaces specified by vendors and operators.
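Several of the derived metrics named in these steps reduce to simple arithmetic over recorded events. The Python sketch below shows one possible reduction for bandwidth utilization, channel change latency and peak unique programs on a service group. The function names, input shapes and the nearest-rank percentile method are assumptions chosen for illustration; the article does not prescribe how these metrics are computed.

```python
import math


def bandwidth_utilization(allocated_mbps, capacity_mbps):
    """Fraction of service group capacity consumed by allocated streams."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return sum(allocated_mbps) / capacity_mbps


def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile of channel change latencies (assumed method)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def peak_unique_programs(events):
    """Peak count of simultaneously active programs on a service group.

    `events` is a hypothetical list of (timestamp, delta) pairs, where
    delta is +1 for a channel allocation and -1 for a de-allocation.
    Events sharing a timestamp are applied in the order given.
    """
    active = peak = 0
    for _, delta in sorted(events, key=lambda event: event[0]):
        active += delta
        peak = max(peak, active)
    return peak
```

A high percentile of channel change latency is usually more informative than the mean for trend analysis, because a small number of slow responses on a loaded interactive network is exactly what a subscriber notices.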
With the ability to record data, filter the data for exception conditions and transform the raw data into derived metrics for reporting and analysis, the major remaining task is to integrate a root cause analysis (RCA) system. RCA employs rule sets that correlate raw data so that operations personnel may identify the original failure event leading to a sequence of dependent errors. RCA may also be closely integrated with logic to characterize affected customers and services associated with a device failure and with workforce management systems that may be used to schedule repair orders in an automated fashion according to potential revenue impact.
Integration and interop
The cable industry has several opportunities to continue making progress in converging management protocols so as to simplify interoperability and facilitate integration with service assurance systems.
One opportunity lies in protocols based on the real-time streaming protocol (RTSP): those for edge resource management presented by the CableLabs edge resource manager interface (ERMI), and those for SDV and VOD bandwidth management. A second also involves RTSP: transitioning existing session setup and stream control protocols to RTSP to facilitate integration with streaming media applications that will form the basis for cable IP video.
Conclusions
Several steps must be taken to guarantee that service assurance systems keep pace with service complexity. First, efficient service assurance must empower multiple users to work together to improve system design, anticipate service-affecting problems, intercept and resolve outages before such outages widely impact subscribers, and quickly remedy service impacts from unexpected events. Second, a common service assurance framework across video, voice and data ensures coherent and consistent trouble resolution and planning. Finally, standardization of interfaces between components upon which services depend will substantially reduce service assurance complexity even as interoperability expands.
Joseph Matarese is SVP advanced technology for ARRIS. Reach him at firstname.lastname@example.org. Robert Cruickshank is VP OSS/BSS strategies for ARRIS. Reach him at email@example.com. This article derives from a paper presented at the 2008 SCTE Conference on Emerging Technologies.