From the perspective of operators of complex networks, the reliability of the equipment that goes into those networks can vary significantly. It can be difficult for an operator with many systems across the country to have an accurate picture of the quality of widely deployed equipment.

A manufacturer that can demonstrate a system for tracking equipment returns from customers and a defined plan for handling problems will increase the degree of confidence in that vendor. Nearly all vendors have some form of tracking in place. An even higher level of customer confidence comes from seeing that the manufacturer has an effective monitoring program to track and remedy problems using a closed loop corrective action process for each product line. The highest confidence is reserved for vendors that discover and resolve problems quickly before the customer becomes aware that a problem even exists.

The need for networks to be absolutely reliable and rock-solid has continued to grow. In fact, the expectation is that the network will simply not go down. Customers notice even the briefest of service disruptions. Smart system operators know that they cannot afford to use unreliable equipment in their networks. Manufacturers demonstrate the quality of their people, their processes, their company and the products they provide by having systems in place to ensure that problems are quickly identified and taken care of. The manufacturers that don’t have effective systems in place for maintaining quality fall by the wayside as their customers abandon them.

This article provides an overview of a highly effective return rate metric, which is an excellent method of monitoring equipment quality and staying on top of field problems. The article also discusses the difference between return rate and failure rate (FR), the way return rate changes over time (the "bathtub" curve), the need for several return rate metrics to track equipment performance over time, and how TL-9000 metrics also track return changes over time.
Triangle chart

Return rates (RRs) are calculated from shipments and return material authorization (RMA) returns. Shipments are tracked by part number, shipment date, and, for some products, serial number (referred to as "serial number tracking"). The units shipped during a fiscal or calendar month are usually designated a "production lot." Returns are tracked by part number, serial number, shipment date (usually printed on the unit’s bar code label) and return date. For units with serial number tracking, the exact shipment date is known from the shipping record; otherwise, the shipment date is assumed to be the same as the production date.

It’s customary to assume that shipments are installed one month after they ship, all units shipped to a site are installed at that site, all failed units are returned for repair, and returns are received for repair in the month they fail.

A "triangle chart" (sometimes called a "Nevada chart") correlates production lots and the returns received from each production lot. In Figure 1, the columns on the left are production lot shipment year, month, and quantity. The rows at the top are return year and month. The body of the triangle chart shows return quantities tabulated by the date they were received and the production lot in which they were shipped. For example, in June 2007, 13,477 units were shipped, of which 10 were returned in February 2008.
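The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layouts, the `month_index` helper, and the `build_triangle_chart` function are hypothetical names introduced here, not part of any vendor system. The sample data reuses the June 2007 example from the text.

```python
from collections import defaultdict


def month_index(year, month):
    """Map a calendar month to a running integer so months can be subtracted."""
    return year * 12 + (month - 1)


def build_triangle_chart(shipments, returns):
    """Tabulate returns by (production lot month, return month).

    shipments: iterable of (ship_year, ship_month, quantity)
    returns:   iterable of (ship_year, ship_month, return_year, return_month),
               one entry per returned unit (assumed record layout)
    """
    lot_quantity = defaultdict(int)
    for year, month, quantity in shipments:
        lot_quantity[month_index(year, month)] += quantity

    chart = defaultdict(int)
    for ship_year, ship_month, ret_year, ret_month in returns:
        lot = month_index(ship_year, ship_month)
        received = month_index(ret_year, ret_month)
        if received < lot:
            raise ValueError("return dated before its shipment")
        chart[(lot, received)] += 1
    return dict(lot_quantity), dict(chart)


# June 2007 lot of 13,477 units, 10 of which came back in February 2008.
shipments = [(2007, 6, 13477)]
returns = [(2007, 6, 2008, 2)] * 10
lots, chart = build_triangle_chart(shipments, returns)
```

Because a unit cannot be returned before it ships, only cells with a return month on or after the lot month are ever filled, which is what gives the chart its triangular shape.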
RRs are based on the mean-time-between-return (MTBR), which is calculated from the triangle chart by dividing power-on hours (production lot ship quantity times time in service) by the number of returns. MTBR can be calculated using different shipment and return populations as long as they are consistent. The preceding triangle chart shows a 12 x 12 MTBR calculation as an example. The "12 x 12" nomenclature refers to a 12-month period of returns (August 2007 through July 2008) and a 12-month period of shipments (August 2007 through July 2008).

MTBR can be calculated using other shipment and return quantities. Figure 2 shows a 3 x 24 MTBR calculation, where the "3 x 24" nomenclature means 3 months of returns (May through July 2008) and 24 months of shipments (August 2006 through July 2008).

MTBR can be expressed in hours, months, years, or any other unit of time by simple conversion and is converted to an RR using the following standard reliability equation: Since MTBR can be calculated for different periods, it is annualized (annualized return rate, ARR) by setting the time (t) equal to 1 year (12 months, 8,760 hrs, etc.) as follows: For the previous triangle chart examples, the ARR is: The following simplified equation can also be used, although it is slightly less accurate. A full derivation of this equation is available from the authors by request.

The relationship between ARR and MTBR can be used to build a lookup table to convert quickly and easily between MTBR in hours, months, or years and return rate using both the exponential and simplified equations. For an example, see Table 1.

Failure rate

RR and MTBR are based on all RMA returns, some of which are eventually determined to be "no problem found" or "customer damage." Failure rate (FR) and mean time between failure (MTBF) are a subset of RR and MTBR since they exclude these returns.
Other than these exclusions, MTBF and annualized failure rate (AFR) are calculated using the same triangle chart tabulation and equations.
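As a sketch of the relationships described above, assuming the usual constant-rate (exponential) reliability model, MTBR and the annualized rate might be written as follows. The symbols $q_i$ (lot ship quantity), $T_i$ (lot time in service) and $r_i$ (returns from the lot) are labels introduced here for illustration and are not the authors' notation.

```latex
\begin{align}
\mathrm{MTBR} &= \frac{\sum_i q_i \, T_i}{\sum_i r_i} \\
\mathrm{RR}(t) &= 1 - e^{-t/\mathrm{MTBR}} \\
\mathrm{ARR} &= 1 - e^{-(1\,\text{yr})/\mathrm{MTBR}} \;\approx\; \frac{1\,\text{yr}}{\mathrm{MTBR}}
\end{align}
```

Under the same assumption, AFR has the same form with MTBF in place of MTBR. The approximation on the right comes from the first-order expansion $e^{-x} \approx 1 - x$ and slightly overstates the exponential result: for an MTBR of 100 years, it gives 1.000 percent versus about 0.995 percent.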

RR and MTBR can be calculated as soon as a unit is received for repair. However, FR and MTBF cannot be calculated until repair is complete and it is known if they are true failures, which may take some time. There are several common ways to address the lag between date received for repair and date of repair complete, all of which involve additional assumptions.
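The MTBR and ARR calculations can be illustrated with a small Python sketch. It rests on several assumptions that go beyond the text: months are plain integers, time in service is counted in whole months inside the return window starting one month after shipment, and the function names are hypothetical. The exponential form assumes a constant return rate.

```python
import math


def months_in_service(lot, window_start, window_end, install_lag=1):
    """Whole months inside the return window during which a lot is installed."""
    first_in_service = max(window_start, lot + install_lag)
    return max(0, window_end - first_in_service + 1)


def mtbr_months(lots, chart, ship_window, return_window):
    """Unit-months in service divided by returns, for matching populations."""
    ship_start, ship_end = ship_window
    ret_start, ret_end = return_window
    unit_months = 0
    returned = 0
    for lot, quantity in lots.items():
        if ship_start <= lot <= ship_end:
            unit_months += quantity * months_in_service(lot, ret_start, ret_end)
    for (lot, received), count in chart.items():
        if ship_start <= lot <= ship_end and ret_start <= received <= ret_end:
            returned += count
    return unit_months / returned if returned else math.inf


def arr_exponential(mtbr_in_months):
    """Annualized rate from the exponential model, with t = 12 months."""
    return 1.0 - math.exp(-12.0 / mtbr_in_months)


def arr_simplified(mtbr_in_months):
    """First-order approximation of the exponential form."""
    return 12.0 / mtbr_in_months


# Made-up 3 x 3 example: three lots of 1,000 units, six returns in total.
lots = {0: 1000, 1: 1000, 2: 1000}
chart = {(0, 1): 2, (0, 2): 1, (1, 2): 3}
mtbr = mtbr_months(lots, chart, ship_window=(0, 2), return_window=(0, 2))
# 3,000 unit-months / 6 returns = 500 months
arr_exp = arr_exponential(mtbr)
arr_simple = arr_simplified(mtbr)  # 12 / 500 = 0.024
```

The same functions would give MTBF and AFR if the chart were built only from returns confirmed as failures, which is why those figures have to wait until repair is complete.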
Annualized trends

Figure 3 is a 12×12 graph of ARR, AFR and shipment quantities that measures out-of-box and early life reliability and shows if initial quality out of the factory is improving or needs improvement. A similar graph showing life-to-date by life-to-date ARR, AFR and shipment quantity shows steady state reliability and can often detect if a product is suffering from premature wear-out.

Several RR metrics are necessary to understand a production lot’s early life, steady state, and wear-out RRs. Change or add RR metrics if necessary to answer additional questions. A "one size fits all" RR metric may miss important trends or cause false alarms.

"Bathtub" curve

Several RR calculations are useful because the RR of each production lot changes over time. When a production lot is first installed, it enters its "early life phase," during which the RR starts high and decreases until it reaches steady state. Once the RR is at steady state, the production lot is in its "useful life phase." Eventually, equipment wears out, and the RR increases. A graph of RR during the early life, steady state, and wear-out phases looks something like a bathtub, as illustrated in Figure 4.

Bathtub curves for systems equipment can be constructed from a triangle chart. Figure 5 of RR vs. months after shipment shows the bathtub curve for the triangle chart presented previously. The early life RR peaks at about 0.9 percent and reaches a steady state value of about 0.5 percent after about 13 months. Early life phases typically last 12 to 24 months, and the useful life phase usually lasts more than 15 or 20 years.

The "hump" in this bathtub curve at 3 or 4 months after shipment is caused by the lag between actual and assumed install date and the lag between actual failure date and date received for repair. If we had actual install dates and replacement dates recorded in the operator’s plant, this curve would likely have its highest point in month 0 or 1 and reach steady state sooner.

TL-9000 metrics

The bathtub curve presented previously is based on a "horizontal slice" of the triangle chart and follows individual production lots across their life cycle phases. RRs calculated in accordance with TL-9000 Quality Management System Measurements Handbook assess changes in RR over time using a "vertical slice" of the triangle chart.
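The "horizontal slice" behind the bathtub curve can be sketched as follows. This is one plausible reading, not the authors' exact procedure: returns are grouped by months after shipment, divided by the units from every lot old enough to have reached that age, and annualized with the simplified times-12 form. The function name and integer month indexes are assumptions made for the example.

```python
def bathtub_curve(lots, chart, last_month):
    """Annualized return rate by months after shipment.

    lots:  {lot_month: quantity shipped}
    chart: {(lot_month, return_month): returns}, covering returns
           received up to and including last_month
    """
    returns_by_age = {}
    for (lot, received), count in chart.items():
        age = received - lot
        returns_by_age[age] = returns_by_age.get(age, 0) + count

    curve = {}
    max_age = max(last_month - lot for lot in lots)
    for age in range(max_age + 1):
        # Only lots that have reached this age contribute units.
        exposed = sum(q for lot, q in lots.items() if lot + age <= last_month)
        if exposed:
            curve[age] = 12.0 * returns_by_age.get(age, 0) / exposed
    return curve


# Made-up data: three lots of 1,000 units observed through month 2.
lots = {0: 1000, 1: 1000, 2: 1000}
chart = {(0, 1): 2, (0, 2): 1, (1, 2): 3}
curve = bathtub_curve(lots, chart, last_month=2)
# age 1: 5 returns over 2,000 units, annualized to 0.03
```

Plotting `curve` against age for a product with enough history would be expected to show the early life decline and steady state plateau described above.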

TL-9000 RRs are also based on a triangle chart and MTBR approach. The standard slices the triangle chart into three pieces: Early Return Index (ERI) = 1×0-6, One-year Return Rate (YRR) = 1×7-18, and Long-term Return Rate (LTR) = 1×19-life to date. Figure 6 shows how the triangle chart is sliced for these three metrics. The TL-9000 RR equations can be derived from the simplified RR equation shown previously, and a full derivation is available by request. The TL-9000 RR equations are in the general form: For the preceding triangle chart, the early return index, one-year return rate, and long-term return rate are:

A graph of the TL-9000 RRs shows the following somewhat erratic or "noisy" behavior, primarily because the sample size, several hundred thousand units, is relatively small for a one-month triangle chart calculation. (See Figure 7.)

The TL-9000 standard increases sample size by combining products into product categories. For example, TL-9000 Product Category 3.2.5, Fiber to the User, includes Cisco Prisma II optical amplifiers, Prisma II Chassis, Prisma II 1,550 and 1,310 transmitters, Prisma II High Density 1,310 and 1,550 transmitters, optical switches, forward receivers, reverse receivers, etc. ERI, YRR, and LTR are calculated for all products together in this category with no distinction between (for example) chassis and optical amplifiers.
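A "vertical slice" in the spirit of the three TL-9000 metrics might look like the sketch below. It is an approximation under stated assumptions, not the handbook's definition: one month of returns is split by the age of the lot that produced it (0 to 6, 7 to 18, and 19 or more months), and each band's returns are divided by the band's shipments and multiplied by 12. The `BANDS` table and function name are hypothetical.

```python
# Age bands in months after shipment; None means "life to date".
BANDS = {"ERI": (0, 6), "YRR": (7, 18), "LTR": (19, None)}


def vertical_slice_rates(lots, chart, return_month):
    """Annualized return rate per age band for a single return month."""
    rates = {}
    for name, (youngest, oldest) in BANDS.items():
        shipped = 0
        returned = 0
        for lot, quantity in lots.items():
            age = return_month - lot
            if age < youngest or (oldest is not None and age > oldest):
                continue
            shipped += quantity
            returned += chart.get((lot, return_month), 0)
        rates[name] = 12.0 * returned / shipped if shipped else None
    return rates


# Made-up data: four lots of different ages, all viewed from month 24.
lots = {24: 1000, 20: 1000, 10: 2000, 0: 4000}
chart = {(20, 24): 1, (10, 24): 1, (0, 24): 1}
rates = vertical_slice_rates(lots, chart, return_month=24)
```

With only one month of returns in each calculation, a single extra return moves the rate noticeably, which matches the noisy behavior the text attributes to the small one-month sample.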

Vendors tabulate shipments and returns in a triangle chart to calculate MTBR and MTBF statistics, which are then converted to annualized RRs and FRs. RRs and FRs can be graphed to identify trends.

A production lot’s RR changes over time in the shape of a bathtub curve, and several metrics are necessary to characterize early life and steady state RRs and detect premature wear-out if it happens.

Early life and steady state RRs are also tracked by TL-9000 RRs, which are based on a triangle chart approach that calculates MTBR and converts it to a RR.
Conclusion

A manufacturer that can clearly demonstrate to its customers that it is in full control of product quality and reliability throughout the entire life cycle will gain and maintain a reputation as a high-quality company. This becomes a key advantage in a highly competitive marketplace.

Andy Drexler is a staff reliability engineer for Cisco Systems. Reach him at drexlea@cisco.com. Ray Thomas is a principal engineer for Time Warner Cable. Reach him at ray.thomas@twcable.com.
