Reliability Surrogates for a Corporate Balanced ...

Reliability Surrogates for a Corporate Balanced Scorecard
SRE98 Workshop, Ottawa, Canada, July 14-15, 1998
James Cusick, Senior Technical Staff Member, AT&T
30 Knightsbridge Road, Piscataway, NJ 08854

Today’s Talk
✦ Introduction & Context
✦ Balanced Scorecard Background
✦ Components of a Scorecard
✦ A Scorecard Process
✦ Reliability Surrogates Considered
✦ Reliability’s Optimal Role Considered


So Many Metrics, So Little Time
✦ Large-scale business operations require software-intensive systems. Which metrics are most useful for managing these systems?
✦ What role should reliability measurement play in IT management?


AT&T Frame Relay Outage: Reliability Matters

“AT&T has stood for reliability and still does. It is at the core of our service and our customer relationship, and it is a top priority for every AT&T person and for me personally.”
Mike Armstrong, AT&T CEO
Press Briefing, Basking Ridge, NJ, April 14, 1998


Introducing the BBS Model
✦ Balanced Business Scorecard developed by Kaplan & Norton

[Figure: Balanced Business Scorecard. “Vision and Strategy” sits at the center, linked to four perspectives, each tracked through Objectives, Measures, Targets, and Initiatives.]

Financial: “Are we meeting shareholder expectations for value delivered?”
Customer: “Do we meet business partner expectations?”
Process: “How well do we deliver software solutions to the business?”
Learning and Growth: “How will we sustain our ability to change and meet future needs?”

Kaplan, S., et al., “The Balanced Business Scorecard”, HBR, 1993.


BM-CIO Scorecard Metrics*

Financial
– IT Spending vs. Revenue
– Development vs. Maintenance
– Labor Cost per Function Point

Customer
– Defects per Function Point
– System Availability
– Response Time

Development Process
– On Time Delivery
– Cycle Time
– Effort Variance
– SEI Level Distribution

Learning & Growth
– Attrition Rate
– Training Days per Employee

* Scope of reporting: Business Markets CIO “critical” systems with active development. These are the 1997 metrics.

Organization

[Figure: organization chart. Boxes, top to bottom: CIO Office; Development Directorate A through Development Directorate n, each with Metrics Coordinators covering System A, System B, ... System n; Architecture Directorate; Metrics Office.]


BM-CIO Scorecard Process*

[Figure: process flow diagram. Stages: Metrics Definition, Data Collection, Data Analysis, Data Reporting. Boxes and labels: Benchmark Research; Collection Requests; Template Collection; Development Directorates (via Metrics Coordinators); Operations; Release Mgmt; HR; Training; Finance; Data Merge & Analysis; Draft & Publish Scorecard; Draft 1-n; Internal Review; VP Review; Published Scorecard; Scorecard Archives; CIO Staff Meeting; Biz Staff Meeting; Feedback (goals, tracking).]

* BM-CIO = Business Markets CIO

Measuring Reliability in DPMs

From the AT&T Web Site (5/10/98) http://www.att.com/network/standrd.html

“At AT&T's Network and Computing Services organization, one of the most important gauges of network reliability is Defects Per Million. This measurement is a statistically valid record of how many calls per million did not go through the first time because of a network procedural, hardware or software failure.”

“During 1997, AT&T's Defects-Per-Million performance was 173, which means that of every one million calls placed on the AT&T network, only 173 did not go through the first time due to a network failure. That equals a network reliability rate of 99.98 percent for 1997.”

DPMs are now being used in ever widening metrics applications within AT&T.
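
The arithmetic behind the quoted figures can be checked with a minimal sketch. The function names `dpm` and `reliability_percent` are illustrative, not AT&T terminology; the only data used is the 173 DPM figure quoted above.

```python
def dpm(failed_units: int, total_units: int) -> float:
    """Defects per million: failures scaled to one million units of work."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return failed_units * 1_000_000 / total_units


def reliability_percent(dpm_value: float) -> float:
    """Share of units that succeeded, as a percentage."""
    return (1 - dpm_value / 1_000_000) * 100


# 173 first-attempt call failures per one million calls (1997 figure quoted above)
rate = dpm(173, 1_000_000)
print(f"{rate:.0f} DPM -> {reliability_percent(rate):.2f}% reliability")
```

Because DPM is only a rescaling of the failure fraction, the same two functions apply to any unit of work (calls, transactions, batch runs) once the unit is agreed on.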


Reliability Surrogates

[Figure: 1997 Availability for Selected Systems in DPMs, plotted by month, JAN through DEC.]

OTHER MEASURES IN DPMs:
- System Response Time
- Batch Runs (e.g., bill production)
- Customer Satisfaction
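
One way availability might be expressed in DPMs is sketched below. The slide does not state which unit was counted, so the choice of one minute of scheduled service time as the unit is an assumption, and the monthly downtime figures are hypothetical, not the 1997 data.

```python
def availability_dpm(downtime_minutes: float, scheduled_minutes: float) -> float:
    """Unavailable minutes per million scheduled minutes (assumed unit)."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled_minutes must be positive")
    return downtime_minutes * 1_000_000 / scheduled_minutes


# Hypothetical (downtime, scheduled) minutes per month, for illustration only.
monthly = {
    "JAN": (30, 31 * 24 * 60),
    "FEB": (0, 28 * 24 * 60),
    "MAR": (12, 31 * 24 * 60),
}

for month, (down, scheduled) in monthly.items():
    print(f"{month}: {availability_dpm(down, scheduled):.0f} DPM")
```

The other measures listed could be treated the same way once a defect is defined for each, for example a response slower than an agreed threshold or a batch run that missed its window.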

Using DPMs within SRE

Process Modifications
Require up-front specification of reliability, expressed in DPMs, that balances cost of development, cost of a failure, and other factors to set an appropriate reliability goal.

Architecture and Architecture Reviews
The architecture phase must consider the DPM rating that the system must attain.

Software Reliability Engineering Techniques
Operational profiles, reliability modeling, and profile testing with consequence factors are used to provide estimates of production availability and DPM rates.

Operational Feedback
Verify application defect rates in production and make appropriate adjustments to the process.

Training
Rewards
Executive Sponsorship

Proposal Status: Under Consideration
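
As a rough illustration of how operational-profile testing could feed a DPM estimate, the sketch below weights each operation's observed test failure rate by its share of field usage and compares the result to an up-front goal. This is a simple point estimate, not a reliability growth model, and it does not model consequence factors. The `Operation` class, the operation names, the test counts, and the goal of 150 DPM are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    probability: float  # share of field usage, from the operational profile
    test_runs: int      # runs executed during profile testing
    failures: int       # failed runs observed


def estimated_dpm(profile: list[Operation]) -> float:
    """Usage-weighted failure probability, scaled to failures per million."""
    total = sum(op.probability for op in profile)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("operation probabilities must sum to 1")
    failure_probability = sum(
        op.probability * op.failures / op.test_runs for op in profile
    )
    return failure_probability * 1_000_000


# Hypothetical profile and test results, for illustration only.
profile = [
    Operation("order entry", 0.70, 20_000, 2),
    Operation("order inquiry", 0.25, 10_000, 0),
    Operation("bill adjustment", 0.05, 5_000, 3),
]
dpm_goal = 150  # hypothetical goal set during up-front specification
estimate = estimated_dpm(profile)
print(f"estimated {estimate:.0f} DPM; goal met: {estimate <= dpm_goal}")
```

Operational feedback would then compare the production DPM against this estimate and adjust the profile or the process where the two diverge.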

Which Reliability to Use?
✦ AT&T’s Operational Model
– Defects per Million given an “operational” unit
✦ HP’s FURPS model
– Traditional reliability figures
✦ SRE model(s)?


Issues with Reliability & DPMs
✦ Is rolled-up reliability across systems meaningful? In what instances?
✦ Is Reliability or DPMs more understandable to management or non-technical audiences?
✦ What are the limitations to using DPMs?
✦ If failure rate per million is a reliability figure, can it be a predictor as well?
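
The roll-up question can be made concrete with a small sketch comparing two ways of combining per-system DPMs: pooling defects and units across systems, versus averaging the per-system DPM values. The system names and counts are hypothetical, and neither method is presented as the one the scorecard used.

```python
def pooled_dpm(systems: dict[str, tuple[int, int]]) -> float:
    """Roll-up that pools defects and units, so high-volume systems dominate."""
    total_defects = sum(defects for defects, _ in systems.values())
    total_units = sum(units for _, units in systems.values())
    return total_defects * 1_000_000 / total_units


def mean_dpm(systems: dict[str, tuple[int, int]]) -> float:
    """Roll-up that averages per-system DPMs, so every system counts equally."""
    rates = [defects * 1_000_000 / units for defects, units in systems.values()]
    return sum(rates) / len(rates)


# Hypothetical (defects, units processed) per system, for illustration only.
systems = {
    "System A": (10, 1_000_000),  # high volume, 10 DPM
    "System B": (10, 10_000),     # low volume, 1000 DPM
}
print(f"pooled: {pooled_dpm(systems):.1f} DPM, mean: {mean_dpm(systems):.1f} DPM")
```

With these inputs the two roll-ups differ by more than an order of magnitude, which suggests a rolled-up figure is only meaningful when the systems share a comparable unit and the weighting is stated.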


Wrap-Up
✦ Thanks for your attention …
✦ Thanks to the Scorecard team ...

Any Questions?
