Do More With Your Data

Measuring Data Management Practice Maturity: A Community's Self-Assessment
Data Management & Information Quality



Philadelphia



November 2008

By Peter A. Aiken, PhD
Associate Professor, Virginia Commonwealth University
Founding Director of Data Blueprint

About Data Blueprint

Data Blueprint leverages decades of data management (DM) experience to develop and apply client-designed solutions that address information management requirements and needs. Data Blueprint clients gain immediate benefits from skillfully integrated and packaged data expertise that enhances organizational data management and moves organizations toward data empowerment with the following suite of Products and Services:



• Data Assessments – we evaluate the current state of an organization and provide risk and opportunity information in Data Management Practices, Data Strategy, Data Quality Engineering, Data Risk, and Data Training.



• Data Consulting/Engineering – we provide a continuum of specialized Data Blueprint skills in Enterprise Data Architecture, Data Quality, Audit, Cleansing, Migration, Integration, Metadata, XML, Business Process Analysis, and other strategic areas.



• Data Instruction – we transfer knowledge to your organization through classroom training, educational courses and workshops (some with accredited university credits), mentoring, conferences, and research lectures. Our goal is to empower you to manage your own data.

Data Blueprint's methodology enables clients to learn and master processes, facilitated by our Products and Services, to achieve enhanced data management. The result of our team approach is empowerment, of both your data and your organization, to successfully create and maintain premier data management capabilities. The value to the organization is a positive return on your investment in data, one of your most critical assets.

"Data Blueprint has developed a unique technical approach…further, they have implemented an extremely difficult technical solution…a technical feat that no contractor has been able to implement in the more than three decades that our system has been running…"
Tony Berta, Program Manager, Technical Quality, Headquarters Defense Logistics Agency

Do more with your data. Call +1.804.521.4056 or email [email protected].

datablueprint.com • [email protected] • phone +1.804.521.4056 • fax +1.804.521.4004 Maggie Walker Business & Technology Center • 501 E Franklin St • Ste 414 • Richmond, VA 23219

RESEARCH FEATURE

Measuring Data Management Practice Maturity: A Community's Self-Assessment

Peter Aiken, Virginia Commonwealth University/Institute for Data Research
M. David Allen, Data Blueprint
Burt Parker, Independent consultant
Angela Mattia, J. Sergeant Reynolds Community College

Increasing data management practice maturity levels can positively impact the coordination of data flow among organizations, individuals, and systems. Results from a self-assessment provide a roadmap for improving organizational data management practices.

As increasing amounts of data flow within and between organizations, the problems that can result from poor data management practices are becoming more apparent. Studies have shown that such poor practices are widespread. For example,

• PricewaterhouseCoopers reported that in 2004, only one in three organizations were highly confident in their own data, and only 18 percent were very confident in data received from other organizations. Further, just two in five companies have a documented board-approved data strategy (www.pwc.com/extweb/pwcpublications.nsf/docid/15383D6E748A727DCA2571B6002F6EE9).
• Michael Blaha1 and others in the research community have cited past organizational data management education and practices as the cause for poor database design being the norm.
• According to industry pioneer John Zachman,2 organizations typically spend between 20 and 40 percent of their information technology budgets evolving their data via migration (changing data locations), conversion (changing data into other forms, states, or products), or scrubbing (inspecting and manipulating, recoding, or rekeying data to prepare it for subsequent use).
• Approximately two-thirds of organizational data managers have formal data management training; slightly more than two-thirds of organizations use or plan to apply formal metadata management techniques; and slightly fewer than one-half manage their metadata using computer-aided software engineering tools and repository technologies.3

When combined with our personal observations, these results suggest that most organizations can benefit from the application of organization-wide data management practices. Failure to manage data as an enterprise-, corporate-, or organization-wide asset is costly in terms of market share, profit, strategic opportunity, stock price, and so on. To the extent that world-class organizations have shown that opportunities can be created through the effective use of data, investing in data as the only organizational asset that can't be depleted should be of great interest.


Table 1. Data management processes.4

• Data program coordination (Direction): Provide appropriate data management process and technological infrastructure.
• Organizational data integration (Direction): Achieve organizational sharing of appropriate data.
• Data stewardship (Direction and implementation): Achieve business-entity subject area data integration.
• Data development (Implementation): Achieve data sharing within a business area.
• Data support operations (Implementation): Provide reliable access to data.
• Data asset use (Implementation): Leverage data in business activities.

Data types managed by these processes:
• Program data: Descriptive propositions or observations needed to establish, document, sustain, control, and improve organizational data-oriented activities (such as vision, goals, policies, and metrics).
• Development data: Descriptive facts, propositions, or observations used to develop and document the structures and interrelationships of data (for example, data models, database designs, and specifications).
• Stewardship data: Descriptive facts about data documenting semantics and syntax (such as name, definition, and format).
• Business data: Facts and their constructs used to accomplish enterprise business activities (such as data elements, records, and files).

DATA MANAGEMENT DEFINITION AND EVOLUTION

As Table 1 shows, data management consists of six interrelated and coordinated processes, primarily derived by Burt Parker from sponsored research he led for the US Department of Defense at the MITRE Corporation.4 Figure 1 supports the similarly standardized definition: "Enterprise-wide management of data is understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities."4 The figure illustrates how organizational strategies guide other data management processes. Two of these processes, data program coordination and organizational data integration, provide direction to the implementation processes: data development, data support operations, and data asset use. The data stewardship process straddles the line between direction and implementation. All processes exchange feedback designed to improve and fine-tune overall data management practices.

Figure 1. Interrelationships among data management processes (adapted from Burt Parker's earlier work4). Blue lines indicate guidance, red lines indicate feedback, and green lines indicate data.

Data management has existed in some form since the 1950s and has been recognized as a discipline since the 1970s. Data management is thus a young discipline compared to, for example, the relatively mature accounting practices that have been practiced for thousands of years. As Figure 2 shows, data management's scope has expanded over time, and this expansion continues today.

Ideally, organizations derive their data management requirements from enterprise-wide information and functional user requirements. Some of these requirements come from legacy systems and off-the-shelf software packages. An organization derives its future data requirements from an analysis of what it will deliver, as well as future capabilities it will need to implement organizational strategies. Data management guides the transformation of strategic organizational information needs into specific data requirements associated with particular technology system development projects.

Figure 2. Data management's growth over time. The discipline has expanded from an initial focus on database development and operation in the 1950s to 1970s to include additional responsibilities in the periods 1970-1990, 1990-2000, and from 2000 to the present: data requirements analysis, data modeling, enterprise data management coordination, enterprise data integration, enterprise data stewardship, enterprise data use, an explicit focus on data quality throughout, security, compliance, and other responsibilities.

All organizations have data architectures, whether explicitly documented or implicitly assumed. An important data management process is to document the architecture's capabilities, making it more useful to the organization. In addition, data management

• must be viewed as a means to an end, not the end itself. Organizations must not practice data management as an abstract discipline, but as a process supporting specific enterprise objectives, in particular, to provide a shared-resource basis on which to build additional services.
• involves both process and policy. Data management tasks range from strategic data planning to the creation of data element standards to database design, implementation, and maintenance.
• has a technical component: interfacing with and facilitating interaction between software and hardware.
• has a specific focus: creating and maintaining data to provide useful information.
• includes management of metadata artifacts that address the data's form as well as its content.

Although data management serves the organization, the organization often doesn't appreciate the value it provides. Some data management staffs keep ahead of the layoff curve by demonstrating positive business value. Management's short-term focus has often made it difficult to secure funding for medium- and long-term data management investments. Tracing the discipline's efforts to direct and indirect organizational benefits has been difficult, so it hasn't been easy to present an articulate business case to management that justifies subsequent strategic investments in data management. Viewing data management as a collection of processes, each with a role that provides value to the organization through data, makes it easier to trace value through those processes and point not only to a methodological "why" of data management practice improvement but also to a specific, concrete "how."

RESEARCH BASIS

Mark Gillenson has published three papers that serve as an excellent background to this research.5-7 Like earlier works, Gillenson focuses on the implementation half of Figure 1, adopting a more narrow definition of data administration. Over time, his work paints a picture of an industry attempting to catch up with technological implementation. Our work here updates and confirms his basic conclusions while changing the focus from whether a process is performed to the maturity with which it is performed.

Three other works also influenced our research: Ralph Keeney's value-focused thinking,8 Richard Nolan's six-stage theory of data processing,9 and the Capability Maturity Model Integration (CMMI).10,11 Keeney's value-focused thinking provides a methodological approach to analyzing and evaluating the various aspects of data management and their associated key process areas. We wove the concepts behind means and fundamental objectives into our assessment's construction to connect how we measure data management with what customers require from it. In Stage VI of his six-stage theory of data processing, Nolan defined maturity as data resource management. Although Nolan's theory predates and is similar to the CMMI, it contains several ideas that we adapted and reused in the larger data management context. However, CMMI refinement remains our primary influence.

Most technologists are familiar with the CMM (and its upgrade to the CMMI), developed at Carnegie Mellon's Software Engineering Institute with assistance from the MITRE Corporation.10,11 The CMMI itself was derived from work that Ron Radice and Watts Humphrey performed while at IBM. Dennis Goldenson and Diane Gibson presented results pointing to a link between CMMI process maturity and organizational success.12 In addition, Cyndy Billings and Jeanie Clifton demonstrated the long-term effects for organizations that successfully sustain process improvement for more than a decade.13

CMMI-based maturity models exist for human resources, security, training, and several other areas of the software-related development process. Our colleague, Brett Champlin, contributed a list of dozens of maturity measurements derived from or influenced by the CMMI. This list includes maturity measurement frameworks for data warehousing, metadata management, and software systems deployment. The CMMI's successful adoption in other areas encouraged us to use it as the basis for our data management practice assessment. Whereas the core ideas behind the CMMI present a reasonable base for data management practice maturity measurement, we can avoid some potential pitfalls by learning from the revisions and later work done with the CMMI. Examples of such improvements include general changes to how the CMMI makes interrelationships between process areas more explicit and how it presents results to a target organization.

Work by Cynthia Hauer14 and Walter Schnider and Klaus Schwinn15 also influenced our general approach to a data management maturity model. Hauer nicely articulated some examples of the value determination factors and results criteria that we have adopted. Schnider and Schwinn presented a rough but inspirational outline of what mature data management practices might look like and the accompanying motivations.

RESEARCH OBJECTIVES

Our research had six specific objectives, which we grouped into two types: community descriptive goals and self-improvement goals. Community descriptive research goals help clarify our understanding of the data management community and associated practices. Specifically, we want to understand

• the range of practices within the data management community;
• the distribution of data management practices, specifically the various stages of organizational data management maturity; and
• the current state of data management practices: in what areas are the community data management practices weak, average, and strong?

Self-improvement research goals help the community as a whole improve its collective data management practices. Here, we desire to

• better understand what defines current data management practices;
• determine how the assessment informs our standing as a technical community (specifically, how does data management compare to software development?); and
• gain information useful for developing a roadmap for improving current practice.

The CMMI's stated goals are almost identical to ours: "[The CMMI] was designed to help developers select process-improvement strategies by determining their current process maturity and identifying the most critical issues to improving their software quality and process."10 Similarly, our goal was to aid data management practice improvement by presenting a scale for measuring data management accomplishments. Our assessment results can help data managers identify and implement process improvement strategies by recognizing their data management challenges.

DATA COLLECTION PROCESS AND RESEARCH TARGETS

Between 2000 and 2006, we assessed the data management practices of 175 organizations. Table 2 provides a breakdown of organization types.

Table 2. Organizations included in data management analysis, by type.

Organization type             Percent
Local government                 4
State government                17
Federal government              11
International organization      10
Commercial organization         58

Students from some of our graduate and advanced undergraduate classes largely conducted the assessments. We provided detailed assessment instruction as part of the course work. Assessors used structured telephone and in-person interviews to assess specific organizational data management practices by soliciting evidence of processes, products, and common features. Key concepts sought included the presence of commitments, abilities, measurements, verification, and governance.

Assessors conducted the interviews with the person identified as having the best, firsthand knowledge of organizational data management practices. Tracking down these individuals required much legwork; identifying these individuals was often more difficult than securing the interview commitment. The assessors attempted to locate evidence in the organization indicating the existence of key process areas within specific data management practices. During the evaluation, assessors observed strict confidentiality: they reported only compiled results, with no mention of specific organizations, individuals, groups, programs, or projects. Assessors and participants kept all information to themselves and observed proprietary rights, including several nondisclosure agreements.

All organizations implement their data management practice in ways that can be classified as one of five maturity model levels, detailed in Table 3. Specific evidence, organized by maturity level, helped identify the level of data management practiced.

Table 3. Data management practice assessment levels.

Level 1: Initial
Practice: The organization lacks the necessary processes for sustaining data management practices. Data management is characterized as ad hoc or chaotic.
Quality and results predictability: The organization depends entirely on individuals, with little or no corporate visibility into cost or performance, or even awareness of data management practices. There is variable quality, low results predictability, and little to no repeatability.

Level 2: Repeatable
Practice: The organization might know where data management expertise exists internally and has some ability to duplicate good practices and successes.
Quality and results predictability: The organization exhibits variable quality with some predictability. The best individuals are assigned to critical projects to reduce risk and improve results.

Level 3: Defined
Practice: The organization uses a set of defined processes, which are published for recommended use.
Quality and results predictability: Good quality results within expected tolerances most of the time. The poorest individual performers improve toward the best performers, and the best performers achieve more leverage.

Level 4: Managed
Practice: The organization statistically forecasts and directs data management, based on defined processes, selected cost, schedule, and customer satisfaction levels. The use of defined data management processes within the organization is required and monitored.
Quality and results predictability: Reliability and predictability of results, such as the ability to determine progress or six sigma versus three sigma measurability, is significantly improved.

Level 5: Optimizing
Practice: The organization analyzes existing data management processes to determine whether they can be improved, makes changes in a controlled fashion, and reduces operating costs by improving current process performance or by introducing innovative services to maintain its competitive edge.
Quality and results predictability: The organization achieves high levels of results certainty.

For each data management process, the assessment used between four and six objective criteria to probe for evidence. Assessed outside the data collection process, the presence or absence of this evidence indicated organizational performance at a corresponding maturity level.

ASSESSMENT RESULTS

The assessment results reported for the various practice areas show that overall scores are repeatable (level 2) in all data management practice areas. Figure 3 shows assessment averages of the individual response scores. We used a composite chart to group the averages by practice area. Such groupings facilitate numerous comparisons, which organizations can use to plan improvements to their data management practices. We present sample results (blue) for an assessed organization (disguised as "Mystery Airline"), whose management was interested in not only how the organization scored but also how it compared to other assessed airlines (red) and other organizations (white).

We grouped 19 individual responses according to the five data management maturity levels in the horizontal bar charts. Most numbers are averages. That is, for an individual organization, we surveyed multiple data management operations, combined the individual assessment results, and presented them as averages. We reported assessments of organizations with only one data management function as integers.

For example, the data program coordination practice area results include:

• Mystery Airline achieved level 1 on responses 1, 2, and 5, and level 2 on responses 3 and 4.
• The airline industry performed above both Mystery Airline and all respondents on responses 1 through 3.
• The airline industry performed below both Mystery Airline and all respondents on response 4, and well below all respondents and just those in the airline industry on response 5.

Figure 3f illustrates the range of results for all organizations surveyed for each data management process; for example, the assessment results for data program coordination ranged from 2.06 to 3.31. The maturity measurement framework dictates that a data program can achieve no greater rating than the lowest rating achieved, hence the translation to the scores for Mystery Airline of 1, 2, 2, 2, and 2, combining for an overall rating of 1. This is congruent with CMMI application. Although this might seem a tough standard, the rating reflects the adage that a chain is only as strong as its weakest link. Mature data management programs can't rely on immature or ad hoc processes in related areas. The lowest rating received becomes the highest possible overall rating.

Figure 3. Assessment results useful to Mystery Airline: (a) data program coordination, (b) enterprise data integration, (c) data stewardship, (d) data development, (e) data support operations, and (f) assessments range. Horizontal bars compare Mystery Airline (blue), the airline industry (red), and all respondents (white) across responses 1 through 19.

This also explains why many organizations are at level 1 with regard to their software development practices. While the CMMI process results in a single overall rating for the organization, data management requires a more fine-grained feedback mechanism. Knowing that some data management processes perform better than others can help an organization develop incentives as well as a roadmap for improving individual ratings.

Taken as a whole, these numbers show that no data management process or subprocess measured on average higher than the data program coordination process, at 3.31. It's also the only data management process that performed on average at a defined level (greater than 3). The results show a community that is approaching the ability to repeat its processes across all of data management.
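The weakest-link computation is simple enough to state in code. Below is a minimal Python sketch; it is our illustration of the rule just described, not the authors' actual scoring instrument, and the sample scores are Mystery Airline's data program coordination values from the text.

def overall_rating(process_scores: list[int]) -> int:
    # Weakest-link rule: a data program can achieve no greater overall
    # rating than the lowest rating achieved by any of its processes.
    return min(process_scores)

# Mystery Airline's data program coordination scores (assumed ordering)
print(overall_rating([1, 2, 2, 2, 2]))  # -> 1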

Results analysis

Perhaps the most important general fact represented in Figure 3 is that organizations gave themselves relatively low scores. The assessment results are based on self-reporting and, although our 15-percent validation sample is adequate to verify accurate industry-wide assessment results, 85 percent of the assessment is based on facts that were described but not observed. Although direct observables for all survey respondents would have provided valuable confirming evidence, the cost of such a survey and the required organizational access would have been prohibitive.

We held in-person, follow-up assessment validation sessions with about 15 percent of the assessed organizations. These sessions helped us validate the collection method and refine the technique. They also let us gauge the assessments' accuracy.

Although the assessors strove to accurately measure each subprocess's maturity level, some interviews inevitably were skewed toward the positive end of the scale. This occurred most often because interviewees reported on milestones that they wanted to or would soon achieve as opposed to what they had achieved. We suspected, and confirmed during the validation sessions, that responses were typically exaggerated by one point on the five-point scale. When we factor in the one-point inflation, the numbers in Table 4 become important.

Table 4. Assessment scores adjusted for self-reporting inflation.

Response    Adjusted average
1           1.72388
2           1.57463
3           1.0597
4           1.8806
5           2.31343
6           1.66418
7           1.33582
8           1.57463
9           1.1791
10a         1.40299
10b         1.14925
10c         0.97761
10d         1.20896
10e         1.23134
10f         1.12687
11          1.32836
12          0.57463
13          1.00746
14          1.46269
15          1.24627
16          1.65672
17          1.66418
18          1.04478
19          1.17164

Knowing that the bar is so low will hopefully inspire some organizations to invest in data management. Doing so might give them a strategic advantage if the competition is unlikely to be making a similar investment. The relatively low scores reinforce the need for this data management assessment. Based on the overall scores in the data management practice areas, the community receives five Ds. These areas provide immediate targets for future data management investment.
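As a rough illustration of the one-point adjustment, here is a minimal Python sketch; the uniform subtract-and-floor rule is our simplifying assumption for illustration, since the article reports only the adjusted averages in Table 4.

def deflate(self_reported: float, inflation: float = 1.0) -> float:
    # Correct a self-reported score for the roughly one-point exaggeration
    # confirmed during validation sessions; floor at 0 on the 0-5 scale.
    return max(self_reported - inflation, 0.0)

print(round(deflate(3.31), 2))  # a self-reported 3.31 deflates to 2.31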

WHERE ARE WE NOW?

We address our original research objectives according to our two goal categories.

Community descriptive research goals

First, we wanted to determine the range of practices within the data management community. A wide range of such practices exists. Some organizations are strong in some data management practices and weak in others (the range of practice is consistently inconsistent). The wide divergence of practices both within and between organizations can dilute results from otherwise strong data management programs. The assessment's applicability to longitudinal studies remains to be seen; this is an area for follow-up research. Although researchers might undertake formal studies of such trends in the future, evidence from ongoing assessments suggests that results are converging. Consequently, we feel that our sample constitutes a representation of community-wide data management practices.

Next, we wanted to know whether the distribution of practices informs us specifically about the various stages of organizational data management maturity. The assessment results confirm the framework's utility, as do the postassessment validation sessions. Building on the framework, we were able to specify target characteristics and objective measurements. We now have better information as to what comprises the various stages of organizational data management practice maturity. Organizations do clump together into the various maturity stages that Nolan originally described. We can now determine the investments required to predictably move organizations from one data management maturity level to another.

Finally, we wanted to determine in what areas the community data management practices are weak, average, and strong. Figure 4 shows an average of unadjusted rates summarizing the assessment results. As the figure shows, the data management community reports itself relatively and perhaps surprisingly strong in all five major data management processes when compared to the industry averages for software development. The range and averages indicate that the data management community has more mature data program coordination processes, followed by organizational data integration, support operations, stewardship, and then data development. The relatively lower data development scores might suggest data program coordination implementation difficulties.

Figure 4. Average of unadjusted rates for the assessment results, by process.

Process                        Initial    Repeatable    Defined
Data program coordination       2.06        2.71         3.31
Enterprise data integration     2.18        2.44         2.66
Data stewardship                1.98        2.18         2.40
Data development                1.57        2.12         2.46
Data support operations         2.04        2.38         2.66

Self-improvement research goals

Our first objective was to produce results that would help the community better understand current best practices. Organizations can use the assessment results to compare their specific performance against others in their industry and against the community results as a whole. Quantities and groupings indicate the relative state and robustness of the best practices within each process. Future research can use this information to identify specific practices that can be shared with the community. Further study of these areas will provide leverageable benefits.

Next, we wanted to determine how the assessment informs our standing as a technical community. Our research gives some indication of the claimed current state of data management practices. However, given the validation session results, we believe that it's best to caution readers that the numbers presented probably more accurately describe the intended state of the data management practice. As it turns out, the relative number of organizations above level 1 for both software and data management are approximately the same, but a more detailed analysis would be helpful. Given the belief that investment in software development practices will result in significant improvements, it's appropriate to anticipate similar benefits from investments in data management practices.

Finally, we hoped to gain information useful for developing a roadmap for improving current practice. Organizations can use the survey assessment information to develop roadmaps to improve their individual data management practices. Mystery Airline, for example, could develop a roadmap for achieving data management improvement by focusing on enterprise data integration, data stewardship, and data development practices.

SUGGESTIONS FOR FUTURE RESEARCH

Additional research must include a look at relationships between data management practice areas, which could indicate an efficient path to higher maturity levels. Research should also explore the success or failure of previous attempts to raise the maturity levels of organizational data management practices.

One of our goals was to determine why so many organizational data management practices are below expectations. Several current theses could spur investigation of the root causes of poor data management practices. For example,

• Are poor data management practices a result of the organization's lack of understanding?
• Does data management have a poor reputation or track record in the organization?
• Are the executive sponsors capable of understanding the subject?
• How have personnel and project changes affected the organization efforts?

Our assessment results suggest a need for a more formalized feedback loop that organizations can use to improve their data management practices. Organizations can use this data as a baseline from which to look for, describe, and measure improvements in the state of the community. Such information can enhance their understanding of the relative development of organizational data management. Other investigations should probe further to see if patterns exist for specific industry or business focus types.

Building an effective business case for achieving a certain level of data management is now easier. The failure to adequately address enterprise-level data needs has hobbled past efforts.4 Data management has, at best, a business-area focus rather than an enterprise outlook. Likewise, applications development focuses almost exclusively on line-of-business needs, with little attention to cross-business-line data integration or enterprise-wide planning, analysis, and decision needs (other than within personnel, finance, and facilities management). In addition, data management staff is inexperienced in modern data management needs, focusing on data management rather than metadata management and on syntaxes instead of semantics and data usage.

Few organizations manage data as an asset. Instead, most consider data management a maintenance cost. A small shift in perception (from viewing data as a cost to regarding it as an asset) can dramatically change how an organization manages data. Properly managed data is an organizational asset that can't be exhausted. Although data can be polluted, retired, destroyed, or become obsolete, it's the one organizational resource that can be repeatedly reused without deterioration, provided that the appropriate safeguards are in place. Further, all organizational activities depend on data.

To illustrate the potential payoff of the work presented here, consider what 300 software professionals applying software process improvement over an 18-year period achieved:16

• They predicted costs within 10 percent.
• They missed only one deadline in 15 years.
• The relative cost to fix a defect is 1X during inspection, 13X during system testing, and 92X during operation.
• Early error detection rose from 45 to 95 percent between 1982 and 1993.
• Product error rate (measured as defects per 1,000 lines of code) dropped from 2.0 to 0.01 between 1982 and 1993.

If improvements in data management can produce similar results, organizations should increase their maturity efforts. ■

Acknowledgments

We thank Graham Blevins, David Rafner, and Santa Susarapu for their assistance in preparing some of the reported data. We are greatly indebted to many of Peter Aiken's classes in data reengineering and related topics at Virginia Commonwealth University for the careful work and excellent results obtained as a result of their various contributions to this research. This article also benefited from the suggestions of several anonymous reviewers. We also acknowledge the helpful, continuing work of Brett Champlin at Allstate in collecting, applying, and assessing CMMI-related efforts.

References

1. M. Blaha, "A Retrospective on Industrial Database Reverse Engineering Projects—Parts 1 & 2," Proc. 8th Working Conf. Reverse Eng., IEEE Press, 2001, pp. 147-164.
2. J. Zachman, "A Framework for Information Systems Architecture," IBM Systems J., vol. 26, 1987, pp. 276-292.
3. P.H. Aiken, "Keynote Address to the 2002 DAMA International Conference: Trends in Metadata," Proc. 2002 DAMA Int'l/Metadata Conf., CD-ROM, Wilshire Conf., 2002, pp. 1-32.
4. B. Parker, "Enterprise Data Management Process Maturity," Handbook of Data Management, S. Purba, ed., Auerbach Publications, CRC Press, 1999, pp. 824-843.
5. M. Gillenson, "The State of Practice of Data Administration—1981," Comm. ACM, vol. 25, no. 10, 1982, pp. 699-706.
6. M. Gillenson, "Trends in Data Administration," MIS Quarterly, Dec. 1985, pp. 317-325.
7. M. Gillenson, "Database Administration at the Crossroads: The Era of End-User-Oriented, Decentralized Data Processing," J. Database Administration, Fall 1991, pp. 1-11.
8. R.L. Keeney, Value-Focused Thinking: A Path to Creative Decisionmaking, Harvard Univ. Press, 1992.
9. R. Nolan, "Managing the Crisis in Data Processing," Harvard Business Rev., Mar./Apr. 1979, pp. 115-126.
10. Carnegie Mellon Univ. Software Eng. Inst., Capability Maturity Model: Guidelines for Improving the Software Process, 1st ed., Addison-Wesley Professional, 1995.
11. M.C. Paulk and B. Curtis, "Capability Maturity Model, Version 1.1," IEEE Software, vol. 10, 1993, pp. 18-28.
12. D.R. Goldenson and D.L. Gibson, "Demonstrating the Impact and Benefits of CMM: An Update and Preliminary Results," special report CMU/SEI-2003-SR-009, Carnegie Mellon Univ. Software Eng. Inst., 2003, pp. 1-55.
13. C. Billings and J. Clifton, "Journey to a Mature Software Process," IBM Systems J., vol. 33, 1994, pp. 46-62.
14. C.C. Hauer, "Data Management and the CMM/CMMI: Translating Capability Maturity Models to Organizational Functions," presented at National Defense Industrial Assoc. Technical Information Division Symp., 2003; www.dtic.mil/ndia/2003technical/hauer1.ppt.
15. W. Schnider and K. Schwinn, "Der Reifegrad des Datenmanagements" [The Data Management Maturity Model], KPP Consulting, 2004 (in German); www.kpp-consulting.ch/downloadbereich/DM%20Maturity%20Model.pdf.
16. H. Krasner, J. Pyles, and H. Wohlwend, "A Case History of the Space Shuttle Onboard Systems Project," Technology Transfer 94092551A-TR, Sematech, 31 Oct. 1994.

Peter Aiken is an associate professor of information systems at Virginia Commonwealth University and founding director of Data Blueprint. His research interests include data and systems reengineering. Aiken received a PhD in information technology from George Mason University. He is a senior member of the IEEE, the ACM, and the Data Management Association (DAMA) International. Contact him at [email protected].

M. David Allen is chief operating officer of Data Blueprint. His research interests include data and systems reengineering. Allen received an MS in information systems from Virginia Commonwealth University. He is a member of DAMA. Contact him at [email protected].

Burt Parker is an independent consultant based in Washington, D.C. His technical interests include enterprise data management program development. Parker received an MBA in operations research/systems analysis (general systems theory) from the University of Michigan. He is a member of DAMA. Contact him at [email protected].

Angela Mattia is a professor of information systems at J. Sergeant Reynolds Community College. Her research interests include data and systems reengineering and maturity models. Mattia received an MS in information systems from Virginia Commonwealth University. She is a member of DAMA. Contact her at [email protected].

Improving Data Management Practices


Please Help with A Research Project!

Data Management Practices Assessment [email protected]


Peter Aiken

• Full time in information technology since 1981
• IT engineering research and project background
• University teaching experience since 1979
• Seven books and dozens of articles
• Research Areas
  – reengineering, data reverse engineering, software requirements engineering, information engineering, human-computer interaction, systems integration/systems engineering, strategic planning, and DSS/BI
• Director
  – George Mason University/Hypermedia Laboratory (1989-1993)
• Published Papers
  – Communications of the ACM, IBM Systems Journal, InformationWEEK, Information & Management, Information Resources Management Journal, Hypermedia, Information Systems Management, Journal of Computer Information Systems, and IEEE Computer & Software
• DoD Computer Scientist
  – Reverse Engineering Program Manager/Office of the Chief Information Officer (1992-1997)
• Visiting Scientist
  – Software Engineering Institute/Carnegie Mellon University (2001-2002)
• DAMA International Advisor/Board Member (http://dama.org)
  – 2001 DAMA International Individual Achievement Award (with Dr. E. F. "Ted" Codd)
  – 2005 DAMA Community Award
• Founding Advisor/International Association for Information and Data Quality (http://iaidq.org)
• Founding Advisor/Meta-data Professionals Organization (http://metadataprofessional.org)
• Founding Director, Data Blueprint (1999)

http://peteraiken.net

Contact Information:

Peter Aiken, Ph.D.
Department of Information Systems, School of Business
Virginia Commonwealth University
1015 Floyd Avenue - Room 4170
Richmond, Virginia 23284-4000

Data Blueprint
Maggie L. Walker Business & Technology Center
501 East Franklin Street
Richmond, VA 23219
804.521.4056
http://datablueprint.com

office: +1.804.883.759
cell: +1.804.382.5957
e-mail: [email protected]
http://peteraiken.net

Organizations Surveyed

• Public Companies: 58%
• State Government Agencies: 17%
• Federal Government: 11%
• International Organizations: 10%
• Local Government: 4%

• Results from more than 400 organizations
• 32% government
• Appropriate public company representation
• Enough data to demonstrate European organization DM practices are generally more mature

% of DM organizations labeled "successful"

[Bar chart: share of DM organizations rated Successful, Partial Success, Don't know/too soon to tell, Unsuccessful, or Does not exist, comparing 1981 and 2007]

• In 25 years: 1981 to 2007

Largely Ineffective DM Investments

• Approximately 10 percent of organizations achieve parity (and potential positive returns) on their DM investments.
• Only 30% of DM investments achieve tangible returns at all.
• Seventy percent of organizations have very small or no tangible return on their DM investments.


September 21, 2004


Hmm … Confusion

• Correct Name: Yusuf Islam
• TSA No Fly Listing: Youssouf Islam


• 15,000 people have appealed to be removed from the US terror watch list
• 2,000 per month request removal
• TSA promised a 30-day review process
• Actual time is 44 days
• American Civil Liberties Union estimates 1 million people on US government watch lists


US Terror Watch List Facts

• Fall 2008 comments:
  – Fewer than 2,500 people on US "no-fly" list
  – 10% of those are US citizens
  – 16,000 people on "selectee" list (additional screening)
• Transfer responsibility of comparing names on lists from dozens of airlines to TSA

IT Project Failure Rates

Recent IT project failure rate statistics can be summarized as follows:

– Carr (1994)
  • 16% of IT projects completed on time, within budget, with full functionality
– OASIG Study (1995)
  • 7 out of 10 IT projects "fail" in some respect
– The Chaos Report (1995)
  • 75% blew their schedules by 30% or more
  • 31% of projects will be canceled before they ever get completed
  • 53% of projects will cost over 189% of their original estimates
  • 16% of projects are completed on-time and on-budget
– KPMG Canada Survey (1997)
  • 61% of IT projects were deemed to have failed
– Conference Board Survey (2001)
  • Only 1 in 3 large IT project customers were very "satisfied"
– Robbins-Gioia Survey (2001)
  • 51% of respondents viewed their large IT implementation project as unsuccessful
– McDonald's Innovate (2002)
  • Automate fast food network from fry temperature to number of burgers sold - $180M USD write-off
– Ford Everest (2004)
  • Replacing internal purchasing systems - $200 million over budget
– FBI (2005)
  • Blew $170M USD on suspected terrorist database - "start over from scratch"

Sources: http://www.it-cortex.com/stat_failure_rate.htm (accessed 9/14/02); New York Times, 1/22/05, p. A31

DM Involvement

[Bar chart: participation percentage (0-50%) as Initiative Leader, Initiative Involvement, or Not Involved for: Data Warehousing, XML, Data Quality, Customer Relationship Management, Master Data Management, Customer Data Integration, Enterprise Resource Planning, Enterprise Application Integration]

Misunderstanding Data Management


Link business objectives to technical capabilities


Data Management "Understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities" Aiken, P, Allen, M. D., Parker, B., Mattia, A., "Measuring Data Management's Maturity: A Community's Self-Assessment" IEEE Computer (research feature April 2007)


A Model Specifying Relationships Among Important Terms

(Note from the diagram: wisdom & knowledge are often used synonymously.)

1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its USES.

[Built on definition by Dan Appleton 1983]
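To make the five numbered relationships concrete, here is a minimal Python sketch; the class, function, and sample values are our illustrative assumptions, not part of the original model.

from dataclasses import dataclass

@dataclass(frozen=True)
class Datum:
    fact: str      # (1)-(2): a datum is one fact combined with one meaning
    meaning: str

def information(request: str, data: list[Datum]) -> list[Datum]:
    # (3): information is the data returned in response to a specific request
    return [d for d in data if request in d.meaning]

# (4): information reuse -- one fact combined with two meanings is two data
store = [Datum("2008-11-04", "order ship date"),
         Datum("2008-11-04", "invoice date")]
print(information("invoice", store))  # -> [Datum(fact='2008-11-04', meaning='invoice date')]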

Expanding Scope

Years: 1950-1970, 1970-1990, 1990-2000, 2000-

• Database design
• Database operation
• Data requirements analysis
• Data modeling
• Enterprise data management coordination
• Enterprise data integration
• Data stewardship
• Data use
• Data Quality, Data Security
• Data Compliance, Mashups (more)

DM Practice Evolution

[Line chart: inferred and representative percentages of organizations 'practicing' DM by year, Jan 1, 1978 through Jan 1, 2007]

Organizational DM Functions and their Inter-relationships

[Diagram, mirroring Figure 1 of the article: Organizational Strategies provide guidance to Data Program Coordination (producing goals) and Organizational Data Integration (producing integrated models); Data Stewardship produces standard data, and Data Development produces application models & designs; direction flows to Data Support Operations and Data Asset Use, which handle business data and deliver business value; feedback flows back through all functions]

Do you know the game Twister?

Canada, Chile, Colombia, Egypt, Estonia, Finland, France, Germany, Great Britain, Ireland, Italy, Japan, Qatar, Scotland, Switzerland, Thailand, Turkey, UAE, US

Typical System Evolution

• Finance Application (3rd GL, batch system, no source)
• Finance Data (indexed)
• Payroll Application (3rd GL)
• Payroll Data (database)
• Marketing Application (4th GL, query facilities, no reporting, very large)
• Marketing Data (external database)
• Personnel App. (20 years old, un-normalized data)
• Personnel Data (database)
• Mfg. Applications (contractor supported)
• Mfg. Data (home grown database)
• R&D Applications (researcher supported, no documentation)
• R&D Data (raw)

Niccolò Machiavelli (1469-1527)

"He who doesn't lay his foundations beforehand may by great abilities do so afterward, although with great trouble to the architect and danger to the building."

Machiavelli, Niccolò. The Prince. 19 Mar. 2004. http://pd.sparknotes.com/philosophy/prince

Information Architectures

• … are plans, guiding the transformation of strategic organizational information needs into specific information systems development projects (Source: Internet)
• "Information architecture is a foundation discipline describing the theory, principles, guidelines, standards, conventions, and factors for managing information as a resource. It produces drawings, charts, plans, documents, designs, blueprints, and templates, helping everyone make efficient, effective, productive and innovative use of all types of information." (Source: Information First by Roger & Elaine Evernden, 2003, ISBN 0 7506 5858 4, p. 1)
• Information architecture (IA) is the art of expressing a model or concept of information used in activities that require explicit details of complex systems. (wikipedia.org)
• All organizations have information architectures
• Some are better understood and documented (and therefore more useful to the organization) than others.

Building from the Top


Sample Conversation (Developing Constraints)

• I'd like to build a building.
• What kind of building - do you want to sleep in it? Eat in it? Work in it?
• I'd like to sleep in it.
• Oh, you want to build a house?
• Yes, I'd like a house.
• How large a house do you have in mind?
• Well, my lot size is 100 feet by 300 feet.
• Then you want a house about 50 feet by 100 feet.
• Yes, that's about right.
• How many bedrooms do you need?
• Well, I have two children, so I'd like three bedrooms ...

GAO Has Identified the Problem


Concrete Block & Engineering Continuity


Look Familiar?


Finance Example

• Business Rule: A customer may have one and only one account
• Bank Manager: The customer is always right ... and this one needs multiple accounts!

#    Account ID    Sorted IDs
1    peter         peter
2    peter1        peter1
3    peter2        peter10
4    peter3        peter2
5    peter4        peter3
6    peter5        peter4
7    peter6        peter5
8    peter7        peter6
9    peter8        peter7
10   peter9        peter8
11   peter10       peter9
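The "Sorted IDs" column shows the side effect of working around the rule: account IDs minted as peter, peter1, peter2, and so on sort lexicographically, so peter10 lands between peter1 and peter2. A minimal Python sketch of the effect, with a natural-sort workaround (the workaround is our addition, not part of the slide):

import re

ids = ["peter"] + [f"peter{i}" for i in range(1, 11)]

print(sorted(ids))
# ['peter', 'peter1', 'peter10', 'peter2', ...] -- lexicographic surprise

def natural_key(account_id: str):
    # Treat any trailing digits as a number so 'peter10' follows 'peter9'.
    name, digits = re.match(r"(\D*)(\d*)", account_id).groups()
    return (name, int(digits) if digits else 0)

print(sorted(ids, key=natural_key))
# ['peter', 'peter1', 'peter2', ..., 'peter9', 'peter10']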

Architecture Jargon

Avoiding Unnecessary Work Using Business Rule Metadata

Entities: Person, Employee, Job Class, Position (with Job Sharing)

• BR1) Zero, one, or more EMPLOYEES can be associated with one PERSON.
• BR2) Zero, one, or more EMPLOYEES can be associated with one JOB CLASS.
• BR3) Zero, one, or more EMPLOYEES can be associated with one POSITION.
• BR4) One or more POSITIONS can be associated with one JOB CLASS.
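One way to read this slide: when cardinality rules such as BR1-BR4 are captured as machine-readable metadata, tooling can check a design or a dataset against them instead of rediscovering the rules on every project. A minimal Python sketch under that assumption (the representation is ours; the entity names and cardinalities come from the slide):

# Cardinality rules as metadata: (child, parent) -> (min, max) per parent
RULES = {
    ("Employee", "Person"):    (0, None),  # BR1: zero, one, or more
    ("Employee", "Job Class"): (0, None),  # BR2
    ("Employee", "Position"):  (0, None),  # BR3
    ("Position", "Job Class"): (1, None),  # BR4: one or more
}

def violates(child: str, parent: str, count: int) -> bool:
    # Check an observed child-per-parent count against the recorded rule.
    lo, hi = RULES.get((child, parent), (0, None))
    return count < lo or (hi is not None and count > hi)

print(violates("Position", "Job Class", 0))  # -> True: BR4 requires at least one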

Student System Data Model

Proposed Data Model

IBM's AD/Cycle Information Model

• Enterprise Structure Model – Defines the scope of the enterprise to be modeled. Assigns a name to the model that serves to qualify each component of the model.
• Business Goals Model – Defines the mission of the enterprise, its long-range goals, and the business policies and assumptions that affect its operations.
• Business Rules Model – Records rules that govern the operation of the business and the Business Events that trigger execution of Business Processes.
• Strategy Model – Records business strategies to resolve problems, address goals, and take advantage of business opportunities. It also records the actions and steps to be taken.
• Resource/Problem Model – Identifies the problems and needs of the enterprise, the projects designed to address those needs, and the resources required.
• Organization/Location Model – Records the organization structure and location definitions for use in describing the enterprise.
• Process Model – Defines Business Processes, their subprocesses and components.
• Entity-Relationship Model – Defines the Business Entities, their properties (attributes) and the relationships they have with other Business Entities.
• Flow Model – Specifies which of the Entity-Relationship Model component instances are passed between Process Model components.
• Info Usage Model – Specifies which of the Entity-Relationship Model component instances are used by other Information Model components.
• Value Domain Model – Defines the data characteristics and allowed values for information items.
• Derivations/Constraints Model – Records the rules for deriving legal values for instances of Entity-Relationship Model components, and for controlling the use or existence of E-R instances.
• Extension Support Model – Provides for tactical Information Model extensions to support special tool needs.
• Global Text Model – Supports recording of extended descriptive text for many of the Information Model components.
• Application Structure Model – Defines the overall scope of an automated Business Application, the components of the application and how they fit together.
• Application Build Model – Defines the tools, parameters and environment required to build an automated Business Application.
• Program Elements Model – Identifies the various pieces and elements of application program source that serve as input to the application build process.
• Data Structures Model – Defines the data structures and their elements used in an automated Business Application.
• Relational Database Model – Describes the components of a Relational Database design in terms common to all SAA relational DBMSs.
• DB2 Model – Refines the definition of a Relational Database design to a DB2-specific design.
• IMS Structures Model – Defines the component structures and elements and the application program views of an IMS Database.
• Panel/Screen Model – Identifies the Panels and Screens and the fields they contain as elements used in an automated Business Application.
• Library Model – Records the existence of non-repository files and the role they play in defining and building an automated Business Application.
• Test Model – Identifies the various files (test procedures, test cases, etc.) affiliated with an automated Business Application for use in testing that application.

37 - datablueprint.com

© Copyright 01/1/08 and previous years by Data Blueprint - all rights reserved!

38 - datablueprint.com

© Copyright 01/1/08 and previous years by Data Blueprint - all rights reserved!

Archeology-based Transformations Solve a Puzzle
• Primary sources of guidance:
  – Edge pieces are easy to identify
  – Distinct physical piece features exist, such as colors, patterns, and pictures
• Steps for solving:
  – Physically segregate all identified edge pieces (an equivalent is not always present in an existing environment)
  – Create the puzzle framework by connecting edge pieces, using the puzzle picture as a guide
  – Within the frame, physically group the remaining pieces by distinct physical features
  – Solve a smaller section of the puzzle focused on similar physical features, such as a ball or a puppy in the picture. This is effective because the focus is on a common domain (one distinct aspect of the entire picture) and the analysis covers a proportionately smaller number of pieces than attempting to solve the overall puzzle at once
  – As sections are assembled, combine them to solve the complete puzzle

How was this bridge constructed?

Flood

New River Bridge

Bridge Engineering

Oct 2004 IRS Accomplishment
• Unified five definitions of "child"
• Reduced 5 definitions to 1 for tax return preparations such as:
  – Dependent
  – Earned income tax credit
  – Child credit
• Each definition existed for a different reason; either it
  – "Was developed to carry out social policy objective(s)", or
  – "Someone perceived it was going to save revenue"
• "It is easier for (customers) to understand and it is easier for IRS to audit, and there are lots of things like that we can do"
• The initiative started in 1991; it took 13 years, including 2.5 years moving as legislation!
Source: Pamela F. Olson, former Assistant Secretary for Tax Policy (quote from the Diane Rehm Show, 11/29/04, http://www.wamu.org/programs/dr/04/11/29.php)

Data Integration/Exchange Challenges
• Customer has typically meant different things to different parts of the organization:
  – Accounting -> organization that buys products or services
  – Service -> client
  – Sales -> prospect

• Assigning the same mission, "Secure the building," to the DoD 'lines of business' elicits very different results from each 'line of business':
  – Army: Posts guards at all entrances and ensures no unauthorized access
  – Navy: Turns out all the lights, locks up, and leaves
  – Marines: Sends in a company to clear the building room-by-room; forms a perimeter defense around the building
  – Air Force: Signs a three-year lease with an option to buy
[Second example courtesy of Burt Parker]

FBI & Canadian Social Security Gender Codes
1. Male
2. Female
3. Formerly male now female
4. Formerly female now male
5. Uncertain
6. Won't tell
7. Doesn't know
8. Male soon to be female
9. Female soon to be male

A typical integration rule collapses the nine codes to two:
• If column 1 in source = "m", then set value of target data to "male"
• else set value of target data to "female"

Hypothesized extensions contributed by a Chicago DAMA member:
10. Both soon to be female
11. Both soon to be male
12. Psychologically female, biologically male
13. Psychologically male, biologically female
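
The rule above is easy to code and quietly destructive: nine source meanings collapse into two target values, and eight of them become indistinguishable. A minimal sketch of the loss, assuming the numeric codes stand in for the source column; names are illustrative:

```python
SOURCE_CODES = {
    1: "Male", 2: "Female", 3: "Formerly male now female",
    4: "Formerly female now male", 5: "Uncertain", 6: "Won't tell",
    7: "Doesn't know", 8: "Male soon to be female",
    9: "Female soon to be male",
}

def naive_target(code):
    # The slide's rule: only the "m" (male) code maps to "male".
    return "male" if code == 1 else "female"

mapped = {c: naive_target(c) for c in SOURCE_CODES}
collapsed = [SOURCE_CODES[c] for c, v in mapped.items() if v == "female"]
print(f"{len(collapsed)} of 9 source meanings land in one target value:")
print(collapsed)
```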

Predicting Engineering Problem Characteristics

Characteristic      Legacy #1: Payroll      Legacy #2: Personnel   New System
Platform            Amdahl                  UniSys                 WinTel
OS                  MVS                     OS                     Win'95
Age (as of 1998)    15 years                21 years               new
Data structure      VSAM/virtual database   DMS (network)          Client/Server RDBMS
Physical records    780,000                 4,950,000              600,000
Logical records     60,000                  250,000                250,000
Relationships       64                      62                     1,020 physical / 1,034 logical
Entities            4/350                   57                     2,706 physical / 1,600 logical
Attributes          683                     1,478                  7,073 physical / 15,000 logical

Example project plan (Microsoft Project Gantt chart, dated Thu 9/28/00):

ID  Task                                               Duration  Cost           Work      Resources
1   1000 ORGANIZATION                                  18.01d    $128,335.99    82.44d
2   1100 Organize Project                              18d       $42,585.33     27.36d    Technology Consultant[0.55], Experi…
3   1200 Complete Work Program                         18d       $71,739.42     46.08d    Technology Consultant[0.92], Consu…
4   Detailed Work Plan and Finalized Deliverable List  0d        $0.00          0d        (milestone)
5   1300 Develop Quality Plan                          18.01d    $14,011.24     9d        Technology Consultant[0.18], Consu…
6   2000 ESTABLISH DEVELOPMENT ENVIRONMENT             54d       $235,364.34    228.07d
7   2100 Setup Application Software                    18d       $51,310.67     49.86d    Manager[0.44], Technology Consulta…
8   2200 Site Preparation                              54d       $184,053.67    178.2d    Experienced Analyst[0.56], Technolo…
9   Comprehensive Backup Plan                          0d        $0.00          0d        (milestone)
10  3000 PLAN CHANGE MANAGEMENT                        72.01d    $347,901.67    249.13d
11  3100 Develop Change Management Plan                18.01d    $39,821.00     21.97d
12  Change Management Plan                             0d        $0.00          0d        (milestone)
13  3200 Implement Change Management Plan              36d       $123,597.00    91.08d
14  3300 Develop Impact Analysis Plan                  18.01d    $17,485.42     12.96d
15  Impact Analysis Plan                               0d        $0.00          0d        (milestone)
16  3400 Implement Impact Analysis Plan                18d       $166,998.25    123.12d
17  4000 PERFORM CONFIGURATION TEST                    72d       $93,585.25     76.14d
18  4100 Prepare for Functional Configuration Testing  54d       $53,091.67     36.18d
19  4200 Perform Functional Configuration Testing      18d       $40,493.58     39.96d
20  5000 PRELIMINARY SYSTEM & PROCESS DESIGN           108d      $1,248,758.99  1079.82d
21  5100 Analyze Business Processes                    54d       $621,386.25    511.92d
22  5200 Software Fit Analysis                         54d       $568,447.16    505.44d

"Extreme" Data Engineering • 2 person months = 40 person days • 2,000 attributes mapped onto 15,000 • 2,000/40 person days = 50 attributes per person day or 50 attributes/8 hour = 6.25 attributes/hour and • 15,000/40 person days = 375 attributes per person day or 375 attributes/8 hours = 46.875 attributes/hour • Locate, identify, understand, map, transform, document, QA at a rate of • 52 attributes every 60 minutes or .86 attributes/minute! 50 - datablueprint.com

© Copyright 01/1/08 and previous years by Data Blueprint - all rights reserved!
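
The throughput claim is easy to verify; a minimal sketch of the arithmetic (note the combined rate implied by the slide's own numbers is about 53 attributes/hour):

```python
PERSON_DAYS = 2 * 20        # 2 person-months = 40 person-days
HOURS = PERSON_DAYS * 8     # 320 person-hours

source_attrs, target_attrs = 2_000, 15_000

print(source_attrs / HOURS)      # 6.25 source attributes/hour
print(target_attrs / HOURS)      # 46.875 target attributes/hour
combined = (source_attrs + target_attrs) / HOURS
print(combined, combined / 60)   # ~53.1/hour, ~0.89/minute
```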


Why Data Projects Fail, by Joseph R. Hudicka
• Assessed 1,200 migration projects!
  – Surveyed only experienced migration specialists who had done at least four migration projects
• The median project costs over 10 times the amount planned!
• Biggest challenges: bad data, missing data, duplicate data
• The survey did not consider projects that were cancelled largely due to data migration difficulties
• "… problems are encountered rather than discovered"
[Chart: Median Project Expense, $0 to $500,000]
Joseph R. Hudicka, "Why ETL and Data Migration Projects Fail," Oracle Developers Technical Users Group Journal, June 2005, pp. 29-31

Organizational DM Functions and their Inter-relationships
[Diagram: Data Program Coordination provides implementation guidance and goals in line with organizational strategies; Organizational Data Integration (the major focus of study and research) produces integrated models; Data Stewardship provides standard data and direction; Data Development produces application models & designs; Data Support Operations delivers business data, with feedback, data asset use, and business value as the outcomes.]

New Technical Expertise Required
• The focus has been on new systems development, with guidance and technical expertise directed at developing new data applications and components
• The new domain focus is on maintenance of existing environments
• That requires understanding what the existing systems were originally designed to accomplish (the requirements) and how (the design) those systems accomplish it

Why?


Metadata Engineering
• O1-O3: reconstitute original metadata
• O4-O5: improve the current metadata
• O6-O9: improve system data capabilities based on the improved metadata
[Diagram: reverse engineering moves from existing As-Is data implementation assets, As-Is data design assets, and As-Is information requirements to metadata (O1 Recreate Data Implementation, O2 Recreate Data Design, O3 Recreate Requirements; O4 Reconstitute Data Design, O5 Reconstitute Requirements); forward engineering then produces new To-Be requirements, design, and data implementation assets (O6 Redesign Data, O7 Redevelop Requirements, O8 Redesign Data, O9 Reimplement Data).]

Common Metadata Model Implementation (CM2)
[Diagram: architectural components extracted from selected system components (A through F) populate a common metadata model implementation (CM2) and are repurposed for use on other integration efforts, supporting organizational performance optimization, business engineering, data & architecture evolution, and software reverse engineering.]

Structured Data Engineering: Component Structure Analysis
• Phase I: Archeology-based transformations designed to understand the existing environment. The structure of an unknown collection of components (1 through 25) is initially unknown; through Pareto analysis the component structure is discovered.
• Phase II: Developing the desired architecture via transformations T1 through T8: planning, Pareto filtering, technology modeling, and integration combing of a hypothesized repeatability/reusability subset (T1 to T4); potential capabilities analyses (T5); gap analyses (T6); component engineering (T7); and CM2-based solutions implementation engineering (T8).

Major Subject Areas (by industry)
• Consumer Goods: Customers, Suppliers, Products, Stores, Locations, Inventory
• Federal Government: Citizens, Taxpayers, Terrorists, Visas
• Hospitality: Guests, Services, Stays, Facilities, Loyalty Programs
• Health Insurance: Policies, Policyholders, Groups, Claims, Claimants, Providers
• Telephony: Customers, Products, Services, Equipment, Inventory
• Distribution: Customers, Assets, Supply Chain, Warehousing, Inventory
• State/Local Government: Citizens, Taxpayers, Service Recipients, Properties
• Insurance: Policies, Policyholders, Incidents, Claims, Beneficiaries
• Manufacturing/Distribution: Customers, Suppliers, Distributors, Products, Parts, Inventory
• Transportation: Customers, Suppliers, Vehicle Inventory, Transportation Routes
• Finance/Banking: Customers, Accounts, Products, Branches, Facilities
• Universities: Students, Instructors, Courses, Enrollments, Classrooms, Exams & Testing
• Healthcare: Patients, Suppliers, Treatments, Hospitals, Doctors, Nurses, Medications
• Retailers: Customers, Loyalty Programs, Suppliers, Products, Orders, Inventory
• Utilities: Customers, Suppliers, Services, Installations, Utilization
Adapted from Data Strategy by Sid Adelman, Larissa Moss, and Majid Abai (2005), Addison-Wesley Professional, ISBN: 0321240995

How many interfaces are required to solve this integration problem?
• Six applications connected point-to-point require 15 interfaces: (N*(N-1))/2
• RBC: 200 applications - 4,900 batch interfaces
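
A worked check of the formula; RBC's observed 4,900 batch interfaces sit well below the point-to-point worst case for 200 applications, but far above the one-per-application count a hub would need (a sketch, with the hub count as the standard assumption for hub-and-spoke topologies):

```python
def point_to_point(n):
    # Every application pair needs its own interface.
    return n * (n - 1) // 2

def hub_and_spoke(n):
    # Each application needs only one interface, to the hub.
    return n

for n in (6, 200):
    print(n, point_to_point(n), hub_and_spoke(n))
# 6 applications: 15 point-to-point interfaces vs. 6 hub connections
# 200 applications: 19,900 possible point-to-point interfaces vs. 200
```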

XML-based Integration Solution
[Diagram: Applications 1 through 6 connect through a central Integration Processor rather than point-to-point, so each application needs only a single interface to the hub.]

Typical System Evolution
• Finance Application (3rd GL, batch system, no source) with Finance Data (indexed)
• Payroll Application (3rd GL) with Payroll Data (database)
• Marketing Application (4th GL, query facilities, no reporting, very large) with Marketing Data (external database)
• Personnel App. (20 years old, un-normalized data) with Personnel Data (database)
• Mfg. Applications (contractor supported) with Mfg. Data (home-grown database)
• R&D Applications (researcher supported, no documentation) with R&D Data (raw)

Becomes this …
[Diagram: the same applications and data stores, each now connected through an XML Processor.]

3-Way Scalability
XML-based Integration Solution [Diagram: Applications 1 through 6 around an XML Processor hub]
Expand the:
1. Number of data items from each system – How many individual data items are tagged?
2. Number of interconnections between the systems and the hub – How many systems are connected to the hub?
3. Amount of interconnectability among hub-connected systems – How many inter-system data item transformations exist in the rule collection?

XML-Based Meta Data Management
[Diagram: existing Systems 1 through 6 exchange data directly.]
[Diagram: the same systems augmented with new system-to-system program transformation knowledge: XSLT transformations between systems feed a transformations data store, from which programs are generated.]
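
The pattern above hinges on XSLT as executable transformation metadata. A minimal sketch, assuming the third-party lxml package is available; the element names are invented for illustration, not taken from the slide:

```python
from lxml import etree

# An XSLT stylesheet that re-tags System 1's record in System 2's vocabulary.
XSLT = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/system1-employee">
    <system2-worker>
      <fullName><xsl:value-of select="name"/></fullName>
      <dept><xsl:value-of select="org-code"/></dept>
    </system2-worker>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(XSLT)
record = etree.XML("<system1-employee><name>J. Smith</name>"
                   "<org-code>D-14</org-code></system1-employee>")
print(str(transform(record)))  # System 2's representation of the record
```

Because the mapping lives in data rather than in program code, the same transformation knowledge can be stored, versioned, and used to generate programs, as the slide's transformations data store suggests.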

XML-based Portals: Portal Motivation
• Portals do for the web what Windows did for DOS
[Adapted from Terry Lanham, "Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies: Lockheed Martin's Compelling Case Study," Web Content II: Leveraging Best-of-Breed Content Strategies, San Francisco, CA, 23 January 2001]

Portal Solution
[Adapted from Terry Lanham, as above]

Top Tier Demo

Cruiser Collector

Capability Maturity Model Levels
• Optimizing (5): We have a process for improving our DM capabilities
• Managed (4): We manage our DM processes so that the whole organization can follow our standard DM guidance
• Defined (3): We have experience that we have standardized so that all in the organization can follow it
• Repeatable (2): We have DM experience and the ability to implement disciplined processes
• Initial (1): Our DM practices are ad hoc and dependent upon "heroes"
(The levels are drawn as a staircase built to code: 1996 Council of American Building Officials (CABO) and 2000 International Code Council recommendations call for stair unit runs of not less than 10 inches and unit rises of not more than 7¾ inches.)
This is one concept for process improvement; others include Norton Stage Theory, TQM, TQdM, TDQM, and ISO 9000, all of which focus on understanding current processes and determining where improvements can be made.

Key Finding: Process Frameworks are not Created Equal
With the exception of CMM and ITIL, use of process-efficiency frameworks does not predict higher on-budget project delivery…
[Chart: Percentage of Projects on Budget by Process Framework Adoption]
…while the same pattern generally holds true for on-time performance
[Chart: Percentage of Projects on Time by Process Framework Adoption]
Source: Applications Executive Council, Applications Budget, Spend, and Performance Benchmarks: 2005 Member Survey Results, Washington, D.C.: Corporate Executive Board, 2006, p. 23.

Organizational DM Functions and their Inter-relationships
[Diagram repeated: Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, and Data Support Operations, linked by organizational strategies, implementation guidance, goals, integrated models, standard data, application models & designs, direction, business data, feedback, data asset use, and business value.]

Organizational DM Functions and their Inter-relationships
• Data Program Coordination: Defining, coordinating, resourcing, implementing, and monitoring organizational data program strategies, policies, plans, etc., as a coherent set of activities.
• Organizational Data Integration: Identifying, modeling, coordinating, organizing, distributing, and architecting data shared across business areas or organizational boundaries.
• Data Stewardship: Ensuring that specific individuals are assigned responsibility for the maintenance of specific data as organizational assets, and that those individuals are provided the requisite knowledge, skills, and abilities to accomplish these goals in conjunction with other data stewards in the organization.
• Data Development: Specifying and designing appropriately architected data assets that are engineered to be capable of supporting organizational needs.
• Data Support Operations: Initiation, operation, tuning, maintenance, backup/recovery, archiving, and disposal of data assets in support of organizational activities.

Organizational DM Functions and their Inter-relationships
• Data Program Coordination: data management processes and infrastructure
• Organizational Data Integration: organizational entity/subject-area data integration, combining multiple assets to produce extra value
• Data Stewardship: achieve sharing of data within a business area
• Data Support Operations: provide reliable access to data
• Data Asset Use: leverage data in organizational activities, producing business value

How is it done?
• Follows the form of a semi-structured interview
• Approximately one hour is required to complete each interview
• Examines organizational data management practices in five areas
• A branched series of questions explores capabilities, execution, and ongoing efforts
• Total time to results typically ranges from 1 week to 1 month

Council Hill Road Sign

Photo from William J. Manon Jr., pbase.com/g3/91/555491/2/66430431.telWKGJG.jpg

Assessment Benefits

• Quantitative Benefits
  – Objective determination of baseline BI/Analytic capabilities
  – Gap analysis indicates specific actions required to achieve the "next" level
  – Available comparisons with similar organizations
  – Provides facts useful when prioritizing subsequent investments

• Qualitative Benefits
  – Highlights strengths, weaknesses, capabilities, and limitations of existing BI/Analytic capabilities

Data Management Practices Assessment (DMPA)
• Collaboration with CMU's Software Engineering Institute (SEI)
• Results from more than 400 organizations:
  – Public Companies
  – State Government Agencies
  – Federal Government
  – International Organizations
• Maturity levels: Initial (I), Repeatable (II), Defined (III) - the defined industry standard, Managed (IV), Optimizing (V)
• Practice areas: Data Program Coordination and Organizational Data Integration (focus: guidance and facilitation); Data Stewardship, Data Development, and Data Support Operations (focus: implementation and access)

Sample Perception vs. Fact Chart
[Bar chart, 0 to 5 scale, contrasting verified scores with capability averages across Development Guidance, Data Administration, Support Systems, Asset Recovery Capability, and Development Training; plotted values range from 1.0 to 3.0.]

Comparative Assessment Results
[Chart, 0 to 5 scale, comparing the client against industry competition and all respondents across the five practice areas: Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, and Data Support Operations. Data Program Coordination, Organizational Data Integration, and Data Support Operations are flagged as challenges.]

High Marks for IFC's Program: Data Mgmt Audit 2006
"These IFC scores represent the highest aggregate scores in the area of data stewardship recorded in our database of hundreds of assessments that has been recognized as a representative scientific sample."
[Chart, 0 to 5 scale, comparing Overall Benchmarks, Industry Benchmarks, TRE, IFC, and ISG across Leadership & Guidance, Asset Creation, Metadata Management, Quality Assurance, Change Management, and Data Quality.]

The challenge ahead
The chart represents the average scores presented on the previous slide; it is interesting that none have apparently reached level 3.
[Chart, 0.00 to 5.00 scale, showing average scores for 28 assessed organizations.]

After more than a decade …

Question: How many software practices (surveyed) are above level 1 on the CMM?
Answer: By far most organizations (95%) surveyed are producing software using informal processes.

Question: How many organizations have demonstrated at least some proficiency according to the DM3 (i.e., scored above level 1)?
Answer: One in ten organizations has scored above level 1.

Service Orient or Be Doomed!

• Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Hardcover), by Jason Bloomberg & Ronald Schmelzer
• I'm not quite sure what "doom" awaits by not service orienting, other than remaining mired in archaic, calcified, and siloed processes, which a lot of organizations are

Services Integration Possibilities
• User Interface
• Business Process
• Application
• Data
A/V component analogy: well-defined components, self-contained, no interdependencies.
Analogy derived from D. Barry, "Web Services," Intelligent Enterprise, 10/10/03, pp. 26-47; wiring diagram from sunflowerbroadband.com

Contractor Implemented Wiring


Concise Notes on Software Engineering
• Published in 1979; 93 pages including appendices & references
• Out of print; $1.99 at half.com
• Principles of Information Hiding (pp. 32-33):
  – Conceal complex data structures whenever possible
  – Allow only selected service modules to know about the concealed data structures
  – Bind together modules that know about concealed data structures
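
Those 1979 principles translate directly into modern encapsulation. A minimal sketch in contemporary terms (the class and its data are illustrative, not from the book):

```python
class ServiceHistory:
    """Conceals its internal data structure (principle 1); callers see
    only the service methods bound to that structure (principles 2-3)."""

    def __init__(self):
        self._events = []  # concealed: a list of (date, code) pairs

    def record(self, date, code):
        self._events.append((date, code))

    def most_recent(self):
        # Callers never learn how events are stored, only the answer.
        return max(self._events)[1] if self._events else None

h = ServiceHistory()
h.record("2008-11-01", "install")
h.record("2008-11-15", "repair")
print(h.most_recent())  # "repair" - no caller touches _events directly
```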

How Does SOA Fit In Existing Architectures? (the basketball and golf ball slide)
[Image: Bank]

Evolving applications from stovepipe to web-service-based architectures
[Screenshot: an organizational portal (news, email, search, reporting, regional and state links, stock quotes) fronting 16 million lines of legacy code in one system and 2.1 million lines in another.]

Legacy Systems Transformed Into Web-services Accessed Through a Portal
[Diagram: Web Services 1.1-1.3, 2.1-2.2, 3.1-3.2, 4.1-4.2, and 5.1-5.3 expose Legacy Applications 1 through 5 through the organizational portal.]
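
The transformation above amounts to putting a thin service facade in front of each legacy routine. A minimal sketch using only the Python standard library; the legacy lookup is simulated and the route shape is an assumption for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def legacy_account_lookup(account_id):
    # Stand-in for a call into the legacy application.
    return {"account": account_id, "status": "active"}

class ServiceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /accounts/12345 -> last path segment is the key
        account_id = self.path.rstrip("/").split("/")[-1]
        body = json.dumps(legacy_account_lookup(account_id)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The portal would call services like this one per legacy function.
    HTTPServer(("localhost", 8080), ServiceHandler).serve_forever()
```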

Solution Framework
[Diagram: systems of record (SORs 1 through 8) and channels (Ch 1 through 8) exchange customer contact data through a channels repository; an indicator extraction service (which could be segmented by day of week, month, system, etc.) and a latency check service feed external address validation processing, which updates addresses.]

Logical Extension

Logical Extension

Simpler Than SOA (InformationWeek cover story, Issue 1,198, Aug. 11, 2008)
• Stymied by the complexity of SOAs, some IT departments are taking the Web-oriented architecture route
• Web-oriented architectures are easier to implement and offer a similar flexibility to SOA

WOA
http://hinchcliffe.org/archive/2008/02/27/16617.aspx

SOA & Data & ???


SOA Requirements
[Chart, 0 to 5.00 scale: required maturity levels for Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, and Data Support Operations.]

Predictive Analysis
• "I'm a little surprised; with such extensive experience in predictive analysis, you should've known we would hire you"

What is Analytics?

• Analytics: Something that is analytic
• Analytic:
  – Of or relating to analysis; especially, separating or breaking up a whole or a compound into its component parts or constituent elements
  – Skilled in or using analysis
  – The science of logical analysis

Car Maxx in Doha, Qatar


BI/Analytic Capabilities

• Business Intelligence (BI)
  – Refers to technologies, applications, and practices for the collection, integration, analysis, and presentation of business information, and sometimes to the information itself
  – The purpose of business intelligence (a term that dates at least to 1958) is to support better business decision making

• Analytics
  – The simplest definition of Analytics is "the science of analysis"
  – A simple and practical definition, however, would be how an entity (i.e., business) arrives at an optimal or realistic decision based on existing data

BI/Analytic Capabilities
• Analytics supports strategy formulation
• Business Intelligence supports strategy implementation

BI/Analytic Capabilities
• Wine quality = 12.145 + 0.00117 × winter rainfall + 0.0614 × growing season temperature - 0.00386 × harvest rainfall (Orley Ashenfelter)
• Outperforms experts
  – specifically Robert Parker (http://www.erobertparker.com/)
  – and most everyone else
• Clinical Versus Statistical Prediction (Paul Meehl)
  – In only 8 of 136 studies were the experts more accurate
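
Ashenfelter's formula is simple enough to evaluate by hand; a minimal sketch with illustrative inputs (the slide does not give units, so rainfall in millimeters and temperature in degrees Celsius are assumptions here):

```python
def wine_quality(winter_rain_mm, season_temp_c, harvest_rain_mm):
    # The regression exactly as quoted on the slide.
    return (12.145
            + 0.00117 * winter_rain_mm
            + 0.0614 * season_temp_c
            - 0.00386 * harvest_rain_mm)

# Illustrative inputs only: a warm growing season with a dry harvest
# scores higher than a cool season with a wet harvest.
print(wine_quality(600, 17.5, 100))  # ~13.54
print(wine_quality(600, 15.0, 300))  # ~12.61
```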

BI Challenges

• Technical Challenges
  – Poor quality data
  – Poor understanding of architectural constructs
  – Poor quality data management practices
  – New technical expertise is required

• Non-Technical Challenges
  – Architecture is underappreciated
  – BI perceived as a "technology" project
  – Inability to link technical capabilities to business objectives
  – Putting BI initiatives in context

Obstacles to Real-Time BI - Lessons from Deployment
• Business case, high cost, or budget issues: 60%
• Non-integrated data sources: 47%
• Education and understanding of real-time BI by business users: 46%
• Lack of infrastructure for handling real-time processing: 46%
• Poor quality data: 43%
• Education and understanding of real-time BI by IT staff: 36%
• Lack of tools for doing real-time processing: 35%
• Immature technology: 28%
• Performance and scalability: 24%
Source: TDWI, The Real Time Enterprise Report, 2003

Cost of Poor Data Quality: $600 Billion Annually!

Thanks to Bret Champlin

Who is Joan Smith?

http://www.sas.com

Defining Customer Challenges
• Purchased an A4 on June 15, 2007
• Had not done business with the dealership prior
• "makes them seem sleazy when I get a letter in the mail before I've even made the first payment on the car advertising lower payments than I got"

How to solve this data quality problem using just tools?

Retail price for the unit was $40

A congratulations letter from another bank
Problems:
• Bank did not know it made an error
• Tools alone could not have prevented this error
• Lost confidence in the ability of the bank to manage customer funds

From my retirement plan


Rolling Stone Magazine

Quantitative Benefits


Tomorrow's Data Management
[Closing diagram: evidence types (business intelligence, information, business processes, business rules) and system metadata (system components, component elements, system component types, logical data entities and attributes) relate to data assets, XML-based portals and repositories, model decomposition, data analysis technologies, data quality, revised data management goals, and Challenges #1 through #4. Increased business perception of DM value results from better business systems, including repositories, warehouses, and ERP implementations.]

Contact Information:
Peter Aiken, Ph.D.
Department of Information Systems, School of Business
Virginia Commonwealth University
1015 Floyd Avenue - Room 4170
Richmond, Virginia 23284-4000
http://peteraiken.net

Data Blueprint
Maggie L. Walker Business & Technology Center
501 East Franklin Street
Richmond, VA 23219
phone: 804.521.4056
office: +1.804.883.759
cell: +1.804.382.5957
e-mail: [email protected]
http://datablueprint.com