Testing data warehouse with key data indicators - CGI

Testing data warehouses with key data indicators
Results at high speed
Adalbert Thomalla, Stefan Platz
© CGI GROUP INC. All rights reserved


Agenda

The Problem
• General Problem
• Problem within the Project

The Idea

The Solution
• General Method of Solution
• Solution within the Project

The Problem: General Problem

General Problem: Test in the project / regression test
• One-time assurance of data quality in a project within a specified project plan
• Test and re-test of multiple deliveries of mass data
• Quality assurance of historical data for Basel II, IFRS etc.

[Figure: planned project timeline over time, with the labels Test preparation, Data delivery, Test begin, Test, Corrected data delivery, Re-test, Approval, Scheduled end of project.]

General Problem: Data verification
• Recurring assurance of data quality in production
• Continuous checking of mass-data deliveries
• Additional sources of error in recurring data deliveries

[Figure: the layers DWH, Datamart and Reporting, shown twice; several points, including code, are marked with X.]

The Problem: Problem within the Project

Concrete Problem within the Project: Project
• Build and test of a DWH for historical Basel II data

[Figure: Root systems (source systems), ETL processes, Basel II DWH (min. 5 years of history), Subsequent processing, e.g. calculation of parameters or regulatory reporting.]

Concrete Problem within the Project: The original plan
• One-time delivery of historical data and test of this data set (including re-test)
• Handover to the daily data delivery in production
• No use of testing tools intended

Concrete Problem within the Project: Scope of testing
• Around 50 tables, 500 fields, several million data records
• Around 500 test cases on 3 levels (possible value range, data integrity, end-to-end test)

Test execution
• Manual execution and documentation of the tests
• Individual execution of every test case
• Documentation of the test execution in an MS Access testing database

Concrete Problem within the Project: Actual condition
• Recurring historical data deliveries because of changes and incidents
• Time- and resource-consuming (a complete test cycle takes around 20 person-days)
• Tests partially aborted because of a new data delivery
• Concentration on one defined test data set (one historical month)

Additional requirement
• Recurring verification of data quality in production

Concrete Problem within the Project: Plan and current situation

[Figure: two timelines over time. Plan: Test preparation, Data delivery, Test begin, Test, Corrected data delivery, Re-test, Approval, Scheduled end of project. Current situation: Test preparation, Data delivery, Test begin, Test, Corrected data delivery, Test, Corrected data delivery, New data delivery, Re-test, Test, "Re-test ?!", Scheduled end of project.]

The Idea

The Idea
• No time-consuming repetition of all test cases!
• Fast test execution!
• Automation!

Design of a slim test tool using predefined data quality indicators

The Idea

[Figure: Table A and Table B pass through the DWH ETL / transformation and are merged.]

Test: Balance of account 0815 in Table A = balance of account 0815 in Table B?

The Idea

[Figure: Table A and Table B pass through the DWH ETL / transformation; the values BalanceA, BalanceB and Balance appear on both sides, with a question mark on the comparison.]

The Idea

[Figure: the indicators Balance, Creditcards, Defaults, ... listed on both sides.]
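The idea of comparing a few totals instead of every single record can be sketched as follows. This is only a minimal illustration, not the tool described in the slides: it uses Python with an in-memory SQLite database in place of SAS, and the table names (`table_a`, `table_b`, `dwh_merged`), the column names and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical miniature of the setup: two source tables and the merged
# DWH table produced by the ETL step. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (account TEXT, balance REAL);
    CREATE TABLE table_b (account TEXT, balance REAL);
    CREATE TABLE dwh_merged (account TEXT, balance REAL);
    INSERT INTO table_a VALUES ('0815', 100.0), ('0816', 250.5);
    INSERT INTO table_b VALUES ('4711', 40.0);
    INSERT INTO dwh_merged VALUES
        ('0815', 100.0), ('0816', 250.5), ('4711', 40.0);
""")


def indicator(connection, sql):
    """Run an aggregate query and return its single value."""
    return connection.execute(sql).fetchone()[0]


balance_a = indicator(conn, "SELECT SUM(balance) FROM table_a")
balance_b = indicator(conn, "SELECT SUM(balance) FROM table_b")
balance_dwh = indicator(conn, "SELECT SUM(balance) FROM dwh_merged")

# One comparison of totals replaces one comparison per account.
# A small tolerance guards against floating-point rounding.
reconciled = abs((balance_a + balance_b) - balance_dwh) < 1e-9
print("BalanceA + BalanceB == Balance:", reconciled)
```

A matching total does not prove that every record is correct (errors can cancel out), which is presumably why several indicators such as balance, credit cards and defaults are compared side by side.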

The Solution: General Method of Solution

The Solution
• Administration and configuration of indicators and rules
• Calculation engine
• Reporting

Process
• System Landscape
• Indicators & Rules
• Execution
• Results
• Export TestingDatabase

System Landscape

Prerequisites
• SAS and Excel
• Data sources directly in SAS or through links to Oracle or DB2
• Authority to read the data

[Figure: Indicators and Rules, Calculation Engine.]


Indicators: Functional

Indicators in this context are sums and other aggregate functions, for example:

Indicator                                             Table                      Usage
Number of accounts                                    Accounts, Scoring          Reconciliation between data sources
Number of customers                                   Accounts, Customers        Reconciliation between data sources
Sum of balance                                        Accounts, Balance-Sheet    Reconciliation with the Balance-Sheet
Sum of credit cards balance                           Accounts, Balance-Sheet    Reconciliation with the Balance-Sheet
Number of defaulted accounts without being past due   Accounts                   Direct Validation, Integrity
Number of accounts without scoring                    Accounts                   Direct Validation, Integrity
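To make the functional indicators more concrete, the sketch below defines a few of them as named aggregate queries and evaluates them in one pass. It is an illustration under stated assumptions: Python with SQLite stands in for SAS, and the `accounts` table layout, its column names (`defaulted`, `days_past_due`, `scorevalue`) and the sample rows are hypothetical, not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Accounts table; a NULL scorevalue means "no scoring".
    CREATE TABLE accounts (
        account TEXT, customer TEXT, balance REAL,
        defaulted INTEGER, days_past_due INTEGER, scorevalue REAL
    );
    INSERT INTO accounts VALUES
        ('A1', 'C1', 100.0, 0,  0, 610.0),
        ('A2', 'C1', 200.0, 1, 95, 480.0),
        ('A3', 'C2',  50.0, 1,  0, 530.0),
        ('A4', 'C3',  75.0, 0, 10, NULL);
""")

# Each indicator is a name plus one aggregate query (illustrative names).
INDICATORS = {
    "number_of_accounts": "SELECT COUNT(*) FROM accounts",
    "number_of_customers": "SELECT COUNT(DISTINCT customer) FROM accounts",
    "sum_of_balance": "SELECT SUM(balance) FROM accounts",
    "defaulted_not_past_due":
        "SELECT COUNT(*) FROM accounts "
        "WHERE defaulted = 1 AND days_past_due = 0",
    "accounts_without_scoring":
        "SELECT COUNT(*) FROM accounts WHERE scorevalue IS NULL",
}

results = {
    name: conn.execute(sql).fetchone()[0]
    for name, sql in INDICATORS.items()
}

for name, value in results.items():
    print(f"{name}: {value}")
```

Keeping the indicators in a plain mapping means that adding a check is a matter of adding one entry; the two direct-validation indicators would be expected to be zero on clean data, while the others only become meaningful when compared with the same indicator from another source.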

Indicators: Technique

From a technical point of view, the indicators are SQL summary functions as provided by SAS:

Function        Description                Example
SUM             Sum                        SUM(Balance)
AVG|MEAN        Average                    AVG(Scorevalue)
COUNT|FREQ|N    Counting values            COUNT(*)
NMISS           Counting missing values    NMISS(Customer)
MIN             Smallest value             MIN(Scorevalue)
MAX             Maximum value              MAX(Scorevalue)
STD             Standard deviation         STD(Scorevalue)

Further statistics: PRT, RANGE, STDERR, T, USS, VAR, CSS, CV
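The summary functions listed above can be tried out without a SAS installation. The following sketch uses Python with SQLite as a stand-in, which is an assumption of this example and not the setup from the slides. SQLite has no `NMISS` or `STD`, so missing values are counted as `COUNT(*) - COUNT(column)` and the standard deviation is computed in Python; the table and its rows are made up.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (customer TEXT, balance REAL, scorevalue REAL);
    INSERT INTO accounts VALUES
        ('C1', 100.0, 2.0),
        ('C2', 200.0, 4.0),
        (NULL, 300.0, 4.0),
        ('C4', 400.0, 6.0);
""")

row = conn.execute("""
    SELECT SUM(balance),
           AVG(scorevalue),
           COUNT(*),
           COUNT(*) - COUNT(customer),  -- stands in for NMISS(Customer)
           MIN(scorevalue),
           MAX(scorevalue)
    FROM accounts
""").fetchone()
total, mean, n, nmiss, lowest, highest = row

# SQLite has no built-in STD; compute the sample standard deviation in Python.
scores = [
    r[0]
    for r in conn.execute(
        "SELECT scorevalue FROM accounts WHERE scorevalue IS NOT NULL"
    )
]
std = statistics.stdev(scores)

print(total, mean, n, nmiss, lowest, highest, round(std, 4))
```

`statistics.stdev` uses the sample (n - 1) divisor; whether that matches the divisor configured in a given SAS environment should be checked before comparing results across systems.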

Indicators: Technique
• Conditional sum of the balance of accounts in a special product group with CASE WHEN ...

SUM(CASE WHEN ARREAR