Data Warehouse Architecture Lecture #3 IMRAN KHAN IBA
What & Why Architecture?
“An architecture is a set of rules to adhere to when building something”
Because a data warehouse can become quite large and complex, using an architecture is essential for success
Strict rules for how to architect a data warehouse do not exist, but over the last 15 years a few common architectures have emerged
Imran Khan IBA-FCS City Campus
What & Why Architecture?
According to research conducted in 2006 by The Data Warehousing Institute (TDWI), there are five possible ways to architect a data warehouse:
1. Independent data marts: Each data mart is built and loaded individually; there is no common or shared metadata. This is also called a stovepipe solution.
2. Data mart bus: The Kimball solution with conformed dimensions.
3. Hub and spoke (corporate information factory): The Inmon solution with a centralized data warehouse and dependent data marts.
4. Centralized data warehouse: Similar to hub and spoke, but without the spokes; i.e., all end user access is directly targeted at the data warehouse.
5. Federated: An architecture where multiple data marts or data warehouses already exist and are integrated afterwards. A common approach to this is to build a virtual data warehouse where all data still resides in the original source systems and is logically integrated using special software solutions.
The Big Debate: Inmon Versus Kimball
In the beginning there were basically two approaches to modeling the data warehouse.
Inmon popularized the term data warehouse
He is a strong proponent of a centralized and normalized approach
Kimball took a different perspective with his Data Marts and Conformed Dimensions.
Differences between the Inmon and Kimball approach
1. Data warehouse versus data marts with conformed dimensions
2. Centralized approach versus iterative/decentralized approach
3. Normalized data model versus dimensional data model
Conceptual DW Architectures
Direct data mart
  Short term, quick results
Architected, enterprise data warehouse
  Long term foundation for future development
Data Mart
A subset of data (from the data warehouse) designed to answer specific business questions
Also called:
  Departmental Data Warehouse (Silverston & Graziano)
  Dimensional Data Warehouse (Kimball)
Direct Data Mart
[Figure: Source 1, Source 2, and Source 3 feed transformation routines (ETL) that load the Sales Data Mart, Financial Data Mart, and Customer Service Data Mart directly.]
Direct Data Marts
Pros:
Build individual data marts faster
Good for prototyping
Direct Data Marts
Cons:
Requires redundant coding
Must transform each source multiple times
  Once for each data mart
  New data marts require a new transform for each source
  New sources require multiple transformations
If business rules change, must change code in multiple routines
Increased number of routines may require more processing power
Multiple points of failure for ETL
Data marts can get out of sync
Architected Data Warehouse
Core enterprise data warehouse design
  Based on corporate (logical) data model
  May include an ODS
Specific departmental data marts
  Based on business needs
Architected Data Warehouse
[Figure: Source 1, Source 2, and Source 3 feed the Enterprise Data Warehouse, which in turn feeds the Sales Data Mart, Financial Data Mart, and Customer Service Data Mart.]
Architected Data Warehouse
Pros:
Reduced long term maintenance
Complex source transformations occur once
  From source to staging area (or ODS)
  If business rules change, code changes required in only one place
Reduces points of failure
Second set of ETL routines handles simple aggregations and data segmentation
Easier to create new data marts
Enterprise DW becomes source of historical data
Architected Data Warehouse
Cons:
Requires more disk space
Requires 2 sets of ETL routines
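The two sets of ETL routines in the architected approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the field names, and the functions `transform`, `load_enterprise_dw`, and `build_mart` are all hypothetical and do not come from the lecture. The point is that the complex transformation is written once, and each data mart is then derived from the enterprise data warehouse by simple segmentation and aggregation.

```python
from collections import defaultdict

# Hypothetical raw rows from two source systems (illustrative data only).
SOURCE_1 = [{"cust": " alice ", "amount": "120.50", "dept": "sales"},
            {"cust": "BOB", "amount": "80", "dept": "service"}]
SOURCE_2 = [{"cust": "Alice", "amount": "19.50", "dept": "sales"}]


def transform(row):
    """Complex source transformation: written in exactly one place."""
    return {
        "customer": row["cust"].strip().title(),
        "amount": float(row["amount"]),
        "dept": row["dept"],
    }


def load_enterprise_dw(*sources):
    """First ETL set: every source row is transformed once into the EDW."""
    return [transform(row) for source in sources for row in source]


def build_mart(edw, dept):
    """Second ETL set: simple segmentation and aggregation from the EDW."""
    totals = defaultdict(float)
    for row in edw:
        if row["dept"] == dept:
            totals[row["customer"]] += row["amount"]
    return dict(totals)


edw = load_enterprise_dw(SOURCE_1, SOURCE_2)
sales_mart = build_mart(edw, "sales")
service_mart = build_mart(edw, "service")
print(sales_mart)
print(service_mart)
```

In a direct data mart design, by contrast, the logic inside `transform` would have to be repeated in the load routine of every mart, which is where the redundant coding and the risk of marts drifting out of sync come from.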
Corporate Information Factory
[Figure: Corporate Information Factory. Labels shown: Operational Systems (External, ERP, Internet, Legacy, Other), each connected through an API; Data Acquisition; CIF Data Management; Data Delivery; Data Warehouse; Operational Data Store (TrI); Exploration Warehouse, Data Mining Warehouse, OLAP Data Mart, and Oper Mart (each with DSI); Information Workshop (Library & Toolbox, Workbench); Information Feedback; Systems Management; Meta Data Management; Data Acquisition Management; Operation & Administration; Service Management; Change Management.]
Multi-Tiered Architecture
[Figure: Multi-tiered architecture. Data Sources (operational DBs, other sources) pass through Extract, Transform, Load, Refresh into Data Storage (Data Warehouse, Data Marts, Metadata, Monitor & Integrator); the OLAP Engine (OLAP Server) serves the Front-End Tools (Analysis, Query, Reports, Data mining).]
Source Data Component
Production Data
  Data comes from the various operational systems of the enterprise, e.g., financial systems, manufacturing systems, systems along the supply chain, and customer relationship management systems.
Internal Data
  “Private” spreadsheets, documents, customer profiles, and sometimes even departmental databases.
Archived Data
  Some data is archived after a year. Sometimes data is left in the operational system databases for as long as five years.
External Data
  For example, the data warehouse of a car rental company contains data on the current production schedules of the leading automobile manufacturers. This external data in the data warehouse helps the car rental company plan for its fleet management.
Data Staging Component
Data Extraction
Data Transformation
Data Loading
Data staging provides a place and an area with a set of functions to
  Clean
  Change
  Combine
  Convert
  Deduplicate
  Prepare source data for storage and use in the data warehouse.
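A minimal sketch of the staging functions, assuming hypothetical source records with `name`, `date`, and `amount` fields (none of these names or rules come from the lecture). It combines two sources, cleans and converts each record, and then deduplicates on a chosen key.

```python
def clean(record):
    """Clean: trim whitespace and normalise the case of the name."""
    return {**record, "name": record["name"].strip().title()}


def convert(record):
    """Convert: turn the amount string into a number."""
    return {**record, "amount": float(record["amount"])}


def deduplicate(records):
    """Deduplicate: keep the first record seen for each (name, date) key."""
    seen = set()
    unique = []
    for r in records:
        key = (r["name"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def stage(*sources):
    """Combine all sources, then clean, convert, and deduplicate them."""
    combined = [r for source in sources for r in source]
    prepared = [convert(clean(r)) for r in combined]
    return deduplicate(prepared)


# Hypothetical extracts from two operational systems.
crm = [{"name": " ann lee", "date": "2006-01-05", "amount": "10.0"}]
billing = [{"name": "Ann Lee ", "date": "2006-01-05", "amount": "10.0"},
           {"name": "raj patel", "date": "2006-01-06", "amount": "7.5"}]

staged = stage(crm, billing)
print(staged)
```

Cleaning runs before deduplication on purpose: the two "Ann Lee" records only match once whitespace and case have been normalised.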
Types of Metadata
  Operational metadata
  Extraction & Transformation metadata
  End-user metadata
Why is metadata especially important in a data warehouse?
First, it acts as the glue that connects all parts of the data warehouse. Next, it provides information about the contents and structures to the developers. Finally, it opens the door to the end-users and makes the contents recognizable in their own terms.
OLAP Server Architectures
Relational OLAP (ROLAP)
Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middleware to support missing pieces
Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services
greater scalability
Multidimensional OLAP (MOLAP)
Array-based multidimensional storage engine (sparse matrix techniques)
fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP)
User flexibility, e.g., low level: relational, high-level: array
Specialized SQL servers
specialized support for SQL queries over star/snowflake schemas
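The contrast between ROLAP and MOLAP can be sketched as follows. The fact rows and values are hypothetical, and the MOLAP side uses a small dense nested list rather than the sparse matrix techniques a real engine would use, so treat this as an illustration of the idea only: ROLAP aggregates relational rows at query time, while MOLAP looks up a pre-computed cell by array position.

```python
import sqlite3

# Hypothetical fact rows: (city size, region, value).
facts = [("LAR_CITY", "WEST", 5), ("LAR_CITY", "SOUTH", 3),
         ("MED_CITY", "WEST", 2), ("LAR_CITY", "WEST", 1)]

# ROLAP style: facts stay in a relational table; each query aggregates with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (city TEXT, region TEXT, value INTEGER)")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT SUM(value) FROM fact WHERE city = ? AND region = ?",
    ("LAR_CITY", "WEST"),
).fetchone()[0]

# MOLAP style: summaries are pre-computed into an array addressed by
# dimension position (dense here for simplicity).
cities = ["LAR_CITY", "MED_CITY"]
regions = ["SOUTH", "WEST"]
cube = [[0] * len(regions) for _ in cities]
for city, region, value in facts:
    cube[cities.index(city)][regions.index(region)] += value
molap_total = cube[cities.index("LAR_CITY")][regions.index("WEST")]

print(rolap_total, molap_total)
```

Both lookups return the same total; the difference is when the aggregation work happens and how the data is stored.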
OLAP: On-Line Analytical Processing
An environment for the analysis of multi-dimensional data
  dice
  rotate
  drill-down
  rollup
OLAP provides advanced database support involving attribute selection, attribute encoding, row sampling, and data cleansing, and allows the use of multiple different search engines
Easy to use user-interface
Open system architecture using local processing power
Roll-up, Drill-down, Slicing, Dicing

pop92 (summary by region)
          |                     state
          | NOR_EAS  NOR_CEN    SOUTH     WEST    Total
----------+---------------------------------------------
LAR_CITY  |   3.62%    8.59%   15.68%   13.28%   41.17%
MED_CITY  |   3.35%    5.36%    5.18%    7.02%   20.91%
SMA_CITY  |   2.58%    5.66%    4.85%    5.16%   18.25%
SUP_CITY  |   8.30%    3.54%    2.54%    5.29%   19.67%
----------+---------------------------------------------
Total     |  17.84%   23.15%   28.25%   30.75%  100.00%

Drill-Down (pop92)
          |            state
          | E_N_CEN  E_SO_CE  MID_ATL  ...
----------+--------------------------------
LAR_C     |   5.46%    2.76%    2.09%  ...
MED_C     |   3.84%    0.44%    1.38%  ...
SM_C      |   4.12%    0.92%    1.49%  ...
SUP_C     |   3.54%    0.00%    8.30%  ...
----------+--------------------------------
Total     |  16.96%    4.12%   13.26%  ...

Dicing (pop92)
            |           state
            | MID_ATL  NEW_ENG  NOR_EAS
------------+---------------------------
50000~60000 |  12.26%   13.69%   25.96%
60000~70000 |  10.93%    7.13%   18.05%
70000~80000 |  10.52%   14.83%   25.35%
80000~90000 |   4.89%    9.56%   14.45%
90000~99999 |   2.79%   13.40%   16.19%
------------+---------------------------
MED_CITY    |  41.39%   58.61%  100.00%

Slicing (pop92)
          |           state
          | MID_ATL  NEW_ENG  NOR_EAS
----------+---------------------------
LAR_C     |  11.72%    8.56%   20.28%
MED_C     |   7.76%   10.99%   18.75%
SM_C      |   8.34%    6.11%   14.45%
SUP_C     |  46.52%    0.00%   46.52%
----------+---------------------------
Total     |  74.34%   25.66%  100.00%
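The four operations can be sketched on a tiny cube. The cell values below are made-up counts, not the pop92 figures, and `PACIFIC`, `aggregate`, and `select` are hypothetical names introduced for this illustration. Roll-up and drill-down are shown as aggregation at a coarser or finer level of the state hierarchy; slice and dice are shown as selections on one or several dimensions.

```python
from collections import defaultdict

# Hypothetical cells: (city size, region, division) -> count.
CELLS = {
    ("LAR_CITY", "NOR_EAS", "MID_ATL"): 40,
    ("LAR_CITY", "NOR_EAS", "NEW_ENG"): 30,
    ("MED_CITY", "NOR_EAS", "MID_ATL"): 20,
    ("MED_CITY", "NOR_EAS", "NEW_ENG"): 25,
    ("LAR_CITY", "WEST", "PACIFIC"): 50,
    ("MED_CITY", "WEST", "PACIFIC"): 35,
}
DIMS = ("city", "region", "division")


def aggregate(cells, keep):
    """Sum the cells, keeping only the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def select(cells, **allowed):
    """Keep cells whose coordinate on each named dimension is allowed."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


roll_up = aggregate(CELLS, ["city", "region"])       # divisions summed into regions
drill_down = aggregate(CELLS, ["city", "division"])  # regions expanded into divisions
slice_ = select(CELLS, region={"NOR_EAS"})           # fix one value on one dimension
dice = select(CELLS, city={"MED_CITY"},
              division={"MID_ATL", "NEW_ENG"})       # restrict several dimensions

print(roll_up[("LAR_CITY", "NOR_EAS")])
print(drill_down[("LAR_CITY", "MID_ATL")])
print(sum(dice.values()))
```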
Data Warehouse Design Process
Top-down, bottom-up approaches or a combination of both
  Top-down: Starts with overall design and planning (mature)
  Bottom-up: Starts with experiments and prototypes (rapid)
From software engineering point of view
  Waterfall: structured and systematic analysis at each step before proceeding to the next
  Spiral: rapid generation of increasingly functional systems, with short turnaround time
Typical data warehouse design process
  Choose a business process to model, e.g., orders, invoices, etc.
  Choose the grain (atomic level of data) of the business process
  Choose the dimensions that will apply to each fact table record
  Choose the measure that will populate each fact table record
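The four design steps can be made concrete with a very small star schema. This is a sketch under assumptions: the table names, columns, and sample rows are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for whatever DBMS the warehouse would actually use. The business process is orders, the grain is one row per order line, the dimensions are date and product, and the measures are quantity and amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 3: dimensions that apply to each fact table record.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);

-- Steps 1 and 2: business process = orders; grain = one row per order line.
-- Step 4: measures = quantity and amount.
CREATE TABLE fact_order_line (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_date VALUES (1, '2006-01-05'), (2, '2006-01-06');
INSERT INTO dim_product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO fact_order_line VALUES
    (1, 10, 2, 20.0), (1, 11, 1, 15.0), (2, 10, 3, 30.0);
""")

# A typical query: join the fact table to a dimension and aggregate the measures.
rows = conn.execute("""
    SELECT p.name, SUM(f.quantity), SUM(f.amount)
    FROM fact_order_line AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)
```

Choosing the grain first matters because it fixes what one fact row means; the dimensions and measures then have to be valid at that level.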
A practical approach (blend of top down & bottom up)
The steps in this practical approach are as follows:
1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time