TECH2TECH APPLIED SOLUTIONS #1
Cubes by design
ROLAP and HOLAP solutions using SAS and Teradata.
by Michelle Wilkie and Arlene Zaima

One of the best-kept secrets in SAS®’s business intelligence (BI) offering is its relational implementation of online analytical processing (OLAP). SAS is well known for its powerful analytics, which typically require data to be housed in a separate SAS data set or, in the case of OLAP, inside a cube. SAS also provides the option to implement relational OLAP (ROLAP), where the data stays in the data warehouse.

The benefits can be seen by examining the three most common OLAP techniques: multi-dimensional OLAP (MOLAP), hybrid OLAP (HOLAP) and ROLAP. (See figure 1.)

In the first method, MOLAP, data is extracted from the data warehouse and aggregated into a data structure, commonly referred to as a cube, for analysis. Because the data is pre-aggregated, responses are returned to end users quickly. The cost of this technique is the overhead of the tasks BI administrators must perform. First, the data resides in both the warehouse and the cube, so it must be updated and maintained in two locations. Second, the data must go through an aggregation process whenever a MOLAP cube is built or updated, which incurs additional overhead. As businesses expand their analysis to include more dimensions or deeper levels of analysis, the cost and overhead of moving and replicating data into an external cube become a challenge for the BI administrator and IT.

The next technique, HOLAP, addresses some of the challenges of the MOLAP implementation. HOLAP is a hybrid approach. Commonly accessed higher-level aggregations are stored on a server, and the more granular information is stored in the data warehouse. This technique was developed to enable larger cube definitions without affecting cube build time. A cube designer can add detail or dimensionality to the cube without increasing the overhead of the MOLAP cube. This gives the BI administrator the flexibility to decide where the multi-dimensional data is located based on access frequency and on administration and processing overheads.
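The build-time trade-off can be made concrete with a small model. The Python sketch below (not SAS code) treats a "cube" as a set of precomputed SUM tables, one per combination of dimensions. The detail rows, dimension names and function names are all hypothetical.

- A MOLAP-style build precomputes every crossing, which is 2^n tables for n dimensions.
- A HOLAP-style build stores only the higher-level aggregations. Finer-grained requests fall back to the detail rows, which here stand in for the data warehouse.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows standing in for the data warehouse:
# (region, product, month, sales)
DETAIL = [
    ("East", "Widget", "2008-01", 100.0),
    ("East", "Gadget", "2008-01", 50.0),
    ("West", "Widget", "2008-02", 75.0),
    ("West", "Widget", "2008-01", 25.0),
]
DIMS = ("region", "product", "month")


def aggregate(rows, dims):
    """SUM the sales measure, grouped by the given tuple of dimension names."""
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


def build_cube(rows, max_dims):
    """Precompute one aggregation table per dimension subset of size <= max_dims."""
    cube = {}
    for k in range(max_dims + 1):
        for dims in combinations(DIMS, k):
            cube[dims] = aggregate(rows, dims)
    return cube


def answer(cube, dims):
    """Serve a request from a stored aggregation if present, else from detail.

    dims must be listed in DIMS order to match a stored key.
    """
    if dims in cube:
        return cube[dims], "cube"
    return aggregate(DETAIL, dims), "warehouse"


molap = build_cube(DETAIL, max_dims=len(DIMS))  # all 2**3 = 8 crossings
holap = build_cube(DETAIL, max_dims=1)          # grand total + 3 one-dimension tables

print(len(molap), len(holap))
print(answer(holap, ("region",)))
print(answer(holap, ("region", "product")))
```

In this toy model, each added dimension doubles the number of MOLAP tables to build and refresh. The HOLAP variant keeps the build small and pays for that only on requests below the stored levels.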
In the third OLAP option, ROLAP, the data stays in the data warehouse and only the metadata is stored outside the database. Each request is converted to SQL and sent to the data warehouse, where the results are retrieved and returned to the analyst. ROLAP eliminates the overhead of maintaining data in multiple locations, as well as the additional processing involved in building or updating a cube. However, if a ROLAP solution is not designed and implemented properly, response times may be slow.

Figure 1
OLAP techniques
The Teradata system can optimize online analytical processing (OLAP) implementations regardless of which technique is used.

PAGE 1 | Teradata Magazine | September 2008 | ©2008 Teradata Corporation | AR-5728

SAS ROLAP and HOLAP implementations can be optimally built with a Teradata Database to maximize performance. OLAP query processing is optimized through aggregate join indexes (AJIs), a Teradata Database feature. An AJI is a join index that specifies SUM or COUNT aggregate operations across one or more tables. AJIs require no maintenance by users or BI administrators. The Teradata Optimizer uses them automatically to speed up ROLAP requests.
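The ROLAP flow can be sketched in Python, with the standard-library sqlite3 module standing in for the Teradata Database. In this flow, a request is converted to SQL and answered by the database, and precomputed SUM/COUNT aggregates speed up higher-level requests.

This is an illustration under stated assumptions. It is not how SAS generates SQL or how the Teradata Optimizer decides to use an AJI.

- The table names, the request format and the `covers` rule are hypothetical.
- A plain summary table plays the role of the AJI.
- The subset rule only reflects the general idea that a SUM stored at a finer grain can be rolled up to any coarser grain.

```python
import sqlite3


def to_sql(measures, levels, table):
    """Translate a simple request (measures by dimension levels) into GROUP BY SQL."""
    select = ", ".join(list(levels) + [f"SUM({m}) AS {m}" for m in measures])
    sql = f"SELECT {select} FROM {table}"
    if levels:
        cols = ", ".join(levels)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql


def covers(agg_levels, levels):
    """A SUM aggregate grouped on agg_levels can be rolled up to any subset of them."""
    return set(levels) <= set(agg_levels)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (region TEXT, product TEXT, month TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('East', 'Widget', '2008-01', 100.0),
        ('East', 'Gadget', '2008-01', 50.0),
        ('West', 'Widget', '2008-02', 75.0),
        ('West', 'Widget', '2008-01', 25.0);
    -- Hypothetical stand-in for an aggregate join index: SUM and COUNT
    -- precomputed at the region/product grain.
    CREATE TABLE agg_region_product AS
        SELECT region, product, SUM(amount) AS amount, COUNT(*) AS row_count
        FROM sales_fact GROUP BY region, product;
""")

AGG_LEVELS = ("region", "product")


def run(measures, levels):
    # Route to the summary table when it can answer the request, else to the fact table
    table = "agg_region_product" if covers(AGG_LEVELS, levels) else "sales_fact"
    sql = to_sql(measures, levels, table)
    return table, sql, conn.execute(sql).fetchall()


print(run(["amount"], ["region"]))
print(run(["amount"], ["month"]))
```

A request by region is rolled up from the smaller summary table. A request by month is not covered by that table, so it falls back to the detail rows. In a Teradata deployment, this routing decision belongs to the optimizer rather than to application code.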
SAS cubes
MOLAP, ROLAP and HOLAP cubes are all supported in SAS 9.1.3. A cube designer can define a cube using SAS OLAP Cube Studio (an easy-to-use user interface) or in code (PROC OLAP). The cube is then built by the SAS Workspace Server. A SAS cube comprises three parts: metadata, navigation files and the physical data or aggregation tables. (See figure 2.) The first two components are the same regardless of the OLAP technique.

The metadata for a cube created in the SAS Enterprise Intelligence Platform defines information such as the location of the data, the cube structure, cube-based security permissions and calculated measure definitions. The navigation files describe how the input data translates into the structure of the cube. This includes, for example, how members relate to each other, and the formats, member properties and captions for each member. The physical data depends on which OLAP technique the cube designer specifies when building the cube structure:
> MOLAP. Relevant SAS proprietary, highly indexed aggregation tables are created and stored within the physical cube.
> ROLAP. All data resides in the relational database management system (RDBMS). Relational tables are optimized for low-level dimensional requests, and aggregate indexes are created for higher-level OLAP requests. The Teradata system will determine the optimal database structure to use.
> HOLAP. A mix of SAS proprietary aggregation tables and relational tables is used. The mix typically depends on the granularity and cardinalities present within the cubes.
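Only the physical data varies by technique, and this can be modeled as a small Python data structure. This is a hypothetical sketch, not the format SAS uses to store cube metadata or navigation files. The class and field names are invented for illustration, and the storage descriptions paraphrase the list above.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Technique(Enum):
    MOLAP = "MOLAP"
    ROLAP = "ROLAP"
    HOLAP = "HOLAP"


# Where the physical data lives for each technique (paraphrasing the list above)
PHYSICAL_DATA = {
    Technique.MOLAP: "SAS aggregation tables stored within the physical cube",
    Technique.ROLAP: "relational tables and aggregate indexes in the RDBMS",
    Technique.HOLAP: "a mix of SAS aggregation tables and relational tables",
}


@dataclass(frozen=True)
class Metadata:
    data_location: str
    dimensions: tuple
    measures: tuple


@dataclass(frozen=True)
class NavigationFiles:
    # (member, parent, caption) triples: a stand-in for member relationships and captions
    members: tuple


@dataclass(frozen=True)
class Cube:
    metadata: Metadata
    navigation: NavigationFiles
    technique: Technique

    @property
    def physical_data(self):
        return PHYSICAL_DATA[self.technique]


rolap_cube = Cube(
    Metadata("warehouse.sales_star", ("Time", "Geography"), ("Sales",)),
    NavigationFiles((("2008", None, "Year 2008"), ("2008-Q1", "2008", "Q1 2008"))),
    Technique.ROLAP,
)
# Switching the technique leaves the metadata and navigation components untouched
molap_cube = replace(rolap_cube, technique=Technique.MOLAP)

print(molap_cube.metadata == rolap_cube.metadata)
print(molap_cube.physical_data)
```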
Figure 2
SAS cube components
The metadata and navigation files components do not differ regardless of online analytical processing (OLAP) technique. However, aggregations define the physical data structure, which is unique to each technique.

SAS OLAP Server
The SAS OLAP Server has a dual role:
> Security validation
• Authentication of the user against the SAS Metadata Server
• Authorization and validation of what the user is allowed to see
> Query engine
• Handles the multi-dimensional expressions (MDX) passed from SAS BI clients
• Retrieves the relevant data that answers the MDX query
• Sends that data back to the clients

The OLAP technique the cube is based on determines how the MDX query is handled and translated into the appropriate query. That query is then either passed to an underlying database or processed internally. For a MOLAP-based cube, the SAS OLAP Server spawns multiple threads internally to answer the queries from the relevant cube aggregation tables. For a ROLAP-based cube, MDX is translated into SQL queries, which are passed down to the RDBMS to handle and optimize.

For the best performance at query time, a SAS cube requires the aggregation tables that best match the query result set. SAS provides application response measurement (ARM) logs that help cube designers or administrators tune a cube based on end-user interactions or the queries that have been submitted against that cube.

Teradata Database considerations
SAS OLAP cubes support three types of input data: star schema, detail data or summarized tables. Star schema input data sources will typically give SAS OLAP the best build performance. However, the physical database design of any data warehouse should reflect the customer’s business, independent of any tool or application requirements.

Teradata recommends an application-agnostic data model such as third normal form, adhering to the best practices and methodology that provide an enterprise view of the business. To implement SAS ROLAP on a normalized data model or snowflake schema, a semantic layer must be built on top of the tables to represent a star.

The optimal solution is to build aggregates on top of a normalized model or snowflake schema and to use a view-based semantic layer to represent a star. It is crucial that the normalized model or snowflake schema be “cleansed.” This means there are no NULLs, data transformations are complete and the data is ready for reporting. If it is not cleansed, it may be necessary to build a physical semantic layer, because the Teradata aggregate approach described above will not work on “uncleansed” normalized data.

If a physical semantic layer is required, the recommended approach is a snowflake schema populated by INSERT/SELECT statements from the normalized model in the Teradata Database. The INSERT/SELECTs are defined to perform the data-cleansing tasks, producing a snowflake schema that is ready for reporting. Views are then created to present a star schema to SAS. Teradata AJIs can then be built on top of the snowflake schema to increase OLAP performance.
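The physical-semantic-layer pattern can be tried out in miniature with Python's sqlite3 module. In that pattern, INSERT/SELECTs cleanse normalized data into a snowflake schema, and views then present a star.

This is a sketch under several assumptions:

- SQLite stands in for the Teradata Database.
- All table and view names are hypothetical.
- Replacing NULLs with 'Unknown' or 0 is just one possible cleansing rule.
- AJIs are not shown, because SQLite has no equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Hypothetical normalized source tables that still contain NULLs
    CREATE TABLE src_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE src_product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                              category_id INTEGER);
    CREATE TABLE src_sales (product_id INTEGER, amount REAL);
    INSERT INTO src_category VALUES (1, 'Hardware'), (2, NULL);
    INSERT INTO src_product VALUES (10, 'Widget', 1), (11, 'Gadget', 2), (12, NULL, 1);
    INSERT INTO src_sales VALUES (10, 100.0), (11, 50.0), (12, 25.0), (10, NULL);

    -- Cleansed snowflake tables: NOT NULL columns, populated by INSERT/SELECT
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY,
                               category_name TEXT NOT NULL);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                              product_name TEXT NOT NULL,
                              category_id INTEGER NOT NULL);
    CREATE TABLE fact_sales (product_id INTEGER NOT NULL, amount REAL NOT NULL);
    INSERT INTO dim_category
        SELECT category_id, COALESCE(category_name, 'Unknown') FROM src_category;
    INSERT INTO dim_product
        SELECT product_id, COALESCE(product_name, 'Unknown'), category_id FROM src_product;
    INSERT INTO fact_sales
        SELECT product_id, COALESCE(amount, 0) FROM src_sales;

    -- View-based semantic layer: flatten the snowflake into one star dimension
    CREATE VIEW star_product AS
        SELECT p.product_id, p.product_name, c.category_name
        FROM dim_product p JOIN dim_category c ON p.category_id = c.category_id;
""")

# A BI tool would see only the star: one fact table joined to one dimension view
rows = cur.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales f JOIN star_product d ON f.product_id = d.product_id
    GROUP BY d.category_name ORDER BY d.category_name
""").fetchall()
print(rows)
```

The cleansing logic lives in the INSERT/SELECTs, so the views stay thin. In the approach described above, the Teradata AJIs would be defined on the cleansed snowflake tables rather than on the raw normalized ones.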
A ROLAP solution
The advantages of a ROLAP solution include:
> Only metadata and navigation files are created, resulting in fast build times.
> Data management remains within the RDBMS, not within the cube.
A ROLAP-based cube lets the RDBMS handle the SQL and its optimization, which depends on implementing AJIs. This is the preferred method. When defining the SAS cube structure, the cube designer needs to be aware of the second-to-last window in the SAS Cube Designer Wizard, where the box “Do not create an NWAY” must be checked. The NWAY is a fully summarized table composed of all crossings of the levels defined in the cube. Checking this box is equivalent to specifying the PROC OLAP option NO_NWAY.
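One way to see what the NWAY would mean for a ROLAP cube is to estimate its size. The Python sketch below is a back-of-the-envelope illustration, not SAS logic, and the level names and cardinalities are hypothetical.

The NWAY crosses every level defined in the cube. Its row count is therefore bounded by the product of the lowest-level cardinalities. In practice it is bounded by the number of distinct combinations that actually occur in the detail data, which can come close to the size of the detail itself.

```python
from math import prod


def nway_upper_bound(cardinalities):
    """Largest possible NWAY: one row per combination of lowest-level members."""
    return prod(cardinalities.values())


def nway_rows(detail, levels):
    """Actual NWAY size: distinct level combinations present in the detail rows."""
    return len({tuple(row[level] for level in levels) for row in detail})


# Hypothetical lowest-level cardinalities for a three-dimension cube
cardinalities = {"day": 365, "store": 200, "product": 5000}
print(nway_upper_bound(cardinalities))  # 365 * 200 * 5000

# Hypothetical detail rows: repeated combinations collapse into one NWAY row
detail = [
    {"day": "2008-09-01", "store": "S1", "product": "P1", "sales": 10.0},
    {"day": "2008-09-01", "store": "S1", "product": "P1", "sales": 5.0},
    {"day": "2008-09-01", "store": "S2", "product": "P1", "sales": 7.0},
]
print(nway_rows(detail, ("day", "store", "product")))
```

Skipping this table keeps a ROLAP build limited to metadata and navigation files, consistent with the advantages listed above. Summarization is then left to the RDBMS and its aggregate join indexes.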
Better, faster analysis
SAS and Teradata naturally complement each other, giving BI administrators a powerful and flexible solution. When organizations consider an OLAP technique, their choices now extend to an accelerated SAS ROLAP solution on the Teradata Database. This combined solution enables businesses to analyze data at the “speed of thought,” with a breadth and depth of analysis that only an integrated solution provides. Now you can make the choice that best meets your expanding business needs.

Michelle Wilkie, a product manager with SAS Institute Inc., supports the research and development teams that develop OLAP products. Arlene Zaima, a strategic intelligence program manager at Teradata, has more than 10 years of experience in advanced analytics.
Online
Read “An added dimension” on TeradataMagazine.com, and visit support.sas.com for more information on the SAS OLAP Server.