Managed Query Processing within the SAP HANA Database Platform


Norman May · Alexander Böhm · Meinolf Block · Wolfgang Lehner


Abstract The SAP HANA database extends the scope of traditional database engines as it supports data models beyond regular tables, e.g. text, graphs or hierarchies. Moreover, SAP HANA also provides developers with a more fine-grained control to define their database application logic, e.g. exposing specific operators which are difficult to express in SQL. Finally, the SAP HANA database implements efficient communication to dedicated client applications using more effective communication mechanisms than available with standard interfaces like JDBC or ODBC. These features of the HANA database are complemented by the extended scripting engine–an application server for server-side JavaScript applications–that is tightly integrated into the query processing and application lifecycle management. As a result, the HANA platform offers more concise models and code for working with the HANA platform and provides superior runtime performance. This paper describes how these specific capabilities of the HANA platform can be consumed and gives a holistic overview of the HANA platform starting from query modeling, to the deployment, and efficient execution. As a distinctive feature, the HANA platform integrates most steps of the application lifecycle, and thus makes sure that all relevant artifacts stay consistent whenever they are modified. The HANA platform also covers transport facilities to deploy and undeploy applications in a complex system landscape.

SAP SE, Walldorf, Germany & Technische Universität Dresden, Germany
E-mail: [email protected], [email protected], [email protected], [email protected]

1 Introduction

Database systems are at the core of almost every business-critical application. SQL, as the query language and interface between applications and the data management layer, reflects the traditional way of coupling the database and the application. However, modern application development suffers in two respects within this context. First, the development productivity of data-intensive applications in particular is often reduced by the complexity and the limited expressiveness of traditional SQL language concepts, which calls for the support or even deep integration of domain-specific languages. Second, since applications are becoming more and more complex, with many different artifacts spread over the complete software stack, comprehensive support of the software development process in terms of lifecycle management (deployment, migration etc.) is missing.

In this paper, we outline how we address these two issues and call this the managed query processing approach. This approach goes beyond the traditional SQL-based query processing approach because 1) all phases of the database software development lifecycle are supported, and 2) it provides database application developers with more fine-grained control over the semantics of the data access layer and over the performance tuning of their application.

In order to motivate the SAP HANA platform, which includes the database engine but also many other components (Section 3), we highlight the different aspects with reference to Figure 1. On the left hand side, the traditional, textbook-style software architecture stack is shown. The database contains schema information plus the corresponding data, which must be persistent as well as shared with other applications in a consistent way. SQL as the query language represents the interface to the application, which usually runs in the context of an application server. While database application developers benefit from the declarative nature of SQL, which hides many implementation details from them, it is still sometimes necessary to have more fine-grained control over the logic executed on the database server. For example, passing hints from the application to the database server sometimes suffices to achieve the necessary performance characteristics. Especially for Big Data applications that deploy sophisticated statistical analysis tasks, the SQL language is simply not expressive enough for the intended application logic. For communication purposes, the application uses APIs like JDBC, ODBC or embedded SQL to exchange commands and data with the database. In some cases, an OR-mapper hides this layer to some extent from the client application, resulting in a strong coupling between the database schema and the code within the client applications.

Fig. 1 Traditional Database Application versus Applications in the HANA Platform

However, as either the application code or the design of the database schema evolves, changes are required that affect other components. Keeping the application logic and the database schema consistent across multiple versions of the application code becomes challenging, especially considering agile software development processes with frequent updates of the application code. This challenge is typically addressed in an ad-hoc fashion by writing custom code or SQL scripts to set up the database schema and populate it with data, application code, and deployment logic. The versioning of the different artifacts is typically managed by an external revision control system like SVN or Git. These tools specialize in versioning code and synchronizing changes submitted by multiple developers. However, beyond software versioning, the transport of the complete set of artifacts of an application into different systems is usually not supported and again requires hand-crafted tools.


Even well-known cloud platforms like Google AppEngine¹, Microsoft Azure², Salesforce Force³, or Oracle APEX⁴ follow this traditional setup in their cloud platform offerings. The development tools are either web-based or linked to the cloud offering from desktop-based tools. Code revision tools, the database system, and the application server are typically hosted and managed by the cloud provider but, at the core, these tools are only loosely integrated. Developers can use popular programming languages to write the application logic, but running code inside the database kernel as advocated in [4] is very limited.

¹ https://cloud.google.com/developers/
² https://azure.microsoft.com
³ https://developer.salesforce.com/
⁴ https://apex.oracle.com/

With LINQ, the .NET environment offers a declarative domain-specific language that allows developers to access various data sources using a unified model [12]. Its main objective is to bridge the gap between the application code and the DBMS. Together with the rest of the .NET ecosystem one can also realize the client interfaces, including REST interfaces and web-based user interfaces. Hence, .NET and LINQ share the goal of having a declarative and abstract data access layer with SAP’s core data services (CDS), which we discuss in Section 5.3.

As we will outline in this paper, versioning as well as transporting a consistent image of a large application with many artifacts distributed within an application stack is neither a trivial task nor properly supported by traditional database or software development tools. We therefore present the development infrastructure of the HANA platform as it is available with HANA SPS09. With this infrastructure we want to remedy the issues outlined above.

As shown on the right hand side of Figure 1, a core feature of the HANA platform is that it does not only include components to manage data, i.e. the SAP HANA database as the core database server, but it also tightly integrates components to host database applications (the extended scripting engine, XS engine) and the tools to develop and deploy such applications (the HANA repository, modeling and development tools). We will discuss these main extensions in Section 3.

The HANA platform includes the HANA repository as development infrastructure that guides the developer in the development process and makes sure that all development artifacts stay consistent. Whenever a development artifact is activated (i.e. committed and compiled), this infrastructure automatically reactivates all dependent objects and thereby prevents inconsistent application logic from becoming visible to other developers. Another feature of the HANA repository is the ability to organize development artifacts into packages (which introduce a namespace for development artifacts) and to assign packages to delivery units. Delivery units are reminiscent of libraries, as a delivery unit should include a set of development artifacts that comprise a complete database application or a largely independent subset thereof. Also, delivery units are the entities for transport of development artifacts in a complex system landscape. We will discuss the repository as the backbone of the development infrastructure in Section 4.

The HANA platform offers various interfaces to the SAP HANA database which provide different trade-offs between performance, flexibility and abstraction. For example, modeling SQL views is the most declarative way to define a view on data stored in the SAP HANA database. Analytic views and calculation views offer far more fine-grained control over the semantics and execution of complex data processing tasks. For example, calculation views expose non-relational operators to the application and allow developers to link them to data flow graphs; this is difficult to achieve with pure SQL. In addition, applications may interact with the HANA platform at various abstraction levels, starting with SQL or web-oriented consumption channels like REST-based or HTTP-based interfaces up to the user interface of an application. In Section 5 we will describe these alternatives in more detail. In Section 6, we conclude the paper with an outline of our planned future work in this particular context.

2 Example Application

Fig. 2 HANA EPM Web Application

To illustrate our requirements, we consider the web-based EPM (enterprise procurement model) application shown in Figure 2. SAP uses this application to illustrate the features of the HANA platform and its development model; it is realistic in the sense that it uses the same EPM model as the SAP Business Warehouse. With its simple and personalized web-based user interface, it also shares the design principles of the SAP HANA Fiori applications [1]. The SAP Fiori applications make core business functionality, like confirming production orders or inventory analysis, available on all kinds of web-based devices and with a unified user experience.

The user interface of the sales dashboard shown in Figure 2 is realized using HTML and the JavaScript library SAP UI5, which offers a number of UI widgets typically used in SAP applications. The dashboard gives an overview of sales activities from different angles. The content displayed in these charts (and in the tables of the other tabs) is exposed as an OData service for potential reuse by other HTTP clients. This OData service in turn wraps a HANA-based view (a calculation view), which is based on various SAP HANA tables. The application thus serves as an example of the layered architecture shown in the right part of Figure 1.

The Web IDE shown in Figure 3 is the main interface for developers of web-based applications in the HANA platform. Just like the dashboard presented above, it is hosted and executed using the XS engine of the HANA platform. The left hand side of the Web IDE is used to access the development artifacts, which are organized in a hierarchical fashion. Different editors are available to edit table or view definitions or JavaScript code, or to declare a decision table that defines some business rules. In the current example, the right hand side in Figure 3 shows the editor for JavaScript code.

Fig. 3 HANA Web IDE

3 Components of the HANA Platform

The focus of this section is to describe the main components of the HANA platform that complement the SAP HANA database [6], which forms the core of the overall HANA platform. As shown in Figure 4, the main building blocks of the HANA platform, from an application development perspective, are the HANA Database Platform and the extended scripting engine (XS engine).

Fig. 4 The Architecture of the HANA Platform
[Figure labels: OnPremise; SAP Cloud; ABAP; XS-Engine; HTML5; OData; JavaScript & SAP UI5; HANA Platform; Data Platform; “Optimized for” (e.g. FAE, FDA); Repository; CDS; SQL / SQLScript; Streaming / Hadoop; Data Processing & Engines (Column & Row Store)]

As described in detail in [6], the HANA database platform consists of multiple data processing engines. For relational data, the column- and the row-oriented in-memory storage are the most important ones. Further supported storage engines include a graph store and a text store. These storage engines are complemented by the respective query execution engines. It is also possible to access data from multiple stores and process them efficiently in multiple execution engines within a single query. This versatile set of storage formats and processing capabilities builds the foundation for the different language and application programming interfaces which are exposed to client applications.

Applications communicate with the HANA database platform through various interfaces, including SQL as the most common and well-known interface. SQL is complemented by HANA’s stored procedure dialect, SQLScript. Similar to SQL stored procedures, developers can use SQLScript to define imperative application logic that runs inside the database server. By executing application logic inside the database, round-trips between client application and database can be avoided and thus the overall application performance increased. Additionally, the declarative subset of the SQLScript language allows the database to optimize the application code; for details see [5, 8]. Other important interfaces include the support for streaming to deal with high-velocity data and Hadoop for high-volume data [11]. The support for streaming and Hadoop is fully integrated into the application development tools discussed in this paper.

As a common abstraction layer for all these interfaces, we have defined the Core Data Services (CDS). CDS defines a domain-specific language that aims at integrating the capabilities of various database application interfaces (like SQL, HANA-specific view definitions, or SAP-specific APIs) in a unified data definition and query language. In particular, CDS is used for defining semantically rich domain data models which can be further enriched through annotations (e.g. associations between entities) or higher-level programming language constructs (e.g. actions, event handlers, or constraints) when embedded into a programming language environment. The CDS Query Language (QL) is used for conveniently and efficiently reading data based on CDS data models as well as for defining views within the CDS data models. The CDS QL is an extension to SQL that introduces, e.g., the use of associations as defined by the CDS DDL within queries. The CDS Expression Language (EL) defines the syntax for specifying, e.g., calculated fields, default values, or constraints within queries as well as for elements in data models.

The key concept of CDS is to pull data modeling as well as retrieval and processing of data to a higher semantic level close to the conceptual thinking of domain experts. This is done by carefully extending the relational model and language of standard SQL by entities with structured types (e.g. address) containing elements with:
– Custom-defined or semantic types instead of being limited to primitive types only.
– Associations in the data model, replacing joins by simple path expressions in queries.
– Annotations to enrich the data models with additional metadata, e.g. for OData.
– Expressions used for, e.g., calculations, not only in queries but also in data models.

The HANA database platform also hosts the HANA repository. As we will describe in detail in Section 4, the repository is a set of database tables that stores the application development artifacts. The repository also implements the operations needed during the lifecycle of a development process, i.e. the creation, activation (and compilation), deployment, import/export, transport, and undeployment. All these lifecycle operations are exposed via a REST interface.

Some application artifacts, e.g. HTML5 pages, JavaScript code or images, are deployed as resources in the HANA extended scripting engine (XS engine). Like most other application servers, the XS engine implements a stateless request-response interaction with HTTP clients. While simple resources like images are handled immediately by the Request Handler, complex requests are forwarded to resource containers. The XS engine also offers a C++ container, mainly for built-in domain-specific logic. Server-side application logic is usually processed by the JavaScript container. In addition to these containers, the XS engine also provides various services to manage and execute complex web-based applications, including authorization, a session manager for stateful applications, and a job scheduler. Some of these platform services directly access database kernel logic, e.g. the authentication and authorization component of the HANA database platform. Especially for data-oriented services, the XS engine natively implements OData, which exposes data as resources and supports the basic HTTP operations on these resources.

While current deployments of the HANA platform are mainly on-premise, e.g. hosting Fiori applications on top of SAP ERP data for mobile devices, cloud-based deployments of the HANA platform are part of the SAP HANA Cloud Platform offering, which is becoming increasingly popular for large applications.⁵
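The CDS idea of replacing joins by path expressions over associations, described earlier in this section, can be illustrated with a small sketch. It is not CDS syntax and does not use any HANA API: the `ASSOCIATIONS` table, the `expand_path` helper, and the `orders`/`customers` schema are all hypothetical, and SQLite merely stands in for a relational backend. The sketch only shows how a declared association lets a path such as `customer.name` be rewritten mechanically into an explicit join.

```python
import sqlite3

# Hypothetical association metadata (not CDS syntax):
# (source entity, association name) -> (target entity, source column, target column)
ASSOCIATIONS = {
    ("orders", "customer"): ("customers", "customer_id", "id"),
}


def expand_path(entity: str, path: str) -> str:
    """Rewrite a path such as 'customer.name' into a SELECT with explicit joins."""
    *association_names, field = path.split(".")
    joins = []
    current = entity
    for name in association_names:
        target, source_col, target_col = ASSOCIATIONS[(current, name)]
        joins.append(f"JOIN {target} ON {current}.{source_col} = {target}.{target_col}")
        current = target
    return f"SELECT {entity}.id, {current}.{field} FROM {entity} " + " ".join(joins)


# SQLite is used here only as a stand-in relational engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ACME');
    INSERT INTO orders VALUES (10, 1);
    """
)

sql = expand_path("orders", "customer.name")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The query author only writes the path; the join condition lives once in the data model, which is the property the association concept is meant to provide.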

4 Principles of the SAP HANA Repository

The purpose of the HANA repository is to provide application developers and modeling experts with the necessary infrastructure to store, manage, deploy, and transport their artifacts, e.g. SQLScript procedures.

Fig. 5 Packaging and Object Structure for Delivery Unit Content

4.1 Object Structure and Content Hierarchy

The repository represents all content it manages as a hierarchy of three components (depicted in Figure 5). Conceptually, all models, procedures, or other application content stored in the repository are represented by a corresponding object. An object is of a specific type (e.g. a SQLScript procedure) and may have any number of references to other objects. These references can be used to express dependencies between objects (e.g. a procedure operates on a specific table) and are used during activation (see below).

Packages are used for grouping multiple objects together, thereby forming a logical unit that is meaningful from an application point of view. For example, a SQLScript procedure, together with some table and view definitions, might be combined in a corresponding package. Packages can also be nested, so that several sub-packages (each of them with self-contained functionality) can be part of the same enclosing package. Packages introduce the concept of a namespace, i.e. within a single package the name of an object must be unique, but the same name may be used for an object in another package.

⁵ https://hcp.sap.com/


Fig. 6 Content Editing and Activation

Delivery units allow grouping multiple packages into a single logical container. While packages cluster objects with the purpose of creating (nested) groups of self-contained functionality, the goal of delivery units is to identify (groups of) packages that provide a complete application that can be deployed on a system to deliver specific functionality to the end users. Delivery units also serve as containers to transport objects from one system to another. The application shown in Figure 2 is delivered in a single delivery unit (HCO_DEMOCONTENT).⁶ This delivery unit contains altogether 53 packages with 428 objects. The structure of these packages is shown on the left hand side of Figure 3. The root package of the EPM application is sap.hana.democontent.epm. Among others, it contains the sub-packages data with table definitions and data shipped with the application, functions with definitions of some scalar user-defined functions, models with view definitions, and admin with the administration GUI including JavaScript code.
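The three-level hierarchy of objects, packages, and delivery units can be sketched as a small data model. This is only an illustration under assumptions: the class names, fields, and the example object `orders` are hypothetical and do not reflect the repository's actual tables or API. The sketch shows the two properties stated above, namely that object names are unique within a package but reusable across packages, and that a delivery unit spans nested packages.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    name: str
    obj_type: str                                    # e.g. "table", "view", "procedure"
    references: list = field(default_factory=list)   # qualified names of other objects


@dataclass
class Package:
    name: str
    objects: dict = field(default_factory=dict)      # object name -> RepoObject
    subpackages: dict = field(default_factory=dict)  # short name -> Package

    def add(self, obj: RepoObject) -> None:
        # A package acts as a namespace: names must be unique within it.
        if obj.name in self.objects:
            raise ValueError(f"{obj.name!r} already exists in package {self.name!r}")
        self.objects[obj.name] = obj

    def subpackage(self, short_name: str) -> "Package":
        # Create the nested package on first use, reuse it afterwards.
        return self.subpackages.setdefault(
            short_name, Package(f"{self.name}.{short_name}")
        )

    def iter_objects(self):
        for obj in self.objects.values():
            yield f"{self.name}.{obj.name}", obj
        for sub in self.subpackages.values():
            yield from sub.iter_objects()


@dataclass
class DeliveryUnit:
    name: str
    packages: list = field(default_factory=list)

    def all_objects(self) -> dict:
        return {qname: obj for pkg in self.packages for qname, obj in pkg.iter_objects()}


root = Package("sap.hana.democontent.epm")
data = root.subpackage("data")
models = root.subpackage("models")
data.add(RepoObject("orders", "table"))  # hypothetical object name
models.add(RepoObject("orders", "view", ["sap.hana.democontent.epm.data.orders"]))

unit = DeliveryUnit("EXAMPLE_UNIT", [root])  # hypothetical delivery unit name
print(sorted(unit.all_objects()))
```

The same short name `orders` appears twice without conflict because the two objects live in different packages; adding a second `orders` to `data` would raise an error.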

⁶ The application is part of the standard HANA delivery. See http://help.sap.com/hana/SAP_HANA_Interactive_Education_SHINE_en.pdf for the setup procedure.

4.2 Content Management and Development Lifecycle

Based on the hierarchy of objects, packages, and delivery units discussed above, the repository assists developers during application development, testing, and deployment to the production systems. Figure 6 illustrates these different development phases: The application is developed using front-end tools such as the HANA Web IDE. Whenever the developer hits the save button, the current version of the object is persisted to the repository. When the object is saved for the first time, a new inactive version of the application object is created. This version is private to the workspace of the user, similar to a local working copy of the repository in SVN or Git. Subsequent changes update this private copy.

To test-drive the current version, a developer can choose to activate it. The activation process can be compared to the compilation process for a Java or C-based application. Depending on the type of the objects to be activated (e.g. a SQLScript procedure or a table definition), a corresponding runtime of the repository is called; this runtime deploys the object to the HANA system. For example, the activation process creates the corresponding tables for a table definition in the HANA database, or creates or updates SQLScript procedure definitions in the metadata catalog. All objects handled during activation are checked for both syntactical and (with respect to existing schema-based constraints) semantic correctness. Within this process, the object references discussed in Section 4.1 are used to verify that all objects an object refers to are already present in the system (e.g. the tables a procedure operates on). As multiple objects can be activated with a single repository call, the system also uses references to compute the order in which objects need to be activated.

If the new version of an object that was just activated does not meet the expectations of the developers, the repository aborts the transaction associated with the activation and thus rolls back to the previous active version. This mechanism allows developers to be more courageous in trying out changes in development systems. In addition, the repository has a function to generate a previous software version in production systems from the activation history if unwanted behavior is detected late.

Activated versions of an object also become visible to all other developers. This also implies that conflicts between concurrent changes of developers must be resolved. As indicated in Figure 6, we use an optimistic concurrency control mechanism based on timestamps as follows: Whenever a client wants to modify an active version, it increments the version number (i.e. timestamp), yielding the next expected version for the active version of this object. When a client wants to activate an inactive version, we compare the version of the current active version with the inactive version that is supposed to be activated. If this new version is not exactly the next version, the version used by this client is inconsistent, and the activation is aborted.
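A minimal sketch of the version check described above, assuming a plain integer version counter per object. The `Repository` class, its method names, and the `ActivationConflict` exception are hypothetical and serve illustration only; the actual repository keeps this state in database tables and exposes activation through its REST interface.

```python
class ActivationConflict(Exception):
    """Raised when an inactive version is not the direct successor of the active one."""


class Repository:
    def __init__(self):
        self.active = {}  # object name -> (version number, content)

    def checkout(self, name):
        """Create a private inactive copy that carries the next expected version."""
        version, content = self.active.get(name, (0, ""))
        return {"name": name, "version": version + 1, "content": content}

    def activate(self, inactive):
        """Accept the inactive copy only if it is exactly the next version."""
        current_version, _ = self.active.get(inactive["name"], (0, ""))
        if inactive["version"] != current_version + 1:
            raise ActivationConflict(
                f"{inactive['name']}: expected version {current_version + 1}, "
                f"got {inactive['version']}"
            )
        self.active[inactive["name"]] = (inactive["version"], inactive["content"])


repo = Repository()
alice = repo.checkout("proc")  # both developers start from the same active state
bob = repo.checkout("proc")
alice["content"] = "change by alice"
bob["content"] = "change by bob"

repo.activate(alice)           # first activation succeeds
try:
    repo.activate(bob)         # second one is based on a stale version
    conflict = False
except ActivationConflict:
    conflict = True
print(repo.active["proc"], conflict)
```

No locks are taken while the developers edit; the conflict is only detected at activation time, which is what makes the scheme optimistic.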


A common problem with database applications is dealing with schema evolution. In the repository, we can handle simple schema changes automatically, e.g. adding columns. Complex schema changes, e.g. refactoring a table structure, must be handled by the application developer. Consequently, the application can still process changes as long as schema changes can be automatically handled by the repository. The developer can not only define the structure of a table or view; it is also possible to define data (as CSV files) as part of the application content, and this data is loaded into the database during activation. Typically this is done with configuration or master data. In these scenarios, the developer can more freely deal with changes to the database schema.

Today, the activation process basically checks the structural correctness of the application. For executable development objects like database views or JavaScript code, static checks are also performed. This means that functional and performance testing are realized as separate applications. We typically rely on Jenkins for scheduling tests and Selenium for testing the web-based user interfaces. Dealing with automated testing is part of our ongoing work.

To conclude the description of our development lifecycle, we compare it with other approaches and development tools. The version management of the repository is simpler than tools like Git or Perforce as it only works with one main branch that contains all globally visible changes. We call these versions the active versions of a development object. This matches the best practices advocated by the continuous integration paradigm, where everybody continuously integrates into one mainline branch. Development can still proceed in parallel because changes are applied to local copies (called inactive versions). During activation we apply the changes of one or more objects in an atomic operation. In our experience this simple approach to version management works well because conflicts are rare.

4.3 Architecture and System Integration

The repository functionality is integrated into the HANA database kernel and can be accessed by external tools and clients using a REST-based interface. Internally, the repository consists of the four major components depicted in Figure 7.

Fig. 7 Key Components of the HANA Repository and their Interaction

Persistence and Runtime Plugins

The persistence component is responsible for storing all the artifacts handled by the HANA repository as well as corresponding metadata information in a reliable and efficient manner. All objects are stored in a corresponding column store table, with their content (e.g. the textual representation of a SQLScript procedure) represented by a large object field (LOB). Similarly, information about packages, references between objects, and delivery units is reflected by corresponding columnar tables. This results in a generic persistence of development objects and allows us to support yet unknown types of objects.

As the set of features and capabilities of the SAP HANA platform grows constantly, we need to frequently add new object types to the repository. For example, in the SPS09 release support for the native streaming capabilities has been added. To address the requirement to add new object types with only minimal impact on the system architecture, the repository defines an interface for runtime plugins. Each runtime plugin is responsible for the deployment of one or more object types. Object-specific runtime plugins implement this interface and contain specific code that interprets the content of all objects of a certain type. More specifically, every runtime plugin has to implement a method to derive dependencies of an object to other objects. For example, given the definition of an SQL view, the runtime plugin responsible for SQL views will derive the names of all referenced tables or views. However, unlike in the database catalog, the view definition may also refer to objects in the SAP HANA repository. Furthermore, every runtime plugin implements a method which is called for the deployment of an object during the activation process. This method transforms the definition of a development object into an executable form. For example, the definition of a procedure is parsed and checked for syntactical and semantic correctness, the SQL statements to create, alter or drop the procedure are generated and executed, and finally repository-internal metadata is maintained. As a result, whenever adding support for a new object type to the repository, it is sufficient to provide a new runtime plugin, and no changes to the persistence or other core components are necessary.
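The two responsibilities of a runtime plugin named above, deriving dependencies and deploying an object, can be sketched as an abstract interface with one concrete plugin. All names here (`RuntimePlugin`, `SqlViewPlugin`, `PLUGINS`, `register`) are hypothetical, the dependency extraction is a deliberately naive regular expression rather than a real SQL parser, and `deploy` only returns the statement it would execute.

```python
import re
from abc import ABC, abstractmethod


class RuntimePlugin(ABC):
    """Hypothetical plugin interface: one plugin handles one or more object types."""

    object_types = ()

    @abstractmethod
    def derive_dependencies(self, content: str) -> set:
        """Return the names of the objects this definition refers to."""

    @abstractmethod
    def deploy(self, name: str, content: str) -> list:
        """Turn the definition into executable form (here: a list of SQL statements)."""


class SqlViewPlugin(RuntimePlugin):
    object_types = ("sqlview",)

    # Naive assumption: referenced objects follow FROM or JOIN directly.
    _REFERENCE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

    def derive_dependencies(self, content):
        return set(self._REFERENCE.findall(content))

    def deploy(self, name, content):
        return [f"CREATE VIEW {name} AS {content}"]


# Registry mapping object types to plugins; supporting a new object type
# means registering one more plugin, without touching the core.
PLUGINS = {}


def register(plugin: RuntimePlugin) -> None:
    for object_type in plugin.object_types:
        PLUGINS[object_type] = plugin


register(SqlViewPlugin())

definition = "SELECT o.id, c.name FROM orders o JOIN customers c ON o.cid = c.id"
deps = PLUGINS["sqlview"].derive_dependencies(definition)
statements = PLUGINS["sqlview"].deploy("v_orders", definition)
print(sorted(deps))
print(statements[0])
```

Because the core only talks to the two interface methods, the generic persistence and the activation workflow stay unchanged when a new object type is introduced.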

Versioning and Activation

Whenever a set of objects is to be activated, the repository performs three major steps. First, the reference information between objects is used to topologically sort the objects in order to make sure that all dependencies that can be resolved are handled, i.e. that dependent objects are only activated after the objects they depend on have been handled. We choose a topological order that clusters objects by runtime plugin to reduce the overhead of calls to each runtime. The dependencies could also be used to parallelize the activation process, but we do not exploit this opportunity yet. Second, the type-specific runtime plugins are called for each object. By delegating to the corresponding plugin, the activation code does not need to contain object-specific information, thereby separating the overall activation workflow from object-specific handling code. Finally, depending on the outcome provided by the runtime plugin, activation either exchanges the current active version of an object (if any) with the new version provided by the plugin, or, in case of errors, aborts the activation process and rolls back to the previous state. During successful activation, the old version of an object is also preserved. This makes it possible to roll back to the old version in case of unwanted consequences (e.g. a misbehaving application), which is especially useful in case of emergency on production systems.

Transport

Once an application has reached a mature state and should be rolled out from the development to the production system, the repository provides the infrastructure to do so based on the delivery unit concept. The transport subsystem of the repository allows exporting the delivery unit to a single, self-contained file that can be transferred and imported to a destination production system. Technically, delivery units are tgz archives that contain a manifest file, a set of objects and package definitions, and dependency information of these objects, all serialized to files. At the target system, delivery units are imported as a set of inactive objects and can be activated (see above), thereby replacing a previous version of the same functionality (if any).

Applications developed with SAP’s proprietary development language ABAP may require certain versions of view definitions or SQLScript procedures maintained by SAP HANA’s repository. For these scenarios the transport mechanism of the repository described above is also integrated with the ABAP change and transport system (CTS). Delivery units are attached to an ABAP transport and imported after the ABAP objects contained in that transport.
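The three activation steps described under "Versioning and Activation" can be sketched as follows. This is one possible reading, not the repository's implementation: the ordering is a variant of Kahn's topological sort that prefers a ready object of the same type as the one handled last, which is one simple way to cluster calls by runtime plugin. The function names, the `deploy` callback, and the example objects are hypothetical.

```python
def activation_order(objects):
    """objects: name -> (object type, set of referenced names).

    Returns a dependency-respecting order that keeps objects of the same
    type together where possible. References to objects outside the
    activated set are treated as already resolved.
    """
    pending = {
        name: {dep for dep in deps if dep in objects}
        for name, (_, deps) in objects.items()
    }
    order, last_type = [], None
    while pending:
        ready = sorted(name for name, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("cyclic references between objects")
        same_type = [name for name in ready if objects[name][0] == last_type]
        chosen = (same_type or ready)[0]
        order.append(chosen)
        last_type = objects[chosen][0]
        del pending[chosen]
        for deps in pending.values():
            deps.discard(chosen)
    return order


def activate(objects, deploy, active):
    """Deploy all objects in order; restore the previous state on any error."""
    snapshot = dict(active)
    try:
        for name in activation_order(objects):
            active[name] = deploy(name, objects[name][0])
    except Exception:
        active.clear()
        active.update(snapshot)  # roll back to the previously active versions
        raise


objects = {
    "v_sales": ("view", {"t_orders", "t_items"}),
    "t_orders": ("table", set()),
    "p_report": ("procedure", {"v_sales"}),
    "t_items": ("table", set()),
    "v_items": ("view", {"t_items"}),
}
order = activation_order(objects)
active = {}
activate(objects, lambda name, obj_type: f"{obj_type}:{name}", active)
print(order)
```

In this example both tables are handled first, then both views, then the procedure, so each hypothetical plugin is entered once. The snapshot-based rollback stands in for the database transaction that the repository uses.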


5 Basic Querying and Result Set Delivery Mechanisms

After outlining the deployment capabilities within the HANA platform in the previous sections, we now focus on content (or model) creation as well as the querying and manipulation side of the data platform.

5.1 (Almost) Relational Interfaces and Adapters

Users of the HANA platform interact either in the traditional way of using (declarative) query languages or using graphical modeling and development tools (see Section 3). With the HANA database at its heart, one of the major roles of the HANA platform is to serve as the relational storage backend for SAP's application servers and NetWeaver solutions. In this scenario (and also in other, classical three-tier deployments), standard relational database interfaces such as SQLDBC, ODBC, and JDBC are used to connect the application server to the HANA database system. To facilitate the integration and enable more complex business scenarios, the HANA platform provides different options to pre-define (or model) application artifacts and to provide a wrapper that allows the consumption of those pre-defined models. For example, non-relational functionality such as currency conversion or complex analytical functions (as provided by the HANA predictive analytics library [3]) can be invoked through corresponding procedure calls, with the input and output parameters given in the form of (temporary) tables. Therefore, the tabular structure and the relational model form the foundation of the HANA query processing and modeling framework. In addition, HANA also supports query languages beyond classical SQL to better support business applications in complex scenarios, e.g.:

MDX: The MDX language is used for pre-defined OLAP scenarios, especially considering the role of a dimension or fact table in a data warehouse environment [2, 15]. With MDX one can define complex query constructs which are resolved into multiple SQL statements in traditional ROLAP scenarios. The tight integration of MDX into the HANA engine enables a direct mapping of MDX expressions to internal query plans.

WIPE: WIPE is a proprietary language to process property graph structures stored within the HANA platform [14]. Similar to MDX, a WIPE statement is compiled into internal query plans, additionally exploiting special graph processing operators like graph traversal. WIPE is declarative in nature but fully aligned with the underlying property graph model.

R: The HANA platform provides an interface to an instance of an R server, allowing R scripts to be part of a regular HANA user request. The script is forwarded to the R server and processed against data sets residing in the HANA engine. An efficient coupling mechanism as illustrated in [7] is the basis for executing R scripts against database content.

SQLScript: In addition to plain SQL DML and DDL commands, SQLScript [5] offers a procedural language to express application logic within the database engine. SQLScript also serves as a glue language for query patterns expressed in R or WIPE and provides a mechanism to process result sets in a (beyond) relational form as outlined in the following.

As already discussed, the HANA platform provides a transportable and extensible way to define a database schema. Like in any other relational database environment, the SAP HANA platform allows developers to define tables, roles, and privileges to access objects in the database schema. SAP HANA also offers synonyms, which serve as an alias to access a catalog object that potentially resides in a different database schema. Of course, privileges are checked for the referenced object.

5.2 (Beyond) Relational Result Sets

Besides specialized query languages and data models, a tight integration of application server and data management platform provides room for optimization that goes beyond what traditional interfaces like JDBC or ODBC can provide. In the context of the HANA platform, this includes, but is not limited to, having multiple result sets for a single query, the ability to directly send an externally assembled execution plan to the HANA runtime system, or the use of optimized data transfer for large result sets or query parameter sets, as for example for BW's formula element selection queries [13]. Other examples include the necessity to transfer multi-dimensional results (based on MDX statements) for Business Intelligence tools (e.g. cube structures) or to avoid complex transformations and post-processing on the client side, specifically in light-weight applications using the HANA XS engine.

Figure 8(a) shows a typical example of a result set to be visualized in a client application that is hard to express in SQL in an efficient manner and without redundancy: Here, we want to list all employees with a salary of more than 100.000 EUR, grouping them by location (with a total count by location). Such a (hierarchical) visualization is very common in BI clients, often combined with a tree-like control that allows collapsing some parts of the result set for a better overview. Using a standard interface like JDBC, the result would need to incorporate redundancy to adhere to the relational model, thereby making data transfer more expensive and requiring post-processing on the client side (see Figure 8(b) for an example). For this purpose, besides standard database interfaces, HANA incorporates proprietary extensions for BI and BW solutions that allow for efficient query formulation and result consumption.

(a) Non-Relational Result Representation

    Walldorf  3
        Frederik Kurz    130.000 EUR
        Daniel Ducati    170.000 EUR
        Jessica Franz    230.000 EUR
    Seoul     2
        Chulwon Song     190.000 EUR
        Taehyung Kim     270.000 EUR

(b) Relational Representation with Data Redundancy

    Walldorf  3  NULL            NULL
    Walldorf  3  Frederik Kurz   130.000 EUR
    Walldorf  3  Daniel Ducati   170.000 EUR
    Walldorf  3  Jessica Franz   230.000 EUR
    Seoul     2  NULL            NULL
    Seoul     2  Chulwon Song    190.000 EUR
    Seoul     2  Taehyung Kim    270.000 EUR

Fig. 8 Alternative Result Representations
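The client-side post-processing that the relational representation of Figure 8(b) requires can be sketched in Python. The row layout (location, count, name, salary) and the function name are assumptions chosen to mirror the figure; salaries are stored as plain integers in EUR. The sketch folds the redundant flat rows back into the nested structure of Figure 8(a).

```python
# Flat rows as a standard interface such as JDBC would deliver them (Figure 8(b)).
# The NULL rows carry only the per-location header information.
rows = [
    ("Walldorf", 3, None, None),
    ("Walldorf", 3, "Frederik Kurz", 130000),
    ("Walldorf", 3, "Daniel Ducati", 170000),
    ("Walldorf", 3, "Jessica Franz", 230000),
    ("Seoul", 2, None, None),
    ("Seoul", 2, "Chulwon Song", 190000),
    ("Seoul", 2, "Taehyung Kim", 270000),
]


def nest_by_location(flat_rows):
    """Fold redundant flat rows into one entry per location (Figure 8(a))."""
    nested = {}
    for location, count, name, salary in flat_rows:
        group = nested.setdefault(location, {"count": count, "employees": []})
        if name is not None:  # skip the header rows that only repeat the count
            group["employees"].append((name, salary))
    return nested


nested = nest_by_location(rows)

# Number of transferred values in each representation.
flat_cells = sum(len(row) for row in rows)
nested_cells = sum(2 + 2 * len(group["employees"]) for group in nested.values())
```

In this small example the flat form transfers 28 values while the nested form needs 14, which illustrates why the redundancy makes data transfer more expensive.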

5.3 (Beyond) Relational Model Definition

In addition to traditional DDL mechanisms to define database objects, SAP HANA provides a fairly large number of different methods to create data objects, often referred to as HANA models. From a user interface perspective, the HANA platform offers a web-based development environment (SAP HANA Web IDE), which is complemented by a web-based monitoring tool and an Eclipse-based administration and development tool (SAP HANA Studio). Based on this tool set, the HANA platform offers multiple ways to define views over a set of other database objects. Consistent with database design methods, views shield applications from the physical database schema, serve as entities to define privileges in a consistent way, and are used to create reusable artifacts for database applications. However, in addition to standard SQL views, SAP HANA offers further types of views which give the developer more fine-grained control over the execution of these views. For example, an analytic view specifically addresses queries performed on snowflake schemas [10]; calculation views allow users to model acyclic data flows that contain relational operators but also include non-standard database operations like unit conversion (see footnote 7).

Footnote 7: The most important special case in ERP systems is currency conversion, i.e. interpreting decimal values as Euro and converting them, given a specific date, to US dollars. Interestingly, in the context of a business application, this is not only a semantically complex operation, but it is also very performance critical.

    Approach                   Execution time (s)
    SQL                        13.0
    With calculated columns    10.7
    With analytic view          8.2
    Fully modeled               2.9

Table 1 Performance impact of modeling

In Table 1 we illustrate the performance improvement possible with these fine-grained modeling capabilities using a scenario from the domain of high-energy physics [9]. In that application we implement the typical analysis workflow used in high-energy physics, where particle physicists employ Monte-Carlo generators to produce millions of simulated events and compare them with the results of the detector experiments. Going from top to bottom in the table, we model more and more parts of the analysis explicitly using views and other advanced features of the SAP HANA query processor:

SQL: Execution with one complex SQL query in the SAP HANA database. The SQL statement contains complex arithmetic expressions, correlated nested queries and multiple query parameters.

With calculated columns: SQL execution using precalculated columns instead of expression evaluation. This is beneficial because the experimental data is almost never updated after it was loaded into the SAP HANA database.

With analytic view: SQL execution using calculated columns and an analytic view for the most expensive, innermost computation.

Fully modeled: Fully modeled SQL execution using nested analytic and calculation views.

The runtimes summarized in Table 1 clearly show how the HANA modeling capabilities help to reduce the overall query runtime. The fastest alternative using SAP HANA is about 4.5 times faster than the pure SQL-based implementation (13.0 s versus 2.9 s).

Figure 9 shows how business rules can be defined on top of HANA tables or views using a decision table. The intuitive interface allows even non-experts to define such business rules. Upon deployment, the business rules are compiled into the same runtime representation that is used by calculation views.

Fig. 9 Graphical Interface to Model Business Rules using a Decision Table

In a similar fashion, editors (either Eclipse- or web-based) are available for other artifacts, e.g. SQLScript procedures, table definitions, or user privileges. These editors follow the common workflow for creating, updating, deploying and transporting development artifacts. For each of the development artifacts mentioned above, the Eclipse-based SAP HANA Studio offers a dedicated user interface to create, verify, deploy, and delete the artifact. Depending on the type of artifact, either a graphical, form-based or textual interface is available. Graphical interfaces are mainly used for working with view definitions; form-based interfaces are available for defining highly structured development artifacts like roles, users or tables. Like the graphical interfaces, the form-based interface relieves the user from knowing all details of the underlying syntax, and hence makes this functionality more easily accessible to casual and business users.

Besides the graphical definition of HANA views, HANA also provides a more expressive data modeling and querying language, CDS (Core Data Services), in order to "model" more complex business object structures beyond pure relational tables. In a nutshell, CDS mainly introduces structured types and

associations as first class citizens for developers [2]. Supported types consist of "built-in" primitive types (like string, integer, decimalfloat, date) and custom-defined simple and structured types. For example:

    type Derived : String(111);
    type AddressType : String(7) enum { home; business = 'biz'; }
    type Structured {
        descr : Derived;            // reusing a custom-defined type
        amount : Decimal(10,2);
        // calculated element
        grossAmount : Decimal(10,2) = amount * (1.00 + taxrate());
        kind : AddressType default home;
    }

Based on the type system, CDS allows the definition of entities to define the objects to be persisted, plus associations between entities including cardinality annotations. For example:

    entity Address {
        owner : Association to Employee;   // can be used for :m associations
        key streetAddress : String(77);
        key zipCode : String(11);
        city : String(44);
    }

    entity Employee {
        addresses : Association[0..*] to Address via backlink owner;
        // using XPath-like filter definitions
        homeAddress = addresses[kind=home];
        orgunit : Association to OrgUnit;
    }

    entity OrgUnit {
        name : String(111);
        costcenter : String(44);
        manager : Association to Employee;
        parent : Association to OrgUnit;
    }

Although CDS provides many more features to enrich the data model, for example with additional metadata, predicated attributes, and parametrized views, we highlight the querying side of CDS models. CDS-QL leverages the enhancements provided by the data models (e.g. structured types and associations), thus improving conciseness and comprehensiveness of query statements in application code. For example, the CDS-QL statement

    SELECT name[first='Kim'],
           salary.value,
           orgunit.costcenter
    FROM Employee;

corresponds to a SQL query with an explicit join of tables Employee with OrgUnit for the cost center values:

    SELECT e.name, e."salary.value", ou.costcenter
    FROM Employee e JOIN OrgUnit ou
    ON e.orgunitID = ou.ID
    WHERE e.first = 'Kim';

As can be seen, CDS-QL also introduces "shortcuts" for filter expressions and many other syntactic enhancements to improve the design and usage of registered data models.

6 Summary and Outlook

In this paper we describe the architecture of the SAP HANA platform, which extends the scope of the HANA database with tools and services to develop, deploy, and host web-based database applications. The core of the development platform is the SAP HANA repository, a set of services offered by the HANA database that deal with storing development objects, versioning of those objects, and the activation and deployment of these objects. The HANA repository is complemented by the XS engine, an application server that is tightly integrated with the HANA database and which focuses on web-based database applications that realize their server-side logic mainly in JavaScript. From a tool perspective, database administrators can choose between Eclipse-based desktop environments or web-based monitoring and development tools.

Today, web-based applications on the SAP HANA platform are mainly hosted in on-premise scenarios. With the SAP HANA Cloud Platform we are rapidly extending the scope of the HANA platform towards cloud offerings where the development, deployment and consumption of web-based database applications is completely cloud-based. An advantage of the described architecture is that on-premise and cloud deployments are equally well supported. For example, Fiori applications, similar to the one presented in this paper, can be developed with web-based tools, but they can still be deployed as an on-premise solution.

References

1. SAP Fiori for SAP Business Suite. http://help.sap.com/fiori (2014)
2. SAP HANA Developer Guide. http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf (2014)
3. SAP HANA Predictive Analysis Library. http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf (2014)
4. Blakeley, J.A., Rao, V., Kunen, I., Prout, A., Henaire, M., Kleinerman, C.: .NET Database Programmability and Extensibility in Microsoft SQL Server. In: Proc. SIGMOD, pp. 1087–1098 (2008)
5. Binnig, C., May, N., Mindnich, T.: SQLScript: Efficiently analyzing big enterprise data in SAP HANA. In: BTW (2013)
6. Färber, F., May, N., Lehner, W., Große, P., Müller, I., Rauhe, H., Dees, J.: The SAP HANA Database – an architecture overview. IEEE Data Eng. Bull. 35(1), 28–33 (2012)
7. Große, P., Lehner, W., Weichert, T., Färber, F., Li, W.S.: Bridging two worlds with RICE – integrating R into the SAP in-memory computing engine. Proc. VLDB 4(12), 1307–1317 (2011)
8. Große, P., May, N., Lehner, W.: A study of partitioning and parallel UDF execution with the SAP HANA database. In: SSDBM, p. 36 (2014)
9. Kernert, D., May, N., Hladik, M., Werner, K.: From static to agile - interactive particle physics analysis using analytical views in the SAP HANA DB. In: DATA (2015)
10. Legler, T., Lehner, W., Ross, A.: Der Einfluss der Datenverteilung auf die Performanz eines Data Warehouse. In: BTW, pp. 502–513 (2007)
11. May, N., Lehner, W., P., S.H., Maheshwari, N., Müller, C., Chowdhuri, S., Goel, A.: SAP HANA – from relational OLAP database to big data infrastructure. In: EDBT (2015)
12. Meijer, E.: The world according to LINQ. Commun. ACM 54(10), 45–51 (2011)
13. Nagel, K.: BW-on-HANA and the "FEMS". http://www.saphana.com/community/blogs/blog/2013/05/15/bw-on-hana-and-the-fems (2013)
14. Rudolf, M., Paradies, M., Bornhövd, C., Lehner, W.: The graph story of the SAP HANA database. In: BTW, pp. 403–420 (2013)
15. Vassiliadis, P., Sellis, T.: A survey of logical models for OLAP databases. SIGMOD Rec. 28(4), 64–69 (1999)