Database Access With Automated MVC, a New Web Services Model
H. Paul Zellweger
ArborWay Labs
Rochester, MN 55901 USA
617.864.1040
[email protected]
Jason Thompson
ArborWay Labs
Rochester, MN 55901 USA
617.864.1040
[email protected]
ABSTRACT
The paper introduces a new web services model for creating database applications on the internet. The automated MVC redirects web services away from today’s focus on object-oriented languages to take full advantage of data patterns in the RDBMS. In effect, it shifts our attention away from all languages, including Java and SQL, to a newly discovered pattern of data relations in the database called the Aleph. This simple parent/child data relationship has, until now, remained hidden from view. Its discovery takes database applications to a deeper level that is entirely data driven. The automatic relational technology or ART Studio, our authoring system, models the Aleph to generate the two essential parts of every database application: 1) end-user access to information in the database, and 2) a window interface that displays this information. The Studio adheres to the “configuration, not coding” principle. The Aleph is a compact, yet remarkably powerful data object. It embodies an IF-THEN logic between parent and child data; it also represents a tree structure. These properties enable the Studio to generate decision trees automatically. End-users navigate these trees to pinpoint information in the database. The Studio also uses the Aleph to create a window interface, using a WYSIWYG editor, that transfers data between one or more database tables and its fields. In effect, this window can display, create, and update data in more than one table without using any JOIN operations. To distribute its decision trees and windows, the Studio generates files that encode Aleph data and its models. Internally, these files use its parent/child structures in a dual capacity: 1) lists of child data serve as options in its decision trees, and 2) embedded pointers on each child link to the next state, either another list or a window interface. Therefore, the Studio automates not only the construction of the model, but aspects of the controller as well.
So far, the Studio’s visualizations include nested lists of data topics, known as a “Database Taxonomy,” and geographic maps. To expand this new web database service, an integration tool for other types of visualization is underway. By operating directly at the data level, the ART Studio automates database access on the web: no coding is required, and the object-relational mismatch is averted altogether, saving time and expense.
General Terms
Languages, Theory.
Categories and Subject Descriptors
• Information systems~Relational database model
Keywords
Aleph data relation, named set theory, DBMS, database interface, MVC, web services.
1. INTRODUCTION
We begin the description of our proposed alternative to an MVC implementation by briefly reviewing some fundamentals of today’s MVC design pattern. The pattern has three main components in client-server-like web applications of database content, as shown in Figure 1 below. The pattern’s objective is the separation of concerns [1]. Thus, developers implementing the view can focus solely on the user interface, and not on the application’s business logic and data management, while developers of the model layer do not need to be concerned with how the data will be rendered. There is optionally a fourth component, the so-called object-relational mapping layer, which is discussed below.
In general, and within the context of web applications, the model layer is implemented with an object-oriented programming language, such as Java [2]. The objects in the model layer encapsulate the application’s business logic and usually interface with some sort of relational database [1]. Data in objects in the model layer and data in a relational database are handled in conceptually different ways. Relational databases separate data and data behavior. Object-oriented programming languages encapsulate, and therefore combine, the data and its behavior [3]. This difference is the so-called object-relational impedance mismatch [2]; for the sake of brevity, the details of this mismatch are omitted here.
There are several techniques available to provide object persistence in the model layer. It can be realized with low-level SQL queries that are hand-written in the object-oriented framework and passed on to the relational database via a driver, such as an open database connectivity (ODBC) driver [3]. More advanced approaches use object-relational mapping (ORM) to abstract, to varying degrees, the details of SQL queries, and thus shield the developer from these concerns when implementing the model layer in an object-oriented programming language [3]. Technologies such as Hibernate in Java [4] provide a means to configure the mapping of objects in the model layer to entities in the relational database. There are several advantages to an ORM approach, including the ability to map to any sort of relational database. Moreover, because the mapping of objects to entities is configured, less code is needed to interface the model layer with the database [3]. However, it still requires developers to have a good understanding of the schema and structure of the database. And even with the availability of higher-level query languages, such as Hibernate’s HQL [5], the developer will still be tasked with generating SQL to persist objects in the model layer.
The fundamental objective of this new web services model is to automate the construction of the model and its controller. It also eliminates the need to construct by hand any SQL queries needed to access the database. And lastly, it reduces the design and coding needed to custom-build visualization options, should the existing coding-free visualizations not be sufficient. To achieve these objectives, the new automated MVC model opens up an entirely new area of database research, namely modeling relational data. Before Zellweger’s discovery of the Aleph data relationship [6], this research area was an unknown frontier. To make this point, he argues that tree structures on the computer, like their counterparts on paper, are recursively defined by the Aleph, a mechanically constructed parent/child data relation [7].
This study of data relations started with a visualization tool called the “Database Taxonomy” [8][9]. Its end-user interface is depicted in Figure 2. Using algorithms on models of data, this end-user interface transforms relational data into nested lists of data topics. End-users navigate down its menus to pinpoint information in the database. The taxonomy interface was the first application to model the Aleph data relation. Models of the Aleph evolved from a branch of pure mathematics developed by Mark Burgin called named set theory, presented in the next section. This mathematical theory was instrumental not only in isolating the Aleph’s uniform pattern, but also in developing models that could represent tree structures and data networks. In the taxonomy interface, these data relationships represent decision trees that enable end-users to “browse and explore” data throughout the database, going from one table to another. The authoring system for the taxonomy interface eventually evolved into the automatic relational technology (ART) Studio, the database application tool described in Section 4.
Historically, the database research community has been acutely aware of the lack of progress with the database interface [p. , 10]. They report that the last major advance was Query By Example (QBE) [11], a database interface first developed over four decades ago. Since then this interface has evolved into the visual query model [12]. However, its underlying technology is based on constructing SQL statements which, when more than one table is involved, include JOIN operations that cause performance problems. A second interface model, based on the “browse and search” approach in information retrieval [13], has recently emerged. An example is the multifaceted browsing interface of [14]. The advantage of this approach is that it encourages end-users to explore the content of the database. However, its interfaces rely extensively on internal models designed and built by hand to augment searches. Consequently, this newer technology is labor intensive and costly to implement.
2. BURGIN’S MATHEMATICAL THEORY
Burgin’s named set theory injects mathematical rigor into the study and structure of a name [15]. Research into the nature of names goes back to Frege’s distinction between a reference and its meaning. To adapt this idea for structures today, Burgin introduces a third component, a connection between the reference and its meaning that is implied by a name. He calls this new triadic structure the fundamental triad [p. 40, 16].
His study of names focuses on structures in general, and on mathematical structures in particular. In mathematics, all mathematical structures are defined by a standard triplet. When mathematicians name a new object, they refer to this triplet as a technical reference. By introducing a connection between this triplet and its mathematical name, Burgin effectively puts all mathematical structures and their names on an equal footing. This subtle, yet enormously important adjustment creates a new universal perspective on these abstract details. Equipped with this new vantage point, Burgin has undertaken an exhaustive inventory of mathematical structures to study their patterns for new levels of abstraction. In advanced algebraic systems, for example, Burgin notes a prevalence of chaining patterns [p. 445-412, 16]. He predicts that similar chaining patterns exist in other dense mathematical settings, like a relational database running on a computer¹. The B-Z chain presented at the end of the next section confirms his conjecture.
3. THE ALEPH DATA RELATION
The Aleph’s uniform pattern was first found in individual database tables. Zellweger reduced the table to its essential components: a pair of attributes, one input and the other output, and a retrieval operation that employs them. In formal terms, this pairing of two table attributes corresponds to Burgin’s algorithmic named set [p. 42, 16]. In theory, it consists of an algorithm $A$, an input set $X$, and a set $Y$ of outputs. In notation, it is $\mathbf{A} = (X, A, Y)$. In the database table, this named set models the data mapping between the input and output attributes. The algorithm, or rules, that binds these two attributes together is an SQL SELECT statement. Zellweger names this newly formed mathematical structure a binary attribute relation or BAR, $(A_{input}, \text{rules}, B_{output})$, where $A_{input}$ and $B_{output}$ are attributes in table $R$, and the rules are the algorithm that executes a SELECT statement.
More specifically, the rules that bind input data to output data take the form of a well-formed SELECT statement called a BAR query. Its details, listed below, explicate how data condition $v$ self-references the input domain $A_{input}$ to return output data $t$. The algebra in this expression allows its unknown output to include one or more data values, regardless of the database application.
$$(\,\text{SELECT } B_{output} \text{ FROM } R \text{ WHERE } A_{input} = v\,) \rightarrow t$$
where $v \in A_{input}$; $t \subset B_{output}$; $A_{input} \subset R$ and $B_{output} \subset R$.
The next model is called the B-Z chain. It aggregates the Aleph and shows that it has scaling symmetry. In its most basic form, the B-Z chain accounts for a data network in a single table. Each link in the chain depicts a binary attribute relation that models the Alephs between two table attributes, say (COLOR, SIZE). In the chain, these links are interrelated: the output attribute of one link becomes the input attribute of the next link, as in (COLOR, SIZE), (SIZE, SHAPE), (SHAPE, WEIGHT).
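As a minimal sketch of the BAR query and a single-table B-Z chain, the following Python code runs one SELECT per link against an in-memory SQLite table. The table `parts`, its rows, and the helper names `bar_query`, `walk_chain`, and `make_demo_db` are hypothetical and chosen only for illustration; the DISTINCT and ORDER BY clauses are likewise assumptions added so that each result reads as a clean option list.

```python
import sqlite3

# Links of a single-table chain: the output of one link is the input of the next.
CHAIN = [("COLOR", "SIZE"), ("SIZE", "SHAPE"), ("SHAPE", "WEIGHT")]


def make_demo_db():
    """Build a small in-memory table R (here called parts) with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (COLOR TEXT, SIZE TEXT, SHAPE TEXT, WEIGHT REAL)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?, ?, ?)",
        [
            ("red", "small", "cube", 1.0),
            ("red", "large", "sphere", 2.5),
            ("blue", "small", "cube", 1.2),
            ("red", "small", "sphere", 0.8),
        ],
    )
    return conn


def bar_query(conn, table, a_input, b_output, v):
    """(SELECT b_output FROM table WHERE a_input = v) -> t, returned as a list."""
    # Identifiers cannot be bound as parameters, so they are quoted here;
    # only the data condition v is passed as a bound value.
    sql = (
        f'SELECT DISTINCT "{b_output}" FROM "{table}" '
        f'WHERE "{a_input}" = ? ORDER BY "{b_output}"'
    )
    return [row[0] for row in conn.execute(sql, (v,))]


def walk_chain(conn, table, chain, selections):
    """Return the option list produced at each link for a sequence of selections."""
    steps = []
    for (a_input, b_output), v in zip(chain, selections):
        steps.append(bar_query(conn, table, a_input, b_output, v))
    return steps


if __name__ == "__main__":
    db = make_demo_db()
    for options in walk_chain(db, "parts", CHAIN, ["red", "small", "cube"]):
        print(options)
```

In this sketch each step conditions only on the value selected in the previous step, so every link is an independent single-table SELECT and no JOIN is issued.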
In more complex chains, adjacent links connect one table to another. In this case, both attributes are keys. This type of attribute pairing corresponds to the pairs of primary and foreign keys that link two tables. At the data level, it reveals a second type of Aleph data relation called a table link. The recursive algorithm that steps through the chain separates display data from table links when transforming relational data into menu data for a decision tree. This recursive logic is presented in pseudocode in [p. 26, 7].
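The idea of a recursive walk that keeps key data out of the menu can be illustrated with a short Python sketch. It is not the pseudocode of [7]; it is a simplified reading under stated assumptions: a hypothetical two-table schema (`publisher`, `book`), a hypothetical `Link` record with a `display` flag, and a table link modeled by reusing a primary key value as the foreign key value of the next link.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    table: str
    a_input: str
    b_output: str
    display: bool  # False marks key data that is passed along but never shown


# Hypothetical chain: country -> publisher name -> pub_id (key) -> book title.
# The hop from the second to the third link crosses tables through the key pair.
CHAIN = [
    Link("publisher", "country", "name", display=True),
    Link("publisher", "name", "pub_id", display=False),
    Link("book", "pub_id", "title", display=True),
]


def make_demo_db():
    """Create two small tables joined only by a primary/foreign key pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE publisher (pub_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE book (book_id INTEGER PRIMARY KEY, pub_id INTEGER, title TEXT);
        INSERT INTO publisher VALUES (1, 'Acme', 'US'), (2, 'Birch', 'US'), (3, 'Cedar', 'UK');
        INSERT INTO book VALUES (10, 1, 'Graphs'), (11, 1, 'Trees'), (12, 2, 'Sets');
        """
    )
    return conn


def bar_query(conn, link, v):
    """Single-table SELECT for one link; no JOIN is used."""
    sql = (
        f'SELECT DISTINCT "{link.b_output}" FROM "{link.table}" '
        f'WHERE "{link.a_input}" = ? ORDER BY "{link.b_output}"'
    )
    return [row[0] for row in conn.execute(sql, (v,))]


def build_menu(conn, chain, v, depth=0):
    """Recursively turn chain results into nested menu data."""
    if depth == len(chain):
        return []
    link = chain[depth]
    menu = []
    for value in bar_query(conn, link, v):
        children = build_menu(conn, chain, value, depth + 1)
        if link.display:
            menu.append({"label": value, "children": children})
        else:
            # Key data is hidden: its descendants are spliced into this level.
            menu.extend(children)
    return menu


if __name__ == "__main__":
    print(build_menu(make_demo_db(), CHAIN, "US"))
```

The key values returned by the second link never appear as labels; they serve only to carry the walk from `publisher` to `book`.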
4. THE ART STUDIO
This authoring system creates and manages models of the Aleph to generate database applications automatically. It streamlines application development both internally and through its overall interface design. Throughout the Studio, context-sensitive navigation forms configure the system, its modeling capabilities, and its output. “Point and click” widgets collect settings throughout its interface. In configuration forms that require complex, detailed information, a step-by-step approach guides the designer through the process by displaying only one set of pertinent options at a time. When all the sections of a form are complete, the system determines the next appropriate form in the sequence.
The principal design goal of the Studio reflects the aspirations of all MVC design patterns: “configuration, not coding.” In this system, there are two levels of configuration. At one level, a database administrator or DBA grants password access to the Studio, to its external databases and their connections, and to a local data dictionary of their schema labels. At the application level, a developer or designer uses the view of the database granted by the DBA’s configuration. He or she uses the Studio to configure the database application and to generate data files for its visualization.
In the dictionary, the DBA controls access to the database at the table and attribute levels. He or she also has access to translation services in the dictionary that convert technical-looking schema labels, such as “PUB_$Name,” into end-user labels that have more semantic meaning, in this case simply “Name.” Both the Studio and all of its end-user interfaces display these end-user labels in place of the original schema names. And lastly, the dictionary generates metrics on the target database, including a catalog of its table links.
As mentioned earlier, the Studio uses the Aleph to automate database application development.
Every database application has two essential parts. One enables end-users to locate information in the database. In this instance, the Studio uses models of the Aleph to generate decision trees. At the bottom of each decision path the interface links seamlessly to an information window, the other essential part of every application. As the reader will see shortly, Aleph models are central to their automation as well. The Studio constructs models of the Aleph by algorithm or by context-sensitive navigation forms. Regardless of how it is formed, the system treats each Aleph model like a Lego building block that connects to other models. The modularity of these models allows the Studio to construct decision trees that merge data topics generated by algorithm alongside others added by hand. This type of decision tree is called a hybrid. Trees that only deliver “live” data at runtime are called persistent. And when the Studio compiles menu data for a decision tree at a scheduled interval, its display data is static.
In the literature, generating decision trees from databases is not new. The Studio, however, breaks new ground by constructing decision trees based exclusively on data relations, notably the Aleph, as opposed to attribute relations like [17]. The Studio’s techniques take a data-centric approach because, Zellweger believes, relational data is far richer and more expressive than schema relations. As a case in point, the authors contend that the Aleph is a primitive object-oriented design pattern. It encapsulates data as well as a method on this data. In particular, its parent/child data relation expresses an IF-THEN logical relationship between parent and child data. When the Aleph is embedded in a file format, it is a spatial pattern that indicates child data always follows parent data.
The Aleph is also a tree structure. Consequently, the Studio can and does model the Aleph to automate the construction of its decision trees. With B-Z chains, these trees represent data networks that span from one database table to any other in the system. More to the point, its ability to model data networks in the database is unrestricted and unbounded. Yet, in a nested list interface like the one depicted in Figure 2, after keyed data is separated from display topics, this data topic network functions perfectly as an interactive decision tree².
¹ This prediction was made in a phone conversation between the author and Mark Burgin.
² Zellweger contends that the form of this decision tree is based entirely on applied predicate calculus.
4.1 END-USER ACCESS TO THE DATABASE
To accommodate usability issues, including problems with scalability, the Studio generates three different types of network structures: 1) Ambient, 2) Category filtered, and 3) Index structured. Each one addresses a different end-user requirement or need:
• Ambient - This decision tree enables end-users to browse and explore relational data by freely traversing its natural flow. The B-Z chain models this logical flow in a step-by-step fashion that avoids JOIN operations altogether.
• Category filtered - The content of this decision tree alternates between types of data and their values. One layer depicts a series of categories derived from end-user labels in the Studio’s dictionary. Each category term, in turn, links to a list of data values in the database. This alternating pattern enables end-users with incomplete information (say they know the year and month of something, but not its day of the week) to narrow their search in a systematic fashion. End-users employ trial and error only after all known variables have reduced the search field.
• Index structured - This decision tree breaks up long, unruly lists of topics into shorter intervals to address scalability issues. The Studio can generate one or more index layers that link to each range based on the index configuration, either top-down or bottom-up. This feature enables an end-user to pinpoint a particular item in a list of 1.2M entries within four selections; for instance, lists of about 34 options at each of four levels cover 34^4 = 1,336,336 items.
4.2 THE INFORMATION WINDOW
Designers use the Studio’s WYSIWYG editor to create window templates. They use only “point and click” and “drag and drop” techniques. No coding is required. The editor consists of a window palette that approximates the HTML-generated rendering of the information window. A side panel of the palette displays available field resources, including related tables and their attributes as well as site-specific fields such as logos and social media icons. The dictionary and window configuration determine the availability of these resources. The designer drags an attribute off the resource panel and drops it on the palette to create a window field. These fields include text and numeric data types, as well as blobs for images and links to images.
A model of the Aleph links each window field to an individual table attribute. These models support data transfer between the field and the table attribute in both directions. They create a two-way linkage that enables window fields not only to display table data, but to create and update table data as well.
The design palette renders a responsive HTML window for a wide range of devices. Furthermore, its WYSIWYG editor includes a comprehensive toolkit. Designers can add text fields, separators, and titled panels, as well as alignment tools to improve the overall layout and appearance of a window. Most importantly, the designer can “preview” a window to see exactly how HTML renders its details at runtime.
5. THE AUTOMATED MVC MODEL
The ART Studio generates a set of distribution files that simplify the original MVC stack presented in Figure 1. In our automated MVC model, depicted in Figure 3 below, there is a dramatic reduction in the number of its components and in their details. One such simplification merges the model stack layer with its controller. Another simplification, brought about by distributing Aleph data objects to the Model/Controller layer, is the use of uniform file formats to store its details. This, in turn, allows generic program logic, instead of complex dedicated programs, to process end-users’ selections and to present new data content for the next visualization state.
Each Aleph file type has its own format. Data files store nested Aleph data objects for static decision trees to conserve server resources. In contrast, model files store one or more B-Z chains to construct decision trees that display “live” data. The generic program logic at the Model/Controller layer processes both file types, sometimes even within the same decision tree. In this fashion, each file type is orthogonal to the other files in the distribution set.
As indicated earlier, the Aleph is really an object design pattern. It encapsulates data, a parent data symbol and one or more child data symbols, along with a spatial method of binding parent data to child data. In an Aleph data file, a block of one or more child data always comes immediately after a parent field. Consequently, this use of adjacent space as a form of logical implication is one way the Studio generates distribution files that simplify program logic.
Another way the Studio reduces the complexity of program logic on the server can be found in the data and model files themselves. For instance, the Studio always adds detailed information to each Aleph data object. This includes, amongst other things, the actual file location of a child’s next descendant. This simplifies demands on the program logic that traces out this lineage. It also simplifies the logic required to determine the next visualization because, in this setting, one state is logically chained to the next, either by data or by models.
Of course, the underlying assumptions made about database applications and about relational data also contribute to this simplification. After all, end-user access and information windows hypothetically reduce the number of possible visual states down to two. Therefore, the next state is either another list or a window. And by avoiding language-based objects altogether, the program logic on the server is no longer dedicated to these details, including entity or POJO associations. Consequently, this shift from language-based objects to data-based objects allows the controller to handle relational data in a more generic fashion.
By redirecting our focus to relational data and its object design patterns, this new MVC model has a profound impact on today’s MVC framework options. First and foremost, it enables this new web service to provide a wrapper around both the model and its controller. More importantly, this new wrapper shields the developer from all of its implementation details. And lastly, this new model eliminates even the consideration of object-relational mapping or ORM because it is simply not required any more.
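To make the role of adjacency and embedded pointers concrete, the following Python sketch reads a parent record, the child block that follows it, and resolves a selection to one of the two visual states. The text encoding (`P|...` and `C|...` records), the field layout, the window identifiers, and the function names are all hypothetical; the source does not specify the Studio’s actual file formats.

```python
from dataclasses import dataclass

# Hypothetical encoding, one record per line:
#   P|<parent label>|<number of children>
#   C|<child label>|<kind>|<target>
# A child block always follows its parent record immediately. For kind "list"
# the target is the line number of the next parent record; for kind "window"
# it is a window identifier.
DATA_FILE = """\
P|Publishers|2
C|Acme|list|3
C|Birch|list|6
P|Acme|2
C|Graphs|window|book:10
C|Trees|window|book:11
P|Birch|1
C|Sets|window|book:12
""".splitlines()


@dataclass
class Child:
    label: str
    kind: str
    target: str


def read_block(lines, offset):
    """Read the parent record at offset and the child block right after it."""
    tag, parent, count = lines[offset].split("|")
    if tag != "P":
        raise ValueError(f"line {offset} is not a parent record")
    children = []
    for line in lines[offset + 1 : offset + 1 + int(count)]:
        _, label, kind, target = line.split("|")
        children.append(Child(label, kind, target))
    return parent, children


def next_state(lines, offset, choice):
    """Generic step: map a selection to the next state, a list or a window."""
    _, children = read_block(lines, offset)
    for child in children:
        if child.label == choice:
            break
    else:
        raise KeyError(choice)
    if child.kind == "list":
        # Follow the embedded pointer to the next parent and return its options.
        position = int(child.target)
        _, options = read_block(lines, position)
        return ("list", position, [c.label for c in options])
    return ("window", child.target, [])


if __name__ == "__main__":
    print(next_state(DATA_FILE, 0, "Acme"))
    print(next_state(DATA_FILE, 3, "Trees"))
```

Because every child carries its own pointer, the same `next_state` function serves every level of the tree, which is the sense in which the program logic is generic. Under the same assumption, a model-file entry for “live” data could be handled by adding a further kind to this dispatch; that extension is not shown.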
6. WEB DATABASE SERVICES
The ART Studio generates a complete set of distribution files for making database content available on the internet. The structure of these database applications, like all Studio components, is orthogonal. One part of the application enables end-users to “browse and explore” database content to locate information. A second part, a window, transparently links to end-user selections to display the information they found. A designer can configure the Studio’s distribution files to support these two visualizations using different server settings. The Studio can compile distribution files to conserve server resources and improve performance. It can also distribute model files to display “live” data from a database at run time. And lastly, the designer can configure the Studio to achieve aspects of both settings.
At its core, the Studio deploys Aleph data to generate decision trees that furnish browse and explore access, allowing end-users to pinpoint the exact information they need. In this respect, the Aleph represents the digital analog of organizing information in a book. In deep hierarchical structures, it functions exactly like a back-of-the-book index. But in shallow structures, or in a subtree of a deep tree, it can also function like a “table of contents” in the front of a book. Currently, the ART Studio supports only two access visualizations: the taxonomy interface and geographic maps. The former metaphorically corresponds to a book index, while the latter is more like a table of contents. Yet, the same decision tree can use both visual interfaces to access database content. This flexibility lays the foundation for a web database service that could interface to a wide range of visualizations, including ones that display information in different forms.
7. CONCLUSION
The paper introduces a new web services model called automated MVC. This new technology shifts our attention away from today’s object-oriented programming language implementations to a newly discovered data relation in relational data called the Aleph. It is a mechanically derived parent/child data relationship that is captured, modeled, and managed by the ART Studio. The Aleph is exquisitely similar to a programming language object. Not only does it have attributes, a parent data symbol and one or more child data symbols, but its spatial layout in silico embodies an IF-THEN method that binds parent data to its child data. By exploiting this similarity to programming language objects, the automated MVC model automatically generates not only the model but also the program logic for its controllers.
The ART Studio is a proof of concept of rapid database application development. Future research on the automated MVC model will focus on three critical areas: security, optimizing ambient networks, and developing open source distribution files for other visualizations. In terms of security, runtime database access still needs rigorous architectural and system-based protections. Next, ambient data networks model SQL queries; however, by adding further SELECT conditions to their stepwise configuration, their navigation process could be simplified. In terms of integration tools, new features will be added to the Studio to integrate other forms of visualization with its distribution files.
ACKNOWLEDGMENTS
The authors thank Mark Burgin for all his generous support over the years and with this project.
8. REFERENCES
1. Carrano, Frank M. and Janet Prichard. 2006. Data Abstraction & Problem Solving with Java. Pearson.
2. Freeman, Eric, Elizabeth Freeman, Kathy Sierra, and Bert Bates. 2004. Head First Design Patterns. Sebastopol, California: O’Reilly Media. pp. 549-50.
3. Hoffer, Jeffrey A., Ramesh Venkataraman, Heikki Topi. 2012. Modern Database Design. Pearson. pp. 14-3 – 14-8, 14-10, 14-12.
4. Red Hat Projects: Hibernate. 2017. http://hibernate.org/, accessed on February 4, 2017.
5. Red Hat Projects: Jboss. 2017. https://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/queryhql.html, accessed on February 3, 2017.
6. Zellweger, Paul. 2016. The Aleph Data Relation, A Tree Within a Tree Visualization. Visual Data Analysis (VDA’16). San Francisco, California. January 2016. pp.1-1(1).
7. Zellweger, Paul. 2016. Tree Visualizations in Structured Data Recursively Defined by the Aleph Data Relation. 20th International Conference Information Visualization (IV'16). Lisbon, Portugal. July 2016. pp. 21-26.
8. Zellweger, Paul. 2011. A Knowledge Visualization of Database Content Created By A Database Taxonomy. 15th International Conference Information Visualization (IV'11). London, UK. July 2011. pp. 323-328.
9. Zellweger, Paul. 2005. A Database Taxonomy Based on Data-driven Knowledge Modeling. Knowledge Intensive Multi-Agent Systems (KIMAS’05). IEEE Catalog Number 05EX1033. Waltham, MA. April 2005. pp. 469-474.
10. Gray, Jim, Hans Schek, Michael Stonebraker, Jeff Ullman. 2003. The Lowell report. Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California. pp. 740-752. [doi>10.1145/872757.872873]
11. Zloof, Moshe. 1977. Query-by-example: a data base language. IBM Systems Journal, v.16 n.4, December. pp. 324-343.
12. Catarci, Tiziana, Maria F. Costabile, Stefano Levialdi, Carlo Batini. 1997. Visual Query Systems for Databases: A Survey. Journal of Visual Languages & Computing, Volume 8, Issue 2, April 1997. pp. 215-260.
13. Hearst, M.A. 2009. Search User Interfaces. Cambridge, United Kingdom: Cambridge University Press.
14. Dakka, Wisam, Panagiotis G. Ipeirotis, Kenneth R. Wood. 2005. Automatic construction of multifaceted browsing interfaces. 14th ACM International Conference on Information and Knowledge Management (CIKM’05), October 31-November 05, 2005, Bremen, Germany. pp. 768-775.
15. Burgin, Mark. 1990. Theory of Named Sets as a Foundational Basis for Mathematics. In Structures in Mathematical Theories, San Sebastian. pp. 417-420.
16. Burgin, Mark. 2011. Theory of Named Sets. Hauppauge, New York: Nova Science Publishers.
17. Chakaravarthy, Venkatesan T., Vinayaka Pandit, Sambuddha Roy, Pranjal Awasthi, Mukesh Mohania. 2007. Decision trees for entity identification: approximation algorithms and hardness results. Principles of database systems (PODS’07), June 2007, Beijing, China. pp. 53-62.