Using SQL Queries to Evaluate the Design of SQL Databases

Erki Eessaar, Janina Voronova
Department of Informatics, Tallinn University of Technology, Akadeemia tee 15A, 12618 Tallinn, Estonia
[email protected], [email protected]

Abstract-The system catalog of a database with explicit schemas contains, among other things, information about the structure of the database. Queries based on the system catalog allow us to search for occurrences of database design antipatterns (database design flaws). In this paper, we present the results of an evaluation of a set of SQL databases. We used the queries that were presented in the previous paper on this topic. A goal of the research is to further experimentally evaluate the queries. We present findings about the queries as well as about the evaluated databases. In addition, we propose more questions about the design of conceptual schemas of SQL databases that can be answered by querying their system catalogs. The use of the queries would allow us to partially automate the process of evaluating the structure and constraints of existing databases and detecting design flaws.

I. INTRODUCTION

There is a persistent problem that many existing databases have various design flaws. These make it more difficult to maintain and extend the databases as well as to develop, maintain, and extend applications and utilities that use the databases. Catalogs of database design patterns or antipatterns, which describe good and bad/questionable database design practices, respectively, help the database design community to systemize information about design practices, teach it more effectively, and take steps towards automating the detection and resolution of database design problems.

The previous paper on this topic [1] proposed a set of SQL queries for detecting possible occurrences (instances) of SQL database design antipatterns in existing SQL databases. Each such antipattern describes violations of some SQL database design principle/heuristic. The paper analyzed 12 database design antipatterns [2] and proposed in total 14 queries for 11 antipatterns. Eight of the analyzed antipatterns were about logical design and four were about physical design. The queries are based on the Information Schema views, which are themselves created based on the base tables of the system catalog. We are interested in queries based on the Information Schema because the structure of the views in the information_schema schema has been standardized by the SQL standard [3] and hence there is a better chance that the queries are usable in case of different database management systems (DBMSs). PostgreSQL ™ 9.2 [4] was used as the basis to write and test the queries because PostgreSQL provides the information_schema and we have experience in using the DBMS. The query statements [1] can be found in the file: http://staff.ttu.ee/~eessaar/files/Design_flaws_queries.pdf

According to the ANSI-SPARC architecture, a database has one or more external views, one conceptual view, and one internal view [5]. All of these are described by means of schemas. The SQL standard [6] specifies database objects that constitute external and conceptual schemas. Hence, it would be possible to use the Information Schema views to find answers to questions about the external and conceptual schemas of databases. The antipatterns [2] and their detection queries [1] deal with the design problems of conceptual schemas.

The first goal of the paper is to experimentally evaluate the antipattern-detection queries [1] by using them to evaluate a set of existing databases. This was not extensively done in the previous paper [1] due to space restrictions. The second goal of the paper is to propose additional questions about the design of conceptual schemas of SQL databases that can be answered by querying their system catalogs. The third goal is to discuss possible application areas of the queries.

Reference [1] and the current paper offer a different but complementary approach for evaluating the quality of the conceptual schema of an existing database compared to a previous paper [7] on this topic. Reference [7] suggested a semiotic approach for using a conceptual data model of a database to evaluate the semantic quality of its conceptual schema. In the present approach, we do not use conceptual/logical/physical data models but rely on the specification of the database schemas that is stored internally by a DBMS as well as on the actual data that has been registered based on the conceptual schema. Queries based on the system catalog allow developers to better comprehend the schemas of a database and as such are tools to improve the pragmatic quality of the database schemas.
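To make the idea of catalog-based queries concrete, the following sketch runs a query against a system catalog. The paper's queries target PostgreSQL's standardized information_schema views; as a self-contained stand-in, this example uses SQLite's catalog table sqlite_master, and all table names are invented for illustration.

```python
import sqlite3

# In-memory database with two invented base tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT)")

# List all user-defined base tables, analogous to querying
# information_schema.tables WHERE table_type = 'BASE TABLE' in PostgreSQL.
rows = conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' ORDER BY name"
).fetchall()
print([r[0] for r in rows])  # ['person', 'project']
```

The same pattern, pointed at information_schema views instead of sqlite_master, underlies all of the antipattern-detection queries discussed in this paper.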
By using some of the queries one can identify missing constraints or places where a declarative approach for enforcing constraints can and should be used instead of an imperative approach. Declaratively enforced constraints in databases are the means through which DBMSs get information about the semantics of data in the databases and are also able to reflect this information to the users and evaluators of the conceptual schemas. The results of the queries are a basis for improving the semantic quality of the conceptual schema so that there would be a better correspondence between the stakeholders’ knowledge about the domain of the database and its conceptual schema. This could increase the perceived completeness [7] of the conceptual schema without reducing the perceived validity [7] of the schema.

The rest of the paper is organized as follows. Firstly, we explain the experiment for investigating existing databases. Secondly, we offer the main findings about the queries and the databases. Thirdly, we present further questions about the design of the conceptual schemas of SQL databases that can be answered by querying their system catalogs. After that, we discuss possible application areas of the queries. Finally, we conclude and point to future work on the current topic.

II. A RESEARCH EXPERIMENT TO EVALUATE THE ANTIPATTERN-DETECTION QUERIES

In [1] it was already shown that the queries can produce false-positive and false-negative results and hence their results point to possible problems in schemas rather than confirm with certainty that there is or is not a design problem. Because the use of the queries does not eliminate the need for experts to review the schemas, we are especially interested in the frequency and causes of false-positive results because these clutter the results and decrease the usability of the queries.

We needed databases for the experiment. The first author teaches courses about database design and programming at Tallinn University of Technology (Estonia), mostly to bachelor students of informatics and business information technology. Students have to create database projects and finally implement the database by using a server SQL DBMS. One of the possible server SQL DBMSs that students can use is PostgreSQL ™ (another is Oracle Database ™). Students are free to select the information system for which they design a database. We used a set of students’ databases for the experiment. We did not want to evaluate only the best or only the worst presented databases and hence we evaluated all the PostgreSQL ™ databases that were presented at the end of the 2012 fall semester.
In total, we evaluated 41 students’ databases. 49 projects were presented in the period and only eight projects did not use PostgreSQL ™ at all.

The use of student projects has its limitations. The size of the databases was quite small (the average number of project-related base tables in the databases was 10.9). The total number of evaluated base tables was 447. During the design process the databases were repeatedly manually reviewed by the academic and many of the design problems were resolved as a result. During the study process many of the design problems and their solutions referenced in [2] were explained. The importance of integrity constraints and the advantages of enforcing constraints declaratively were stressed, and the use of constraints was also required in the project. Then again, the used CASE tool (IBM Rational Rose ™ [8]) does not support automatic detection of database design problems. In addition, some students do not pay enough attention to the explanations of lecturers and the requirements of projects or misunderstand these. Hence, there is still a possibility that their database designs have problems. Regardless of these limitations, the databases allow us to evaluate how many false-positives the queries produce and what the reasons for the false-positive results are.

Fig. 1. Conceptual data model of a software system that can be used to execute queries for evaluating the design of a database.

The system catalog of a SQL database is based on the underlying data model of SQL. Hence, the queries can be used to evaluate the base tables of the system catalog. We used the queries to evaluate the schemas of the PostgreSQL ™ 9.2 system catalog. There are 50 base tables in the schema pg_catalog and 7 in the schema information_schema. For this task we had to adjust the queries to find information about schemas other than public that have the owner postgres.

Of course, one can execute the queries one by one in a database by using a general-purpose database management tool. However, it would be much more efficient to be able to select a database, select desired queries, and execute the selected queries based on the selected database. Therefore, we suggest a specialized software tool for executing the queries. An initial version of the tool was implemented for the PostgreSQL ™ DBMS. The tool requires a separate PostgreSQL database. The base tables that the tool internally needs could be created based on the conceptual data model in Fig. 1. It is possible to access and evaluate all the PostgreSQL databases on the server where the tool resides. Users would have to log in by using the username/password of a database user who has enough privileges to query the databases that they want to investigate. Categories and tests are unordered and ordered sets of queries, respectively. An example of a category is “Database design antipatterns”. An example of a test is “Checking of student projects in the course X”. Surely it would be possible to execute all the queries belonging to a category one after another based on the selected database. However, tests allow us to specify sets of queries that belong to different categories.
It would be possible to select a database, select a test, and let the system execute all the queries that belong to the test, in the order specified in the test, based on the selected database.

III. THE RESULTS OF THE EXPERIMENT

A. Observations About the Antipattern-Detection Queries

The numerical values in this section are based on the 41 students’ databases. There were 81 cases (triples of a database, an antipattern, and a query) where a query pointed to the presence of one or more occurrences of an antipattern in a database. However, after manual checking of the database specifications only 23 cases (28.4%) turned out to include real occurrences of the antipattern. In eight cases (out of 23) the query result pointed to real as well as false-positive occurrences of the antipattern.

In general, the queries are based on the existence or lack of database constraints, data in the base tables whose design we evaluate, or names of database objects. Although many queries use a combination of these aspects, it is possible to state the main aspect for each query. In case of mainly constraint-based queries there were five cases that consisted only of false-positive results. In case of mainly data-based queries there were 19 cases that consisted only of false-positive results. In case of mainly name-based queries there were 34 cases that consisted only of false-positive results.

Two queries produced the largest number of false-positive results. One of them takes into account data in the base tables whose design we evaluate. It is used to detect occurrences of the antipattern “Format Comma-Separated Lists”. It detects implementations of multi-valued attributes by searching for columns with the VARCHAR or TEXT type that contain values that themselves contain separation characters like “,” or “;”. The query detected possible occurrences of the antipattern in 20 databases. However, only one of the databases contained a real occurrence of the antipattern. A main reason for the false-positive results is that columns contain textual data that represents objects with a complex internal structure. This happened at least once in 18 databases. The main reason was that the identified column contains data about postal addresses. A heuristic is to ignore columns whose names contain the word address. Another main reason for the false-positive results is that columns contain ordinary sentences like free-form comments. This happened at least once in 15 databases. A heuristic is to ignore columns whose names contain the words description or comment.
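The comma-separated-list check with the name-based exclusion heuristics described above can be sketched as follows. This is only an illustration: the paper's query runs over PostgreSQL's information_schema, while this self-contained version uses SQLite catalog PRAGMAs, and the table and column names are invented.

```python
import sqlite3

# Column-name words that suggest legitimate free-form or structured text
# (the heuristics discussed above).
SKIP_WORDS = ("address", "description", "comment")

def suspect_columns(conn):
    """Return (table, column) pairs of textual columns whose values
    contain separation characters, skipping heuristically excluded names."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        for cid, name, ctype, *_ in conn.execute(f"PRAGMA table_info({t})"):
            if ctype.upper() not in ("TEXT", "VARCHAR"):
                continue
            # Heuristic: ignore columns whose names suggest free-form text.
            if any(w in name.lower() for w in SKIP_WORDS):
                continue
            n = conn.execute(
                f"SELECT count(*) FROM {t} "
                f"WHERE {name} LIKE '%,%' OR {name} LIKE '%;%'"
            ).fetchone()[0]
            if n > 0:
                hits.append((t, name))
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, "
             "tags TEXT, description TEXT)")
conn.execute("INSERT INTO product VALUES (1, 'red,blue', 'A nice, cheap item')")
print(suspect_columns(conn))  # [('product', 'tags')]
```

Note that the description column also contains a comma but is not reported, which is exactly the effect of the exclusion heuristic: fewer false positives at the price of possible false negatives.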
The third reason for false-positive results is that columns contain names that themselves contain separation characters. This occurred in two databases. In one case a column contained the names of election constituencies and in another case the names of diagnoses. A heuristic is to ignore columns with names like constituency. The suggested sets of names can be modified if one gets more information about the naming practices in the databases that will be evaluated.

Another query that produced a lot of false-positive results takes into account the column names of the evaluated base tables. It is used to detect occurrences of the antipattern “Leave Out the Constraints”. It searches for missing referential constraints by finding pairs of columns of different base tables where the names and types of the columns are the same and there is no referential constraint that connects these columns. In each pair, at least one of the columns is a primary key column or a unique column of a base table. The query (query A) detected possible occurrences of the antipattern in 36 databases. However, only six of the identified databases contained one or more real occurrences of the antipattern. In case of five of these databases there were also false-positive results.

Another query (query B) for detecting the antipattern, which searches for base tables that are not connected with any other base table through a referential constraint, detected the same occurrences and also pointed towards two more missing referential constraints in another database. Query A did not find these because the referencing (foreign key) and primary key columns had different types. On the other hand, query A detected one missing constraint that query B did not detect because only one, not all or most, of the constraints were missing.

One of the main reasons for the false-positive results is the implementation of a generalization hierarchy that occurs in a conceptual data model. The resulting base tables can contain columns with the same name and type if the class table inheritance or concrete table inheritance [2] designs or the PostgreSQL-specific table inheritance feature [4] is used to implement the hierarchy. This happened at least once in 25 databases. However, such an effect on the results of the query should not be a reason to avoid these designs. It is difficult to find a good heuristic if the Information Schema does not provide explicit information about the existence of generalizations behind the base tables. Another reason for false-positive results is the existence of columns in different base tables that have the same name and type but are meant to record data with different semantics. This happened at least once in 24 databases. For instance, Organization and Service do not have a common supertype in the conceptual data model but both could have the attribute name. The third reason for false-positive results is the existence of multiple one-to-one or one-to-many relationships out from the same base table, resulting in multiple base tables with referencing columns that have the same name and type. However, there should not be referential constraints between these columns. This happened at least once in 13 databases. A heuristic is to ignore pairs of columns that are both referencing columns and that both reference the same base table.
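The core of query A, which flags same-name, same-type column pairs that lack a referential constraint, can be sketched as below. The names (client, invoice, client_id) are invented; the paper's actual query joins Information Schema views rather than using SQLite PRAGMAs, which stand in here so the sketch is self-contained.

```python
import sqlite3

def missing_fk_candidates(conn):
    """Flag columns that share name and type with another table's primary
    key column but are not the referencing column of any foreign key."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    cols = {}    # table -> {column: (declared type, is primary key)}
    fks = set()  # (child table, referencing column)
    for t in tables:
        cols[t] = {name: (ctype, pk > 0) for
                   _, name, ctype, _, _, pk in
                   conn.execute(f"PRAGMA table_info({t})")}
        for row in conn.execute(f"PRAGMA foreign_key_list({t})"):
            fks.add((t, row[3]))  # row[3] = referencing column name
    hits = []
    for t1 in tables:
        for c1, (ty1, pk1) in cols[t1].items():
            if not pk1:
                continue
            for t2 in tables:
                if t2 == t1:
                    continue
                info = cols[t2].get(c1)
                if info and info[0] == ty1 and (t2, c1) not in fks:
                    hits.append((t2, c1, t1))
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, "
             "client_id INTEGER)")  # no FOREIGN KEY declared
print(missing_fk_candidates(conn))  # [('invoice', 'client_id', 'client')]
```

The generalization-hierarchy and coincidental-name cases discussed above would surface here as extra tuples, which is precisely how the false positives of query A arise.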
The suggested heuristics would decrease false-positive results but would increase false-negative results.

To summarize, different queries have different probabilities of false-positive results. Specialized software for executing the queries could display the level of confidence that the results do not contain false-positives. The confidence information could be stored in the query database (see Fig. 1) by using a classifier that characterizes the queries. Queries that probably cause more false-positives could be hidden from less experienced users. Moreover, the possibility to sort the results of individual queries based on different columns would be useful in dealing with a large number of false-positive results.

There are many reasons why data in a particular column could resemble data that is recorded in a column due to an occurrence of a database design antipattern. Declared constraints in a database help DBMSs as well as human investigators to understand the meaning of data in the database. If there are no declared constraints, then it is difficult for everyone to understand the meaning of the data. In this case, incorrect data can easily be recorded in the database. In natural language people can use different terms to refer to the same concept as well as use the same term to refer to different concepts. Finding good and expressive names for database objects is important and certainly simplifies understanding of the database schema. However, names cannot be the only basis for understanding the semantics of data in the database.

The least susceptible to false-positive results are queries that are based only on the declarations of constraints in databases. However, in case of the query for detecting occurrences of “Always Depend on One’s Parent”, we got false-positive results in two databases. The query searches for referential constraints where the referencing table and the referenced table are the same. PostgreSQL ™ allows multiple referential constraints with the same name in the same schema if these constraints are associated with different base tables. In this case the query will give incorrect results because it cannot unambiguously determine the referencing and the referenced table. This information is not available through a single Information Schema view and we have to find the result by joining different views based on the names of referential constraints.

B. Observations About the Students’ Databases

By using the queries, we detected at least one real occurrence of a database design antipattern in 13 databases (out of 41; 31.7% of the databases) – in total 8 different antipatterns. The most frequently occurring antipattern was “Leave Out the Constraints”, which means missing referential constraints. It manifested in seven databases. The following problems occurred at least once in two databases (not necessarily the same databases): the list of permitted values in a column was specified in a check constraint (antipattern “Specify Values in the Column Definition”) and a hierarchical structure was implemented in a base table by adding a foreign key that refers to a candidate key of the same table (antipattern “Always Depend on One's Parent”).
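The "Always Depend on One's Parent" check itself (finding referential constraints whose referencing and referenced table coincide) can be sketched as follows. The table names are invented; as noted above, the paper's PostgreSQL version must join several Information Schema views to pair each constraint with both tables, whereas this SQLite stand-in gets both from one PRAGMA.

```python
import sqlite3

def self_referencing_fks(conn):
    """Return (table, column) pairs where a foreign key of the table
    references the same table (a hierarchical/self-referencing design)."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        for row in conn.execute(f"PRAGMA foreign_key_list({t})"):
            if row[2] == t:  # row[2] = referenced table name
                hits.append((t, row[3]))  # row[3] = referencing column
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER REFERENCES category (id))")
print(self_referencing_fks(conn))  # [('category', 'parent_id')]
```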
In addition, in one database the double precision type was the declared type of columns (antipattern “Use FLOAT Data Type”) for recording quantities of products and quantities of money. The queries did not detect real occurrences of the antipatterns “Assume You Must Use Files”, “Clone Columns”, “Use a Generic Attribute Table”, and “Use Dual Purpose Foreign Key” in any evaluated database.

C. Observations About the System Catalog of PostgreSQL

By using the queries, we found possible occurrences of six antipatterns in the PostgreSQL ™ 9.2 system catalog [4].

“Clone Columns”: Base table pg_statistic of schema pg_catalog, which stores statistical data about the contents of the database, has columns stakind1…stakind5 (the values “indicate the kind of statistics stored in the Nth “slot”” [4]), staop1…staop5, stanumbers1…stanumbers5, and stavalues1…stavalues5.

“Format Comma-Separated Lists”: Base table pg_aggregate of schema pg_catalog, which stores information about aggregate functions, has column agginitval, where each value is the external string representation of the initial value of the transition state. These values consist of comma-separated components. Although this is not an occurrence of the antipattern, the query also reveals that the pg_ts_dict base table of pg_catalog, which contains entries defining text search dictionaries, has column dictinitoption that contains information about both language and stop words. These values could be put into different columns.

“Leave Out the Constraints”: Referential constraints are not enforced in the base tables of the system catalog. Poorly tested DBMS modules could violate referential integrity. Reverse engineering of the system catalog does not reveal relationships.

“Specify Values in the Column Definition”: Base tables sql_features, sql_packages, and sql_parts of information_schema contain column is_supported, which is based on the domain yes_or_no. It uses varchar(3) as the base type and has a check constraint that determines the possible values “yes” and “no”. PostgreSQL ™ supports the type Boolean that could be used as the base type of yes_or_no.

“Use a Generic Attribute Table”: Base table pg_largeobject of schema pg_catalog for holding “large objects” resembles the key-value structure of the antipattern. The design is caused by the internal design of the DBMS. Table pg_attribute of schema pg_catalog contains columns attoptions and attfdwoptions for recording key-value pairs. The query did not refer to the columns but referred to the table due to its name.

“Use FLOAT Data Type”: There are in total four base tables in pg_catalog that have in total six columns of type real. The columns could have the decimal type and column reltuples of pg_class could in our view even have type bigint.

PostgreSQL uses OIDs (Object Identifiers) as the primary keys of system tables [4]. This is an occurrence of the “One Size Fits All” antipattern. However, the corresponding query did not reveal it because data about these primary key columns is not provided through the Information Schema.

IV. SOME ADDITIONAL QUESTIONS ABOUT THE DESIGN OF CONCEPTUAL SCHEMAS OF SQL DATABASES

If we want to find answers to questions about the internal schema of a database, then we have to query unstandardized base tables of the system catalog because the Information Schema does not provide this information. However, there are plenty of tools for advising developers how to change the internal schemas. Moreover, some DBMSs are capable of self-tuning internal schemas [9]. Unfortunately, tools for verifying the conceptual schemas of databases are much less common. We are aware of Parasoft ™ [10], which offers some commercial tools for database conceptual schema verification.

Next, we present some additional questions about the conceptual schemas of SQL databases, whose answers help us to characterize databases and possibly identify their design flaws. The questions can be answered by querying the system catalog. We have created all the queries in PostgreSQL ™ 9.2 [4]. We do not present the query statements here due to space restrictions. We do not claim that this is the final list of useful questions and encourage more research in this field.

Which base tables are without any unique constraints and primary keys? The answer identifies base tables permitting duplicate rows, which cause various practical problems [11].
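The first question above can be sketched as follows. The paper's query would use information_schema views such as table_constraints; this self-contained version reads SQLite's catalog PRAGMAs instead, and the table names are invented.

```python
import sqlite3

def tables_without_keys(conn):
    """Return base tables that have neither a primary key nor any
    unique constraint, and therefore permit duplicate rows."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        has_pk = any(pk > 0 for *_, pk in
                     conn.execute(f"PRAGMA table_info({t})"))
        has_unique = any(row[2] == 1 for row in
                         conn.execute(f"PRAGMA index_list({t})"))
        if not (has_pk or has_unique):
            hits.append(t)
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (sensor TEXT, value REAL)")  # no key
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, label TEXT)")
print(tables_without_keys(conn))  # ['measurement']
```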

The next four questions help us to identify base tables where some integrity constraints are possibly missing. The reason could be that some requirements have not been identified or that the requirements have been incompletely translated into the database design. Each not null constraint is a check constraint. However, the queries that are created to answer these questions should not take not null constraints into account.

Which base tables are without any associated (directly or through domains) check constraints?

Which base tables have at least two columns of date or timestamp (with or without time zone) type and have no check constraints involving two or more of these columns? Such columns mean that we want to record data about events or processes, which often have a certain order. Hence, the values in these columns must be in a certain order in case of each row of such a table. For instance, the end of a meeting cannot be earlier than the beginning of the meeting. Such constraints cannot be associated with a SQL domain.

Which base table columns are not referencing columns of any referential constraint and are without any associated (directly or through domains) check constraints?

Which base tables have a surrogate key as the primary key and do not have any additional unique constraints? When defining a surrogate primary key in a base table, it is a common mistake to forget to declare the existing natural keys.

What is the number of triggers in the database? Triggers can be used to enforce complex integrity rules. Hence, the number of triggers gives an indication of how many complex integrity rules have been enforced at the database level.

The following three questions help us to identify the extent of using NULLs in SQL base tables. NULLs could lead to wrong answers to certain queries [11].

Which referencing columns of base tables permit the use of NULLs? Among other things one has to check whether such columns are consistent with the conceptual data model.
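The kind of constraint the date/timestamp question looks for, a table-level check enforcing event order, is illustrated below. The table and column names are invented; SQLite is used here only to make the example self-contained, so the timestamps are stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A check constraint spanning two temporal columns: the meeting cannot
# end before it begins (the constraint the question searches for).
conn.execute("""
    CREATE TABLE meeting (
        id INTEGER PRIMARY KEY,
        begin_at TEXT NOT NULL,
        end_at   TEXT NOT NULL,
        CHECK (end_at >= begin_at)
    )""")
conn.execute("INSERT INTO meeting VALUES (1, '2013-01-10 09:00', "
             "'2013-01-10 10:00')")  # valid: end after begin
try:
    conn.execute("INSERT INTO meeting VALUES (2, '2013-01-10 09:00', "
                 "'2013-01-10 08:00')")  # invalid: end before begin
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

A base table with two temporal columns but no such check constraint would be flagged by the corresponding catalog query as a place where a constraint is possibly missing.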
What is the percentage of optional columns (that permit NULLs) in case of each base table?

Which optional base table columns do not have a default value for preventing NULLs in the database?

What are the pairs of base tables that have at least two columns with the same name and data type? Such tables might violate the principle of orthogonal design [5] and hence might facilitate uncontrolled data redundancy over different tables.

In addition, queries based on the system catalog can calculate the values of database metrics [12]: Table size, Depth of relational tree of a table T, Referential degree of a table T, Size of a schema, Depth of referential tree, and Referential degree. All the table-level measures are about base tables.

Reference [1] and the current paper are about evaluating the design of SQL databases with the help of queries based on their system catalog. However, one could use queries based on the system catalog in case of DBMSs with different data models. The prerequisite is that it must be possible to explicitly define database (external, conceptual as well as internal) schemas at the database level so that information about these would be stored in the system catalog by the DBMS. If one uses NoSQL systems to create schemaless data stores [13], then one has to review application code to evaluate the implicit schema scattered amongst the code that accesses the data.

V. POSSIBLE APPLICATION AREAS OF THE QUERIES

The queries facilitate partial automation of the evaluation of the quality of the conceptual schemas of existing databases. The larger the number of schema elements, the bigger the advantage the queries offer. They would be helpful in the context of database evolution because they allow interested parties to quickly evaluate the design of a database and find out what, if any, database refactorings are needed. Such automation would of course be useful in case of large (in terms of schema size) legacy databases with no or limited documentation. The use of the queries would be helpful even if documentation does exist because a) there could be discrepancies between the models and the implementation and evaluation of the actual database gives the best information about its possible design problems, b) reverse engineering of the schemas by using a CASE tool is not necessarily needed, c) agile modelling promotes the use of the simplest tools like whiteboards, where the use of such automation at the model level would be impossible, and d) not all CASE tools allow execution of queries based on models. However, if a CASE tool supports querying database design models, then many of the proposed queries can be translated to queries based on models. An advantage of queries based on an actual database is that they make it possible to take into account data in the evaluated tables. However, missing constraints raise the possibility of incorrect data. An advantage of querying models is that mappings between analysis and design models can be used to find out whether the design models are complete and valid in terms of the captured requirements to the system [7].
The queries help us to investigate the quality of the end product (database) instead of the quality of the production of the product (database design process). It is not the most effective way to improve the quality [14] but it is what we can do if we have a ready-made database. We should ensure that in the future development process tries to avoid mistakes that we discover in the existing databases. If we integrate the use of the queries (continuous feedback) to the development process, then it improves the process quality and helps developers to create end products (databases) with fewer design flaws. The queries are also useful in the context of teaching/learning database design. A survey [15] showed that 80% of surveyed academics who taught databases used practical work for assessment. It was the most widely used assessment type in case of databases. If the students can use a learning software environment with an intuitive and userfriendly user interface to execute the queries, then it increases the immediacy of feedback about their practical work and allows them to work at their own pace. Preferably the software should be web-based to avoid installation and configuration of the system and allow users to access the system regardless of

their location and their computing resources. If a learning environment where the queries are executed would be able to store statistics about the query results, then it would give information about the frequency of database design problems. Academics can take this into account and adjust their ongoing and future courses accordingly. In addition, the statistics could reveal whether the work is based on trial and error or whether students consciously apply the design knowledge. A possibility to extend the software is to show information about better design solutions in case of identified possible design problems. Moreover, if an academic has to review many projects or the results of smaller practical tasks, then the queries will give quick overview of answers and one can concentrate attention to the most problematic parts of the answers. If the number of students increases, then supporting systems like this would allow academics to avoid simplification/reduction of assessed tasks. There are many systems for automatic assessment of assignments of imperative programming [16] or automatic assessment of assignments to write SQL statements [17]. We are currently not aware of systems for automating assessments of database design assignments. The system of queries cannot determine the grade automatically due to the possibility of false positive and false-negative results of queries. However, it would be a useful assistant to the academics who have to grade the work. In our view, involvement of academics in grading is not something to be abolished because it makes students sense human involvement in the assessment process and hence feel that their work effort is valued. It is possible to find values of software measures by using queries. These queries would give numeric results that characterize schemas but are less useful in determining the quality of a schema or the grade of a student that should depend on the quality of the schema. For instance, Piattini et al. 
[18] conclude “that the number of foreign keys in a relational database schema is a solid indicator of its complexity”. Increasing complexity reduces maintainability. However, it could be that a database has been created by using anchor modeling [19]. Although such a database contains a large number of base tables in sixth normal form, there are also views, based on the base tables, that are in a lower normal form, and the conceptual and external schemas have probably been generated from an anchor model. In this case the maintainability of the schemas is quite good because, for instance, it is easy to extend the conceptual schema: one always adds new tables instead of altering existing tables. In the context of assessing students’ projects, such queries can be used to count the numbers of different types of database objects. The result would be useful if there were requirements on the minimum numbers of different types of database objects (for instance, a database must contain at least ten base tables).
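Both kinds of queries can be written against the Information Schema views. The following sketches are illustrative examples in the spirit of the paper, not queries taken from [1]; the schema name 'public' is an assumption that must be adjusted to the evaluated database:

```sql
-- Measure-computing sketch: count the base tables of a schema
-- (e.g., to check a requirement of at least ten base tables).
SELECT COUNT(*) AS base_table_count
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_type = 'BASE TABLE';

-- Measure-computing sketch: count the foreign keys of a schema,
-- a proposed indicator of schema complexity [18].
SELECT COUNT(*) AS foreign_key_count
FROM information_schema.table_constraints
WHERE table_schema = 'public'
  AND constraint_type = 'FOREIGN KEY';

-- Antipattern-detecting sketch: base tables that lack a PRIMARY KEY
-- constraint, a commonly flagged design problem.
SELECT t.table_name
FROM information_schema.tables AS t
WHERE t.table_schema = 'public'
  AND t.table_type = 'BASE TABLE'
  AND NOT EXISTS (
    SELECT 1
    FROM information_schema.table_constraints AS tc
    WHERE tc.table_schema = t.table_schema
      AND tc.table_name = t.table_name
      AND tc.constraint_type = 'PRIMARY KEY');
```

Because the structure of the information_schema views is standardized [3], such queries should require little or no change to run on different DBMSs that provide the Information Schema.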

VI. CONCLUSIONS

We presented the results of an experimental evaluation of queries for detecting possible occurrences of SQL database design antipatterns. We discovered various reasons why some of the queries would give false-positive results. Most susceptible are queries that apply heuristics based on the names of database objects or on the data in the evaluated base tables. Despite that, the queries were able to detect different problems in the evaluated databases, and in our view they would be a useful tool for creating and managing databases as well as for teaching/learning database design. Their use would not hinder the implementation of innovative systems but would allow developers to be better informed. For instance, the queries allowed us to quickly detect some possible design problems in the quite large PostgreSQL ™ 9.2 system catalog. We also proposed more questions about database design that can be answered by querying the system catalogs of SQL databases. Future work should include more evaluations of the proposed queries, the creation of audience-specific software systems for executing the queries, and the identification of additional design problems whose occurrences can be detected by querying the system catalogs of databases.

REFERENCES

[1] E. Eessaar, “On Query-based Search of Possible Design Flaws of SQL Databases,” in Proc. Int. Conf. on Systems, Computing Sciences & Software Engineering (SCSS 12), in press.
[2] B. Karwin, SQL Antipatterns. Avoiding the Pitfalls of Database Programming, The Pragmatic Bookshelf, 2010, pp. 15-155.
[3] IWD 9075-11:201?(E) Information technology — Database languages — SQL — Part 11: Information and Definition Schemas (SQL/Schemata). 2011-12-21.
[4] “PostgreSQL 9.2 Documentation,” [Online]. Available: http://www.postgresql.org/docs/
[5] C.J. Date, An Introduction to Database Systems, 8th ed. Boston: Pearson/Addison Wesley, 2003.
[6] IWD 9075-2:201?(E) Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation). 2011-12-21.
[7] E. Eessaar, “On Using a Semiotic Quality Framework to Evaluate the Quality of Conceptual Database Schemas,” in Proc. Int. Conf. on Systems, Computing Sciences & Software Engineering (SCSS 10), pp. 103–115.
[8] “Rational Rose,” [Online]. Available: http://www.ibm.com/developerworks/rational/products/rose/
[9] S. Lightstone, T. Teorey, and T. Nadeau, Physical Database Design. The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More. Elsevier, 2007, ch. 12.
[10] “Parasoft,” [Online]. Available: http://www.parasoft.com/
[11] C.J. Date, SQL and Relational Theory. How to Write Accurate SQL Code. O’Reilly, 2009, ch. 4.
[12] M. Piattini, C. Calero, H. Sahraoui, and H. Lounis, “Object-Relational Database Metrics,” L'Object, 2001.
[13] M. Fowler. (2013, January 7). Schemaless Data Structures. [Online]. Available: http://martinfowler.com/articles/schemaless/
[14] D.L. Moody, “Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions,” Data & Knowledge Engineering, vol. 55, pp. 243-276, 2005.
[15] J. Carter and J. English, “How Shall we Assess This? ITiCSE'03 Working Group. Initial survey results,” [Online]. Available: http://www.cs.kent.ac.uk/people/staff/jec/assess/stats.html
[16] P. Ihantola, T. Ahoniemi, V. Karavirta, and O. Seppälä, “Review of Recent Systems for Automatic Assessment of Programming Assignments,” in 2010 Proc. Koli Calling Conf., pp. 86-93.
[17] S. Dekeyser, M. de Raadt, and T.Y. Lee, “Computer assisted assessment of SQL query skills,” in 2007 Proc. ADC Conf., Volume 63, pp. 53-62.
[18] M. Piattini, C. Calero, and M. Genero, “Table Oriented Metrics for Relational Databases,” Software Quality Journal, vol. 9, pp. 79–97, 2001.
[19] L. Rönnbäck, O. Regardt, M. Bergholtz, P. Johannesson, and P. Wohed, “Anchor modeling - Agile information modeling in evolving data environments,” Data & Knowl. Eng., vol. 69, pp. 1229-1253, Dec. 2010.