Relational Database Systems 1 Wolf-Tilo Balke Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
7 SQL
• SQL
  – Design Decisions
  – SQUARE & SQL
  – DDL
    • CREATE SCHEMA
    • CREATE TABLE
    • ALTER TABLE
    • DROP TABLE
  – DML
    • INSERT INTO
    • UPDATE
    • DELETE
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
7.1 Theory to Implementation
• Edgar F. Codd established the relational model in the research community during the early 1970s
  – Based on set theory
  – A relation is a subset of the Cartesian product of domains
• Early query “languages” for the relational model were
  – Relational Algebra
  – Tuple Relational Calculus
  – Domain Relational Calculus
• Question: How to build a working database management system using this theory?
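The set-theoretic definition above can be made concrete with a small sketch. The domains and tuples below are hypothetical and chosen only for illustration; the point is that a relation is one particular subset of all possible value combinations.

```python
from itertools import product

# Hypothetical domains, chosen only for illustration.
names = {"Heinz", "Clara"}
birth_years = {1990, 1991}

# The Cartesian product contains every possible (name, year) combination.
cartesian = set(product(names, birth_years))

# A relation is any subset of that product; a set cannot hold duplicate tuples.
students = {("Heinz", 1990), ("Clara", 1991)}

print(students <= cartesian)  # subset test
print(len(cartesian))         # number of possible tuples
```

Because `students` is a Python set, inserting the same tuple twice would leave it unchanged, which mirrors the duplicate-free nature of a mathematical relation.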
7.1 Theory to Implementation
• System R was the first “real” relational database system (starting 1973)
  – Most design decisions taken during the development of System R substantially influenced the design of subsequent systems
• Questions
  – How to store and represent data?
  – How to query for data?
  – How to manipulate data?
  – How do you do all this with good performance?
7.1 Theory to Implementation
• The challenge of System R was to create a usable system
  – Theory is nice, but developers were willing to sacrifice theoretical details for the sake of usability
• Vocabulary change
  – Table: represents a relation
  – Row: represents a tuple
  – Column: represents an attribute / domain
  – The old mathematical names were just too abstract for most people
    • Think of an index card!
7.1 Theory to Implementation
• Data model decisions: During the development of System R, two major and very controversial decisions were taken
  – Allow duplicate tuples
  – Allow NULL values
• Those decisions are still heavily discussed today
7.1 Theory to Implementation
• Duplicate Tuples
  – In a relation, there cannot be any duplicate tuples
  – Also, query results cannot contain duplicates
    • Remember: The relational algebra and relational calculi all had implicit duplicate elimination
7.1 Theory to Implementation
• Practical consideration
  – You want to query for name and birth year of all students of University Braunschweig
  – The result returns roughly 13,000 tuples
  – Probably there are some duplicates
  – It’s 1973, your computer has an 8-bit processor and 16 Kbyte of main memory…
  – To eliminate duplicates, you need to cache the result, sort it, and scan for adjacent duplicate lines
• System R engineers concluded that this effort is not worth the effect
• Duplicate elimination in result sets only on request
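The cache, sort, and scan procedure described above can be sketched as follows. This is a minimal illustration, not System R's actual implementation; the function name `eliminate_duplicates` and the sample rows are hypothetical.

```python
def eliminate_duplicates(rows):
    """Remove duplicate tuples by sorting and scanning adjacent rows."""
    # The whole result has to be buffered and sorted first: O(n log n).
    ordered = sorted(rows)
    result = []
    for row in ordered:
        # After sorting, duplicates are adjacent, so comparing with the
        # previously kept row is enough.
        if not result or result[-1] != row:
            result.append(row)
    return result


rows = [("Müller", 1990), ("Schmidt", 1991), ("Müller", 1990)]
deduplicated = eliminate_duplicates(rows)
print(deduplicated)
```

The sketch shows where the cost comes from: no row can be returned before the complete result has been collected and sorted, which is expensive when main memory is scarce.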
7.1 Theory to Implementation
• Decision: Drop mandatory duplicate elimination for query results
• What about the tables themselves?
  – Again: Ensuring that no duplicates end up in the tables requires work
  – Engineers also concluded that there is actually no need to enforce the no-duplicate policy
    • If the user wants duplicates and is willing to deal with all the arising problems, then that’s OK
• Decision: Also drop the no-duplicate policy for tables
• As a result, the theoretical basis of today’s databases shifted from set theory to multiset theory
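The effect of both decisions can still be observed in SQL systems today. The following sketch uses Python's built-in `sqlite3` module as a stand-in (SQLite is not System R); the table `students` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without a key, so identical rows may be stored twice.
conn.execute("CREATE TABLE students (name TEXT, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Müller", 1990), ("Müller", 1990), ("Schmidt", 1991)],
)

# Without DISTINCT, the result is a multiset and keeps the duplicate row.
all_rows = conn.execute("SELECT name, birth_year FROM students").fetchall()

# DISTINCT requests duplicate elimination explicitly.
distinct_rows = conn.execute(
    "SELECT DISTINCT name, birth_year FROM students"
).fetchall()

print(len(all_rows), len(distinct_rows))
```

The table accepts the duplicate row without complaint, and duplicates are only removed from the result when the query asks for it.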
7.1 Theory to Implementation
• Sometimes, an attribute value is not known or an attribute does not apply to an entity
  – Example: What is the value of attribute “universityDegree” of entity Heinz Müller if he does not have a degree?
  – Example: You do weather observation and store temperature, wind strength, and air pressure every hour, and then your barometer breaks?
7.1 Theory to Implementation
• Possible solution: For each domain, define a value representing that data is not available / not known / not applicable / …
  – e.g. use none for Heinz Müller’s degree, use -1 for missing pressure data
  – Problem
    • You need such a special value for each domain or use case
    • You need special failure handling for queries, e.g. “compute average of all pressure values which are not -1”
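The problem with per-domain special values can be illustrated with the pressure example. The readings and the constant `MISSING` below are hypothetical; the sketch only shows how easily the sentinel leaks into a computation.

```python
MISSING = -1  # hypothetical sentinel for "no pressure reading"

pressure_hpa = [1013, 1009, MISSING, 1011]

# Naive average: the sentinel is silently treated as a real measurement.
naive_average = sum(pressure_hpa) / len(pressure_hpa)

# Correct average: every single query has to remember to filter the sentinel.
valid = [p for p in pressure_hpa if p != MISSING]
filtered_average = sum(valid) / len(valid)

print(naive_average, filtered_average)
```

Nothing in the data itself prevents the naive computation, so correctness depends on every query author knowing the convention for every domain.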
7.1 Theory to Implementation
• Again, system designers chose the simplest (implementation-wise) solution: NULL values
  – NULL is a special value which is usable in any domain and represents that data is just not there
    • There are many interpretations of what NULL actually means
  – The system has some default rules when dealing with NULL values
    • Aggregation functions usually ignore rows with NULL values (which is good in most, but not all cases)
    • Three-valued logic
    • However, this creates some strange anomalies
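These default rules can be tried out in a small sketch. It uses Python's built-in `sqlite3` module as one example of an SQL system; the table `weather` and its rows are hypothetical, and other systems may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (hour INTEGER, pressure REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [(1, 1013.0), (2, None), (3, 1011.0)],  # None is stored as NULL
)

# AVG and COUNT(pressure) skip the NULL; COUNT(*) counts every row.
avg_pressure, known, total = conn.execute(
    "SELECT AVG(pressure), COUNT(pressure), COUNT(*) FROM weather"
).fetchone()

# Three-valued logic: comparing with NULL yields unknown (NULL),
# which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Anomaly: neither the condition nor its negation selects the NULL row.
matched = conn.execute(
    "SELECT COUNT(*) FROM weather "
    "WHERE pressure > 1012 OR NOT (pressure > 1012)"
).fetchone()[0]

print(avg_pressure, known, total, null_equals_null, matched)
```

The last query looks like a tautology, yet it returns only two of the three rows, which is one of the anomalies that three-valued logic introduces.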
7.2 Square & Sequel
• Another tricky problem: How should users query the DB?
  – Donald D. Chamberlin and Raymond F. Boyce worked on this task
  – Both of IBM Research in San Jose, California
  – Main concern: “Querying relational databases is too difficult with current paradigms”
7.2 Square & Sequel
• Problem: More and more people started using databases (at that time, hierarchical and network DBs)
  – Many of those were not proficient in calculus and/or algebra
  – Had no in-depth knowledge of the internal design of databases
• Relational Algebra
  – Users needed to define how and in which order data should be retrieved
    • Sequence of operations
  – Choice of operations had significant impact on the database performance
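Why the order of operations matters can be seen in a toy example. The relations, the nested-loop `join`, and `select_female` below are hypothetical helpers written for this sketch; they are not taken from any real system.

```python
# Hypothetical relations as lists of tuples.
students = [(i, "f" if i % 2 else "m") for i in range(100)]  # (matNr, sex)
exams = [(i, 101, 1.7) for i in range(100)]                   # (matNr, crsNr, result)


def join(left, right):
    """Nested-loop join on the first attribute (matNr)."""
    return [l + r for l in left for r in right if l[0] == r[0]]


def select_female(rows):
    """Selection on the second attribute (sex)."""
    return [row for row in rows if row[1] == "f"]


# Plan A: join first, then select.
joined = join(students, exams)
plan_a = select_female(joined)

# Plan B: select first, then join the smaller input.
females = select_female(students)
plan_b = join(females, exams)

print(plan_a == plan_b, len(joined), len(females))
```

Both plans produce the same result, but plan B feeds only half as many student tuples into the nested-loop join. With relational algebra, picking the cheaper plan is left to the user.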
7.2 Square & Sequel
• Tuple / Domain Relational Calculi
  – Provided declarative access to data
    • Which is good: Just state what you want and not how to get it
  – Unfortunately, they were overly complex
    • Users needed to define a lot of extra variables
    • Made heavy use of quantifiers, which were quite hard to understand for many people
    • Had no real linear notation
7.2 Square & Sequel
• The first query language of Chamberlin and Boyce was Square
  – “Specifying Queries as Relational Expressions”
  – Based directly on tuple relational calculus
  – Main observation:
    • Most database queries are rather simple and complex queries are rarely needed
    • Quantification confuses people and can always be replaced with an expression without quantification
7.2 Square & Sequel
• Thus, Square provided a linear notation for TRC
  – No quantifiers, implicit notation of variables
  – Adds additional functionality which is needed for applications (grouping, aggregating, etc.)
  – Solved the safety problem by introducing the closed-world assumption
7.2 Square & Sequel
• Retrieve the names of all female students
  – TRC: {t.name | students(t) ⋀ t.sex = ‘f’}
  – Square: name students sex (‘f’)
    • name: Which part of the result tuple should be returned?
    • students: The range relation of the result tuples
    • sex: Attributes with conditions
    • (‘f’): Conditions
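For comparison, the same request can be written in present-day SQL. The sketch below runs it through Python's built-in `sqlite3` module; only the names `students`, `name`, and `sex` come from the example above, while the schema and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows for illustration.
conn.execute("CREATE TABLE students (name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Clara", "f"), ("Heinz", "m"), ("Lena", "f")],
)

# Corresponds to the TRC query {t.name | students(t) AND t.sex = 'f'}.
female_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students WHERE sex = 'f' ORDER BY name"
    )
]
print(female_names)
```

The three parts of the Square expression reappear here: the returned attribute after SELECT, the range relation after FROM, and the attribute with its condition after WHERE.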
• Get all exam results better than 2.0 of course 101 – TRC: {t.result | exams(t) ⋀ t.crsNr=101 ⋀ t.result