CS561 - ADVANCED TOPICS IN DATABASE SYSTEMS
CS561 - SPRING 2012
WPI, MOHAMED ELTABAKH
INTRODUCTION & LOGISTICS
HISTORY OF DBMS
• Database systems have evolved since the 1970s to replace the file system for storing and querying data
File system
DBMS
WHY DBMS ???
Storing and querying data in a file system has many disadvantages:
• Data redundancy and inconsistency
  • Multiple file formats, duplication of information in different files
  • Multiple record formats within the same file
  • No order enforced between fields
• Difficulty in accessing data
  • Need to write a new program to carry out each new task
  • No indexes, always scan the entire file
• Integrity problems
  • Modifying one file (or a field in a file) without changing the dependent fields or files
  • Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
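The point about integrity constraints being stated explicitly can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `account` table, its columns, and the helper names `open_bank`, `try_withdraw`, and `balance_of` are hypothetical and do not come from the slides.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `account` table."""
    conn = sqlite3.connect(":memory:")
    # The rule "balance > 0" is declared once, in the schema, instead of
    # being re-implemented in every program that touches the data.
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance > 0))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def try_withdraw(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Return True if the withdrawal was applied, False if the DBMS rejected it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the update was not applied.
        return False


def balance_of(conn: sqlite3.Connection, account_id: int) -> int:
    """Read the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

Any program that issues the update is subject to the same rule, because the rule lives with the data rather than in one application's code.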
WHY DBMS (CONT’D) ???
• Concurrent access by multiple users
  • Many users need to access/update the data at the same time
  • Uncontrolled concurrent access can lead to inconsistencies
  • Example: two people updating the same bank account at the same time
• Security problems
  • Hard to provide user access to some, but not all, data
• Recovery from crashes
  • The system may crash in the middle of an update
• Maintenance problems
  • Hard to search for or update a field
  • Hard to add new fields
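The bank account example of uncontrolled concurrent access can be sketched as a hand-interleaved schedule. This is a deterministic simulation, not real threads or a real DBMS; the function names and the amounts in the usage note are made up for illustration.

```python
def uncontrolled_deposits(start: int, deposit_a: int, deposit_b: int) -> int:
    """Interleave two read-modify-write sequences with no concurrency control."""
    balance = start
    read_a = balance              # user A reads the balance
    read_b = balance              # user B reads the same value before A writes
    balance = read_a + deposit_a  # A writes its result
    balance = read_b + deposit_b  # B overwrites A's write: A's deposit is lost
    return balance


def serialized_deposits(start: int, deposit_a: int, deposit_b: int) -> int:
    """Run the same two updates one after the other, as a serial schedule."""
    balance = start
    for deposit in (deposit_a, deposit_b):
        current = balance         # each update reads the latest value
        balance = current + deposit
    return balance
```

Starting from 100 with deposits of 50 and 20, the uncontrolled schedule ends at 120 while the serial one ends at 170. Concurrency control in a DBMS aims to make concurrent executions behave like some serial schedule.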
DBMS PROVIDES SOLUTIONS
• Modeling of application semantics and constraints
• Data consistency even with multiple users
• Efficient access to the data
• Data integrity embedded in the DBMS
• Recovery from crashes, security
TRADITIONAL APPLICATIONS OF DBMS
• Transactional data, banking systems, retail stores, airline reservations, restaurant systems, etc.
• Characteristics of these applications
  • Simple and well-structured data
  • No complex relationships or operations
  • Simple data types
  • Querying and reporting is not very complex
Given these ingredients → a Relational Database System (RDBMS) is a perfect fit
EMERGING APPLICATIONS !!!
• DBMSs are the natural home of the data
  • Because of all the desired properties of DBMSs
• But applications are getting more complex
  • The assumed characteristics of simplicity no longer hold
• Database management systems have to change and expand to cope with the new requirements and challenges
• Tons of research on advanced topics in DBMSs in many directions
  • New data models and data formats
  • New features and access methods
  • New optimizations and query processing
  • …
EXAMPLES OF EMERGING APPLICATIONS
• Data Stream Management Systems
  • Data are continuously arriving (no persistence)
  • One-pass main memory processing
  • Load balancing and load shedding
• Moving objects and spatio-temporal applications
  • Continuous streams of moving objects
  • Data, by definition, has two key dimensions (space & time)
  • Special query types, e.g., range queries, kNN queries
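The two special query types named above can be sketched over a small in-memory set of observations. The `Observation` record and both function names are assumptions made for this illustration, and the linear scans stand in for the spatial indexes (such as R-trees) that a real system would typically use.

```python
import heapq
import math
from typing import Iterable, List, NamedTuple


class Observation(NamedTuple):
    """One hypothetical sighting of a moving object: where it was, and when."""
    object_id: str
    x: float
    y: float
    t: float  # timestamp


def range_query(
    observations: Iterable[Observation],
    x_min: float, x_max: float,
    y_min: float, y_max: float,
    t_min: float, t_max: float,
) -> List[str]:
    """Ids of objects seen inside the rectangle during [t_min, t_max]."""
    hits = {
        o.object_id
        for o in observations
        if x_min <= o.x <= x_max and y_min <= o.y <= y_max and t_min <= o.t <= t_max
    }
    return sorted(hits)


def knn_query(
    observations: Iterable[Observation], qx: float, qy: float, t: float, k: int
) -> List[str]:
    """Ids of the k objects closest to (qx, qy) among observations taken at time t."""
    at_t = [o for o in observations if o.t == t]
    nearest = heapq.nsmallest(
        k, at_t, key=lambda o: math.hypot(o.x - qx, o.y - qy)
    )
    return [o.object_id for o in nearest]
```

Both queries filter on space and time together, which is what distinguishes them from ordinary one-dimensional selections.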
EXAMPLES OF EMERGING APPLICATIONS
• Scientific Data Management
  • E.g., in biology, chemistry, physics, atmospheric science, etc.
  • Complex data types, e.g., arrays, images, sequences, structures
  • Metadata, annotations and comments about the data
  • Complex processing and workflows
  • Provenance and lineage information
• Large-Scale Data Analytics and Distributed Processing
  • Massive scale data processing (terabytes and petabytes)
  • Highly distributed and parallel processing
  • New infrastructure and computing paradigms
  • Distributed DBMSs and Hadoop/MapReduce framework
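The MapReduce programming model mentioned above can be sketched as a word count running in a single Python process. This is not the Hadoop API: the function names are invented for the illustration, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Chain map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call looks at one document and each reduce call looks at one key, both phases can be spread over independent workers, which is what makes the model suitable for terabyte-scale inputs.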
EXAMPLES OF EMERGING APPLICATIONS
• Data Models for Complex Structures
  • Object-oriented data model (OODBMS)
  • Object-relational data model (ORDBMS)
  • Semi-structured data model (XML)
• Data Integration and Data Mining/OLAP
  • Integrating data from various sources
  • Entity resolution, schema mapping, etc.
  • Discovering hidden knowledge (without the users knowing what they want)
The list goes on and on…
COURSE PLAN AND ROADMAP
• Touch on various advanced topics in database systems
• Lectures will have two flavors
  • Typical presentations (given by the instructor) covering book chapters
  • Research-oriented presentations (given by students) covering research papers
COURSE PLAN AND ROADMAP (WHAT YOU EXPECT TO LEARN)
• Typical presentations (by instructor): 50% of lectures
  • Object-oriented and object-relational data models
  • Semi-structured (XML) data model
  • Distributed and parallel databases
  • Active databases and authorizations
  • Information integration and OLAP
  • Hadoop and scientific data management
• Research-oriented presentations (by students): 50% of lectures
  • Flexibility based on your interest
  • Suggested areas are:
    • Scientific data management
    • Hadoop/MapReduce infrastructure
    • Keyword search in database systems
    • Cloud computing
    • Data integration
BRIEF OVERVIEW OF THE COURSE’S TOPICS (Typical Presentations)
1- OBJECT-ORIENTED & OBJECT-RELATIONAL MODEL
Relational model
• Relations are the key concept, everything else is around relations
• Primitive data types, e.g., strings, integer, date, etc.
• Great normalization, query optimization, and theory
• Applications are getting more complex
  • CAD: Computer-Aided Design, CAM: Computer-Aided Manufacturing
  • Multimedia, document management, telecommunication
• What is missing in the relational model?
  • Handling of complex objects and complex relationships
  • Handling of complex data types
  • Code is not coupled with data
  • No inheritance, encapsulation, etc.
Object-Oriented model
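The object-oriented features listed as missing from the relational model (code coupled with data, inheritance, encapsulation) can be sketched in a few lines of Python. The CAD-style `Shape`, `Rectangle`, and `Square` classes are hypothetical examples chosen for this illustration, not part of the course material.

```python
class Shape:
    """Base class: the data (a name) travels together with the code that uses it."""

    def __init__(self, name: str) -> None:
        self._name = name  # leading underscore: internal state by convention

    @property
    def name(self) -> str:
        """Read-only access to the encapsulated name."""
        return self._name

    def area(self) -> float:
        raise NotImplementedError("subclasses define how their area is computed")

    def describe(self) -> str:
        # Works for every subclass because area() is resolved per object.
        return f"{self._name} with area {self.area():.2f}"


class Rectangle(Shape):
    """Inherits describe() from Shape and supplies its own area()."""

    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height


class Square(Rectangle):
    """A further level of inheritance: a square is a rectangle with equal sides."""

    def __init__(self, side: float) -> None:
        super().__init__(side, side)
        self._name = "square"
```

In a purely relational design the width and height would be columns in a table, and the area computation would live in separate application code.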
1- OBJECT-ORIENTED & OBJECT-RELATIONAL MODEL
• Object-Oriented Database (OODBMS)
  • Depends purely on concepts from OO programming, e.g., C++ or Java
  • Define classes, objects, inheritance, etc.
  • Tries to take some concepts from the relational model, e.g., SELECT statement
  • New languages ODL (object definition language) & OQL (object query language)
• Object-Relational Database (ORDBMS)
  • Still the fundamental concept is ‘Relation’
  • Extend the relational model with concepts from OO programming, e.g., complex types, inheritance, encapsulation, etc.

ODL & OQL
[Textbook pages shown on the slide: Chapter 4, Other Data Models, Section 4.2 Introduction to ODL (p. 137); Chapter 9, Object-Orientation in Query Languages (p. 454)]

Objects have fields or slots in which values are placed. These values may be of common types such as integers, strings, or arrays, or they may be references to other objects.

When specifying the design of ODL classes, we describe properties of three kinds:
1. Attributes, which are values associated with the object. We discuss the legal types of ODL attributes in Section 4.2.8.
2. Relationships, which are connections between the object at hand and another object or objects.
3. Methods, which are functions that may be applied to objects of the class.
Attributes, relationships, and methods are collectively referred to as properties.

4.2.2 Class Declarations
A declaration of a class in ODL, in its simplest form, consists of:
1. The keyword class,
2. The name of the class, and
3. A bracketed list of properties of the class. These properties can be attributes, relationships, or methods, mixed in any order.

Example 4.2: In Fig. 4.2 is an ODL declaration of the class of movies. It is not a complete declaration; we shall add more to it later. Line (1) declares Movie to be a class. Following line (1) are the declarations of four attributes that all Movie objects will have.

    1) class Movie {
    2)     attribute string title;
    3)     attribute integer year;
    4)     attribute integer length;
    5)     attribute enum Film {color, blackAndWhite} filmType;
       };

Figure 4.2: An ODL declaration of the class Movie

The first attribute, on line (2), is named title. Its type is string, a character string of unknown length. We expect the value of the title attribute in any Movie object to be the name of the movie. The next two attributes, year and length, declared on lines (3) and (4), have integer type and represent the year in which the movie was made and its length in minutes, respectively. On line (5) is another attribute, filmType, which tells whether the movie was filmed in color or black-and-white. Its type is an enumeration, and the name of the enumeration is Film. Values of enumeration attributes are chosen from a list of literals, color and blackAndWhite in this example.

An object in the class Movie as we have defined it so far can be thought of as a record or tuple with four components, one for each of the four attributes, e.g., ("Gone With the Wind", 1939, 231, color).

… partly in the spirit of C, where the dot and arrow both obtain components of a structure. However, in C, the arrow and dot operators have slightly different meanings; in OQL they are the same. In C, a.f expects a to be a structure, while p->f expects p to be a pointer to a structure. Both produce the value of the field f of that structure.

If it makes sense, we can form expressions with several dots. If myMovie denotes a movie object, then myMovie.ownedBy denotes the object that owns the movie, and myMovie.ownedBy.name denotes the string that is the name of that studio.

9.1.3 Select-From-Where Expressions in OQL
OQL permits us to write expressions using a select-from-where syntax similar to SQL's familiar query form. Here is an example asking for the year of the movie Gone With the Wind.

    SELECT m.year
    FROM Movies m
    WHERE m.title = "Gone With the Wind"

Notice that, except for the double-quotes around the constant, this query could be SQL rather than OQL.

In general, the OQL select-from-where expression consists of:
1. The keyword SELECT followed by a list of expressions.
2. The keyword FROM followed by a list of one or more variable declarations. A variable is declared by giving
   (a) An expression whose value has a collection type,
   (b) The optional keyword AS, and
   (c) The name of the variable.

    CREATE TYPE MovieType AS (
        title   CHAR(30),
        year    INTEGER,
        inColor BOOLEAN
    );
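For readers more familiar with a general-purpose OO language, the ODL class Movie and the OQL query on Gone With the Wind can be mirrored in Python. This is an analogy, not ODL or OQL: the `Film` enum, the `Movie` dataclass, the `movies` list standing in for the extent Movies, and the helper `years_of` are names chosen for the illustration, and the second movie is added only as sample data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Film(Enum):
    """Counterpart of the ODL enumeration Film {color, blackAndWhite}."""
    COLOR = "color"
    BLACK_AND_WHITE = "blackAndWhite"


@dataclass(frozen=True)
class Movie:
    """Counterpart of the four attributes declared in the ODL class Movie."""
    title: str
    year: int
    length: int      # length in minutes
    film_type: Film


# A stand-in for the extent "Movies"; the second entry is illustrative sample data.
movies = [
    Movie("Gone With the Wind", 1939, 231, Film.COLOR),
    Movie("Casablanca", 1942, 102, Film.BLACK_AND_WHITE),
]


def years_of(title: str, extent: Iterable[Movie]) -> List[int]:
    """Rough analogue of: SELECT m.year FROM Movies m WHERE m.title = <title>."""
    return [m.year for m in extent if m.title == title]
```

The loop variable `m` plays the role of the variable declared in the FROM clause, and the dot notation `m.year` reads an attribute just as it does in OQL.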