CSE 544 Principles of Database Management Systems - CSE Home
Recommend Documents
CSE 544 - Winter 2009. Outline. • Introductions. • Class overview. • What is the
point of a db management system (DBMS)? • Main DBMS features and DBMS ...
CSE 544. Principles of Database. Management Systems. Magdalena Balazinska.
Fall 2007. Lecture 5 - DBMS Architecture ...
Page 1 ... CSE 544 - Fall 2009 ... http://www.cs.berkeley.edu/~brewer/cs262/
SystemR.html. 3 ... Solution 1: OS provides asynchronous I/O (true in modern OS)
. • Solution ... Rule of thumb: one OS-provided dispatchable unit per physical
device.
Database Systems 11(3), 1986. Also in Red Book (3rd and 4th ed). • Database
management systems. Ramakrishnan and Gehrke. Third Ed. Chapters 13 and 14
.
CSE 544 - Fall 2007. Outline. • Introductions. • Class overview. • What is the point
of a db management system (DBMS)? • Main DBMS features and DBMS ...
CSE 544 - Fall 2006. 3. Where We Are. • We covered the fundamental topics. –
Relational model & schema normalization. – DBMS Architecture. – Storage and ...
Database Systems. Introduction. UB CSE 562 Spring 2010. Some slides are
based or modified from originals by. Database Systems: The Complete Book,.
CSE 552: Distributed and Parallel Systems. Instructor: Tom Anderson (tom@cs).
TA: Lisa Glendenning (lglenden@cs). MW 10:30 – 11:50. (some weeks WF).
Prerequisite: CSE 143. Course Text: Data Structures and Algorithm Analysis in
Java 3 rd. Ed., Mark Allen Weiss, Addison. Wesley: 2012, ISBN-10: 0132576279.
11 Jun 2013 ... CSE 634 in the Paul G. Allen Center for Computer Science & ... Text: Data
Structures and Algorithm Analysis in Java 3rd Ed., Mark Allen Weiss, ...
Main textbook, available at the bookstore: • Database Systems: The Complete
Book,. Hector Garcia-Molina,. Jeffrey Ullman,. Jennifer Widom. Second edition.
Database Systems: The Complete Book,. Hector Garcia-Molina,. Jeffrey Ullman,.
Jennifer Widom. Most important: COME TO CLASS ! ASK QUESTIONS !
CSE 510 Database Management System Implementation. Spring'12. Catalog
Description Implementation of database systems. Data storage, indexing,
querying ...
UW/CSE Systems Programming — CSE 333. Course Description ... as
recommended or strongly recommended: Operating Systems, Networks. It would
also be ...
[email protected]. Office hours: See web page. • Text: Concepts of
Programming Languages,. Robert W. Sebesta, Sixth Edition. » Fifth edition is fine
.
Give an O nlog2 k algorithm to merge k sorted lists into one sorted list, where n is the total ... + the time taken by a
CSE 120. Principles of Operating. Systems. Fall 2000. Lecture 1: Course
Introduction. Geoffrey M. Voelker. September 20, 2000. CSE 120 -- Lecture 1 --
Course ...
Email Address: [email protected], [email protected] ...
algorithms, and major structures in computer operating systems. Students will be
able ...
Apr 26, 2012 ... CSE majors complete CSE110 Principles of Programming with Java. .... Starting
out With C++: Early Objects, Custom Edition for Arizona State ...
is, it performs a single access only to those nodes that may contain skyline points. BBS is ..... were inserted in the list before the creation of the temporary file are ... of BNL alleviates these problems by first sorting the entire dataset accordi
CSE 544 - Fall 2007. 3. Where We Are. • We covered the fundamental topics. –
Relational model & schema normalization. – DBMS Architecture. – Storage and ...
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 12 - Distribution: query optimization
References • R* Optimizer Validation and Performance Evaluation for Distributed Queries. L. F. Mackert and G. M. Lohman. VLDB’86. • Database management systems. Ramakrishnan and Gehrke. Third Ed. Chapter 22
CSE 544 - Fall 2007
2
Where We Are • We covered the fundamental topics – – – – – –
Relational model & schema normalization DBMS Architecture Storage and indexing Query execution Query optimization Transactions
Distributed DBMS • Important: many forms and definitions • Our definition: shared nothing infrastructure – Multiple machines connected with a network
DBMS
DBMS Network
stored data
stored data DBMS
DBMS
stored data
stored CSE 544 - Fall 2007 data
5
Reasons for a Distributed DBMS • Scalability (ex: Amazon, eBay, Google) – Many small servers cheaper than large mainframe – Need to scale incrementally
• Inherent distribution – Large organizations have data at multiple locations (different offices) -> original motivation – Different types of data in different DBMSs – Web-based and Internet-based applications
CSE 544 - Fall 2007
6
Goals of a Distributed DBMS • Shield users from distribution details • Distribution transparency – – – –
Naming transparency Location transparency Fragmentation transparency Performance transparency • Distributed query optimizer ensures similar performance no matter where query is submitted
– Schema change and transaction transparency
• Replication transparency (next week) • and more... CSE 544 - Fall 2007
7
Distributed DBMS Features • 70's and 80's, three main prototypes: – SDD-1, distributed INGRES, and R*
• Main components of a distributed DBMS – – – – –
Defining data placement and fragmentation Distributed catalog Distributed query optimization (today) Distributed transactions (next lecture) Managing replicas (next week)
Steps of query evaluation Query Parser Query Rewrite
πsname
(On-the-fly) (On-the-fly)
σssity=‘Seattle’ and ...
(Nested loop)
Optimizer Executor
Supplier (File scan) CSE 544 - Fall 2007
sno = sno
Supplies (File scan) 10
Review: Query Optimization • Enumerate alternative plans – Many possible equivalent trees: e.g., join order – Many implementations for each operator
• Compute estimated cost of each plan – Compute number of I/Os and CPU utilization – Based on statistics
• Choose plan with lowest cost
CSE 544 - Fall 2007
11
Distributed Query Optimization • Search space is larger – Must select sites for joining relations – Must select method for shipping inner relation: whole or matches
• Minimize resource utilization – I/O, CPU, & communication costs – Example cost function used in R* – WCPU Nbinst + WI/O NbI/O + Wmsg Nbmsg + WbyteNbbytes
• Could also try to minimize response time – Least cost plan != Fastest plan CSE 544 - Fall 2007
Read inner relation at its home site (either using an index or not) Project inner relation to remove attributes that are not needed Apply any single-table predicates Ship results to site of outer relation and store in temporary file • Note: we lose any indexes on the inner relation
• Fetch matches – – – –
For each tuple of outer relation, project tuple on join column Send value to site of inner relation Find matching tuples from inner relation Ship projected, matching tuples back
Why is fetch matches so inefficient? CSE 544 - Fall 2007
13
Distributed vs Local Joins Why can distributed joins be faster than local ones? • More resources are available to the join – Ex: Distributed query can use twice the buffer pool (useful when accessing relations through unclustered indexes)
• Different parts of the join can proceed in parallel – Ex: Join tuples from page 1 while shipping page 2
– Ex: Can sort the two relations in parallel
CSE 544 - Fall 2007
14
Additional Join Strategies • Dynamically-created temporary index on inner – Ship whole inner relation, store in temp table, index temp table
• Semijoin – – – –
Project outer relation on join column (eliminate duplicates) Ship projected column to site with inner relation Compute a natural join and ship matching tuples back Join the two relations
• Bloomjoin – Same idea as semijoin, but use Bloom filter instead of sending all values in the join column – Bloom filter creates some false positives through collisions CSE 544 - Fall 2007
Distributed DBMS Limitations • Top-down – Global, a priori data placement – Global query optimization, one query at a time; no notion of load balance – Distributed transactions, tight coupling
• • • •
Assumes full cooperation of all sites Assumes uniform sites Assumes short-duration operations Limited scalability
CSE 544 - Fall 2007
17
Distributed DBMS Challenges • Autonomy: different administrative domains – Cannot always assume full cooperation – Do not want distributed transactions
• Heterogeneity – Different capabilities at different locations – Different data types, different semantics -> data integration pb