hs / FUB dbsII-03-10DDBIntro
IV Distributed Databases - Motivation & Introduction -
• Motivation
• Expected benefits
• Technical issues
• Types of distributed DBS
• 12 rules of C. Date
• Parallel vs. distributed DBS
References
M.T. Özsu, P. Valduriez: Principles of Distributed Database Systems, 2nd edition, Prentice-Hall, 1999
E. Rahm: Mehrrechner-Datenbanksysteme, Addison-Wesley, 1994
G. Weikum, G. Vossen: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2001, ISBN 1558605088
J. Gray, A. Reuter: Transaction Processing - Concepts and Techniques, Morgan Kaufmann Publishers, San Mateo, 1993
P.A. Bernstein, V. Hadzilacos, N. Goodman: Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987 (pdf)
P.A. Bernstein, E. Newcomer: Principles of Transaction Processing, Morgan Kaufmann, San Mateo, 1997
Material used from B. Kemme (McGill), H. Garcia-Molina (Stanford), A. Zaslavsky et al. (Monash), G. Alonso (ETH)
Motivation
Application: data "naturally" distributed
  • Companies with different branches
  • Airlines
  • Financial business
  • University / faculties
  • Any organization with a decentralized organizational structure
Technology: network infrastructure, processors, RAM
Economy: hardware cost
Software supporting distributed processing, e.g. RPC
  => Huge number of interconnected systems
Recent challenge: Web-based computing => e-commerce
Goals: improvement of non-functional characteristics
Performance: the more computing power, the better
  • Primary goal for parallel DBS, not necessarily for distributed DBS
Reliability: substitute faulty components (hardware, software ... and network) seamlessly
  • Fault tolerance: the ability to hide failures from users
  • Related to higher availability
  • Is 95.8 % availability too low? Definitely: it means 1 hour of downtime per day!
Scalability: upscale / downscale your system incrementally
  • Central components and algorithms are counterproductive => distributed algorithms
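The availability figure on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative and do not come from the slides.

```python
def availability(downtime_hours_per_day):
    """Fraction of a 24-hour day during which the system is up."""
    return 1 - downtime_hours_per_day / 24


def downtime_minutes_per_day(avail):
    """Daily downtime in minutes implied by an availability fraction."""
    return (1 - avail) * 24 * 60


# One hour of downtime per day corresponds to roughly 95.8 % availability.
print(round(availability(1) * 100, 1))
# Conversely, 95.8 % availability leaves about an hour of downtime per day.
print(round(downtime_minutes_per_day(0.958), 2))
```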
The dark side of distribution
Systems are often less reliable
  • "You will never make a system of unreliable components more reliable by adding more unreliable components"
  • However: hot standby
  • But: data copies must be kept consistent, complex software, unreliable network
Scalability
  • Distributed systems (DS) are inherently complex
  • High development cost -> middleware efforts
  • High administration cost
  => lack of flexibility
The dark side ... Performance
Double resources do not guarantee double performance
Network performance?
  • Transfer time depends not only on bandwidth
    Transfer of a 4 KB page:
    Distance    Latency   Bandwidth   Transfer time
    100 m       0.5 µs    10 Mbps     5 ms
    100 m       0.5 µs    100 Mbps    0.5 ms
    1 km        5 µs      100 Mbps    0.5 ms
    100 km      0.5 ms    100 Mbps    1 ms
    1000 km     5 ms      100 Mbps    5.5 ms
    10000 km    50 ms     1 Gbps      50 ms
  • Distance > 100 km => signal propagation time dominates
  • Compare mean disk access time: ~ 5 ms
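The table can be approximated as propagation latency plus transmission time. The sketch below makes several assumptions that are not on the slide. It uses a propagation speed of 2e8 m/s, as implied by 0.5 µs per 100 m. It treats a page as 4096 bytes. It ignores protocol overhead. With the raw payload alone, the short-distance values come out somewhat lower than the slide's figures (about 3.3 ms instead of 5 ms at 10 Mbps), so the slide's numbers are presumably rounded up or include overhead.

```python
PROPAGATION_SPEED_M_PER_S = 2e8  # assumption: implied by 0.5 µs per 100 m
PAGE_BITS = 4 * 1024 * 8         # assumption: 4 KB page = 4096 bytes, no overhead


def transfer_time_s(distance_m, bandwidth_bps, payload_bits=PAGE_BITS):
    """Propagation latency plus transmission time, in seconds."""
    latency = distance_m / PROPAGATION_SPEED_M_PER_S
    transmission = payload_bits / bandwidth_bps
    return latency + transmission


# (distance in m, bandwidth in bit/s) for the rows of the table
rows = [(100, 10e6), (100, 100e6), (1_000, 100e6),
        (100_000, 100e6), (1_000_000, 100e6), (10_000_000, 1e9)]
for distance_m, bandwidth_bps in rows:
    total_ms = transfer_time_s(distance_m, bandwidth_bps) * 1e3
    print(f"{distance_m / 1000:>8.1f} km  {bandwidth_bps / 1e6:>6.0f} Mbps  {total_ms:8.3f} ms")
```

For the 1000 km row, propagation latency alone accounts for more than 90 % of the total in this model, which matches the slide's point that distance dominates beyond roughly 100 km.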
What is a Distributed Database?
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
Distributed database system (DDBS) = DDB + D-DBMS
Def. by P. Valduriez, T. Özsu
Example (1)
Transparency of distribution: one logical DB
    UPDATE empl SET sal = sal * 1.1
    WHERE id IN (SELECT ass.eid FROM ass, proj
                 WHERE proj.id = ass.pid AND proj.dur > 12)
Sites connected via the net:
  • Berlin: all projects, Berlin employees, all assignments
  • Munich: Munich projects, Munich employees, Munich assignments
  • New York: NY employees
Expl. by B. Kemme
Example (2)
Cooperation: autonomous DBs cooperating on particular tasks
    SELECT * FROM flights
    WHERE departure = 'Montreal' AND arrival = 'Munich'
      AND date = '12/9/2002' AND price < 800   -- price in $
Sites connected via the net: lufthansa.com, Travel-overland.com, air-canada.com
Example (3)
Autonomous, heterogeneous systems, logically identical data types
    UPDATE empl SET sal = sal * 0.9
    WHERE jobTitle = 'product manager'
Sites connected via the net:
  • Daimler / Stuttgart: only Stuttgart data, IBM DB2
  • Chrysler / Detroit: only Detroit data, Oracle 9i
  • Daimler / Bremen: only Bremen data, MySQL
Example (4)
Sophisticated client / server computing
[Figure: clients connected to Application Server A and Application Server B; possible R/W conflict]
Classification criteria
Distribution
  • Physically independent systems
  • Peer-to-peer: data distribution and sharing
  • Client / server: function distribution, e.g. parsing in the client
Heterogeneity
  • DBMS software
  • Database schema (types) and languages (SQL variants)
Autonomy
  • No global control
  • Local DBS operations may not be influenced by global operations (e.g. of a global transaction)
  • Note: subsumes completely independent and semi-autonomous systems, see scenarios
Classification cube
  • Distributed DB: looks like one DB
  • Federated: more autonomy but not independent (Expl. 3)
  • Multi DB: independent, cooperative (Expl. 2)
by P. Valduriez, T. Özsu
Scenarios and common problems
Not just one kind of distributed database system ... but indefinitely many
  • Understand common problems, e.g. how to guarantee one state for replicated data from the user's point of view
  • Solve them by developing distributed algorithms, e.g. transaction commit
  • Main issue: partial failure
Any unsolvable problems?
  • Example: Internet marriage between priest, bride and groom
  • Distributed transaction: YES or NO, this is the question
  • All participants and all communication are unreliable
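The YES-or-NO decision in the marriage example can be sketched as a simple voting rule: the coordinator (the priest) commits only if every participant answers YES. The function name `decide` and the use of `None` for a vote that never arrived are illustrative assumptions, not part of the slides. The sketch covers only the decision rule; it does nothing about a lost decision message, which is where partial failure makes the problem hard.

```python
def decide(votes):
    """Commit only if every participant voted YES.

    votes maps a participant name to "YES", "NO", or None
    (None models a vote lost to a crash or a network failure).
    """
    if votes and all(vote == "YES" for vote in votes.values()):
        return "COMMIT"
    return "ABORT"


print(decide({"bride": "YES", "groom": "YES"}))  # both answered YES
print(decide({"bride": "YES", "groom": None}))   # groom's answer never arrived
```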
12 + 1 rules for DDBS (C. Date)
Rule 0: A DDB looks like a central DB to users
Rule 1: Sites should be as independent as possible - local autonomy
Rule 2: There should not be a central master that all sites depend on - no reliance on a central site
Rule 3: Never a need for complete shutdown - continuous operation
Rule 4: Users should not need to know where data are stored - location transparency (independence)
Rule 5: If data are split (e.g. columns of one relation) and distributed over several sites, users should not be aware of it - fragmentation transparency
Rule 6: Users should not be aware of replicated data - replication independence
Rule 7: Efficient distributed query processing
Rule 8: Global concurrency control and recovery - distributed transaction management
Rule 9: Hardware independence
Rule 10: OS independence
Rule 11: Network independence
Rule 12: DBMS independence
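Rule 5 can be illustrated with a toy example: the columns of one relation are stored as separate fragments, and a view function reassembles full rows so that the caller never sees the split. The relation `empl(id, name, sal)`, the sample rows and the function `empl_view` are hypothetical and serve only as an illustration.

```python
# Hypothetical vertical fragments of empl(id, name, sal), keyed by id,
# each imagined to live at a different site.
frag_site_a = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
frag_site_b = {1: {"sal": 5000}, 2: {"sal": 4200}}


def empl_view(*fragments):
    """Reassemble full rows by combining the fragments on their shared key."""
    rows = {}
    for fragment in fragments:
        for key, columns in fragment.items():
            rows.setdefault(key, {"id": key}).update(columns)
    return [rows[key] for key in sorted(rows)]


# The caller works with whole rows and is unaware of the fragmentation.
for row in empl_view(frag_site_a, frag_site_b):
    print(row)
```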
Parallel versus Distributed Databases
More similarities than differences
Similar to the parallel / distributed processing distinction
Parallel DBS
  • Not geographically distributed
  • Goal: high performance
  • Homogeneous software
  • Fast interconnect
Transparency
Distributed DBS
  • Data geographically distributed
  • Goal: data sharing
  • Disconnected operation possible -> autonomy
Parallel / distributed DBS
Query processing in parallel DBS
Distribute operators (sort, filter, ...) and data over processors to make complex processing fast, e.g. a join on a shared-disk multiprocessor system
[Figure: n processors P, each with its own memory M1 ... Mn]
    Join (R, S) {   // |R| >> |S|
      1. Split R into n-1 partitions Ri and assign Ri to Mi / Pi;
         assign S to processor / memory Pn / Mn;
      2. Sort Ri and S;            // n sorts in parallel
      3. Join the (n-1) + 1 sorted streams
    }
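The three steps of the join can be sketched in Python as follows. This is a simplified illustration, not the slide author's implementation. It assumes relations are lists of tuples joined on the first field. It uses a round-robin split for step 1. It uses a thread pool as a stand-in for the n processors (CPython threads do not give real CPU parallelism, so this only shows the structure). The function name `parallel_sort_merge_join` is made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge
from operator import itemgetter


def parallel_sort_merge_join(r, s, n):
    """Join lists of tuples r and s on their first field using n workers."""
    if n < 2:
        raise ValueError("need at least one worker for R and one for S")
    key = itemgetter(0)
    # Step 1: split R round-robin into n-1 partitions; S stays in one piece.
    partitions = [r[i::n - 1] for i in range(n - 1)]
    # Step 2: sort the n-1 partitions of R and S, one task per worker.
    with ThreadPoolExecutor(max_workers=n) as pool:
        runs = list(pool.map(lambda part: sorted(part, key=key), partitions + [s]))
    r_runs, s_sorted = runs[:-1], runs[-1]
    # Step 3: merge the n-1 sorted R streams and merge-join them with S.
    result = []
    j = 0
    for r_row in merge(*r_runs, key=key):
        # Skip S rows whose key is smaller than the current R key.
        while j < len(s_sorted) and key(s_sorted[j]) < key(r_row):
            j += 1
        # Emit one output row per S row with a matching key.
        k = j
        while k < len(s_sorted) and key(s_sorted[k]) == key(r_row):
            result.append(r_row + s_sorted[k][1:])
            k += 1
    return result


r = [(3, "c"), (1, "a"), (2, "b"), (1, "aa")]
s = [(1, "x"), (3, "z")]
print(parallel_sort_merge_join(r, s, n=3))
```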
Parallel / distributed DBS
Distributed QP
Given a data distribution, find a strategy to evaluate the query with minimal cost, in particular communication cost
[Figure: three sites holding S, R and T, with link distances of 10000 km and 100 km]
  |S| = 100000 records
  |R| = 10000 records
  |T| = 1000 records
Compute with minimal cost (time): R ⋈ S ⋈ T
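A first, very rough way to compare strategies is to count how many records must be shipped when all relations are sent to one site and joined there. The sketch below uses only the cardinalities from the slide. It deliberately ignores link distances, record sizes and intermediate result sizes, all of which a real optimizer would have to consider. The function name `records_shipped` is illustrative.

```python
# Cardinalities taken from the slide; each relation lives at its own site.
sizes = {"R": 10_000, "S": 100_000, "T": 1_000}


def records_shipped(target, sizes):
    """Records sent over the network if every other relation is shipped to target's site."""
    return sum(count for relation, count in sizes.items() if relation != target)


costs = {site: records_shipped(site, sizes) for site in sizes}
best = min(costs, key=costs.get)
print(costs)  # records shipped per candidate join site
print(best)   # site with the fewest shipped records under this crude model
```

Under this model, joining at the site of the largest relation ships the fewest records, because the largest relation never moves.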
Important terms
  • Motivation: technology, application, economy
  • Expected benefits: scalability, reliability, performance
  • Data / function distribution
  • Fault tolerance in case of partial failures
  • Autonomy, multi database, federated DB
  • Distribution transparency
  • Parallel versus distributed DBS