Distributed Database Management Systems
The Evolution of Distributed Database Management Systems
• Distributed database management system (DDBMS): governs the storage and processing of logically related data over interconnected computer systems
  – Both data and processing functions are distributed among several sites
• Early centralized databases stored corporate data at a single central site
  – Data access via dumb terminals
• Business and technology changes then pushed data outward: customers interact with companies in cyberspace, not next door, and firms compete in a quick-response decision-making environment
• As a result, the need for data analysis and distribution, and for fast, decentralized decision making, continues to increase
Distributed Processing and Distributed Databases
• Distributed processing: the database's logical processing is shared among two or more physically independent sites connected through a network
• Distributed database: stores a logically related database over two or more physically independent sites
  – Database composed of database fragments
Characteristics of Distributed Database Management Systems
• A DDBMS must handle all the functions imposed by the distribution of data and processing, and it must perform those functions transparently to the end user
DDBMS Components
• Must include (at least) the following components:
  – Computer workstations/remote devices
  – Network hardware and software that reside in each device or workstation to interact and exchange data
  – Communications media that carry data from one site to another
DDBMS Components (cont'd.)
• Transaction processor (TP): the software component that receives and processes data requests (a.k.a. application processor or transaction manager)
• Data processor (DP) or data manager
  – Software component residing on each computer that stores and retrieves data located at the site
  – May be a centralized DBMS
DDBMS Components (cont'd.)
• Communication among TPs and DPs is made possible through protocols (sketched below), which determine how the DDBMS will:
  – Interface with the network to transport data and commands between DPs and TPs
  – Synchronize all data received from DPs and route retrieved data to the appropriate TPs
  – Ensure common database functions in a distributed system, e.g., data security, transaction management, concurrency control, data partitioning and synchronization, and data backup and recovery
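As a rough sketch (not from the textbook), the TP/DP exchange can be pictured with a toy request/reply message format and an in-memory fragment; the field names and data below are invented, and real DDBMS protocols add transaction control, security, and error handling:

    # Toy TP/DP exchange. Message fields and fragment contents are invented.
    from dataclasses import dataclass

    @dataclass
    class Request:               # sent by a transaction processor (TP)
        txn_id: int
        command: str             # kept symbolic, e.g. "SELECT"
        predicate: object        # row filter standing in for a WHERE clause

    @dataclass
    class Reply:                 # returned by a data processor (DP)
        txn_id: int
        status: str              # "OK" or "ERROR"
        rows: list

    class DataProcessor:
        """Stores and retrieves the data located at its own site."""
        def __init__(self, fragment_rows):
            self.fragment = fragment_rows

        def handle(self, req):
            if req.command != "SELECT":
                return Reply(req.txn_id, "ERROR", [])
            rows = [r for r in self.fragment if req.predicate(r)]
            return Reply(req.txn_id, "OK", rows)

    # A TP at another site routes its request to this DP and collects the reply.
    dp_b = DataProcessor([{"cus_code": 10010, "state": "FL"},
                          {"cus_code": 10011, "state": "GA"}])
    reply = dp_b.handle(Request(txn_id=1, command="SELECT",
                                predicate=lambda r: r["state"] == "FL"))
    print(reply.status, reply.rows)     # OK [{'cus_code': 10010, 'state': 'FL'}]

Here the TP forwards one request to one DP; later slides extend this to transactions and requests that span several DPs.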
Levels of Data and Process Distribution
• Database systems can be classified by how they support process distribution and data distribution: single-site or multiple-site processing combined with single-site or multiple-site data
Single-Site Processing, Single-Site Data
• All processing is done on a single host computer, and all data are stored on the host computer's local disk
  – DBMS usually runs under a time-sharing, multitasking OS
Multiple-Site Processing, Single-Site Data
• Multiple processes run on different computers that share a single data repository (typically a network file server)
• All record- and file-locking activity is done at the end-user location
• All data selection, search, and update functions take place at the workstation, so entire files must travel through the network to the workstation, which increases network traffic, slows response time, and increases communication costs
  – Example: a query that needs only a handful of rows from a 100,000-row table still causes all 100,000 rows to travel to the end user's workstation
Multiple-Site Processing, Multiple-Site Data
• Fully distributed DDBMS with support for multiple data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
  – Integrate multiple instances of the same DBMS over a network
Multiple-Site Processing, Multiple-Site Data (cont'd.)
• Heterogeneous DDBMSs
  – Integrate different types of DBMSs over a network, but all support the same data model
• Fully heterogeneous DDBMSs
  – Support different data models (relational, hierarchical, or network)
  – Run on different computer systems, such as mainframes and microcomputers
Distributed Database Transparency Features
• DDBMS transparency features include:
  – Distribution transparency
  – Transaction transparency
  – Failure transparency
  – Performance transparency
  – Heterogeneity transparency
Distributed Database Transparency Features (cont'd.)
• Distribution transparency: allows management of a physically dispersed database as if it were centralized
  – The user does not need to know that a table's rows and columns are split vertically or horizontally and stored among multiple sites
Distributed Database Transparency Features (cont'd.)
• Transaction transparency: allows a transaction to update data at more than one network site
  – Ensures that the transaction will be either entirely completed or aborted in order to maintain database integrity
• Failure transparency: ensures that the system will continue to operate in the event of a node or network failure
  – Functions that were lost will be picked up by another network node
Distributed Database Transparency Features (cont'd.)
• Performance transparency: allows the system to perform as if it were a centralized DBMS
  – No performance degradation due to use of a network or platform differences
  – The system will find the most cost-effective path to access remote data
  – The system will increase performance capacity without affecting overall performance when adding more TP or DP nodes
• Heterogeneity transparency: allows the integration of several different local DBMSs under a common global schema
Distribution Transparency
• Allows management of a physically dispersed database as if it were centralized
• Three levels of distribution transparency:
  – Fragmentation transparency: the end user does not need to know that the DB is partitioned
    • SELECT * FROM EMPLOYEE WHERE…
  – Location transparency: must specify the database fragment names, but not the location
    • SELECT * FROM E1 WHERE … UNION
  – Local mapping transparency: must specify both fragment name and location
    • SELECT * FROM E1 "NODE" NY WHERE … UNION
Distribution Transparency (cont'd.)
• Supported by a distributed data dictionary (or distributed data catalog), which contains the description of the entire database as seen by the DBA
  – It is distributed and replicated at the network nodes
  – The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries that will be processed by different DPs (see the sketch below)
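A minimal sketch of that translate-and-merge step, assuming a hypothetical global schema that maps each fragment to a site; fragment E1 and node NY echo the earlier SQL examples, while E2 and ATL are invented here:

    # Hypothetical global schema: logical table EMPLOYEE is split into
    # fragments E1 and E2 stored at sites "NY" and "ATL".
    GLOBAL_SCHEMA = {"EMPLOYEE": {"E1": "NY", "E2": "ATL"}}

    # Stand-in data processors: site name -> rows of the local fragment.
    SITES = {
        "NY":  [{"emp_id": 1, "dept": "SALES"}],
        "ATL": [{"emp_id": 2, "dept": "SERVICE"}, {"emp_id": 3, "dept": "SALES"}],
    }

    def run_distributed_request(table, predicate):
        """TP side: translate one logical request into per-DP subqueries."""
        results = []
        for fragment, site in GLOBAL_SCHEMA[table].items():
            rows = [r for r in SITES[site] if predicate(r)]   # subquery at the DP
            results.extend(rows)                              # merge at the TP
        return results

    # The end user sees one logical table (fragmentation transparency).
    print(run_distributed_request("EMPLOYEE", lambda r: r["dept"] == "SALES"))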
Transaction Transparency
• Ensures that database transactions maintain the distributed database's integrity and consistency
• A transaction is completed only when all database sites involved complete their part of the transaction
Distributed Requests and Distributed Transactions
• Remote request: a single SQL statement accesses data to be processed by a single remote DP at a single remote site
Distributed Requests and Distributed Transactions (cont'd.)
• Remote transaction: composed of several requests that access data at a single remote site
  – The remote transaction is sent to the remote site and executed there
  – The transaction can reference only one remote DP
  – Each SQL statement can reference only one remote DP, and the entire transaction can reference and be executed at only one remote DP
Distributed Requests and Distributed Transactions (cont'd.)
• Distributed transaction: allows a transaction to reference several different local or remote DP sites
  – Each single request can still reference only one site, but the transaction as a whole can reference multiple DP sites because each request can reference a different site
Distributed Requests and Distributed Transactions (cont'd.)
• Distributed request: lets a single SQL statement reference data located at several different local or remote DP sites
  – Provides fragmentation transparency: a table may be divided into fragments, and a single request can reference one or more of those fragments
Distributed Requests and Distributed Transactions (cont'd.)
• Example (figure omitted): table fragments, including C2, located at sites B and C
Distributed Concurrency Control
• Multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions
• Example: a transaction updates data at three sites, and the first two DPs commit the data at each local DP
  – The third DP cannot commit the transaction, but the first two sites cannot be rolled back since they were already committed. This results in an inconsistent database
Two-Phase Commit Protocol
• Guarantees that the final COMMIT is issued only after all participating sites have committed their parts of the transaction; otherwise all changes are undone
• Requires that each DP's transaction log entry be written before the database fragment is updated (write-ahead protocol)
Two-Phase Commit Protocol (cont'd.)
• DO-UNDO-REDO protocol with write-ahead protocol (sketched below)
  – DO performs the operation and records the "before" and "after" values in the transaction log
  – UNDO reverses an operation, using the log entries written by the DO portion of the sequence
  – REDO redoes an operation, using the log entries written by the DO portion of the sequence
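A minimal sketch of DO-UNDO-REDO over a write-ahead log; the log-record layout and variable names are invented for illustration:

    # Each DP keeps its own transaction log. The log record is written
    # *before* the data item is changed (write-ahead), so the before/after
    # values are always available for UNDO or REDO.
    data = {"balance": 100}
    log  = []   # list of (txn_id, key, before, after)

    def do(txn_id, key, new_value):
        log.append((txn_id, key, data[key], new_value))  # write-ahead entry
        data[key] = new_value                            # then perform the change

    def undo(txn_id):
        for _, key, before, _after in reversed([e for e in log if e[0] == txn_id]):
            data[key] = before                           # restore "before" value

    def redo(txn_id):
        for _, key, _before, after in [e for e in log if e[0] == txn_id]:
            data[key] = after                            # reapply "after" value

    do(1, "balance", 150)
    undo(1); print(data)   # {'balance': 100}
    redo(1); print(data)   # {'balance': 150}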
Two-Phase Commit Protocol (cont'd.)
• Phase 1: Preparation
  – The coordinator sends a PREPARE TO COMMIT message to all subordinates
  – The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgment message (YES/PREPARED TO COMMIT or NO/NOT PREPARED) to the coordinator
Two-Phase Commit Protocol (cont'd.)
• Phase 2: The Final Commit (both phases are sketched below)
  – The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies
  – Each subordinate receives the COMMIT message and then updates the database using the DO protocol
  – The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator
  – If one or more subordinates do not commit, the coordinator sends an ABORT message and the subordinates UNDO all changes
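A minimal sketch of the two phases, with in-process "subordinates" and no timeouts, logging, or recovery of in-doubt transactions; all names are invented for illustration:

    class Subordinate:
        def __init__(self, name, can_commit=True):
            self.name, self.can_commit, self.committed = name, can_commit, False

        def prepare(self):                 # phase 1: write log, then vote
            return "PREPARED" if self.can_commit else "NOT PREPARED"

        def commit(self):                  # phase 2: DO the update
            self.committed = True
            return "COMMITTED"

        def abort(self):                   # UNDO any provisional changes
            self.committed = False

    def two_phase_commit(subordinates):
        # Phase 1: preparation
        votes = [s.prepare() for s in subordinates]
        if any(v != "PREPARED" for v in votes):
            for s in subordinates:
                s.abort()
            return "ABORT"
        # Phase 2: the final commit
        replies = [s.commit() for s in subordinates]
        return "COMMIT" if all(r == "COMMITTED" for r in replies) else "ABORT"

    sites = [Subordinate("B"), Subordinate("C", can_commit=False)]
    print(two_phase_commit(sites))   # ABORT: one NO vote means no site commits

Because site C votes NO in phase 1, no site ever commits, avoiding the inconsistent state described under distributed concurrency control.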
Performance and Failure Transparency
• Performance transparency
  – Allows a DDBMS to perform as if it were a centralized database; no performance degradation
• Failure transparency
  – Ensures the system will continue to operate in the event of a node or network failure
• Query optimization seeks to minimize the total cost associated with the execution of a request (CPU, communication, I/O)
Performance and Failure Transparency (cont'd.)
• In a DDBMS, transactions are distributed among multiple nodes, so determining which data are being used becomes more complex
  – Data distribution: determine which fragment to access, create multiple data requests to the chosen DPs, combine the responses, and present the data to the application
  – Data replication: data may be replicated at several different sites, making the access problem even more complex because all copies must be consistent
Performance and Failure Transparency (cont'd.)
• Network and node availability
  – The response time associated with remote sites cannot be easily predetermined: some nodes finish their part of the query in less time than others, and network path performance varies because of bandwidth and traffic loads
  – The DDBMS must consider:
    • Network latency: the delay imposed by the amount of time required for a data packet to make a round trip from point A to point B
    • Network partitioning: the delay imposed when nodes become suddenly unavailable due to a network failure
Distributed Database Design
• In addition to traditional design concerns, the design of a distributed database must decide how to handle data fragmentation, data replication, and data allocation
Data Fragmentation
• Breaks a single object (such as a table) into two or more segments, or fragments, each of which can be stored at any site over a computer network
Data Fragmentation Strategies
• Horizontal fragmentation
  – Division of a relation into subsets (fragments) of tuples (rows)
  – Each fragment is stored at a different node, and each fragment has unique rows
• Vertical fragmentation
  – Division of a relation into attribute (column) subsets
  – Each fragment is stored at a different node, and each fragment has unique columns, with the exception of the key column, which is common to all fragments (both strategies are sketched below)
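A minimal sketch of the two strategies on an invented CUSTOMER relation; the fragment names, column names, and split conditions are illustrative only:

    # Invented relation: each row is a dict keyed by attribute name.
    CUSTOMER = [
        {"cus_code": 10010, "cus_name": "Ramas", "cus_state": "FL", "cus_balance": 120.0},
        {"cus_code": 10011, "cus_name": "Dunne", "cus_state": "GA", "cus_balance":   0.0},
        {"cus_code": 10012, "cus_name": "Smith", "cus_state": "FL", "cus_balance": 345.5},
    ]

    # Horizontal fragmentation: subsets of rows, e.g. split by state.
    cust_h1 = [r for r in CUSTOMER if r["cus_state"] == "FL"]   # stored at site 1
    cust_h2 = [r for r in CUSTOMER if r["cus_state"] == "GA"]   # stored at site 2

    # Vertical fragmentation: subsets of columns; the key column is repeated
    # in every fragment so the original rows can be rejoined.
    def project(rows, columns):
        return [{c: r[c] for c in columns} for r in rows]

    cust_v1 = project(CUSTOMER, ["cus_code", "cus_name"])                  # service dept.
    cust_v2 = project(CUSTOMER, ["cus_code", "cus_state", "cus_balance"])  # collections dept.

    print(len(cust_h1) + len(cust_h2) == len(CUSTOMER))   # True: every row in exactly one fragment
    print(len(cust_v1) == len(cust_v2) == len(CUSTOMER))  # True: both fragments keep all rows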
Data Fragmentation Strategies (cont'd.)
• Vertical fragmentation based on use by the service and collections departments
  – Both fragments require the same key column and have the same number of rows
Data Fragmentation Strategies (cont'd.)
• Mixed fragmentation based on location as well as use by the service and collections departments
Data Replication
• Data copies are stored at multiple sites served by a computer network
• Fragment copies can be stored at several sites to serve specific information requirements
  – Enhances data availability and response time
  – Reduces communication and total query costs
Data Replication (cont'd.)
• Styles of replication
  – Push replication: after a data update, the originating DP node sends the changes to the replica nodes to ensure that data are immediately updated
  – Pull replication: after a data update, the originating DP sends "messages" to the replica nodes to notify them of a change; the replica nodes decide when to apply the updates to their local fragment (both styles are sketched below)
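A minimal sketch contrasting the two styles, with invented node objects and no real networking:

    # Replica nodes hold a copy of the fragment; the originating DP either
    # pushes the new value or only notifies (pull) and lets replicas fetch later.
    class ReplicaNode:
        def __init__(self):
            self.fragment = {}
            self.pending = []               # change notices not yet applied (pull)

        def apply(self, key, value):        # push: update arrives immediately
            self.fragment[key] = value

        def notify(self, key):              # pull: remember that key changed
            self.pending.append(key)

        def refresh(self, origin):          # pull: replica decides when to sync
            for key in self.pending:
                self.fragment[key] = origin.fragment[key]
            self.pending.clear()

    class OriginDP:
        def __init__(self, replicas):
            self.fragment, self.replicas = {}, replicas

        def update_push(self, key, value):
            self.fragment[key] = value
            for r in self.replicas:
                r.apply(key, value)         # replicas consistent right away

        def update_pull(self, key, value):
            self.fragment[key] = value
            for r in self.replicas:
                r.notify(key)               # replicas lag until they refresh

    r = ReplicaNode(); origin = OriginDP([r])
    origin.update_push("cus_10010", 120.0); print(r.fragment)   # updated now
    origin.update_pull("cus_10010", 150.0); print(r.fragment)   # still old value
    r.refresh(origin);                      print(r.fragment)   # now consistent

Push keeps replicas current at the cost of update latency; pull trades temporary staleness for cheaper updates, which connects to the eventual-consistency idea under the CAP theorem.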
Data Replication (cont'd.)
• Fully replicated database
  – Stores multiple copies of each database fragment at multiple sites
  – Can be impractical due to the amount of overhead
• Partially replicated database
  – Stores multiple copies of some database fragments at multiple sites
• Replication cost: performance and overhead
Data Allocation
• Deciding where to locate data; closely related to the way the database is fragmented or divided
• Data allocation strategies:
  – Centralized data allocation: the entire database is stored at one site
  – Partitioned data allocation: the database is divided into several disjoint parts (fragments) and stored at several sites
  – Replicated data allocation: copies of one or more database fragments are stored at several sites
The CAP Theorem
• The initials CAP stand for three desirable properties of a distributed system with replicated data:
  – Consistency
  – Availability
  – Partition tolerance (similar to failure transparency)
• Highly distributed systems often relax consistency: changes are not applied everywhere at once but propagate through the system until all replicas are eventually consistent (a small sketch follows)
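A minimal sketch of the trade-off: while a replica is cut off by a partition it must either refuse the operation (favoring consistency) or answer with possibly stale data that converges once the partition heals (favoring availability); everything here is invented for illustration:

    class Replica:
        def __init__(self):
            self.value = 0
            self.partitioned = False        # True: cannot reach the other replicas

        def read_cp(self):
            # CP choice: refuse rather than risk returning stale data.
            if self.partitioned:
                raise RuntimeError("unavailable during partition")
            return self.value

        def read_ap(self):
            # AP choice: always answer, even if the value may be stale.
            return self.value

        def sync(self, latest):
            # After the partition heals, replicas converge (eventual consistency).
            self.value = latest

    r = Replica(); r.partitioned = True
    print(r.read_ap())               # 0: possibly stale, but available
    r.partitioned = False
    r.sync(42); print(r.read_cp())   # 42: consistent again after the partition heals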