Data Management Systems
Transaction Processing: Concurrency control and recovery
2 Phase Locking
• Transactions
• Deadlocks
• Recovery
• Locking Table
• Snapshot isolation
Gustavo Alonso
Institute of Computing Platforms
Department of Computer Science
ETH Zürich
2 Phase Locking (2PL)
2PL Background
• 2 Phase Locking has been widely used for decades
• It is the canonical implementation of concurrency control
• It has important implications for recovery
• Although alternatives exist (snapshot isolation), the underlying locking mechanisms are still widely used even in those alternative systems
• How locking is implemented has a huge impact on performance and scalability
Locking in databases
• Concurrency control in databases is implemented using locking
  • Generally using a lock table (see later)
  • Locks cover tuples, tables, indexes, blocks …
• Locks have different modes that indicate whether they can be shared or are exclusive:
  • Shared lock (read): S
  • Exclusive lock (write): X
  • Other modes are possible
• Compatibility matrix (can a requested lock be granted given the lock currently held? NL = no lock held):

               held: NL    S     X
    request S:       yes   yes   no
    request X:       yes   no    no
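As a minimal sketch (not code from any particular engine; the helper names are hypothetical), the compatibility matrix can be expressed directly as a lookup table:

```python
# Hypothetical sketch: a request is granted only if its mode is compatible
# with every lock already held on the item. NL (no lock) is the empty set.
COMPATIBLE = {
    ("S", "S"): True, ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

def can_grant(requested_mode, held_modes):
    """Return True if the requested lock mode can be granted."""
    return all(COMPATIBLE[(requested_mode, h)] for h in held_modes)

# A shared request coexists with other shared locks, an exclusive one does not.
assert can_grant("S", ["S", "S"]) is True
assert can_grant("X", ["S"]) is False
assert can_grant("X", []) is True   # nothing held (NL): any request is granted
```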
2 Phase Locking Protocol
1. Before accessing an object, a transaction must acquire a lock on it.
2. A transaction acquires a lock on an object only once. Lock upgrades (S to X) are possible.
3. A transaction is blocked if its lock request cannot be granted according to the compatibility matrix.
4. A transaction goes through two phases:
   1. Growth: acquire locks, but never release a lock.
   2. Shrink: release locks, but never acquire a lock.
5. At EOT (commit or abort) all locks must be released.
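A minimal sketch (illustrative only; class and method names are my own, not from the course material) of how the two-phase rule can be enforced per transaction: once any lock is released, the transaction enters its shrinking phase and may not acquire new locks.

```python
class TwoPhaseViolation(Exception):
    pass

class Transaction:
    def __init__(self, tid):
        self.tid = tid
        self.locks = {}          # object -> mode ("S" or "X")
        self.shrinking = False   # becomes True after the first release

    def acquire(self, obj, mode):
        if self.shrinking:
            raise TwoPhaseViolation(
                f"T{self.tid} cannot lock {obj} after releasing a lock")
        # Rule 2: at most one lock per object; an S lock may be upgraded to X.
        if self.locks.get(obj) != "X":
            self.locks[obj] = mode

    def release(self, obj):
        self.shrinking = True    # the growth phase is over
        del self.locks[obj]

    def end_of_transaction(self):
        for obj in list(self.locks):   # rule 5: release everything at EOT
            self.release(obj)

t1 = Transaction(1)
t1.acquire("x", "S")
t1.acquire("x", "X")     # upgrade during the growth phase is allowed
t1.release("x")          # first release: T1 enters the shrinking phase
try:
    t1.acquire("y", "S") # violates the two-phase rule
except TwoPhaseViolation as e:
    print(e)
```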
Example
Why two phases?
• 2PL prevents histories where a transaction must be aborted due to concurrency control issues:
  • An operation is not allowed unless we are sure it is safe
• Without the two phases, the following can happen:
  • … L1[x] w1[x] U1[x] L2[x] w2[x] L2[y] w2[y] U2[y] …
  • If T1 now accesses y for read or write, it can do so, creating a non-serializable history
• This cannot happen if the two-phase rule is enforced: if transaction T2 acquires a lock released by T1, T1 cannot later acquire a lock released by T2, since the two-phase rule would be violated (T1 would be acquiring a lock after it has released a lock)
2PL results in conflict serializable histories
• Locks prevent two transactions from modifying the same item at the same time
  • To do an operation, a lock must be held
  • To get a lock, the item must be free or only locked in a compatible (shared) mode
  • This guarantees the atomicity of single operations
• The two-phase rule ensures no cycles can occur in the conflict graph
  • A cycle implies a circular dependency created by conflicting operations
  • T1 -> T2 implies T1 released a lock before T2 got it
  • For a cycle to exist, T2 must release a lock before T1 acquires it
  • That would violate the two-phase rule
2PL and SQL isolation levels
• Uncommitted read
  • Typically only allowed for read-only transactions (see later)
  • Implemented by not acquiring a lock before reading
• Read committed
  • Keep write locks until the end => read committed
  • Get a short-lived lock for reading (released right after reading) => non-repeatable reads are possible
• Repeatable read
  • Keep write locks until the end
  • Keep read locks until the end => what is read will not change
• Serializable
  • Like repeatable read
  • Read lock on the table to prevent phantoms (or cursor locking)
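A rough sketch (my own simplification, not a specific engine's behaviour) of how the levels above translate into lock-holding rules under a 2PL-based scheduler:

```python
# Hypothetical helper: how long read and write locks are held per isolation
# level (simplified; real engines add further modes and optimizations).
def lock_policy(isolation_level):
    policies = {
        "READ UNCOMMITTED": {"read_locks": "none",
                             "write_locks": "until end of transaction"},
        "READ COMMITTED":   {"read_locks": "short: released right after the read",
                             "write_locks": "until end of transaction"},
        "REPEATABLE READ":  {"read_locks": "until end of transaction",
                             "write_locks": "until end of transaction"},
        "SERIALIZABLE":     {"read_locks": "until end of transaction, plus table "
                                           "or range locks against phantoms",
                             "write_locks": "until end of transaction"},
    }
    return policies[isolation_level]

print(lock_policy("READ COMMITTED"))
```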
What about recovery?
• Locking affects recovery in terms of deciding what to do when a transaction aborts
• Basic 2PL does not say anything about aborting or committing transactions:
  • … L1[x] w1[x] U1[x] L2[x] w2[x] L2[y] w2[y] U2[y] …
  • In principle, this history is serializable
  • If T1 aborts, recovery using before images would not work, as undoing T1 would also undo the changes done by T2 (one could argue that there is no need to recover x as it has been overwritten, but then if T2 aborts, T2 cannot be recovered by restoring a before image, since that would reinstall the value written by the aborted T1)
Strict 2PL
• Strict 2PL combines 2PL with strict histories:
  • Enforce 2PL
  • Keep read and write locks until the transaction commits or aborts
• Strict 2PL results in serializable and strict histories
  • 2PL ensures serializability
  • Keeping all locks until the end ensures no uncommitted data can be read or overwritten
• All engines implement strict 2PL (research papers sometimes propose to weaken the rule, but it is neither a good nor a practical idea)
Deadlocks

Step   T1          T2          Comment
1.     BOT
2.     lockX(A)
3.                 BOT
4.                 lockS(B)
5.                 read(B)
6.     read(A)
7.     write(A)
8.     lockX(B)                T1 must wait for T2
9.                 lockS(A)    T2 must wait for T1 => Deadlock
10.    ...         ...
Deadlock Detection: in theory
• Wait-for graph: T1 -> T2 -> T3 -> T4 -> T1 (a cycle)
• Aborting T3 will resolve the cycle
• Building the graph is expensive in systems running many concurrent transactions
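A minimal sketch (illustrative, not a production implementation) of deadlock detection on a wait-for graph: an edge T -> T' means T waits for a lock held by T', and a deadlock exists exactly when the graph contains a cycle.

```python
def find_cycle(wait_for):
    """wait_for maps a transaction to the set of transactions it is waiting for.
    Returns one cycle as a list (e.g. ['T1', 'T2', 'T3', 'T4', 'T1']) or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    path = []

    def dfs(t):
        color[t] = GRAY
        path.append(t)
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:      # back edge: u is on the current path
                return path[path.index(u):] + [u]
            if color.get(u, WHITE) == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None

# The wait-for graph from the slide: T1 -> T2 -> T3 -> T4 -> T1.
# Aborting any transaction on the cycle (e.g. T3) breaks the deadlock.
print(find_cycle({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T4"}, "T4": {"T1"}}))
```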
Deadlock detection: in practice
• In practice, no wait-for graph is built:
  • Transactions/queries have a timer
  • If the timer expires, the transaction/query is aborted
  • Deadlocks are considered rare enough that it is preferable to make a mistake every now and then (aborting a transaction that was fine) rather than implement something as expensive as a wait-for graph
• This approach has many implications in practice:
  • A long transaction might block queries
  • The queries might abort because their timer expires while waiting for the transaction
  • Badly written applications can create such scenarios
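A sketch of the general idea (an assumption about typical behaviour, not the API of any specific engine): a waiting transaction simply gives up after a lock wait timeout and is aborted, whether or not it was actually part of a deadlock.

```python
import threading

def acquire_or_abort(lock: threading.Lock, timeout_s: float = 5.0) -> None:
    """Wait for the lock at most timeout_s seconds, then give up."""
    if not lock.acquire(timeout=timeout_s):
        # A real engine would roll back the waiting transaction at this point.
        raise TimeoutError("lock wait timeout exceeded: transaction aborted")
```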
Locking Table
Transaction Manager
• A database engine uses a transaction manager (TM) to enforce concurrency control
• Implemented in many different forms
• Deeply connected with the recovery manager and the I/O system
• We will cover the classical implementation
[Figure: transactions T1, T2, T3, …, Tn submit operations to the Transaction Manager (TM), which passes them to the Scheduler; below it, the Data Manager (Recovery Manager and Buffer Manager) sits on top of the Storage System]
Basic functions of a Transaction Manager
[Figure: the transaction table maps each transaction to a transaction handler with its lock list (w[x], w[y]); tuple ids are hashed into the lock table, where the lock headers for X and Y chain the lock requests of Ti and Tj; the log holds log records such as (Ti, w[x]) and (Ti, w[y])]
Basic Transaction Manager
• Transaction table: list of active transactions in the system, generally maintained by the engine in the common area
• Transaction handler: pointer to the structures containing all the relevant information related to a transaction (these data structures could be in the private area of a session)
• Lock table: a hash table containing entries that correspond to active locks. Locks on the same item are captured as a linked list
• Log: entries that capture the operations performed; kept in memory until it is time to write them to disk
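A simplified sketch of these structures (all names and fields are illustrative, not taken from a real engine):

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LockRequest:
    tid: int
    mode: str                  # "S" or "X"
    granted: bool = False

@dataclass
class TransactionEntry:
    tid: int
    status: str = "active"                           # active / committed / aborted
    lock_list: list = field(default_factory=list)    # items this transaction has locked
    last_lsn: Optional[int] = None                   # LSN of its most recent log record

transaction_table = {}             # tid -> TransactionEntry (the active transactions)
lock_table = defaultdict(deque)    # hash of the item id -> chain of LockRequest
log_buffer = []                    # log records kept in memory until flushed to disk
```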
Basic operations of the transaction manager
• Begin transaction:
  • Create an entry in the transaction table (no log entry is created unless explicitly requested)
• Read/write operation:
  • Hash the tuple id to find the corresponding entry in the lock table
  • If empty, the lock is granted
  • If there is a list, attach the request at the end of the list (the request is granted if the earlier requests are compatible)
• Write operation:
  • Create a log entry (with Log Sequence Number (LSN), before and/or after image, transaction id, and a pointer (LSN) to the previous log entry of the same transaction)
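A minimal sketch of the read/write path (names and structures are simplified and hypothetical): hash the tuple id into the lock table, grant the request only if it is compatible with every earlier request in the chain, and append a log record for writes.

```python
from collections import defaultdict

COMPAT = {("S", "S"): True, ("S", "X"): False, ("X", "S"): False, ("X", "X"): False}
lock_table = defaultdict(list)     # item id -> chain of {"tid", "mode", "granted"}
log = []                           # in-memory log records

def request_lock(tid, item, mode):
    chain = lock_table[item]
    granted = all(COMPAT[(mode, r["mode"])] for r in chain)   # empty chain -> granted
    chain.append({"tid": tid, "mode": mode, "granted": granted})
    return granted                 # False means the transaction must block and wait

def log_write(tid, item, before, after, prev_lsn):
    lsn = len(log)                 # toy LSN: position in the in-memory log
    log.append({"lsn": lsn, "tid": tid, "item": item,
                "before": before, "after": after, "prev_lsn": prev_lsn})
    return lsn

print(request_lock(1, "x", "X"))   # True: the chain was empty
print(log_write(1, "x", before=10, after=42, prev_lsn=None))
print(request_lock(2, "x", "S"))   # False: S conflicts with T1's X lock, T2 waits
```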
Basic operations of the transaction manager
• Commit transaction:
  • Release locks using the transaction lock list
  • Resume transactions waiting for the locks now released
  • Finalize log entries
  • Write log entries to storage
  • May write the actual data modified to storage
• Abort transaction:
  • Release locks using the transaction lock list
  • Resume transactions waiting for the locks now released
  • Use log entries to undo the changes (or discard the changes)
  • May write log entries to storage
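A sketch of commit and abort (simplified, with hypothetical structures): both walk the transaction's lock list to release locks; commit flushes the transaction's log records to stable storage first, abort restores before images from the log.

```python
def commit(tid, lock_table, txn_lock_list, log, stable_log):
    stable_log.extend(r for r in log if r["tid"] == tid)   # log reaches storage before data
    release_all(tid, lock_table, txn_lock_list)            # waiting transactions can resume

def abort(tid, lock_table, txn_lock_list, log, database):
    for rec in reversed([r for r in log if r["tid"] == tid]):
        database[rec["item"]] = rec["before"]               # undo: restore the before image
    release_all(tid, lock_table, txn_lock_list)

def release_all(tid, lock_table, txn_lock_list):
    for item in txn_lock_list:
        lock_table[item] = [r for r in lock_table[item] if r["tid"] != tid]
    txn_lock_list.clear()

# Tiny example: T1 wrote x = 42 (before image 10) and then aborts.
db = {"x": 42}
lock_table = {"x": [{"tid": 1, "mode": "X", "granted": True}]}
log = [{"lsn": 0, "tid": 1, "item": "x", "before": 10, "after": 42, "prev_lsn": None}]
abort(1, lock_table, ["x"], log, db)
print(db["x"])    # 10: the before image was restored and the lock released
```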
Real systems
• Actual database engines implement this basic design in many different ways:
  • Often, there is an actual lock table similar to the one described
  • Sometimes there is no lock table (e.g., Oracle): locks are stored in the blocks containing the tuples (this makes sense: instead of using the tuple id (block id, offset) to hash into the lock table, the corresponding block acts as the lock-table entry for all tuples in that block => no need to keep a separate table)
  • Log records can take many forms (redo, undo, undo/redo, etc.; see later)
  • Lock entries can also vary in their structure
• Recall the headers in the blocks: they are also used to implement some of these data structures, including information about what is locked
Implementation aspects
What is a lock? What to lock?
• We have so far studied locks in a limited context
  • Read/write, shared/exclusive
  • In reality, there are many more modes
• Read/write locks are too coarse and may result in more conflicts than necessary
• Some types of locks (table locks) are too big (they lock many tuples), and having more information helps to reduce conflicts
• What lock modes are supported and what can be locked is system-specific