
Garbage Collection, COMP 520: Compiler Design (4 credits)

COMP 520 Winter 2016 Garbage Collection (1) Garbage Collection COMP 520: Compiler Design (4 credits) Professor Laurie Hendren, hendren@cs.mcgill.ca


  1. Garbage Collection. COMP 520: Compiler Design (4 credits), Professor Laurie Hendren, hendren@cs.mcgill.ca
     [Figure: diagram of records linked by pointers]

  2. A garbage collector is part of the run-time system: it reclaims heap-allocated records that are no longer used. A garbage collector should:
     • reclaim all unused records;
     • spend very little time per record;
     • not cause significant delays; and
     • allow all of memory to be used.
     These are difficult and often conflicting requirements.

  3. Life without garbage collection:
     • unused records must be explicitly deallocated;
     • superior if done correctly;
     • but it is easy to miss some records; and
     • it is dangerous to handle pointers.
     Memory leaks in real life (ical v.2.1):
     [Figure: plot of memory use in MB (0 to 31) against time in hours (0 to 24)]

  4. Which records are dead, i.e. no longer in use?
     Ideally, records that will never be accessed in the future execution of the program. But that is of course undecidable.
     Basic conservative assumption: a record is live if it is reachable from a stack-based program variable, otherwise dead.
     Dead records may still be pointed to by other dead records.

  5. A heap with live and dead records:
     [Figure: diagram of a heap whose records hold values such as 12, 15, 37, 7, 59, 9 and 20, linked by pointers]

  6. The mark-and-sweep algorithm:
     • explore pointers starting from the program variables, and mark all records encountered;
     • sweep through all records in the heap and reclaim the unmarked ones; also
     • unmark all marked records.
     Assumptions:
     • we know the size of each record;
     • we know which fields are pointers; and
     • reclaimed records are kept in a freelist.

  7. Pseudo code for mark-and-sweep:

       function DFS(x)
         if x is a pointer into the heap then
           if record x is not marked then
             mark record x
             for i := 1 to |x| do
               DFS(x.f_i)

       function Mark()
         for each program variable v do
           DFS(v)

       function Sweep()
         p := first address in heap
         while p < last address in heap do
           if record p is marked then
             unmark record p
           else
             p.f_1 := freelist
             freelist := p
           p := p + sizeof(record p)
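The pseudo code above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the `Record` class, the use of Python object references as pointers, and the helper `freelist_records` are assumptions made for the example, and every record is assumed to have at least one field so that the freelist can be threaded through the first one.

```python
class Record:
    """Hypothetical heap record: a mark bit plus a list of fields."""
    def __init__(self, *fields):
        self.fields = list(fields)
        self.marked = False


def dfs(x):
    # In this model only Record values count as pointers into the heap.
    if isinstance(x, Record) and not x.marked:
        x.marked = True
        for field in x.fields:
            dfs(field)


def mark(program_variables):
    for v in program_variables:
        dfs(v)


def sweep(heap):
    """Unmark live records; link dead ones through their first field."""
    freelist = None
    for rec in heap:
        if rec.marked:
            rec.marked = False          # ready for the next collection
        else:
            rec.fields[0] = freelist    # p.f_1 := freelist
            freelist = rec              # freelist := p
    return freelist


def freelist_records(freelist):
    """Helper for the example: walk the freelist into a Python list."""
    out = []
    while freelist is not None:
        out.append(freelist)
        freelist = freelist.fields[0]
    return out


# Two live records (a, b) and a dead cycle (c, d).
a = Record(12, None)
b = Record(15, a)
c = Record(37, None)
d = Record(7, c)
c.fields[1] = d
heap = [a, b, c, d]

mark([b])                 # b is the only program variable
free = sweep(heap)
print(len(freelist_records(free)))
```

Note that the dead cycle between `c` and `d` is reclaimed, because liveness is decided by reachability from the program variables only. The recursive `dfs` mirrors the slide; a deep heap would exhaust Python's recursion limit.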

  8. Marking and sweeping:
     [Figure: the example heap after marking and after sweeping, with the reclaimed records linked into the freelist]

  9. Analysis of mark-and-sweep:
     • assume the heap has size $H$ words; and
     • assume that $R$ words are reachable.
     The cost of garbage collection is: $c_1 R + c_2 H$
     Realistic values are: $10R + 3H$
     The cost per reclaimed word is: $\frac{c_1 R + c_2 H}{H - R}$
     • if $R$ is close to $H$, then this is expensive;
     • the lower bound is $c_2$;
     • increase the heap when $R > 0.5H$; then
     • the cost per word is $c_1 + 2c_2 \approx 16$.
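As a quick numeric check of the formulas above, the following sketch evaluates the cost per reclaimed word for a few heap occupancies. The function name and its default constants (the "realistic values" 10 and 3 from the slide) are choices made for this example only.

```python
def mark_sweep_cost_per_word(R, H, c1=10, c2=3):
    """Cost per reclaimed word for mark-and-sweep: (c1*R + c2*H) / (H - R)."""
    if not 0 <= R < H:
        raise ValueError("need 0 <= R < H so that some words are reclaimed")
    return (c1 * R + c2 * H) / (H - R)


H = 1000
for R in (0, 250, 500, 900):
    print(R, mark_sweep_cost_per_word(R, H))
```

With an empty heap (R = 0) the result is the lower bound c2 = 3, and at R = 0.5H it is c1 + 2·c2 = 16, matching the slide.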

  10. Other relevant issues:
      • The DFS recursion stack could have size H (and has at least size log H), which may be too much; however, the recursion stack can cleverly be embedded in the fields of marked records (pointer reversal).
      • Records can be kept sorted by sizes in the freelist. Records may be split into smaller pieces if necessary.
      • The heap may become fragmented: containing many small free records but none that are large enough.

  11. The reference counting algorithm:
      • maintain a counter of the references to each record;
      • for each assignment, update the counters appropriately; and
      • a record is dead when its counter is zero.
      Advantages:
      • is simple and attractive;
      • catches dead records immediately; and
      • does not cause long pauses.
      Disadvantages:
      • cannot detect cycles of dead records; and
      • is much too expensive, since every pointer assignment must also update counters.

  12. Pseudo code for reference counting:

        function Increment(x)
          x.count := x.count + 1

        function Decrement(x)
          x.count := x.count - 1
          if x.count = 0 then
            PutOnFreelist(x)

        function PutOnFreelist(x)
          Decrement(x.f_1)
          x.f_1 := freelist
          freelist := x

        function RemoveFromFreelist(x)
          for i := 2 to |x| do
            Decrement(x.f_i)
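A minimal Python sketch of the four functions above is given here, under some assumptions that are not on the slide: the `Record` and `RefCountHeap` classes, the `assign` helper that models a pointer assignment, and the convention that only `Record` values are counted as pointers. References from program variables are counted by calling `increment` and `decrement` by hand.

```python
class Record:
    """Hypothetical record with a reference count and at least one field."""
    def __init__(self, nfields):
        self.count = 0
        self.fields = [None] * nfields


class RefCountHeap:
    def __init__(self):
        self.freelist = None

    def increment(self, x):
        if isinstance(x, Record):
            x.count += 1

    def decrement(self, x):
        if isinstance(x, Record):
            x.count -= 1
            if x.count == 0:
                self.put_on_freelist(x)

    def put_on_freelist(self, x):
        self.decrement(x.fields[0])     # f_1 is about to be overwritten
        x.fields[0] = self.freelist     # thread the freelist through f_1
        self.freelist = x

    def remove_from_freelist(self, x):
        # Remaining fields are released lazily, when x is reused.
        # Unlinking x from the freelist is left to the allocator.
        for i in range(1, len(x.fields)):
            self.decrement(x.fields[i])

    def assign(self, rec, i, value):
        """Assumed helper: rec.f_i := value, with counter updates."""
        self.increment(value)           # first, in case value is the old one
        self.decrement(rec.fields[i])
        rec.fields[i] = value


heap = RefCountHeap()

# A chain a -> b is reclaimed as soon as the last reference to a goes away.
a, b = Record(2), Record(2)
heap.increment(a)          # a program variable refers to a
heap.assign(a, 0, b)
heap.decrement(a)          # the variable is dropped
print(heap.freelist is a, a.fields[0] is b)

# A cycle c <-> d keeps both counters above zero forever.
c, d = Record(1), Record(1)
heap.increment(c)
heap.assign(c, 0, d)
heap.assign(d, 0, c)
heap.decrement(c)
print(c.count, d.count)
```

The second half of the example illustrates the disadvantage noted earlier: after the program variable is dropped, `c` and `d` are unreachable but neither counter reaches zero, so neither is put on the freelist.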

  13. The stop-and-copy algorithm:
      • divide the heap into two parts;
      • only use one part at a time;
      • when it runs full, copy live records to the other part; and
      • switch the roles of the two parts.
      Advantages:
      • allows fast allocation (no freelist);
      • avoids fragmentation;
      • collects in time proportional to R; and
      • avoids stack and pointer reversal.
      Disadvantage:
      • wastes half your memory.

  14. Before and after stop-and-copy:
      [Figure: from-space and to-space before and after a collection, with the next and limit pointers]
      • next and limit indicate the available heap space; and
      • copied records are contiguous in memory.

  15. Pseudo code for stop-and-copy:

        function Forward(p)
          if p ∈ from-space then
            if p.f_1 ∈ to-space then
              return p.f_1
            else
              for i := 1 to |p| do
                next.f_i := p.f_i
              p.f_1 := next
              next := next + sizeof(record p)
              return p.f_1
          else
            return p

        function Copy()
          scan := next := start of to-space
          for each program variable v do
            v := Forward(v)
          while scan < next do
            for i := 1 to |scan| do
              scan.f_i := Forward(scan.f_i)
            scan := scan + sizeof(record scan)
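The two functions above can be sketched in Python as follows. This is an object-level model rather than a real address-based heap, and several details are assumptions made for the example: each space is a Python list, a hypothetical `Record` remembers which list it lives in, appending to the to-space list plays the role of bumping `next`, and an integer index plays the role of `scan`. Every record is assumed to have at least one field, which holds the forwarding pointer after copying.

```python
class Record:
    """Hypothetical record that knows which space (list) it lives in."""
    def __init__(self, space, fields):
        self.space = space
        self.fields = list(fields)
        space.append(self)              # allocate at the end of the space


def forward(p, to_space):
    # Non-pointers and records already in to-space are returned unchanged.
    if not isinstance(p, Record) or p.space is to_space:
        return p
    f1 = p.fields[0]
    if isinstance(f1, Record) and f1.space is to_space:
        return f1                       # already copied: f_1 is the forwarding pointer
    copy = Record(to_space, p.fields)   # copy all fields to the `next` position
    p.fields[0] = copy                  # leave a forwarding pointer behind
    return copy


def copy_collect(program_variables):
    """Copy everything reachable from the roots into a fresh to-space."""
    to_space = []
    scan = 0
    new_roots = [forward(v, to_space) for v in program_variables]
    while scan < len(to_space):         # scan < next
        rec = to_space[scan]
        for i in range(len(rec.fields)):
            rec.fields[i] = forward(rec.fields[i], to_space)
        scan += 1
    return new_roots, to_space


from_space = []
a = Record(from_space, [12, None])
b = Record(from_space, [15, a])
a.fields[1] = b                         # live cycle between a and b
dead = Record(from_space, [99, None])   # unreachable

(new_b,), to_space = copy_collect([b])
print(len(from_space), len(to_space))
```

Only the two reachable records are copied, so the work is proportional to the live data, and the cycle between them is preserved through the forwarding pointers. No recursion stack is needed because the region between `scan` and the end of to-space acts as the work queue.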

  16. Snapshots of stop-and-copy:
      [Figure: the example heap before collection, and after forwarding p and q and scanning 1 record, showing the scan and next pointers]

  17. Analysis of stop-and-copy:
      • assume the heap has size $H$ words; and
      • assume that $R$ words are reachable.
      The cost of garbage collection is: $c_3 R$
      A realistic value is: $10R$
      The cost per reclaimed word is: $\frac{c_3 R}{\frac{H}{2} - R}$
      • this has no lower bound as $H$ grows (it tends to zero);
      • if $H = 4R$ then the cost is $c_3 \approx 10$.
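The formula above can be checked numerically with a short sketch. The function name and the default constant (the "realistic value" 10 from the slide) are choices made for this example only.

```python
def stop_copy_cost_per_word(R, H, c3=10):
    """Cost per reclaimed word for stop-and-copy: c3*R / (H/2 - R)."""
    if not 0 <= R < H / 2:
        raise ValueError("need 0 <= R < H/2: live data must fit in one half")
    return (c3 * R) / (H / 2 - R)


R = 100
for H in (400, 800, 4000):
    print(H, stop_copy_cost_per_word(R, H))
```

For H = 4R the result is c3 = 10, and the cost keeps shrinking as H grows with R fixed, which is the sense in which it has no lower bound.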

  18. Earlier assumptions:
      • we know the size of each record; and
      • we know which fields are pointers.
      For object-oriented languages, each record already contains a pointer to a class descriptor.
      For general languages, we must sacrifice a few bytes per record.

  19. We use mark-and-sweep or stop-and-copy.
      But garbage collection is still expensive: ≈ 100 instructions for a small object!
      Each algorithm can be further extended by:
      • generational collection (to make it run faster); and
      • incremental (or concurrent) collection (to make it run smoother).

  20. Generational collection:
      • observation: the young die quickly;
      • hence the collector should focus on young records;
      • divide the heap into generations: $G_0, G_1, G_2, \ldots$;
      • all records in $G_i$ are younger than records in $G_{i+1}$;
      • collect $G_0$ often, $G_1$ less often, and so on; and
      • promote a record from $G_i$ to $G_{i+1}$ when it survives several collections.

  21. How to collect the $G_0$ generation:
      • pointers from older generations into $G_0$ also act as roots, and it might be very expensive to find them;
      • fortunately, they are rare; so
      • we can try to remember them.
      Ways to remember:
      • maintain a list of all updated records (use marks to make this a set); or
      • mark pages of memory that contain updated records (in hardware or software).
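The first way to remember can be sketched as a write barrier that records every updated old record once. Everything named here is an assumption for illustration: the `Record` class with a `generation` number and a `remembered` mark, the `write_field` barrier, and the `young_roots` helper that gathers the roots for a collection of generation 0.

```python
class Record:
    """Hypothetical record tagged with its generation (0 is the youngest)."""
    def __init__(self, generation, *fields):
        self.generation = generation
        self.fields = list(fields)
        self.remembered = False         # mark that keeps the list a set


def write_field(rec, i, value, remembered):
    """Write barrier: perform rec.f_i := value and remember updated old records."""
    rec.fields[i] = value
    if rec.generation > 0 and not rec.remembered:
        rec.remembered = True
        remembered.append(rec)


def young_roots(program_variables, remembered):
    """Roots for collecting generation 0: variables plus remembered fields."""
    def is_young(x):
        return isinstance(x, Record) and x.generation == 0

    roots = [v for v in program_variables if is_young(v)]
    for rec in remembered:
        roots.extend(f for f in rec.fields if is_young(f))
    return roots


remembered = []
old = Record(1, None, None)
young = Record(0, 5)

write_field(old, 0, young, remembered)   # old -> young pointer is created
write_field(old, 1, 42, remembered)      # second update: not recorded twice
print(len(remembered), young_roots([], remembered) == [young])
```

After a collection of generation 0, a real collector would clear the marks and the list. The page-marking alternative trades this per-record list for a coarser table that is cheaper to update but more expensive to scan.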

  22. Incremental collection:
      • garbage collection may cause long pauses;
      • this is undesirable for interactive or real-time programs; so
      • try to interleave the garbage collection with the program execution.
      Two players access the heap:
      • the mutator: creates records and moves pointers around; and
      • the collector: tries to collect garbage.
      Some invariants are clearly required to make this work.
      The mutator will suffer some slowdown to maintain these invariants.
