Changelog

Changes not seen in first lecture:
  19 March 2020: move page usage slides later
  19 March 2020: adjust PF counting exercise to specify addresses, not offsets
  19 March 2020: Linux maps: correct shown mmap call for 0x400000


  1. Linux maps
     $ cat /proc/self/maps
     00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat
     0060a000-0060b000 r--p 0000a000 08:01 48328831 /bin/cat
     0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat
     01974000-01995000 rw-p 00000000 00:00 0 [heap]
     7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660 /usr/lib/locale/locale-archive
     7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
     7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
     7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
     7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
     7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
     7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
     7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
     7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
     7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
     7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
     7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
     7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0 [stack]
     7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0 [vvar]
     7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0 [vdso]
     ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

     00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat:
       at virtual addresses 0x400000 – 0x40b000
       read, not write, execute, private
       private = copy-on-write (if writeable)
       starting at offset 0 of the file /bin/cat
       device major number 8, device minor number 1, inode 48328831
       (more on what this means when we talk about filesystems)
       as if:
         int fd = open("/bin/cat", O_RDONLY);
         mmap(0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0x0);

     0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat:
       read/write, copy-on-write (private) mapping
       (aside: probably used for global variables)
       as if:
         int fd = open("/bin/cat", O_RDONLY);
         mmap(0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);

     01974000-01995000 [heap]:
       allocated using sbrk()
       no corresponding file, but can get same effect with mmap() call

     7f60c7854000-7f60c7859000 (0x5000 bytes, no file):
       as if:
         mmap(..., 0x5000, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS /* = no file */, ...);
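
The "as if" calls on this slide can be tried out with a small C sketch. The helper names below (`map_file_private_readonly`, `map_anonymous`) are made up for illustration and are not from the slides. The sketch also passes NULL as the address hint instead of the fixed addresses shown above, since those are chosen by the program loader, and it omits PROT_EXEC so it also works on filesystems mounted noexec.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `length` bytes of `path` read-only and
 * private (copy-on-write), letting the kernel choose the virtual address.
 * Returns MAP_FAILED if the file cannot be opened or mapped. */
void *map_file_private_readonly(const char *path, size_t length) {
    assert(path != NULL && length > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p;
}

/* Hypothetical helper: a read/write mapping with no backing file, like the
 * 0x5000-byte anonymous region in the listing. The fd argument is -1
 * because MAP_ANONYMOUS means "no file". */
void *map_anonymous(size_t length) {
    assert(length > 0);
    return mmap(NULL, length, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

Calling `map_file_private_readonly("/bin/cat", 0xb000)` and then reading /proc/self/maps should show a new `r--p` line naming /bin/cat, at whatever address the kernel picked.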

  7. mapped pages (read-only)
     [diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD]
     page table entries initially: all invalid? (could also prefill entries…)
     read from first page? page fault
       PF handler: find cached page
       update page table, retry; entry now points to page
     read from second page? page fault
       PF handler: no cached page
       PF handler: read in page first
       update page table, retry

  14. shared mmap
      int fd = open("/tmp/somefile.dat", O_RDWR);
      mmap(0, 64 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

      from /proc/PID/maps for this program:
      7f93ad877000-7f93ad887000 rw-s 00000000 08:01 1839758 /tmp/somefile.dat
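
To see that stores through a MAP_SHARED mapping reach the file, the slide's two calls can be extended into a small sketch. The function names are hypothetical. Unlike the slide, the sketch creates the file and sizes it with ftruncate (the slide assumes /tmp/somefile.dat already exists), and it calls msync explicitly rather than waiting for the OS to write the data back.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAPPING_SIZE (64 * 1024)  /* same 64KB size as on the slide */

/* Hypothetical helper: copy `message` to the start of the file at `path`
 * by writing to a shared mapping. Returns 0 on success, -1 on failure. */
int store_through_shared_mapping(const char *path, const char *message) {
    assert(path != NULL && message != NULL);
    size_t len = strlen(message);
    if (len > MAPPING_SIZE)
        return -1;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* Assumption: we size the file ourselves so every mapped page is backed. */
    if (ftruncate(fd, MAPPING_SIZE) != 0) {
        close(fd);
        return -1;
    }
    char *data = mmap(NULL, MAPPING_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);
    if (data == MAP_FAILED)
        return -1;
    memcpy(data, message, len);                    /* updates the cached file data */
    int rc = msync(data, MAPPING_SIZE, MS_SYNC);   /* ask for write-back now */
    if (munmap(data, MAPPING_SIZE) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: read the first `len` bytes of `path` with read(). */
ssize_t read_file_prefix(const char *path, char *buf, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

After `store_through_shared_mapping` returns, an ordinary read() of the same file sees the bytes that were stored through the mapping.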

  15. mapped pages (read/write, shared)
      [diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD]
      write to page? update cached file data
        data on disk is now out of date
      eventually free memory… write update to disk

  19. minor and major faults
      minor page fault
        page is already in memory (“page cache”)
        just fill in page table entry
      major page fault
        page not already in memory (“page cache”)
        need to allocate space
        possibly need to read data from disk/etc.

  20. Linux: reporting minor/major faults
      $ /usr/bin/time --verbose some-command
      Command being timed: "some-command"
      User time (seconds): 18.15
      System time (seconds): 0.35
      Percent of CPU this job got: 94%
      Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.57
      ...
      Maximum resident set size (kbytes): 749820
      Average resident set size (kbytes): 0
      Major (requiring I/O) page faults: 0
      Minor (reclaiming a frame) page faults: 230166
      Voluntary context switches: 1423
      Involuntary context switches: 53
      Swaps: 0
      ...
      Exit status: 0
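
A process can also read its own minor and major fault counts directly. The sketch below uses getrusage(), whose `ru_minflt` and `ru_majflt` fields hold those counters on Linux. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical container for the two counters of interest. */
struct fault_counts {
    long minor;  /* faults serviced without I/O */
    long major;  /* faults that required I/O */
};

/* Return the page fault counts accumulated so far by the calling process. */
struct fault_counts current_fault_counts(void) {
    struct rusage usage;
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);  /* RUSAGE_SELF with a valid pointer should not fail */
    (void)rc;
    struct fault_counts counts = { usage.ru_minflt, usage.ru_majflt };
    return counts;
}
```

Calling `current_fault_counts()` before and after a piece of work and subtracting gives the faults attributable to that work, much like the totals /usr/bin/time prints for a whole command.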

  22. mapped pages (copy-on-write)
      [diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified]
      reads like before
      write to second page?
        page table entry says read-only: protection fault
        fault handler: make copy, update page table
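
The effect of copy-on-write is visible from user space: a store into a MAP_PRIVATE mapping changes the process's copy of the page but not the file. A minimal sketch follows, with a hypothetical function name and the assumption that the file holds at least one byte.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical check: modify byte 0 of `path` through a private mapping,
 * then re-read the file. Returns 1 if the file still holds the original
 * byte while the mapping holds the new one, 0 otherwise, -1 on error. */
int private_write_leaves_file_unchanged(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);  /* read-only is enough for MAP_PRIVATE */
    if (fd < 0)
        return -1;
    unsigned char original;
    if (pread(fd, &original, 1, 0) != 1) {
        close(fd);
        return -1;
    }
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /* This store faults; the kernel gives us a private copy of the page. */
    p[0] = (unsigned char)(original + 1);
    unsigned char in_file;
    int result = -1;
    if (pread(fd, &in_file, 1, 0) == 1)
        result = (in_file == original && p[0] != original) ? 1 : 0;
    munmap(p, page);
    close(fd);
    return result;
}
```

With MAP_SHARED in place of MAP_PRIVATE (and the file opened O_RDWR) the same store would be visible to pread, which is the difference between the two mapping types.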

  26. maps counting
      4KB (0x1000 byte) pages
      map setup: virtual 0x10000-0x1FFFF (64KB) → “foo.dat” bytes 0-0x0FFFF, private (copy-on-write)
      bytes 0-0x3FFF and 0x5000-0x6FFF cached in memory
      program reads addresses 0x13800 – 0x15800
      then, program overwrites addresses 0x14800 – 0x15100
      assume: program page table filled in on demand only
        (a smarter OS would probably proactively fill in multiple pages)
      question: how many page/protection faults?
        1: set PTE for offset 0x3000-0x3FFF (use cached version)
        2, 3: read from disk + set PTE for 0x4000-0x4FFF; set PTE for 0x5000-0x5FFF
        4, 5: copy for 0x4000-0x4FFF, 0x5000-0x5FFF
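
The count in this exercise can be checked with a toy model of the page table. This is a simulation sketch, not how any real kernel is structured; all names are invented, and it assumes that a write to a page with no valid entry would be handled in a single fault (a case the exercise never triggers).

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 0x1000u
#define NUM_PAGES 16u       /* 64KB mapping / 4KB pages */
#define MAP_BASE  0x10000u  /* virtual address of file offset 0 */

struct mapping_state {
    bool cached[NUM_PAGES];    /* file page is in the page cache */
    bool present[NUM_PAGES];   /* page table entry is valid */
    bool writable[NUM_PAGES];  /* private copy made, entry allows writes */
};

struct fault_tally {
    unsigned page_faults;        /* entry was invalid */
    unsigned protection_faults;  /* entry valid but read-only */
    unsigned disk_reads;         /* page had to be read from disk */
};

/* Simulate accessing virtual addresses first_addr..last_addr (inclusive). */
void access_range(struct mapping_state *s, struct fault_tally *t,
                  unsigned first_addr, unsigned last_addr, bool is_write) {
    assert(MAP_BASE <= first_addr && first_addr <= last_addr);
    unsigned first_page = (first_addr - MAP_BASE) / PAGE_SIZE;
    unsigned last_page = (last_addr - MAP_BASE) / PAGE_SIZE;
    assert(last_page < NUM_PAGES);
    for (unsigned page = first_page; page <= last_page; page++) {
        if (!s->present[page]) {
            t->page_faults++;
            if (!s->cached[page]) {
                t->disk_reads++;
                s->cached[page] = true;
            }
            s->present[page] = true;
            s->writable[page] = is_write;  /* assumption: copy immediately on a write fault */
        } else if (is_write && !s->writable[page]) {
            t->protection_faults++;        /* copy-on-write: make a private copy */
            s->writable[page] = true;
        }
    }
}

/* Run the slide's scenario and return the resulting counts. */
struct fault_tally run_exercise(void) {
    struct mapping_state s = {0};
    struct fault_tally t = {0};
    for (unsigned page = 0; page <= 3; page++)  /* bytes 0-0x3FFF cached */
        s.cached[page] = true;
    s.cached[5] = s.cached[6] = true;           /* bytes 0x5000-0x6FFF cached */
    access_range(&s, &t, 0x13800, 0x15800, false);  /* the reads */
    access_range(&s, &t, 0x14800, 0x15100, true);   /* the overwrites */
    return t;
}
```

`run_exercise()` yields 3 page faults (one needing a disk read) and 2 protection faults, matching the 5 faults listed in the answer.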

  31. mapped pages (no backing file)
      [diagram: virtual pages w/o backing file; page table (part); data in memory; data on disk (if any), “swapped out”]
      access new page: page fault handler allocates on demand
      need more memory? save page to disk
        AKA “swap out”
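
Allocation on demand can be observed with mincore(), which reports which pages of a range are currently resident in memory. The sketch below is illustrative only: the function names and the `MAX_PAGES` limit are assumptions made for this example, and the exact resident count may vary by system.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024  /* arbitrary cap so the status vector fits on the stack */

/* Count how many of the `npages` pages starting at `addr` are resident.
 * Returns -1 if mincore() fails. */
long count_resident_pages(void *addr, size_t npages) {
    unsigned char vec[MAX_PAGES];
    long page = sysconf(_SC_PAGESIZE);
    assert(npages <= MAX_PAGES && page > 0);
    if (mincore(addr, npages * (size_t)page, vec) != 0)
        return -1;
    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;  /* low bit set means the page is in memory */
    return resident;
}

/* Map `npages` of anonymous memory, write to the first `touched` pages,
 * and report how many pages of the mapping are resident afterwards. */
long resident_after_touching(size_t npages, size_t touched) {
    long page = sysconf(_SC_PAGESIZE);
    assert(touched <= npages && page > 0);
    size_t length = npages * (size_t)page;
    unsigned char *mem = mmap(NULL, length, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < touched; i++)
        mem[i * (size_t)page] = 1;  /* first access: handler allocates a page */
    long resident = count_resident_pages(mem, npages);
    munmap(mem, length);
    return resident;
}
```

On a typical Linux system, `resident_after_touching(8, 3)` is expected to report only the touched pages as resident, since the untouched ones have not been allocated yet.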

  32. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  33. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  34. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  35. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  36. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  37. mapped pages (no backing fjle) virtual pages w/o backing fjle page table (part) data in memory data on disk (if any) “swapped out” access new page page fault handler allocates on demand need more memory? save page to disk AKA “swap out” data in memory data in memory 29

  38. Linux maps
      $ cat /proc/self/maps
      00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat
      0060a000-0060b000 r--p 0000a000 08:01 48328831 /bin/cat
      0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat
      01974000-01995000 rw-p 00000000 00:00 0 [heap]
      7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660 /usr/lib/locale/locale-archive
      7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
      7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
      7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
      7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
      7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0 [stack]
      7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0 [vvar]
      7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0 [vdso]
      ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
      First /bin/cat line: at virtual addresses 0x400000 – 0x40b000; read, not write, execute, private (private = copy-on-write, if writeable); starting at offset 0 of the file /bin/cat; device major number 8, device minor number 1, inode 48328831 (more on what this means when we talk about filesystems). As if:
        int fd = open("/bin/cat", O_RDONLY);
        mmap(0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0x0);
      Read/write, copy-on-write (private) mapping of /bin/cat (aside: probably used for global variables). As if:
        int fd = open("/bin/cat", O_RDONLY);
        mmap(0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);
      [heap]: allocated using sbrk(); no corresponding file, but can get the same effect with an mmap() call.
      Mapping of 0x5000 bytes with no file, as if:
        mmap(..., 0x5000, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS /* = no file */, ...);
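
The fields called out on this slide (address range, permissions, offset, device major:minor, inode, path) can be pulled out of one line of `/proc/self/maps` with `sscanf`. This is a rough sketch: `struct map_entry` and `parse_maps_line` are names invented here, and it assumes paths contain no spaces and fit in 255 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of /proc/<pid>/maps. */
struct map_entry {
    unsigned long long start, end;   /* virtual address range [start, end) */
    char perms[5];                   /* e.g. "r-xp": read, execute, private */
    unsigned long long offset;       /* offset into the backing file */
    unsigned int dev_major, dev_minor;
    unsigned long long inode;        /* 0 when there is no backing file */
    char path[256];                  /* empty when the line has no name */
};

/* Parse one maps line. Returns 0 on success, -1 if the line is malformed. */
int parse_maps_line(const char *line, struct map_entry *out) {
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* Addresses, offset, and device numbers are hex; the inode is decimal. */
    int n = sscanf(line, "%llx-%llx %4s %llx %x:%x %llu %255s",
                   &out->start, &out->end, out->perms, &out->offset,
                   &out->dev_major, &out->dev_minor, &out->inode, out->path);
    return n >= 7 ? 0 : -1;  /* the path (8th field) is optional */
}
```

Feeding it the first line of the output above yields start 0x400000, end 0x40b000, permissions `r-xp`, device 8:1, and inode 48328831.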

  39. swapping with copy-on-write. Diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified. Free up space by removing cached copies of file. Need to free up more space? Can move copied data to disk (“swapped out” modified data).

  44. swapping: historical major use of virtual memory is supporting “swapping”; using disk (or SSD, …) as the next level of the memory hierarchy; process is allocated space on disk/SSD; memory is a cache for disk/SSD; only need to keep ‘currently active’ pages in physical memory; swapping ≈ mmap with “default” files to use

  45. HDDs/SSDs are slow. HDD reads and writes: milliseconds to tens of milliseconds; minimum size: 512 bytes; writing tens of kilobytes is basically as fast as writing 512 bytes. SSD reads and writes: hundreds of microseconds; designed for writes/reads of kilobytes (not much smaller). Page fault handler is going to switch to another program.

  48. the page cache: memory is a cache for disk; files and program memory have a place on disk; running low on memory? always have room on disk; assumption: disk space approximately infinite. Physical memory pages: disk ‘temporarily’ kept in faster storage; possibly being used by one or more processes? possibly part of a file on disk being read/written? possibly both. Goal: manage this cache intelligently.

  52. page cache components [text]: mapping: virtual address or file+offset → physical page (handle cache hits); find backing location based on virtual address/file+offset (handle cache misses); track information about each physical page (handle page allocation, handle cache eviction)

  53. page cache components (diagram). Lookups: virtual address (used by program) through the page table, and file + offset (for read()/write()) through an OS data structure, to a physical page (if cached) or a disk location. Cache hit: CPU lookup in page table; OS lookup for read()/write(). Cache miss: OS looks up location on disk. Allocating a physical page: choose a page that’s not being used much; might need to evict a used page, which requires removing pointers to it; need reverse mappings to find pointers to remove. Page usage (recently used? etc.): OS data structure?

  55. virtual addr/file offset to physical page (diagram). Virtual address (used by program) through the page table (structure determined by hardware!) to physical page (if cached): for cache hit on memory access. File + offset (for read()/write()) through a kernel data structure (multiple designs; one idea: balanced tree) to physical page (if cached): for cache hit on read/write (or page fault for mmap’d memory).
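
A rough sketch of the file + offset → physical page lookup. The slide suggests a balanced tree as one design; to stay short this uses a sorted array with `bsearch` as a stand-in, which gives the same ordered-key lookup but not cheap insertion. All names (`struct cache_entry`, `page_cache_lookup`, `file_id`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry: one cached page of one file. */
struct cache_entry {
    unsigned long file_id;     /* stand-in for "which file" (e.g. inode number) */
    unsigned long page_index;  /* file offset divided by the page size */
    long phys_page;            /* physical page number holding the data */
};

/* Order entries by (file_id, page_index). */
static int compare_keys(const void *a, const void *b) {
    const struct cache_entry *x = a, *y = b;
    if (x->file_id != y->file_id) return x->file_id < y->file_id ? -1 : 1;
    if (x->page_index != y->page_index)
        return x->page_index < y->page_index ? -1 : 1;
    return 0;
}

/* Cache hit: returns the physical page number. Cache miss: returns -1.
 * `entries` must already be sorted with compare_keys. */
long page_cache_lookup(const struct cache_entry *entries, size_t count,
                       unsigned long file_id, unsigned long file_offset,
                       unsigned long page_size) {
    assert(page_size > 0);
    struct cache_entry key = { file_id, file_offset / page_size, 0 };
    const struct cache_entry *hit =
        bsearch(&key, entries, count, sizeof *entries, compare_keys);
    return hit ? hit->phys_page : -1;
}
```

On a miss the caller would go on to find the data's location on disk, which this sketch leaves out.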

  58. Linux: forward mapping (diagram): process control block (task_struct); mmap region info (vm_area_struct); open file info (struct file); file on disk info (struct inode); cached physical pages for file (address_space); page table. Cached pages are used to fill the page table (for mmap) and for read()/write().

  63. mapped pages (read/write, shared): file data, cached in memory; file data on disk/SSD

  64. page replacement. Step 1: evict a page to free a physical page. Case 1: there’s an unused page, just use that (easy). Case 2: need to remove whatever is in that page (more work). Step 2: load new, more important data in its place; needs some way of knowing the location of the data.
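
The two steps can be sketched as a small function over an array of frames. Everything here is illustrative: `struct frame`, `replace_page`, and the `victim_hint` parameter are invented names, the caller picks the victim (the slide does not specify a policy), and disk reads and writes are only counted, not performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame bookkeeping. */
struct frame {
    bool in_use;
    bool dirty;   /* modified since it was loaded, so it must be saved */
    long holds;   /* made-up ID of the data currently in the frame */
};

struct replace_stats { size_t evictions, writebacks; };

/* Returns the index of the frame that now holds `new_data`. */
size_t replace_page(struct frame *frames, size_t nframes, size_t victim_hint,
                    long new_data, struct replace_stats *stats) {
    assert(nframes > 0 && victim_hint < nframes);
    size_t chosen = victim_hint;
    bool found_free = false;

    /* Step 1, case 1: there is an unused frame, so just use that. */
    for (size_t i = 0; i < nframes; i++) {
        if (!frames[i].in_use) { chosen = i; found_free = true; break; }
    }
    /* Step 1, case 2: remove whatever is in the victim frame. */
    if (!found_free) {
        stats->evictions++;
        if (frames[chosen].dirty) stats->writebacks++;  /* stand-in for a disk write */
    }
    /* Step 2: load the new data in its place (stand-in for a disk read). */
    frames[chosen].in_use = true;
    frames[chosen].dirty = false;
    frames[chosen].holds = new_data;
    return chosen;
}
```

With one free frame, the first call takes case 1 and evicts nothing; once every frame is in use, each call takes case 2 and pays a writeback only when the victim is dirty.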

  67. virtual address/file offset → location on disk (diagram). Virtual address (used by program) and file + offset (for read()/write()) are looked up through the page table and OS data structures to reach a physical page (if cached) or a disk location. Swapped-out non-file data: trick: unused PTEs. Part of a file: track mmap ‘regions’ (Linux). File + offset to disk location: based on filesystem (later topic).

  71. Linux maps: list of maps
      $ cat /proc/self/maps
      00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat
      0060a000-0060b000 r--p 0000a000 08:01 48328831 /bin/cat
      0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat
      01974000-01995000 rw-p 00000000 00:00 0 [heap]
      7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660 /usr/lib/locale/locale-archive
      7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
      7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
      7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
      7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
      7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
      7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
      7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0 [stack]
      7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0 [vvar]
      7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0 [vdso]
      ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
      PCB contains list of struct vm_area_struct with:
      (shown in this output): virtual address start, end; permissions; offset in backing file (if any); pointer to backing file (if any)
      (not shown): info about sharing of non-file data (e.g. heap after fork); …
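
A sketch of how such a list of regions can turn a virtual address into a file offset. `struct region` below is a simplified, hypothetical stand-in for `vm_area_struct` (it is not the kernel's definition), and the linear scan is only for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one mmap region; field names are made up. */
struct region {
    unsigned long long start, end;   /* virtual address range [start, end) */
    unsigned long long file_offset;  /* file offset that `start` maps to */
    const char *path;                /* backing file, or NULL if none */
};

/* Find the region containing `addr`, or NULL if the address is unmapped. */
const struct region *find_region(const struct region *regions, size_t count,
                                 unsigned long long addr) {
    assert(regions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return &regions[i];
    }
    return NULL;
}

/* Translate a virtual address to an offset in its backing file.
 * Returns 0 on success, -1 if unmapped or not file-backed. */
int vaddr_to_file_offset(const struct region *regions, size_t count,
                         unsigned long long addr,
                         unsigned long long *offset_out) {
    const struct region *r = find_region(regions, count, addr);
    if (r == NULL || r->path == NULL) return -1;
    *offset_out = r->file_offset + (addr - r->start);
    return 0;
}
```

Using the `/bin/cat` lines from the output above, address 0x60b123 falls in the region starting at 0x60b000 with offset 0xb000, so it corresponds to file offset 0xb123.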

  73. Linux: tracking swapped out pages: need to look up location on disk; potentially one location for every virtual page; trick: store location in “ignored” part of page table entry; instead of physical page #, permission bits, etc., store offset on disk
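
The trick can be illustrated with a toy 64-bit page table entry. This assumes an x86-style layout in which bit 0 is the present bit and the hardware ignores the remaining bits when it is clear. The encoding below is invented for illustration and is not Linux's actual swap entry format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT ((pte_t) 1)  /* bit 0: checked by the hardware */

/* Build a not-present entry whose ignored bits hold a swap offset. */
pte_t pte_make_swapped(uint64_t swap_offset) {
    assert(swap_offset < (UINT64_C(1) << 63));  /* must fit above bit 0 */
    return swap_offset << 1;                    /* present bit stays clear */
}

bool pte_is_present(pte_t pte) {
    return (pte & PTE_PRESENT) != 0;
}

/* Recover the swap offset; only meaningful for not-present entries. */
uint64_t pte_swap_offset(pte_t pte) {
    assert(!pte_is_present(pte));
    return pte >> 1;
}
```

A page fault handler following this scheme would see the clear present bit, decode the offset, and read the page back from that location. A real encoding also has to distinguish "swapped out" from "never mapped", which this sketch does not.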

  75. evicting a page: remove victim page from page table, etc. (every page table it is referenced by; every list of file pages; …); if needed, save victim page to disk. Going to require: way to find page tables, etc. using page; way to detect whether it needs to be saved to disk.

  78. tracking physical pages: finding mappings. Want to evict a page? Remove from page tables, etc.; need to track where every page is used! Common solution: structure for every physical page with info about every cached file/page table using page.
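
A minimal sketch of that per-physical-page structure, assuming each page simply keeps pointers to the page table entries that reference it. `struct phys_page_info`, `evict_page`, and the fixed `MAX_MAPPERS` limit are all hypothetical; real kernels find the entries less directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPERS 4  /* arbitrary limit for this sketch */

/* Hypothetical per-physical-page record with reverse mappings. */
struct phys_page_info {
    uint64_t *ptes[MAX_MAPPERS];  /* page table entries pointing at this page */
    size_t npte;
    bool dirty;                   /* modified since it was last saved */
};

/* Remove every reference to the page.
 * Returns true if the caller must save the page to disk first. */
bool evict_page(struct phys_page_info *page) {
    assert(page != NULL && page->npte <= MAX_MAPPERS);
    for (size_t i = 0; i < page->npte; i++)
        *page->ptes[i] = 0;  /* mark not-present in each page table */
    page->npte = 0;

    bool needs_save = page->dirty;
    page->dirty = false;
    return needs_save;
}
```

The reverse pointers answer "which page tables use this page", and the dirty flag answers "does it need to be saved to disk".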

  79. Linux: reverse mapping (file pages): given page number, find references to that page in page tables (e.g. to remove/change them). Structures in the diagram: page number; per-physical page info (struct page); cached physical pages for file (address_space); file on disk info (struct inode); open file info (struct file); mmap region info (vm_area_struct); process control block (task_struct); page table.
