virtual memory 2

Changelog: changes made in this version not seen in first lecture:
- 23 October: mapped pages (no backing file): fix end of animation to have page on disk
- 23 October: separate out discussion of readahead from other reasons why


  1. xv6: adding space on demand

         struct proc {
             uint sz;   // Size of process memory (bytes)
             ...
         };

     - adding allocate-on-demand logic, on page fault:
       - if address ≥ sz: kill process (out of bounds)
       - if address < sz: find virtual page number of address; allocate page of memory, add to page table; return from interrupt
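The decision on this slide can be sketched as a small C function. This is a minimal sketch, not xv6's actual trap code: the names `classify_fault`, `page_number`, and `fault_action` are made up for illustration, and it assumes valid addresses run from 0 to `sz - 1`, so an address equal to `sz` counts as out of bounds.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u   /* page size used by xv6 on x86 */

/* What the page fault handler should do (illustrative names). */
enum fault_action { FAULT_KILL, FAULT_ALLOCATE };

/* Decide how to handle a fault at `addr` for a process of size `sz` bytes. */
enum fault_action classify_fault(uint32_t sz, uint32_t addr) {
    return addr >= sz ? FAULT_KILL : FAULT_ALLOCATE;
}

/* Virtual page number of the page containing `addr`. */
uint32_t page_number(uint32_t addr) {
    return addr / PGSIZE;
}
```

A real handler would then allocate a physical page, install it in the page table at that virtual page number, and return from the interrupt so the faulting instruction is retried.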

  2. versus more complicated OSes
     - range of valid addresses is not just 0 to maximum
     - need some more complicated data structure to represent it
     - will get to that later

  3. copy-on-write cases
     - trying to write forbidden page (e.g. kernel memory): kill program instead of making it writable
     - trying to write read-only page and…
       - only one page table entry refers to it: make it writable; return from fault
       - multiple processes' page table entries refer to it: copy the page; replace read-only page table entry to point to copy; return from fault
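The three cases can be written as one decision function. This is a sketch under assumptions: `handle_write_fault`, the `cow_action` enum, and the idea that the kernel tracks a per-page reference count (`ref_count`) are illustrative, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of a write fault (illustrative names). */
enum cow_action { COW_KILL, COW_MAKE_WRITABLE, COW_COPY_PAGE };

/*
 * forbidden: the process may never write this page (e.g. kernel memory).
 * ref_count: number of page table entries referring to the physical page.
 */
enum cow_action handle_write_fault(bool forbidden, int ref_count) {
    if (forbidden)
        return COW_KILL;            /* do not make it writable */
    if (ref_count == 1)
        return COW_MAKE_WRITABLE;   /* sole owner: no copy needed */
    return COW_COPY_PAGE;           /* shared: copy, then point the PTE at the copy */
}
```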

  4. mmap
     - Linux/Unix has a function to “map” a file to memory:

           int file = open("somefile.dat", O_RDWR);
           // data is region of memory that represents file
           char *data = mmap(..., file, 0);
           // read byte 6 from somefile.dat
           char seventh_char = data[6];
           // modifies byte 100 of somefile.dat
           data[100] = 'x';
           // can continue to use 'data' like an array
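A fuller version of the slide's fragment might look like the following. The slide elides the first `mmap` arguments, so the choices here (`NULL` address, 4096 bytes, `PROT_READ | PROT_WRITE`, `MAP_SHARED`) are assumptions, as are the temporary file and the function name `mmap_write_demo`. The `msync` call is there only so the `pread` check does not rely on platform-specific behavior.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_SIZE 4096

/* Fill a temporary file with 'a', map it, store 'x' at byte 100 through the
 * mapping, then read byte 100 back from the file with pread().
 * Returns the byte read from the file, or -1 on any error. */
int mmap_write_demo(void) {
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);   /* the file goes away once fd is closed */

    char buf[DEMO_SIZE];
    memset(buf, 'a', sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) { close(fd); return -1; }

    char *data = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { close(fd); return -1; }

    char seventh_char = data[6];   /* reads byte 6 of the file */
    data[100] = 'x';               /* modifies byte 100 of the file */
    msync(data, DEMO_SIZE, MS_SYNC);

    char check = 0;
    ssize_t n = pread(fd, &check, 1, 100);

    munmap(data, DEMO_SIZE);
    close(fd);
    return (n == 1 && seventh_char == 'a') ? check : -1;
}
```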

  5. mmap options (1)

         #include <sys/mman.h>
         void *mmap(void *addr, size_t length, int prot, int flags,
                    int fd, off_t offset);

     - length bytes from open file fd starting at byte offset
     - protection flags prot, bitwise or together 1 or more of: PROT_READ, PROT_WRITE, PROT_EXEC, PROT_NONE (for forcing segfaults)

  8. mmap options (2)

         #include <sys/mman.h>
         void *mmap(void *addr, size_t length, int prot, int flags,
                    int fd, off_t offset);

     - flags, choose one of:
       - MAP_SHARED: changing memory changes file and vice-versa
       - MAP_PRIVATE: make a copy of data in file (using copy-on-write)
     - …along with additional flags:
       - MAP_ANONYMOUS (not POSIX): ignore fd, just allocate space
       - … (and more not shown)
     - addr, suggestion about where to put mapping (may be ignored)
       - can pass NULL: “choose for me”
       - address chosen will be returned
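As an illustration of these flags, the sketch below uses `MAP_PRIVATE | MAP_ANONYMOUS` with a `NULL` address to get zero-filled memory that has no backing file. The wrapper name `alloc_anonymous` is hypothetical. Passing `-1` as `fd` is a common convention for anonymous mappings rather than something the slide specifies.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `length` bytes of memory with no backing file.
 * The kernel chooses the address. Returns NULL on failure. */
void *alloc_anonymous(size_t length) {
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The returned region is released with `munmap(p, length)`.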

  10. Linux maps

          $ cat /proc/self/maps
          00400000-0040b000 r-xp 00000000 08:01 48328831                  /bin/cat
          0060a000-0060b000 r--p 0000a000 08:01 48328831                  /bin/cat
          0060b000-0060c000 rw-p 0000b000 08:01 48328831                  /bin/cat
          01974000-01995000 rw-p 00000000 00:00 0                         [heap]
          7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660          /usr/lib/locale/locale-archive
          7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129          /lib/x86_64-linux-gnu/libc-2.19.so
          7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129          /lib/x86_64-linux-gnu/libc-2.19.so
          7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129          /lib/x86_64-linux-gnu/libc-2.19.so
          7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129          /lib/x86_64-linux-gnu/libc-2.19.so
          7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
          7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109          /lib/x86_64-linux-gnu/ld-2.19.so
          7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
          7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
          7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109          /lib/x86_64-linux-gnu/ld-2.19.so
          7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109          /lib/x86_64-linux-gnu/ld-2.19.so
          7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
          7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0                 [stack]
          7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0                 [vvar]
          7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0                 [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0         [vsyscall]

      - first line (/bin/cat, r-xp):
        - at virtual addresses 0x400000 – 0x40b000
        - read, not write, execute, private (private = copy-on-write, if writeable)
        - starting at offset 0 of the file /bin/cat
        - device major number 8, device minor number 1, inode 48328831 (more on what this means when we talk about filesystems)
        - as if:

              int fd = open("/bin/cat", O_RDONLY);
              mmap(0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0x0);

      - third line (/bin/cat, rw-p): read/write, copy-on-write (private) mapping, as if:

              int fd = open("/bin/cat", O_RDONLY);
              mmap(0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);

      - [heap]: no corresponding file, just read/write memory
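To make the fields of a `/proc/PID/maps` line concrete, here is a small parser sketch. The struct `map_entry` and the function `parse_maps_line` are illustrative names, not part of any library. It assumes a 64-bit `unsigned long` and the field layout shown in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct map_entry {
    unsigned long start, end;            /* virtual address range [start, end) */
    char perms[5];                       /* e.g. "r-xp": read, write, execute, private/shared */
    unsigned long offset;                /* offset into the mapped file */
    unsigned int dev_major, dev_minor;   /* device numbers (hexadecimal in the file) */
    unsigned long inode;                 /* 0 when there is no backing file */
    char path[256];                      /* empty when the line has no name */
};

/* Parse one line of /proc/PID/maps. Returns 1 on success, 0 on failure. */
int parse_maps_line(const char *line, struct map_entry *e) {
    e->path[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->dev_major, &e->dev_minor, &e->inode, e->path);
    return n >= 7;   /* the path is optional, so 7 or 8 fields are fine */
}
```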

  16. mapped pages (read-only)
      - diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD
      - initially: all page table entries invalid? (could also prefill entries…)
      - read from first page? page fault
        - PF handler: find cached page
        - update page table (now point to page), retry
      - read from second page? page fault
        - PF handler: no cached page, so read in page first
        - update page table, retry
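The two fault paths can be modeled with a toy simulation. This is only a sketch of the bookkeeping: `toy_mapping` and `toy_read` are invented names, and a real kernel tracks much more (physical addresses, reference counts, locking).

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

/* Toy model: one cache slot and one page table entry per file page. */
struct toy_mapping {
    bool cached[NPAGES];   /* is this file page in the in-memory cache? */
    bool valid[NPAGES];    /* does the page table entry point at it? */
    int  disk_reads;       /* pages read from disk/SSD so far */
};

/* Simulate a read of virtual page `vpn`. Returns true if it page-faulted. */
bool toy_read(struct toy_mapping *m, int vpn) {
    if (m->valid[vpn])
        return false;            /* page table already points to the page */
    if (!m->cached[vpn]) {       /* PF handler: no cached page */
        m->cached[vpn] = true;   /* read the page in first */
        m->disk_reads++;
    }
    m->valid[vpn] = true;        /* update page table; the access is then retried */
    return true;
}
```

Only the first access to each page faults, and only pages missing from the cache cost a disk read.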

  23. shared mmap

          int fd = open("/tmp/somefile.dat", O_RDWR);
          mmap(0, 64 * 1024, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);

      - from /proc/PID/maps for this program:

          7f93ad877000-7f93ad887000 rw-s 00000000 08:01 1839758 /tmp/somefile.dat

  24. mapped pages (read/write, shared)
      - diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD
      - write to page? update cached file data
      - data on disk out of date
      - eventually free memory… write update to disk

  28. knowing when to write to disk?
      - need a dirty bit per page (“was page modified”)
      - x86: kept in the page table!
      - option 1 (most common): hardware sets dirty bit in page table entry (on write)
        - bit means “physical page was modified using this PTE”
        - D bit on PTEs we've seen
      - option 2: OS sets page read-only, flips read-only+dirty bit on fault

  29. multiple dirty bits?
      - what if a page is in multiple page tables?
      - each page table has a dirty bit…
      - check all of them to decide if it was modified
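Checking all of the dirty bits might look like the sketch below. It assumes 32-bit x86 page table entries, where the dirty bit is bit 6 (mask `0x040`). The function `page_is_dirty` and the idea of gathering every PTE for one physical page into an array are illustrative simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_D 0x040u   /* x86 dirty bit: bit 6 of a page table entry */

/* True if any of the `n` page table entries that map the same physical
 * page has its dirty bit set. */
bool page_is_dirty(const uint32_t *ptes, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (ptes[i] & PTE_D)
            return true;
    return false;
}
```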

  31. mapped pages (copy-on-write)
      - diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified
      - reads like before
      - write to second page? page table entry says read-only: protection fault
      - fault handler: make copy, update page table
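From user space, the effect of copy-on-write can be observed with a `MAP_PRIVATE` file mapping: a write through the mapping changes the process's private copy but not the file. The sketch below is an illustration under assumptions. The temporary file, the sizes, and the function name `private_write_leaves_file_alone` are all invented for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_SIZE 4096

/* Write through a MAP_PRIVATE mapping, then read the same byte from the file.
 * Returns 1 if the mapping sees the write but the file does not,
 * 0 otherwise, and -1 on error. */
int private_write_leaves_file_alone(void) {
    char path[] = "/tmp/cow_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);   /* the file goes away once fd is closed */

    char buf[DEMO_SIZE];
    memset(buf, 'a', sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) { close(fd); return -1; }

    char *data = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { close(fd); return -1; }

    data[100] = 'x';   /* the write faults; the kernel gives us a private copy */

    char in_file = 0;
    ssize_t n = pread(fd, &in_file, 1, 100);
    int ok = (n == 1 && data[100] == 'x' && in_file == 'a');

    munmap(data, DEMO_SIZE);
    close(fd);
    return ok;
}
```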

  36. mapped pages (no backing file)
      - diagram: virtual pages w/o backing file; page table (part); data in memory; swapped out data (if any)
      - access new page? page fault handler allocates on demand
      - need more memory? save page to disk temporarily

  43. swapping with copy-on-write
      - diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified; “swapped out” modified data
      - free up space by removing cached copies of file
      - need to free up more space? can move copied data to disk

  47. swapping
      historical major use of virtual memory: supporting “swapping”
      using disk (or SSD, …) as the next level of the memory hierarchy
      process is allocated space on disk/SSD
      memory is a cache for disk/SSD
        only need to keep “currently active” pages in physical memory
      swapping ≈ mmap with “default” files to use

  49. HDDs/SSDs are slow
      HDD reads and writes: milliseconds to tens of milliseconds
        minimum size: 512 bytes
        writing tens of kilobytes basically as fast as writing 512 bytes
      SSD reads and writes: hundreds of microseconds
        designed for writes/reads of kilobytes (not much smaller)
      page fault handler is going to switch to another program

  52. the page cache
      memory is a cache for disk
        files, program memory have a place on disk
        running low on memory? always have room on disk
        assumption: disk space approximately infinite
      physical memory pages: disk data “temporarily” kept in faster storage
        possibly being used by one or more processes?
        possibly part of a file on disk?
        possibly both
      goal: manage this cache intelligently

  53. memory as a cache for disk
      “cache block” ≈ physical page
      fully associative
        any virtual address/file part can be stored in any physical page
      replacement is managed by the OS
      normal cache hits happen without OS
        common case that needs to be fast

  54. page cache components
      mapping: virtual address or file+offset → physical page
        handle cache hits
      find backing location based on virtual address/file+offset
        handle cache misses
      track information about each physical page
        handle page allocation
        handle cache eviction

  56. “cache hits”
      mapping: virtual address/file offset → physical page
      page table: virtual address → physical page (if any)
        mapping found? cache hit on program memory access
        structure determined by hardware (involved in every memory access)
      kernel data structures: file offset → physical page (if any)
        mapping found? cache hit on read/write system call
        (or cache hit on page fault for mmap’d memory)
        multiple possible designs (software data structure)
        one idea: balanced tree: offset → physical page
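
The file offset → physical page lookup can be sketched in a few lines. This sketch swaps the balanced tree for a sorted array with binary search, which gives the same lookup behaviour but would be a poor choice once insertions matter; struct cached_page, page_cache_lookup and the 4096-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumed page size */

/* Hypothetical record: which physical page holds which page of the file. */
struct cached_page {
    unsigned long index;   /* page number within the file */
    unsigned long pfn;     /* physical page number holding that data */
};

/*
 * pages[] must be sorted by index. Returns 1 and sets *pfn on a cache hit,
 * returns 0 on a cache miss (the data must be read from disk).
 */
int page_cache_lookup(const struct cached_page *pages, size_t n,
                      unsigned long file_offset, unsigned long *pfn)
{
    assert(pfn != NULL);
    unsigned long index = file_offset / PAGE_SIZE;
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pages[mid].index == index) {
            *pfn = pages[mid].pfn;
            return 1;
        }
        if (pages[mid].index < index)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

Any byte offset inside a cached page hits the same entry, because the lookup key is the offset divided by the page size.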

  57. Linux: tracking files in memory
      struct file {
          ...
          struct inode *f_inode;
          ...
      };
      struct inode {
          ...
          struct address_space i_data;
          ...
      };
      struct address_space {
          ...
          struct radix_tree_root i_pages;   /* cached pages */
          ...
          atomic_t i_mmap_writable;         /* count VM_SHARED mappings */
          struct rb_root_cached i_mmap;     /* tree of private and shared mappings */
          ...
      };
      struct inode: represents file on disk (versus a file that’s a pipe, terminal, etc.)
      i_pages: way to find cached copies of parts of file
        Linux’s choice: tree of cached pages
        copies can be shared between processes
      i_mmap: list of every place this file is mmap’d
        needed when freeing up memory from cached pages

  60. mapped pages (read/write, shared)
      [diagram labels: file data, cached in memory; file data on disk/SSD]

  62. “cache miss”
      mapping: virtual address/file offset → location on disk
      for memory mapped to files:
        need data structure saying where files are mapped (then rely on filesystem)
        seen dump of this data structure in Linux: /proc/PID/maps
      for “swapped out” data outside of files (heap memory, modified copy-on-write copies of files, etc.):
        need some way to track swapped out, modified pages
        hopefully not too big…
      for data in files: depends on filesystem (topic for later)

  64. Linux: tracking memory regions
      struct vm_area_struct {
          ...
          unsigned long vm_start;     /* Our start address within vm_mm. */
          unsigned long vm_end;       /* The first byte after our end address within vm_mm. */
          ...
          pgprot_t vm_page_prot;      /* Access permissions of this VMA. */
          unsigned long vm_flags;     /* Flags, see mm.h. */
          ...
          struct anon_vma *anon_vma;  /* Serialized by page_table_lock */
          ...
          unsigned long vm_pgoff;     /* Offset (within vm_file) in PAGE_SIZE units */
          struct file * vm_file;      /* File we map to (can be NULL). */
          ...
      } __randomize_layout;
      vm_start, vm_end: virtual addresses of mapping
        mappings are part of sorted list/tree to allow finding by start/end address
      vm_page_prot: permissions (read/write/execute)
      vm_flags: private or shared? …
        private = copy-on-write
        shared = make changes to underlying file
      anon_vma: for tracking pages that aren’t part of file (from copy-on-write, or for non-file-backed memory)
      vm_pgoff, vm_file: file to get data from and location within file
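
On a cache miss, the region record is what turns a faulting address into a place in a file. A minimal sketch under stated assumptions: struct region is a hypothetical, much-reduced stand-in for vm_area_struct, the lookup is a plain linear scan rather than the sorted list/tree the slide mentions, and the page size is assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096ULL   /* assumed page size */

/* Simplified, hypothetical stand-in for struct vm_area_struct. */
struct region {
    unsigned long long start;   /* like vm_start */
    unsigned long long end;     /* like vm_end: first byte after the region */
    unsigned long long pgoff;   /* like vm_pgoff: file offset in pages */
    int has_file;               /* like vm_file != NULL */
};

/* Linear scan over the regions; returns NULL if addr is not mapped. */
const struct region *find_region(const struct region *r, size_t n,
                                 unsigned long long addr)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return &r[i];
    return NULL;
}

/* Byte offset in the backing file that corresponds to addr. */
unsigned long long file_offset_of(const struct region *r,
                                  unsigned long long addr)
{
    assert(r->has_file && addr >= r->start && addr < r->end);
    return (addr - r->start) + r->pgoff * PAGE_SIZE;
}
```

With a region covering 0x60b000 to 0x60c000 at page offset 0xb, a fault at 0x60b010 resolves to byte 0xb010 of the file; an address that falls in no region is an invalid access.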

  70. Linux: tracking swapped out pages
      need to look up location on disk
      potentially one location for every virtual page
      trick: store location in page table entry
        instead of physical page #, permission bits, etc., store offset on disk
      on page fault: examine page table entry to read from disk
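
The page table entry trick can be sketched with a made-up encoding. Everything about the layout below is an assumption for illustration and not the actual Linux encoding: bit 0 is taken to be the present bit, an all-zero entry means nothing is mapped, and a non-present, non-zero entry holds a swap slot number in its remaining bits (so slot numbers start at 1).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT 0x1ULL   /* assumed position of the present bit */

/* Entry for a page that is in memory: physical page number plus flag bits. */
pte_t make_present_pte(uint64_t pfn, uint64_t perm_bits)
{
    return (pfn << 12) | perm_bits | PTE_PRESENT;
}

/* Entry for a page that was swapped out: present bit clear, slot stored above it. */
pte_t make_swap_pte(uint64_t swap_slot)
{
    assert(swap_slot != 0);   /* slot 0 is reserved so that 0 means "nothing here" */
    return swap_slot << 1;
}

int pte_present(pte_t pte) { return (pte & PTE_PRESENT) != 0; }
int pte_none(pte_t pte)    { return pte == 0; }

/* Only meaningful for entries that are neither present nor empty. */
uint64_t pte_swap_slot(pte_t pte)
{
    assert(!pte_present(pte) && !pte_none(pte));
    return pte >> 1;
}
```

A page fault handler following this scheme would check pte_present first, treat pte_none as an invalid or not-yet-allocated page, and otherwise read the page back from the slot that pte_swap_slot returns.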

  72. tracking physical pages: finding free pages
      Linux has list of “least recently used” pages:
        struct page {
            ...
            struct list_head lru;   /* list_head ~ next/prev pointer */
            ...
        };
      how we’re going to find a page to allocate (and evict from something else)
      later: what this list actually looks like (how many lists, …)
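
A single least-recently-used list built from next/prev pointers might look like the sketch below. It is a deliberate simplification: the list_head here only imitates the kernel type of the same name, struct page is cut down to an id, the helper names are hypothetical, and one list is assumed where the real design uses more.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal imitation of the kernel's list_head: just next/prev pointers. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical, much-reduced struct page. */
struct page {
    int id;
    struct list_head lru;
};

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = e;
}

void list_add_tail(struct list_head *e, struct list_head *head)
{
    assert(e != head);
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Mark a page as just used: move it to the most-recently-used end. */
void page_touch(struct page *p, struct list_head *lru)
{
    list_del(&p->lru);
    list_add_tail(&p->lru, lru);
}

/* Take the least recently used page off the list, or NULL if it is empty. */
struct page *page_evict_candidate(struct list_head *lru)
{
    if (lru->next == lru)
        return NULL;
    struct list_head *victim = lru->next;
    list_del(victim);
    /* Step back from the embedded list node to the page that contains it. */
    return (struct page *)((char *)victim - offsetof(struct page, lru));
}
```

Embedding the list node inside struct page means no separate allocation is needed to put a page on the list, which matters when there is one struct page per physical page.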

  74. tracking physical pages: finding mappings
      want to evict a page? remove from page tables, etc.
      need to track where every page is used!

  75. Linux: physical page → file → PTE
      Linux tracking where file pages are in page tables:
        struct page {
            ...
            struct address_space *mapping;
            pgoff_t index;                   /* Our offset within mapping. */
            ...
        };
        struct address_space {
            ...
            struct rb_root_cached i_mmap;    /* tree of private and shared mappings */
            ...
        };
      tree of mappings lets us find vm_area_structs and PTEs
      rather complicated lookup (but writing to disk is already slow)

  76. Linux: physical page → PTE w/o file
      Linux also tracks location of “anonymous” (non-file) pages
        mapping from page to list of vm_area_structs that contain page
      recall: vm_area_struct: one memory allocation in one process
      exercise: why a list? what’s one case when non-file memory is shared between processes?
