Linux memory management at scale

Chris Down, Kernel, Facebook
https://chrisdown.name


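Images: Spc. Christopher Hernandez, US Military (Public Domain); Simon Law on Flickr (CC-BY-SA); Orion J on Wikimedia Commons (CC-BY)

■ Memory is divided into multiple “types”: anon, cache, buffers, etc.
■ Whether memory is “reclaimable” or “unreclaimable” is important, but reclaim is not guaranteed to succeed
■ RSS is kinda bullshit, sorry: it only counts resident pages mapped into a process, so unmapped page cache and kernel memory that a workload causes to be used don't show up in it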
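To make the “types” concrete: in cgroup v2, each cgroup's memory.stat file breaks its usage down by type (anon, file, shmem, slab_reclaimable, and so on, in bytes). The sketch below assumes Python 3 and a memory.stat file as input. It parses the file and computes a rough upper bound on how much memory could be reclaimed, with and without swap. The function names are hypothetical, and the split is a heuristic that shows why reclaimability is not guaranteed. It is not how the kernel actually decides.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse cgroup v2 memory.stat content ("key value" per line) into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def rough_reclaim_breakdown(stats: Dict[str, int], swap_available: bool) -> Dict[str, int]:
    """Hypothetical heuristic: split usage by *potential* reclaimability.

    This is only an upper bound: dirty or writeback pages need IO first,
    hot pages may be refaulted straight away, and nothing here is promised
    to actually be freed.
    """
    anon = stats.get("anon", 0)
    shmem = stats.get("shmem", 0)  # counted inside "file", but swap-backed
    file_cache = stats.get("file", 0) - shmem
    slab_reclaimable = stats.get("slab_reclaimable", 0)

    # Page cache and reclaimable slab can be dropped or written back without swap.
    without_swap = file_cache + slab_reclaimable
    # Anonymous memory and shmem have no backing file: only swap lets them be reclaimed.
    swap_backed = anon + shmem
    return {
        "reclaimable_upper_bound": without_swap + (swap_backed if swap_available else 0),
        "swap_backed": swap_backed,
    }
```

With no swap, the swap-backed share is simply stuck in memory. That is one reason swap comes up again below.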

bit.ly/whyswap
■ Swap isn't about emergency memory; in fact, using it that way is probably harmful
■ Instead, it makes reclaim more equal (anonymous pages can be reclaimed too, not just file cache) and makes the system's forward progress more reliable
■ It also promotes maintaining a small positive memory pressure (similar to make -j cores+1)

■ The OOM killer is reactive, not proactive: it is triggered by reclaim failure
■ Page hotness is obscured by the MMU (we only get the pte_young accessed bit), so we don't know ahead of time that we're about to OOM
■ It can be very, very late to the party, and sometimes goes to the wrong party entirely

■ kswapd reclaim: background reclaim, started when free memory drops below a threshold (that is, when resident pages go above it)
■ Direct reclaim: blocks the application when no free memory is available to allocate frames
■ Reclaim tries to evict the coldest pages first
■ Some things might not be reclaimable. Swap can help here (bit.ly/whyswap)
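The two reclaim paths can be sketched as a toy model. It assumes the per-zone min/low/high watermarks that the kernel exposes in /proc/zoneinfo. This is a deliberate simplification: the real allocator has more fallback steps and works across multiple zones, and all names here are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class ReclaimMode(Enum):
    NONE = "none"      # enough free memory: the allocation just succeeds
    KSWAPD = "kswapd"  # kswapd is woken to reclaim in the background
    DIRECT = "direct"  # the allocating task itself must reclaim before proceeding


def reclaim_mode(free_pages: int, wmark_min: int, wmark_low: int) -> ReclaimMode:
    """Simplified model of the watermark checks made on allocation."""
    if free_pages < wmark_min:
        return ReclaimMode.DIRECT
    if free_pages < wmark_low:
        return ReclaimMode.KSWAPD
    return ReclaimMode.NONE


def kswapd_reclaim_target(free_pages: int, wmark_high: int) -> int:
    """Once woken, kswapd keeps reclaiming until free pages reach the high watermark."""
    return max(0, wmark_high - free_pages)


def coldest_first(last_access: Dict[str, int], n: int) -> List[str]:
    """Pick n pages to evict, oldest access first (an idealised LRU).

    The kernel only approximates this with active/inactive LRU lists,
    because the MMU gives it an accessed bit, not an access timestamp.
    """
    return sorted(last_access, key=lambda page: last_access[page])[:n]
```

The gap between the low and min watermarks is what lets kswapd catch up in the background before applications start stalling in direct reclaim.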

Pressure stall information (PSI): “If I had more of this resource, I could probably run N% faster”

    $ cat /sys/fs/cgroup/system.slice/memory.pressure
    some avg10=0.21 avg60=0.22 total=4760988587
    full avg10=0.21 avg60=0.22 total=4681731696

■ Find bottlenecks
■ Detect workload health issues before they become severe
■ Used for resource allocation, load shedding, and pre-OOM detection
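Here is a minimal sketch of consuming a PSI file. It assumes Python 3 and the `some`/`full` key=value line format of memory.pressure. Kernels also emit an avg300 field, which this parser handles because it accepts any key. `parse_pressure` and `read_pressure` are hypothetical helper names.

```python
from typing import Dict


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse a PSI file such as memory.pressure into {"some": {...}, "full": {...}}.

    avgN values are the percentage of wall-clock time stalled over the last
    N seconds; total is the cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values: Dict[str, float] = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # Keep the cumulative counter exact as an int; averages are floats.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_pressure(cgroup_dir: str) -> Dict[str, Dict[str, float]]:
    """Read memory.pressure for a cgroup directory, e.g. /sys/fs/cgroup/system.slice."""
    with open(f"{cgroup_dir}/memory.pressure") as f:
        return parse_pressure(f.read())
```

`some` means at least one task in the cgroup was stalled on memory, and `full` means all non-idle tasks were stalled at once. A sustained high `full` value is therefore the stronger signal that the workload is losing productive time.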

oomd (bit.ly/fboomd)
■ Early-warning OOM detection and handling using the new memory pressure metrics
■ Highly configurable policy/rule engine
■ Workload QoS and context-aware decisions
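oomd's real configuration format and plugin API are not reproduced here. To illustrate the idea only, the sketch below implements a hypothetical rule with two steps. First, act when memory `full` avg10 stays above a threshold for several consecutive samples. Second, choose the largest cgroup that is not marked as protected, which is a stand-in for workload-QoS context. The class names, threshold, and victim-selection policy are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CgroupSample:
    name: str
    memory_current: int      # bytes, e.g. read from memory.current
    protected: bool = False  # hypothetical QoS flag: never pick this cgroup


class PressureRule:
    """Fires once pressure has stayed at or above a threshold for N samples."""

    def __init__(self, threshold: float, consecutive: int) -> None:
        self.threshold = threshold
        self.consecutive = consecutive
        self._hits = 0

    def observe(self, full_avg10: float) -> bool:
        # Reset on any sample below the threshold, so brief spikes don't trigger.
        self._hits = self._hits + 1 if full_avg10 >= self.threshold else 0
        return self._hits >= self.consecutive


def pick_victim(samples: List[CgroupSample]) -> Optional[str]:
    """Choose the biggest unprotected cgroup, or None if everything is protected."""
    candidates = [s for s in samples if not s.protected]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.memory_current).name
```

Requiring several consecutive samples trades a little reaction time for fewer false positives. Unlike the kernel OOM killer, the rule can act before reclaim has actually failed.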

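Shift to a “protection” mentality
■ Limits (e.g. memory.{high,max}) really don't compose well
■ Prefer protections (memory.{low,min}) where possible
■ Protections affect memory reclaim behaviour: reclaim avoids protected memory (memory.min is a hard guarantee, memory.low is best-effort)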

fbtax2
■ Workload protection: prevent non-critical services from degrading the main workload
■ Host protection: degrade gracefully if the machine cannot sustain the workload
■ Usability: avoid introducing performance or operational costs

fbtax2
■ Base OS
  ■ Filesystems
  ■ Swap
  ■ Kernel tunables
  ■ …
■ cgroup v2
  ■ Default hierarchy
  ■ Resource configuration
■ Applications
  ■ oomd
  ■ Metric exporting for cgroups

Base OS
■ btrfs as /
  ■ ext4 has priority inversions
  ■ All metadata is annotated
■ Swap
  ■ Yes, you really still want it (bit.ly/whyswap)
  ■ Allows memory pressure to build up gracefully
  ■ Usually disabled for the main workload
  ■ btrfs swap file support, to avoid tying swap to provisioning
■ Kernel tunables
  ■ vm.swappiness
  ■ Writeback throttling

fbtax2 cgroup hierarchy: old
web
■ system.slice (memory.high: 8G, memory.max: 10G)
  ■ Chef
■ hostcritical.slice
  ■ sshd
  ■ syslog
■ workload.slice
  ■ workload-container.slice
    ■ HHVM
  ■ workload-deps.slice
    ■ Service discovery
    ■ Config service

fbtax2 cgroup hierarchy
web
■ system.slice (io.latency: 75ms)
  ■ Chef
■ hostcritical.slice (memory.min: 352M, io.latency: 50ms)
  ■ sshd
  ■ syslog
■ workload.slice (memory.low: 17G, io.latency: 50ms)
  ■ workload-container.slice (memory.low: max)
    ■ HHVM
  ■ workload-deps.slice (memory.low: 2.5G)
    ■ Service discovery
    ■ Config service
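Protections like these are applied by writing values into each cgroup's interface files under the cgroup v2 mount. The sketch below assumes the usual /sys/fs/cgroup mount point. It plans those writes for a few of the memory protections on this slide (hostcritical.slice memory.min 352M, workload-container.slice memory.low max, workload-deps.slice memory.low 2.5G). `to_bytes`, `plan_writes`, and `apply_writes` are hypothetical helpers. A fractional size such as 2.5G is not accepted by the kernel as written, so the sketch converts sizes to plain byte counts.

```python
from typing import Dict, List, Tuple

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> str:
    """Convert "352M" / "2.5G" / "max" into a value safe to write to a memory file."""
    if size == "max":
        return size
    suffix = size[-1].upper()
    if suffix in _UNITS:
        return str(int(float(size[:-1]) * _UNITS[suffix]))
    return str(int(size))


def plan_writes(root: str, settings: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Turn {"a.slice/b.slice": {"memory.low": "2.5G"}} into (file path, value) pairs."""
    writes: List[Tuple[str, str]] = []
    for cgroup, files in settings.items():
        for name, value in files.items():
            writes.append((f"{root}/{cgroup}/{name}", to_bytes(value)))
    return writes


def apply_writes(writes: List[Tuple[str, str]]) -> None:
    """Write the planned values. Needs root, and the memory controller must be
    enabled in each parent's cgroup.subtree_control for the files to exist."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


FBTAX2_SUBSET = {
    "hostcritical.slice": {"memory.min": "352M"},
    "workload.slice/workload-container.slice": {"memory.low": "max"},
    "workload.slice/workload-deps.slice": {"memory.low": "2.5G"},
}
```

In cgroup v2, a child's effective memory.low is bounded by its ancestors' protection. The children of workload.slice are therefore only as protected as workload.slice itself, which is why the parent needs a protection value of its own.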

webservers: protection against memory starvation

Try it out: bit.ly/fbtax2
