Lecture 08: When disaster strikes and all else fails
Hands-on Unix system administration DeCal
2012-10-22
Projects
● groups of four people
● submit one form per group with proposed project ideas and SSH public keys
● we’ll be provisioning VMs and sending out an announcement
Tools of the trade
What’s up?
● uptime: how long the machine has been running continuously, and the load average
  ✦ 1-, 5-, and 15-minute averages of the number of processes running or waiting for the CPU (on Linux, also those blocked on disk I/O)
● w, who: who’s logged in on the machine
  ✦ write: write to a logged-in user
  ✦ wall: write to all logged-in users
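A rough way to read the load average is to divide it by the number of CPUs: a per-CPU value well above 1 means processes are queueing. The sketch below assumes Linux, where /proc/loadavg holds the same three numbers uptime prints. The helper name check_load and the default threshold of 1.0 are hypothetical choices for illustration, not a standard.

```shell
#!/bin/sh
# Assumes Linux: /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
# check_load is a hypothetical helper; the per-CPU threshold is a rule of thumb.

# check_load [THRESHOLD]: succeed (exit 0) if the 1-minute load per CPU is
# above THRESHOLD (default 1.0), fail otherwise.
check_load() {
    threshold=${1:-1.0}
    cpus=$(nproc)                              # number of available CPUs (coreutils)
    read -r load1 load5 load15 _rest < /proc/loadavg
    # Shell arithmetic is integer-only, so awk does the decimal comparison;
    # awk exits 0 when load/cpus > threshold and 1 otherwise.
    awk -v l="$load1" -v c="$cpus" -v t="$threshold" 'BEGIN { exit !(l / c > t) }'
}

if check_load 1.0; then
    echo "overloaded: $(cat /proc/loadavg)"
else
    echo "load ok: $(cat /proc/loadavg)"
fi
```

On a machine with uptime installed, running it next to this check shows the same first three numbers.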
What’s hosing?
● top, htop (Linux), ps (ps aux, ps -elf)
● similarly, iftop for network interface bandwidth, iotop (Linux) for disk I/O
What’s in use?
“The action can’t be completed... in use” (Windows)
“The operation can’t be completed... in use” (Mac OS X)
● lsof for files
● lsof -i for network ports
● see also: netstat -pant, fuser
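When lsof or fuser is not installed, Linux still exposes the same information: every open descriptor of a process shows up as a symlink under /proc/PID/fd. The sketch below swaps in that /proc scan for lsof; who_has_open is a hypothetical helper name, it assumes Linux and GNU-style readlink -f, and without root it only sees your own processes.

```shell
#!/bin/sh
# Assumes Linux /proc: /proc/PID/fd/N is a symlink to whatever descriptor N refers to.
# who_has_open is a hypothetical stand-in for `lsof FILE` / `fuser FILE`.

# who_has_open FILE: print the PIDs that currently hold FILE open.
who_has_open() {
    target=$(readlink -f -- "$1")          # canonical path, as /proc reports it
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable descriptors (other users' processes, exited processes) are skipped.
        [ "$(readlink -- "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#/proc/}                   # e.g. "1234/fd/3"
        echo "${pid%%/*}"                  # e.g. "1234"
    done | sort -un
}
```

For example, `who_has_open /mnt/usb/file.txt` would name the processes keeping that file busy. Where they are available, `lsof FILE` and `lsof -i :22` remain the more complete tools.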
Too much traffic
● netcat: “pipe” over TCP/UDP
● wireshark, tshark, tcpdump: packet sniffer/analyzer
● nmap: network scanner
Too many files
● du, df: directory and filesystem disk space usage
● scp (secure copy): transfer files over SSH
● rsync (remote sync): transfer only what has changed (often over SSH)
● tar (tape archiver): combine files into a tarball
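The sketch below strings the local tools together: check how big a directory is and how full its filesystem is, pack it into a gzip-compressed tarball, list the archive, and unpack it elsewhere. Everything happens in a throwaway temp directory; the names project and notes.txt are placeholders.

```shell
#!/bin/sh
set -e
# Placeholder data in a throwaway directory.
work=$(mktemp -d)
mkdir "$work/project"
echo "hello" > "$work/project/notes.txt"

du -sh "$work/project"     # -s: one total, -h: human-readable sizes
df -h "$work"              # free space on the filesystem holding $work

# c = create, z = gzip, f = archive file; -C changes directory first so the
# archive stores "project/..." rather than the full temp path.
tar -czf "$work/project.tar.gz" -C "$work" project
tar -tzf "$work/project.tar.gz"                       # t = list contents

mkdir "$work/restore"
tar -xzf "$work/project.tar.gz" -C "$work/restore"    # x = extract
```

To move the result to another machine, `scp project.tar.gz user@host:` copies the tarball over SSH, while `rsync -av project/ user@host:project/` copies the directory and, on later runs, sends only the changes (user@host is a placeholder).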
Low-level “files”
● fdisk, parted (Linux): edit the partition table
● fsck: check a filesystem for errors
● dd: copy block devices
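dd copies raw bytes between any two files, and block devices are files too, which is what makes it both useful and dangerous. This sketch practices on ordinary image files in a temp directory instead of real disks; the 4 MiB size is arbitrary.

```shell
#!/bin/sh
set -e
# Pointed at a real device (e.g. of=/dev/sdX), dd overwrites it without asking,
# so this sketch only touches image files in a throwaway directory.
work=$(mktemp -d)

# A 4 MiB file of zeros stands in for a small disk.
dd if=/dev/zero of="$work/disk.img" bs=1M count=4 2>/dev/null

# Copy it block for block, the same way you would image a disk to a file.
dd if="$work/disk.img" of="$work/copy.img" bs=1M 2>/dev/null

cmp "$work/disk.img" "$work/copy.img" && echo "copy is identical"
```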
Too many terminals
● screen, tmux
● “metaterminal”
  ✦ access multiple terminal sessions inside a single terminal session
● other features: persistence (sessions survive logging off), session sharing (between users)
sudo
● sudo: switch user do (usually used to give your command root powers)
(comic via xkcd.com)
Other tools
● ldd (shared library dependencies), truss or strace (trace system calls)
● md5sum: file checksum
● watch: execute a command repeatedly and show its output
● seq: print a sequence of numbers
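A short sketch of seq and md5sum working together: seq drives a loop that creates some files, md5sum records their checksums, and `md5sum -c` later verifies them, for example after copying the files to another machine. The file names are placeholders in a temp directory.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

for i in $(seq 1 5); do              # seq 1 5 prints 1 through 5, one per line
    echo "file $i" > "file$i.txt"
done

md5sum file*.txt > checksums.md5     # one "checksum  filename" line per file
md5sum -c checksums.md5              # prints "fileN.txt: OK" for each match

# watch re-runs a command and redraws its output, e.g. `watch -n 5 df -h`
# every 5 seconds (interactive, so not run here).
```

MD5 catches accidental corruption but is not collision-resistant; sha256sum has the same `-c` interface when tampering is a concern.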
Disasters
Software meltdowns
● system load (uptime command) too damn high
● remote access (networking, firewall, SSH) broken
Hardware meltdowns
● failed hard drives
● failed fans, power supplies, CPU, RAM
Criminals on the loose
● crackers will do Bad Things
● compromised accounts
● looks can be deceiving; it’s uncertain what to trust
Escalation of problems
● we like to build systems on top of each other
● if one thing fails, it may break other things, causing still other things to fail
2003 Northeast blackout
[Image: August 13, 2003, 9:21pm EDT, the night before the blackout (via en.wikipedia.org)]
[Image: August 14, 2003, 9:03pm EDT, during the blackout (via en.wikipedia.org)]
Alleviating the pain
Be Prepared
● Boy Scout motto
● Murphy’s Law: “Anything that can go wrong, will go wrong.”
● s— happens
Power management
● Uninterruptible Power Supply (UPS)
● many UPSes can remotely power cycle servers
Out-of-band management
● separate hardware that can be remotely accessed
● independent from the rest of the hardware, with a dedicated NIC
● can access the BIOS, power cycle, and provide a visual display
● e.g., IPMI, Dell DRAC, Sun LOM
Redundancy
● dual redundant power supplies typical
● RAID
● failover servers for high availability
● spare parts (hard drives!) for swapping
Monitoring
● many large-scale operations (Google, Facebook) have many failed servers at any point in time; monitoring the servers lets traffic be rerouted around the failures
● monitor syslog
● SNMP traps
● alarm notification by email, text message
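Monitoring syslog and raising an alarm can start as small as counting suspicious lines. In this sketch, check_log is a hypothetical helper, the default path /var/log/syslog is an assumption (it is the Debian/Ubuntu default; other systems use /var/log/messages or journald), and the default pattern and limit are arbitrary.

```shell
#!/bin/sh
# check_log is hypothetical; /var/log/syslog is an assumed default path.

# check_log [LOG] [PATTERN] [LIMIT]: print an alert and return 1 if more than
# LIMIT lines of LOG match the extended regex PATTERN (case-insensitive).
check_log() {
    log=${1:-/var/log/syslog}
    pattern=${2:-'error|fail'}
    limit=${3:-0}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    count=$(grep -Eci -- "$pattern" "$log") || true   # grep exits 1 on zero matches
    if [ "$count" -gt "$limit" ]; then
        echo "ALERT: $count lines matching '$pattern' in $log"
        return 1
    fi
    return 0
}
```

A script that calls check_log could run from cron, e.g. `*/10 * * * * /usr/local/bin/check_log.sh` (a hypothetical path). Cron mails any output to the job's owner, so the ALERT line becomes an email notification, provided local mail delivery is configured.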
Security
● subscribe to OS security announcements
● Intrusion Detection Software (e.g., snort, bro)
● be wary of lax permissions
● limit root access
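A quick audit for lax permissions looks for world-writable files and for setuid/setgid programs, which run with their owner’s or group’s privileges. This sketch uses only POSIX find options; find_lax_perms is a hypothetical helper name.

```shell
#!/bin/sh
# find_lax_perms is hypothetical; all find options used here are POSIX.

# find_lax_perms [DIR]: list regular files under DIR (default /) that are
# world-writable, then files with the setuid or setgid bit set.
find_lax_perms() {
    dir=${1:-/}
    # -xdev: stay on one filesystem (skips /proc, network mounts, ...)
    # -perm -0002: the "other" write bit is set
    find "$dir" -xdev -type f -perm -0002 -print 2>/dev/null
    # -perm -4000 / -2000: setuid / setgid bit set
    find "$dir" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
    return 0    # find exits non-zero on unreadable directories; not an error here
}
```

Run as root, `find_lax_perms /` covers the root filesystem; unexpected entries in either list deserve a closer look.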
Backups
● user data, system configuration
● ideally daily, weekly, monthly rotations
● RAID is not a backup
● e.g., rsync, cron, rsnapshot
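The rotation idea can be sketched with tar alone: write a timestamped archive, then delete all but the newest few. This swaps in tar for the rsync/rsnapshot tools on the slide so it needs nothing beyond coreutils and tar; backup_rotate and the backup-YYYYmmdd-HHMMSS.tar.gz naming are hypothetical choices.

```shell
#!/bin/sh
# backup_rotate and its file naming are hypothetical; uses only coreutils and tar.

# backup_rotate SRC DEST [KEEP]: archive SRC into DEST as a timestamped tarball,
# then delete all but the KEEP (default 7) newest archives.
backup_rotate() {
    src=$1 dest=$2 keep=${3:-7}
    mkdir -p "$dest"
    stamp=$(date +%Y%m%d-%H%M%S)
    tar -czf "$dest/backup-$stamp.tar.gz" -C "$(dirname "$src")" "$(basename "$src")"
    # Timestamped names sort chronologically, so a reverse sort lists the newest
    # first; every line after the first KEEP is an archive to delete.
    ls -1 "$dest"/backup-*.tar.gz | sort -r | tail -n "+$((keep + 1))" |
        while read -r old; do
            rm -f -- "$old"
        done
}
```

Scheduled from cron, e.g. `30 2 * * * /usr/local/bin/backup.sh` (a hypothetical script calling backup_rotate), this yields a daily rotation. rsnapshot does the same job more efficiently, using rsync and hard links so unchanged files are not stored twice.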