Unit OS11: Performance Evaluation
11.2. Boot/Startup Troubleshooting
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze

Roadmap for Section 11.2
  - Windows Boot Process
  - Shutdown
  - Causes for Crashes
  - Recovery Console and Safe-Mode Boot
  - System Restore

x86 and x64 Boot Process
Boot begins during installation, when Setup writes various things to disk:
  - System volume:
      - Master Boot Record (MBR)
      - Boot sector
      - NTLDR – NT Boot Loader
      - NTDETECT.COM
      - BOOT.INI
      - SCSI driver – Ntbootdd.sys (not present on all systems)
  - Boot volume:
      - System files – %SystemRoot%: Ntoskrnl.exe, Hal.dll, etc.

The Boot Process
1. BIOS
  - Reads the MBR from the boot device
2. MBR
  - Contains a small amount of code that scans the partition table (4 entries)
  - The first partition marked active is selected as the system volume
  - Loads the boot sector of the system volume (C:)
3. Boot sector (NT-specific code)
  - Reads the root directory of the volume and loads NTLDR
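
Step 2 above (the MBR code scanning its four-entry partition table for the first partition marked active) can be sketched in a few lines. This is a minimal illustration, not the actual MBR code: it assumes the classic 512-byte MBR layout (446 bytes of boot code, four 16-byte entries, a 0x55 0xAA signature), and the names `find_active_partition` and `make_demo_mbr` as well as the sector numbers in the demo are made up for the example.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # the table follows 446 bytes of boot code
ENTRY_SIZE = 16                # four 16-byte entries
ACTIVE_FLAG = 0x80             # status byte of an active (bootable) entry
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR sector


def find_active_partition(mbr):
    """Return (index, type, start_lba, sector_count) of the first active entry, or None."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        status = mbr[offset]
        part_type = mbr[offset + 4]
        # Start LBA and sector count are little-endian 32-bit values at offsets 8 and 12.
        start_lba, sector_count = struct.unpack_from("<II", mbr, offset + 8)
        if status == ACTIVE_FLAG:
            return index, part_type, start_lba, sector_count
    return None


def make_demo_mbr(active_index=1):
    """Build a synthetic MBR sector (demo data only); pass None for no active entry."""
    sector = bytearray(SECTOR_SIZE)
    demo_entries = [(0x07, 63, 1000), (0x07, 1063, 2000)]
    for i, (part_type, start, count) in enumerate(demo_entries):
        status = ACTIVE_FLAG if i == active_index else 0x00
        # Layout: status, CHS start (3 bytes), type, CHS end (3 bytes), start LBA, count.
        struct.pack_into("<B3sB3sII", sector, PARTITION_TABLE_OFFSET + i * ENTRY_SIZE,
                         status, b"\0\0\0", part_type, b"\0\0\0", start, count)
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    print(find_active_partition(make_demo_mbr()))
```

The scan stops at the first active entry, which mirrors the rule on the slide that the first partition marked active becomes the system volume.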

x86 and x64 Boot Process
4. NTLDR
  - Moves the system from 16-bit to 32-bit mode and enables paging
  - Reads and uses Ntbootdd.sys to perform disk I/O if the boot volume is on a SCSI disk different from the system volume
      - This is a copy of the SCSI miniport driver used when the OS is booted
  - Reads Boot.ini
      - Boot.ini selections point to the boot drive
      - Specifies OS boot selections and optional switches (most for debugging/troubleshooting) that are passed to the kernel during boot
      - If there is more than one selection, NTLDR displays the boot menu (with timeout)
  - If you select a 64-bit installation, NTLDR moves the CPU into 64-bit mode

Boot Process
4. NTLDR (continued)
  - Once the boot selection is made, the user can press F8 to get to the special boot menu
      - Last Known Good, Safe modes, hardware profile, Debugging mode
  - NTLDR loads and executes Ntdetect.com to perform BIOS hardware detection (x86 and x64 only)
      - Later saved into HKLM\Hardware\Description
  - NTLDR loads:
      - Ntoskrnl.exe, Hal.dll, and Bootvid.dll (and Kdcom.dll for XP and later)
      - The registry SYSTEM hive (\Windows\System32\Config\System)
          - Later this becomes HKLM\System
      - Based on the SYSTEM hive, the boot drivers are loaded
          - Boot driver: critical to the boot process (e.g. boot file system driver)
  - Transfers control to the main entry point of Ntoskrnl.exe
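
To make the Boot.ini handling in step 4 concrete, here is a rough sketch of how the file's two sections could be read and how the "more than one selection" rule decides whether a menu appears. It is an illustration only, not NTLDR's parser: the function names are hypothetical, and the sample file contents (paths, descriptions, switches) are an assumed typical example rather than something taken from the slides.

```python
SAMPLE_BOOT_INI = r"""
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows 2000 Professional" /fastdetect /sos
"""


def parse_boot_ini(text):
    """Return (timeout, default_path, entries) parsed from Boot.ini-style text."""
    timeout, default, entries = None, None, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].lower()
            continue
        # The ARC path on the left contains no "=", so the first "=" is the separator.
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if section == "boot loader":
            if key.lower() == "timeout":
                timeout = int(value)
            elif key.lower() == "default":
                default = value
        elif section == "operating systems":
            description, switches = value, []
            if value.startswith('"'):
                end = value.find('"', 1)
                if end != -1:
                    description = value[1:end]
                    # Everything after the quoted description is a list of switches.
                    switches = value[end + 1:].split()
            entries.append({"path": key, "description": description, "switches": switches})
    return timeout, default, entries


def needs_boot_menu(entries):
    """A menu is only needed when there is more than one boot selection."""
    return len(entries) > 1


if __name__ == "__main__":
    timeout, default, entries = parse_boot_ini(SAMPLE_BOOT_INI)
    print(timeout, default, needs_boot_menu(entries))
    for entry in entries:
        print(entry)
```

The switches collected per entry correspond to the options that are handed to the kernel for the chosen selection.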

The Boot Process (cont)
5. Ntoskrnl.exe (splash screen appears)
  - Initializes kernel subsystems in two phases:
      - The first phase is object definition (process, thread, driver, etc.)
      - The second builds on the base that the objects provide
      - This is done in the context of a kernel-mode system thread that becomes the idle thread
  - The I/O Manager starts boot-start drivers and then loads and starts system-start drivers

Driver Load Order
  - Every driver has a key in HKLM\System\CurrentControlSet\Services
      - Type: 1 for driver, 2 for file system driver, others are Win32 services
      - Start: 0 = boot, 1 = system, 2 = auto, 3 = manual, 4 = disabled
  - Some drivers need fine-grained control over load order to satisfy dependencies with other drivers
      - A driver’s optional Group value controls load order within a start phase (boot, system, auto)
          - HKLM\System\CurrentControlSet\Control\ServiceGroupOrder
      - A driver’s optional Tag value controls startup order within its group
  - Note: Plug-and-play (discussed in the I/O section) controls the load order of PnP drivers
  - Special case: the file system driver for the boot volume is always loaded and started, regardless of its start type
  - Lab: run LoadOrd from Sysinternals to see driver ordering
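
The ordering rules on the Driver Load Order slide can be approximated as a three-level sort: Start value first, then the position of the driver's Group in the ServiceGroupOrder list, then its Tag. The sketch below is a simplification under stated assumptions: it ignores PnP ordering and the boot file system special case, treats ungrouped or untagged drivers as loading last within their level, and uses made-up driver names and an illustrative two-entry group list.

```python
from dataclasses import dataclass
from typing import Optional

BOOT, SYSTEM, AUTO, MANUAL, DISABLED = 0, 1, 2, 3, 4  # Start values from the slide


@dataclass
class Driver:
    name: str
    start: int
    group: Optional[str] = None
    tag: Optional[int] = None


def load_order(drivers, group_order):
    """Return driver names in approximate load order (boot, system, auto only)."""
    rank = {group: i for i, group in enumerate(group_order)}

    def sort_key(driver):
        # Assumption: groups missing from the list, and drivers without a Tag, sort last.
        group_rank = rank.get(driver.group, len(rank))
        tag_rank = driver.tag if driver.tag is not None else float("inf")
        return (driver.start, group_rank, tag_rank)

    loadable = [d for d in drivers if d.start in (BOOT, SYSTEM, AUTO)]
    # sorted() is stable, so drivers with equal keys keep their input order.
    return [d.name for d in sorted(loadable, key=sort_key)]


if __name__ == "__main__":
    demo_groups = ["SCSI miniport", "File system"]  # illustrative subset only
    demo_drivers = [
        Driver("netdrv", AUTO),
        Driver("fsdrv", BOOT, "File system"),
        Driver("scsi_b", BOOT, "SCSI miniport", tag=2),
        Driver("scsi_a", BOOT, "SCSI miniport", tag=1),
        Driver("sysdrv", SYSTEM),
        Driver("ondemand", MANUAL),
    ]
    print(load_order(demo_drivers, demo_groups))
```

Running LoadOrd on a real system, as the lab suggests, shows the actual order, which this model only approximates.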

Boot Process
5. Ntoskrnl.exe (continued)
  - Creates the Session Manager process (\Windows\System32\Smss.exe), the first user-mode process
6. Smss.exe:
  - Runs programs specified in BootExecute
      - e.g. autochk, the native API version of chkdsk
  - Processes “Delayed move/rename” commands
      - Used to replace in-use system files by hotfixes, service packs, etc.
  - Initializes the paging files and the rest of the Registry (hives or files)
  - Loads and initializes the kernel-mode part of the Win32 subsystem (Win32k.sys)
  - Starts Csrss.exe (user-mode part of the Win32 subsystem)
  - Starts Winlogon.exe

Boot Process
7. Winlogon.exe:
  - Starts Lsass.exe (Local Security Authority)
  - Loads the GINA DLL (Graphical Identification and Authentication)
      - Default is Msgina.dll
  - Displays the logon dialog
  - Starts Services.exe (the service controller)
8. Services.exe starts Win32 services marked as “automatic” start
  - Also includes any drivers marked Automatic start
  - Service startup continues asynchronously to logons
End of normal boot process

Logon Process
  - Winlogon sends the username/password to Lsass
      - Either on the local system for a local logon, or to the Netlogon service on a domain
  - Creates processes for the executables listed in HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
      - By default: Userinit.exe
      - Runs the logon script, restores drive-letter mappings, starts the shell
  - Userinit creates a process to run HKLM\Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell
      - By default: Explorer.exe
  - There are other places in the Registry that control programs that start at logon

Logon Process
  - Use Autoruns (Sysinternals) or Msconfig (new in Windows XP) to see the order of process startup at logon time
      - To run Msconfig, click on Start->Help, then “Use Tools…”, then System Configuration Utility
      - Msconfig shows what is defined to start, whereas Autoruns shows all the places where things CAN be defined to start
  - Screenshots: Msconfig (in \Windows\PCHEALTH\HELPCTR\Binaries) and Autoruns (Sysinternals)
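
The hand-off from Winlogon to Userinit to the shell, as described under Logon Process, can be modelled with a plain dictionary standing in for the WinLogon registry key. This is a sketch only: it assumes the Userinit value is a comma-separated list (often with a trailing comma), it does not touch the real Registry, and the function names are hypothetical.

```python
def parse_userinit(value):
    """Split a Userinit-style value into executables, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def logon_launch_plan(winlogon_values):
    """Return (parent, executable) pairs in the order the slide describes."""
    # Defaults follow the slide: Userinit.exe and Explorer.exe.
    userinit = parse_userinit(winlogon_values.get("Userinit", "Userinit.exe"))
    shell = winlogon_values.get("Shell", "Explorer.exe")
    plan = [("Winlogon", exe) for exe in userinit]
    plan.append(("Userinit", shell))
    return plan


if __name__ == "__main__":
    # Stand-in for the WinLogon key; the path is an assumed typical value.
    demo_key = {"Userinit": r"C:\WINDOWS\system32\userinit.exe,", "Shell": "Explorer.exe"}
    for parent, exe in logon_launch_plan(demo_key):
        print(f"{parent} starts {exe}")
```

An extra executable appended to either value would show up in this plan, which is one reason tools such as Autoruns inspect these values.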

Normal vs. Abnormal Shutdown
  - Normal shutdown
      - Required reboots (e.g. installing a service pack replaces critical system files)
      - Hardware maintenance
      - But normally there is no need to shut down: just hibernate!
  - Abnormal shutdown
      - System crash - something wrong in kernel mode
      - Hardware error

System Shutdown Procedure
What happens when Windows performs a normal shutdown?
  - A call to the ExitWindowsEx function is sent to Csrss
      - Start menu->shutdown: Explorer calls it
      - CTRL+ALT+DEL->shutdown: Winlogon calls it
  - If it is not a forced shutdown, Csrss sends a query message to all threads owning top-level windows
      - Processes can cancel the shutdown if it is not a “forced” shutdown
      - Interactive shutdowns are not forced
  - If all answer OK, Csrss sends the shutdown message
      - Csrss waits for the time defined by HKCU\Control Panel\Desktop\HungAppTimeout
      - If the timeout expires, it shows a popup
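
The query-then-shutdown handshake above can be expressed as a small simulation. This is a toy model under stated assumptions, not Csrss logic: each application is reduced to a yes/no answer to the query plus a simulated exit time, no real time passes, and the 5000 ms default for the hung-application timeout is an assumed value for illustration.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    accepts_query: bool = True   # answer to the shutdown query message
    exit_time_ms: int = 0        # simulated time needed to handle the shutdown message


def simulate_shutdown(apps, forced=False, hung_app_timeout_ms=5000):
    """Return a dict describing whether shutdown proceeds and which apps look hung."""
    if not forced:
        # Only a non-forced shutdown asks first, and any app may cancel it.
        for app in apps:
            if not app.accepts_query:
                return {"proceeds": False, "cancelled_by": app.name, "hung": []}
    # Apps that exceed the timeout are the ones that would trigger the popup.
    hung = [app.name for app in apps if app.exit_time_ms > hung_app_timeout_ms]
    return {"proceeds": True, "cancelled_by": None, "hung": hung}


if __name__ == "__main__":
    demo_apps = [App("editor", accepts_query=False), App("player", exit_time_ms=8000)]
    print(simulate_shutdown(demo_apps))               # cancelled by the editor
    print(simulate_shutdown(demo_apps, forced=True))  # proceeds; player exceeds the timeout
```

The forced case skips the query entirely, which matches the slide's point that only non-forced shutdowns can be cancelled by a process.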

Shutdown Procedure (contd)
  - Csrss tells the Service Control Manager (Services.exe) to exit, which tells all Win32 services to exit
      - Csrss.exe waits for HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout
      - After the timeout, Services.exe is terminated (even though service processes may still be shutting down)
          - Example: IIS, Exchange
          - Some sites lengthen the value to accommodate long shutdowns
  - Finally, NtShutdownSystem is called, which calls the Plug and Play manager’s NtSetSystemPowerState to orchestrate the final system shutdown
      - Drivers are called to shut down (e.g. flush data to disk)
  - Finally, the HAL is called, which then tells the hardware either to reboot or power off
      - Systems without power management end with the dialog “it is safe to power off your system now”

Hibernate & Resume
  - Hibernation was introduced with Windows 2000 power management
      - System memory is saved to hiberfil.sys on the system volume
      - On power-on, NTLDR reads hiberfil.sys and continues where the system left off
          - No boot.ini or boot option menu if hiberfil.sys has valid data
      - Not supported on x86 Server systems (works on x64 Server 2003 systems)
  - XP has some hibernate/resume enhancements
      - The hibernation file is better compressed
      - I/O is overlapped on IDE drives
      - Resume is faster because reads are larger
      - Device parallelization during power-up is improved
          - Power-up is done asynchronously in the background by drivers (specifically power-pageable devices without children)

What triggers a Windows Crash?
  - Something is wrong in kernel mode:
      - Unhandled exception (e.g. executing an invalid instruction)
      - OS or driver detects a severe inconsistency
      - Referencing paged-out memory at interrupt level (the famous “IRQL_NOT_LESS_OR_EQUAL” crash)
      - A reschedule is attempted at dispatch-level IRQL or higher
      - Hardware error

Why Does Windows Crash?
  - Top 100 Reported Crashing Issues (reported at the WinHEC 2004 conference)
      - ~70% caused by 3rd party driver code
      - ~15% caused by unknown (memory is too corrupted to tell)
      - ~10% caused by hardware issues
      - ~5% caused by Microsoft code
  - There are lots of third party drivers! From the online crash analysis database:
      - 55,000 unique drivers - 24 new / day (28,000 in 2004)
      - 220,000 total drivers - 98 revised / day (130,000 in 2004)
  - Many devices
      - Over 1,263,300 distinct Plug and Play (PnP) IDs (680,000 in 2004)
      - 1,600 PnP IDs added every day

What Happens At The Crash
  - When a condition is detected that requires a crash, KeBugCheckEx is called
  - Takes five arguments:
      - Stop code (also called bugcheck code)
      - 4 stop-code defined parameters
  - KeBugCheckEx:
      - Turns off interrupts
      - Tells other CPUs to stop
      - Paints the blue screen
      - Notifies registered drivers of the crash
      - If a dump is configured (and it is safe to do so), writes the dump to disk

After the Crash - Causes for Boot Problems
Boot may be failing because of…
  - Master Boot Record (MBR) corruption
  - Boot.ini problems
  - System hive corruption
  - Crash at boot
  - System file corruption
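
The five KeBugCheckEx arguments listed under What Happens At The Crash (one stop code plus four stop-code defined parameters) are what end up on the blue screen. The sketch below formats them into a stop line; the exact layout is only loosely modelled on the classic blue-screen text, the lookup table is a hypothetical one-entry stand-in, and the parameter values in the demo are invented.

```python
# Hypothetical, deliberately tiny lookup table for the example.
KNOWN_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
}


def format_stop_line(stop_code, params):
    """Format a stop code and its four parameters as a single text line."""
    if len(params) != 4:
        raise ValueError("a bugcheck carries exactly four parameters")
    # Each value is shown as a zero-padded 32-bit hexadecimal number.
    args = ",".join(f"0x{value:08X}" for value in params)
    name = KNOWN_STOP_CODES.get(stop_code, "UNKNOWN")
    return f"*** STOP: 0x{stop_code:08X} ({args}) {name}"


if __name__ == "__main__":
    # Invented demo values; the meaning of each parameter depends on the stop code.
    print(format_stop_line(0x0000000A, [0x00000000, 0x00000002, 0x00000000, 0x804E5FB3]))
```

Because the meaning of the four parameters changes with the stop code, a real analysis tool would need a per-code interpretation table rather than a single name lookup.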