The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
<Insert Picture Here> Fault-Diagnostic Best Practices: What Every DBA Must Know About Oracle Database 11g Mark Ramacher Director, Server Management Server Technologies
Oracle Database 11 g R1 Fault Diagnostic Automation Goal: Reduce Time to Problem Resolution Realistic Testing Diagnostic Proactive and Intelligent Workflow Patching Resolution Automatic Health Automation Checks Diagnostic Solution Delivery Prevention Resolution
Diagnostic Workflow Automation
Historic Issues with RDBMS Diagnostic Data • No organization • DBA must search around for relevant diagnostics to send • No catalog of failures • Just a text stream (alert log) for history • DBA: How healthy has my database been this last quarter? • DBA: Have I seen this failure before? • Not always sufficient on first failure • DBA must reproduce the failure with debug switches • Cause of multiple round trips between customer and support
Historic Issues with RDBMS Diagnostic Data (continued) • Unmanaged • Grows forever • DBA must decide when and which files to delete • Unrestrained • Floods of data from repeated occurrences of an error • DBA must perform emergency space management
The New World of 11g Diagnostics • Organized • Diagnostic data is annotated and can be queried and correlated • DBA uses automated tool to find failure data • Cataloged • Automated Problem and incident management • DBA can query to see history of failures and which are duplicates • First Failure Capture • DBA’s work is done after sending initial diagnostic package
The New World of 11g Diagnostics • Managed • Auto purging • DBAs don’t have to monitor space usage of trace files • Constrained • Flood control • One less worry for a DBA in time of crisis
Concepts : Problems and Incidents • Problems are fundamental code or configuration issues that can cause execution failures • They exist until they are corrected, e.g. by patch • They are managed to resolution • An incident is a single occurrence of a problem • They happen at point(s) in time and thus have timestamps • They induce diagnostic actions like dumps and traces • They are associated to problems by a “problem key” • E.g. error code
Automatic Diagnostic Repository (ADR) • Stores diagnostic data in a directory hierarchy • Holds data concurrently for multiple Oracle products • Each product instance has its own diagnostic workspace • ADR data is highly structured • Formalizes incidents and problems and assigns them IDs • Metadata is kept for each incident and problem • Incident related diagnostic data is placed in its own directory • Alert log and trace files are annotated and can be queried
Automatic Diagnostic Repository DIAGNOSTIC_DEST Support Workbench BACKGROUND_DUMP_DEST $ORACLE_BASE CORE_DUMP_DEST USER_DUMP_DEST $ORACLE_HOME/log ADR Base diag rdbms DB Name ADR metadata SID Home hm alert cdump trace incpkg incident (others) … incdir_1 incdir_ n ADRCI V$DIAG_INFO log.xml alert_SID.log
Automatic Diagnostic Repository (ADR) • Self-managing • Trace files purged after 1 month (configurable) • Incident/Problem metadata purged after 1 year (configurable) • Note: incidents can be flagged as “don’t purge” to override purging • Repeated incidents are flood controlled (5 dumps per hour per problem) • Recreates itself as needed
Incident Packaging Service (IPS) Where there is structure, there can be automation… • IPS uses the ADR structure to automate the packaging of diagnostic data • Solves the problem of “what needs to be sent” • Gathers all relevant diagnostic data for a problem • Correlates related incidents to make sure it captures root cause
Incident Packaging Service (IPS) ADR Problem ID Correlation Package BOM Add Delete Scrub Diagnostic Generate Package Modify Contents Zip File
Incident Packaging Service (IPS) • Recommends further diagnostic actions for DBA • For example “build SQL test case” • Packages structure and metadata so that Oracle side automation can takes place • Use of IPS is critical to speed up problem resolution!
11g Health Monitoring • Health Monitoring is designed to help the DBA: • Find problems before they impact service availability • Determine the scope of a problem • Validate that a problem is resolved • Provides a number of “health checkers” • Dictionary • DB structure integrity (control files, data file headers, etc.) • Redo log content • Undo Segment integrity • Data block integrity
11g Health Monitoring • Checkers can be “reactively” activated during incidents • Targeted • E.g. check integrity of blocks near a corrupted block • All checkers can be activated on demand
First Failure Analysis • Dumping the required diagnostic data “out of the box” • Reduces round trips with Oracle support • Internal logic activates detailed dumps for the given failure circumstances • Additional data is detailed but targeted • Minimal increase in overall diagnostic data size
EM Support Workbench • Support Workbench reduces the DBA’s diagnostic management to a few clicks • Two main entry flows • From an incident alert on the DB home page • From the Support Workbench home page • Support Workbench home page • View recent and historical problems • View diagnostics packages • View health checker findings
Support Workbench Home
Support Workbench – Problem Details • View and process a problem • View “reactive” checker findings for this problem • View all incidents of this problem • View associated Metalink service request • Package diagnostics for the problem • Perform guided resolution on the problem • Data Repair advisor (link appears only if relevant) • SQL Repair advisor
Problem Details
Support Workbench - Packaging • Two flows that guide you through IPS packaging • Quick package • Wizard to guide you through the basic packaging steps • Cannot modify contents • Advanced packaging • Content Editing • Additional user dumps • Automated upload to Oracle* • Automated service request creation * Requires OCM (Oracle Configuration Manager)
Advanced Packaging
Intelligent Repair Advisors • Data Recovery Advisor • Guided expert data recovery system using diagnostic data and health check output • SQL Test Case Builder • Automatically retrieves exact environment information from ADR to build SQL test cases and replicate SQL issues • SQL Repair Advisor • Analyzes failing SQL statements to isolate bug • May recommend SQL Patch as work around
Data Recovery Advisor • In an outage, uncertainty and confusion are common Investigation • Largest part of downtime is: Time • Investigating the problem, planning a solution • Data Recovery Advisor • Automates investigation, reports all problems • Intelligently determines plan for recovery Planning Time • Handles multiple failure situations Recovery • Presents only feasible recovery options Time • Are there backups, is there a standby? • Ranked by repair time and data loss Time to Repair • Can automatically apply recovery plan Reduces downtime by Eliminating Confusion
SQL Test Case Builder Business Requirement • Bug resolution • Test case required for fast bug resolution • Not always easy to provide a test case • What information should be provided? • How much data is need? • Getting the test case to Oracle can be tricky Solution • Oracle automatically creates a test case • Collects necessary information relating to a SQL incident • Collected data is packaged to be sent to Oracle • Collected data allows a developer to reproduce the problem
SQL Repair Advisor Business Requirement • The most common types of SQL problems - exception, performance regression etc., are hard to diagnose • A lot of time is spent trying to reproduce the problem • If a workaround is found it has to be applied to entire system Solution • Advisor • Investigates the incident locally • Automatically determines the root cause • Provides a workaround (SQL Patch) for just the effected SQL • If not, sends necessary diagnostic information to Oracle
SQL Repair Advisor Flow SQL Generate Statement statement incident in ADR Execute crashes automatically Trace files DBA runs DBA gets SQL Repair Advisor alerted SQL Repair Advisor investigates Statement DBA accepts executes SQL patch successfully again Execute SQL patch SQL statement generated patched
Automatic Diagnostic Workflow Critical ADR DBA Error alert Repair Support Workbench Checkers Oracle Support IPS Zip File IPS
Recommend
More recommend