IT & PH DEPT. Process Monitoring Of Nightly Builds PH/SFT & IT/CF Summer Student Willem Van Lint
Overview
● Nightly builds & problem statement
● Lemon monitoring framework
● Design of the process monitoring tool
● Use in the nightly builds
Nightly builds: Introduction
● Nightly builds compile and test projects like ROOT, GAUDI, CORAL, …
● Different platforms and slots (= build environment)
→ Server – Client build system
Nightly builds: Architecture
● Nightly builds use a server-client architecture through RPC to distribute the architectures to be built.
[Diagram: the server stores data in MySQL and has a web interface. Windows, Mac and Linux clients get a work unit from the server; a client runs doBuild.py, which compiles, installs and tests each project (ROOT, ...).]
Nightly builds: Problems
Problems                        Solutions
Processes hanging               Detect and terminate, and write
  → high CPU time processes     the reason in the log files
  → large log files
Low on disk space               A: Clean up old builds
                                B: Stop building
Nightly builds: Problem example
● Sometimes hanging processes in tests or make
● Example of process tree:
  ● Client → doBuild.py → compile ROOT → subprocesses
  PID   TTY STAT    TIME COMMAND
  19485 ?   Ss      0:00 /bin/sh
  19486 ?   S       0:00 \_ /bin/sh /afs/cern.ch/sw/lcg/app/nightlies/scripts/launch_client.sh lxbuild147 8002
  19594 ?   S       0:00 | \_ python /afs/cern.ch/sw/lcg/app/nightlies/scripts/client.py --machine lxbuild147
  19600 ?   Z       0:00 | \_ [uptime] <defunct>
  15940 ?   S       0:00 | \_ python /afs/cern.ch/sw/lcg/app/nightlies/scripts/doBuild.py --slots dev1
  15947 ?   S       0:00 | | \_ /bin/sh -c source{SITEROOT}/sw/contrib/
  21661 ?   S       0:00 | | \_ cmt pkg_make 4
  21683 ?   S       0:00 | | \_ sh -c mkdir -p logs;
  21690 ?   S       0:00 | | \_ sh -x /build/nightlies/dev1/Fri/LCGCMT/LCGCMT_59
  21695 ?   R    5781:25 | | | \_ make -k -j4
  21691 ?   S       0:00 | | \_ tee -a logs/ROOT_x86_64-slc5-gcc43-dbg_make.log
   8628 ?   Z       0:00 | \_ [python] <defunct>
  19487 ?   S       0:00 \_ tee /afs/cern.ch/sw/lcg/app/nightlies/nightlies-logs/crncli64148.txt
Lemon monitoring framework
● Monitoring of sensor values
● 3 interactions:
  ● Sensor – Agent
  ● Agent – Server
  ● UI – User
Lemon architecture
[Diagram: on the nodes, sensors report to the Lemon Monitoring Agent, which communicates over TCP/UDP with the application server. The repository backend is SQL on an Oracle database. RRDTool and PHP on Apache serve the web browser over HTTP. The user also has the Lemon CLI and lemon-host-check.]
Lemon agent & sensors
● Sensor = executable
● Agent ↔ sensors:
  ● Few simple commands.
  ● Interaction: supported API for Perl, C++
● Sensors provide metric classes that can be instantiated (e.g. with different parameters).
Lemon exceptions & actuators
● Exceptions can be defined in the Lemon Agent based on values of the sensors.
● An actuator can be called to resolve the exception.
  30010
      MetricName  exception.hangingcpu
      MetricClass alarm.exception
      Timing      20 5
      Parameters
          Correlation (33:2 > 1000) && (33:1 > 0)
          Actuator    /usr/bin/lemon-actuator-kill cputime $act_value_02
          MaxRuns     3 900
          Timeout     100
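The correlation in the exception entry can be read as a boolean test on the latest sample of one metric. The sketch below is only an illustration in Python: it assumes that `33:2` means "field 2 of metric 33" with 1-based field numbers, which the slide does not state, and `hanging_cpu_exception` and the `samples` dictionary are hypothetical names, not part of Lemon.

```python
def hanging_cpu_exception(samples, metric_id=33, limit=1000):
    """Return True when (33:2 > 1000) && (33:1 > 0) holds.

    `samples` maps a metric id to the latest tuple of values for that
    metric. Reading "33:2" as "field 2 of metric 33" (1-based) is an
    assumption made for this sketch.
    """
    values = samples.get(metric_id)
    if values is None or len(values) < 2:
        return False  # no complete sample yet, so nothing to correlate
    field_1, field_2 = values[0], values[1]
    return field_2 > limit and field_1 > 0
```

With `samples = {33: (2912, 5000)}` the function returns True, so an agent using this rule would call the actuator; with `{33: (2912, 9)}` it returns False.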
Monitoring the nightly builds system
● Reuse and enhancement of Lemon sensor wrapper in Python
● Implementation of new metric classes
● Implementation of an actuator
[Diagram: Monitoring Agent, Exception, Actuator, Sensor, and the Sensor wrapper with two Metric modules.]
Lemon sensor wrapper
● Normal case: a sensor delivers metric values directly to the agent
● With the wrapper: the wrapper acts as a sensor towards the agent, but requests the metric values from other modules
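The delegation idea can be sketched in a few lines of Python. This is not the actual Lemon sensor wrapper: the class name, the `register` and `sample` methods, and the space-separated output format are all hypothetical and only show how a single sensor front end can forward requests to pluggable metric modules.

```python
class SensorWrapper:
    """Looks like one sensor to the agent, but delegates to metric modules."""

    def __init__(self):
        # metric id -> callable that returns a list of value tuples
        self._metrics = {}

    def register(self, metric_id, sample_func):
        """Plug in a metric module under the given metric id."""
        self._metrics[metric_id] = sample_func

    def sample(self, metric_id):
        """Ask the responsible module for values and format one line per row."""
        rows = self._metrics[metric_id]()
        return [
            f"{metric_id} " + " ".join(str(value) for value in row)
            for row in rows
        ]
```

For example, after `wrapper.register(5380, lambda: [(2912, 9, "Project0")])`, calling `wrapper.sample(5380)` yields one formatted line per row returned by the module.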
Metrics
● Searching the process tree:
  ● Select branches
  ● Extract specific information
● Modular: other select and extract functions can be used
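The select/extract split can be illustrated with a small Python sketch. It works on an in-memory list of processes rather than a live system; `Proc`, `search_tree`, `is_build_root` and `total_cputime` are hypothetical names, and the sample data is only loosely based on the process listing shown earlier (parent links are simplified).

```python
from collections import defaultdict, namedtuple

Proc = namedtuple("Proc", "pid ppid cputime command")

def build_children(procs):
    """Index processes by parent PID."""
    children = defaultdict(list)
    for proc in procs:
        children[proc.ppid].append(proc)
    return children

def walk(root, children):
    """Yield the root process and all of its descendants."""
    stack = [root]
    while stack:
        proc = stack.pop()
        yield proc
        stack.extend(children.get(proc.pid, []))

def search_tree(procs, select, extract):
    """Apply `extract` to every branch whose root satisfies `select`."""
    children = build_children(procs)
    return [extract(root, list(walk(root, children)))
            for root in procs if select(root)]

# One possible select/extract pair; others can be swapped in.
def is_build_root(proc):
    return "doBuild.py" in proc.command

def total_cputime(root, branch):
    return (root.pid, sum(proc.cputime for proc in branch))

# Illustrative data, CPU time in seconds (5781:25 is 346885 s).
PROCS = [
    Proc(19594, 19486, 0, "python client.py --machine lxbuild147"),
    Proc(15940, 19594, 0, "python doBuild.py --slots dev1"),
    Proc(21690, 15940, 0, "sh -x build script"),
    Proc(21695, 21690, 346885, "make -k -j4"),
]
```

Calling `search_tree(PROCS, is_build_root, total_cputime)` returns one `(root PID, total CPU seconds)` pair per build branch. Passing different functions changes what is selected and what is reported without touching the tree walk.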
Metrics
● Cputime metric:
  ● Returns the total CPU time, project name of a project build branch and the PID of the root process of the branch
● Example:
  Machine     Metric nr  Time        PID   Total cpu time  Project name
  lxbuild148  5380       1282552613  2912  9               Project0
  lxbuild148  5380       1282552613  3813  5               Project1
Metrics
● Files metric:
  ● For each open log file in a project branch, returns the PID of the process with the handle, file path, amount written by that process and project name
● Example:
  Machine     Metric nr  Time                      PID    Path                               Amount written  Project
  lxbuild148  5381       Mon Aug 23 04:03:54 2010  16231  .../x86_64-slc5-gcc43-opt.log      12028           GAUDI
  lxbuild148  5381       Mon Aug 23 04:03:54 2010  17053  …/x86_64-slc5-icc11-dbg-tests.log  0               COOL
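One way to find the open log files of a process on Linux is to read `/proc`. The sketch below is an assumption about how such a metric could be gathered, not the tool's actual code: `open_log_files` is a hypothetical name, and the current file offset from `/proc/<pid>/fdinfo/<fd>` is used as a stand-in for "amount written".

```python
import os

def open_log_files(pid, suffix=".log"):
    """Return (pid, path, offset) for each file open in `pid` ending in `suffix`.

    Linux-only: relies on /proc/<pid>/fd and /proc/<pid>/fdinfo. Using the
    file offset as "amount written" is an assumption of this sketch.
    """
    fd_dir = f"/proc/{pid}/fd"
    results = []
    if not os.path.isdir(fd_dir):
        return results  # no /proc here, or the process has gone away
    for fd in os.listdir(fd_dir):
        try:
            path = os.readlink(os.path.join(fd_dir, fd))
            if not path.endswith(suffix):
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                fields = dict(line.split(":", 1) for line in info if ":" in line)
            results.append((pid, path, int(fields["pos"])))
        except (OSError, KeyError, ValueError):
            continue  # descriptor was closed while we were looking
    return results
```

Running this for every PID in a selected build branch would give rows similar to the example table, with the project name added by the caller.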
Actuator
● Different actions when limits are exceeded:
  ● Total CPU time > 2 h: find the process with the largest CPU time in the branch and terminate the branch from that process down.
  ● Log files > 2 MB: terminate the branch from the responsible process down.
  ● Partition usage > 85%: delete builds from the previous day.
  ● Partition usage > 95%: terminate all the builds.
● Always write an error in the log files of terminated processes
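The limits listed for the actuator map naturally onto a small decision function. The following Python sketch is illustrative only: the function and action names are hypothetical, and treating 2 MB as 2 × 1024² bytes is an assumption, since the slide does not say which convention is used.

```python
CPU_LIMIT_S = 2 * 60 * 60          # total CPU time limit: 2 hours, in seconds
LOG_LIMIT_BYTES = 2 * 1024 ** 2    # log file limit: 2 MB (binary MB assumed)
DISK_CLEAN_PCT = 85                # above this, delete yesterday's builds
DISK_STOP_PCT = 95                 # above this, stop all builds

def choose_action(kind, value):
    """Map a metric kind and its value to an action name, or None."""
    if kind == "cputime" and value > CPU_LIMIT_S:
        return "terminate_branch_from_busiest_process"
    if kind == "files" and value > LOG_LIMIT_BYTES:
        return "terminate_branch_from_writer"
    if kind == "partition":
        # Check the stricter limit first so 96% does not just trigger a cleanup.
        if value > DISK_STOP_PCT:
            return "terminate_all_builds"
        if value > DISK_CLEAN_PCT:
            return "delete_previous_day_builds"
    return None

def busiest(branch):
    """Return the PID with the largest CPU time from (pid, cputime) pairs."""
    return max(branch, key=lambda item: item[1])[0]
```

For a hanging build, `busiest` picks the process from which the branch would be terminated; writing the reason into the log file is left out of this sketch.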
Code optimization
● Unit tests: goal = attaining full code coverage
● Documentation generated from docstrings by Doxygen
● Static code analysis with pylint, which checks for:
  ● Errors and warnings
  ● Refactoring possibilities, e.g. too many attributes
  ● Coding conventions, e.g. indentation
Use in the nightly builds system
● Project builds are monitored.
● Hanging builds are detected when they exceed the actuator limits.
● The reason for termination is written to the logs.
Outlook
● A Windows and Mac version of the Lemon sensor wrapper to report sensor values through RPC to a Linux agent
[Diagram: on the Windows/Mac machine(s), a Sensor wrapper client (repeater) with its Metric module sends values via RPC to the Sensor wrapper server on the Linux machine, which runs the Monitoring Agent.]
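The repeater idea can be sketched with Python's standard XML-RPC modules. The slide only says "RPC", so XML-RPC is an assumption here, and `start_repeater_server`, `send_sample`, the `report` method and the machine name used in the usage note are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def start_repeater_server(store, host="127.0.0.1", port=0):
    """Linux side: accept samples over XML-RPC and keep them in `store`."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)

    def report(machine, metric_id, values):
        # A real wrapper would hand these values on to the monitoring agent.
        store.append((machine, metric_id, values))
        return True

    server.register_function(report, "report")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_sample(url, machine, metric_id, values):
    """Windows/Mac side: forward one sample to the Linux repeater server."""
    return ServerProxy(url).report(machine, metric_id, values)
```

A client would call, for instance, `send_sample(url, "winbuild01", 5380, [2912, 9, "Project0"])`, where `url` points at the server started on the Linux machine. Port 0 lets the operating system pick a free port, which is convenient for trying the sketch locally.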
Conclusion
● Modular system for process monitoring based on CPU time and open file size
● Used in production
● Reuse and enhancement of the Lemon sensor wrapper for Python