EOS monitoring metrics and user access pattern analysis Philipp Zigann CERN IT-DSS-DT 9th Oct. 2012 Zigann (CERN) EOS user access analysis 9th Oct. 2012 1 / 17
Outline
  1 EOS
  2 Targets of Data Analysis
  3 Data Acquisition
  4 Metrics
  5 User Pattern
  6 Future Work
EOS
  - Exploration of storage
  - Purely disk-based storage
  - In-memory namespace (no DB)
  - Mainly used by physicists during data analysis
  - Developed for fast random file access
Targets of Data Analysis
  - Automated recognition of system anomalies
      - Improves the reaction time of system admins
      - Fast error recognition (and therefore faster resolution)
  - Detection of user access patterns
      - Classification of typical use cases
      - Determination of (in)efficient access patterns
      - Optimization of inefficient access
Data Acquisition
  - Xroot built-in monitoring
      - Generates UDP packets for each read/write request
      - Detailed information about single reads/writes
      - EOS is based on xroot
      - Analysed by Domenico Giordano (CERN) et al.
  - Lemon monitoring
      - System monitoring tool, mainly used in CERN's infrastructure
  - EOS log file
EOS log file
  - Each entry describes what happened between the open and the close of a file

    log=7677503c-adc7-11e1-9083-003048f0e00c&path=/eos/atlas/atl...Ele.root&
    ruid=38112&rgid=1307&td=username.12459:127@lxplus309&
    host=lxfsrg15a07.cern.ch&lid=6291730&fid=45557244&fsid=2246&
    ots=1338760799&otms=547&cts=1338760890&ctms=654&
    rb=615562&wb=0&srb=368145245672&swb=0&
    nrc=70&nwc=0&rt=28.48&wt=0.00&osize=5671631075&csize=5671631075

  Parameters
  - File information
  - User identification
  - Number of seeked, written and read bytes (and the number of calls used)
  - Open and close time
  - Waiting time for I/O
  No information about a single read/write call!
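As an illustration of how such an entry could be turned into a record for analysis, here is a minimal Python sketch. The names `parse_entry` and `open_duration_seconds` are hypothetical and not part of EOS. The sketch assumes that fields are separated by `&`, that `ots`/`cts` are open and close timestamps in seconds, and that `otms`/`ctms` are their millisecond parts; the slides do not state this explicitly.

```python
# Example entry copied from the slide; the path is elided there and stays elided here.
ENTRY = (
    "log=7677503c-adc7-11e1-9083-003048f0e00c&path=/eos/atlas/atl...Ele.root"
    "&ruid=38112&rgid=1307&td=username.12459:127@lxplus309"
    "&host=lxfsrg15a07.cern.ch&lid=6291730&fid=45557244&fsid=2246"
    "&ots=1338760799&otms=547&cts=1338760890&ctms=654"
    "&rb=615562&wb=0&srb=368145245672&swb=0&nrc=70&nwc=0"
    "&rt=28.48&wt=0.00&osize=5671631075&csize=5671631075"
)

# Which fields are numeric is inferred from the example entry (an assumption).
INT_FIELDS = {"ruid", "rgid", "lid", "fid", "fsid", "ots", "otms", "cts", "ctms",
              "rb", "wb", "srb", "swb", "nrc", "nwc", "osize", "csize"}
FLOAT_FIELDS = {"rt", "wt"}


def parse_entry(entry):
    """Split a 'key=value&key=value' log entry into a dict with typed values."""
    record = {}
    for field in entry.strip().split("&"):
        key, sep, value = field.partition("=")
        if not sep:
            continue  # skip fragments that carry no '='
        key, value = key.strip(), value.strip()
        if key in INT_FIELDS:
            value = int(value)
        elif key in FLOAT_FIELDS:
            value = float(value)
        record[key] = value
    return record


def open_duration_seconds(record):
    """Open duration, assuming otms/ctms are millisecond parts of ots/cts."""
    seconds = record["cts"] - record["ots"]
    millis = record["ctms"] - record["otms"]
    return seconds + millis / 1000.0


record = parse_entry(ENTRY)
print(record["host"], record["rb"], open_duration_seconds(record))
```

The entry is split by hand rather than with a URL query parser so that characters such as `+` or `%` in a path are left untouched.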
Metrics I
  - Throughput [MB/s]: read + written bytes divided by the open duration of a file
  - Reopened Files per Job: number of files reopened during one job
  - Read Bytes / File Size: ratio of the bytes read during a request to the file size
  - Written Bytes / File Size: write ratio that indicates file updates and full (re)writing
  - Written Bytes / Number of Write Calls: average transfer volume of a write call during a request (writing)
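The metrics on this slide can be sketched from the fields of a single log entry. The following is a minimal Python illustration, not the code used for the analysis. The function names and the `job_key` parameter are hypothetical, and several choices are assumptions: 1 MB is taken as 10^6 bytes, `otms`/`ctms` are treated as millisecond parts of the timestamps, the read ratio uses `osize` and the write ratio uses `csize`. How log entries are grouped into jobs is left to the caller.

```python
from collections import defaultdict

# Numeric fields of the example entry shown on the "EOS log file" slide.
EXAMPLE = {"ots": 1338760799, "otms": 547, "cts": 1338760890, "ctms": 654,
           "rb": 615562, "wb": 0, "nrc": 70, "nwc": 0,
           "osize": 5671631075, "csize": 5671631075, "fid": 45557244}


def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics_one(rec):
    """Per-entry metrics from the 'Metrics I' slide."""
    # Assumption: otms/ctms are the millisecond parts of ots/cts.
    duration = (rec["cts"] - rec["ots"]) + (rec["ctms"] - rec["otms"]) / 1000.0
    return {
        # Assumption: 1 MB = 10**6 bytes.
        "throughput_mb_s": safe_div((rec["rb"] + rec["wb"]) / 1e6, duration),
        # Assumption: reads are compared with the size at open time.
        "read_ratio": safe_div(rec["rb"], rec["osize"]),
        # Assumption: writes are compared with the size at close time.
        "write_ratio": safe_div(rec["wb"], rec["csize"]),
        "bytes_per_write_call": safe_div(rec["wb"], rec["nwc"]),
    }


def reopened_files_per_job(records, job_key):
    """Count, per job, the files (fid) that were opened more than once.

    job_key is a caller-supplied function mapping a record to a job
    identifier; what identifies a job is not defined in the slides.
    """
    opens = defaultdict(int)
    for rec in records:
        opens[(job_key(rec), rec["fid"])] += 1
    reopened = defaultdict(int)
    for (job, _fid), count in opens.items():
        reopened[job] += 1 if count > 1 else 0
    return dict(reopened)


print(metrics_one(EXAMPLE))
```

For the example entry the write-related values come out as zero or undefined, because the file was only read (`wb=0`, `nwc=0`).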
Metrics II
  - Disk Wait: time spent waiting for I/O requests divided by the open duration of the file
  [Figure: histogram of the disk wait ratio. x-axis: disk wait ratio (0 to 1); y-axis: count (logarithmic). Entries 1.13926e+08, Mean 0.04746, Underflow 0, Overflow 7. 2012 CERN - Zigann, 2012-10-08 19:18:07]
Metrics III
  - Read Bytes / Number of Read Calls [MB/Call]: average transfer volume of a read call during a request (reading)
  [Figure: histogram of read bytes per read call. x-axis: read bytes/read call (0 to 2200 ×10^3); y-axis: count (logarithmic, 1 to 10^8). Entries 1.139261e+08, Mean 8.22e+04, Underflow 0, Overflow 0. 2012 CERN - Zigann, 2012-10-04 18:36:11]
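The disk wait ratio and the read volume per call, together with the kind of fixed-width histogram shown in the two plots, could be computed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the unit of `rt`/`wt` is not stated in the slides (so a conversion factor `wait_to_seconds` is exposed and defaults to 1.0), and `otms`/`ctms` are treated as millisecond parts of the timestamps.

```python
# Numeric fields of the example entry shown on the "EOS log file" slide.
EXAMPLE = {"ots": 1338760799, "otms": 547, "cts": 1338760890, "ctms": 654,
           "rb": 615562, "nrc": 70, "rt": 28.48, "wt": 0.00}


def disk_wait_ratio(rec, wait_to_seconds=1.0):
    """I/O waiting time divided by the open duration of the file.

    wait_to_seconds converts rt/wt to seconds; the default of 1.0 is an
    assumption, since the slides do not give the unit of these fields.
    """
    duration = (rec["cts"] - rec["ots"]) + (rec["ctms"] - rec["otms"]) / 1000.0
    if duration <= 0:
        return None
    return (rec["rt"] + rec["wt"]) * wait_to_seconds / duration


def read_bytes_per_call(rec):
    """Average number of bytes transferred per read call, or None without reads."""
    return rec["rb"] / rec["nrc"] if rec["nrc"] else None


def histogram(values, low, high, nbins):
    """Fixed-width histogram on [low, high) with underflow and overflow counters."""
    counts = [0] * nbins
    underflow = overflow = 0
    for value in values:
        if value < low:
            underflow += 1
        elif value >= high:
            overflow += 1
        else:
            index = int((value - low) / (high - low) * nbins)
            counts[min(index, nbins - 1)] += 1  # guard against rounding at the edge
    return counts, underflow, overflow


print(disk_wait_ratio(EXAMPLE), read_bytes_per_call(EXAMPLE))
print(histogram([0.05, 0.31, 0.99, 1.0, -0.1], 0.0, 1.0, 10))
```

Values outside the plotted range are counted separately, mirroring the Underflow and Overflow entries in the statistics boxes of the plots.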
User Pattern
  - File Transfer
      - Accessing a file completely (read or written bytes / file size = 1)
      - No reopens
      - Distinctive MB/Call values (2 MB, 512 kB or 256 kB)
  - Event Mixing
      - Mixing one single event with a bunch of other events
      - Low read ratio
      - Large files
      - Many reopens
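The two patterns above can be read as simple rules over the metrics. The sketch below shows one possible rule-based classifier in Python; it is an illustration, not the method used in the analysis. The function name `classify` and every threshold (tolerances, what counts as a "low" read ratio, a "large" file or "many" reopens) are hypothetical values chosen for the example, and kB/MB are interpreted here as powers of two.

```python
# Call sizes named on the slide, interpreted as binary units (an assumption).
TYPICAL_CALL_SIZES = (2 * 1024 ** 2, 512 * 1024, 256 * 1024)


def near(value, target, rel_tol):
    """True when value lies within rel_tol (relative) of target."""
    return abs(value - target) <= rel_tol * target


def classify(read_ratio, write_ratio, reopens, file_size, bytes_per_call,
             ratio_tol=0.01, call_tol=0.05, low_read_ratio=0.01,
             large_file_bytes=10 ** 9, many_reopens=10):
    """Label an access as 'file transfer', 'event mixing' or 'unclassified'.

    All threshold defaults are hypothetical and would need tuning on real data.
    """
    full_access = (near(read_ratio, 1.0, ratio_tol)
                   or near(write_ratio, 1.0, ratio_tol))
    typical_call = bytes_per_call is not None and any(
        near(bytes_per_call, size, call_tol) for size in TYPICAL_CALL_SIZES)

    # File transfer: whole file accessed, no reopens, distinctive call size.
    if full_access and reopens == 0 and typical_call:
        return "file transfer"
    # Event mixing: low read ratio, large file, many reopens.
    if (read_ratio < low_read_ratio and file_size >= large_file_bytes
            and reopens >= many_reopens):
        return "event mixing"
    return "unclassified"


print(classify(1.0, 0.0, 0, 5 * 10 ** 8, 512 * 1024))
print(classify(1.1e-4, 0.0, 25, 5671631075, 8794))
```

Accesses that match neither rule are reported as unclassified rather than forced into one of the two patterns.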
Future Work
  - Wanted Information
      - Information about the user's target (what kind of information are they really looking for?)
      - Make vector reads visible
      - Clear concatenation of single events into jobs
  - Inefficient System Usage
      - Determine it and try to reduce it
      - Adaptation of systems to requirements
      - Adaptation of user behaviour