A tool for bookkeeping and management of Monte Carlo datasets
Student: Monika Šostkaitė
Supervisors: Danillo Piparo, Piergiulio Lenzi, Jean-Roch Vlimant
2012 CERN
Internship goals
- get familiar with Monte Carlo dataset bookkeeping;
- create a highly optimized Oracle database schema;
- insert and manage the data in the database;
- write a Python component to interface the existing PREP-2 JSON data layer to the Oracle database;
- estimate the effectiveness of the Oracle solution;
- contribute to the PREP-2 CherryPy web application.
PREP-2 (Production and Reprocessing management tool for CMS)
The testbed of two databases.
Oracle database schema
Oracle layer
- an abstract Python class defines a unified set of required operations;
- five basic methods (get, update, save, delete and query) were identified and implemented;
- additional methods: get_all, save_all, update_all, delete_self;
- cx_Oracle was used to exchange data between the database and the Python layer;
- the main focus was on designing efficient SQL queries and on providing JSON file input/output functionality.
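The layering described above can be sketched as follows. This is a minimal illustration, not the PREP-2 code: the class names DatabaseLayer and InMemoryLayer, the method signatures, the return values and the use of a "prepid" key are all assumptions. A dictionary of JSON strings stands in for the Oracle back-end so that the sketch runs without cx_Oracle, and delete_self is left out because the slides do not describe its behaviour.

```python
import json
from abc import ABC, abstractmethod


class DatabaseLayer(ABC):
    """Hypothetical abstract class: the operations every back-end must offer."""

    @abstractmethod
    def get(self, prepid): ...

    @abstractmethod
    def save(self, doc): ...

    @abstractmethod
    def update(self, doc): ...

    @abstractmethod
    def delete(self, prepid): ...

    @abstractmethod
    def query(self, **criteria): ...

    # Bulk helpers expressed through the five basic methods (assumed design).
    def get_all(self):
        return self.query()

    def save_all(self, docs):
        return [self.save(doc) for doc in docs]

    def update_all(self, docs):
        return [self.update(doc) for doc in docs]


class InMemoryLayer(DatabaseLayer):
    """Stand-in back-end; an Oracle version would issue SQL via cx_Oracle."""

    def __init__(self):
        self._docs = {}  # prepid -> JSON string

    def get(self, prepid):
        raw = self._docs.get(prepid)
        return json.loads(raw) if raw is not None else None

    def save(self, doc):
        if doc["prepid"] in self._docs:
            return False  # refuse to overwrite an existing document
        self._docs[doc["prepid"]] = json.dumps(doc)
        return True

    def update(self, doc):
        if doc["prepid"] not in self._docs:
            return False  # nothing to update
        self._docs[doc["prepid"]] = json.dumps(doc)
        return True

    def delete(self, prepid):
        return self._docs.pop(prepid, None) is not None

    def query(self, **criteria):
        docs = (json.loads(raw) for raw in self._docs.values())
        return [d for d in docs
                if all(d.get(k) == v for k, v in criteria.items())]


layer = InMemoryLayer()
layer.save({"prepid": "EXO-Summer12-00929", "campaign": "Summer12"})
print(layer.query(campaign="Summer12"))
```

Keeping the bulk helpers in the abstract class means a new back-end only has to implement the five basic methods, at the cost of one call per document; a real Oracle layer might override them with batched SQL.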
Oracle performance measurements
     ACTION                                            QUERY TIME (sec)   FULL TIME (sec)
 1.  Getting 23 campaigns prepid list                  0.006              0.008
 2.  Getting Summer12 campaign 6 approvals             0.007              0.035
 3.  Getting Summer12 campaign 2 sequences             0.007              0.013
 4.  Getting Summer12 campaign 77 comments             0.011              0.191
 5.  Getting 16390 requests prepid list                0.060              1.093
 6.  Getting EXO-Summer12-00929 request 3 approvals    0.004              0.018
 7.  Getting EXO-Summer12-00929 request 3 comments     0.008              0.137
 8.  Getting EXO-Summer12-00929 request 2 sequences    0.005              0.011
 9.  Inserting one campaign                            0.126              0.126
10.  Inserting one request                             0.095              0.095
11.  Deleting one request                              0.027              0.027
12.  Deleting one campaign                             0.250              0.250
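The slides do not say how the two timings were taken. One plausible reading, shown here purely as an assumption, is that the query time covers executing the SQL and fetching the rows, while the full time also includes turning the rows into JSON. The sketch uses sqlite3 from the standard library as a stand-in for cx_Oracle so that it runs anywhere; the table layout, the timed_fetch helper and the generated prepids are hypothetical.

```python
import json
import sqlite3
import time


def timed_fetch(conn, sql, params=()):
    """Return (json_payload, query_time, full_time) in seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    query_time = time.perf_counter() - start  # SQL execution and fetch only

    # Assumed extra work counted in the full time: building the JSON output.
    payload = json.dumps([{"prepid": r[0], "campaign": r[1]} for r in rows])
    full_time = time.perf_counter() - start
    return payload, query_time, full_time


# Hypothetical stand-in table, filled with made-up request ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request (prepid TEXT PRIMARY KEY, campaign TEXT)")
conn.executemany(
    "INSERT INTO request VALUES (?, ?)",
    [(f"TEST-Summer12-{i:05d}", "Summer12") for i in range(1000)],
)

payload, query_time, full_time = timed_fetch(
    conn,
    "SELECT prepid, campaign FROM request WHERE campaign = ?",
    ("Summer12",),
)
print(f"query {query_time:.3f} s, full {full_time:.3f} s")
```

Timings from an in-memory SQLite table say nothing about Oracle; the point is only the placement of the two stopwatch readings.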
Oracle performance measurements
     CAMPAIGN            NUMBER OF REQUESTS   CONTENT SIZE (KB)   RETRIEVING TIME (sec)
 1.  Summer12            2623                 3819                11.820
 2.  Summer11_R1         2359                 3604                10.602
 3.  Summer12_TSG        67                   106                 0.359
 4.  Summer12_DR52X      677                  1085                2.849
 5.  Fall11_TSG          153                  233                 0.661
 6.  Fall11_HLTMuonia    5                    8                   0.218
 7.  Fall11_Chamonix     12                   18                  0.247
Data retrieved from the Request table (approvals, comments and sequences are not included).
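To compare campaigns of very different sizes, the measurements above can be converted into retrieval rates. The short script below only divides the tabulated numbers; the throughput helper and the variable names are illustrative and do not come from the original work.

```python
# (campaign, number of requests, content size in KB, retrieving time in sec),
# copied from the table above.
MEASUREMENTS = [
    ("Summer12", 2623, 3819, 11.820),
    ("Summer11_R1", 2359, 3604, 10.602),
    ("Summer12_TSG", 67, 106, 0.359),
    ("Summer12_DR52X", 677, 1085, 2.849),
    ("Fall11_TSG", 153, 233, 0.661),
    ("Fall11_HLTMuonia", 5, 8, 0.218),
    ("Fall11_Chamonix", 12, 18, 0.247),
]


def throughput(requests, size_kb, seconds):
    """Return (requests per second, KB per second) for one measurement."""
    return requests / seconds, size_kb / seconds


for campaign, requests, size_kb, seconds in MEASUREMENTS:
    req_rate, kb_rate = throughput(requests, size_kb, seconds)
    print(f"{campaign:<18} {req_rate:8.1f} requests/s {kb_rate:8.1f} KB/s")
```

Rates for the smallest campaigns should be read with care, since any fixed per-retrieval overhead weighs more heavily when only a few requests are fetched.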
Conclusions
- The main goal of this internship, creating a solid and effective Oracle database back-end, has been achieved.
- The complex data related to Monte Carlo dataset bookkeeping has been studied.
- The Oracle database schema has been created and organized for efficiency.
- The former production and reprocessing management tool for the CMS experiment has been analyzed, as well as its replacement.
- The Oracle database back-end software interface has been developed.
- Time measurements of the Oracle layer performance have been provided.
- Oracle performance is suitable for the web application.