CMS OPEN DATA FOR MACHINE LEARNING: JET DATASET
DATA SCIENCE @ HIGH ENERGY PHYSICS 2017
Group Members: Gabriele Benelli, Javier Duarte, Raghav Elayavalli, Frank Golf, Burt Holzman, Michael Krohn, Joe Pastika, Kevin Pedro, Uzziel Perez, Alexx Perloff, Sezen Sekmen, Devin Taylor, Caterina Vernieri, Andrew Whitbeck
CMS OPEN DATA ML - JETS Friday, May 12, 17
CMS OPEN DATA
• Public version of 2011 CMS data [link]
• Data format: AOD -> NumPy array -> pandas DataFrame
• CMS Jet Tuple production 2011 [link]
• Event features: ('run', 'lumi', 'event', 'met', 'sumet', 'rho', 'pthat', 'mcweight')
• Jet-level features: ('njet_ak7', 'jet_pt_ak7', 'jet_eta_ak7', 'jet_phi_ak7', 'jet_E_ak7', 'jet_msd_ak7', 'jet_area_ak7', 'jet_jes_ak7', 'jet_tau21_ak7', 'jet_isW_ak7')
• PF candidate-level features: ('jet_ncand_ak7', 'ak7pfcand_pt', 'ak7pfcand_eta', 'ak7pfcand_phi', 'ak7pfcand_id', 'ak7pfcand_charge', 'ak7pfcand_ijet')
• We were asked to add generator-level (gen) jet information for use in later projects
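If the jet-level columns arrive as one variable-length array per event (with `njet_ak7` giving the count), most ML tools want them flattened to one row per jet. A minimal pandas sketch under that assumption; the toy values and the helper `jets_to_dataframe` are hypothetical, not part of the tuple production code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input with two events. Event-level features are one
# scalar per event; jet-level features are one array per event (one entry per AK7 jet).
events = {
    "run": np.array([1, 1]),
    "event": np.array([101, 102]),
    "njet_ak7": np.array([2, 1]),
    "jet_pt_ak7": [np.array([310.5, 120.2]), np.array([455.0])],
    "jet_eta_ak7": [np.array([0.41, -1.20]), np.array([1.85])],
}

def jets_to_dataframe(events):
    """Flatten jagged per-event jet arrays into one DataFrame row per jet."""
    njet = events["njet_ak7"]
    return pd.DataFrame({
        # Repeat each event-level value once for every jet in that event.
        "run": np.repeat(events["run"], njet),
        "event": np.repeat(events["event"], njet),
        # Concatenate the per-event jet arrays into flat columns.
        "jet_pt_ak7": np.concatenate(events["jet_pt_ak7"]),
        "jet_eta_ak7": np.concatenate(events["jet_eta_ak7"]),
    })

df = jets_to_dataframe(events)
print(df)
```

The same pattern extends to the PF candidate columns, using `jet_ncand_ak7` as the repeat count.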
WORKSPACE
• Worked on Amazon Web Services (AWS) instances
• Deep Learning Amazon Machine Image (AMI), Amazon Linux Version 2.0
  • For use on Amazon Elastic Compute Cloud (Amazon EC2)
  • 64-bit
• p2.xlarge instances (designed for general-purpose GPU compute applications using CUDA and OpenCL)
  • 1 Tesla K80 GPU
  • 4 vCPUs
  • 61 GiB RAM
• 8 deep learning frameworks
  • MXNet, Caffe, Caffe2, TensorFlow, Theano, Torch, CNTK, and Keras (1.2.2)
• Other packages and platforms
  • Jupyter notebooks with Python 2.7 and Python 3.4 kernels, Matplotlib, scikit-image, CppLint, Pylint, pandas, Graphviz, Bokeh Python packages, Boto and Boto 3, the AWS CLI, Anaconda 2, and Anaconda 6
HANDS-ON SESSION: DAY 1
• Group exercise to learn how to work with a fully connected NN and a convolutional NN [link]
• Problem: create a classifier to identify boosted W jets
• Basic skills:
  • Tuning hyperparameters, testing different preprocessing steps, separating the image representation into layers based on PF candidate classes, training a recursive NN, etc.
[J. Thaler, et al. arXiv:1011.2268]
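Separating the image representation into layers by PF candidate class amounts to building one pT-weighted (eta, phi) histogram per class, centered on the jet axis. A minimal NumPy sketch, assuming `ak7pfcand_id` holds PDG IDs (211 charged hadron, 22 photon, 130 neutral hadron); the class list, the 32x32 grid, and the +/-0.7 window (chosen to match the AK7 radius) are illustrative assumptions, not the exercise's settings.

```python
import numpy as np

def jet_image(cand_pt, cand_eta, cand_phi, cand_id, jet_eta, jet_phi,
              classes=(211, 22, 130), npix=32, half_width=0.7):
    """Build a (len(classes), npix, npix) pT-weighted jet image.

    Channel c only receives candidates with |id| == classes[c]
    (hypothetical PDG-ID classes: charged hadron, photon, neutral hadron).
    """
    # Position relative to the jet axis; wrap delta-phi into [-pi, pi).
    deta = cand_eta - jet_eta
    dphi = (cand_phi - jet_phi + np.pi) % (2 * np.pi) - np.pi
    edges = np.linspace(-half_width, half_width, npix + 1)
    image = np.zeros((len(classes), npix, npix))
    for c, pdg in enumerate(classes):
        mask = np.abs(cand_id) == pdg
        image[c], _, _ = np.histogram2d(deta[mask], dphi[mask],
                                        bins=[edges, edges], weights=cand_pt[mask])
    return image  # candidates outside +/- half_width are dropped

# Three toy candidates; the one at phi = -3.12 tests the phi wrap-around.
pt = np.array([50.0, 20.0, 10.0])
eta = np.array([0.10, -0.05, 0.30])
phi = np.array([3.10, -3.12, 3.00])
pid = np.array([211, 22, -211])
img = jet_image(pt, eta, phi, pid, jet_eta=0.0, jet_phi=3.14)
print(img.shape)              # (3, 32, 32)
print(img.sum(axis=(1, 2)))   # per-channel pT sums: 60, 20, 0
```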
HANDS-ON SESSION: DAY 1
[Figures: Densely-Connected Neural Network; Convolutional Neural Network]
[L. de Oliveira, et al. arXiv:1511.05190]
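The two architectures differ mainly in how they see the jet image: the dense network flattens it so every pixel connects to every hidden node, while the convolutional network slides small filters, shared across positions, over it. A forward-pass-only NumPy sketch with random, untrained weights; the 25x25 image size, 64 hidden nodes, and eight 5x5 filters are assumptions for illustration, not the session's or the paper's settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense_forward(image, W1, b1, w2, b2):
    """Densely connected net: every pixel feeds every hidden node."""
    h = relu(W1 @ image.reshape(-1) + b1)
    return sigmoid(w2 @ h + b2)

def conv_forward(image, kernels, w2, b2):
    """Convolutional net: small filters shared across all image positions."""
    kh, kw = kernels.shape[1:]
    windows = sliding_window_view(image, (kh, kw))                # (H-kh+1, W-kw+1, kh, kw)
    fmap = relu(np.einsum("ijab,kab->kij", windows, kernels))    # (n_filters, H-kh+1, W-kw+1)
    return sigmoid(w2 @ fmap.reshape(-1) + b2)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, (25, 25))  # stand-in for a 25x25 jet image

# Dense: 625 inputs -> 64 hidden nodes -> 1 output (illustrative sizes).
W1, b1 = rng.normal(0.0, 0.01, (64, 625)), np.zeros(64)
p_dense = dense_forward(image, W1, b1, rng.normal(0.0, 0.1, 64), 0.0)

# CNN: 8 filters of 5x5 -> 8x21x21 feature maps -> 1 output.
kernels = rng.normal(0.0, 0.1, (8, 5, 5))
p_cnn = conv_forward(image, kernels, rng.normal(0.0, 0.01, 8 * 21 * 21), 0.0)

print(p_dense, p_cnn)
print("dense weights:", W1.size, "conv filter weights:", kernels.size)  # 40000 vs 200
```

The parameter counts printed at the end show why a convolutional layer can cover the whole image with far fewer weights than a dense layer.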
LSTM TO CLASSIFY MERGED W VS QCD JETS
• Inspired by the “Recursive jets” talk on Wednesday
• Recursive neural nets were too difficult to implement in the short time available, so we tried a recurrent neural network instead: an LSTM (long short-term memory) network
• For each jet, gave the network a list of PF candidates (pt, eta, phi, ID, charge)
  • Tried sorting candidates by pt or by DeltaR within the jet
  • Varied the number of layers in the network and the number of training epochs
  • Did not use an embedding layer; more work is needed to understand this feature
• Network seemed to get stuck at low accuracy during training, regardless of variations
  • Performed worse than chance
• Based on an example using IMDB movie review data [link]
• Needs more work before it's ready for prime time
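Feeding PF candidates to an LSTM means turning each jet's unordered candidate list into a fixed-length sequence. A minimal NumPy sketch of the two orderings tried (descending pt, or ascending DeltaR from the jet axis), zero-padded at the end; the column order (pt, eta, phi, ID, charge), the `max_len` cutoff, and the helper names are assumptions.

```python
import numpy as np

def delta_r(eta, phi, jet_eta, jet_phi):
    """Angular distance to the jet axis, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi - jet_phi + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(eta - jet_eta, dphi)

def candidate_sequence(cands, jet_eta, jet_phi, sort_by="pt", max_len=50):
    """Order one jet's PF candidates and zero-pad (or truncate) to max_len rows.

    cands: array of shape (n_cand, 5), columns (pt, eta, phi, id, charge).
    Returns shape (max_len, 5): one input sequence for the LSTM.
    """
    if sort_by == "pt":
        order = np.argsort(-cands[:, 0], kind="stable")   # hardest candidate first
    elif sort_by == "dr":
        dr = delta_r(cands[:, 1], cands[:, 2], jet_eta, jet_phi)
        order = np.argsort(dr, kind="stable")              # closest to the axis first
    else:
        raise ValueError("sort_by must be 'pt' or 'dr'")
    seq = cands[order][:max_len]
    padded = np.zeros((max_len, cands.shape[1]))
    padded[:len(seq)] = seq                                # trailing zero rows are padding
    return padded

cands = np.array([
    [10.0, 0.30, 0.0, 211, 1],
    [30.0, 0.10, 0.0, 22, 0],
    [20.0, 0.05, 0.0, -211, -1],
])
by_pt = candidate_sequence(cands, 0.0, 0.0, sort_by="pt", max_len=4)
by_dr = candidate_sequence(cands, 0.0, 0.0, sort_by="dr", max_len=4)
print(by_pt[:, 0])  # pt column: 30, 20, 10, then padding 0
print(by_dr[:, 0])  # pt column: 20, 30, 10, then padding 0
# Stacking jets gives a (n_jets, max_len, 5) batch for a recurrent layer.
batch = np.stack([by_pt, by_dr])
```

One difference from the IMDB example: there, the embedding layer maps integer word indices to dense vectors, whereas here every column except the PF ID is already a real number.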
SIMPLE JES REGRESSION
• Trying to predict the jet energy scale (JES)
• Trained a fully connected NN
  • 1 hidden layer with 100 nodes
  • Inputs were jet pT and eta
• The plots show the true (blue) and predicted (red) JES versus pT (top) and eta (bottom)
• We are able to predict and improve upon the pT dependence, but we can't predict the eta dependence
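A network of this size (two inputs, one hidden layer of 100 nodes, one output) is small enough to write out directly. A minimal NumPy sketch trained with full-batch gradient descent on mean squared error; the ReLU activation, input standardization, learning rate, and epoch count are assumptions (the slide does not give them), and the toy JES target below is made up, not CMS data.

```python
import numpy as np

def train_jes_mlp(pt, eta, jes, n_hidden=100, lr=0.01, epochs=2000, seed=0):
    """Fit JES ~ f(pT, eta) with one ReLU hidden layer and a linear output.

    Plain full-batch gradient descent on the mean squared error.
    Returns (initial_mse, final_mse, predict_fn).
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([pt, eta])
    y = np.asarray(jes, dtype=float).reshape(-1, 1)
    # Standardize inputs: pT is O(100 GeV) while eta is O(1).
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma
    n = len(y)

    W1 = rng.normal(0.0, 1.0, (2, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1))
    b2 = np.full(1, y.mean())  # start the output near the mean JES

    def forward(Z):
        h = np.maximum(Z @ W1 + b1, 0.0)
        return h, h @ W2 + b2

    initial = float(np.mean((forward(Xs)[1] - y) ** 2))
    for _ in range(epochs):
        h, pred = forward(Xs)
        g_out = 2.0 * (pred - y) / n       # d(MSE)/d(pred)
        g_h = (g_out @ W2.T) * (h > 0)     # backpropagate through the ReLU
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Xs.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    final = float(np.mean((forward(Xs)[1] - y) ** 2))

    def predict(pt_new, eta_new):
        Z = (np.column_stack([pt_new, eta_new]) - mu) / sigma
        return forward(Z)[1].ravel()

    return initial, final, predict

rng = np.random.default_rng(1)
pt = rng.uniform(30.0, 500.0, 2000)
eta = rng.uniform(-2.5, 2.5, 2000)
toy_jes = 1.0 + 2.0 / np.sqrt(pt) + 0.02 * eta**2  # made-up toy target
initial, final, predict = train_jes_mlp(pt, eta, toy_jes)
print(initial, final)
```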
CNN JES REGRESSION
• Once again we were trying to predict the JES, but this time we used the jet images (PF candidates) as inputs
• Unfortunately our NN was unable to predict the JES and seems to simply return a random value
• Also tried adding the jet pT and eta as additional inputs, but this just confused the NN
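Adding jet pT and eta to an image-based network is commonly done with a two-branch layout: convolutional features from the image are flattened and concatenated with the scalar inputs before the dense layers. A forward-pass-only NumPy sketch under that assumption; the layer sizes, random weights, and parameter names are hypothetical and not the group's actual model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def jes_two_branch(image, jet_pt, jet_eta, p):
    """Predict one JES value from a jet image plus scalar jet pT and eta.

    Image branch: one conv layer (ReLU), flattened.
    Scalar branch: [pT, eta] passed through unchanged.
    Both are concatenated and fed to a dense ReLU layer and a linear output.
    """
    kh, kw = p["kernels"].shape[1:]
    windows = sliding_window_view(image, (kh, kw))
    img_feat = np.maximum(np.einsum("ijab,kab->kij", windows, p["kernels"]), 0.0).reshape(-1)
    merged = np.concatenate([img_feat, [jet_pt, jet_eta]])  # join the two branches
    h = np.maximum(p["W_h"] @ merged + p["b_h"], 0.0)
    return float(p["w_out"] @ h + p["b_out"])               # linear output for regression

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, (25, 25))  # stand-in jet image
n_feat = 4 * 21 * 21 + 2                 # 4 filters of 5x5 on 25x25, plus pT and eta
params = {
    "kernels": rng.normal(0.0, 0.1, (4, 5, 5)),
    "W_h": rng.normal(0.0, 0.01, (32, n_feat)),
    "b_h": np.zeros(32),
    "w_out": rng.normal(0.0, 0.1, 32),
    "b_out": 1.0,
}
print(jes_two_branch(image, jet_pt=250.0, jet_eta=0.8, p=params))
```

The output node has no sigmoid because the JES is a continuous regression target rather than a class probability.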
JES REGRESSION – IT CAN BE DONE
• Another group has successfully done this same regression with a fully connected NN
  • Based on the DeepJet framework
  • 9 layers: 1 with 350 nodes and 8 with 100 nodes
  • 600+ input variables
  • See Markus Stoye's talk from Monday
• So this is technically possible
Havukainen, Joona. Sneak peek at regression using DNN. Helsinki JetMET Workshop, Helsinki Institute of Physics. [link]
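For scale, the number of trainable parameters in a fully connected network follows from its layer widths: each layer contributes (inputs x outputs) weights plus one bias per output. A short sketch applying this to the architecture quoted above, assuming exactly 600 inputs (a lower bound, since the slide says "600+"), the 350-node layer first, and a single regression output; these layout details are assumptions.

```python
def dense_param_count(n_inputs, hidden_widths, n_outputs=1):
    """Count weights plus biases in a fully connected network."""
    widths = [n_inputs] + list(hidden_widths) + [n_outputs]
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths[:-1], widths[1:]))

# Assumed layout: 600 inputs, one 350-node layer, eight 100-node layers, one output.
deep = dense_param_count(600, [350] + [100] * 8)
# The simple JES network: 2 inputs (pT, eta), one 100-node hidden layer, one output.
simple = dense_param_count(2, [100])
print(deep, simple)  # 316251 401
```

Even at this lower bound, the quoted network has several hundred times more parameters than the simple two-input JES network.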
THANK YOU!
Special thanks to Javier Duarte, Burt Holzman, and Caterina Vernieri
