EML Update

  1. EML Update, Nick Amin, July 10, 2018

  2. Overview
     ⚫ Last update (SNT), also presented at the ML workshop
     ⚫ Feedback from the ML talk
       • There were some concerns I might be taking advantage of b,c → e (or just being unfair in general) if my network is learning isolation
         ‣ Next few slides
       • Consider tracks for full electron ID if that’s the desired direction
         ‣ Claimed before that raw track information underperformed wrt the 21-variable MVA (BDT-based)
         ‣ Try to replicate the MVA via a DNN rather than a BDT and repeat the check using raw track information with the latest architecture, changes, etc. to confirm
       • Could also do e vs γ
         ‣ This has already been done?
         ‣ Barrel only, and 32x32 crystals
         ‣ For each crystal, they consider energies and pulse profiles → CNN, LSTMs
     ⚫ Note: in the rest of these slides, consider barrel only

  3. Network performance vs flavor
     ⚫ Is the network using isolation? (29x15 window used vs σ_iηiη’s 5x5)
     ⚫ In DY, virtually all events are "truth-matched" to not be in the b,c → e category, so we can separate out the two to see how important isolation would be
     ⚫ Comparing the AUCs in the legends (see the sketch below), the b/c-class backgrounds actually give worse AUCs than the unmatched ones in all pT bins except pT > 45
     [ROC plots: signal vs all, signal vs b,c → e, signal vs unmatched]
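
     A minimal sketch of how the per-category AUCs in the legends could be computed, by restricting the background to one truth-match class at a time; the array and file names here are hypothetical and only illustrate the splitting.

       import numpy as np
       from sklearn.metrics import roc_auc_score

       # Hypothetical inputs: classifier output per candidate, a signal/background
       # label, and per-candidate truth-match flags for the background categories.
       scores = np.load("scores.npy")            # network output in [0, 1]
       labels = np.load("labels.npy")            # 1 = signal electron, 0 = background
       is_bc_to_e = np.load("is_bc_to_e.npy")    # background truth-matched to b,c -> e
       is_unmatched = np.load("is_unmatched.npy")

       def auc_vs_category(bkg_mask):
           """AUC of signal against only the background candidates in bkg_mask."""
           keep = (labels == 1) | ((labels == 0) & bkg_mask)
           return roc_auc_score(labels[keep], scores[keep])

       print("signal vs all       :", roc_auc_score(labels, scores))
       print("signal vs b,c -> e  :", auc_vs_category(is_bc_to_e))
       print("signal vs unmatched :", auc_vs_category(is_unmatched))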

  4. Training/testing in tt̄
     ⚫ Switch from DY to tt̄
     ⚫ From the pictures below, the network should have no trouble distinguishing isolated/non-isolated electron candidates, since we’re using a large 29x15 window around the seed
     [Candidate images: unmatched, b/c, signal; bkg and sig examples for 20 < pT < 25]

  5. Training/testing in tt̄
     ⚫ σ_iηiη (calculated from 5x5 crystals) does slightly better than in DY
     ⚫ The CNN shows a large improvement
     [ROC plots: shape BDT vs σ_iηiη, shape BDT vs CNN, CNN vs σ_iηiη; signal vs all, signal vs b,c → e, signal vs unmatched]

  6. Training/testing in DY with SC
     ⚫ Now, as an exercise for just this slide and the next, we try training/testing with electron images made only from supercluster cells/energies (implementation in backup; a rough masking sketch is below)
       • This should make it so that isolation can’t be learned by the CNN, since isolation quantities do not consider deposits belonging to the supercluster
     ⚫ Both signal and background images become sparser
     [Average sig. and bkg. images, all cells vs SC only; bkg and sig examples for 20 < pT < 25, SC only]
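
     The actual implementation is in the backup; purely as an illustration, a sketch of masking a 29x15 crystal image so that only cells attached to the supercluster keep their energy. The image layout and hit-list format here are assumptions, not the real implementation.

       import numpy as np

       def sc_only_image(image, sc_cell_indices):
           """Zero out every crystal in the 29x15 window that is not part of the
           supercluster, so the CNN cannot pick up isolation-like information.

           image           : (15, 29) array of crystal energies around the seed
           sc_cell_indices : list of (ieta, iphi) indices, relative to the window,
                             for the rec-hits attached to the supercluster
           """
           mask = np.zeros_like(image, dtype=bool)
           for ieta, iphi in sc_cell_indices:
               mask[ieta, iphi] = True
           return np.where(mask, image, 0.0)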

  7. Training/testing in DY with SC
     ⚫ Using SC-only degrades the performance wrt the original implementation ("all cells" in the 29x15 window around the seed) → ~half of the gain is lost
     ⚫ Below, show the CNN, σ_iηiη, and a 6-variable BDT trained on shower shape variables

     Signal efficiency increase of CNN wrt BDT, at 20% background efficiency:
        pT bin          all cells   SC only
        pT < 15         25%         11%
        15 < pT < 25    10%          5%
        25 < pT < 45    10%          6%
        pT > 45          9%          5%

     [ROC plots: 6-var BDT vs σ_iηiη, CNN vs σ_iηiη, CNN vs 6-var BDT; all cells and SC only]

  8. Tracks

  9. Replicating the MVA with a DNN
     ⚫ First, see if feeding the 21 MVA inputs into a DNN achieves comparable performance to the (BDT-trained) MVA
       • With a 100k-parameter network of purely fully-connected layers, achieve a similar AUC to the BDT (a Keras sketch of this network follows below)
         ‣ Train with 2M samples, test on 1M
         ‣ Tried a 400k-parameter network and saw the same results, but have not yet tried a network smaller than 100k parameters
       • The log scale amplifies the slightly worse ROC curves for the DNN, but performance is ~identical (compare AUC values in the legend)
     [Architecture: InputLayer(21) → Dense(128) → Dense(256) → Dense(128), each with Dropout(0.1) + LeakyReLU → Dense(128) → Dense(64) → Dense(32), each with LeakyReLU → Dense(2); per-layer parameter counts 2816, 33024, 32896, 16512, 8256, 2080, 66]
     [ROC plots on linear and log scales]
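
     A minimal Keras sketch of the fully-connected network described above; the optimizer, loss, and the softmax on the final Dense(2) are assumptions, since the slide only specifies the layer stack.

       from tensorflow.keras import layers, models, optimizers

       def build_mva_dnn(n_inputs=21):
           """~100k-parameter fully-connected network fed the 21 MVA input variables."""
           x_in = layers.Input(shape=(n_inputs,))
           x = x_in
           # The first three hidden layers use dropout; widths follow the slide.
           for units, dropout in [(128, 0.1), (256, 0.1), (128, 0.1),
                                  (128, None), (64, None), (32, None)]:
               x = layers.Dense(units)(x)
               if dropout is not None:
                   x = layers.Dropout(dropout)(x)
               x = layers.LeakyReLU()(x)
           out = layers.Dense(2, activation="softmax")(x)  # softmax is an assumption
           model = models.Model(x_in, out)
           # Optimizer/loss are assumptions; the slide does not state them.
           model.compile(optimizer=optimizers.Adam(), loss="categorical_crossentropy")
           return model

       model = build_mva_dnn()
       model.summary()  # per-Dense parameter counts match the numbers on the slide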

  10. Track information
     ⚫ Now try to feed "lower-level" track information into the network
     ⚫ 19 variables
       • 5 pre-computed position/momentum triplets (R, η, φ)
       • supercluster η, φ, E
       • charge
     ⚫ Subtract the supercluster η, φ from the η, φ components of the triplets (a sketch of the feature assembly is below)

       math::XYZPointF  positionAtVtx;   // the track PCA to the beam spot
       math::XYZPointF  positionAtCalo;  // the track PCA to the supercluster position
       math::XYZVectorF momentumAtVtx;   // the track momentum at the PCA to the beam spot
       math::XYZVectorF momentumAtCalo;  // the track momentum extrapolated at the supercluster position from the innermost track state
       math::XYZVectorF momentumOut;     // the track momentum extrapolated at the seed cluster position from the outermost track state

     https://github.com/cms-sw/cmssw/blob/0b70aea1b7723a6dfd453d9d015b670d0f735256/DataFormats/EgammaCandidates/interface/GsfElectron.h#L279-L283
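
     A sketch, under stated assumptions, of how the 19-variable input could be assembled: the five (R, η, φ) triplets from the GsfElectron members above, the supercluster η, φ, E, and the charge, with the supercluster η, φ subtracted from each triplet. The helper names and the φ wrapping are illustrative, not the actual implementation.

       import numpy as np

       def to_r_eta_phi(vec):
           """Convert an (x, y, z) point/vector into (R, eta, phi)."""
           x, y, z = vec
           r = np.sqrt(x * x + y * y)            # transverse component
           eta = -np.log(np.tan(np.arctan2(r, z) / 2.0))
           phi = np.arctan2(y, x)
           return r, eta, phi

       def wrap_phi(dphi):
           """Wrap a phi difference into (-pi, pi]."""
           return (dphi + np.pi) % (2.0 * np.pi) - np.pi

       def track_features(triplets_xyz, sc_eta, sc_phi, sc_energy, charge):
           """Build the 19-dim input: 5 x (R, eta - eta_SC, dphi(phi, phi_SC)),
           plus the supercluster eta, phi, E and the track charge."""
           feats = []
           for vec in triplets_xyz:  # positionAtVtx, positionAtCalo, momentumAtVtx,
                                     # momentumAtCalo, momentumOut
               r, eta, phi = to_r_eta_phi(vec)
               feats += [r, eta - sc_eta, wrap_phi(phi - sc_phi)]
           feats += [sc_eta, sc_phi, sc_energy, charge]
           return np.array(feats, dtype=np.float32)  # shape (19,)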

  11. Track information
     ⚫ With the previous positions/momenta, can compute the track variables highlighted in red in the MVA ranking below (a sketch of a few of them follows)
       • Coupled with the shape information from the CNN, this should account for nearly all of the performance of the MVA

     MVA variable ranking (most to least important):
        1. E_SC / PCA momentum
        2. 1/E - 1/p
        3. σ_iηiη
        4. Δφ(track, SC)
        5. Brem. fraction
        6. Δη(track, SC)
        7. SC η width
        8. SC φ width
        9. E_3x3 / E_SC
       10. σ_iφiφ
       11. E_SC / calo momentum
       12. N CTF hits
       13. GSF χ²
       14. N GSF hits
       15. E_preshower / E_SC
       16. H/E
       17. Δη(calo track, seed)
       18. SC circularity
       19. CTF χ²
       20. missing inner hits
       21. conversion vertex probability
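
     As a rough illustration of the claim above, a sketch of how a few of the track-based variables in the ranking could be rebuilt from the stored positions/momenta. The CMS-style definitions used here are assumptions and may differ in detail from the MVA's exact conventions.

       import numpy as np

       def _wrap_phi(dphi):
           """Wrap a phi difference into (-pi, pi]."""
           return (dphi + np.pi) % (2.0 * np.pi) - np.pi

       def derived_track_variables(e_sc, sc_eta, sc_phi,
                                   momentum_at_vtx, momentum_out,
                                   track_eta_at_calo, track_phi_at_calo):
           """Rebuild a handful of the ranked variables from the stored quantities."""
           p_in = float(np.linalg.norm(momentum_at_vtx))   # momentum at PCA to beam spot
           p_out = float(np.linalg.norm(momentum_out))     # momentum at outermost state
           return {
               "E_SC / PCA momentum": e_sc / p_in,                            # ranked 1
               "1/E - 1/p":           1.0 / e_sc - 1.0 / p_in,                # ranked 2
               "dphi(track, SC)":     _wrap_phi(track_phi_at_calo - sc_phi),  # ranked 4
               "brem fraction":       (p_in - p_out) / p_in,                  # ranked 5
               "deta(track, SC)":     track_eta_at_calo - sc_eta,             # ranked 6
           }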

  12. Track variable distributions
     ⚫ After reweighting background to signal, split by charge (one possible reweighting is sketched below)
     [Distributions of R, Δη(·, SC), Δφ(·, SC) for: vtx pos., calo pos., vtx mom., calo mom., outer mom.]
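
     The slides do not spell out the reweighting procedure; a common choice, shown here purely as an assumption, is per-bin weights from the ratio of the signal and background histograms of the variable being matched (e.g. pT).

       import numpy as np

       def reweight_background_to_signal(sig_values, bkg_values, bins=40):
           """Per-event weights that reshape the background distribution of one
           variable to match the signal distribution (generic histogram-ratio
           reweighting, assumed here; the slides do not specify the procedure)."""
           edges = np.histogram_bin_edges(np.concatenate([sig_values, bkg_values]),
                                          bins=bins)
           sig_hist, _ = np.histogram(sig_values, bins=edges, density=True)
           bkg_hist, _ = np.histogram(bkg_values, bins=edges, density=True)
           ratio = np.divide(sig_hist, bkg_hist, out=np.ones_like(sig_hist),
                             where=bkg_hist > 0)
           idx = np.clip(np.digitize(bkg_values, edges) - 1, 0, len(ratio) - 1)
           return ratio[idx]  # one weight per background event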

  13. Training & Results ⚫ Using two-prong network 19 track variables • ~440k parameters • No batch normalization, as I found it to be unstable InputLayer (19) InputLayer (15,29,1) 2560 320 ⚫ At 10% bkg e ffi ciency, signal e ffi ciency for "NN" is Conv2D 3x3 (15,29,32) Dense (128) LeakyReLU Dropout(0.1), LeakyReLU 33024 ~2-8% worse than the full MVA Dense (256) MaxPooling2D (7,14,32) Dropout(0.1), LeakyReLU ⚫ Can the remaining/lower-ranked variables make up 131584 18496 Conv2D 3x3 (7,14,64) Dense (512) Dropout(0.1), LeakyReLU LeakyReLU this di ff erence? 131328 Dense (256) MaxPooling2D (3,7,64) Dropout(0.1), LeakyReLU 32896 9232 Dense (128) Conv2D 3x3 (3,7,16) Dropout(0.1), LeakyReLU Dropout(0.2), LeakyReLU 8256 Dense (64) Flatten (336) LeakyReLU Concatenate (400) 60150 Dense (150) Dropout(0.3), LeakyReLU 7550 Dense (50) Dropout(0.1), LeakyReLU 765 Dense (15) Dropout(0.1), LeakyReLU 32 Dense (2) � 13

  14. Appending remaining variables
     ⚫ Now take the network from the previous slide and append 9 variables from the MVA (not covered by the CNN shape information or the 19 track variables), and retrain to see the effect of the "lower-ranked" variables
       • The AUC improves noticeably over the previous slide (flippable): together, these lower-ranked variables are not negligible
       • Still, after including this information, the network does not match the performance of the MVA, so it is not fully utilizing the 19 raw track variables that we are feeding in (?)
     ⚫ Note, another way of viewing this network/training configuration…
       • Same as the 21-variable MVA/BDT, except the 6 shape variables are replaced by a CNN on raw 29x15 crystals, and the 6 high-level track variables are replaced by 19 raw track variables… and the performance is slightly worse?
     [Architecture: same two-prong network as the previous slide, with the dense prong fed the 19 track variables plus the 9 additional MVA variables]

  15. Next steps
     ⚫ Try photons
     ⚫ Eventually get back to endcap training

  16. Backup
