
Pandora track/shower discrimination via semantic segmentation, Andy Chappell, 11/12/2019 - PowerPoint PPT Presentation



  1. Pandora track/shower discrimination via semantic segmentation - Andy Chappell, 11/12/2019, DUNE UK Meeting

  2. 2 Roadmap • Overview and project goal • Model architecture • Approach to tuning • (Very) preliminary performance figures

  3. 3 Overview and project goal • Assign track and shower probabilities to every hit in the U, V and W planes • Train a neural network to calculate the probabilities • Pass these probabilities to downstream Pandora algorithms for cluster creation, merging, etc. • Current approach uses cluster property-based cuts [Event display: drift coordinate x vs wire coordinate w]
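The per-hit probabilities described above come from a softmax over the network's two output channels. A minimal sketch for a single hit (illustrative logit values; not Pandora's actual code):

```python
import math

def track_shower_probabilities(track_logit, shower_logit):
    """Convert the network's two raw outputs (logits) for one hit
    into track/shower probabilities via a softmax."""
    m = max(track_logit, shower_logit)   # subtract the max for numerical stability
    et = math.exp(track_logit - m)
    es = math.exp(shower_logit - m)
    total = et + es
    return et / total, es / total

# Example: a hit the network considers more track-like
p_track, p_shower = track_shower_probabilities(2.0, 0.5)
```

Downstream algorithms can then consume the pair (p_track, p_shower), which always sums to one.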

  4. 4 Architecture • U-Net architecture developed for biomedical image segmentation in 2015 (https://arxiv.org/abs/1505.04597) • Convolutions form the down-sampling part of the U • Transpose convolutions form the up-sampling part of the U • Skip connections add images from the down-sampling path to the up-sampling path • Track and shower probabilities assigned to each pixel

  5. 5 Architecture • Building on work started by Steven Green • PyTorch implementation • Two key blocks in the network • Down-sampling convolution block: [Conv2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm] • Up-sampling transpose convolution block: [ConvTranspose2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm] • Batch normalisation: x̂_i = (x_i − μ_B) / σ_B, y_i = γ x̂_i + β • Loss: categorical cross-entropy, loss = −log p(true_class) • Accuracy: fraction of pixels classified == truth [Figure: plot of the Rectified Linear Unit, ReLU(x) = max(0, x)]
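The two blocks listed on this slide might look roughly like the following in PyTorch. This is a sketch under assumptions, not the actual implementation: channel counts and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    # Down-sampling convolution block: [Conv2d, ReLU, BatchNorm] x 2
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(c_out),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(c_out),
    )

def up_block(c_in, c_out):
    # Up-sampling block: [ConvTranspose2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm]
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2),
        nn.ReLU(),
        nn.BatchNorm2d(c_out),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(c_out),
    )

x = torch.randn(1, 1, 64, 64)   # (batch, channels, height, width)
y = down_block(1, 16)(x)        # same spatial size, 16 feature channels
z = up_block(16, 8)(y)          # stride-2 transpose conv doubles height and width
loss_fn = nn.CrossEntropyLoss() # categorical cross-entropy over per-pixel classes
```

In the real U-shaped network these blocks would be chained with pooling or strided convolutions between them and skip connections across the U.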

  6. 6 Architecture • Down-sample (credit: V. Dumoulin & F. Visin): multiple input pixels map to one output pixel; stride 2 down-samples to reduce computational overhead; each layer increases the number of kernels to build more complex features • Up-sample (credit: T. Lane): each input pixel maps to multiple output pixels; effective stride 1/2 up-samples to return to the original image size
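The stride-2 down-sampling and effective stride-1/2 up-sampling above follow the standard convolution shape formulas. A quick numeric sketch (kernel sizes are illustrative):

```python
def conv_out(n, k, s, p):
    # Output size of a convolution: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def conv_transpose_out(n, k, s, p):
    # Output size of a transpose convolution: (n - 1) * s - 2p + k
    return (n - 1) * s - 2 * p + k

# A stride-2 convolution halves the image; a stride-2 transpose
# convolution (effective stride 1/2) restores the original size.
down = conv_out(64, k=2, s=2, p=0)            # 64 -> 32
up = conv_transpose_out(down, k=2, s=2, p=0)  # 32 -> 64
```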

  7. 7 Inputs • Trained on a 980 event subset of MCC11 (no space charge) • 80% training, 20% validation • Split into batches of 48 images • Images 512 x 208 (likely to change) • Images generated to cover the full extent of the plane in drift and wire positions (only W plane in provisional tests) [Event display: drift coordinate x vs wire coordinate w]
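The 80/20 split and batching of the 980-event sample could be done along these lines (a sketch; the shuffling, seed and indexing scheme are hypothetical):

```python
import random

events = list(range(980))          # 980-event MCC11 subset, by index
random.seed(0)                     # fixed seed for a reproducible split
random.shuffle(events)

n_train = int(0.8 * len(events))   # 80% training, 20% validation
train, valid = events[:n_train], events[n_train:]

batch_size = 48
train_batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
```

This gives 784 training and 196 validation events, i.e. 17 training batches per epoch with a short final batch.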

  8. 8 Activations

  9. 9 Learning rate optimisation • What's the fastest we can train? • The learning rate controls the step size of weight updates relative to the gradient • Start with a very low learning rate and increase it each batch • Try different weight decays • Getting this right allows faster training
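The range test described above (start very low, increase each batch) is usually an exponential sweep; plotting loss against rate then shows where training diverges. A minimal sketch with illustrative bounds:

```python
def lr_sweep(lr_min, lr_max, num_batches):
    # Exponentially increase the learning rate from lr_min to lr_max,
    # one step per batch; record the loss at each rate to pick a maximum.
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (num_batches - 1)) for i in range(num_batches)]

rates = lr_sweep(1e-6, 1.0, num_batches=100)
```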

  10. 10 • Clearly very high learning rates are attainable, but… • Not with a single value over the whole training cycle • The highest rates either fail to progress or start poorly • Notable differences in accuracy evolution

  11. 11 • Linear decay better, but… • Higher rates still fail to progress or start poorly • Still notable differences in accuracy evolution

  12. 12 • One-cycle learning: https://arxiv.org/abs/1803.09820 • Start slow -> Accelerate -> Decelerate • Smoother loss evolution • Train at higher rates • Consistent accuracy evolution across a range of maximum learning rates
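The slow-accelerate-decelerate shape above can be sketched as a symmetric linear warm-up and cool-down (PyTorch also ships a ready-made torch.optim.lr_scheduler.OneCycleLR; the floor fraction here is an illustrative choice, not from the talk):

```python
def one_cycle_lr(step, total_steps, lr_max, lr_min_frac=0.04):
    # Start slow -> accelerate to lr_max at mid-cycle -> decelerate.
    lr_min = lr_max * lr_min_frac
    half = total_steps / 2
    if step <= half:
        frac = step / half                  # warm-up phase
    else:
        frac = (total_steps - step) / half  # cool-down phase
    return lr_min + (lr_max - lr_min) * frac

schedule = [one_cycle_lr(s, 100, lr_max=0.1) for s in range(101)]
```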

  13. 13 Performance • Overall accuracy ~87% • Track accuracy ~82% • Shower accuracy ~95% [Event displays: MC Classification vs Network Classification, drift coordinate x vs wire coordinate w]
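The overall, track and shower accuracies quoted above follow from per-class counts of correctly classified hits. A sketch with made-up labels (not the real validation sample):

```python
def class_accuracies(truth, prediction):
    # truth / prediction: parallel lists of "track" / "shower" labels per hit.
    counts = {"track": [0, 0], "shower": [0, 0]}   # [correct, total] per class
    for t, p in zip(truth, prediction):
        counts[t][1] += 1
        if t == p:
            counts[t][0] += 1
    overall = sum(c for c, _ in counts.values()) / len(truth)
    per_class = {k: c / n for k, (c, n) in counts.items()}
    return overall, per_class

truth = ["track"] * 10 + ["shower"] * 10
pred = ["track"] * 8 + ["shower"] * 2 + ["shower"] * 9 + ["track"] * 1
overall, per_class = class_accuracies(truth, pred)
```

Reporting the per-class numbers alongside the overall figure matters here because the track/shower populations are imbalanced.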

  14. 14 Future plans • Further network refinements: • Further refine one-cycle learning rate • Architectural tweaks (e.g. ResBlocks, depth) • Vary learning rate by layer group • Regularisation • Data set refinement: • Crop the image plane to the hit region and use higher resolution in x • Image augmentation (e.g. randomly rotate images in each batch) • Train on a much larger data set • Transfer learning with larger images

  15. 15 Backup

  16. 16 • One-cycle policies perform best • Able to go to higher learning rates • Constant and linear decay performance OK • Expect constant rate to plateau • Exponential decay fails to make progress

  17. 17 Conv2d initialization in PyTorch • Conv2d weights use Kaiming uniform initialisation with a = √5 • Historical artefact: Jeremy Howard discovered this causes gradients to vanish in deep networks • Will be fixed in a future release of PyTorch • For now, it is useful to reinitialise weights with Kaiming normal initialisation with a = 0 (for standard ReLU)
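The parameter a above is the leaky-ReLU negative slope, which sets the Kaiming gain √(2/(1+a²)); the weight standard deviation is gain/√fan_in. A numeric sketch of why a = √5 shrinks the weights relative to the correct a = 0 value for standard ReLU:

```python
import math

def kaiming_gain(a):
    # Gain for leaky-ReLU Kaiming initialisation; a is the negative slope.
    return math.sqrt(2.0 / (1.0 + a ** 2))

def kaiming_std(fan_in, a=0.0):
    # Standard deviation of Kaiming-normal weights for a layer with fan_in inputs.
    return kaiming_gain(a) / math.sqrt(fan_in)

gain_default = kaiming_gain(math.sqrt(5))  # PyTorch's historical Conv2d default
gain_relu = kaiming_gain(0.0)              # correct gain for standard ReLU
```

In practice the reinitialisation can be done per layer with torch.nn.init.kaiming_normal_(conv.weight, nonlinearity='relu').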
