
AN INTRODUCTION TO DEEP LEARNING FOR ASTRONOMY. Marc Huertas-Company, IAC Winter School 2018. REFERENCES: several slides and figures shown here are inspired by or taken from other works and courses found online, e.g. Deep Learning: Do-It-Yourself! [Bursuc,


  1. In practice: OPTIMIZATION ERROR. TRAINING / VALIDATION / TEST. Training set: used to train the classifier. Validation set: used to monitor performance in real time and check for overfitting. Test set: used to evaluate the final performance of the classifier (never used for training).

  2. In practice: OPTIMIZATION ERROR. TRAINING / VALIDATION / TEST. NO CHEATING! NEVER USE THE TRAINING SET TO VALIDATE YOUR ALGORITHM!
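As a concrete illustration, here is a minimal scikit-learn sketch of such a split; the random data, split fractions, and random seed are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1000 objects, 10 features each, binary labels.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# Split off a test set first (20%), then carve a validation set
# out of what remains (20% of the remainder).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=0)

# Train on X_train, monitor on X_val, and touch X_test only once at the very end.
```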

  3. The algorithm used to minimize the error is called OPTIMIZATION. THERE ARE SEVERAL OPTIMIZATION TECHNIQUES

  4. Optimization THERE ARE SEVERAL OPTIMIZATION TECHNIQUES THEY DEPEND ON THE MACHINE LEARNING ALGORITHM

  5. Optimization: THERE ARE SEVERAL OPTIMIZATION TECHNIQUES. THEY DEPEND ON THE MACHINE LEARNING ALGORITHM. NEURAL NETWORKS USE GRADIENT DESCENT, AS WE WILL SEE LATER: W_{t+1} = W_t - λ ∇f(W_t), where W are the weights to be learned, λ is the learning rate, and t is the epoch index.
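A minimal NumPy sketch of this update rule; the quadratic loss, starting weights, and learning rate below are assumptions chosen only for illustration.

```python
import numpy as np

# Gradient-descent sketch for the update W_{t+1} = W_t - lambda * grad f(W_t).
# The quadratic loss is just an illustrative stand-in for a real training loss.
def loss(W):
    return np.sum((W - 3.0) ** 2)

def grad_loss(W):
    return 2.0 * (W - 3.0)

W = np.zeros(5)           # weights to be learned
lam = 0.1                 # learning rate (lambda)
for epoch in range(100):  # t = epoch index
    W = W - lam * grad_loss(W)

print(loss(W))            # close to 0: W has converged towards 3
```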

  6. The differences between classical machine-learning methods lie in the function f_W(x) that is used: CARTS (decision trees), RANDOM FORESTS, SUPPORT VECTOR MACHINES (kernel algorithms), ARTIFICIAL NEURAL NETWORKS (DEEP LEARNING)

  7. HOW TO CHOOSE YOUR CLASSICAL CLASSIFIER? NO RULE OF THUMB - REALLY DEPENDS ON APPLICATION.
  CARTS (decision trees) / RANDOM FOREST: ++ easy to interpret (“white box”), little data preparation, handles both numerical and categorical data; -- over-complex trees, unstable, biased trees if some classes dominate; Python: sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.RandomForestRegressor.
  SVM: ++ easy to interpret and fast, the kernel trick allows non-linear problems; -- not very well suited to multi-class problems; Python: sklearn.svm.SVC.
  NEURAL NETWORKS (NN): ++ seed of deep learning, very efficient with large amounts of data as we will see; -- more difficult to interpret, computing intensive; Python: sklearn.neural_network.MLPClassifier, sklearn.neural_network.MLPRegressor.
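A minimal sketch of how the three scikit-learn classifiers from the table could be trained through their common fit/score interface; the random data and hyperparameters are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Illustrative data: 500 objects, 8 features, 2 classes.
X = np.random.rand(500, 8)
y = np.random.randint(0, 2, size=500)

# The three classical classifiers from the table share the same fit/predict API.
classifiers = {
    "random forest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(kernel="rbf"),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, clf.score(X, y))  # training accuracy (use a held-out set in practice)
```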

  8. CAN DEPEND ON YOUR MAIN INTEREST

  9. ALSO INFLUENCED BY “MAINSTREAM” TRENDS

  10. PART II: A FOCUS ON “SHALLOW” NEURAL NETWORKS

  11. THE NEURON: INSPIRED BY NEUROSCIENCE? Credit: Karpathy

  12. THE NEURON: INSPIRED BY NEUROSCIENCE? Credit: Karpathy

  13. Mark I Perceptron: FIRST IMPLEMENTATION OF A NEURAL NETWORK [Rosenblatt, 1957!]. INTENDED TO BE A MACHINE (NOT AN ALGORITHM). It had an array of 400 photocells, randomly connected to the “neurons”. Weights were encoded in potentiometers, and weight updates during learning were performed by electric motors.

  14. TODAY’S ARTIFICIAL NEURON: pre-activation z(x) = W·x + b; output f(x) = g(W·x + b), with x the input, W the weights, b the bias, and g the activation function.

  15. LAYER OF NEURONS: f(x) = g(W·x + b). SAME IDEA. NOW W becomes a matrix and b a vector
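A minimal NumPy sketch of one layer of neurons; the sigmoid activation, layer sizes, and random weights are assumptions used only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer of 4 neurons acting on a 3-dimensional input:
# W is now a matrix (4 x 3) and b a vector of length 4.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weights
b = rng.normal(size=4)           # biases
x = np.array([0.5, -1.2, 2.0])   # input vector

z = W @ x + b                    # pre-activation
f = sigmoid(z)                   # layer output, one value per neuron
print(f.shape)                   # (4,)
```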

  16. Hidden Layers of Neurons. FIRST LAYER, applied to the INPUT: z^h(x) = W^h x + b^h

  17. ACTIVATION FUNCTION. HIDDEN LAYER: h(x) = g(z^h(x)) = g(W^h x + b^h)

  18. OUTPUT LAYER: z^o(x) = W^o h(x) + b^o

  19. PREDICTION LAYER: f(x) = softmax(z^o(x))
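Putting slides 16-19 together, here is a minimal NumPy sketch of the full forward pass; the layer sizes and the tanh activation are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / e.sum()

# Sizes chosen only for illustration: 3 inputs, 5 hidden neurons, 2 classes.
rng = np.random.default_rng(1)
W_h, b_h = rng.normal(size=(5, 3)), np.zeros(5)   # hidden layer parameters
W_o, b_o = rng.normal(size=(2, 5)), np.zeros(2)   # output layer parameters

x = np.array([0.2, -0.7, 1.5])   # input
z_h = W_h @ x + b_h              # slide 16: first-layer pre-activation
h = np.tanh(z_h)                 # slide 17: activation function
z_o = W_o @ h + b_o              # slide 18: output layer
f = softmax(z_o)                 # slide 19: class probabilities
print(f, f.sum())                # probabilities sum to 1
```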

  20. “CLASSICAL” MACHINE LEARNING: f_W(x) = y (LABEL, e.g. Q, SF). REPLACE THIS BY A GENERAL NON-LINEAR FUNCTION WITH SOME PARAMETERS W. NETWORK FUNCTION: p = g_3(W_3 g_2(W_2 g_1(W_1 x_0)))

  21. WHY HIDDEN LAYERS? Adding hidden layers lets the network represent increasingly complex functions. Credit: Karpathy

  22. SO LET’S GO DEEPER AND DEEPER!

  23. SO LET’S GO DEEPER AND DEEPER! YES BUT… NOT SO STRAIGHTFORWARD, DEEPER MEANS MORE WEIGHTS, MORE DIFFICULT OPTIMIZATION, RISK OF OVERFITTING…

  24. LET’S FIRST EXAMINE IN MORE DETAIL HOW SIMPLE “SHALLOW” NETWORKS WORK

  25. ACTIVATION FUNCTIONS? THEY ADD NON-LINEARITIES TO THE PROCESS

  26. ACTIVATION FUNCTIONS

  27. ACTIVATION FUNCTIONS. Sigmoid: f(x) = 1 / (1 + e^{-x}); Tanh: f(x) = tanh(x); ReLU: f(x) = max(0, x); Soft ReLU: f(x) = log(1 + e^x); Leaky ReLU: f(x) = εx + (1 - ε) max(0, x)

  28. ACTIVATION FUNCTIONS (+ MANY OTHERS!). Sigmoid: f(x) = 1 / (1 + e^{-x}); Tanh: f(x) = tanh(x); ReLU: f(x) = max(0, x); Soft ReLU: f(x) = log(1 + e^x); Leaky ReLU: f(x) = εx + (1 - ε) max(0, x)
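For concreteness, the activation functions listed above written with NumPy; the leaky-ReLU slope ε = 0.01 is an illustrative choice.

```python
import numpy as np

# The activation functions from the slide.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def soft_relu(x):             # also known as softplus
    return np.log(1.0 + np.exp(x))

def leaky_relu(x, eps=0.01):  # eps is an illustrative value for the slope
    return eps * x + (1.0 - eps) * np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
print(sigmoid(x), np.tanh(x), relu(x), soft_relu(x), leaky_relu(x), sep="\n")
```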

  29. WHAT IS THE MEANING OF THE ACTIVATION FUNCTION? Any real function on an interval (a, b) can be approximated by a linear combination of translated and scaled ReLU functions

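A minimal NumPy sketch of this approximation property: fitting a linear combination of translated ReLUs to an arbitrary function (sin is an illustrative choice) on an interval by least squares.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Approximate a function (here sin, purely for illustration) on (a, b)
# with a linear combination of translated and scaled ReLUs.
a, b, n_relus = -3.0, 3.0, 20
x = np.linspace(a, b, 500)
target = np.sin(x)

# Basis: a constant term plus ReLUs translated to knots spread over (a, b).
knots = np.linspace(a, b, n_relus)
basis = np.column_stack([np.ones_like(x)] + [relu(x - k) for k in knots])

# Least-squares fit of the scaling coefficients.
coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
approx = basis @ coeffs
print("max abs error:", np.max(np.abs(approx - target)))  # small for enough ReLUs
```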
