feature extraction from deep models



  1. Feature extraction from deep models, by Olgert Denas

  2. Synopsis
     Intro to deep models
     ● Neurons & Nets
     ● Learning & Depth
     Feature extraction
     ● Theory
     ● 1 Layer
     ● Nets
     Applications
     ● dimer
     ● G1E model

  3. Neural computation
     ● Inspired by organic neural systems
     ● A system of simple computing units with learnable parameters
     ● Intended for conventional computing (efficient arithmetic and calculus), but von Neumann’s architecture “won”

  4. Neural computation
     Mainly in machine learning
     ● Declarative: unambiguous, e.g. sort an array of integers
     ● Procedural: can only be stated by examples, e.g. find fraud in network logs

  5. Artificial Neural Nets

  6. Neurons

  7. Neurons
     The artificial neuron is very different from the biological one; after all, it is a model

  8. Neurons
                         Natural (organic)                    Artificial
     Transfer function   complicated                          parametric function
     Communication       mixed, continuous/impulse            discrete or continuous
     State               state, chemical, physical changes    no state, output is $f(x; \theta)$, fixed connections
     Delays              synaptic delays, long axon           computational delays

  9. Nets of neurons

  10. Computers and brains
                  Brain                                  Computer
      Speed       ms / operation                         ns / operation
      Size        Tera nodes, Peta conns                 Giga nodes
      Memory      content addressable, in connections    contiguous, random access
      Computing   distributed / fault tolerant           centralized / not fault tolerant
      Power       10W                                    ~300W (GPU)

  11. Organic vs. artificial computer

  12. ANN architectures
      ● Feed forward NNs (and CNNs)
      ● Recurrent NNs
      ● RBMs

  13. Feed Forward
      ● Directed Acyclic Graph
      ● Input (first), hidden, and output (last) layers
      ● Connections from a layer to the next
      ● Transfer functions are nonlinearities
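
The feed-forward structure above can be sketched as a loop over layers, each applying a nonlinearity to an affine map of the previous layer's output. This is only an illustration: the sigmoid transfer function, the 2-3-1 layout, and all weight values are assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    """Nonlinear transfer function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b):
    """One fully connected layer: sigmoid(W x + b), one output per row of W."""
    return [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def feed_forward(x, layers):
    """Propagate x through the (W, b) pairs, from each layer to the next."""
    for W, b in layers:
        x = layer(x, W, b)
    return x

# Hypothetical 2-3-1 network with arbitrary weights.
net = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5]),  # input -> hidden
    ([[1.0, -2.0, 0.5]], [0.1]),                                 # hidden -> output
]
out = feed_forward([1.0, 0.0], net)
print(out)
```

Because connections only go from one layer to the next, the graph has no cycles and a single pass over `layers` is enough.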

  14. Recurrent
      ● Directed graph with cycles
      ● Possibly, hidden layers
      ● More complicated, realistic, and powerful
      ● Well-suited to sequential input
      ● Unroll the hidden state, just like DBNs

  15. Restricted Boltzmann Machines
      ● Probabilistic model (energy function)
      ● A bipartite graph (visible <-> hidden)
      ● Efficient inference

  16. ANN: Learning

  17. Learning: perceptron
      Loop through labeled examples; on incorrect output:
      ● case 0 (the unit output 0): w <- w + x
      ● case 1 (the unit output 1): w <- w - x
      Guaranteed separating hyperplane
      Figure: input units X1, X2 connected to the output unit through weights W1, W2
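
A minimal sketch of the update rule above, in plain Python. The bias handled as a constant extra input, the zero initialization, the epoch cap, and the AND data set are assumptions made for illustration.

```python
def predict(w, x):
    """Threshold unit: fires (1) when the weighted sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_perceptron(examples, n_inputs, max_epochs=100):
    """Loop through labeled examples and correct the weights on each mistake."""
    w = [0.0] * (n_inputs + 1)           # last weight acts as the bias
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            xb = list(x) + [1.0]         # constant input for the bias
            out = predict(w, xb)
            if out == y:
                continue
            mistakes += 1
            if out == 0:                 # case 0: w <- w + x
                w = [wi + xi for wi, xi in zip(w, xb)]
            else:                        # case 1: w <- w - x
                w = [wi - xi for wi, xi in zip(w, xb)]
        if mistakes == 0:                # a separating hyperplane was found
            break
    return w

# Linearly separable toy problem (logical AND), chosen for illustration.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w_and = train_perceptron(and_data, n_inputs=2)
print(w_and)
```

The loop only terminates early when the classes are linearly separable; otherwise it stops at `max_epochs` without a valid hyperplane.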

  18. Learning: perceptron
      Parity, or counting problem: recognize binary strings of length 2 with exactly one 1
      ● red class: 01, 10
      ● green class: 00, 11
      Many other problems (Minsky & Papert 1969)

  19. Learning: features
      Figure: input units, hidden unit, output unit

  20. Learning: features (figure: inputs 0, 0)
      ● 00: no unit is activated => 0
      ● 11: hidden unit cancels inputs
      ● 01, 10: inputs connect directly to output

  21. Learning: features (figure: inputs 1, 1)
      ● 00: no unit is activated => 0
      ● 11: hidden unit cancels inputs
      ● 01, 10: inputs connect directly to output

  22. Learning: features (figure: inputs 0, 1; label .5)
      ● 00: no unit is activated => 0
      ● 11: hidden unit cancels inputs
      ● 01, 10: inputs connect directly to output

  23. Learning: features (figure: inputs 1, 0; label .5)
      ● 00: no unit is activated => 0
      ● 11: hidden unit cancels inputs
      ● 01, 10: inputs connect directly to output
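
The three cases above can be checked with a tiny hand-wired network. The specific thresholds (1.5 for the hidden unit, 0.5 for the output) and the cancelling weight of -2 are assumptions chosen to reproduce the described behavior; the slides' figure may use different values.

```python
def step(z, threshold):
    """Threshold unit: 1 when the input exceeds the threshold, else 0."""
    return 1 if z > threshold else 0

def xor_net(x1, x2):
    # The hidden unit is a learned "feature": it fires only for input 11.
    h = step(x1 + x2, 1.5)
    # Inputs connect directly to the output; the hidden unit cancels
    # them (assumed weight -2) when both are on.
    return step(x1 + x2 - 2 * h, 0.5)

table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

A single threshold unit cannot represent this function, but one hidden unit is enough.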

  24. Learning: perceptron
      ● The perceptron guarantees a separating hyperplane (SH) if one exists
      ● Learning from input features requires a lot of “(big) data science”
      ● Have the NN do the “(big) data science”!

  25. Deep supervised learning paradigm
      ● Map “raw” input into intermediate hidden layers
      ● Deep means more layers, which means more efficient but harder to train
      ● Classify the hidden representation of data
      ● Learn weights for both steps above using backprop or pre-training

  26. Feature extraction

  27. Feature extraction
      ● Trained NNs can be used to predict, but they are black boxes
      ● It is hard to relate high weights with input features
      ● How do we map features from hidden layers back to the input space?

  28. Learning W, b
      ● Batch SGD
      ● Early stopping, regularization, and a lot of tricks
      ● Maximize the average of $P(Y \mid X; \theta)$ over the training data
      ● I.e., find a $\theta$ with low cross-entropy
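
As a rough illustration of learning W and b with batch SGD, the sketch below trains a single sigmoid unit by descending the average cross-entropy on mini-batches. The learning rate, batch size, epoch count, and the OR toy data are assumptions; early stopping and regularization are left out for brevity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob(w, b, x):
    """P(Y=1 | X=x; theta) for a single sigmoid unit, theta = (w, b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(data, w, b):
    """Average negative log-likelihood of the labels under the model."""
    tiny = 1e-12  # guards against log(0)
    return -sum(y * math.log(prob(w, b, x) + tiny)
                + (1 - y) * math.log(1 - prob(w, b, x) + tiny)
                for x, y in data) / len(data)

def sgd(data, n_inputs, lr=0.5, epochs=200, batch_size=2, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * n_inputs, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gw, gb = [0.0] * n_inputs, 0.0
            for x, y in batch:
                err = prob(w, b, x) - y  # d(CE)/d(pre-activation)
                gw = [g + err * xi for g, xi in zip(gw, x)]
                gb += err
            w = [wi - lr * g / len(batch) for wi, g in zip(w, gw)]
            b -= lr * gb / len(batch)
    return w, b

# Toy data (logical OR), chosen for illustration.
data = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]
ce_before = cross_entropy(data, [0.0, 0.0], 0.0)
w, b = sgd(data, n_inputs=2)
ce_after = cross_entropy(data, w, b)
print(ce_before, ce_after)
```

Lowering the cross-entropy is the same as raising the average log-probability the model assigns to the correct labels.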

  29. Feature extraction: 1 layer
      $P(Y \mid X; \theta) = f(W X^{T} + b)$

  30. Feature extraction: 1 layer
      Given a trained model $\theta = \{W, b\}$ and a label, find an input:
      ● with that label
      ● with minimized gray area
      Figure: $P(Y \mid E[X_0])$ over $Y \in \{0, 1\}$ with values 2/3 and 1/3; $c_0 = f_\theta(E[X_0])$; axis $E[X_0]$

  31. Feature extraction: 1 layer
      Given a trained model $\theta = \{W, b\}$ and a label, find an input:
      ● with that label
      ● with minimized gray area
      Figure: $P(Y \mid E[X_0])$ over $Y \in \{0, 1\}$ with values 2/3 and 1/3; $c_0 = f_\theta(E[X_0])$; axis $E[X_0]$

  32. Feature extraction: 1 layer
      ● $l$: label
      ● $X_l$: input features
      ● $E[X_l]$: input average for that label
      ● $f_\theta(E[X])$: decision boundary
      ● $c_l = f_\theta(E[X_l])$: constraint boundary
      ● $\varepsilon$: slack (see below)
      This is an LP!

  33. Feature extraction on a stack

  34. Feature extraction: ε
      ● The slack variable controls the cross-entropy (CE) achieved by the extracted features
      ● Useful if the average input achieves 0.01 CE, but you are happy with 0.2

  35. Linear programming (in 1 page)
      Optimization problems that:
      ● minimize a linear cost function
      ● satisfy linear constraints
      ● are very efficient to solve for continuous variables (simplex)
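
To make the LP idea concrete, here is a minimal sketch for a single sigmoid unit. The exact LP from the slides was in a figure that is not preserved, so the formulation below is an assumption: minimize the total input `sum(x)` subject to `sigmoid(w.x + b) >= c_l - eps` and `0 <= x_i <= 1`, where `c_l` is the model output on the class-average input. Because the sigmoid is monotone, the constraint is linear in `x`. With one linear constraint plus box bounds, this LP is a fractional knapsack, so a greedy pass solves it exactly; stacked layers would need a general solver such as simplex. The function name `extract_features` and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def extract_features(w, b, mean_input, eps):
    """Assumed LP: min sum(x)  s.t.  sigmoid(w.x + b) >= c - eps, 0 <= x <= 1."""
    c = sigmoid(dot(w, mean_input) + b)   # constraint boundary c_l
    if not 0.0 < c - eps < 1.0:
        raise ValueError("slack eps must leave c - eps inside (0, 1)")
    target = logit(c - eps) - b           # linear form: w.x >= target
    x = [0.0] * len(w)
    # Greedy fractional-knapsack pass: spend input on the largest weights first.
    for i in sorted(range(len(w)), key=lambda i: -w[i]):
        if target <= 1e-12 or w[i] <= 0:
            break
        x[i] = min(1.0, target / w[i])
        target -= w[i] * x[i]
    if target > 1e-9:
        raise ValueError("infeasible: the weights cannot reach the target")
    return x, c

# Hypothetical trained weights and class-average input.
w, b = [2.0, -1.0, 0.5, 3.0], -1.0
mean_input = [0.8, 0.1, 0.5, 0.6]
eps = 0.1
x, c = extract_features(w, b, mean_input, eps)
print(x, c)
```

A larger `eps` loosens the constraint, so fewer or smaller input features are needed to satisfy it.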

  36. Feature extraction: implementation

  37. MNIST digits
      ● 28x28 pixel binarized handwritten digit images
      ● Pick pairs and extract differentiating features

  38. Effect of ε on $|X_l|$

  39. Effect of optimization

  40. Features

  41. Feature extraction: applications

  42. Hematopoiesis & erythroid differentiation
      ● Genes Dev. 8(10):1184-97, 1994
      ● Genome Res. 21(10):1659-71, 2011

  43. Application: G1E model

  44. dimer

  45. dimer is @ http://bitbucket.org/gertidenas/dimer PULL IT!
