  1. Random Forests vs. Deep Learning. Christian Wolf, Université de Lyon, INSA-Lyon, LIRIS UMR CNRS 5205. November 26th, 2015

  2. RF vs. DL. Goal: prediction (classification, regression).
  [Figure: a deep multi-path convolutional network (paths ConvC/ConvD with max pooling, hidden layers HLM/HLA, shared layers HLS) shown side by side with a random forest of trees 1 ... T, each tree routing an input (I, x) to a leaf distribution P_t(c).]

  3. Deep networks
  - Many layers, many parameters ... and all of them are used for testing, for each single sample!
  - Feature learning is integrated into the classification
  - End-to-end training, using the gradient of the loss function
  [Figure: multi-path architecture (paths D, C, M and A feeding shared layers HLS, with max pooling), with its layer table:]
  Layer          Filter size / n.o. units   N.o. parameters   Pooling
  Paths V1, V2
  Input D1, D2   72 × 72 × 5                -                 2 × 2 × 1
  ConvD1         25 × 5 × 5 × 3             1 900             2 × 2 × 3
  ConvD2         25 × 5 × 5                 650               1 × 1
  Input C1, C2   72 × 72 × 5                -                 2 × 2 × 1
  ConvC1         25 × 5 × 5 × 3             1 900             2 × 2 × 3
  ConvC2         25 × 5 × 5                 650               1 × 1
  HLV1           900                        3 240 900         -
  HLV2           450                        405 450           -
  Path M
  Input M        183                        -                 -
  HLM1           700                        128 800           -
  HLM2           700                        490 700           -
  HLM3           350                        245 350           -
  Path A
  Input A        40 × 9                     -                 1 × 1
  ConvA1         25 × 5 × 5                 650               1 × 1
  HLA1           700                        3 150 000         -
  HLA2           350                        245 350           -
  Shared layers
  HLS1           1 600                      3 681 600         -
  HLS2           84                         134 484           -
  Output layer   21                         1 785             -
  12.4M parameters per scale = 37.2M parameters total!
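The parameter counts in the table follow the usual weights-plus-biases bookkeeping; a minimal sketch that reproduces several of the entries (assuming one bias per filter or hidden unit, which is what matches the numbers above):

```python
# Minimal sketch: reproducing some of the parameter counts from the table above,
# assuming one bias per output filter / hidden unit.

def conv_params(n_filters, kh, kw, in_channels=1):
    """Weights plus one bias per filter for a convolutional layer."""
    return n_filters * kh * kw * in_channels + n_filters

def dense_params(n_in, n_out):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_in * n_out + n_out

print(conv_params(25, 5, 5, in_channels=3))  # ConvD1 / ConvC1 -> 1900
print(conv_params(25, 5, 5))                 # ConvD2 / ConvC2 / ConvA1 -> 650
print(dense_params(900, 450))                # HLV2 -> 405450
print(dense_params(700, 350))                # HLM3 / HLA2 -> 245350
print(dense_params(1600, 84))                # HLS2 -> 134484
print(dense_params(84, 21))                  # output layer -> 1785
```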

  4. Random Forests
  - Many levels, many parameters ... but only log2(N) of them are used for testing!
  - Training is done layer-wise, not end-to-end: no gradient on the objective function
  - No/limited feature learning
  [Figure: a single tree routing an input (I, x) to a leaf distribution P_t(c).]
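A tiny illustration of the test-time cost claim, for a generic complete binary tree (not tied to any specific implementation):

```python
# A complete binary tree of depth D stores 2**D - 1 split nodes, but a single
# test sample only evaluates the D splits on its root-to-leaf path.
# D = 20 as in the Shotton et al. forests discussed later.

D = 20
stored_splits = 2**D - 1      # split parameters kept in memory per tree
visited_splits = D            # split parameters evaluated for one test sample

print(stored_splits, visited_splits)   # 1048575 20
```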

  5. RF vs DL: applications (1)
  Full body pose with a random forest: 3 trees, depth 20, >10M parameters [Shotton et al., CVPR 2011] (Microsoft Research)
  Hand pose with a deep network: semi/weakly supervised training, 8 layers, ~5M parameters [Neverova, Wolf, Nebout, Taylor, under review, arXiv 2015]

  6. RF vs DL: applications (2)
  Scene parsing with structured random forests [Kontschieder et al., CVPR 2014] (Microsoft Research)
  Scene parsing with deep networks (5 layers, ~2M parameters) [Fourure, Emonet, Fromont, Muselet, Tremeau, Wolf, under review]

  7. Types of random forests
  - Classical random forests
  - Structured random forests
  - Neural random forests
  - Deep convolutional random forests
  [Figure thumbnails: a tree routing (I, x) to P_t(c); a structured leaf patch; a neural split function f(0): X -> R^3; a soft tree with split nodes d_n and leaf distributions π_ℓ.]

  8. Example for classical RF
  Real-Time Human Pose Recognition in Parts from Single Depth Images
  Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake
  Microsoft Research Cambridge & Xbox Incubation
  [Shotton et al., CVPR 2011] (Best Paper!)

  9. Depth images -> 3D joint locations
  Pipeline: depth image -> body parts -> 3D joint proposals
  [Shotton et al., CVPR 2011]

  10. Training data: synthetic images (train & test) and real images (test), labelled with 31 body parts. [Shotton et al., CVPR 2011]

  11. Classification with random forests
  [Figure: trees 1 ... T, each routing the input (I, x) to a leaf.]
  Each split node thresholds one of the features.
  Each leaf node contains a class distribution P_t(c | I, x).
  Class distributions are averaged over the trees:
  P(c | I, x) = (1/T) Σ_{t=1}^{T} P_t(c | I, x)
  [Shotton et al., CVPR 2011]
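A minimal sketch of this prediction rule, with a toy node structure rather than the actual Shotton et al. implementation:

```python
import numpy as np

# A leaf stores a class histogram P_t(c | I, x); an internal node stores a
# feature index and a threshold. The forest output is the average of the
# leaf histograms selected by the trees.

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, class_distribution=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.class_distribution = class_distribution  # set only at leaves

def predict_forest(trees, features):
    """features: 1-D array of split features f_theta(I, x) for one pixel."""
    leaf_dists = []
    for root in trees:
        node = root
        while node.class_distribution is None:        # descend to a leaf
            node = node.left if features[node.feature] < node.threshold else node.right
        leaf_dists.append(node.class_distribution)    # P_t(c | I, x)
    return np.mean(leaf_dists, axis=0)                # P(c | I, x)

# Toy forest: T = 2 depth-1 trees over a single feature and 3 classes.
t1 = Node(0, 0.5, Node(class_distribution=[0.8, 0.1, 0.1]),
                  Node(class_distribution=[0.1, 0.1, 0.8]))
t2 = Node(0, 0.3, Node(class_distribution=[0.7, 0.2, 0.1]),
                  Node(class_distribution=[0.2, 0.6, 0.2]))
print(predict_forest([t1, t2], np.array([0.4])))      # -> [0.5, 0.35, 0.15]
```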

  12. Learning & Entropy
  A good split function minimizes the entropy of the label distributions.
  [Figure: the label histogram of the parent set Q, (0.2, 0.4, 0.4), is split into two lower-entropy child histograms Q_l(θ) = (0.33, 0.66, 0.01) and Q_r(θ) = (0.33, 0.01, 0.66).]
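A minimal sketch of the criterion, evaluated on the example histograms above (Shannon entropy in bits):

```python
import numpy as np

# Shannon entropy of a label histogram; the children produced by a good split
# are "purer" (lower entropy) than the parent.

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                     # convention: 0 * log(0) = 0
    return -np.sum(p * np.log2(p))

print(entropy([0.2, 0.4, 0.4]))      # parent Q          ~1.52 bits
print(entropy([0.33, 0.66, 0.01]))   # child Q_l(theta)  ~0.99 bits
print(entropy([0.33, 0.01, 0.66]))   # child Q_r(theta)  ~0.99 bits
```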

  13. Random forests: learning algorithm
  1. Randomly propose a set of splitting candidates φ = (θ, τ) (feature parameters θ and thresholds τ).
  2. Partition the set of examples Q = {(I, x)} into left and right subsets by each φ:
     Q_l(φ) = {(I, x) | f_θ(I, x) < τ}    (3)
     Q_r(φ) = Q \ Q_l(φ)    (4)
  3. Compute the φ giving the largest gain in information:
     φ* = argmax_φ G(φ)    (5)
     G(φ) = H(Q) − Σ_{s ∈ {l,r}} (|Q_s(φ)| / |Q|) H(Q_s(φ))    (6)
     where the Shannon entropy H(Q) is computed on the normalized histogram of body part labels l_I(x) for all (I, x) ∈ Q.
  4. If the largest gain G(φ*) is sufficient, and the depth in the tree is below a maximum, then recurse for the left and right subsets Q_l(φ*) and Q_r(φ*).
  Training: 3 trees, depth 20, 1,000,000 images, 2000 candidate features, 50 thresholds per feature; 1 day on a 1000-core cluster.
  [Shotton et al., CVPR 2011]
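A minimal sketch of this greedy training loop, with random axis-aligned threshold candidates on a plain feature matrix instead of the depth-difference features f_θ(I, x) of the paper, and far fewer candidates and samples:

```python
import numpy as np

def entropy(labels, n_classes):
    hist = np.bincount(labels, minlength=n_classes).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def grow_tree(X, y, n_classes, depth=0, max_depth=20,
              n_candidates=50, min_gain=1e-3, rng=np.random):
    leaf = {"dist": np.bincount(y, minlength=n_classes) / len(y)}
    if depth >= max_depth or len(np.unique(y)) == 1:
        return leaf
    best = None
    for _ in range(n_candidates):                      # step 1: random candidates
        f = rng.randint(X.shape[1])
        tau = rng.uniform(X[:, f].min(), X[:, f].max())
        left = X[:, f] < tau                           # step 2: partition Q
        if left.all() or not left.any():
            continue
        gain = entropy(y, n_classes) - sum(            # step 3: information gain
            m.mean() * entropy(y[m], n_classes) for m in (left, ~left))
        if best is None or gain > best[0]:
            best = (gain, f, tau, left)
    if best is None or best[0] < min_gain:             # step 4: stop or recurse
        return leaf
    gain, f, tau, left = best
    return {"feature": f, "threshold": tau,
            "left": grow_tree(X[left], y[left], n_classes, depth + 1, max_depth),
            "right": grow_tree(X[~left], y[~left], n_classes, depth + 1, max_depth)}

# Toy run on synthetic data whose label depends on one feature.
X = np.random.rand(500, 4)
y = (X[:, 0] > 0.5).astype(int)
tree = grow_tree(X, y, n_classes=2, max_depth=4)
print("root is a split node:", "feature" in tree)
```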

  14. Examples
  [Figure 5 of the paper: example inferences on synthetic images (top row), real images (middle), and failure cases (bottom).]
  [Shotton et al., CVPR 2011]

  15. Dependence of the results on the hyper-parameters [Shotton et al., CVPR 2011]

  16. Types of random forests
  - Classical random forests
  - Structured random forests
  - Neural random forests
  - Deep convolutional random forests
  [Figure thumbnails: a tree routing (I, x) to P_t(c); a structured leaf patch; a neural split function f(0): X -> R^3; a soft tree with split nodes d_n and leaf distributions π_ℓ.]

  17. Structured random forests
  In the classical version, decision (= leaf) nodes contain predictions for a single pixel (a label or a posterior distribution).
  In the structured version, a decision node is assigned a rectangular patch of predictions.
  [Figure: a training data example with a label patch p centred at pixel x(u, v).]
  [Kontschieder et al., ICCV 2011]

  18. Structured version: integration
  Integration over multiple pixels by voting: each pixel accumulates the predictions of all leaf patches that overlap it.
  [Kontschieder et al., ICCV 2011]
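A minimal sketch of the vote accumulation, with a hypothetical patch size and toy predictions (not the actual Kontschieder et al. pipeline):

```python
import numpy as np

# Every leaf predicts a small patch of class labels around its pixel;
# overlapping patch predictions are accumulated into per-pixel histograms,
# and each pixel takes the majority vote.

def accumulate_votes(patch_predictions, image_shape, n_classes, patch_size=3):
    """patch_predictions: list of (u, v, patch) with patch of shape (d, d)."""
    votes = np.zeros(image_shape + (n_classes,))
    r = patch_size // 2
    for u, v, patch in patch_predictions:
        for du in range(-r, r + 1):
            for dv in range(-r, r + 1):
                uu, vv = u + du, v + dv
                if 0 <= uu < image_shape[0] and 0 <= vv < image_shape[1]:
                    votes[uu, vv, patch[du + r, dv + r]] += 1
    return votes.argmax(axis=-1)          # per-pixel label = majority vote

# Toy example: two overlapping 3x3 patches on a 4x4 image, 2 classes.
patches = [(1, 1, np.zeros((3, 3), dtype=int)), (2, 2, np.ones((3, 3), dtype=int))]
print(accumulate_votes(patches, (4, 4), n_classes=2))
```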

  19. Types of random forests
  - Classical random forests
  - Structured random forests
  - Neural random forests
  - Deep convolutional random forests
  [Figure thumbnails: a tree routing (I, x) to P_t(c); a structured leaf patch; a neural split function f(0): X -> R^3; a soft tree with split nodes d_n and leaf distributions π_ℓ.]

  20. Neural Decision Forests for Semantic Image Labelling
  Samuel Rota Bulò (Fondazione Bruno Kessler, Trento, Italy, rotabulo@fbk.eu) and Peter Kontschieder (Microsoft Research, Cambridge, UK, pekontsc@microsoft.com)
  [Rota Bulò and Kontschieder, CVPR 2014]

  21. Neural split functions
  Classical random forest with neural split functions: each split node is a small multi-layer perceptron.
  [Figure: a split node implemented as an MLP with layers f(0): X -> R^3, f(1): R^3 -> R^4 and f(2): R^4 -> R (weight matrices W(1), W(2) plus bias units), producing the routing response f(x) for an input x ∈ X.]
  [Rota Bulò and Kontschieder, CVPR 2014]
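A minimal sketch of such a neural split function, assuming the 3 -> 4 -> 1 layer sizes of the figure and random placeholder weights (not the learned model of the paper):

```python
import numpy as np

# A tiny MLP maps an input x to a routing probability for the left child;
# the right child gets the complement.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # f(1): R^3 -> R^4
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # f(2): R^4 -> R

def split_probability(x):
    """P[sample is routed to the left child | x]."""
    h = np.tanh(W1 @ x + b1)                    # hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2))) # sigmoid routing response

x = rng.normal(size=3)                          # a feature vector f(0)(I, x)
p_left = split_probability(x)
print(p_left, 1.0 - p_left)                     # routing to left / right child
```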

  22. Learning the neural split function (1)
  Probabilistic loss function:
  Q(Θ) = max_π P[y | X, π, Θ]
  with y the labels, X the node input samples, Θ the network parameters, and π = (π^(L), π^(R)) the latent distributions of labels routed to the left and right child nodes.
  Samples are independent:
  P[y | X, π, Θ] = Π_{s=1}^{n} P[y_s | x_s, π, Θ]
  P[y_s | x_s, π, Θ] = Σ_{d ∈ {L,R}} P[y_s | ψ_s = d, π] P[ψ_s = d | x_s, Θ] = Σ_{d ∈ {L,R}} π^(d)_{y_s} f_d(x_s | Θ)
  [Rota Bulò and Kontschieder, CVPR 2014]

  23. Learning the neural split function (2)
  The learning procedure alternates between two steps for Q(Θ) = max_π P[y | X, π, Θ]:
  1. Update the child label distributions π
  2. Update the network parameters Θ (backprop)
  [Rota Bulò and Kontschieder, CVPR 2014]
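A minimal sketch of this alternating scheme for a single split node, under simplifying assumptions: a linear split function instead of the MLP, an EM-style re-estimation of π, and a finite-difference gradient standing in for backprop. None of these choices are claimed to match the paper's exact updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, n_classes = 200, 3, 4
X = rng.normal(size=(n, dim))                    # node input samples
y = rng.integers(n_classes, size=n)              # their labels

theta = rng.normal(scale=0.1, size=dim + 1)      # toy linear split: w, b
pi = np.full((2, n_classes), 1.0 / n_classes)    # child label distributions

def f_left(X, theta):                            # P[routed left | x, Theta]
    return 1.0 / (1.0 + np.exp(-(X @ theta[:-1] + theta[-1])))

def log_likelihood(theta, pi):
    pL = f_left(X, theta)
    p = pi[0][y] * pL + pi[1][y] * (1.0 - pL)    # sum over d in {L, R}
    return np.sum(np.log(p + 1e-12))

for it in range(50):
    # Step 1: update pi given Theta (EM-style responsibilities per sample).
    pL = f_left(X, theta)
    resp_L = pi[0][y] * pL / (pi[0][y] * pL + pi[1][y] * (1.0 - pL) + 1e-12)
    for d, w in enumerate((resp_L, 1.0 - resp_L)):
        counts = np.array([w[y == c].sum() for c in range(n_classes)])
        pi[d] = counts / counts.sum()
    # Step 2: update Theta given pi (finite-difference stand-in for backprop).
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = 1e-5
        grad[j] = (log_likelihood(theta + e, pi) - log_likelihood(theta - e, pi)) / 2e-5
    theta += 1e-3 * grad

print(log_likelihood(theta, pi))
```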

  24. Results on semantic labelling
  [Figure 1 of the paper: example input RGB image and learned representations of the rMLP taken from a hidden layer, visualized using heat-maps.]
                        eTRIMS 8                                            CamVid
  Method                Global             Class-Avg          Jaccard       Global        Class-Avg    Jaccard
  RF Baseline           64.5 ± 1.6         59.6 ± 1.7         40.3 ± 1.1    64.0          41.6         27.2
  NDF_P                 69.8 ± 1.8         64.3 ± 2.2         45.0 ± 1.9    67.4          46.5         30.8
  NDF_MLP               68.9 ± 2.0         62.4 ± 2.3         44.2 ± 2.1    67.1          44.4         30.1
  NDF_MLPC              69.7 ± 1.7         62.5 ± 2.1         44.7 ± 1.9    67.4          44.2         30.2
  NDF_MLPC-ℓ1           71.7 ± 2.0 (+7.2)  65.3 ± 2.3 (+5.7)  46.9 ± 2.0 (+6.6)  69.0 (+5.0)  46.8 (+5.2)  31.7 (+4.5)
  RF Baseline           72.2 ± 1.9         68.0 ± 0.8         47.5 ± 1.0    68.5          50.3         32.4
  NDF_MLPC-ℓ1           80.8 ± 0.7 (+8.6)  74.6 ± 0.7 (+6.6)  56.9 ± 1.2 (+9.4)  82.1 (+13.6)  56.1 (+5.8)  43.3 (+10.9)
  Best RF in [13]       76.1               72.3               -             -             -            -
  Best in [14]          75.1               72.4               -             -             -            -
  Best RF in [19]       -                  -                  -             -             -            38.3
  Best RF in [20]       -                  -                  -             72.5          51.4         36.4
  Best in [8]           -                  -                  -             69.1          53.0         -
  Best in [35]          -                  -                  -             73.7          36.3         29.6
  [Rota Bulò and Kontschieder, CVPR 2014]

  25. Types of random forests
  - Classical random forests
  - Structured random forests
  - Neural random forests
  - Deep convolutional random forests
  [Figure thumbnails: a tree routing (I, x) to P_t(c); a structured leaf patch; a neural split function f(0): X -> R^3; a soft tree with split nodes d_n and leaf distributions π_ℓ.]

  26. One model to rule them all ...
  Deep Neural Decision Forests
  Peter Kontschieder (1), Madalina Fiterau (2), Antonio Criminisi (1), Samuel Rota Bulò (1,3)
  (1) Microsoft Research, Cambridge, UK   (2) Carnegie Mellon University, Pittsburgh, PA   (3) Fondazione Bruno Kessler, Trento, Italy
  [Kontschieder et al., ICCV 2015]

  27. Goals
  Combine neural networks and random forests:
  - Advantage of NN: representation learning
  - Advantage of RF: divide and conquer
  - Differentiable loss function, allowing gradient backpropagation
  "Backpropagation trees"
  [Kontschieder et al., ICCV 2015]
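To make the "backpropagation trees" idea concrete, here is a minimal sketch of soft routing in a single tree, with placeholder linear split functions and random leaf distributions standing in for the deep network of the paper: every split outputs a probability d_n(x) = sigmoid(f_n(x)), each leaf ℓ is reached with probability μ_ℓ(x) given by the product of the routing probabilities along its path, and the prediction is the μ-weighted mixture of the leaf distributions π_ℓ. Because every operation is smooth, the loss can be backpropagated to the split functions.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, n_classes, dim = 3, 5, 8
n_splits, n_leaves = 2**depth - 1, 2**depth

W = rng.normal(size=(n_splits, dim))                     # one linear split f_n per node
pi = rng.dirichlet(np.ones(n_classes), size=n_leaves)    # leaf distributions pi_l

def predict(x):
    """P(y | x) = sum_l mu_l(x) * pi_l, with mu_l the soft routing probability."""
    d = 1.0 / (1.0 + np.exp(-(W @ x)))           # d_n(x): prob. of going left at node n
    mu = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0                                 # root (heap indexing)
        for level in reversed(range(depth)):
            go_right = (leaf >> level) & 1
            mu[leaf] *= d[node] if go_right == 0 else 1.0 - d[node]
            node = 2 * node + 1 + go_right       # descend to the chosen child
    return mu @ pi                               # mixture of leaf distributions

x = rng.normal(size=dim)
p = predict(x)
print(p, p.sum())                                # a proper distribution over 5 classes
```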
