Xiao CHU (初晓)
Supervisor: Xiaogang Wang
• The Chinese University of Hong Kong
• 4th-year Ph.D. student
• Computer vision, human pose estimation
• 1. Structured Feature Learning for Pose Estimation. Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang, CVPR 2016
• 2. CRF-CNN: Modelling Structured Information in Human Pose Estimation. Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang, NIPS 2016
Structured Feature Learning for Human Pose Estimation Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang
Human pose estimation is the task of estimating the location of each body joint in an image.
Structured prediction → CNN + structured prediction → structured feature
• Related work on CNN + structured prediction: Tompson et al., NIPS 2014; Chen & Yuille, NIPS 2014; Fan et al., CVPR 2015; Yang et al., CVPR 2016
• Our approach:
  1. Build up structure at the feature level
  2. Pass messages with geometrical transform kernels
  3. Bidirectional tree
Fully convolutional net for human pose estimation
[Figure: a VGG backbone, conv1–conv6 without pool4 and pool5, followed by a fully convolutional layer fconv7 with 1×1 kernels. A 448×448 input image is mapped to 56×56 prediction maps, one per joint (head, neck, wrist, ...), with high scores (e.g. 0.9) at the joint location and low scores (e.g. 0.02) elsewhere.]
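As a rough illustration of this pipeline, here is a minimal PyTorch-style sketch (not the authors' code; the layer widths, joint count, and all names are assumptions) of a VGG-like fully convolutional network that keeps only the first three poolings, so a 448×448 image yields 56×56 per-joint score maps:

```python
# Minimal sketch (not the authors' code) of a VGG-style fully convolutional
# pose network: conv stages without the last two poolings, then a 1x1
# "fconv" layer that outputs one score map per joint.
import torch
import torch.nn as nn

NUM_JOINTS = 14  # assumption: LSP-style joint set

def vgg_block(in_ch, out_ch, n_convs, pool=True):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

class FCNPose(nn.Module):
    def __init__(self, num_joints=NUM_JOINTS):
        super().__init__()
        # Only three 2x poolings: 448 -> 56 (stride 8), matching the 56x56 prediction maps.
        self.backbone = nn.Sequential(
            vgg_block(3, 64, 2),                 # conv1 + pool1
            vgg_block(64, 128, 2),               # conv2 + pool2
            vgg_block(128, 256, 3),              # conv3 + pool3
            vgg_block(256, 512, 3, pool=False),  # conv4, pool4 removed
            vgg_block(512, 512, 3, pool=False),  # conv5, pool5 removed
            vgg_block(512, 512, 1, pool=False),  # conv6
        )
        # fconv7: 1x1 kernels -> one heat map per joint.
        self.fconv7 = nn.Conv2d(512, num_joints, kernel_size=1)

    def forward(self, x):
        return self.fconv7(self.backbone(x))

heatmaps = FCNPose()(torch.randn(1, 3, 448, 448))
print(heatmaps.shape)  # torch.Size([1, 14, 56, 56])
```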
Fully convolutional net for human pose estimation
[Figure: feature channels for two neighbouring joints, e.g. elbow channels e1–e7 and wrist channels h1–h7. Some channel pairs are consistent (e.g. e5 and h2 fire together) while others are exclusive (e.g. e4 and h6), which motivates modelling structure among features.]
[Figure: on top of VGG conv1–6 (without pool4, 5), structure can be modelled at two places: structured features or structured prediction.]
Structured feature learning
[Figure: on top of VGG conv1–6 (without pool4, 5), the joint feature maps are arranged in a tree and messages are passed in two directions: a positive-direction pass (nodes A1–A10) and a reverse-direction pass (nodes B1–B10), forming a bidirectional tree. A sketch of this two-pass scheme follows.]
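To make the bidirectional idea concrete, below is a minimal sketch (not the authors' code; the joint chain, kernel sizes, and the final concatenation are assumptions) of two message-passing sweeps in opposite directions whose outputs are combined per joint:

```python
# Minimal sketch (assumed names and shapes) of bidirectional message passing:
# one sweep in the "positive" direction and one in the reverse direction,
# then the two refined feature maps are fused per joint.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, S = 16, 56
chain = ["wrist", "elbow", "shoulder"]            # a small chain of joints (assumption)
feat = {j: torch.randn(1, C, S, S) for j in chain}

up = nn.ModuleDict({j: nn.Conv2d(C, C, 7, padding=3, bias=False) for j in chain})
down = nn.ModuleDict({j: nn.Conv2d(C, C, 7, padding=3, bias=False) for j in chain})

def sweep(order, kernels):
    out, msg = {}, 0
    for j in order:
        out[j] = F.relu(feat[j] + msg)            # add the incoming message
        msg = kernels[j](out[j])                  # transform it and pass it onward
    return out

pos = sweep(chain, up)                            # positive direction: wrist -> shoulder
rev = sweep(chain[::-1], down)                    # reverse direction: shoulder -> wrist
combined = {j: torch.cat([pos[j], rev[j]], dim=1) for j in chain}  # fuse both passes
print(combined["elbow"].shape)                    # torch.Size([1, 32, 56, 56])
```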
[Figure: message passing with a learned geometrical transform kernel. The feature maps of the lower arm are convolved (⊗) with a learned kernel, producing shifted feature maps; these are added (⊕) element-wise to the elbow feature maps to give the updated elbow features.]
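A hedged sketch of this single update step (the shapes, the depthwise-convolution choice, and the ReLU are illustrative assumptions, not the paper's exact formulation): the lower-arm maps are convolved with a learned kernel and added to the elbow maps.

```python
# Minimal sketch of one message-passing step: lower-arm feature maps are
# convolved with a learned geometrical transform kernel and added to the
# elbow feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, H, W = 32, 56, 56                      # channels and spatial size of the feature maps
elbow_feat = torch.randn(1, C, H, W)      # feature maps for the elbow
lower_arm_feat = torch.randn(1, C, H, W)  # feature maps for the lower arm

# One learned kernel per channel (depthwise conv, a simplifying assumption) acts
# as the geometrical transform: it shifts lower-arm evidence toward the elbow.
transform = nn.Conv2d(C, C, kernel_size=7, padding=3, groups=C, bias=False)

shifted = transform(lower_arm_feat)           # shifted lower-arm feature maps
updated_elbow = F.relu(elbow_feat + shifted)  # element-wise sum, then nonlinearity
print(updated_elbow.shape)                    # torch.Size([1, 32, 56, 56])
```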
Experimental results on the FLIC dataset
[Bar chart: strict Percentage of Correct Parts (PCP) for U.Arms, L.Arms, and Mean, comparing MODEC [1], Tompson et al. [2], Tompson et al. [3], Chen & Yuille [4], and ours; also Percentage of Detected Joints (PDJ) curves.]
Experimental results on the LSP dataset
[Bar chart: strict PCP for Torso, Head, U.Arms, L.Arms, U.Legs, L.Legs, and Mean, comparing Andriluka et al. [5], Yang & Ramanan [6], Pishchulin et al. [7], Eichner & Ferrari [8], Ouyang et al. [9], Pishchulin et al. [10], Chen & Yuille [4], and ours; our method improves the mean strict PCP by 6.1%.]
Pose estimation results on the FLIC dataset
[Qualitative examples: robust to disturbance and to occlusion.]
Pose estimation results on the LSP dataset
[Qualitative examples: correct reasoning on extreme poses.]
CRF-CNN: Modeling Structured Information in Human Pose Estimation Xiao Chu, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang
CRF-CNN: motivation
• Two directions for improving pose estimation with CNNs: (1) make the CNN deeper with advanced structural design; (2) build up structure at the feature level or the prediction level.
• CRF-RNN appends a conditional random field to an FCN and solves it with mean-field approximation; tree-structured graphical models capture relations among parts such as wrist, elbow, lower arm, and upper arm.
• We need a graphical model at the feature level to guide the design of structured features.
CRF-CNN: modelling choices
[Figure: (a) multi-layer neural network, (b) structured output space, (c) structured hidden layer, (d) our implementation. z are output variables, h are hidden (feature) variables, I is the image; ε_z, ε_h, and ε_zh are the corresponding edge sets.]

Model (a): $E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta) = \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$

Model (b): $E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta) = \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$

Model (c): $E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta) = \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{(k,l)\in\varepsilon_{h},\, k<l} \psi_{h}(h_k, h_l) + \sum_{(i,k)\in\varepsilon_{zh}} \psi_{zh}(z_i, h_k) + \sum_{k} \phi_h(h_k, \mathbf{I})$

Model (d): $E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta) = \sum_{(k,l)\in\varepsilon_{h},\, k<l} \psi_{h}(h_k, h_l) + \sum_{(i,j)\in\varepsilon_{z},\, i<j} \psi_{z}(z_i, z_j) + \sum_{i} \psi_{zh}(z_i, h_i) + \sum_{k} \phi_h(h_k, \mathbf{I})$
Target: $p(\mathbf{h} \mid \mathbf{I}, \Theta)$

Mean-field approximation: $p(\mathbf{h} \mid \mathbf{I}, \Theta) \approx \prod_{i} Q(\mathbf{h}_i \mid \mathbf{I}, \Theta)$, with

$Q(\mathbf{h}_i \mid \mathbf{I}, \Theta) = \frac{1}{Z_{h,i}} \exp\Big\{ -\sum_{h_k \in \mathbf{h}_i} \phi_h(h_k, \mathbf{I}) - \sum_{(i,j)\in\varepsilon_h,\, i<j} \psi_h\big(\mathbf{h}_i, Q(\mathbf{h}_j \mid \mathbf{I}, \Theta)\big) \Big\}$
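For intuition, here is a tiny numpy sketch of mean-field updates on a discrete pairwise model (the graph, potentials, and iteration count are made up for illustration; the paper's hidden variables are feature maps rather than small discrete nodes):

```python
# Minimal numpy sketch (illustrative, not the paper's implementation) of
# mean-field updates: each Q_i is repeatedly refreshed from its unary
# potential and the current marginals of its neighbours.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_states = 4, 5
edges = [(0, 1), (1, 2), (1, 3)]                 # a small tree over hidden nodes
phi = rng.normal(size=(n_nodes, n_states))       # unary potentials phi_h(h_i, I)
psi = {e: rng.normal(size=(n_states, n_states))  # pairwise potentials psi_h
       for e in edges}

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

Q = np.full((n_nodes, n_states), 1.0 / n_states)  # initialise uniform marginals
for _ in range(10):                               # a few mean-field iterations
    for i in range(n_nodes):
        msg = np.zeros(n_states)
        for (a, b), W in psi.items():
            if a == i:
                msg += W @ Q[b]                   # expected psi(h_i, h_j) under Q_j
            elif b == i:
                msg += W.T @ Q[a]
        Q[i] = softmax(-(phi[i] + msg))           # Q_i ∝ exp{-phi_i - sum_j E_Q[psi]}
print(Q.round(3))
```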
Flooding update
[Figure: a small tree of hidden nodes h1–h5; in a flooding schedule, every node is updated simultaneously from the current estimates of all its neighbours.]

$Q^{t+1}(\mathbf{h}_i) = f\Big( \phi(\mathbf{h}_i) + \sum_{i' \in N(i)} Q^{t}(\mathbf{h}_{i'}) \otimes \mathbf{w}_{i'\to i} \Big)$, where $\otimes$ denotes convolution with the geometrical transform kernel $\mathbf{w}_{i'\to i}$, $N(i)$ is the set of neighbours of node $i$, and $f(\cdot)$ is a nonlinear function (e.g. ReLU or softmax).
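A minimal sketch of one flooding iteration over such a part tree, assuming convolutional messages and a ReLU nonlinearity (the tree layout, channel counts, and kernel sizes are illustrative assumptions):

```python
# Minimal sketch (not the authors' code) of a "flooding" iteration: every
# part's feature estimate is refreshed at the same time from the previous
# estimates of all its neighbours.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, S = 16, 56                                              # channels, spatial size
tree = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}   # h1..h5 as nodes 0..4

# One learned geometrical transform kernel per directed edge i'->i.
kernels = nn.ModuleDict({
    f"{src}->{dst}": nn.Conv2d(C, C, 7, padding=3, bias=False)
    for dst, srcs in tree.items() for src in srcs
})

phi = [torch.randn(1, C, S, S) for _ in tree]   # unary feature maps phi(h_i)
Q = [p.clone() for p in phi]                    # initial estimates

def flooding_step(Q):
    # All nodes read the *old* Q of their neighbours, then update together.
    new_Q = []
    for i, neighbours in tree.items():
        msg = sum(kernels[f"{j}->{i}"](Q[j]) for j in neighbours)
        new_Q.append(F.relu(phi[i] + msg))      # nonlinearity as in the update rule
    return new_Q

Q = flooding_step(flooding_step(Q))             # two flooding iterations
print(Q[0].shape)                               # torch.Size([1, 16, 56, 56])
```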
Serial update (step 1)
[Figure: a small tree of hidden nodes h1–h4 with h2 in the middle. First, h1 sends its message to h2 (h1 → h2).]
Serial update (step 2)
[h4 also sends its message to h2 (h4 → h2).]
Serial update (step 3)
[The updated h2 (h2') sends a message to h3 (h2' → h3); h3 is marginalized.]
Serial update (step 4)
[h3 sends a message back to h2' (h3 → h2'); h2 is marginalized.]
Serial update (step 5)
[Finally, h2'' excluding the message received from h1 is sent back to h1, and h2'' excluding the message from h4 is sent back to h4; h1 and h4 are marginalized.]
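The same tree updated with a serial schedule might look like the following sketch (again an illustration under assumed names and shapes, not the authors' implementation): one leaf-to-root sweep, then a root-to-leaf sweep in which the message returned to each leaf excludes that leaf's own contribution.

```python
# Minimal sketch of the serial update: messages are passed leaf-to-root and
# then root-to-leaf, so each node is marginalized once, and the message
# returned to a leaf excludes what that leaf itself sent.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, S = 16, 56
phi = {k: torch.randn(1, C, S, S) for k in ("h1", "h2", "h3", "h4")}
edges = [("h1", "h2"), ("h4", "h2"), ("h2", "h3")]           # tree rooted at h3

kernels = nn.ModuleDict({f"{a}->{b}": nn.Conv2d(C, C, 7, padding=3, bias=False)
                         for a, b in edges} |
                        {f"{b}->{a}": nn.Conv2d(C, C, 7, padding=3, bias=False)
                         for a, b in edges})

msg = {}                                                      # directed messages
# Upward pass: h1 -> h2, h4 -> h2, then h2' -> h3 (h3 is marginalized).
msg["h1->h2"] = kernels["h1->h2"](F.relu(phi["h1"]))
msg["h4->h2"] = kernels["h4->h2"](F.relu(phi["h4"]))
h2_up = F.relu(phi["h2"] + msg["h1->h2"] + msg["h4->h2"])
msg["h2->h3"] = kernels["h2->h3"](h2_up)
h3_final = F.relu(phi["h3"] + msg["h2->h3"])

# Downward pass: h3 -> h2 (h2 marginalized), then back to the leaves,
# each time excluding the message the leaf itself contributed.
msg["h3->h2"] = kernels["h3->h2"](F.relu(phi["h3"]))
h2_final = F.relu(phi["h2"] + msg["h1->h2"] + msg["h4->h2"] + msg["h3->h2"])
h1_final = F.relu(phi["h1"] + kernels["h2->h1"](
    F.relu(phi["h2"] + msg["h4->h2"] + msg["h3->h2"])))       # excludes h1's own message
h4_final = F.relu(phi["h4"] + kernels["h2->h4"](
    F.relu(phi["h2"] + msg["h1->h2"] + msg["h3->h2"])))       # excludes h4's own message
print(h1_final.shape, h2_final.shape, h3_final.shape, h4_final.shape)
```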
CVPR'16 vs. NIPS'16
[Figure: the CVPR'16 model passes messages along one path per direction over the bidirectional tree, a positive-direction pass (A1–A10) and a reverse-direction pass (B1–B10); the NIPS'16 CRF-CNN formulation instead derives the feature-level updates from a graphical model.]
RESULTS ON LSP (strict PCP)
[Bar chart: strict PCP for Torso, Head, U.Arms, L.Arms, U.Legs, L.Legs, and Mean, comparing Chen & Yuille (NIPS 2014), Yang et al. (CVPR 2016), Chu et al. (CVPR 2016), and ours.]
COMPONENT ANALYSIS (strict PCP)
[Bar chart: strict PCP for Torso, Head, U.Arms, L.Arms, U.Legs, L.Legs, and Mean, comparing four variants: Flooding-2itrs-tree, Flooding-2itrs-loopy, Serial-tree (ReLU), and Serial-tree (Softmax).]
[Qualitative comparison of (a) Flooding-2itr-tree, (b) Flooding-2itr-loopy, and (c) the final model.]
Thank you!
Conditional Random Field
$p(\mathbf{z} \mid \mathbf{I}, \Theta) = \sum_{\mathbf{h}} p(\mathbf{z}, \mathbf{h} \mid \mathbf{I}, \Theta)$, where $p(\mathbf{z}, \mathbf{h} \mid \mathbf{I}, \Theta) = \dfrac{e^{-E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta)}}{\sum_{\mathbf{z}\in\mathcal{Z},\, \mathbf{h}\in\mathcal{H}} e^{-E(\mathbf{z}, \mathbf{h}, \mathbf{I}, \Theta)}}$
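A toy numeric check of this definition (purely illustrative; sizes and energies are random) that marginalizes the hidden variable and normalizes by the partition function:

```python
# Toy check of the CRF definition above on a tiny discrete model:
# p(z|I) is obtained by summing p(z, h|I) over all h, with
# p(z, h|I) = exp(-E(z, h, I)) / Z.
import numpy as np

rng = np.random.default_rng(1)
n_z, n_h = 3, 4
E = rng.normal(size=(n_z, n_h))          # energy E(z, h) for every state pair

joint = np.exp(-E)
Z = joint.sum()                          # partition function: sum over all (z, h)
p_joint = joint / Z                      # p(z, h | I)
p_z = p_joint.sum(axis=1)                # marginalize h: p(z | I)
print(p_z, p_z.sum())                    # a valid distribution over z (sums to 1)
```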