  1. MUXConv: Information Multiplexing in Convolutional Neural Networks Zhichao Lu, Kalyanmoy Deb and Vishnu Naresh Boddeti Michigan State University {luzhicha, kdeb, vishnu}@msu.edu https://github.com/human-analysis/MUXConv

  2. MUXConv
  ◮ New layer: Multiplexed Convolutions (Spatial + Channel Multiplexing)
  ◮ Idea: increase the flow of information between space and channels
  ◮ Goal: smaller model size and increased efficiency while maintaining or increasing performance

  3. Spatial Multiplexing: Idea
  [Figure: spatial multiplexing. A superpixel (spatial-to-channel) operation folds each spatial block into channels, a group-wise convolution mixes the result, and a subpixel (channel-to-spatial) operation maps the channels back onto the spatial grid.]
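The two spatial mappings in the figure can be sketched as plain tensor reshapes. The sketch below shows the generic space-to-channel (superpixel/pixel-unshuffle) and channel-to-space (subpixel/pixel-shuffle) operations under the stated shapes; the function names are ours, and this is an illustration rather than the paper's exact implementation.

```python
import numpy as np

def space_to_channel(x, r):
    """Superpixel: fold each r x r spatial block into channels.

    (C, H, W) -> (C * r * r, H // r, W // r)
    """
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)          # (C, r, r, H//r, W//r)
    return x.reshape(c * r * r, h // r, w // r)

def channel_to_space(x, r):
    """Subpixel: spread channel groups back onto an r x r spatial grid.

    (C, H, W) -> (C // (r * r), H * r, W * r)
    """
    c, h, w = x.shape
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # (C', H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

# Round trip: the two mappings are exact inverses.
x = np.arange(4 * 8 * 8, dtype=np.float32).reshape(4, 8, 8)
y = space_to_channel(x, 2)
assert y.shape == (16, 4, 4)
assert np.array_equal(channel_to_space(y, 2), x)
```

A group-wise convolution applied between the two mappings then mixes information across positions that were spatial neighbors in the original feature map.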

  4. Spatial Multiplexing: Evaluation
  [Figure: ImageNet top-1 accuracy (%) vs. number of MAdds (millions), comparing MobileNetV2 with and without spatial multiplexing across model scales.]
  ◮ Consistent accuracy improvement over the original depth-wise separable convolution
  ◮ Particularly effective in the low-MAdds regime

  5. Channel Multiplexing: Idea
  [Figure: block designs compared — (a) original residual block with two 3x3 convolutions (ResNet-18, ResNet-34); (b) bottleneck with 1x1 / 3x3 / 1x1 convolutions (ResNet-50, DenseNet-BC); (c) inverted bottleneck with a depth-wise 3x3 convolution (MobileNetV2/V3, MnasNet); (d) proposed block with group-wise and depth-wise convolutions plus SpatialMUX units. Around the reduction block, channel multiplexing leaves out some channels, copies others, and mixes up the remainder.]
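One way to read the "copy / leave out / mix-up" labels in panel (d): some channels are duplicated, some are dropped, and the survivors are permuted so that subsequent group-wise convolutions see a different channel mix. The sketch below is our illustrative reading of the diagram only, not the paper's exact wiring; all names and index choices are hypothetical.

```python
import numpy as np

def channel_multiplex(x, copy_idx, drop_idx, seed=0):
    """Hypothetical sketch: duplicate some channels ("copy"), drop
    others ("leave out"), then shuffle the channel order ("mix-up").

    x: (C, H, W) feature map.
    """
    c = x.shape[0]
    keep_idx = [i for i in range(c) if i not in set(drop_idx)]
    out = np.concatenate([x[keep_idx], x[copy_idx]], axis=0)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(out.shape[0])  # mix-up: permute channels
    return out[perm]

x = np.ones((8, 4, 4), dtype=np.float32)
y = channel_multiplex(x, copy_idx=[0, 1], drop_idx=[6, 7])
assert y.shape == (8, 4, 4)  # 8 channels - 2 dropped + 2 copied
```

The copy and leave-out paths are free of multiply-adds, which is how channel multiplexing trades accuracy against cost via the ratio l evaluated on the next slide.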

  6. Channel Multiplexing: Evaluation
  [Figure: ImageNet top-1 accuracy (%) vs. number of MAdds (millions) and vs. number of parameters (millions). Starting from a (w=1.0, r=224, l=0.0) baseline, models are scaled down via the width multiplier (w=0.75, 0.5), the input resolution (r=192, 160, 128), or the channel multiplexing ratio (l=0.25, 0.5, 0.75).]
  ◮ Channel multiplexing consistently outperforms the existing scaling methods

  7. Tri-Objective Search: Idea
  [Figure: objective space showing the attainable objective set 𝑨, the Pareto surface, the ideal point, a user-supplied reference point, and the reference direction defining a region of interest.]
  ◮ Simultaneously optimize for accuracy (↑), #Params (↓), and #MAdds (↓)
  ◮ User-preference-guided search through PBI¹ decomposition

  ¹ Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731, 2007.
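The penalty boundary intersection (PBI) scalarization from MOEA/D [1] turns a multi-objective point F(x) into a single value: with ideal point z* and reference direction λ, it combines the distance d1 travelled along λ with the perpendicular deviation d2 from the reference line, g = d1 + θ·d2. A minimal sketch for minimization, using the standard symbols from the MOEA/D paper rather than anything specific to this presentation (accuracy, being maximized, would enter as an error rate):

```python
import numpy as np

def pbi(f, z_star, lam, theta=5.0):
    """Penalty boundary intersection scalarization (minimization).

    d1: projection of (f - z*) onto the reference direction lam.
    d2: perpendicular distance from f to the reference line.
    """
    lam = np.asarray(lam, dtype=float)
    diff = np.asarray(f, dtype=float) - np.asarray(z_star, dtype=float)
    norm = np.linalg.norm(lam)
    d1 = abs(diff @ lam) / norm
    d2 = np.linalg.norm(diff - d1 * lam / norm)
    return d1 + theta * d2

# A point exactly on the reference line has d2 = 0, so g = d1.
assert np.isclose(pbi([2, 0, 0], [0, 0, 0], [1, 0, 0]), 2.0)
# One unit of perpendicular deviation adds theta * 1 to the score.
assert np.isclose(pbi([2, 1, 0], [0, 0, 0], [1, 0, 0]), 7.0)
```

Steering the search toward a user's region of interest then amounts to choosing λ through that reference point and minimizing g over candidate architectures.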

  8. Tri-Objective Search: Evaluation
  [Figure: on NASBench-101, top-1 accuracy vs. number of parameters (millions) and vs. training time (minutes) for the attainable models, with the models found by our search under three reference points compared against regularized evolution.]
  ◮ NASBench-101: our search is more efficient than regularized evolution²

  ² Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.

  9. ImageNet-1K Classification
  [Figure: top-1 accuracy (%) vs. number of parameters (millions) and vs. number of MAdds (millions) for MUXNet against MobileNetV2, MobileNetV3 large/small, MnasNet, MixNet, FBNet, ChamNet, ProxylessNAS-GPU, NASNet-A, AmoebaNet-A, and DARTS.]

  | Model             | Type     | #MAdds | Ratio | #Params | Ratio | CPU (ms) | GPU (ms) | Top-1 (%) | Top-5 (%) |
  |-------------------|----------|--------|-------|---------|-------|----------|----------|-----------|-----------|
  | MUXNet-xs (ours)  | auto     | 66M‡   | 1.0x  | 1.8M‡   | 1.0x  | 6.8      | 18       | 66.7      | 86.8      |
  | MobileNetV2 0.5   | manual   | 97M    | 1.5x  | 2.0M    | 1.1x  | 6.2      | 17       | 65.4      | 86.4      |
  | MobileNetV3 small | combined | 66M    | 1.0x  | 2.9M    | 1.6x  | 6.2‡     | 14       | 67.4      | -         |
  | MUXNet-s (ours)   | auto     | 117M‡  | 1.0x  | 2.4M‡   | 1.0x  | 9.5      | 25       | 71.6      | 90.3      |
  | MobileNetV1       | manual   | 575M   | 4.9x  | 4.2M    | 1.8x  | 7.3      | 20       | 70.6      | 89.5      |
  | ShuffleNetV2      | manual   | 146M   | 1.3x  | -       | -     | 6.8      | 11‡      | 69.4      | -         |
  | ChamNet-C         | auto     | 212M   | 1.8x  | 3.4M    | 1.4x  | -        | -        | 71.6      | -         |
  | MUXNet-m (ours)   | auto     | 218M‡  | 1.0x  | 3.4M‡   | 1.0x  | 14.7     | 42       | 75.3      | 92.5      |
  | MobileNetV2       | manual   | 300M   | 1.4x  | 3.4M    | 1.0x  | 8.3‡     | 23       | 72.0      | 91.0      |
  | ShuffleNetV2 2x   | manual   | 591M   | 2.7x  | 7.4M    | 2.2x  | 11.0     | 22‡      | 74.9      | -         |
  | MnasNet-A1        | auto     | 312M   | 1.4x  | 3.9M    | 1.1x  | 9.3‡     | 32       | 75.2      | 92.5      |
  | MobileNetV3 large | combined | 219M   | 1.0x  | 5.4M    | 1.6x  | 10.0‡    | 33       | 75.2      | -         |
  | MUXNet-l (ours)   | auto     | 318M‡  | 1.0x  | 4.0M‡   | 1.0x  | 19.2     | 74       | 76.6      | 93.2      |
  | MnasNet-A2        | auto     | 340M   | 1.1x  | 4.8M    | 1.2x  | -        | -        | 75.6      | 92.7      |
  | FBNet-C           | auto     | 375M   | 1.2x  | 5.5M    | 1.4x  | 9.1‡     | 31       | 74.9      | -         |
  | EfficientNet-B0   | auto     | 390M‡  | 1.2x  | 5.3M    | 1.3x  | 14.4     | 46       | 76.3      | 93.2      |
  | MixNet-M          | auto     | 360M‡  | 1.1x  | 5.0M    | 1.2x  | 24.3     | 79       | 77.0      | 93.3      |

  ‡ indicates the objective that the method explicitly optimizes through NAS.

  10. Additional Experiments

  Generalization to ImageNet-V2
  [Figure: top-5 accuracy (%) on ImageNet vs. ImageNet-V2 for ShuffleNetV2, ResNet18, MUXNet-s (ours), GoogLeNet, MobileNetV2, DARTS, MnasNet-A1, NASNet-A mobile, MUXNet-m (ours), MUXNet-l (ours), DenseNet-169, and ResNeXt50 32x4d; the per-model accuracy drops range from 7.7 to 10.0 points.]

  PASCAL VOC2007 Detection
  | Network                | #MAdds | #Params | mAP (%) |
  |------------------------|--------|---------|---------|
  | VGG16 + SSD            | 35B    | 26.3M   | 74.3    |
  | MobileNet + SSD        | 1.6B   | 9.5M    | 67.6    |
  | MobileNetV2 + SSDLite  | 0.7B   | 3.4M    | 67.4    |
  | MobileNetV2 + SSD      | 1.4B   | 8.9M    | 73.2    |
  | MUXNet-m + SSDLite     | 0.5B   | 3.2M    | 68.6    |
  | MUXNet-l + SSD         | 1.4B   | 9.9M    | 73.8    |

  ADE20K Semantic Segmentation
  | Network           | #MAdds | #Params | mIoU (%) | Acc (%) |
  |-------------------|--------|---------|----------|---------|
  | ResNet18 + C1     | 1.8B   | 11.7M   | 33.82    | 76.05   |
  | MobileNetV2 + C1  | 0.3B   | 3.5M    | 34.84    | 75.75   |
  | MUXNet-m + C1     | 0.2B   | 3.4M    | 32.42    | 75.00   |
  | ResNet18 + PPM    | 1.8B   | 11.7M   | 38.00    | 78.64   |
  | MobileNetV2 + PPM | 0.3B   | 3.5M    | 35.76    | 77.77   |
  | MUXNet-m + PPM    | 0.2B   | 3.4M    | 35.80    | 76.33   |

  11. Additional Experiments

  Transfer Learning on CIFAR
  [Figure: CIFAR-10 and CIFAR-100 top-1 accuracy (%) vs. number of Mult-Adds (millions) and vs. number of parameters (millions) for MUXNet against ResNet-50, DenseNet-169, Inception v3, MobileNetV1, MobileNetV2, NASNet-A mobile, EfficientNet-B0, and MixNet-M.]

  Robustness to Degradations
  [Figure: normalized top-5 accuracy across corruption types for ShuffleNetV2, MobileNetV2, DARTS, MnasNet-A1, and MUXNet-m.]

  Visualization of Segmentation Results
  [Figure: test images, ground truth, and MUXNet-m + PPM predictions.]
