A Solution for Densely Annotated Large Scale Object Detection Task
Yuan Gao, Hui Shen, Donghong Zhong, Jian Wang, Zeyu Liu, Ti Bai, Xiang Long and Shilei Wen
Object365 Dataset
| Dataset | # Class | # Image | Total # Box | Avg # Box per Image | Avg Image Height | Avg Image Width | Avg Box Area (pixels) | Max # Box in Image |
|---|---|---|---|---|---|---|---|---|
| COCO17 (Train) | 80 | 118287 | 0.86M | 7.27 | 484 | 577 | 12025 | 93 |
| Object365 (Train) | 365 | 608606 | 9.62M | 15.81 | 536 | 662 | 14074 | 835 |
Full Track
R50 Cascade R-CNN
[Figure: Object365 validation mAP. Baseline R50: 22.73.]
Neural Architecture Search
• An RL-based Neural Architecture Search is adopted.
• The NAS-FPN module is cascaded directly behind the original FPN module.
• A strong architecture known from prior work [1] is used to initialize the NAS-FPN search procedure.
[Figure: backbone stages C2 to C5 feed FPN levels P2 to P6, followed by the NAS-FPN module; each pyramid level has an RPN.]
[1] Ghiasi G, Lin T Y, Pang R, et al. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. arXiv:1904.07392, 2019.
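The slide does not say which RL algorithm drives the search. As a minimal sketch of the idea, assuming a plain REINFORCE policy gradient with a moving-average baseline, a controller can keep one softmax over options per architectural decision, sample an architecture, receive a reward, and push up the log-probability of choices that beat the baseline. The search space, `toy_reward` and `TOY_TARGET` are hypothetical stand-ins; in the real search the reward would come from training the sampled detector and measuring validation mAP.

```python
import numpy as np

# Hypothetical toy search space: 4 decisions (e.g. which inputs / op a cell uses),
# each picking one of 3 options. Real NAS-FPN search spaces are much larger.
NUM_DECISIONS = 4
NUM_CHOICES = 3
TOY_TARGET = np.array([2, 0, 1, 2])  # hidden "best" architecture for the toy reward


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_reward(arch):
    """Stand-in for 'train the sampled detector, return validation mAP'."""
    return float(np.mean(arch == TOY_TARGET))


def reinforce_search(episodes=400, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros((NUM_DECISIONS, NUM_CHOICES))  # controller parameters
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample one architecture (one option per decision) from the policy.
        arch = np.array([rng.choice(NUM_CHOICES, p=p) for p in probs])
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # REINFORCE: d log pi / d logits = one_hot(choice) - probs, per decision.
        grad = -probs
        grad[np.arange(NUM_DECISIONS), arch] += 1.0
        logits += lr * advantage * grad
    return softmax(logits).argmax(axis=1)
```

For example, `reinforce_search(episodes=2000)` should settle on the toy target; a real controller (the NAS-FPN paper uses an RNN controller) would condition each decision on the previous ones instead of treating them independently.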
Neural Architecture Search
[Figure: architecture graphs of (a) the original FPN (levels P2 to P6 built from backbone stages C2 to C5) and (b) our NAS-FPN (nodes N2 to N6) after ~400 search episodes.]
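The nodes of a searched NAS-FPN graph are built from merging cells: two input levels are resized to a chosen output resolution and combined by a binary op. A minimal sketch of a "sum" merging cell on (C, H, W) arrays, assuming power-of-two resolution ratios, nearest-neighbour upsampling and max-pool downsampling; the conv/BN that follows each cell and the global-pooling op variant are omitted. Function names are illustrative only.

```python
import numpy as np


def resize_to(feat, target_hw):
    """Resize a (C, H, W) map by integer factors: nearest-neighbour
    upsampling or max-pool downsampling (both axes scale the same way)."""
    c, h, w = feat.shape
    th, tw = target_hw
    if (h, w) == (th, tw):
        return feat
    if th > h:  # upsample: repeat each cell fy x fx times
        fy, fx = th // h, tw // w
        return feat.repeat(fy, axis=1).repeat(fx, axis=2)
    # downsample: split into (fy, fx) blocks and take the max of each block
    fy, fx = h // th, w // tw
    return feat.reshape(c, th, fy, tw, fx).max(axis=(2, 4))


def merging_cell_sum(feat_a, feat_b, target_hw):
    """Resize both chosen inputs to the target level and add them."""
    return resize_to(feat_a, target_hw) + resize_to(feat_b, target_hw)
```

For instance, merging a P3-sized map (8×8) with a P5-sized map (2×2) into a P4-sized node (4×4) downsamples the first and upsamples the second before the sum.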
Neural Architecture Search
[Figure: Object365 validation mAP. Baseline R50: 22.73; NASFPN: 23.66.]
Class Diversity Sensitive Sampling
[Figure: two example images, one containing 15 classes and one containing 4 classes.]
Sampling every image with equal probability is not appropriate.
Class Diversity Sensitive Sampling
[Figure: example image; the i-th image contains 15 classes.]
$$X_i = \ln C_i + \zeta \sum_{c=1}^{N} Q_c I_c$$
• $X_i$: sampling weight of the i-th image.
• $C_i$: total number of classes in the i-th image.
• $N$: total number of classes in the dataset.
• $Q_c$: prior probability of the c-th class, computed from the total box count of the dataset.
• $I_c$: 1 if the image contains class c, otherwise 0.
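The weighting above can be sketched as follows. This is a minimal illustration, assuming $Q_c$ is each class's share of all boxes, that every image has at least one class (so $\ln C_i$ is defined), that $\zeta = 1.0$ for illustration, and that images are drawn with probability proportional to $X_i$; the slide does not state how weights become probabilities. Function names are hypothetical.

```python
import math

import numpy as np


def class_priors(box_counts):
    """Q_c: fraction of all dataset boxes that belong to class c."""
    counts = np.asarray(box_counts, dtype=float)
    return counts / counts.sum()


def cdss_weights(image_classes, priors, zeta=1.0):
    """X_i = ln(C_i) + zeta * sum_c Q_c * I_c for every image.

    image_classes: list of sets of class ids present in each image
    (assumed non-empty, otherwise ln(0) is undefined).
    """
    weights = []
    for classes in image_classes:
        c_i = len(classes)                           # C_i: distinct classes in image
        prior_sum = sum(priors[c] for c in classes)  # only classes with I_c = 1 contribute
        weights.append(math.log(c_i) + zeta * prior_sum)
    return np.array(weights)


def sample_images(weights, num_samples, seed=0):
    """Draw image indices with probability proportional to X_i (assumed normalization)."""
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(weights), size=num_samples, replace=True, p=probs)
```

With box counts `[6, 3, 1]` (so $Q = [0.6, 0.3, 0.1]$) and images containing classes `{0}`, `{0, 1}` and `{0, 1, 2}`, the weights are $0.6$, $\ln 2 + 0.9$ and $\ln 3 + 1.0$, so the three-class image is drawn most often.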
Class Diversity Sensitive Sampling
[Figure: Object365 validation mAP over training iterations (about 50k to 350k), Random sampling vs. Class Diversity Sensitive sampling.]
Class Diversity Sensitive Sampling
[Figure: Object365 validation mAP. Baseline R50: 22.73; NASFPN: 23.66; CDSS: 25.01; SENet154+GN: 30.1; OHEM+Deformable: 30.7.]
Large Resolution Box Head
[Figure: two box heads. 7×7 RoI-Align features pass through 4 conv + GN layers before the class and box outputs; 9×9 RoI-Align features pass through 5 conv + GN layers before the class and box outputs.]
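To give a rough sense of the extra cost of the larger head, the sketch below counts multiply-accumulates of the conv stack per RoI. The 256 channels and 3×3 same-padded kernels are assumptions (the slide gives only the RoI sizes and layer counts), and GN and the final class/box layers are ignored.

```python
def conv_stack_macs(roi_size, num_convs, channels=256, kernel=3):
    """Multiply-accumulates for `num_convs` same-padded kernel x kernel convs
    applied to one roi_size x roi_size RoI feature (channels in = channels out)."""
    per_layer = roi_size * roi_size * channels * channels * kernel * kernel
    return per_layer * num_convs


head_7x7 = conv_stack_macs(7, 4)  # 4 convs on the 7x7 RoI-Align output
head_9x9 = conv_stack_macs(9, 5)  # 5 convs on the 9x9 RoI-Align output
print(head_7x7, head_9x9, round(head_9x9 / head_7x7, 2))
# prints: 115605504 238878720 2.07
```

Under these assumptions the 9×9 head costs about 2× the 7×7 head per RoI, since both the spatial area (81 vs 49) and the depth (5 vs 4) grow.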
Large Resolution Box Head
[Figure: Object365 validation mAP. Baseline R50: 22.73; NASFPN: 23.66; CDSS: 25.01; SENet154+GN: 30.1; OHEM+Deformable: 30.7; Large Resolution Head: 30.9.]
Cascade R-CNN Testing
• Use the bbox predicted by the 2nd stage to extract the feature.
[Figure: standard Cascade R-CNN [1] testing graph with boxes B0 to B3, stage heads H1 to H3, per-stage scores C1 to C3 with final score C, and three RoI pooling steps.]
[1] Cai Z, Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection. CVPR, 2018.
Cascade R-CNN Adaptive Testing
• Use the bbox predicted by each stage itself to extract the feature.
[Figure: adaptive testing graph with boxes B0 to B3, each stage head H1 to H3 used twice, per-stage scores C1 to C3 with final score C, and four RoI pooling steps.]
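One reading of the two testing diagrams, sketched as toy code: in standard testing every head scores the features pooled from B2, while in adaptive testing head H_i scores the box B_i that it predicted itself, reusing pool(B_i) for the next stage's regression. `pool` and `make_toy_head` are hypothetical stand-ins for RoI-Align and the trained stage heads, and averaging the per-stage scores is assumed for both modes.

```python
import numpy as np


def pool(feature_map, box):
    """Hypothetical stand-in for RoI-Align: mean of a 2-D map inside the box."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return float(feature_map[y1:y2, x1:x2].mean())


def make_toy_head(weight, shift):
    """Hypothetical stage head: feature -> (class score, box delta)."""
    def head(feat):
        score = 1.0 / (1.0 + np.exp(-weight * feat))
        return float(score), np.full(4, shift, dtype=float)
    return head


def standard_cascade_test(heads, feature_map, b0):
    box, feats = np.asarray(b0, dtype=float), []
    for head in heads:                  # B0 -> B1 -> B2 -> B3
        feat = pool(feature_map, box)   # pool(B0), pool(B1), pool(B2)
        feats.append(feat)
        _, delta = head(feat)
        box = box + delta
    # Every head scores the features of B2 (the 2nd-stage box); scores are averaged.
    scores = [head(feats[-1])[0] for head in heads]
    return float(np.mean(scores)), box


def adaptive_cascade_test(heads, feature_map, b0):
    box = np.asarray(b0, dtype=float)
    feat = pool(feature_map, box)       # pool(B0)
    scores = []
    for head in heads:
        _, delta = head(feat)           # H_i regresses B_i from pool(B_{i-1})
        box = box + delta
        feat = pool(feature_map, box)   # pool(B_i), reused by the next stage
        scores.append(head(feat)[0])    # H_i scores the box it predicted itself
    return float(np.mean(scores)), box
```

Both modes follow the same regression path, so the final box is identical; only the features behind each stage's score differ. This matches the three pooling steps of the standard graph and the four of the adaptive one.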
Cascade R-CNN Adaptive Testing
[Figure: Object365 validation mAP. Baseline R50: 22.73; NASFPN: 23.66; CDSS: 25.01; SENet154+GN: 30.1; OHEM+Deformable: 30.7; Large Resolution Head: 30.9; Cascade Adaptive Testing: 31.1; MS Training and Testing: 33.1.]
Implementation Details
• Start from a COCO-pretrained model (52.9 mAP on COCO17 minival).
• Training: multiscale, size range (400, 1400), max size 1600.
• Testing: multiscale, size range (400, 1400), max size 2100.
• 8 V100 (32 GB) × 2, for 7 days.
• Weight Standardization brings model diversity.
• SoftNMS is adopted.
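Two of these details can be sketched under stated assumptions. `multiscale_scale` assumes the Detectron-style rule in which "size" is the shorter image side and "max size" caps the longer side. `soft_nms_gaussian` implements Gaussian Soft-NMS (Bodla et al., 2017), which decays the scores of overlapping boxes instead of discarding them; the slide does not say whether the linear or Gaussian variant was used, and `sigma=0.5` and `score_thresh=0.001` are illustrative values.

```python
import numpy as np


def multiscale_scale(height, width, short_side, max_size):
    """Scale that resizes the shorter side to `short_side` while keeping
    the longer side at most `max_size` (assumed Detectron-style rule)."""
    scale = short_side / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return scale


def pairwise_iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS. Returns kept indices (in selection order)
    and the decayed scores for all input boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        top = remaining[np.argmax(scores[remaining])]  # highest current score
        keep.append(int(top))
        remaining = remaining[remaining != top]
        if remaining.size == 0:
            break
        overlaps = pairwise_iou(boxes[top], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)  # decay, don't drop
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores
```

For a 600×800 image sampled at size 1400, the longer side would reach about 1867 pixels, so the 1600 cap wins and the scale is 2.0. In Soft-NMS, an exact duplicate of the top box (IoU 1) keeps its place in the list but its score is multiplied by exp(-2).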
Implementation Details
[Figure: Object365 validation mAP. Baseline R50: 22.73; NASFPN: 23.66; CDSS: 25.01; SENet154+GN: 30.1; OHEM+Deformable: 30.7; Large Resolution Head: 30.9; Cascade Adaptive Testing: 31.1; MS Training and Testing: 33.1; Ensemble 5 models: 36.5.]
Visualization
Tiny Track
Full Track Pretrain
| Pretrain | Full Val mAP | Tiny Val mAP | Gain (vs. COCO Pretrain) | Tiny Test mAP |
|---|---|---|---|---|
| COCO Pretrain | - | 28.9 | - | - |
| Obj365 Full Pretrain | 30.7 | 33.0 | +4.1 | - |
| Obj365 Full Pretrain | 32.9 | 34.8 | +5.9 | - |
| Ensemble 8 models | - | 37.6 | +8.7 | 29.0 |
• Multiscale input with flip in training and testing.
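Flip at test time means running the detector on the mirrored image as well and mapping its boxes back before merging. A minimal sketch, assuming continuous [x1, y1, x2, y2] pixel coordinates and that the two result sets are simply pooled before a final NMS or Soft-NMS; the slide does not describe the merge step, and the function names are hypothetical.

```python
import numpy as np


def hflip_boxes(boxes, image_width):
    """Mirror [x1, y1, x2, y2] boxes across the vertical centre line of an
    image of width `image_width` (continuous coordinates assumed)."""
    boxes = np.asarray(boxes, dtype=float)
    flipped = boxes.copy()
    flipped[:, 0] = image_width - boxes[:, 2]  # new left edge = old right edge mirrored
    flipped[:, 2] = image_width - boxes[:, 0]  # new right edge = old left edge mirrored
    return flipped


def merge_flip_detections(boxes, scores, flip_boxes, flip_scores, image_width):
    """Map detections from the flipped image back and pool them with the
    detections from the original image; a final NMS / Soft-NMS would follow."""
    restored = hflip_boxes(flip_boxes, image_width)
    all_boxes = np.concatenate([np.asarray(boxes, dtype=float), restored], axis=0)
    all_scores = np.concatenate(
        [np.asarray(scores, dtype=float), np.asarray(flip_scores, dtype=float)]
    )
    return all_boxes, all_scores
```

The mapping is its own inverse, so flipping boxes twice returns the originals, which is a quick sanity check for the coordinate convention.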
Visualization
• Full Track Pretrained
• COCO17 Pretrained
PaddlePaddle Detection
• Fast/Faster R-CNN, FPN, Mask R-CNN, Cascade R-CNN, YOLOv3, RetinaNet, SSD …
• GN, SyncBN, Deformable Conv v1/v2 …
• https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/object_detection
• Training framework will be released soon.
Thank you! Please feel free to contact us if you have any questions. gaoyuan18@baidu.com