
LOGO DETECTION WITH VGG/MOBILENET SSD, Michael Sun, July – September 2019 (PowerPoint PPT presentation)



  1. LOGO DETECTION WITH VGG/MOBILENET SSD Michael Sun July – September, 2019

  2. DATASETS FOR FINE-TUNING BelgaLogos FlickrLogos Combined LogosInTheWild

  3. DATASETS FOR FINE-TUNING (POST-CLEANING)

     Dataset             Available instances  Images  Classes
     BelgaLogos          9844                 2650    37 (all positive)
     BelgaLogos-top      1063                 688     10 (all positive)
     LogosInTheWild      20085                6692    102
     LogosInTheWild-top  6775                 2090    11
     FlickrLogos         5968                 2235    47

  4. NOTE ON CLEANING
     • Addressing class imbalance (KL-divergence); script to randomly sample instances
     • Negative samples (pos-neg ratio, negative mining)
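The cleaning slide mentions a script that randomly samples instances to reduce class imbalance, with KL-divergence as the balance measure. A minimal sketch of that idea, assuming uniform as the target distribution; the function names and the undersampling cap are illustrative, not the author's actual script:

```python
import numpy as np

def balance_by_sampling(labels, cap=None, seed=0):
    """Randomly undersample so no class exceeds `cap` instances
    (default: the size of the smallest class). Returns kept indices."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    if cap is None:
        cap = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        if len(idx) > cap:
            idx = rng.choice(idx, size=cap, replace=False)
        keep.extend(int(i) for i in idx)
    return sorted(keep)

def kl_from_uniform(labels):
    """KL divergence of the empirical class distribution from uniform;
    0 means perfectly balanced."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    q = np.full_like(p, 1.0 / len(p))
    return float(np.sum(p * np.log(p / q)))
```

After sampling, the KL-divergence from uniform drops toward zero, which is one way to verify the script did its job.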

  5. HYPERPARAMETERS
     At training time:
     • Learning rate schedule (epochs, rates)
     • Batch size
     • Momentum
     • Augmentation pipeline
     At model compile:
     • Layer freezing
     • Loss function
     • Image dim, scales, aspect ratios, misc.

  6. FREEZING LAYERS (BELGAS-TOP)
     (Plots comparing five freezing configurations; mAP pairs: 0.662/0.353, 0.623/0.332, 0.634/0.367, 0.396/0.285, 0.341/0.224)

  7. MOMENTUM (LOGOSINTHEWILD)
     (Plots with mAP pairs: 0.013/0.008, 0.001/0.0, 0.019/0.007, 0.033/0.022, 0.043/0.012)
     • Inherent difficulty of this dataset
     • Bias-variance tradeoff

  8. LOSS FUNCTION
     • Total loss = localization + alpha * classification
     • Classification loss (cross-entropy) over positive ground truths and hard-mined negatives
     • Localization loss (smooth L1) = sum over positive boxes of smooth-L1
     • Negative mining controlled by neg_pos_ratio and n_neg_min
     • Defaults: alpha = 1, neg_pos_ratio = 3, n_neg_min = 0
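The loss above can be sketched in numpy, assuming the standard SSD formulation: smooth-L1 over positive anchors, cross-entropy over positives plus the hardest negatives (capped by neg_pos_ratio times the positive count, floored at n_neg_min), weighted by alpha. Per-box cross-entropy is taken as already computed, and all names here are illustrative rather than the repo's API:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth-L1 (Huber loss with delta = 1)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def ssd_loss(loc_err, conf, pos_mask, alpha=1.0, neg_pos_ratio=3, n_neg_min=0):
    """Sketch of the SSD total loss for one image.
    loc_err:  (n_boxes, 4) localization residuals (pred minus gt offsets)
    conf:     (n_boxes,) per-box cross-entropy, precomputed
    pos_mask: (n_boxes,) bool, True for positive-matched anchors."""
    n_pos = int(pos_mask.sum())
    loc_loss = smooth_l1(loc_err[pos_mask]).sum()
    # Hard negative mining: keep the highest-loss negatives only.
    n_neg = max(n_neg_min, neg_pos_ratio * n_pos)
    neg_conf = np.sort(conf[~pos_mask])[::-1][:n_neg]
    class_loss = conf[pos_mask].sum() + neg_conf.sum()
    return (loc_loss + alpha * class_loss) / max(1, n_pos)
```

Lowering neg_pos_ratio shrinks the mined-negative term, which is the lever the later neg-pos-ratio experiments turn.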

  9. BATCH SIZE
     (Plots with mAP pairs: 0.158/0.169, 0.257/0.232, 0.327/0.308)
     • Bias of each model
     • Explanation of loss spikes at LR drops

  10. CLASSIFICATION LOSS
      • (Method 1, original repo) Regular cross-entropy
      • (Method 2) Weigh each class c by (total - #samples in c)
      • (Method 3) Method 2, but only for positive classes
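Methods 2 and 3 can be sketched in a few lines; the mean-1 normalization, the function signature, and treating class 0 as background are my own assumptions for illustration:

```python
import numpy as np

def class_weights(counts, positive_only=False, background_class=0):
    """Method-2-style weights: class c gets weight (total - count_c),
    normalized to mean 1 so rarer classes weigh more. With
    positive_only=True (method 3), the background class keeps weight 1."""
    counts = np.asarray(counts, dtype=float)
    w = counts.sum() - counts
    w = w / w.mean()
    if positive_only:
        w[background_class] = 1.0
    return w
```

The rarest class always receives the largest weight, which is the intended counter to the imbalance noted in the cleaning slides.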

  11. WEIGHTING METHOD (ALL CARS COMBINED DATASET)

  12. ALPHA AND NEG-POS RATIO (NBA LOGOS: KIA AND ADIDAS)

  13. EXPERIMENTS: LR SCHEDULE (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

      Epochs   LR drops      Val mAP
      0, 10    5e-5, 1e-5
      0, 10    2e-5, 1e-5    0.029
      0        5e-5          0.125
      0        3e-5          0.083
      0        1e-5          0.019
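Each row in the table pairs drop epochs with the rates that apply from those epochs onward. A minimal step-schedule function in that shape, usable with Keras's LearningRateScheduler callback; the helper name is my own:

```python
def make_lr_schedule(drop_epochs, rates):
    """Build a step schedule: rates[i] applies from drop_epochs[i]
    until the next drop. The returned function has the (epoch, lr)
    signature expected by keras.callbacks.LearningRateScheduler."""
    assert len(drop_epochs) == len(rates)
    def schedule(epoch, lr=None):
        out = rates[0]
        for e, r in zip(drop_epochs, rates):
            if epoch >= e:
                out = r
        return out
    return schedule
```

For example, the first row of the table corresponds to make_lr_schedule([0, 10], [5e-5, 1e-5]).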

  14. EXPERIMENTS: MOMENTUM (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

      Momentum (all LR 5e-5)   Val mAP
      0.9
      0.7                      0.045
      0.5                      0.02
      0.3                      0.0
      0.1                      0.015

  15. EXPERIMENTS: NEG-POS-RATIO & ALPHA (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

      Alpha   Val mAP        NP   Val mAP
      1       0.115          4    0.03
      0.9     0.145          5    0.115
      0.8     0.169          6    0.015
      0.7     0.168

  16. EXPERIMENTS: MISC (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)
      • Conf thresh (default 1e-2); 1e-3 got 0.11 val mAP
      • Img dim (300, 350, 400, ...); 400 did worse on val; >400 the GPU can't handle with batch size 32
      • Momentum scheduler; tried once, couldn't converge
      • Xavier-initialize class/loc layers got 0.132 val mAP

  17. ASPECT-RATIOS, SCALES
      • Default:
        scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05]  # anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets
        aspect_ratios = [[1.0, 2.0, 0.5],
                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                         [1.0, 2.0, 0.5],
                         [1.0, 2.0, 0.5]]
      • After analysis:
        scales = [0.05, 0.10, 0.15, 0.2, 0.25, 0.37, 0.50]
        aspect_ratios = [[1.0, 1.5, 2.0/3.0],
                         [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                         [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                         [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                         [1.0, 1.5, 2.0/3.0],
                         [1.0, 1.5, 2.0/3.0]]
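For context on what these lists control: in the SSD convention, each (scale, aspect ratio) pair becomes an anchor of width s * sqrt(ar) * size and height s / sqrt(ar) * size. A small sketch, assuming the 300-pixel input of SSD300; the function name is my own:

```python
import numpy as np

def anchor_dims(scale, aspect_ratios, img_size=300):
    """Anchor (width, height) pairs for one predictor layer, using the
    SSD convention: w = s*sqrt(ar)*size, h = s/sqrt(ar)*size."""
    dims = []
    for ar in aspect_ratios:
        w = scale * np.sqrt(ar) * img_size
        h = scale / np.sqrt(ar) * img_size
        dims.append((w, h))
    return dims
```

Shrinking the smallest scale from 0.1 to 0.05, as in the "after analysis" list, halves the smallest anchor from 30 to 15 pixels, which matches the tendency of logos to be small objects.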

  18. AUGMENTATION PIPELINE
      Transformations:
      • Photometric (brightness, contrast, saturation, hue, random channel swap)
      • Expand
      • Random crop
      • Random flip
      • Resize
      Tuned per transformation:
      • Order of transformations
      • Min-scale, max-scale
      • Frequency
      • Interpolation
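Two of the steps above can be sketched in numpy to show the shape of such a pipeline; the parameter ranges, probabilities, and function names are my own assumptions, not the deck's actual settings:

```python
import numpy as np

def random_photometric(img, rng, brightness=32, contrast=(0.8, 1.2), p=0.5):
    """Photometric step: with probability p each, apply a random
    brightness shift and a contrast scaling, clipping to [0, 255].
    img is a uint8 HxWxC array."""
    out = img.astype(np.float32)
    if rng.random() < p:
        out += rng.uniform(-brightness, brightness)
    if rng.random() < p:
        out = (out - out.mean()) * rng.uniform(*contrast) + out.mean()
    return np.clip(out, 0, 255).astype(np.uint8)

def random_flip(img, boxes, rng, p=0.5):
    """Horizontal flip with matching box update; boxes are rows of
    (xmin, ymin, xmax, ymax) in pixels."""
    if rng.random() < p:
        w = img.shape[1]
        img = img[:, ::-1]
        boxes = np.stack([w - boxes[:, 2], boxes[:, 1],
                          w - boxes[:, 0], boxes[:, 3]], axis=1)
    return img, boxes
```

The key detail geometric transforms add over photometric ones is the box update: every crop, expand, or flip must transform the ground-truth boxes in lockstep with the pixels.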

  19. DOUBLE FINE-TUNING
      Pipeline: COCO -> VGG-SSD -> all-cars model (dataset of all cars, 16 classes, ~1500 images) -> final model (dataset of cars of interest, 3 classes, ~250 images)
      • 0.018 val mAP, 10.08 val loss; 0.021 val mAP, 9.63 val loss; 0.0 val mAP, 10.43 val loss
      • Class-diff: ~0.4 AP vs 0.0 AP

  20. WEIGHTING METHOD (ALL CARS FINE-TUNED DATASET)
      • Why I tuned this hyperparameter
      • After 50 epochs: 0.34 mAP (0.325, 0.337, 0.358); 0.337 mAP (0.256, 0.354, 0.4); 0.34 mAP (0.327, 0.337, 0.339)

  21. REAL-TIME INFERENCE PIPELINE
      Training pipeline (Keras and numpy):
      VGG/MNet SSD -> (#frames, #boxes, 12) -> anchor boxes -> initial filtering (0.01) -> NMS (0.5) -> top-k/pad-k -> final filter -> (#frames, k, 6)
      On-device pipeline (TensorFlow -> TFLite graph -> Android):
      VGG/MNet SSD -> (#frames, #boxes, 6) -> top-k -> (#frames, k, 6) -> filtering (0.5) -> (#frames, ?, 6) -> non-max-suppression (NMS, 0.5) -> user

  22. TOP-K (TFLITE)
      • IN: (1, 8732, 6); OUT: (1, 20, 6)

  23. FILTERING (TFLITE)
      • IN: (1, 20, 6); OUT: (1, ?, 6)
      • Why 0.5 (ours) vs the 0.01 threshold (original) is OK

  24. NON-MAX-SUPPRESSION (TFLITE -> ANDROID)
      • IN: (1, ?, 6); OUT: (1, ?, 6)
      • Limit to k = 10
      • While boxes remain and we have fewer than k:
        • Add the highest-confidence box B
        • For each remaining box, compute IoU with B; if > 0.5, remove it
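The greedy procedure on this slide can be sketched directly in numpy; the Android version runs in Java/Kotlin, so this is a reference sketch of the logic rather than the shipped code:

```python
import numpy as np

def iou(a, b):
    """IoU of box a against each row of b; boxes are (xmin, ymin, xmax, ymax)."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, k=10, iou_thresh=0.5):
    """Greedy NMS as described on the slide: while boxes remain and we
    have fewer than k, keep the highest-confidence box and drop every
    box whose IoU with it exceeds iou_thresh. Returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0 and len(keep) < k:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep
```

Because the boxes entering this stage were already filtered at 0.5 confidence, the candidate set is small and this O(k * n) loop is cheap on-device.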

  25. • Combination of alpha=0.5, np=5 and alpha=1.5, np=3
      • A lot of trial-and-error with LR drops, early stopping

  26. • (Left) Candidate for final, has mistakes
      • (Bottom) Final video sent for annotations

  27. NOTES ON DEPTHWISE
      • Final 2 convs
      • Final 4 convs (~5 seconds/frame)
      • Final 6 convs
      • All class layers
      • All loc layers
      • First 2 layers: 0 mAP

  28. SUGGESTIONS ON IMPROVEMENT
      • Currently frame-by-frame takes 10 s/frame
      • Batch-size inference: frame rate vs. delay tradeoff
      • (Optimistic) Sequence modelling

  29. PROGRESS ON MOBILENET V2
      (Diagram: ILSVRC/ImageNet and COCO/VOC pretraining feeding VGG conv blocks 1-6 vs. MobileNet base layers, followed by classification and localization layers)
      • Convs / depthwise-separable conv blocks
      • Classification/localization layers
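The main appeal of swapping VGG's standard convolutions for MobileNet's depthwise-separable blocks is the parameter (and FLOP) reduction. A quick back-of-the-envelope comparison, ignoring bias terms; the function names are mine:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution: k*k*c_in per filter,
    c_out filters."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) followed by a
    1 x 1 pointwise convolution, as in MobileNet."""
    return k * k * c_in + c_in * c_out
```

For a 3x3 layer with 128 input and 128 output channels, the standard convolution uses 147,456 weights versus 17,536 for the separable version, roughly an 8x saving, which is what makes the per-frame inference times above worth chasing.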

  30. FINAL THOUGHTS
      • Difficulty of not having a fixed dataset, having noisy data
      • Thanks to Utkarsh and Gaurav
