Batch construction and multitask learning in visual relationship recognition
Shane Josias (Stellenbosch University, CAIR), josias@sun.ac.za
Willie Brink (Stellenbosch University), wbrink@sun.ac.za
30 January 2020

Visual relationship recognition
Task: produce a (subject, predicate, object) triplet given an image.
Example (visual relationship / scene graph): subject: boy, predicate: on top of, object: surfboard.

Challenges
Combinatorial: with 100 subject, 70 predicate, and 100 object labels we have 100 x 70 x 100 = 700,000 possible relationships.
Data distribution: typically long-tailed, making it difficult to learn rare relationships.
[Figure: three histograms of the number of instances against subject label index, predicate label index and object label index.]

Our approach
Treat visual relationship recognition (VRR) as a classification problem.
Input: image, cropped around a pair of objects.
Output: (subject, predicate, object) triplet.
Three tasks: predict the subject, predict the predicate and predict the object.
Avoid predicting over 700,000 classes.
Obtain normalised scores over classes in each task.
Combine scores through multiplication.
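
The score combination could be sketched as follows. This is only an illustration under assumptions: the slides say the scores are normalised but not how, so a softmax is assumed here, and the names softmax and top_triplets are hypothetical. Each triplet score is the product of its three per-task scores, so the model only needs 100 + 70 + 100 outputs and not one output per relationship.

```python
import heapq
import math
from itertools import product


def softmax(logits):
    """Normalise raw scores so they are positive and sum to one (assumed normalisation)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_triplets(subj_logits, pred_logits, obj_logits, k=5):
    """Return the k best (score, (subject, predicate, object)) index triplets.

    A triplet's score is the product of the three per-task normalised scores.
    All combinations are enumerated, which is simple but brute-force.
    """
    s = softmax(subj_logits)
    p = softmax(pred_logits)
    o = softmax(obj_logits)
    scored = (
        (s[i] * p[j] * o[l], (i, j, l))
        for i, j, l in product(range(len(s)), range(len(p)), range(len(o)))
    )
    return heapq.nlargest(k, scored)
```

For example, top_triplets([2.0, 0.0], [0.0, 1.0, 3.0], [0.5, 0.1], k=3) ranks the triplet built from the three per-task argmax labels first, and the scores of all triplets together sum to one.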

Single task learning with standard batching
[Diagram: three separate networks, one per task. Each consists of an input image, a ResNet-18 conv. base and three FC layers (2,048), and outputs scores over subjects, predicates and objects respectively.]

Class-selective batch construction
Select n classes from a vocabulary of N classes, uniformly at random.
Sample m instances from each selected class, uniformly at random.
[Figure: from the vocabulary truck, shirt, sky, building, table, person, the batch is filled with instances containing shirt, instances containing building and instances containing person.]
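
The two sampling steps can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the names build_index and class_selective_batch are hypothetical, and sampling instances with replacement is an assumption made here so that classes with fewer than m instances can still contribute m samples.

```python
import random
from collections import defaultdict


def build_index(labels):
    """Map each class label to the indices of the instances carrying it."""
    index = defaultdict(list)
    for i, label in enumerate(labels):
        index[label].append(i)
    return index


def class_selective_batch(index, n, m, rng=random):
    """Build a batch of n * m instance indices.

    Step 1: select n distinct classes uniformly at random.
    Step 2: sample m instances uniformly at random from each selected class.
    """
    classes = rng.sample(sorted(index), n)
    batch = []
    for c in classes:
        # Assumption: sample with replacement, so a rare class with fewer
        # than m instances still fills its share of the batch.
        batch.extend(rng.choices(index[c], k=m))
    return batch
```

With n = 3 and m = 4 every batch holds 12 instances, four from each selected class, however frequent each class is in the dataset. Rare classes are therefore seen far more often than under standard batching.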

Multitask learning
[Diagram: a single network. An input image passes through a ResNet-18 conv. base and FC layers (2,048); the outputs are scores over subjects, scores over predicates and scores over objects, each preceded by an FC layer (2,048).]

VRD dataset (Lu et al. ECCV 2016)
5,000 images, 37,987 visual relationships but only 15,448 unique relationships.
100 labels each for subjects and objects, 70 predicate labels in five categories.
Action verb: person, kick, ball
Spatial: person, on top of, ramp
Preposition: motorcycle, with, wheel
Comparative: elephant, taller than, person
Non-action verb: person, wear, shirt

Metrics
MPCA: mean per-class accuracy; used to measure performance on rare classes in the individual tasks.
R@k: recall-at-k; percentage of times the correct label occurs in the top k predictions (if ordered by output scores).
Tail R@k: R@k measured on visual relationship classes that have fewer than 1,000 samples for subject, predicate, and object labels.
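
A small sketch of how the first two metrics could be computed. The function names are hypothetical, and the code assumes that scores holds one list of class scores per sample and that MPCA averages only over classes that occur in the targets.

```python
from collections import defaultdict


def recall_at_k(scores, targets, k):
    """Percentage of samples whose true label is among the k top-scored labels."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += target in ranked[:k]
    return 100.0 * hits / len(targets)


def mean_per_class_accuracy(predictions, targets):
    """Average of the per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, target in zip(predictions, targets):
        total[target] += 1
        correct[target] += pred == target
    return 100.0 * sum(correct[c] / total[c] for c in total) / len(total)
```

Because MPCA weights each class equally, a model that only predicts the dominant classes scores poorly on it even when its overall accuracy is high. Tail R@k would apply recall_at_k to the subset of samples that meets the tail criterion.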

Quantitative results: individual tasks
[Figure: two bar charts, "MPCA on the test set" and "R@1 on the test set", comparing standard batching with batch construction on the subject, predicate and object tasks, for single-task and multitask models.]
Batch construction is performed with respect to the label on the x-axis (same as the task being predicted).

Quantitative results: visual relationship recognition
[Figure: two bar charts, "R@50 on the test set" and "Tail R@50 on the test set", comparing standard batching with batch construction for single-task and multitask models.]
Batch construction is performed with respect to the object labels since it performed better overall.

Qualitative results
Five highest-scoring predictions of each model for four images (the score follows each triplet).

Image: person, on, horse
ST-SB: person, on, horse 12.0; person, ride, horse 7.0; person, wear, horse 5.3; person, has, horse 5.2; person, on, person 3.1
ST-BC-O: person, on, horse 18.7; person, has, horse 11.8; person, wear, horse 7.7; person, in front of, horse 4.3; person, next to, person 3.7
MT-SB: person, wear, horse 9.3; person, on, horse 6.8; person, wear, person 3.4; person, behind, horse 3.1; person, has, horse 2.6
MT-BC-O: person, on, horse 13.2; person, above, horse 12.0; person, behind, horse 6.3; person, ride, horse 5.3; person, has, horse 4.8

Image: giraffe, taller than, giraffe
ST-SB: giraffe, taller than, giraffe 25.1; giraffe, in front of, giraffe 20.8; giraffe, next to, giraffe 9.5; giraffe, above, giraffe 7.6; giraffe, behind, giraffe 7.2
ST-BC-O: giraffe, in front of, giraffe 98.6; giraffe, taller than, giraffe 0.4; giraffe, behind, giraffe 0.4; giraffe, next to, giraffe 0.1; giraffe, beside, giraffe 0.1
MT-SB: giraffe, taller than, giraffe 45.4; giraffe, in front of, giraffe 18.9; giraffe, next to, giraffe 8.6; giraffe, behind, giraffe 7.3; giraffe, under, giraffe 2.6
MT-BC-O: giraffe, in front of, giraffe 92.5; giraffe, taller than, giraffe 6.0; giraffe, behind, giraffe 0.9; giraffe, next to, giraffe 0.3; giraffe, beside, giraffe 0.07

Image: person, on, skateboard
ST-SB: person, wear, person 11.8; person, wear, shirt 10.5; person, wear, skateboard 10.0; person, wear, shoes 5.4; person, wear, pants 4.4
ST-BC-O: person, wear, skateboard 25.6; person, on, skateboard 10.0; person, has, skateboard 9.6; person, ride, skateboard 5.2; person, wear, shoes 3.5
MT-SB: person, wear, shirt 15.5; person, wear, person 9.6; person, wear, skateboard 6.9; person, wear, shoes 6.1; person, wear, pants 4.1
MT-BC-O: person, wear, skateboard 20.0; person, wear, shoes 14.0; person, wear, helmet 12.0; person, has, skateboard 3.8; person, wear, pants 3.7

Image: person, feed, elephant
ST-SB: person, above, street 4.3; person, on, street 4.1; person, under, street 3.0; sky, above, street 1.7; sky, on, street 1.6
ST-BC-O: person, under, elephant 16.4; person, in front of, elephant 16.0; person, above, elephant 10.0; person, near, elephant 4.7; person, behind, elephant 4.1
MT-SB: person, on, street 4.7; person, under, street 3.9; person, above, street 3.4; person, on, person 2.4; person, under, person 1.9
MT-BC-O: person, in front of, elephant 7.4; person, near, elephant 6.9; person, under, elephant 5.1; person, on, elephant 3.4; person, above, elephant 2.4

Models
ST-SB: single-task, standard batching
MT-SB: multitask, standard batching
ST-BC-O: single-task, batch construction from object labels
MT-BC-O: multitask, batch construction from object labels

Conclusion
Class-selective batch construction improves performance on the tail of the distribution, at the cost of performance on the small number of dominating classes.
Multitask learning neither improves nor impedes performance. Reduced capacity can be beneficial.
Predicates are difficult to model. Limitation of pretrained models?
Misclassifications are often semantically similar to the ground truth. We could use a language model to incorporate semantics.
