School of Informatics, University of Edinburgh
ECVision Summer School: 4 - Local Action Recognition Fisher slide 1

LOCAL ACTION RECOGNITION
ACTION PRIMITIVES, NOT SEQUENCES
TEMPORALLY LOCAL/SHORT-TERM IMAGE ANALYSIS
INSTANTANEOUS
APPEARANCE BASED
VIEWPOINT SPECIFIC
DAVIS & BOBICK

PROBLEM
DISTINGUISH THESE 18 ACTIONS, GIVEN ONLY ABOUT 10 CONSECUTIVE FRAMES (E.G. OF A HAND WAVE)

KEY STEPS (DETAILS TO COME)
1. BACKGROUND SUBTRACTION & THRESHOLDING
2. TEMPORAL SEGMENTATION
3. ACCUMULATE MEI & MHI
4. COMPUTE 14 MOMENT INVARIANTS
5. MAHALANOBIS CLASSIFIER

CLASSIFICATION OF APPROACHES
ACTION RECOGNITION
- USING HUMANOID GEOMETRIC MODELS
  - 2D MODELS
  - 3D MODELS
- IMAGE APPEARANCE CHANGES
MOTION ENERGY IMAGE EXAMPLE
[Figure: MEI example]

MOTION ENERGY IMAGE (MEI)
GIVEN $D(x, y, t)$: THRESHOLDED DIFFERENCE BETWEEN FRAME $t$ AND BACKGROUND AT PIXEL $(x, y)$
MEI: $M(x, y, t) = \bigcup_{i=0}^{\tau-1} D(x, y, t-i)$
ENCODES “WHERE” MOTION OCCURS

VIEW BASED REPRESENTATION
MULTIPLE 2D VIEWS, NOT 3D
SAMPLE AZIMUTH EVERY 30 DEGREES IN TRAINING

MOTION HISTORY IMAGE (MHI)
GIVEN $D(x, y, t)$: THRESHOLDED FRAME DIFFERENCE
MHI:
$$H(x, y, t) = \begin{cases} \tau & \text{if } D(x, y, t) = 1 \\ \max\big(0,\; H(x, y, t-1) - 1\big) & \text{otherwise} \end{cases}$$
ENCODES THE “ORDER” IN WHICH MOTION OCCURS
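The MEI and MHI definitions translate almost directly into array operations. Below is a minimal NumPy sketch, assuming greyscale frames, a fixed background image and a hypothetical intensity threshold `thresh`; the function names are illustrative and not taken from Davis & Bobick. Because $D$ is binary, the union in the MEI becomes a logical OR over the last $\tau$ masks.

```python
import numpy as np


def thresholded_difference(frame, background, thresh):
    """D(x, y, t): 1 where |frame - background| > thresh, else 0 (thresh is an assumed parameter)."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    return (diff > thresh).astype(np.uint8)


def motion_energy_image(masks):
    """MEI: logical OR of the binary masks D(x, y, t - i) for i = 0 .. tau - 1."""
    return np.any(np.stack(masks), axis=0).astype(np.uint8)


def update_mhi(H_prev, D, tau):
    """One MHI step: tau where D = 1, otherwise decay by 1 and clip at 0."""
    return np.where(D == 1, tau, np.maximum(0, H_prev - 1))


# Toy example: one bright pixel moving right across a 1 x 5 image over tau = 3 frames.
tau = 3
background = np.zeros((1, 5))
frames = [255 * np.eye(1, 5, k) for k in range(tau)]
masks = [thresholded_difference(f, background, thresh=50) for f in frames]

mei = motion_energy_image(masks)
H = np.zeros((1, 5), dtype=int)
for D in masks:
    H = update_mhi(H, D, tau)

print(mei)  # [[1 1 1 0 0]]
print(H)    # [[1 2 3 0 0]]
```

Every pixel the dot touched within the window appears in the MEI, while the MHI ramps from the oldest position (1) to the newest (3), which is why more recent pixels look brighter in an MHI.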
MHI EXAMPLE
[Figure: MHI example, more recent pixels brighter]

HU’S MOMENT INVARIANTS
MOMENT INVARIANT (HERE) IS A NUMERICAL PROPERTY OF A WHOLE IMAGE
USED TO SUMMARIZE MEI AND MHI IMAGES
HU’S INVARIANTS INDEPENDENT OF: TRANSLATION, ROTATION, SCALE, INVERSION

INITIAL VALUES
LET $I(x, y)$ BE THE INITIAL IMAGE (BINARY OR GREY)
AREA: $N = \sum_{x,y} I(x, y)$
CENTER OF MASS: $c_x = \frac{1}{N} \sum_{x,y} x\, I(x, y)$, $\quad c_y = \frac{1}{N} \sum_{x,y} y\, I(x, y)$

CENTRAL MOMENTS
TRANSLATION INVARIANT: $m_{pq} = \sum_{x,y} (x - c_x)^p (y - c_y)^q\, I(x, y)$
ADD SCALE INVARIANCE: $\mu_{pq} = \dfrac{m_{pq}}{N^{(p+q)/2 + 1}}$
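A minimal NumPy sketch of the initial values and the scale-normalised central moments, assuming $x$ is the column index and $y$ the row index (the slides do not fix a convention). `normalized_central_moments` is a hypothetical helper name; it only computes moments up to order 3, since that is all Hu's invariants need.

```python
import numpy as np


def normalized_central_moments(I, max_order=3):
    """Return area N, centre of mass (c_x, c_y) and mu[(p, q)] for p + q <= max_order."""
    I = np.asarray(I, dtype=float)
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]  # y = row index, x = column index
    N = I.sum()                                    # area (zeroth-order moment)
    cx = (xs * I).sum() / N                        # centre of mass
    cy = (ys * I).sum() / N
    mu = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            m_pq = ((xs - cx) ** p * (ys - cy) ** q * I).sum()  # translation invariant
            mu[(p, q)] = m_pq / N ** ((p + q) / 2 + 1)          # add scale invariance
    return N, (cx, cy), mu


# The same 3 x 3 block of ones at two different positions in a 10 x 10 image.
a = np.zeros((10, 10))
a[1:4, 1:4] = 1
b = np.zeros((10, 10))
b[5:8, 4:7] = 1

N, (cx, cy), mu_a = normalized_central_moments(a)
_, _, mu_b = normalized_central_moments(b)

print(N, cx, cy)     # 9.0 2.0 2.0
print(mu_a[(2, 0)])  # 6 / 81, about 0.0741
```

Because every moment is taken about the centre of mass, `mu_a` and `mu_b` agree even though the two blocks sit in different places.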
ADDING ROTATION INVARIANCE
7 MOMENT INVARIANTS:
$I_1 = \mu_{20} + \mu_{02}$
$I_2 = (\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2$
$I_3 = (\mu_{30} - 3\mu_{12})^2 + (\mu_{03} - 3\mu_{21})^2$
$I_4 = (\mu_{30} + \mu_{12})^2 + (\mu_{03} + \mu_{21})^2$
. . .
USEFUL TO NORMALIZE $I'_n = f(I_n)$ TO SIMILAR VALUE RANGE
CAN BE SENSITIVE TO NOISE AND IMAGE QUANTIZATION

ENCODING & MATCHING MEI & MHI
EACH FRAME ENCODED AS $\vec{h}$ WITH 14 VALUES: 7 HU MOMENTS EACH FOR MEI & MHI
DO FOR ALL TRAINING SEQUENCES $i$ OVER ALL ACTIONS $a \in A$ AND ALL VIEWS $v \in V$: $\{\vec{h}_{avi}\}$
COMPUTE $\vec{b}_{av} = \mathrm{mean}_i(\{\vec{h}_{avi}\})$ OVER MULTIPLE EXAMPLES AND COMPUTE COVARIANCE MATRIX $R_{av}$

EXP 1: 1 VIEW, 18 CLASSES
[Figure not reproduced]

ACTION RECOGNITION
FOR AN UNKNOWN FRAME WITH DESCRIPTION $\vec{x}$, PICK THE ACTION $a$ AND VIEWPOINT $v$ MINIMIZING MAHALANOBIS DISTANCE:
$$(\vec{x} - \vec{b}_{av})'\, R_{av}^{-1}\, (\vec{x} - \vec{b}_{av})$$
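The encoding and matching steps can be sketched as follows. The seven invariants use Hu's standard formulas (the slide lists only the first four), and `mu` is assumed to be a dict of scale-normalised central moments keyed by `(p, q)`, computed once for the MEI and once for the MHI. The normalisation $f$ is left out because its form is not given. The small `ridge` term added to each covariance matrix is an assumption of this sketch, not part of the slides: with 14 features, a sample $R_{av}$ is singular unless an (action, view) pair has more than 14 training frames. All function names are hypothetical.

```python
import numpy as np


def hu_invariants(mu):
    """Hu's seven invariants from scale-normalised central moments mu[(p, q)]."""
    n20, n02, n11 = mu[(2, 0)], mu[(0, 2)], mu[(1, 1)]
    n30, n03, n21, n12 = mu[(3, 0)], mu[(0, 3)], mu[(2, 1)], mu[(1, 2)]
    a, b = n30 + n12, n21 + n03           # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03   # recurring differences
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])


def encode_frame(mu_mei, mu_mhi):
    """14-value description h: 7 Hu invariants of the MEI followed by 7 of the MHI."""
    return np.concatenate([hu_invariants(mu_mei), hu_invariants(mu_mhi)])


def fit_models(training, ridge=1e-6):
    """training maps (action, view) -> array (n_examples, 14); returns (b_av, R_av) per key."""
    models = {}
    for key, H in training.items():
        H = np.asarray(H, dtype=float)
        R = np.cov(H, rowvar=False) + ridge * np.eye(H.shape[1])  # ridge: assumed regulariser
        models[key] = (H.mean(axis=0), R)
    return models


def classify(x, models):
    """Pick the (action, view) minimising (x - b_av)' R_av^{-1} (x - b_av)."""
    best_key, best_dist = None, np.inf
    for key, (b, R) in models.items():
        diff = x - b
        dist = diff @ np.linalg.solve(R, diff)  # avoids forming R^{-1} explicitly
        if dist < best_dist:
            best_key, best_dist = key, dist
    return best_key, best_dist


# Synthetic 14-D descriptions for two hypothetical (action, view) pairs.
rng = np.random.default_rng(0)
training = {
    ("sit", 0): rng.normal(0.0, 1.0, size=(50, 14)),
    ("wave", 0): rng.normal(5.0, 1.0, size=(50, 14)),
}
models = fit_models(training)
print(classify(np.full(14, 5.0), models)[0])  # ('wave', 0)
```

Solving $R_{av} \vec{z} = \vec{x} - \vec{b}_{av}$ gives the same distance as multiplying by $R_{av}^{-1}$ but is numerically better behaved when the covariance is poorly conditioned.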
EXP 2: 2 VIEWS, 18 CLASSES
[Figure not reproduced]

TEMPORAL SEGMENTATION
HOW TO CHOOSE TEMPORAL WINDOW SIZE $\tau$ FOR MEI/MHI COMPUTATION?
HAS EFFICIENT SCHEME FOR COMPUTING SEVERAL $\tau = 11 \ldots 19$ (1-2 SEC)
TRIED ALL $\tau$, USED BEST RESULT?
LOTS OF MISSING DETAILS

WHAT WE HAVE LEARNED
1. LOCAL SPATIO-TEMPORAL REPRESENTATION
2. MOTION DESCRIPTIONS GIVE GOOD HYPOTHESIS ABOUT CURRENT ACTIVITY, IN A RESTRICTED DOMAIN
3. APPEARANCE BASED
4. LOTS OF IMPROVEMENTS POSSIBLE

OPEN ISSUES
ACTION TRANSITIONS
RECOGNIZING ACTION AT MIDDLE FRAME INSTEAD OF END
SOME MOMENTS FRAGILE, MAYBE NOT ALL USEFUL
SOME TEMPORAL QUANTIZATION EFFECTS OBVIOUS IN DATA
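The slides do not say how several windows are computed efficiently. One scheme that works with the MHI recurrence (an assumption of this sketch, not necessarily what Davis & Bobick did): a pixel whose last motion was $k$ frames ago holds $\max(0, \tau - k)$, so a single MHI computed with the largest window $\tau_{max}$ yields the MHI for any shorter $\tau$ by subtracting $\tau_{max} - \tau$ and clipping at zero. Assuming the MEI and MHI are built from the same masks $D$, the MEI for $\tau$ is then the set of pixels where that MHI is positive. `mhi_for_window` and `mei_for_window` are hypothetical helper names.

```python
import numpy as np


def update_mhi(H_prev, D, tau):
    """MHI recurrence: tau where D = 1, otherwise decay by 1 and clip at 0."""
    return np.where(D, tau, np.maximum(0, H_prev - 1))


def mhi_for_window(H_max, tau_max, tau):
    """MHI for window tau, derived from an MHI computed with tau_max >= tau."""
    if not 0 < tau <= tau_max:
        raise ValueError("need 0 < tau <= tau_max")
    return np.maximum(0, H_max - (tau_max - tau))


def mei_for_window(H_max, tau_max, tau):
    """MEI for window tau: pixels that moved within the last tau frames."""
    return (mhi_for_window(H_max, tau_max, tau) > 0).astype(np.uint8)


# Random binary masks D(x, y, t) for 40 frames of a 4 x 4 image.
rng = np.random.default_rng(1)
D_seq = rng.random((40, 4, 4)) < 0.1

tau_max = 19
H_max = np.zeros((4, 4), dtype=int)
for D in D_seq:
    H_max = update_mhi(H_max, D, tau_max)  # one pass with the largest window

# The derived MHIs match ones computed directly with each window tau = 11 .. 19.
for tau in range(11, tau_max + 1):
    H_direct = np.zeros((4, 4), dtype=int)
    for D in D_seq:
        H_direct = update_mhi(H_direct, D, tau)
    assert np.array_equal(mhi_for_window(H_max, tau_max, tau), H_direct)
```

Each extra window then costs one subtraction and clip instead of another pass over the whole sequence.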
LECTURE PROBLEM
HOW CAN WE DETERMINE HOW MANY VIEWPOINTS SHOULD BE USED IN THE REPRESENTATION?