
Human Body Recognition and Tracking: Kinect RGB-D Camera



  1. Human Body Recognition and Tracking: Kinect RGB-D Camera / How the Kinect RGB-D Camera Works
     • Microsoft "Kinect for Xbox 360", aka "Kinect 1" (2010): color video camera + laser-projected IR dot pattern + IR camera (IR laser projector, color camera, IR camera); 640 x 480 at 30 fps
     • What the Kinect does: compute a depth image, estimate body parts and joint poses, and pass them to the application (e.g., a game)
     • "2016 will be the year that we see interesting new applications of depth camera technology on mobile phones." -- Chris Bishop, Director of Microsoft Research, Cambridge (2015)
     • Many slides by D. Hoiem

  2. How Kinect Works: Overview
     • Pipeline: IR projector + IR sensor → projected light pattern → stereo algorithm → depth image → segmentation and part prediction → body parts and joint positions
     • Part 1, stereo from projected dots:
       1. Overview of depth from stereo (image 1 + image 2 → dense depth map)
       2. How it works for a projector/sensor pair
       3. Stereo algorithm used
     • Some of the following slides adapted from Steve Seitz and Lana Lazebnik

  3. Depth from Stereo Images: Basic Stereo Matching Algorithm
     • Goal: recover depth by finding the image coordinate x' in image 2 that corresponds to x in image 1
     • For each pixel x in the first image:
       – Find the corresponding epipolar line in the right image
       – Examine all pixels on the epipolar line and pick the best match
       – Triangulate the matches to get depth information
     • If necessary, rectify the two stereo images so that epipolar lines become scanlines; matching then reduces to a search along the scanline
     • Depth from disparity: with focal length f, baseline B between the camera centers O and O', and corresponding points x and x',
       disparity = x - x' = B·f / z, so depth(x) = z = B·f / (x - x')
     • Disparity is inversely proportional to depth z
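The disparity-to-depth relation on this slide is easy to make concrete. Below is a minimal Python sketch; the focal length, baseline, and disparity numbers are illustrative values, not Kinect calibration data.

```python
# Depth from disparity: z = f * B / (x - x'), from similar triangles.
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Convert a disparity (in pixels) to depth (in meters).

    disparity_px : x - x', horizontal offset between corresponding pixels
    focal_px     : focal length expressed in pixels
    baseline_m   : distance between the two camera centers, in meters
    """
    if disparity_px <= 0:
        return float("inf")  # zero disparity means the point is at infinity
    return focal_px * baseline_m / disparity_px

# Illustrative numbers only: a 10-pixel disparity with f = 580 px
# and B = 0.075 m gives a depth of 4.35 m.
print(depth_from_disparity(10.0, 580.0, 0.075))
```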

  4. Correspondence Search and Results of Window Search
     • Slide a window along the right scanline and compare its contents with the reference window in the left image
     • Matching cost: SSD or normalized cross-correlation
     • Failures of correspondence search: occlusions, repeated structures, textureless surfaces, non-Lambertian surfaces, specularities
     • Improve by adding constraints and solving with graph cuts: Y. Boykov, O. Veksler, and R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts", PAMI 2001
     • For the latest and greatest: http://www.middlebury.edu/stereo/
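A hedged sketch of the window-based matching described above, assuming rectified grayscale images stored as NumPy arrays. The function name and the window/max_disparity parameters are illustrative choices, not values from the slides.

```python
import numpy as np

def disparity_scanline(left, right, y, window=5, max_disparity=64):
    """Brute-force SSD window matching along one rectified scanline.

    left, right   : 2-D grayscale float arrays of equal size
    y             : row index of the scanline to match
    window        : half-width of the square comparison window
    max_disparity : largest horizontal shift to consider
    Returns an array of disparities, one per pixel on the scanline.
    """
    h, w = left.shape
    disparities = np.zeros(w)
    for x in range(window + max_disparity, w - window):
        ref = left[y - window:y + window + 1, x - window:x + window + 1]
        best_cost, best_d = np.inf, 0
        for d in range(max_disparity):
            cand = right[y - window:y + window + 1,
                         x - d - window:x - d + window + 1]
            cost = np.sum((ref - cand) ** 2)  # SSD matching cost
            if cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities
```

Normalized cross-correlation would replace the SSD line with a correlation score (and maximize instead of minimize); graph-cut methods replace this per-pixel winner-take-all search with a global energy minimization.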

  5. Structured Light
     • Basic principle: use a projector to create known features in the 3D scene (e.g., points, lines)
     • Light projection: if we project distinctive points, matching is easy
     • Example: book vs. no book (source: http://www.futurepicture.org/?p=97)
     • Kinect's projected dot pattern

  6. Kinect RGB-D Camera: Same Stereo Algorithms Apply (Projector + Sensor)
     • Kinect 1 implementation:
       – In-camera ASIC computes an 11-bit 640 x 480 depth map at 30 Hz
       – Range limit for tracking: 0.7 – 6 m (2.3' to 20')
       – Practical range limit: 1.2 – 3.5 m
     • Kinect for Xbox One, aka "Kinect 2" (2013):
       – Replaced the structured-light camera with a time-of-flight camera
       – Higher-resolution (1080p), larger field of view, 30 fps color camera
       – Depth resolution: 2.5 cm at 4 m

  7. Time-of-Flight Depth Sensing
     • Impulse time-of-flight imaging [Koechner, 1968]: a light pulse is emitted at the scene, the reflected pulse is received back at the sensor, and a "stop-watch" measures the time delay t; depth = c·t / 2, where c is the speed of light
     • Kinect 2's time-of-flight sensor uses multiple measurements (3 frequencies x 3 amplitudes) to compute at each pixel:
       – The amount of reflected light originating from the active light source (the "active image")
       – The depth of the scene, from the phase shifts of the multiple measurements (which disambiguate the depth)
       – The amount of ambient light
     • Part 2: Pose from Depth. Goal: estimate pose (body parts and joint positions) from the depth image produced by the stereo pipeline
     • Real-Time Human Pose Recognition in Parts from a Single Depth Image, J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011
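A small numerical sketch of the relation depth = c·t/2 from this slide, plus a hedged illustration of how a depth follows from a measured phase shift at one modulation frequency; the actual multi-frequency disambiguation Kinect 2 performs is not shown, and all numbers are illustrative.

```python
import math

C = 3.0e8  # speed of light, m/s

def depth_from_delay(t_seconds):
    """Impulse time of flight: light travels to the scene and back,
    so depth = c * t / 2."""
    return C * t_seconds / 2.0

def depth_from_phase(phase_rad, mod_freq_hz):
    """Continuous-wave time of flight (illustrative): a phase shift of
    phase_rad at modulation frequency mod_freq_hz corresponds to
    depth = c * phase / (4 * pi * f). The result wraps every c / (2 * f)
    meters, which is why multiple frequencies are needed to
    disambiguate the depth."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

# A 20-nanosecond round trip corresponds to 3 m of depth.
print(depth_from_delay(20e-9))              # 3.0
# A phase shift of pi/2 at an 80 MHz modulation is about 0.47 m.
print(depth_from_phase(math.pi / 2, 80e6))
```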

  8. Goal: Estimate Pose from Depth Image
     • Two steps: Step 1, find body parts (depth image → part label map); Step 2, compute joint positions
     • Challenges: lots of variation in bodies, orientations, poses; needs to be very fast (their algorithm runs at 200 fps on the Xbox 360 GPU)
     • Pose examples: RGB, depth, part label map, joint positions; video at http://research.microsoft.com/apps/video/default.aspx?id=144455
     • Finding body parts: first extract body pixels by thresholding depth
       – What should we use for a feature? Difference in depth
       – What should we use for a classifier? Random forest / decision forest
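A minimal sketch of the "extract body pixels by thresholding depth" step, assuming the depth image is a NumPy array in meters; the near/far thresholds are illustrative values, not anything specified on the slide.

```python
import numpy as np

def extract_body_mask(depth_m, near=0.7, far=4.0):
    """Keep pixels whose depth falls inside a plausible player range.

    depth_m   : 2-D array of depths in meters (0 where the sensor has no reading)
    near, far : illustrative distance thresholds, not Kinect's actual values
    Returns a boolean mask that is True for candidate body pixels.
    """
    valid = depth_m > 0  # discard missing measurements
    return valid & (depth_m >= near) & (depth_m <= far)
```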

  9. Features and Part Classification with Random Forests
     • Feature: difference of depth at two pixels; d_I(x) is the depth image, θ = (u, v) is the offset to the second pixel, and the offset is scaled by the depth at the reference pixel
     • Random forest: a collection of independently trained binary decision trees
     • Each tree is a classifier that predicts the likelihood of a pixel x belonging to body part class c
       – A non-leaf node corresponds to a thresholded feature
       – A leaf node corresponds to a conjunction of several features
       – At each leaf node, store the learned distribution P(c | I, x)
     • Learning phase:
       1. For each tree, pick a randomly sampled subset of the training data
       2. Randomly choose a set of features and thresholds at each node
       3. Pick the feature and threshold that give the largest information gain
       4. Recurse until a certain accuracy or tree depth is reached
     • Testing phase: classify each pixel x in image I using all decision trees and average the distributions stored at the leaves
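A hedged sketch of the depth-difference feature and of pushing one pixel through a single decision tree. It follows the general form in the Shotton et al. 2011 paper (two offsets u and v, each normalized by the depth at the reference pixel); the nested-dict tree format, the background constant, and the helper names are illustrative, not the actual Kinect implementation.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # large constant for probes that fall off the image or body

def depth_feature(depth, x, y, u, v):
    """f_theta(I, x) = d(x + u / d(x)) - d(x + v / d(x)).

    depth : 2-D depth image in meters; (x, y) : reference pixel
    u, v  : 2-D offsets; dividing by the reference depth makes the
            feature roughly invariant to how far away the person is.
    """
    d_ref = depth[y, x]

    def probe(offset):
        px = int(round(x + offset[0] / d_ref))
        py = int(round(y + offset[1] / d_ref))
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1] and depth[py, px] > 0:
            return depth[py, px]
        return BACKGROUND_DEPTH  # off-image or background probes read as very deep

    return probe(u) - probe(v)

def classify_pixel(tree, depth, x, y):
    """Walk one trained tree: each internal node holds (u, v, threshold),
    each leaf holds a distribution P(c | I, x) over body-part classes.
    A forest would average the leaf distributions over all its trees."""
    node = tree  # illustrative format: nested dicts with "left"/"right"/"leaf"
    while "leaf" not in node:
        f = depth_feature(depth, x, y, node["u"], node["v"])
        node = node["left"] if f < node["threshold"] else node["right"]
    return node["leaf"]  # e.g., a NumPy array of per-class probabilities
```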

 10. Implementation: Get Lots of Training Data
     • Training data:
       – Capture and sample 500K motion-capture frames of people kicking, driving, dancing, etc.
       – Get 3D models for 15 bodies with a variety of weights, heights, etc.
       – Synthesize motion-capture data for all 15 body types
     • Forest parameters:
       – 31 body parts
       – 3 trees (depth 20)
       – 300,000 training images per tree, randomly selected from 1M training images
       – 2,000 training example pixels per image
       – 2,000 candidate features; 50 candidate thresholds per feature
       – Decision forest constructed in 1 day on a 1,000-core cluster
     • Results

 11. Step 2: Joint Position Estimation; Results
     • Joints are estimated using the mean-shift clustering algorithm applied to the labeled pixels
     • A Gaussian-weighted density estimator for each body part finds its mode 3D position
     • "Push back in depth" each cluster mode to lie at the approximate center of the body part
     • 73% joint prediction accuracy (on head, shoulders, elbows, hands)
     • Other cameras for tracking: Leap Motion (2' x 2' x 2' volume; 2015, $80)
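A hedged sketch of the mean-shift step for one body part: repeatedly move an estimate toward the Gaussian-weighted mean of that part's labeled 3-D points until it settles on the densest cluster, then push it back in depth. The bandwidth and push-back values are illustrative placeholders, not the paper's tuned parameters.

```python
import numpy as np

def joint_from_part_pixels(points_3d, bandwidth=0.1, push_back=0.04, iters=20):
    """Estimate one joint position from the 3-D points labeled as one body part.

    points_3d : (N, 3) array of world coordinates of pixels assigned to the part
    bandwidth : Gaussian kernel width in meters (illustrative value)
    push_back : offset along +z moving the surface mode toward the approximate
                center of the body part (illustrative value)
    """
    mode = points_3d.mean(axis=0)  # start from the centroid
    for _ in range(iters):
        d2 = np.sum((points_3d - mode) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))      # Gaussian-weighted density
        new_mode = (w[:, None] * points_3d).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mode - mode) < 1e-4:  # converged on a cluster mode
            mode = new_mode
            break
        mode = new_mode
    mode = mode.copy()
    mode[2] += push_back  # "push back in depth" toward the body-part center
    return mode
```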
