  1. A Computer Vision Tangible User Interface for Mixed Reality Billiards Brian Hammond • Pace University December 10, 2007

  2. Introduction • HCI is the study of how humans interact with computers • The mouse and keyboard limit the richness of interactions • Our goal is to help define techniques that improve this situation using Computer Vision as the input sensor • We will create a Tangible User Interface for billiards to explore the issues

  3. Background

  4. What is a TUI? • User interface elements are physical objects, called tokens • Many ways to sense user-token interactions • electromagnetic, haptics, RFID... each has its limitations • We choose CV as the sensing mechanism

  5. Why Billiards? • Familiar game(s) to many people • Known interaction patterns • Simple tokens • Unnatural to play a virtual billiards game without a real cue stick (e.g. with a mouse)

  6. What is Mixed Reality? • A mixed reality system spans Milgram’s continuum of physical reality, augmented reality (AR), augmented virtuality (AV), and virtual reality (VR) • Our system mixes physical reality with VR • Physical cue stick associated with VR cue stick

  7. What is Computer Vision? • A branch of AI whose goal is to understand the content of digital images in order to make decisions • Why CV then? • Inexpensive (~ $50 for webcam) • Non-invasive (watch user-token interactions) leads to more natural interface; just use tokens as normally would

  8. Why CV is difficult ... • Camera imperfections → poor image quality → inaccurate model of the world • Easy to fool humans with optical illusions; same for automated processes • Computationally expensive (image data) • CV-based systems make tradeoffs in design to produce a working system

  9. Image Processing vs Understanding • Image processing is mechanical means to query or alter contents of digital images • Image understanding tries to find features or objects in images in order to make decisions from their state, spatial relationships, etc. • Image processing aids understanding • Understanding of images is context sensitive

  10. Digital Images • 2D array of numbers representing light intensity; usually 8-bits per picture element or pixel or 256 discrete intensity levels • Color images generally use 3 channels per pixel; color encoded as per some colorspace e.g. RGB, HSV, CIELab, etc. • Image processing manipulates numbers
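The "2D array of numbers" view above can be made concrete with a short sketch. This is an illustration, not from the deck; the weights are the standard ITU-R BT.601 luma coefficients for collapsing RGB to a single intensity channel.

```python
# A 2x3 RGB image as nested lists; each pixel is an (R, G, B) triple of
# 8-bit intensities (0-255) -- the "2D array of numbers" described above.
rgb = [
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
]

def to_grayscale(image):
    """Collapse 3 channels to 1 using the standard BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

gray = to_grayscale(rgb)  # white -> 255, black -> 0, mid-gray -> 128
```

Image processing in this sense is just arithmetic over these arrays.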

  11. DV Camera Woes • Camera manufacturing is a tradeoff between cost and quality... leading to image imperfections • Geometric distortion, blooming, noise, chromatic aberrations, quantization, low resolution, low acquisition rate, etc. • Mitigate with camera calibration, temporal averaging, weighted moving averages, manual focus/exposure
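Temporal averaging, one of the countermeasures listed above, can be sketched as an exponentially weighted moving average over successive frames (the function name and alpha parameter here are illustrative, not from the deck):

```python
def temporal_average(frames, alpha=0.25):
    """Exponentially weighted moving average over a sequence of frames.
    Each frame is a flat list of pixel intensities; alpha weights the
    newest frame, so a smaller alpha gives stronger noise suppression."""
    avg = [float(v) for v in frames[0]]
    for frame in frames[1:]:
        avg = [alpha * new + (1 - alpha) * old
               for new, old in zip(frame, avg)]
    return avg

# Noisy readings of one pixel whose true intensity is 100:
frames = [[96], [104], [99], [101], [100]]
smoothed = temporal_average(frames, alpha=0.5)
```

The running average damps frame-to-frame sensor noise at the cost of a slight lag behind genuine scene changes.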

  12. Photo from Nikon D50

  13. Image from Apple iSight

  14. Common CV Methods • Stereo vision -- use 2 cameras, find disparities, infer depth from triangulation • we don’t use stereo for cost, simplicity, and challenge of using just a single camera • Tracking -- follow moving object (e.g. cue stick) in sequence of images • we don’t use due to fast moving objects, occlusions, search window explosion, etc.

  15. Scene Layout

  16. Tokens • Tokens are physical UI elements • Unadorned billiards cue stick • Standard billiards cue ball • Patch of cloth/felt like billiards table surface • Other scene elements • light source, IEEE 1394 DV camera, human! • reference object, “planar object”

  17. Related Work • TUI: Graspable User Interfaces Ph.D, Tangible Bits, metaDESK, Paper Mâché, Touch-Space • CV-based input: Visual Touchpad, mulTetris, PlayAnywhere, VideoPlace, Crayons • MR-based Billiards: HapStick, Stochasticks, Automatic Pool Trainer

  18. Architecture Client-server using TCP/IP for IPC

  19. Server Process

  20. Server Tasks • image acquisition • feature detection and extraction • cue stick pose estimation • shot detection and analysis • client notifications (pose changes, shots)

  21. Feature Detection & Extraction • We define a feature as any shape or image object of interest (edge, region of a certain color, etc.) • Feature detection attempts to determine if a feature is present in an image • Feature extraction derives information from detected features • Lots of research on both, mostly based on invariant properties of objects

  22. Features (cont’d) • We use color and shape to detect features in acquired digital images • Color is invariant? No, but read on... • We then extract information using image processing techniques • General strategy: reduce complexity repeatedly (abstract more and more)

  23. Common Feature Detection Methods

  24. Thresholding • Convert color image to grayscale; pick threshold intensity; values above become white; rest black
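The thresholding step above is a one-liner over the pixel array; this sketch (illustrative names) follows the slide's convention that values above the threshold become white (255) and the rest black (0):

```python
def threshold(gray, t):
    """Binarize a grayscale image: intensities above t become 255 (white),
    everything else 0 (black)."""
    return [[255 if v > t else 0 for v in row] for row in gray]

gray = [[10, 200],
        [128, 250]]
binary = threshold(gray, 128)  # 128 itself is NOT above the threshold
```

Picking the threshold value is the hard part in practice; slide 32 hard-codes 192/255 under a brightness assumption.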

  25. Contours • Contour = Blob = Connected Component • Regions of like intensity in a binary image • Connected to neighbors (4-way or 8-way) • Nested contours • Attributes: area, perimeter, circularity, etc.
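A minimal sketch of 4-way connected-component labeling, the operation underlying the contour/blob extraction above (flood fill via BFS; the function name is illustrative):

```python
from collections import deque

def label_components(binary):
    """4-way connected-component labeling of a binary image via flood fill.
    Returns a label image (0 = background) and the component count."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1                      # found a new component
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    # visit the 4-connected neighbours
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

image = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_components(image)  # two 4-connected blobs
```

With 8-way connectivity the diagonal neighbours would be added to the neighbour list, which can merge blobs that 4-way labeling keeps separate.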

  26. Morphological Operators • Alter the shape of contours • Erosion , dilation operators most common • Remove noise, fill holes
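Erosion and dilation, named above, can be sketched on a binary image with a 3x3 cross structuring element (an illustrative choice; the deck does not specify one):

```python
def erode(binary):
    """Erosion with a 3x3 cross: a pixel survives only if it and all four
    4-neighbours are foreground. Shrinks blobs; removes speckle noise."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all((binary[y][x], binary[y-1][x], binary[y+1][x],
                    binary[y][x-1], binary[y][x+1])):
                out[y][x] = 1
    return out

def dilate(binary):
    """Dilation with the same cross: a pixel is set if it or any
    4-neighbour is foreground. Grows blobs; fills small holes."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if any(0 <= ny < h and 0 <= nx < w and binary[ny][nx]
                   for ny, nx in ((y, x), (y-1, x), (y+1, x),
                                  (y, x-1), (y, x+1))):
                out[y][x] = 1
    return out

block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(block)  # only the centre pixel of the 3x3 block survives
```

Erosion followed by dilation ("opening") removes noise smaller than the structuring element while roughly preserving larger shapes.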

  27. Color Representation • Colors can be represented in many forms • Most common is RGB (or BGR) • We use HSV (hue, saturation, value) • Hue = color; saturation = vibrancy; value = brightness • Images acquired in BGR; we convert to HSV and perform image processing in HSV space • Colors matched using flexible color matching

  28. FCMs • Find pixels in the source image whose color matches a flexible color matcher (FCM); the destination image contains white wherever there is a match • Then apply morphological operators and contour analysis to the destination to detect features
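The FCM step above amounts to a per-channel range test in HSV space. This sketch is an assumed interpretation (function name, range format, and the 0-360 hue wheel are mine; OpenCV, for instance, packs hue into 0-179):

```python
def hsv_match(hsv_image, hue_range, sat_range, val_range):
    """Flexible colour match: output is 255 where an (H, S, V) pixel falls
    inside all three (min, max) ranges, else 0 -- i.e. a binary mask."""
    def in_range(v, bounds):
        lo, hi = bounds
        return lo <= v <= hi
    return [
        [255 if (in_range(h, hue_range) and in_range(s, sat_range)
                 and in_range(v, val_range)) else 0
         for (h, s, v) in row]
        for row in hsv_image
    ]

# Match "green" hues (~80-160 degrees here) at any saturation/value,
# as the cloth detector on slide 33 does:
hsv = [[(120, 200, 180), (10, 255, 255)],
       [(100, 30, 40),   (300, 90, 90)]]
mask = hsv_match(hsv, (80, 160), (0, 255), (0, 255))
```

Leaving saturation and value unconstrained is what makes the match "flexible": lighting changes move V and S but leave hue comparatively stable.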

  29. Convex Hulls & Defects • Convex hull fits rubber band around shape • Convex shape means line through any pair of vertices does not cross edge of shape • Defect is where this does not hold • Can detect defects and exploit them for feature detection
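The "rubber band" of the slide above can be computed with Andrew's monotone-chain algorithm (an illustrative choice; the deck does not name an algorithm). Points strictly inside the band are excluded; a concave contour point left off the hull marks a defect:

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull. Returns the hull vertices
    in counter-clockwise order, starting from the lowest-leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive = left turn
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull left-to-right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right-to-left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints shared; drop duplicates

# A square plus two interior points -- only the corners survive:
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 1)]
hull = convex_hull(points)
```

For defect detection one would then measure, for each contour point, its distance to the nearest hull edge; the deepest such points are the defect tips slide 35 exploits.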

  30. Cue Stick Pose • h : vertical offset from plane of desk • θ : pitch relative to plane of desk • ψ : yaw (relative to image space “up” direction) • dist_w : distance between cue stick and cue ball • a, b : expected spin-inducing parameters

  31. Features → Tokens (Feature | Helps find | Detection | Description)
      • cloth | cue stick, ball, shadow | automatic | container
      • Tr | h | manual | top point of ref object
      • Br | h | manual | bottom point of ref object
      • St | h | manual | shadow of Tr
      • cue tip | dist_w | automatic | tip of cue stick
      • shaft | θ, ψ | automatic | cue stick shaft
      • shadow | θ | automatic | shadow of shaft
      • Sc | h | automatic | shadow of tip of stick
      • planar object | h | automatic | edges used to find parallel lines
      • parallel lines | h | automatic | used to find vanishing line of plane of desk

  32. Planar Object Detection • Acquired image → grayscale → thresholded at intensity 192/255 (assume bright!) • Find contours; approximate a polygon for each • Keep contours whose polygon has 4 sides meeting at 90°±5° (rectangles); take the largest by area as the object • The 2 pairs of opposite sides are used to find the vanishing line of the plane of the desk
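The 90°±5° corner test above can be sketched as follows (function names and the sample quadrilaterals are illustrative, not from the deck):

```python
import math

def corner_angles(quad):
    """Interior angle in degrees at each vertex of a 4-sided polygon,
    given as a list of (x, y) vertices in order."""
    angles = []
    for i in range(4):
        p, q, r = quad[i - 1], quad[i], quad[(i + 1) % 4]
        v1 = (p[0] - q[0], p[1] - q[1])     # edge q -> previous vertex
        v2 = (r[0] - q[0], r[1] - q[1])     # edge q -> next vertex
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        angles.append(math.degrees(math.acos(dot / norm)))
    return angles

def is_rectangle(quad, tolerance=5.0):
    """True if all four corners are within `tolerance` degrees of 90."""
    return all(abs(a - 90.0) <= tolerance for a in corner_angles(quad))

square = [(0, 0), (10, 0), (10, 6), (0, 6)]     # axis-aligned rectangle
skewed = [(0, 0), (10, 2), (10, 8), (0, 6)]     # parallelogram, ~79/101 degrees
```

In practice the polygon comes from a contour approximation step, so the tolerance also absorbs pixel-level jitter in the detected corners.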

  33. Cloth Detection • Use FCM setup to find green hue with arbitrary saturation and value • Remove small contours by erosion operator • Take largest by area as cloth

  34. Cue Ball Detection • Cue ball sits atop cloth • Found nested contours in previous step • Remove small child contours of cloth • Take most circular (metric: area/radius) using minimal enclosing circle to find radius
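A sketch of one plausible circularity score for the "most circular" test above: contour area relative to the area of its minimal enclosing circle. (The slide states the metric as area/radius; this normalized variant is my illustrative substitute and ranks shapes the same way for a fixed scale.)

```python
import math

def circularity(area, enclosing_radius):
    """Ratio of a contour's area to that of its minimal enclosing circle.
    Equals 1.0 for a perfect disc, and shrinks for elongated shapes."""
    return area / (math.pi * enclosing_radius ** 2)

# A disc of radius 10 versus a 20x4 rectangle, whose minimal enclosing
# circle has radius sqrt(10^2 + 2^2) (half-diagonal):
disc = circularity(math.pi * 10 ** 2, 10)
rect = circularity(20 * 4, math.sqrt(10 ** 2 + 2 ** 2))
```

The cue ball, being the only round child contour of the cloth, wins this comparison against stick and shadow fragments.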

  35. Cue Stick Detection • Detect stick and its shadow • Find shadow using an FCM with arbitrary hue and saturation but low value (i.e. dark areas match = shadow) • Restrict search region to bounding rectangle of cloth contour • Find convex hull of cloth contour; find defects and their deepest points

  36. Cue Stick Detection (cont’d) • Draw thick lines (stroke) along contour perimeter • Encloses shadow and stick contours • Fill in “holes” of cue ball and shadow; left with cue stick as largest child contour of cloth contour • Have detected Sc , cue tip , shaft , and shadow features

  37. Estimation of h • Adaptation of the technique of “3D Trajectories from a Single Viewpoint using Shadows” by Ian D. Reid and A. North • Recovers the lost sense of depth • Since we have a top-down view, depth = height • Requires the planar object and a reference object of known height, chosen by the user, that sits atop the desk (parallel lines, Tr, Br, St) • Also uses the cue tip and Sc features

  38. Estimation of h

  39. diagram adapted from Reid paper for our TUI setup

  40. diagram adapted from Reid paper for our TUI setup

  41. Lines Required for Estimation of h
      • l1 : through reference object shadow (Tr-Br)
      • l2 : through Sc, parallel to l1; hence l2 intersects l1 on the vanishing line
      • l3 : through Tr and Sc
      • l4 : through cue tip and Tr
      • l5 : through Br and the intersection of l3 and l4; the intersection of l5 and l1 is the projection of the cue tip on the plane (Pb)
      • l6 : through Tr and the intersection of l5 and the vanishing line
      • l7 : through cue tip and the intersection of l5 and the vanishing line
      • l8 : through Tr and Br
      • l9 : through Pb and cue tip
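The construction above is built entirely from "line through two points" and "intersection of two lines" operations, which are clean in homogeneous coordinates. A minimal sketch (helper names and sample coordinates are illustrative, not from the deck):

```python
def line_through(p, q):
    """Homogeneous coefficients (a, b, c) of the line a*x + b*y + c = 0
    through points p and q (the cross product of the two points)."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersect(l1, l2):
    """Intersection point of two lines in homogeneous form; returns
    None for parallel lines (zero determinant)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)

# e.g. l3 through Tr and Sc, l4 through the cue tip and Tr: both pass
# through Tr, so their intersection must be Tr itself.
Tr, Sc, tip = (2.0, 1.0), (6.0, 3.0), (0.0, 4.0)
l3 = line_through(Tr, Sc)
l4 = line_through(tip, Tr)
p = intersect(l3, l4)  # recovers Tr
```

Chaining these two helpers through lines l1-l9 reproduces the geometric construction; the ratio along l9 between Pb and the cue tip then yields h against the reference object's known height.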
