

  1. Computer Vision CSPP 56553 Artificial Intelligence March 3, 2004

  2. Roadmap • Motivation – Computer vision applications • Is a Picture worth a thousand words? – Low-level features • Feature extraction: intensity, color – High-level features • Top-down constraints: shape from stereo, motion, ... • Case Study: Vision as Modern AI – Fast, robust face detection (Viola & Jones 2001)

  3. Perception • From observation to facts about the world – Analogous to speech recognition – Stimulus (percept) S, world W • S = g(W) – Recognition: derive the world from the percept • W = g⁻¹(S) • Is this possible?

  4. Key Perception Problem • Massive ambiguity – Optical illusions • Occlusion • Depth perception • “Objects are closer than they appear” • Is it full-sized or a miniature model?

  5. Image Ambiguity

  6. Handling Uncertainty • Identifying a single, perfectly correct solution – Impossible! • Noise, ambiguity, complexity • Solution: – Probabilistic model – P(W|S) = α P(S|W) P(W) • Maximize the image likelihood times the world-model prior
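Spelled out, the slide's formula is Bayes' rule with the normalizer 1/P(S) folded into α, and recognition becomes maximum a posteriori (MAP) estimation:

```latex
% Bayes' rule: the normalizer 1/P(S) is folded into \alpha
P(W \mid S) = \frac{P(S \mid W)\, P(W)}{P(S)} = \alpha\, P(S \mid W)\, P(W)

% \alpha is constant in W, so recognition reduces to MAP estimation:
% maximize the image model (likelihood) times the world model (prior)
W^{*} = \arg\max_{W} \; P(S \mid W)\, P(W)
```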

  7. Handling Complexity • Don’t solve the whole problem – Don’t recover every object/position/color… • Solve restricted problem – Find all the faces – Recognize a person – Align two images

  8. Modern Computer Vision Applications • Face / Object detection • Medical image registration • Face recognition • Object tracking

  9. Vision Subsystems

  10. Image Formation

  11. Images and Representations • Initially pixel images – Image as NxM matrix of pixel values – Alternate image codings • Grey-scale intensity values • Color encoding: intensities of RGB values
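A minimal illustration of these codings (NumPy assumed; the shapes and values are arbitrary, only the array layouts matter):

```python
import numpy as np

# Grey-scale: an N x M matrix of intensity values (0 = black, 255 = white)
grey = np.zeros((480, 640), dtype=np.uint8)
grey[100:200, 150:300] = 180          # a brighter rectangular patch

# Color: an N x M x 3 array, one intensity per R, G, B channel
color = np.zeros((480, 640, 3), dtype=np.uint8)
color[..., 0] = 255                   # pure red everywhere

print(grey.shape, color.shape)        # (480, 640) (480, 640, 3)
```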

  12. Images

  13. Grey-scale Images

  14. Color Images

  15. Image Features • Grey-scale and color intensities – Directly access image signal values – Large number of measures • Possibly noisy • Only care about intensities as cues to world • Image Features: – Mid-level representation – Extract from raw intensities – Capture elements of interest for image understanding

  16. Edge Detection

  17. Edge Detection • Find sharp demarcations in intensity • 1) Apply spatially oriented filters • E.g. vertical, horizontal, diagonal • 2) Label above-threshold pixels with edge orientation • 3) Combine edge segments with same orientation: line
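A sketch of steps 1 and 2 (NumPy/SciPy assumed; the Sobel kernels and threshold are illustrative choices, and step 3, linking same-orientation segments into lines, is omitted):

```python
import numpy as np
from scipy.signal import convolve2d

# Step 1: spatially oriented filters (Sobel kernels, one common choice)
KERNELS = {
    "vertical":   np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
    "horizontal": np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]),
}

def detect_edges(image, threshold=100.0):
    """Step 2: label above-threshold pixels with the orientation
    (1, 2, ...) of their strongest filter response; 0 = no edge."""
    img = image.astype(float)
    responses = np.stack([np.abs(convolve2d(img, k, mode="same"))
                          for k in KERNELS.values()])
    strongest = responses.argmax(axis=0)   # which orientation won
    strength = responses.max(axis=0)       # how strong the response is
    return np.where(strength > threshold, strongest + 1, 0)
```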

  18. Top-down Constraints • Goal: Extract objects from images – Approach: apply knowledge about how the world works to identify coherent objects

  19. Motion: Optical Flow • Find correspondences in sequential images – Units which move together represent objects

  20. Stereo

  21. Stereo Depth Resolution

  22. Texture and Shading

  23. Edge-Based 2D-to-3D Reconstruction • Assume a world of solid polyhedra with 3-edge vertices • Apply Waltz line labeling – via constraint satisfaction

  24. Basic Object Recognition • Simple idea: – extract 3-D shapes from image – match against “shape library” • Problems: – extracting curved surfaces from image – representing shape of extracted object – representing shape and variability of library object classes – improper segmentation, occlusion – unknown illumination, shadows, markings, noise, complexity, etc. • Approaches: – index into library by measuring invariant properties of objects – alignment of image feature with projected library object feature – match image against multiple stored views (aspects) of library object – machine learning methods based on image statistics

  25. Hand-written Digit Recognition

  26. Summary • Vision is hard: – Noise, ambiguity, complexity • Prior knowledge is essential to constrain the problem – Cohesion of objects, optics, object features • Combine multiple cues – Motion, stereo, shading, texture, ... • Image/object matching: – Library: features, lines, edges, etc. • Apply domain knowledge: optics • Apply machine learning: neural nets, nearest neighbor, CSPs, etc.

  27. Computer Vision Case Study • “Rapid Object Detection using a Boosted Cascade of Simple Features”, Viola & Jones ’01 • Challenge: – Object detection: • Find all faces in arbitrary images – Real-time execution • 15 frames per second – Needs simple features and classifiers

  28. Rapid Object Detection Overview • Fast detection with simple local features – Simple fast feature extraction • Small number of computations per pixel • Rectangular features – Feature selection with Adaboost • Sequential feature refinement – Cascade of classifiers • Increasingly complex classifiers • Repeatedly rule out non-object areas

  29. Picking Features • What cues do we use for object detection? – Not direct pixel intensities – Features • Can encode task-specific domain knowledge (bias) – Difficult to learn directly from data – Reduce the required training set size • A feature system can speed processing

  30. Rectangle Features • Treat rectangles as units – Derive statistics • Two-rectangle features – Two similar rectangular regions • Vertically or horizontally adjacent – Sum pixels in each region • Compute difference between regions

  31. Rectangle Features II • Three-rectangle features – 3 similar rectangles: horizontally/vertically • Sum outside rectangles • Subtract from center region • Four-rectangle features – Compute difference between diagonal pairs • HUGE feature set: ~180,000
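A direct, unoptimized rendering of these features as NumPy slice sums; the coordinates and region sizes are illustrative. (The integral image on slide 33 reduces each region sum to four array references.)

```python
import numpy as np

def region_sum(img, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y)."""
    return int(img[y:y + h, x:x + w].sum())

def two_rect_feature(img, x, y, w, h):
    """Horizontally adjacent pair: left half minus right half."""
    left  = region_sum(img, x,     y, w, h)
    right = region_sum(img, x + w, y, w, h)
    return left - right

def three_rect_feature(img, x, y, w, h):
    """Sum of the two outside rectangles minus the center rectangle."""
    outer  = region_sum(img, x, y, w, h) + region_sum(img, x + 2 * w, y, w, h)
    center = region_sum(img, x + w, y, w, h)
    return outer - center
```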

  32. Rectangle Features

  33. Computing Features Efficiently • Fast detection requires fast feature calculation • Rapidly compute an intermediate representation – “Integral image” – Value at point (x,y) is the sum of all pixels above and to the left – ii(x,y) = Σ_{x'≤x, y'≤y} i(x',y') – Computed in one pass over the image by the recurrences • s(x,y) = s(x,y-1) + i(x,y), where s(x,y) is the cumulative row sum • ii(x,y) = ii(x-1,y) + s(x,y) • Compute any rectangle sum with 4 array references
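The recurrences translate directly to code. A minimal sketch (NumPy assumed; padding the integral image with a zero row and column is a convenience of this sketch, not part of the slide):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y.
    A leading zero row/column lets lookups skip bounds checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over the w x h rectangle at top-left (x, y), using exactly
    4 array references at the rectangle's corners: D - B - C + A."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```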

  34. Rectangle Feature Summary • Rectangle features – Relatively simple – Sensitive to bars, edges, simple structure • Coarse – Rich enough for effective learning – Efficiently computable

  35. Learning an Image Classifier • Supervised training: positive/negative examples • Many learning approaches possible • Adaboost: – Selects features AND trains the classifier – Improves performance of simple classifiers • Training error guaranteed to decrease exponentially fast – Basic idea: simple (weak) classifiers • Boost performance by re-weighting to focus on previous errors

  36. Feature Selection and Training • Goal: Pick only useful features from the ~180,000 – Idea: A small number of features can be effective • Learner selects the single feature that best separates positive and negative examples – Learner selects the optimal threshold for each feature – Classifier: h_j(x) = 1 if p_j f_j(x) < p_j θ_j, 0 otherwise, where f_j is the feature value, θ_j the threshold, and p_j ∈ {+1, -1} the polarity
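A sketch of one boosting round in this scheme, assuming `features` is a 2-D array with one row per candidate feature and one column per training example; the brute-force threshold search stands in for the paper's single-pass sorted version:

```python
import numpy as np

def weak_classify(f_vals, theta, p):
    """The slide's weak classifier: h(x) = 1 if p*f(x) < p*theta, else 0."""
    return (p * f_vals < p * theta).astype(int)

def best_threshold(f_vals, labels, weights):
    """Optimal threshold/polarity for one feature by exhaustive search
    over all observed feature values."""
    best = (0.0, 1, np.inf)
    for theta in np.unique(f_vals):
        for p in (1, -1):
            err = np.sum(weights * (weak_classify(f_vals, theta, p) != labels))
            if err < best[2]:
                best = (theta, p, err)
    return best

def adaboost_round(features, labels, weights):
    """One round: pick the single best (feature, threshold, polarity), then
    shrink the weights of correctly classified examples so the next round
    focuses on this round's errors."""
    best_j, best_params = None, (0.0, 1, np.inf)
    for j, f_vals in enumerate(features):     # one row per candidate feature
        params = best_threshold(f_vals, labels, weights)
        if params[2] < best_params[2]:
            best_j, best_params = j, params
    theta, p, err = best_params
    beta = max(err, 1e-10) / (1.0 - err)      # floor avoids the zero-error edge case
    correct = weak_classify(features[best_j], theta, p) == labels
    weights = weights * np.where(correct, beta, 1.0)
    return (best_j, theta, p), weights / weights.sum()
```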

  37. Basic Learning Results • Initial classification: frontal faces – 200 features – Finds 95% of faces, with 1/14,000 false positives – Very fast • Adding features adds computation time • Features are interpretable – Darker region around the eyes than the nose/cheeks – Eyes are darker than the bridge of the nose

  38. Primary Features

  39. “Attentional Cascade” • Goal: Improved classification, reduced time – Insight: small, fast classifiers can reject many sub-windows • But with very few false negatives – Reject the majority of uninteresting regions quickly – Focus computation on interesting regions • Approach: “degenerate” decision tree • a.k.a. “cascade” • Positive results are passed on to later, more complex classifiers – Negative results are rejected immediately

  40. Cascade Schematic – All sub-windows enter CL 1; each classifier either passes the sub-window on (T) to the next, more complex classifier (CL 2, CL 3, further classifiers) or rejects it (F) immediately.
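The schematic as control flow, in a minimal sketch: each stage is a boolean classifier, and the first F exits immediately, so most sub-windows cost only a stage or two:

```python
def cascade_classify(sub_window, stages):
    """stages: increasingly complex classifiers, each returning True (T:
    pass the sub-window on) or False (F: reject). The first F exits, so
    the expensive later stages never run on obvious non-faces."""
    for stage in stages:
        if not stage(sub_window):
            return False                 # rejected sub-window
    return True                          # survived every stage: detection
```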

  41. Cascade Construction • Each stage is a trained classifier – Tune the threshold to minimize false negatives – Good first-stage classifier: • Two-feature strong classifier – eye/cheek + eye/nose features • Tuned: detects 100% of faces, with 40% false positives – Very computationally efficient • ~60 microprocessor instructions

  42. Cascading • Goal: Reject non-face sub-windows quickly – Most sub-windows are negative • Rejected early in processing, with little effort – Promising regions trigger the full cascade • Relatively rare • Classification becomes progressively more difficult – The most obvious cases have already been rejected • Deeper classifiers are more complex and more error-prone

  43. Cascade Training • Tradeoffs: accuracy vs. cost – More accurate classifiers: more features, more complex – More features, more complex: slower – Difficult optimization • Practical approach (sketched below) – Each stage reduces the false positive rate – Bound the per-stage reduction in false positives and increase in misses – Add features to each stage until its targets are met – Add stages until the overall effectiveness targets are met
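The practical approach, sketched in Python; `train_stage`, the rate targets, and the default numbers are hypothetical stand-ins for details the slide leaves open:

```python
def build_cascade(pos, neg, stage_fp=0.4, stage_det=0.99, overall_fp=1e-6):
    """Hypothetical sketch: per-stage targets (stage_fp, stage_det) bound
    each stage; stages are added until the product of per-stage false
    positive rates meets the overall target."""
    cascade, fp_so_far = [], 1.0
    while fp_so_far > overall_fp and neg:
        # train_stage is a hypothetical helper: it keeps adding features
        # to a boosted classifier until both per-stage targets are met
        stage = train_stage(pos, neg, max_fp=stage_fp, min_det=stage_det)
        cascade.append(stage)
        fp_so_far *= stage_fp            # per-stage rates multiply
        # later stages see only the negatives the cascade still accepts
        neg = [x for x in neg if all(s(x) for s in cascade)]
    return cascade
```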

  44. Results • Task: Detect frontal upright faces – Face/non-face training images • Face: ~5000 hand-labeled instances • Non-face: ~9500 random web-crawl, hand-checked – Classifier characteristics: • 38 layer cascade • Increasing number of features: 1,10,25,… : 6061 – Classification: Average 10 features per window • Most rejected in first 2 layers • Process 384x288 image in 0.067 secs

  45. Detection Tuning • Multiple detections: – Many sub-windows around a face will fire – Create disjoint subsets • For overlapping detections, report only one – Return the average of the corners • Voting: – 3 similarly trained detectors • Majority rules – Improves overall performance
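One way to realize the disjoint-subset merge (the overlap test is an assumption; the slide does not specify one):

```python
def merge_detections(boxes):
    """boxes: list of (x1, y1, x2, y2). Greedily group overlapping boxes
    into disjoint subsets, then report one averaged box per subset."""
    def overlaps(a, b):
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    groups = []
    for box in boxes:
        for g in groups:
            if any(overlaps(box, other) for other in g):
                g.append(box)
                break
        else:
            groups.append([box])
    # One detection per group: the average of each corner coordinate
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```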
