The Experiments for “You-Do, I-Learn” (Presenter: Wenguang Mao)

  1. The Experiments for “You-Do, I-Learn” • Presenter: Wenguang Mao • Instructor: Kristen Grauman • Paper author: Dima Damen

  2. Recap of the Paper • Pipeline diagram with the labels: gaze attention, gaze point, clustering, position, TRO, MOI, gaze area, appearance, neighbor frames • TRO: task-relevant object; MOI: mode of interaction

  4. Experiment Setup • Dataset: Bristol Egocentric Object Interactions Dataset • Egocentric videos recorded at 6 locations • Gaze point on each frame • 3D gaze positions • Gaze fixation on each frame • Ground-truth TRO positions • 3D map of each location, 3D camera position for each frame, … • Code: VLFeat, Matlab toolboxes, and my own programs

  5. Why Gaze Information Is Needed • Given an egocentric image, which part of it am I focusing on? • The center of the image? • Blue point: image center • Red point: gaze point

  7. Why Gaze Information Is Needed • Distance between the image center and the gaze point • The image center is not a good approximation of the gaze point • (a) Desk, (b) Door

  9. Why Gaze Information Is Needed • Distance between the image center and the gaze point (during gaze fixation) • The image center is not a good approximation of the gaze point, even during attention periods • (a) Desk, (b) Door
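The comparison on these slides can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming gaze points are available in pixel coordinates and fixations as a per-frame boolean flag; the function name `center_gaze_distances` and the 640x480 frame size are hypothetical, not taken from the dataset's tools.

```python
import numpy as np

def center_gaze_distances(gaze_xy, frame_w, frame_h, fixation=None):
    """Euclidean pixel distance from the image center to the gaze point, per frame.

    gaze_xy:  (N, 2) gaze points in pixels (assumed format).
    fixation: optional (N,) boolean mask; if given, only fixation frames are kept.
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    center = np.array([frame_w / 2.0, frame_h / 2.0])
    dist = np.linalg.norm(gaze_xy - center, axis=1)
    if fixation is not None:
        dist = dist[np.asarray(fixation, dtype=bool)]
    return dist

# Toy example with made-up gaze points for a hypothetical 640x480 frame
gaze = [[320, 240], [400, 300], [100, 50]]
print(center_gaze_distances(gaze, 640, 480))                       # all frames
print(center_gaze_distances(gaze, 640, 480, [True, True, False]))  # fixation frames only
```

Plotting the two distance distributions side by side gives the "all frames" versus "during fixation" comparison.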

  11. How Gaze Fixation Helps • Is there any TRO in these video clips? • Red dot: gaze point • Gaze fixation helps identify a TRO

  13. How Gaze Fixation Helps • Is there any TRO in these video clips? • Red dot: gaze point • Gaze fixation alone is far from enough to find TROs

  15. How 3D Gaze Positions Help • Blue circles: 3D gaze positions in a video • Red crosses: ground-truth TRO positions • 3D gaze positions are very helpful for identifying TROs • (a) Without gaze-fixation filtering, (b) With gaze-fixation filtering
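One way to put a number on this figure is to count how many 3D gaze points fall near some ground-truth TRO, with and without fixation filtering. The sketch below assumes gaze points and TRO positions share one 3D coordinate frame; `near_tro_fraction` and its `radius` threshold are hypothetical, not a metric from the paper.

```python
import numpy as np

def near_tro_fraction(gaze_3d, tro_3d, radius, fixation=None):
    """Fraction of 3D gaze points lying within `radius` of any ground-truth TRO.

    If `fixation` (a per-frame boolean mask) is given, non-fixation frames are
    dropped first, mimicking "with gaze fixation filtering".
    """
    g = np.asarray(gaze_3d, dtype=float)
    if fixation is not None:
        g = g[np.asarray(fixation, dtype=bool)]
    t = np.asarray(tro_3d, dtype=float)
    # Pairwise distances: rows are gaze points, columns are TROs
    d = np.linalg.norm(g[:, None, :] - t[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= radius))

# Toy data: three gaze points near TROs, one stray glance
gaze = [[0, 0, 0], [0.05, 0, 0], [2, 2, 2], [1, 0, 0]]
tros = [[0, 0, 0], [1, 0, 0]]
print(near_tro_fraction(gaze, tros, 0.1))                             # unfiltered
print(near_tro_fraction(gaze, tros, 0.1, [True, True, False, True]))  # fixation only
```

A higher fraction after filtering corresponds to the tighter grouping around the red crosses in panel (b).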

  17. Clustering 3D Gaze Positions • Correct number of clusters (k-means) • Yellow squares: cluster centers • Given the correct number of TROs, the TROs can easily be identified from 3D gaze positions • (a) Without gaze-fixation filtering, (b) With gaze-fixation filtering
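For reference, a plain k-means (Lloyd's algorithm) over 3D gaze positions looks like the sketch below. It is an illustration, not the presenter's Matlab/VLFeat code; the deterministic farthest-point initialization is an assumption chosen so that results are repeatable, and the number of clusters `k` must be supplied by the caller, which is exactly the sensitivity the following slides examine.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on an (N, 3) array of 3D gaze positions.

    Uses deterministic farthest-point initialization (an illustrative choice).
    Returns (centers, labels).
    """
    x = np.asarray(points, dtype=float)
    centers = [x[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center
        d = np.linalg.norm(x[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point
        labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if empty)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return centers, labels

# Two made-up groups of fixated gaze points around two objects
pts = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
centers, labels = kmeans(pts, 2)
print(centers)
```

Each returned center plays the role of a yellow square, i.e. a candidate TRO location.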

  19. Clustering 3D Gaze Positions • Too few clusters • Yellow squares: cluster centers • Underestimating the number of clusters gives low precision and low recall for identifying TROs • (a) Without gaze-fixation filtering, (b) With gaze-fixation filtering

  21. Clustering 3D Gaze Positions • Too many clusters • Yellow squares: cluster centers • Overestimating the number of clusters gives high recall but low precision • (a) Without gaze-fixation filtering, (b) With gaze-fixation filtering
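The precision/recall trade-off on these two slides can be made concrete by matching cluster centers to ground-truth TRO positions. This sketch assumes a greedy one-to-one match within a hypothetical distance `radius`; a center that lands on an already-matched TRO counts as a false positive, which is what drives precision down when clusters are overestimated. The function name is hypothetical.

```python
import numpy as np

def center_precision_recall(centers, tro_3d, radius):
    """Greedily match each cluster center to the nearest unmatched TRO within `radius`.

    precision = matched centers / all centers
    recall    = matched TROs / all TROs
    """
    c = np.asarray(centers, dtype=float)
    t = np.asarray(tro_3d, dtype=float)
    matched = set()
    hits = 0
    for center in c:
        d = np.linalg.norm(t - center, axis=1)
        for j in np.argsort(d):
            if d[j] > radius:
                break          # no unmatched TRO close enough: false positive
            if j not in matched:
                matched.add(j)
                hits += 1
                break
    precision = hits / len(c) if len(c) else 0.0
    recall = len(matched) / len(t) if len(t) else 0.0
    return precision, recall

tros = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
print(center_precision_recall([[0.02, 0, 0], [0.5, 0.5, 0]], tros, 0.1))  # too few
print(center_precision_recall([[0, 0, 0], [0.05, 0, 0], [1, 0, 0], [0, 1, 0], [3, 3, 3]], tros, 0.1))  # too many
```

With three TROs, the two-center case gives precision 0.5 and recall 1/3, while the five-center case (all three TROs, one duplicate, one stray) gives precision 0.6 and recall 1.0.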

  23. Spectral Clustering • Correct number of clusters • Same result as k-means • (a) k-means, (b) spectral

  25. Spectral Clustering • Too few clusters • Same result as k-means • (a) k-means, (b) spectral

  27. Spectral Clustering • Too many clusters • Outperforms k-means, with high precision and high recall • (a) k-means, (b) spectral
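The slides do not say which spectral clustering variant was used; one common choice is the normalized method of Ng, Jordan and Weiss, sketched below with NumPy under that assumption. The Gaussian affinity bandwidth `sigma` is a hypothetical parameter that would need tuning to the scale of the 3D map, and the small k-means helper repeats the farthest-point initialization so the block runs on its own.

```python
import numpy as np

def _kmeans_labels(x, k, n_iter=100):
    # Lloyd's k-means with deterministic farthest-point initialization
    centers = [x[0]]
    for _ in range(1, k):
        d = np.linalg.norm(x[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)

def spectral_clustering(points, k, sigma=1.0):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) of 3D gaze points."""
    x = np.asarray(points, dtype=float)
    # Gaussian affinity between all pairs of points, with a zero diagonal
    sq = np.sum((x[:, None] - x[None]) ** 2, axis=2)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    # Symmetric normalization D^{-1/2} W D^{-1/2}; an isolated point with a
    # too-small sigma would have a zero degree here and break the division
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    m = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the k largest
    _, vecs = np.linalg.eigh(m)
    u = vecs[:, -k:]
    # Normalize rows to unit length, then cluster the rows with k-means
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    return _kmeans_labels(u, k)

pts = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
print(spectral_clustering(pts, 2, sigma=1.0))
```

Because the embedding is built from pairwise affinities rather than raw coordinates, surplus clusters tend to split groups less arbitrarily than raw k-means, which is one possible explanation for the behavior on this slide.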

  28. What Is the Limitation of Gaze Positions? • Can we use 3D gaze positions alone? • No, because TROs can move • How can we solve this problem? • Appearance

  30. Appearance • How HoG features represent an image • HoG is good at describing boundaries
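To see why HoG captures boundaries, the sketch below computes a simplified HoG-style histogram for a single cell: gradient orientations, weighted by magnitude, voted into 9 unsigned bins. It omits the bin interpolation and block normalization of full HoG and is not VLFeat's `vl_hog`; the function name is hypothetical.

```python
import numpy as np

def cell_orientation_histogram(patch, n_bins=9):
    """Simplified HoG-style descriptor for one cell: a magnitude-weighted
    histogram of unsigned gradient orientations in [0, 180) degrees."""
    p = np.asarray(patch, dtype=float)
    gx = np.zeros_like(p)
    gy = np.zeros_like(p)
    # Centered [-1, 0, 1] differences on interior pixels (border left at zero)
    gx[:, 1:-1] = p[:, 2:] - p[:, :-2]
    gy[1:-1, :] = p[2:, :] - p[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-12)

# A vertical boundary: dark left half, bright right half
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
print(cell_orientation_histogram(patch))
```

A patch containing a vertical boundary puts all of its weight in the 0° bin, and a horizontal boundary puts it in the 80°-100° bin (index 4), so the histogram encodes the direction of the object's edges.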

  31. Identifying TROs Based on Appearance • For each frame, extract HoG features from the region around the gaze point • Build a bag-of-words (BoW) representation for each frame • Cluster the frames • Represent each cluster by the frame closest to its center • Compare the appearance of these representative frames with the ground truth
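The BoW and representative-frame steps above can be sketched as follows, assuming a codebook has already been learned (for example by k-means over HoG descriptors) and the frames have already been clustered. Both function names are hypothetical, and Euclidean distance between histograms is an assumption; the slides do not specify the distance used.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize one frame's local descriptors to their nearest codewords and
    return an L1-normalized bag-of-words histogram."""
    d = np.asarray(descriptors, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    words = np.linalg.norm(d[:, None, :] - cb[None, :, :], axis=2).argmin(axis=1)
    hist = np.bincount(words, minlength=len(cb)).astype(float)
    return hist / max(hist.sum(), 1.0)

def representative_frames(hists, labels, centers):
    """For each cluster, return the index of the frame whose BoW histogram is
    closest (Euclidean) to the cluster center."""
    h = np.asarray(hists, dtype=float)
    labels = np.asarray(labels)
    reps = []
    for j, c in enumerate(np.asarray(centers, dtype=float)):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # empty cluster: no representative frame
        reps.append(int(idx[np.argmin(np.linalg.norm(h[idx] - c, axis=1))]))
    return reps

# Toy 2-word codebook and four frames already grouped into two clusters
print(bow_histogram([[0.1, 0], [0.9, 1], [1, 1.1], [0, 0.2]], [[0, 0], [1, 1]]))
print(representative_frames([[1, 0], [0.9, 0.1], [0, 1], [0.2, 0.8]],
                            [0, 0, 1, 1], [[0.92, 0.08], [0.15, 0.85]]))
```

The frames returned by `representative_frames` are the ones that would then be inspected against the ground-truth TROs.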

  32. Appearance • Five TROs around the desk: tape, socket, screwdriver, charger, box

  34. Results • Representative frames: Success (box), Success (tape), Duplicated (box), Success (charger), Failure • Two TROs are missed; appearance is not as effective as position
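The Success / Duplicated / Failure labels in the results slides follow a simple bookkeeping rule, which the sketch below encodes under the assumption that each representative frame has been hand-labeled with the TRO it shows (or `None` if it shows none). The function name is hypothetical; with the labels from this slide it reports the two missed TROs.

```python
def tally_representatives(rep_labels, tros):
    """Classify each cluster representative frame.

    rep_labels: the TRO visible in each representative frame, or None.
    The first frame showing a given TRO is a Success, later frames showing the
    same TRO are Duplicated, and frames showing no TRO are Failures.
    Returns (outcomes, missing TROs).
    """
    seen = []
    outcome = []
    for label in rep_labels:
        if label is None:
            outcome.append("Failure")
        elif label in seen:
            outcome.append("Duplicated")
        else:
            seen.append(label)
            outcome.append("Success")
    missing = [t for t in tros if t not in seen]
    return outcome, missing

tros = ["tape", "socket", "screwdriver", "charger", "box"]
outcome, missing = tally_representatives(["box", "tape", "box", "charger", None], tros)
print(outcome)  # ['Success', 'Success', 'Duplicated', 'Success', 'Failure']
print(missing)  # ['socket', 'screwdriver']
```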

  36. Using Neighbor Frames • Representative frames: Failure, Success (charger), Success (box), Success (driver), Success (tape) • One TRO is missed; using neighbor frames improves performance
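The slides do not detail how neighbor frames are combined. One plausible reading, assumed here, is to average each frame's BoW histogram with those of the frames around it before clustering, which can smooth out single frames where the gaze region is blurred or off-target. `pool_neighbor_frames` and `half_window` are hypothetical names.

```python
import numpy as np

def pool_neighbor_frames(hists, half_window=2):
    """Replace each frame's BoW histogram with the mean over frames
    t - half_window .. t + half_window (clipped at the video boundaries)."""
    h = np.asarray(hists, dtype=float)
    n = len(h)
    pooled = np.empty_like(h)
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        pooled[t] = h[lo:hi].mean(axis=0)
    return pooled

# Three frames with a 2-word vocabulary; the middle frame is an outlier
print(pool_neighbor_frames([[1, 0], [0, 1], [1, 0]], half_window=1))
```

The pooled histograms would then replace the per-frame ones as input to the frame clustering step.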

  38. Overestimating the Number of Clusters • Representative frames: Failure, Success (box), Success (driver), Success (charger), Success (tape), Duplicated (box), Duplicated (box), Duplicated (driver) • One TRO is missed; overestimating helps identify more TROs

  40. Also Using Neighbor Frames • Representative frames: Failure, Success (tape), Success (socket), Duplicated (socket), Success (driver), Success (box), Success (charger), Duplicated (box) • All TROs are found

  41. Conclusion • Gaze information is important and necessary for egocentric videos; the image center is not a good approximation of the gaze point • Gaze fixation helps identify TROs, but is not enough on its own • 3D gaze positions carry rich information about TROs, but the choice of clustering method and the estimated number of TROs are critical • Use spectral clustering, and do not worry about overestimating the number of clusters • Appearance is another important feature for identifying TROs • Using neighbor frames improves performance • Overestimating the number of TROs helps reduce false negatives
