Rent3D: Floor-Plan Priors for Monocular Layout Estimation
Chenxi Liu¹,∗, Alexander Schwing²,∗, Kaustav Kundu², Raquel Urtasun², Sanja Fidler²
¹Tsinghua University, ²University of Toronto
Liu, Schwing, Kundu, Urtasun, Fidler Rent3D 1 / 22

How Many Times Have You Looked for Apartments?
United States: 11.7% per year
Craigslist: 90,000 rental ads per day in New York alone
10 million people visit the website per day

Finding an Apartment/House is a Pain...
Particularly during a winter in Toronto

Renting Apartments

Example Rental Data
Plus some meta information, e.g. wall height

Rent3D: View Rental Ads in 3D
Camera localization within apartment

Related Work
Room layout estimation
⊲ Hedau et al., 2009, 2012
⊲ Lee et al., 2010
⊲ Schwing et al., 2012, 2013
⊲ Del Pero et al., 2011, 2012
⊲ Choi et al., 2013
Virtual tours
⊲ Xiao & Furukawa, 2012
3D indoor reconstruction from large photo collections or video
⊲ Cabral & Furukawa, 2014
⊲ Brualla et al., 2014
Indoor localization (video, depth sensors)
⊲ Project Tango
⊲ SLAM work
Our work: 3D indoor reconstruction and localization using monocular imagery
[Figures: Lee et al., 2010; Xiao & Furukawa, 2012; Cabral & Furukawa, 2014]

Overview
Accurate camera localization:
⊲ Scene cues
⊲ Semantic cues
⊲ Geometric cues by exploiting the dimension information

Formulation
$r \in \{1, \dots, R\}$: discrete random variable representing the room
$c_r \in \{1, \dots, |C_r|\}$: discrete variable representing which wall of room $r$ the picture is facing ($|C_r|$ is the number of walls in room $r$)
$y$: rays representing a room layout
Front wall is the plane defined by $vp_0$ and $vp_1$
Typical parametrization for room layout [Hedau et al., 2009]:
⊲ Room is a 3D cuboid
⊲ $y = (y_1, y_2, y_3, y_4)$: 4 rays needed to define it
[Figure: rays $y_1, \dots, y_4$ and $r_1, \dots, r_4$ drawn from the vanishing points $vp_0$, $vp_1$, $vp_2$]
We formulate the problem as inference in a Conditional Random Field with the following energy:
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$

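The energy on this slide is a plain sum of three terms over the variables (r, c_r, y). The sketch below shows only that structure; the class name `EnergyTerms`, the helper `num_discrete_hypotheses`, and the toy potentials are illustrative assumptions and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A layout is a tuple of ray parameters, e.g. (y1, y2, y3, y4).
Layout = Tuple[float, ...]


@dataclass
class EnergyTerms:
    """Hypothetical container for the three potentials of E(r, c_r, y)."""
    scene_type: Callable[[int], float]            # E_scene type(r)
    layout: Callable[[int, int, Layout], float]   # E_layout(r, c_r, y)
    win: Callable[[int, int, Layout], float]      # E_win(r, c_r, y)

    def total(self, r: int, c_r: int, y: Layout) -> float:
        # E(r, c_r, y) = E_scene type(r) + E_layout(r, c_r, y) + E_win(r, c_r, y)
        return self.scene_type(r) + self.layout(r, c_r, y) + self.win(r, c_r, y)


def num_discrete_hypotheses(walls_per_room: Sequence[int]) -> int:
    """Number of (r, c_r) pairs: the sum of |C_r| over all rooms r."""
    return sum(walls_per_room)


# Toy potentials, chosen only so the arithmetic is easy to follow.
toy = EnergyTerms(
    scene_type=lambda r: 0.5 * r,
    layout=lambda r, c_r, y: sum(y),
    win=lambda r, c_r, y: float(c_r),
)
```

With these toy potentials, `toy.total(2, 3, (1.0, 2.0, 3.0, 4.0))` adds 1.0, 10.0 and 3.0. Keeping the terms as separate callables mirrors the slide, where each potential is introduced on its own.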
Energy Terms: Scene Type
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Potential: score of a scene classifier predicting scene type (e.g., bedroom, kitchen, reception)

Energy Terms: Layout
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Image cues: Orientation Map [Lee et al., 2009], Geometric Context [Hedau et al., 2009]
Potential: counts of blue, red, etc. pixels inside and outside of each wall
Fast computation using integral geometry [Schwing et al., 2012]
$y = (y_1, y_2, y_3, \cancel{y_4})$, $y_4 = f(r, c_r, y_1, y_2, y_3)$
Additional constraint on $y$: camera is inside the room
[Figure: rays $y_1, \dots, y_4$ and $r_1, \dots, r_4$ drawn from the vanishing points $vp_0$, $vp_1$, $vp_2$]

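The layout potential counts labelled pixels inside and outside each wall, and the slide cites integral geometry [Schwing et al., 2012] to make this fast. The sketch below does not reproduce that method, whose regions are bounded by rays through vanishing points. It illustrates the simpler, related idea of a summed-area table over axis-aligned rectangles, where each count costs four lookups after a one-time pass over the image. All function names are assumptions made for illustration.

```python
def summed_area_table(mask):
    """Build S with S[i][j] = number of 1-pixels in rows < i and columns < j."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += mask[i][j]
            S[i + 1][j + 1] = S[i][j + 1] + row_sum
    return S


def count_in_rect(S, top, left, bottom, right):
    """Count 1-pixels in rows top..bottom-1 and columns left..right-1."""
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]


def inside_outside_counts(mask, top, left, bottom, right):
    """Return (pixels of the label inside the rectangle, pixels outside it)."""
    S = summed_area_table(mask)
    total = S[-1][-1]
    inside = count_in_rect(S, top, left, bottom, right)
    return inside, total - inside


# Binary mask for one orientation label (1 = pixel carries that label).
label_mask = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
```

In practice the table would be built once per label and reused for every layout hypothesis, which is what makes repeated counting cheap.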
Energy Terms: Windows
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Window-background segmentation
Potential: count window pixels inside and outside the window area
[Figure: window area drawn relative to the vanishing points $vp_0$, $vp_1$, $vp_2$]

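A minimal sketch of the window potential follows, assuming the segmentation and the hypothesised window area are both given as binary masks of equal size. The linear combination and the weights `w_inside` and `w_outside` are hypothetical placeholders; the slide states only that window pixels are counted inside and outside the window area.

```python
def window_counts(segmentation, window_area):
    """Count window pixels that fall inside and outside the hypothesised area.

    segmentation[i][j] is 1 if pixel (i, j) was segmented as window.
    window_area[i][j] is 1 if pixel (i, j) lies in the hypothesised window area.
    """
    inside = outside = 0
    for seg_row, area_row in zip(segmentation, window_area):
        for is_window, in_area in zip(seg_row, area_row):
            if is_window:
                if in_area:
                    inside += 1
                else:
                    outside += 1
    return inside, outside


def window_energy(segmentation, window_area, w_inside=-1.0, w_outside=1.0):
    """Hypothetical linear potential: reward agreement, penalise stray pixels."""
    inside, outside = window_counts(segmentation, window_area)
    return w_inside * inside + w_outside * outside


segmentation = [[1, 1, 0],
                [0, 1, 0]]
window_area = [[1, 0, 0],
               [1, 1, 0]]
```

A hypothesis whose window area covers the segmented window pixels gets a lower energy than one that leaves them outside, which is the behaviour an energy-minimisation formulation needs.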
Learning and Inference
We are minimizing the energy:
$(r^*, c_r^*, y^*) = \operatorname{argmin}_{r, c_r, y} \left[ E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y) \right]$
Inference:
⊲ Exhaustive enumeration of $r$ and $c_r$
⊲ Exact branch and bound inference for $y$ [Schwing & Urtasun, 2012]
We use a structured SVM (S-SVM) for training

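The inference scheme can be sketched as two nested loops over r and c_r with an inner minimisation over y. As a loudly stated simplification, the inner step below is a brute-force search over a small candidate grid, not the exact branch and bound of [Schwing & Urtasun, 2012]. The function name `minimize_energy`, its arguments, and the toy energy are assumptions for illustration.

```python
import itertools
import math


def minimize_energy(energy, walls_per_room, ray_candidates):
    """Return (best_energy, (r, c_r, y)) by exhaustive search.

    energy(r, c_r, y) -> float is the total energy E(r, c_r, y).
    walls_per_room[r - 1] is |C_r|, the number of walls in room r.
    ray_candidates is one list of candidate values per free ray.
    """
    best_energy, best_state = math.inf, None
    for r, n_walls in enumerate(walls_per_room, start=1):   # enumerate rooms
        for c_r in range(1, n_walls + 1):                   # enumerate walls
            # Brute-force stand-in for the exact branch and bound over y.
            for y in itertools.product(*ray_candidates):
                e = energy(r, c_r, y)
                if e < best_energy:
                    best_energy, best_state = e, (r, c_r, y)
    return best_energy, best_state


def toy_energy(r, c_r, y):
    """Toy energy with a unique minimum at r = 2, c_r = 3, y = (0.5, 0.5, 0.5)."""
    return (r - 2) ** 2 + (c_r - 3) ** 2 + sum((v - 0.5) ** 2 for v in y)


result = minimize_energy(toy_energy, [4, 4], [[0.0, 0.5, 1.0]] * 3)
```

The outer enumeration stays small because the number of (r, c_r) pairs equals the number of walls in the apartment, which the dataset slide reports as 31 on average. The cost is dominated by the search over y, which is where an exact branch and bound method pays off.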
Dataset
We crawled a London apartment rental site
# apartments                     215
# images                         1570
# indoor images                  1259
# images without GT alignment    82
avg. # rooms per apt             6
avg. # walls per apt             31
avg. # windows per apt           6
avg. # doors per apt             9

Apartments in Central London Are Not Small
Biggest apartment in dataset: 16 rooms, 5 bedrooms, 88 walls