Computing the Depth of a Flat


  1. Computing the Depth of a Flat. Marshall Bern (Xerox PARC) and David Eppstein (UC Irvine).

  2. Robust Regression. Given data with dependent and independent variables, describe the dependent variables as a function of the independent ones. The fit should be robust against arbitrary outliers. Prefer distance-free methods for robustness against skewed and data-dependent noise.

  3. Example: Data Depth (no variables independent). Fit a point to a cloud of data points. Depth of a fit x = minimum number of data points in any halfspace containing x. Tukey median = point with the maximum possible depth.
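
The depth definition on this slide can be checked by brute force in the plane. The following is a minimal sketch, not the method of any paper cited in these slides: it assumes two-dimensional data with exactly representable coordinates (for example integers), and the function name `tukey_depth_2d` is made up for illustration. It relies on the observation that a minimizing closed halfplane can be taken with x on its boundary, and that the count only changes when the boundary passes through a data point.

```python
def tukey_depth_2d(x, points):
    """Brute-force halfspace (Tukey) depth of x among planar points, O(n^2).

    Hypothetical helper for illustration; assumes exact coordinates.
    """
    offsets = [(px - x[0], py - x[1]) for px, py in points]
    at_x = sum(1 for w in offsets if w == (0, 0))  # points coinciding with x
    dirs = [w for w in offsets if w != (0, 0)]
    if not dirs:
        return at_x
    best = len(points)
    for vx, vy in dirs:
        # Boundary line through x along v. Classify every other offset.
        left = right = fwd = back = 0
        for wx, wy in dirs:
            cross = vx * wy - vy * wx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif vx * wx + vy * wy > 0:
                fwd += 1   # on the line, same ray as v
            else:
                back += 1  # on the line, opposite ray
        # Tilting the boundary slightly about x keeps one open side
        # and exactly one of the two rays of collinear points.
        best = min(best, left + fwd, left + back, right + fwd, right + back)
    return best + at_x
```

For the four corners of a square plus its center, the center has depth 3 and each corner has depth 1, which agrees with the center being the Tukey median of that cloud.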

  4. Known Results for Data Depth. Tukey median has depth ≥ $\lceil n/(d+1) \rceil$ [Radon 1946]. A deep (but not optimally deep) point can be found in time polynomial in n and d [Clarkson, Eppstein, Miller, Sturtivant, Teng 1996]. The deepest point can be found in time $O(n^d)$ (linear program with that many constraints). Computing the depth of a point is NP-complete for variable d [Johnson & Preparata 1978] and takes $O(n^{d-1} + n \log n)$ time for fixed d [Rousseeuw & Struyf 1998].

  5. Example: Regression Depth (all but one variable independent) [Hubert & Rousseeuw 1998]. Fit a hyperplane to a cloud of data points. Nonfit = vertical hyperplane (doesn't predict the dependent variable). Depth of a fit = minimum number of data points crossed while moving to a nonfit.
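
For a line in the plane, "moving to a nonfit" can be read as rotating the line about a pivot on it until it is vertical. The sketch below is a brute-force illustration under that reading, not the algorithm of the cited papers. It assumes planar data with distinct x-coordinates, and the name `regression_depth_2d` is hypothetical. Rotating about a pivot at abscissa t crosses either the points left of t on or above the line together with the points right of t on or below it, or the mirror-image set. Points lying on the line are counted either way.

```python
def regression_depth_2d(a, b, points):
    """Brute-force regression depth of the line y = a*x + b, O(n^2).

    Hypothetical helper; assumes a non-empty list of points with
    distinct x-coordinates.
    """
    res = [(x, y - (a * x + b)) for x, y in points]  # (x, residual)
    xs = sorted({x for x, _ in res})
    # One pivot left of all data, then one in each gap between x-values.
    # A pivot right of all data gives the same counts as the leftmost one.
    pivots = [xs[0] - 1.0] + [(u + v) / 2.0 for u, v in zip(xs, xs[1:])]
    best = len(points)
    for t in pivots:
        left_up = sum(1 for x, r in res if x < t and r >= 0)
        left_dn = sum(1 for x, r in res if x < t and r <= 0)
        right_up = sum(1 for x, r in res if x > t and r >= 0)
        right_dn = sum(1 for x, r in res if x > t and r <= 0)
        best = min(best, left_up + right_dn, left_dn + right_up)
    return best
```

A line passing through every data point has depth n, while a line lying entirely below the data has depth 0, since it can be tilted to vertical without crossing anything.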

  6. Known Results for Regression Depth. Deepest hyperplane has depth ≥ $\lceil n/(d+1) \rceil$ [Amenta, Bern, Eppstein, Teng 1998; Mizera 1998]. The deepest hyperplane can be found in time $O(n^d)$ (breadth-first search in the arrangement). The planar deepest line can be found in $O(n \log n)$ [van Kreveld et al. 1999; Langerman & Steiger 2000]. Computing the depth of a hyperplane is NP-complete for variable d [Amenta et al. 1998] and takes $O(n^{d-1} + n \log n)$ time for fixed d [Rousseeuw & Struyf 1998].

  7. Multivariate Regression Depth (any number k of independent variables) [Bern & Eppstein 2000]. Definition of depth for a k-flat. Equals data depth for $k = 0$. Equals regression depth for $k = d - 1$. Deepest flat has depth $\Omega(n)$. Conjecture: depth ≥ $\lceil n / ((k+1)(d-k)+1) \rceil$; true for $k = 0$, $k = 1$, $k = d - 1$.
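
As a quick consistency check, the conjectured bound can be specialized to the three cases the slide lists. This assumes the slide's fraction reads $n / ((k+1)(d-k)+1)$ inside a ceiling, which is a reconstruction of the garbled original:

```latex
\[
\operatorname{depth} \;\ge\; \left\lceil \frac{n}{(k+1)(d-k)+1} \right\rceil
\]
\[
k = 0:\ \left\lceil \frac{n}{d+1} \right\rceil, \qquad
k = d-1:\ \left\lceil \frac{n}{d \cdot 1 + 1} \right\rceil
        = \left\lceil \frac{n}{d+1} \right\rceil, \qquad
k = 1:\ \left\lceil \frac{n}{2(d-1)+1} \right\rceil
      = \left\lceil \frac{n}{2d-1} \right\rceil
\]
```

The cases $k = 0$ and $k = d - 1$ reduce to the same $\lceil n/(d+1) \rceil$ bound quoted for data depth and regression depth.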

  8. New Results. Computing the depth of a k-flat takes $O(n^{d-2} + n \log n)$ time when $0 < k < d - 1$. This saves a factor of n compared to similar results for regression depth and data depth. Deterministic $O(n \log n)$ for lines in space ($k = 1$, $d = 3$). Randomized $O(n^{d-2})$ for all other cases. Likely can be derandomized using ε-net techniques.

  9. Projective Geometry. Augment Euclidean geometry by “points at infinity”: one infinite point per family of parallel lines. The set of infinite points forms the “hyperplane at infinity”. Equivalently, view hyperplanes and points as equators and pairs of poles on a sphere. Nonfit = k-flat touching some particular $(d-k-1)$-flat at infinity.

  10. Projective Duality. Incidence-preserving correspondence between k-flats and $(d-k-1)$-flats. The cloud of data points becomes an arrangement of hyperplanes. In coordinates (two-dimensional case): $(a, b) \mapsto y = ax + b$ and $y = mx + c \mapsto (-m, c)$.
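
The two-dimensional coordinate maps can be checked directly. The sketch below uses hypothetical helper names and represents a non-vertical line $y = mx + c$ as the pair `(m, c)`. It tests that a point lies on a line exactly when the dual point lies on the dual line, since both conditions reduce to $b = ma + c$.

```python
def dual_of_point(a, b):
    """Point (a, b) maps to the line y = a*x + b, stored as (slope, intercept)."""
    return (a, b)


def dual_of_line(m, c):
    """Line y = m*x + c maps to the point (-m, c)."""
    return (-m, c)


def point_on_line(point, line):
    """True if the point satisfies y = slope*x + intercept."""
    x, y = point
    slope, intercept = line
    return y == slope * x + intercept


def incidence_preserved(point, line):
    """Compare incidence before and after dualizing both objects."""
    before = point_on_line(point, line)
    after = point_on_line(dual_of_line(*line), dual_of_point(*point))
    return before == after
```

For example, the point (2, 7) lies on $y = 3x + 1$, and its dual line $y = 2x + 7$ passes through the dual point (-3, 1).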

  11. Crossing Distance. The crossing distance between a j-flat and a k-flat in a hyperplane arrangement = minimum number of hyperplanes crossed by any line segment connecting the two flats (including line segments “through infinity”).
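
The simplest instance is the crossing distance between two points (two 0-flats) in a planar line arrangement. The sketch below is an illustration of the definition only, with a hypothetical function name, and it assumes no line of the arrangement passes through either point. The projective line through the two points meets every other line exactly once, so each line is crossed either by the direct segment or by the complementary segment through infinity.

```python
def crossing_distance_points(p, q, lines):
    """Crossing distance between points p and q in a planar line arrangement.

    Each line is (a, b, c), meaning a*x + b*y + c = 0. Hypothetical helper;
    assumes no line passes through p or q.
    """
    def side(pt, ln):
        return ln[0] * pt[0] + ln[1] * pt[1] + ln[2]

    # Lines crossed by the direct segment are those separating p from q.
    separating = sum(1 for ln in lines if side(p, ln) * side(q, ln) < 0)
    # Every remaining line is crossed by the segment through infinity.
    return min(separating, len(lines) - separating)
```

With the three vertical lines x = 1, x = 2, x = 3 and the horizontal line y = 5, the points (0, 0) and (10, 0) have crossing distance 1: the segment through infinity crosses only y = 5, at its point at infinity.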

  12. Definition of Depth. Depth of a k-flat F = crossing distance between dual(F) and dual($(d-k-1)$-flat at infinity). In primal space, this is the minimum number of data points in a double wedge bounded by F and by the $(d-k-1)$-flat at infinity. A nonfit always has depth zero (zero-length line segment, empty wedge).

  13. Parametrizing Line Segments. Let $F_1$, $F_2$ be flats (unoriented projective spaces). If $F_1 \cap F_2 = \emptyset$, any pair $(p_1 \in F_1, p_2 \in F_2)$ determines a unique line through them. One more bit of information is needed to specify which of the two line segments is meant: use the double covers (oriented projective spaces) $O_1$, $O_2$. This gives a two-to-one correspondence $O_1 \times O_2 \mapsto$ line segments.

  14. When does a segment cross a hyperplane? The set of line segments crossing hyperplane H is $h_1 \oplus h_2$, where the $h_i$ are halfspaces in $O_i$ with boundary($h_i$) $= H \cap O_i$. Or more simply, it is a disjoint union of two sets of the form halfspace × halfspace. [Figure labels: $O_1$, $O_2$, $\infty$, $F_1^+$, $F_1^-$, $F_2^+$, $F_2^-$.] Line segment with fewest crossings = point covered the fewest times by such sets.

  15. Algorithm for $k = 1$, $d = 3$: we want the point in the torus $O_1 \times O_2$ covered by the fewest rectangles $h_1 \times h_2$. Sweep left to right (i.e., by $O_1$-coordinate), using a segment tree to keep track of the shallowest point on the sweep line. Time: $O(n \log n)$. Algorithm for Higher Dimensions: replace the segment tree by the history tree of a randomized incremental arrangement, and replace the sweep by a traversal of the history tree. This gives $O(n^{j+k-1})$ for the crossing distance between a j-flat and a k-flat ⇒ $O(n^{d-2})$ for flat depth.
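
The sweep for the $k = 1$, $d = 3$ case can be sketched as follows. This is a simplified illustration, not the authors' implementation: it works on the unit square $[0,1) \times [0,1)$ instead of a torus, takes plain half-open rectangles `(x1, x2, y1, y2)` as input (an arc that wraps around would first have to be split into two intervals), and the names `MinCoverTree` and `min_coverage` are hypothetical. The segment tree supports range addition and a global minimum, which is the "shallowest point on the sweep line".

```python
class MinCoverTree:
    """Segment tree over `size` elementary intervals: range add, global min."""

    def __init__(self, size):
        self.size = size
        self.mn = [0] * (4 * size)   # min coverage in subtree, incl. own add
        self.add = [0] * (4 * size)  # pending addition for the whole node

    def update(self, lo, hi, delta, node=1, left=0, right=None):
        """Add delta to elementary intervals lo .. hi-1."""
        if right is None:
            right = self.size
        if hi <= left or right <= lo:
            return
        if lo <= left and right <= hi:
            self.mn[node] += delta
            self.add[node] += delta
            return
        mid = (left + right) // 2
        self.update(lo, hi, delta, 2 * node, left, mid)
        self.update(lo, hi, delta, 2 * node + 1, mid, right)
        self.mn[node] = self.add[node] + min(self.mn[2 * node],
                                             self.mn[2 * node + 1])

    def minimum(self):
        return self.mn[1]


def min_coverage(rects):
    """Fewest rectangles covering any point of the unit square.

    rects: list of half-open boxes (x1, x2, y1, y2) inside [0,1) x [0,1).
    """
    ys = sorted({0.0, 1.0} | {y for r in rects for y in r[2:]})
    index = {y: i for i, y in enumerate(ys)}
    events = []
    for x1, x2, y1, y2 in rects:
        events.append((x1, +1, index[y1], index[y2]))  # rectangle enters
        events.append((x2, -1, index[y1], index[y2]))  # rectangle leaves
    events.sort()
    tree = MinCoverTree(len(ys) - 1)
    best = len(rects)
    x_prev = 0.0
    i = 0
    while i < len(events):
        x = events[i][0]
        if x > x_prev:
            best = min(best, tree.minimum())  # slab [x_prev, x)
        while i < len(events) and events[i][0] == x:
            _, delta, lo, hi = events[i]
            tree.update(lo, hi, delta)
            i += 1
        x_prev = x
    if x_prev < 1.0:
        best = min(best, tree.minimum())  # final slab up to x = 1
    return best
```

Sorting the events dominates the cost, and each of the $2n$ updates takes $O(\log n)$ time, which matches the $O(n \log n)$ bound stated on the slide. In the slide's setting each hyperplane would contribute two such rectangles, one for each of the two halfspace × halfspace sets.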

  16. Conclusions. Presented an efficient algorithm for testing depth. Many open problems remain in algorithms, combinatorics, and statistics. How can the deepest flat be found efficiently? What is its depth? Can we find deep flats efficiently when d is variable? Do local optimization heuristics work? Are similar ideas of depth useful for nonlinear regression?
