Jerry in the Age of Trees
Werner Stuetzle
Department of Statistics, UW
May 15, 2019
At a time long long ago
The Car
The Fashion
The Band
The Stanford Stat Computing Facility
The Frontier
Virtually Unlimited Storage
The Message Should be Clear
Who’s this Cool Dude??
Gems - Not so Hidden
PRIM-9: An interactive multidimensional data display and analysis system (with Mary Anne Fisherkeller and John Tukey, 1974, 208 citations)
A Projection Pursuit algorithm for exploratory data analysis (with John Tukey, 1974, 2245 citations)
An algorithm for finding best matches in logarithmic time (with Jon Bentley and Ari Finkel, 1976, 3150 citations)
Data structures for range searching (with Jon Bentley, 1979, 814 citations)
A recursive partitioning decision rule for nonparametric classification (1977, 507 citations)
A tree-structured approach to nonparametric multiple regression (acknowledges Leo Breiman, Charles Stone, Larry Rafsky, 1979, 67 citations)
Fast algorithms for constructing minimal spanning trees in coordinate spaces (with Jon Bentley, 1978, 137 citations)
Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests (with Larry Rafsky, 1979, 606 citations)
Hidden Gems
A nonparametric procedure for comparing multivariate point sets (with Sam Steppel, 1973, 13 citations)
Given: Two samples S_1 and S_2.
Question: Are they from the same population?
Idea:
◮ For each obs i in S_1 ∪ S_2, count the number of S_1 obs among its k nearest neighbors ⇒ m_i
◮ If they are from the same population, then the distribution of m_i for obs in S_1 and obs in S_2 should be the same.
◮ Comparison of univariate distributions can be calibrated using permutations.
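The idea above can be sketched in a few lines of standard-library Python. This is only an illustration, not the procedure from the 1973 paper: the function names (`neighbor_indices`, `mean_gap`, `knn_two_sample_test`) are hypothetical, and the choice of test statistic (the absolute difference between the mean of m_i over S_1 and over S_2) is an assumption made here to keep the comparison of the two univariate distributions simple.

```python
import math
import random

def neighbor_indices(points, k):
    """Indices of the k nearest neighbors (Euclidean) of each point, excluding itself."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result

def mean_gap(neighbors, labels):
    """Assumed statistic: |mean of m_i over S_1 - mean of m_i over S_2|,
    where m_i is the number of S_1 obs (label 1) among the neighbors of obs i."""
    m = [sum(labels[j] for j in nbrs) for nbrs in neighbors]
    m1 = [mi for mi, lab in zip(m, labels) if lab == 1]
    m2 = [mi for mi, lab in zip(m, labels) if lab == 0]
    return abs(sum(m1) / len(m1) - sum(m2) / len(m2))

def knn_two_sample_test(s1, s2, k=5, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    points = list(s1) + list(s2)
    labels = [1] * len(s1) + [0] * len(s2)
    # Neighbors depend only on the pooled points, so compute them once.
    neighbors = neighbor_indices(points, k)
    observed = mean_gap(neighbors, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # permute sample membership
        if mean_gap(neighbors, shuffled) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Because the neighbor lists do not change when labels are permuted, each permutation only costs a pass over the stored neighbor indices. Any other univariate two-sample statistic could be substituted for `mean_gap`.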
Data analysis techniques for high energy particle physics (1974, 45 citations)
Given: Two sets of features X and Y observed for the same collection of objects.
Question: Are they independent?
Idea:
◮ For each obs i, find its k nearest neighbors in X-space and its k nearest neighbors in Y-space.
◮ Find m_i, the number of shared nearest neighbors.
◮ Compare to the permutation distribution.
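A minimal sketch of this shared-neighbor idea, again in standard-library Python. The names (`neighbor_sets`, `mean_shared`, `knn_independence_test`) are hypothetical, and summarizing the m_i by their mean is an assumption made here; the slide only says that the m_i are compared to a permutation distribution. Permuting the rows of Y breaks the pairing with X while leaving both marginal distributions unchanged.

```python
import math
import random

def neighbor_sets(points, k):
    """Set of indices of the k nearest neighbors of each point, excluding itself."""
    sets = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        sets.append(set(others[:k]))
    return sets

def mean_shared(nx, ny):
    """Assumed statistic: average over i of m_i, the number of shared neighbors."""
    return sum(len(a & b) for a, b in zip(nx, ny)) / len(nx)

def knn_independence_test(x, y, k=5, n_perm=200, seed=0):
    """Return (observed mean of m_i, permutation p-value); x and y are lists of tuples."""
    rng = random.Random(seed)
    nx = neighbor_sets(x, k)
    observed = mean_shared(nx, neighbor_sets(y, k))
    exceed = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the pairing between X rows and Y rows
        if mean_shared(nx, neighbor_sets(y_perm, k)) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Under independence the expected number of shared neighbors is roughly k²/(n − 1), so a large observed mean relative to the permuted values points toward dependence.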
A nested partitioning procedure for numerical multiple integration and adaptive importance sampling (with Margaret Wright (?), 1978, 51 citations)
Goal: Compute integral of multivariate function f over a box.
Idea:
◮ There may be small regions that dominate the integral ⇒ need to stratify.
◮ Strata consist of axis-parallel boxes.
◮ Optimal strata depend on sd of f, but sd is as hard to estimate as the mean.
◮ Use numerical optimization to find max and min of f in box.
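The stratification idea can be illustrated with a rough standard-library Python sketch. It is not the algorithm from the 1978 paper, and several simplifications are assumptions made here: the range max − min of f in a box is estimated by crude random probing rather than by numerical optimization, the box with the largest volume × range is always bisected along its longest edge, and every stratum receives the same number of samples. All names (`f_range`, `volume`, `stratified_integral`) are hypothetical.

```python
import random

def f_range(f, lo, hi, rng, n_probe=30):
    """Crude stand-in for numerical optimization: probe random points in the
    box and return the observed max - min of f, used as a surrogate for its sd."""
    vals = [f([rng.uniform(a, b) for a, b in zip(lo, hi)]) for _ in range(n_probe)]
    return max(vals) - min(vals)

def volume(lo, hi):
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v

def stratified_integral(f, lo, hi, n_boxes=32, n_per_box=50, seed=0):
    """Estimate the integral of f over the box [lo, hi] with adaptive strata."""
    rng = random.Random(seed)
    boxes = [(list(lo), list(hi))]
    while len(boxes) < n_boxes:
        # Split the box where volume * range of f is largest.
        scores = [volume(l, h) * f_range(f, l, h, rng) for l, h in boxes]
        l, h = boxes.pop(scores.index(max(scores)))
        d = max(range(len(l)), key=lambda i: h[i] - l[i])  # longest edge
        mid = 0.5 * (l[d] + h[d])
        left_hi = h[:]
        left_hi[d] = mid
        right_lo = l[:]
        right_lo[d] = mid
        boxes += [(l, left_hi), (right_lo, h)]
    total = 0.0
    for l, h in boxes:
        # Plain Monte Carlo inside each stratum, weighted by its volume.
        vals = [f([rng.uniform(a, b) for a, b in zip(l, h)]) for _ in range(n_per_box)]
        total += volume(l, h) * sum(vals) / n_per_box
    return total
```

The final samples are drawn independently of the probes used to build the partition, so the estimate is unbiased for any partition the splitting step produces; the splitting only affects its variance.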
Looking forward to Jerry @ 100