Learning Data Representations: Hierarchies and Invariance

  1. Learning Data Representations: Hierarchies and Invariance Joachim M. Buhmann Computer Science Department, ETH Zurich 23 November 2013

  2. Value chain of IT: Personalized Medicine [Figure: activation of the mTOR signaling pathway in renal clear cell carcinoma; Robb et al., J Urology 177:346 (2007)] The value chain: my Data → my Information → my Knowledge → my Value → happy (alive) patients.

  3. Learning features and representations § What are representations good for? § Task-specific data reduction § Decision making § Efficient computation § Unfavorable properties of representations § Strongly statistically dependent features: D_KL( p(x_1, …, x_n) ‖ ∏_i p(x_i) ) ≫ 0; such joint distributions are difficult to estimate and hard to compute with, whereas decoupled features are easy to estimate and simple to compute with.
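To make the decoupling criterion concrete, here is a minimal sketch (my illustration, not from the talk; the feature names, noise level, and sample size are arbitrary) that estimates the KL divergence between the empirical joint distribution of two binary features and the product of their marginals. A value well above zero signals statistical dependence.

```python
# Minimal sketch: estimate D_KL between an empirical joint of two binary
# features and the product of their marginals. D_KL > 0 indicates dependence.
import numpy as np

rng = np.random.default_rng(0)

# Two strongly dependent binary features: x2 copies x1 with 10% noise.
x1 = rng.integers(0, 2, size=10_000)
x2 = np.where(rng.random(10_000) < 0.9, x1, 1 - x1)

# Empirical joint distribution p(x1, x2) as a 2x2 table.
joint = np.zeros((2, 2))
for a, b in zip(x1, x2):
    joint[a, b] += 1
joint /= joint.sum()

# Product of marginals q(x1, x2) = p(x1) * p(x2).
product = np.outer(joint.sum(axis=1), joint.sum(axis=0))

# D_KL(p || q); zero if and only if the features are independent.
dkl = np.sum(joint * np.log(joint / product))
print(f"D_KL(joint || product of marginals) = {dkl:.4f} nats")
```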

  4. Design principles for representations § Decoupling (statistical & computational): find epistemic atoms (symbols), e.g., grandmother cells. Example: chain of boolean variables x_j ∈ {0, 1}, such as 0 1 0 0 1. Consider the Fourier statistics ξ_k = Σ_{j=1}^{n} (2 x_j − 1) exp(i k j 2π / n).
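A short sketch of the statistics above, assuming ξ_k is the discrete Fourier transform of the ±1-valued chain; the position index j in the exponent is my reading of the garbled slide formula.

```python
# Sketch, assuming xi_k is the discrete Fourier transform of the +/-1-valued
# chain: xi_k summarizes the whole boolean chain in frequency components.
import numpy as np

x = np.array([0, 1, 0, 0, 1])          # the boolean chain from the slide
s = 2 * x - 1                          # map {0, 1} -> {-1, +1}
n = len(s)

j = np.arange(1, n + 1)
xi = np.array([np.sum(s * np.exp(1j * k * j * 2 * np.pi / n))
               for k in range(n)])

# np.fft.fft computes the same transform up to indexing/sign conventions.
print(np.round(xi, 3))
```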

  5. Design principles for representations (cont.) § Conditional decoupling § Infer tree structures § Modular structures § Latent variable discovery § K-means: the sum of average cluster distortions equals the sum of average pairwise intra-cluster distances.
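The k-means identity on this slide can be checked numerically. The sketch below (my illustration) verifies, for a single cluster, that the squared distortion around the centroid equals the sum of pairwise squared distances divided by 2|c|, which is my reading of "average" here.

```python
# Numeric check of the identity: within one cluster c,
#   sum_i ||x_i - mu_c||^2  ==  (1 / (2|c|)) * sum_{i,j} ||x_i - x_j||^2,
# so the k-means objective can be written purely in pairwise distances.
import numpy as np

rng = np.random.default_rng(1)
cluster = rng.normal(size=(50, 3))      # one cluster of 50 points in R^3

centroid = cluster.mean(axis=0)
distortion = np.sum((cluster - centroid) ** 2)

diffs = cluster[:, None, :] - cluster[None, :, :]
pairwise = np.sum(diffs ** 2) / (2 * len(cluster))

print(distortion, pairwise)             # the two values agree
assert np.isclose(distortion, pairwise)
```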

  6. Challenge for learning representations § Learning representations explores the space of structures § Combinatorial search in spaces with infinite VC dimension, dim_VC = ∞ § Data-adaptive coarsening is required, i.e., in the asymptotic limit we derive a distribution over structures and not a single best one. Current learning theory is insufficient to handle this constraint! ⇒ information / rate distortion theory
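For reference, the standard rate distortion function that this slide appeals to (textbook definition, not taken from the slides): it trades off the coding rate of a representation against an allowed distortion level, which is the kind of trade-off a distribution over structures has to respect.

```latex
% Background: the rate distortion function, trading the rate of a coarsened
% representation \hat{X} of X against a tolerated expected distortion D.
\[
  R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\;
                   \mathbb{E}\!\left[d(X,\hat{X})\right] \le D}
             I(X;\hat{X})
\]
```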

  7. Goal: Theory for learning algorithms [Figure: an algorithm A maps the observed data to a coarsened set of candidate partitions.] § Modeling in pattern recognition requires § quantization: given the data, identify a set of good hypotheses § learning: find an A that specifies an informative set!

  8. Low-Energy Computing § Novel low-power architectures operate near the transistor threshold voltage (NTV), e.g., Intel Claremont: 1.5 mW @ 10 MHz (x86) § NTV promises 10x more energy efficiency at 10x more parallelism! (source: Intel) § But 10^5 times more soft errors (bits flip stochastically) § Hard to correct in hardware → expose to the programmer? (credit: Torsten Hoefler)
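As a toy illustration of what "expose to the programmer" could mean (my sketch, not Intel's mechanism or the speaker's proposal), the following simulates stochastic bit flips in a 32-bit adder and masks them in software with triple modular redundancy and a bitwise majority vote.

```python
# Toy sketch: simulate soft errors as random bit flips in an unreliable add,
# then absorb them in software via triple modular redundancy (TMR).
import random

def flaky_add(a: int, b: int, p_flip: float = 1e-3) -> int:
    """Add two ints, then flip each result bit independently with prob p_flip."""
    result = (a + b) & 0xFFFFFFFF
    for bit in range(32):
        if random.random() < p_flip:
            result ^= 1 << bit
    return result

def tmr_add(a: int, b: int) -> int:
    """Run the unreliable add three times and take a bitwise majority vote."""
    r1, r2, r3 = flaky_add(a, b), flaky_add(a, b), flaky_add(a, b)
    return (r1 & r2) | (r1 & r3) | (r2 & r3)

random.seed(0)
trials = 20_000
errors = sum(tmr_add(12345, 67890) != 12345 + 67890 for _ in range(trials))
print(f"residual error rate with TMR: {errors / trials:.6f}")
```

The vote masks any single-copy fault per bit position; only coincident flips of the same bit in two of the three runs survive, which is why the residual error rate is orders of magnitude below the raw flip probability.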
