MLSLP 2012
Learning Deep Architectures Using Kernel Modules
Li Deng, Microsoft Research, Redmond
(with thanks to many people for collaborations and discussions)
Introduction
• Deep neural net ("modern" multilayer perceptron)
  – Hard to parallelize in learning
• Deep Convex Net (Deep Stacking Net)
  – Limited hidden-layer size; part of the parameters are not learned by convex optimization
• Tensor DSN/DCN and Kernel DCN
  – K-DCN: combines the elegance of kernel methods with the high performance of deep learning
  – Linearity of pattern functions (kernel) and nonlinearity of deep nets
Deep Neural Networks
Deep Stacking Network (DSN)
• "Stacked generalization" in machine learning:
  – Use a high-level model to combine low-level models
  – Aim to achieve greater predictive accuracy
• This principle has been reduced to practice:
  – Learning parameters in DSN/DCN (Deng & Yu, Interspeech-2011; Deng, Yu & Platt, ICASSP-2012)
  – Parallelizable, scalable learning (Deng, Hutchinson & Yu, Interspeech-2012)
DSN/DCN Architecture (example: L = 3)
• Many modules
• Still easily trainable
• Alternating linear and nonlinear sub-layers
• Actual architecture for digit image recognition (10 classes): each module maps 784 input units through 3000 hidden units to 10 output units
• MNIST: 0.83% error rate (LeCun's MNIST site)
(figure: three stacked modules; each module's 10-unit output is concatenated with the 784-dimensional raw input to form the next module's input)
Anatomy of a Module in DCN
(figure: input x of 784 linear units; hidden layer h of 3000 nonlinear units through weights W, initialized randomly or by an RBM; output layer of 10 linear units fitted to the targets in closed form, U = pinv(h) t; upper modules receive the 10 predictions concatenated with the 784 raw inputs)
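A minimal NumPy sketch of this module and of the stacking shown on the previous slide, under the assumptions stated there: W is drawn randomly (the RBM initialization is omitted), the hidden units are sigmoidal, the output weights come from the closed-form pseudoinverse fit U = pinv(h) t, and every module after the first sees the raw input concatenated with the earlier modules' predictions. Function and variable names are illustrative, not from the slides.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_module(x, t, n_hidden=3000, seed=0):
    """One DCN module: random input weights W, closed-form output weights U."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(x.shape[1], n_hidden))  # random init; RBM init omitted
    h = sigmoid(x @ w)                                       # nonlinear hidden layer
    u = np.linalg.pinv(h) @ t                                # U = pinv(h) t: convex least-squares fit
    return w, u

def predict_module(x, w, u):
    return sigmoid(x @ w) @ u                                # linear output units

def fit_dcn(x, t, n_modules=3, n_hidden=3000):
    """Stacked modules: each module sees the raw input plus all earlier predictions."""
    modules, inputs = [], x
    for m in range(n_modules):
        w, u = fit_module(inputs, t, n_hidden, seed=m)
        modules.append((w, u))
        inputs = np.hstack([inputs, predict_module(inputs, w, u)])  # 784 -> 794 -> 804 ...
    return modules

def predict_dcn(x_new, modules):
    inputs = x_new
    for w, u in modules:
        preds = predict_module(inputs, w, u)
        inputs = np.hstack([inputs, preds])
    return preds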
From DCN to Kernel-DCN
(figure: stacked modules mapping Input Data X to Predictions; in K-DCN the finite nonlinear hidden layer of each module is replaced by a kernel function, and each module's predictions are concatenated with the raw input to feed the next module)
Kernel-DCN
Nyström-Woodbury Approximation
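The slide gives no formulas, so the sketch below only shows the generic combination presumably intended here: a Nyström low-rank approximation of the kernel matrix, K ≈ C W⁻¹ Cᵀ with C = K(X, Z) and W = K(Z, Z) for m landmark points Z, plugged into kernel ridge regression via the Woodbury identity so the full n × n kernel is never inverted. The RBF kernel, the uniform landmark sampling, and all names are assumptions.

import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_woodbury_fit(x, t, n_landmarks=500, lam=1.0, sigma=1.0, seed=0):
    """Approximate alpha = (lam*I + K)^-1 t without forming the full n x n kernel.

    Nystrom:  K ~= C W^-1 C^T, with C = K(X, Z), W = K(Z, Z).
    Woodbury: (lam*I + C W^-1 C^T)^-1 = (1/lam) [I - C (lam*W + C^T C)^-1 C^T].
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=min(n_landmarks, len(x)), replace=False)
    z = x[idx]                                    # landmark (basis) points
    c = rbf_kernel(x, z, sigma)                   # n x m
    w = rbf_kernel(z, z, sigma)                   # m x m
    inner = np.linalg.solve(lam * w + c.T @ c, c.T @ t)
    alpha = (t - c @ inner) / lam                 # dual weights of kernel ridge regression
    return alpha

def nystrom_predict(x_new, x_train, alpha, sigma=1.0):
    # prediction still uses kernels to the training points, as in exact kernel ridge regression
    return rbf_kernel(x_new, x_train, sigma) @ alpha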
K-DSN Using Reduced Rank Kernel Regression
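A minimal sketch of the K-DCN stacking with one kernel ridge regression per module, where each module's predictions are concatenated onto the raw input before the next module is trained, mirroring the DCN above; the reduced-rank variant named in this slide would simply replace the exact solve below with the Nyström-Woodbury solver sketched earlier. The RBF kernel and the names are assumptions.

import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kdcn(x, t, n_modules=5, lam=1.0, sigma=1.0):
    """Stacked kernel-ridge modules; each module sees raw input + earlier predictions."""
    modules, inputs = [], x
    for _ in range(n_modules):
        k = rbf_kernel(inputs, inputs, sigma)
        alpha = np.linalg.solve(lam * np.eye(len(k)) + k, t)  # convex, closed-form per module
        preds = k @ alpha
        modules.append((inputs.copy(), alpha))
        inputs = np.hstack([inputs, preds])                   # feed predictions to the next module
    return modules

def predict_kdcn(x_new, modules, sigma=1.0):
    inputs = x_new
    for train_inputs, alpha in modules:
        preds = rbf_kernel(inputs, train_inputs, sigma) @ alpha
        inputs = np.hstack([inputs, preds])
    return preds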
K-DCN: Layer-Wise Regularization
• Two hyper-parameters in each module
• Tuning them using cross-validation data
• Relaxation at lower modules
• Special regularization procedures
• Lower modules vs. higher modules
(figure: same stacked K-DCN architecture as before, mapping Input Data X to Predictions)
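The slide does not spell out the procedure, so the sketch below only illustrates the generic pattern it suggests: grid-searching the two per-module hyper-parameters (kernel width sigma and regularization lam) against held-out cross-validation data, module by module. The grids, the one-hot targets, and the error metric are assumptions; the relaxation applied at lower modules is not shown.

import numpy as np

def rbf_kernel(a, b, sigma=1.0):  # same as in the sketches above
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def tune_module(x_tr, t_tr, x_dev, t_dev,
                sigmas=(0.5, 1.0, 2.0, 4.0), lams=(0.01, 0.1, 1.0)):
    """Grid-search the two per-module hyper-parameters on dev data (one-hot targets assumed)."""
    best = (None, None, np.inf)
    for sigma in sigmas:
        k_tr = rbf_kernel(x_tr, x_tr, sigma)
        k_dev = rbf_kernel(x_dev, x_tr, sigma)
        for lam in lams:
            alpha = np.linalg.solve(lam * np.eye(len(k_tr)) + k_tr, t_tr)
            err = np.mean(np.argmax(k_dev @ alpha, 1) != np.argmax(t_dev, 1))
            if err < best[2]:
                best = (sigma, lam, err)
    return best  # (sigma, lam, dev error) chosen before stacking the next module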
SLT-2012 paper: USE OF KERNEL DEEP CONVEX NETWORKS AND END-TO-END LEARNING FOR SPOKEN LANGUAGE UNDERSTANDING
Li Deng 1, Gokhan Tur 1,2, Xiaodong He 1, and Dilek Hakkani-Tur 1,2
1 Microsoft Research, Redmond, WA, USA
2 Conversational Systems Lab, Microsoft, Sunnyvale, CA, USA

Table 2. Comparison of domain classification error rates among the boosting-based baseline system, the DCN system, and the K-DCN system on a domain classification task. Three types of raw features (lexical, query clicks, and named entities) and four of their combinations are used for the evaluation, shown as the four rows of the table.

Feature Sets                                       Baseline   DCN      K-DCN
lexical features                                   10.40%     10.09%   9.52%
lexical features + Named Entities                  9.40%      9.32%    8.88%
lexical features + Query clicks                    8.50%      7.43%    5.94%
lexical features + Query clicks + Named Entities   10.10%     7.26%    5.89%
Table 3. More detailed results of K-DCN in Table 2 with Lexical + Query Click features. Domain classification error rates (percent) on the Train, Dev, and Test sets as a function of the depth of the K-DCN.

Depth   Train Err%   Dev Err%   Test Err%
1       9.54         12.90      12.20
2       6.36         10.50      9.99
3       4.12         9.25       8.25
4       1.39         7.00       7.20
5       0.28         6.50       5.94
6       0.26         6.45       5.94
7       0.26         6.55       6.26
8       0.27         6.60       6.20
9       0.28         6.55       6.26
10      0.26         7.00       6.47
11      0.28         6.85       6.41