A General Artificial Neural Network Extension for HTK Chao Zhang - PowerPoint PPT Presentation

A General Artificial Neural Network Extension for HTK
Chao Zhang & Phil Woodland
University of Cambridge
15 April 2015

Overview
• Design Principles
• Implementation Details
  - Generic ANN Support
  - ANN Training
  - Data Cache
  - Other Features
• A Summary of HTK-ANN
• HTK-based Hybrid/Tandem Systems & Experiments
  - Hybrid SI System
  - Tandem SAT System
  - Demo Hybrid System with Flexible Structures
• Conclusions

Design Principles
• The design should be as generic as possible.
  - Flexible input feature configurations.
  - Flexible ANN model architectures.
• HTK-ANN should be compatible with existing HTK functions.
  - To minimise the effort of reusing existing source code and tools.
  - To simplify the transfer of many technologies.
• HTK-ANN should be kept "research friendly".

Generic ANN Support
• In HTK-ANN, ANNs have layered structures.
  - An HMM set can have any number of ANNs.
  - Each ANN can have any number of layers.
• An ANN layer has:
  - Parameters: weights, biases, and activation function parameters.
  - An input vector, defined by a feature mixture structure.
• A feature mixture has any number of feature elements.
• A feature element defines a fragment of the input vector by its:
  - Source: acoustic features, augmented features, or the output of some layer.
  - Context shift set: integers indicating the time differences relative to the current frame.
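A minimal Python sketch of how a feature mixture could assemble a layer's input vector at frame t. Each feature element looks up its source stream and takes one frame per entry in its context shift set. The fragments are then concatenated in element order. The names `FeatureElement`, `FeatureMixture` and `input_at`, and stream labels such as `"ann1.layer3"`, are hypothetical and are not HTK-ANN's actual C structures. Clamping out-of-range shifts to the utterance edges is our own assumption about boundary handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

Frame = List[float]


@dataclass
class FeatureElement:
    """One fragment of a layer's input vector (hypothetical name)."""
    source: str            # label of a feature stream or a layer output, e.g. "acoustic"
    shifts: Sequence[int]  # context shift set, relative to the current frame t


@dataclass
class FeatureMixture:
    """An ordered list of feature elements whose fragments are concatenated."""
    elements: List[FeatureElement] = field(default_factory=list)

    def input_at(self, t: int, streams: Dict[str, List[Frame]]) -> Frame:
        vec: Frame = []
        for elem in self.elements:
            frames = streams[elem.source]
            last = len(frames) - 1
            for shift in elem.shifts:
                # Assumption: shifts falling outside the utterance are clamped to its edges.
                idx = min(max(t + shift, 0), last)
                vec.extend(frames[idx])
        return vec


if __name__ == "__main__":
    acoustic = [[float(i)] for i in range(10)]                # 10 frames of 1-dim features
    bottleneck = [[100.0 + i, 200.0 + i] for i in range(10)]  # 2-dim outputs of some layer
    mix = FeatureMixture([
        FeatureElement("acoustic", [-6, -3, 0, 3, 6]),
        FeatureElement("ann1.layer3", [0]),
    ])
    print(mix.input_at(4, {"acoustic": acoustic, "ann1.layer3": bottleneck}))
    # [0.0, 1.0, 4.0, 7.0, 9.0, 104.0, 204.0]
```

In this sketch the input dimension is the sum over elements of (shift-set size × source dimension): 5 × 1 + 1 × 2 = 7 here.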

Generic ANN Support
• In HTK-ANN, an ANN structure can be any directed graph, including cyclic ones.
• However, only standard error back-propagation (EBP) is included at present. HTK-ANN can therefore properly train only non-recurrent ANNs, i.e. those forming directed acyclic graphs.
[Figure: An example of a feature mixture with three feature elements:
  - Feature Element 1. Source: input acoustic features; context shift set {-6, -3, 0, 3, 6} (frames t-6, t-3, t, t+3, t+6).
  - Feature Element 2. Source: ANN 1, layer 3 outputs; context shift set {0} (frame t).
  - Feature Element 3. Source: ANN 2, layer 2 outputs; context shift set {-1, 0, 1} (frames t-1, t, t+1).]
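Plain EBP needs an order in which every layer's inputs are computed before the layer itself. A tool could therefore check a structure with a topological sort (Kahn's algorithm) before training. This is a sketch under our own assumptions: the dependency map and the layer names are hypothetical, and time-shifted references to a layer's own past outputs are not modelled.

```python
from collections import deque
from typing import Dict, List


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return an evaluation order for layers, or raise if the graph has a cycle.

    deps maps each layer to the layers (or input streams) that feed it.
    This representation is hypothetical; HTK-ANN's internals may differ.
    """
    nodes = set(deps)
    for srcs in deps.values():
        nodes.update(srcs)
    indegree = {n: 0 for n in nodes}
    consumers: Dict[str, List[str]] = {n: [] for n in nodes}
    for layer, srcs in deps.items():
        for src in srcs:
            indegree[layer] += 1
            consumers[src].append(layer)
    # Start from nodes with no inputs (e.g. acoustic feature streams).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for c in consumers[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: plain EBP cannot train this recurrent structure")
    return order


if __name__ == "__main__":
    print(topological_order({
        "ann1.l1": ["input"],
        "ann1.l2": ["ann1.l1"],
        "ann2.l1": ["input", "ann1.l2"],
    }))
```

A successful sort is also a valid forward-pass order. Reversing it gives the order in which EBP can propagate gradients.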

ANN Training
• HTK-ANN supports different training criteria:
  - Frame-level: CE, MMSE
  - Sequence-level: MMI, MPE, MWE
• ANN model training labels can come from:
  - Frame-to-label alignments: for the CE and MMSE criteria
  - Feature files: for autoencoders
  - Lattice files: for the MMI, MPE, and MWE criteria
• Gradients for SGD can be modified with momentum, gradient clipping, weight decay, and max norm.
• Supported learning rate schedulers include List, Exponential Decay, AdaGrad, and a modified NewBob.
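A minimal sketch of one SGD step that applies the four gradient modifications listed above to a single weight row. The slides do not say in which order HTK-ANN applies them. Our assumed order is: weight decay, then element-wise clipping, then classical momentum, then a max-norm rescale of the row. The function name and its parameters are hypothetical.

```python
import math
from typing import List


def sgd_step(w: List[float], grad: List[float], vel: List[float], lr: float,
             momentum: float = 0.9, weight_decay: float = 0.0,
             clip: float = float("inf"), max_norm: float = float("inf")) -> None:
    """Update weight row w and velocity vel in place (illustrative ordering only)."""
    for i in range(len(w)):
        g = grad[i] + weight_decay * w[i]       # L2 weight decay folded into the gradient
        g = max(-clip, min(clip, g))            # element-wise gradient clipping
        vel[i] = momentum * vel[i] - lr * g     # classical momentum
        w[i] += vel[i]
    norm = math.sqrt(sum(x * x for x in w))
    if norm > max_norm:                         # max-norm constraint on the whole row
        scale = max_norm / norm
        for i in range(len(w)):
            w[i] *= scale


if __name__ == "__main__":
    w, v = [1.0, 0.0], [0.0, 0.0]
    sgd_step(w, [10.0, -0.5], v, lr=0.1, momentum=0.0, clip=1.0)
    print(w)  # first component is clipped to a step of 0.1, second moves by 0.05
```

In practice a learning rate scheduler (List, Exponential Decay, AdaGrad, NewBob) would supply `lr` for each epoch or update.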

Data Cache
• HTK-ANN has three types of data shuffling:
  - Frame-based shuffling: CE/MMSE training for DNNs and (unfolded) RNNs
  - Utterance-based shuffling: MMI, MPE, and MWE training
  - Batch-of-utterance-level shuffling: RNNs, ASGD
[Figure: Examples of different types of data shuffling, each panel showing the frames assigned to batch t.]
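A sketch of the three shuffling modes, using frame ids in place of feature vectors. For the batch-of-utterance mode, our interpretation is as follows. Utterances are shuffled and grouped `n_parallel` at a time, and batch t holds frame t of every utterance in the group that is still active. This reading of the slide is an assumption, and all function names are hypothetical.

```python
import random
from typing import List

Utt = List[int]  # an utterance as a list of frame ids (stand-ins for feature vectors)


def frame_shuffle(utts: List[Utt], batch: int, seed: int = 0) -> List[List[int]]:
    """Pool frames from all utterances, shuffle them, and cut into mini-batches."""
    frames = [f for u in utts for f in u]
    random.Random(seed).shuffle(frames)
    return [frames[i:i + batch] for i in range(0, len(frames), batch)]


def utterance_shuffle(utts: List[Utt], seed: int = 0) -> List[Utt]:
    """Shuffle utterance order but keep each utterance intact (sequence training)."""
    order = list(utts)
    random.Random(seed).shuffle(order)
    return order


def utterance_batch_shuffle(utts: List[Utt], n_parallel: int, seed: int = 0) -> List[List[int]]:
    """Process shuffled utterances n_parallel at a time; batch t holds frame t of each."""
    order = utterance_shuffle(utts, seed)
    batches: List[List[int]] = []
    for g in range(0, len(order), n_parallel):
        group = order[g:g + n_parallel]
        for t in range(max(len(u) for u in group)):
            batches.append([u[t] for u in group if t < len(u)])
    return batches


if __name__ == "__main__":
    data = [[0, 1, 2], [10, 11], [20, 21, 22, 23]]
    print(frame_shuffle(data, batch=4))
    print(utterance_shuffle(data))
    print(utterance_batch_shuffle(data, n_parallel=3))
```

Only the utterance-level modes preserve temporal order within each utterance. Sequence criteria and RNN training need that order, whereas frame-level CE/MMSE does not.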

Other Features
• Math kernels: new CPU, MKL, and CUDA based kernels for ANNs
• Input transforms: compatible with HTK SI/SD input transforms
• Speaker adaptation: online replacement of ANN parameter units
• Model editing
  - Insert/remove/initialise an ANN layer
  - Add/delete a feature element in a feature mixture
  - Associate an ANN model with HMMs
• Decoders
  - HVite: tandem/hybrid system decoding, alignment, and model marking
  - HDecode: tandem/hybrid system LVCSR decoding
  - HDecode.mod: tandem/hybrid system model marking
  - A joint decoder: log-linear combination of systems that share the same decision tree
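A sketch of a log-linear combination at the level of tied-state scores. Because the systems share one decision tree, their state indices line up, so a weighted sum of per-system log-likelihoods can be taken state by state. That the joint decoder combines scores at exactly this level is our assumption. The function names are hypothetical, and the weights (1.0, 0.2) are borrowed from the BOLT results table.

```python
from typing import List, Sequence


def joint_state_score(log_likes: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-system log-likelihoods for one tied state."""
    if len(log_likes) != len(weights):
        raise ValueError("one weight per system is required")
    return sum(w * ll for w, ll in zip(weights, log_likes))


def joint_frame_scores(per_system: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Combine one frame's scores over all tied states; systems must share the state set."""
    n_states = len(per_system[0])
    if any(len(s) != n_states for s in per_system):
        raise ValueError("systems must share the same tied-state set (same decision tree)")
    return [joint_state_score([s[j] for s in per_system], weights) for j in range(n_states)]


if __name__ == "__main__":
    hybrid = [-10.0, -12.0]   # illustrative per-state log-likelihoods, system 1
    tandem = [-20.0, -15.0]   # illustrative per-state log-likelihoods, system 2
    print(joint_frame_scores([hybrid, tandem], weights=[1.0, 0.2]))
```

Without a shared decision tree there would be no one-to-one state mapping, and this frame-level combination would not be defined.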

A Summary of HTK-ANN
• Extended modules: HFBLat, HMath, HModel, HParm, HRec, HLVRec
• New modules
  - HANNet: ANN structures & core algorithms
  - HCUDA: CUDA based math kernel functions
  - HNCache: data cache for random data access
• Extended tools: HDecode, HDecode.mod, HHEd, HVite
• New tools
  - HNForward: ANN evaluation & output generation
  - HNTrainSGD: SGD based ANN training

Building Hybrid SI Systems
• Steps for building CE-based SI CD-DNN-HMMs with HTK:
  - Produce the desired tied-state GMM-HMMs by decision tree tying (HHEd)
  - Generate ANN-HMMs by replacing the GMMs with an ANN (HHEd)
  - Generate frame-to-state labels with a pre-trained system (HVite)
  - Train the ANN-HMMs with the CE criterion (HNTrainSGD)
• Steps for CD-DNN-HMM MPE training:
  - Generate numerator/denominator lattices (HLRescore & HDecode)
  - Phone-mark the numerator/denominator lattices (HVite or HDecode.mod)
  - Perform MPE training (HNTrainSGD)
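The slides do not say how the ANN's outputs stand in for GMM state likelihoods once the GMMs are replaced. A common hybrid approach, assumed here rather than confirmed for HTK-ANN, divides the state posteriors by state priors estimated from the frame-to-state labels. In the log domain this gives log P(s|x) - log P(s). The function names and the flooring constant below are illustrative.

```python
import math
from typing import List, Sequence


def state_priors(alignment: Sequence[int], n_states: int) -> List[float]:
    """Estimate P(s) from frame-to-state labels (e.g. from a forced alignment)."""
    counts = [0] * n_states
    for s in alignment:
        counts[s] += 1
    total = len(alignment)
    return [c / total for c in counts]


def scaled_log_likelihoods(posteriors: Sequence[float], priors: Sequence[float],
                           floor: float = 1e-10) -> List[float]:
    """Turn ANN posteriors P(s|x) into scaled log-likelihoods log P(s|x) - log P(s)."""
    return [math.log(max(p, floor)) - math.log(max(q, floor))
            for p, q in zip(posteriors, priors)]


if __name__ == "__main__":
    priors = state_priors([0, 0, 1, 2], n_states=3)   # -> [0.5, 0.25, 0.25]
    print(scaled_log_likelihoods([0.5, 0.4, 0.1], priors))
```

The flooring keeps states that never occur in the alignment from producing log(0).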

ANN Front-ends for GMM-HMMs
• ANNs can be used as GMM-HMM front-ends by using a feature mixture to define the composition of the GMM-HMM input vector.
• HTK can accommodate a tandem SAT system as a single system:
  - Mean and variance normalisations are treated as activation functions.
  - SD parameters are replaced according to speaker IDs.
[Figure: A composite ANN as a tandem SAT system front-end, combining PLP and pitch features, mean/variance normalisation, HLDA, CMLLR, STC, and a bottleneck DNN.]
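A sketch of the two ideas on this slide. First, mean/variance normalisation is written as an element-wise "activation function" y = (x - μ)/σ, so it can sit inside the ANN graph like any other layer. Second, a speaker-dependent unit holds one parameter set per speaker and swaps in the right set by speaker ID. The class names and the `set_speaker` method are hypothetical and are not HTK-ANN's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class MeanVarNorm:
    """Mean/variance normalisation as an element-wise activation: y = (x - mean) / std."""
    mean: List[float]
    std: List[float]

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [(xi - m) / s for xi, m, s in zip(x, self.mean, self.std)]


class SpeakerDependentUnit:
    """Keeps one parameter set per speaker and swaps it in by speaker ID (hypothetical)."""

    def __init__(self, per_speaker: Dict[str, MeanVarNorm]):
        self.per_speaker = per_speaker
        self.active: Optional[MeanVarNorm] = None

    def set_speaker(self, spk: str) -> None:
        self.active = self.per_speaker[spk]   # KeyError for an unknown speaker

    def __call__(self, x: Sequence[float]) -> List[float]:
        if self.active is None:
            raise RuntimeError("no speaker selected")
        return self.active(x)


if __name__ == "__main__":
    unit = SpeakerDependentUnit({
        "spk1": MeanVarNorm([1.0, 2.0], [2.0, 4.0]),
        "spk2": MeanVarNorm([0.0, 0.0], [1.0, 1.0]),
    })
    unit.set_speaker("spk1")
    print(unit([3.0, 6.0]))  # [1.0, 1.0]
```

Because only the parameters change between speakers, the surrounding graph (HLDA, bottleneck DNN, STC) stays the same and the whole front-end remains a single system.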

Standard BOLT System Results
• Hybrid DNN structure: $504 \times 2000^4 \times 1000 \times 12000$
• Tandem DNN structure: $504 \times 2000^4 \times 1000 \times 26 \times 12000$

System                    Criterion   %WER
Hybrid SI                 CE          34.5
Hybrid SI                 MPE         31.6
Tandem SAT                MPE         33.2
Hybrid SI ⊗ Tandem SAT    MPE         31.0

Table: Performance of BOLT tandem and hybrid systems with standard configurations, evaluated on dev'14. ⊗ denotes joint decoding with system-dependent combination weights (1.0, 0.2).
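As a worked check on model size, the sketch below counts weights and biases for the two DNN structures. They are read as 504 × 2000^4 × 1000 × 12000 (hybrid) and 504 × 2000^4 × 1000 × 26 × 12000 (tandem), i.e. four hidden layers of 2000 units. Both that reading and the assumption that every layer is fully connected with a bias vector are ours.

```python
from typing import Sequence


def param_count(dims: Sequence[int]) -> int:
    """Weights plus biases of a fully connected feed-forward stack with the given layer sizes."""
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


HYBRID = [504] + [2000] * 4 + [1000, 12000]
TANDEM = [504] + [2000] * 4 + [1000, 26, 12000]

if __name__ == "__main__":
    print(f"hybrid: {param_count(HYBRID):,}")  # hybrid: 27,029,000
    print(f"tandem: {param_count(TANDEM):,}")  # tandem: 15,367,026
```

Under these assumptions the 26-unit bottleneck replaces the 1000 × 12000 output matrix with 1000 × 26 and 26 × 12000 matrices. That roughly halves the parameter count of the tandem DNN.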

WSJ Demo Systems with Flexible Structures
• Stacking MLPs: $(468 + (n-1) \times 200) \times 1000 \times 200 \times 3000$, $n = 1, 2, \ldots$. Each MLP takes all previous BN features as input.
• The top MLP does not have a BN layer.
• The systems were trained with CE-based discriminative pre-training and fine-tuning.
• The systems were trained on 15 hours of Wall Street Journal data (WSJ0).

Num FNNs   %Accuracy (Train)   %Accuracy (Held-out)   %WER (65k dt)   %WER (65k et)
1          69.9                58.1                   9.3             10.9
2          72.8                59.1                   9.0             10.4
3          73.9                59.1                   8.8             10.7

Table: Performance of the WSJ0 demo systems.
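A sketch that expands the stacking formula into per-MLP layer sizes. The n-th MLP's input grows by 200 for every earlier bottleneck (BN) output it receives. Following the slide, the top MLP drops its 200-unit BN layer. The function name and default arguments are illustrative; the sizes come from the formula above.

```python
from typing import List


def stack_dims(n_mlps: int, base_in: int = 468, bn: int = 200,
               hidden: int = 1000, n_out: int = 3000) -> List[List[int]]:
    """Layer sizes for each MLP in a stack of n_mlps; only the top MLP lacks a BN layer."""
    if n_mlps < 1:
        raise ValueError("need at least one MLP")
    dims: List[List[int]] = []
    for k in range(1, n_mlps + 1):
        in_dim = base_in + (k - 1) * bn   # acoustic input + all previous BN features
        if k < n_mlps:
            dims.append([in_dim, hidden, bn, n_out])
        else:
            dims.append([in_dim, hidden, n_out])   # top MLP: no BN layer
    return dims


if __name__ == "__main__":
    for layer_sizes in stack_dims(3):
        print(" x ".join(map(str, layer_sizes)))
    # 468 x 1000 x 200 x 3000
    # 668 x 1000 x 200 x 3000
    # 868 x 1000 x 3000
```

The three-MLP case corresponds to the last row of the table above.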

Conclusions
• HTK-ANN integrates native support for ANNs into HTK.
• HTK-based GMM technologies can be directly applied to ANN-based systems.
• HTK-ANN can train FNNs with very flexible configurations:
  - Any topology that forms a directed acyclic graph (DAG)
  - Different activation functions
  - Various input features
  - Frame-level and sequence-level training criteria
• Experiments on a 300h CTS task showed that HTK can generate standard state-of-the-art tandem and hybrid systems.
• WSJ0 experiments showed that HTK can build systems with flexible structures.
• HTK-ANN will be available with the release of HTK 3.5 in 2015.
