DTW and Search
Hsin-min Wang
References
Books
1. X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Chapters 12-13, Prentice Hall, 2001
2. Chin-Hui Lee, Frank K. Soong, and Kuldip K. Paliwal, Automatic Speech and Speaker Recognition, Chapters 13, 16-18, Kluwer Academic Publishers, 1996
3. John R. Deller, Jr., John G. Proakis, and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Chapters 11-12, IEEE Press, 2000
4. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Chapter 7, Prentice Hall, 1993
5. Frederick Jelinek, Statistical Methods for Speech Recognition, Chapters 5-6, MIT Press, 1999
6. N. Nilsson, Principles of Artificial Intelligence, 1982
Papers
1. Hermann Ney, "Progress in Dynamic Programming Search for LVCSR," Proceedings of the IEEE, August 2000
2. Patrick Kenny et al., "A*-Admissible Heuristics for Rapid Lexical Access," IEEE Trans. on SAP, 1993
3. Stefan Ortmanns and Hermann Ney, "A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, 11, 43-72, 1997
4. Jean-Luc Gauvain and Lori Lamel, "Large-Vocabulary Continuous Speech Recognition: Advances and Applications," Proceedings of the IEEE, August 2000
Search Algorithms for Speech Recognition
• Template-based: without statistical modeling/training
  – Directly compare/align the test and reference feature vector sequences (which may have different lengths) to derive the overall distortion between them
  – Dynamic Time Warping (DTW): warps speech templates in the time dimension to alleviate the distortion
• Model-based: HMMs are widely used
  – Concatenate the subword models according to the pronunciations of the words in a lexicon
  – The HMM states can be expanded to form the state-search space (HMM state transition network) for the search
  – Apply appropriate search strategies
Dynamic Time Warping (DTW)
[Figure: Fig. 3 from Eamonn J. Keogh & Michael J. Pazzani]
DTW (cont.)
DTW (cont.)
[Figure: DTW trellis; a warping path starts at (1,1), passes through points such as (2,2) and (2,3), and ends at (n,m)]
DTW (cont.)
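To make the warping concrete, here is a minimal DTW sketch in Python (an illustration, not code from the slides). It assumes the reference and test templates are NumPy arrays of frame feature vectors, uses Euclidean frame distance, and adopts one common set of local path constraints; the lecture's figures may use different moves or slope weights.

```python
import numpy as np

def dtw(ref, test):
    """Dynamic time warping between two feature-vector sequences.

    ref, test: arrays of shape (n, d) and (m, d).
    Returns the accumulated distortion of the best warping path, using
    the common recurrence
        D(i, j) = d(i, j) + min(D(i-1, j), D(i-1, j-1), D(i, j-1)).
    """
    n, m = len(ref), len(test)
    # Local distance: Euclidean distance between frame i of ref and frame j of test
    d = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)

    D = np.full((n, m), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                D[i - 1, j] if i > 0 else np.inf,                # vertical move
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # diagonal move
                D[i, j - 1] if j > 0 else np.inf,                # horizontal move
            )
            D[i, j] = d[i, j] + best_prev
    return D[n - 1, m - 1]

# Isolated-word recognition: pick the template with minimum distortion, e.g.
#   templates = {"yes": yes_feats, "no": no_feats}   # hypothetical data
#   word = min(templates, key=lambda w: dtw(templates[w], test_feats))
```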
Advantages of DTW
• Speech recognition based on DTW is simple to implement and fairly effective for small-vocabulary, isolated-word speech recognition
• Dynamic programming (DP) can temporally align patterns to account for differences in speaking rates across speakers as well as across repetitions of the word by the same speaker
Weaknesses of DTW
• Creating the reference templates from data is non-trivial; it is typically accomplished by pairwise warping of training instances
• Alternatively, all observed instances can be stored as templates, but matching against all of them is extremely slow
• As a result, the HMM is a much better alternative for spoken language processing
(Also refer to page 383 of the textbook.)
Continuous Speech Recognition
• Continuous speech recognition is both a pattern recognition and a search problem
  – In speech recognition, making a search decision is also referred to as decoding
    • Find the sequence of words whose corresponding acoustic and language models best match the input signal
    • The search complexity is highly correlated with the size of the search space, which is determined by the constraints imposed by the language models
• Speech recognition search is usually done with Viterbi or A* stack decoders
  – The relative merits of the two search algorithms were quite controversial in the 1980s
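As a rough illustration of the Viterbi decoder mentioned above (a sketch, not the lecture's algorithm): it assumes the HMM state network has already been compiled into log-domain matrices log_A (transitions), log_B (per-frame observation likelihoods), and log_pi (initial states), and it omits practical refinements such as beam pruning.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Time-synchronous Viterbi decoding over an HMM state network.

    log_A:  (S, S) log transition probabilities
    log_B:  (T, S) log observation likelihoods for each frame
    log_pi: (S,)   log initial-state probabilities
    Returns the best state sequence and its log score.
    """
    T, S = log_B.shape
    delta = log_pi + log_B[0]           # best score ending in each state at t = 0
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    for t in range(1, T):
        cand = delta[:, None] + log_A   # cand[i, j]: best path into j via i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_B[t]
    # Trace back the best path from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(delta.max())
```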
Model-based Speech Recognition
• A search process to uncover the word sequence W = w_1, w_2, ..., w_m with the maximum posterior probability P(W|X):

  Ŵ = argmax_W P(W|X) = argmax_W [ P(W) P(X|W) / P(X) ] = argmax_W P(W) P(X|W)

  where W = w_1, w_2, ..., w_i, ..., w_m and each w_i ∈ V = {v_1, v_2, ..., v_N}
  – P(X|W): acoustic model probability
  – P(W): language model probability
• N-gram language modeling:
  – Unigram: P(w_1 w_2 ... w_k) ≈ P(w_1) P(w_2) ... P(w_k), where P(w_j) = C(w_j) / Σ_i C(w_i)
  – Bigram: P(w_1 w_2 ... w_k) ≈ P(w_1) P(w_2|w_1) ... P(w_k|w_{k-1}), where P(w_j|w_{j-1}) = C(w_{j-1} w_j) / C(w_{j-1})
  – Trigram: P(w_1 w_2 ... w_k) ≈ P(w_1) P(w_2|w_1) P(w_3|w_1 w_2) ... P(w_k|w_{k-2} w_{k-1}), where P(w_j|w_{j-2} w_{j-1}) = C(w_{j-2} w_{j-1} w_j) / C(w_{j-2} w_{j-1})
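A small sketch of the count-based estimates above, using the bigram case (illustrative only; real systems smooth the counts to handle unseen bigrams):

```python
from collections import Counter

def train_bigram(corpus):
    """Maximum-likelihood bigram estimates from counts:
    P(w_j | w_{j-1}) = C(w_{j-1} w_j) / C(w_{j-1})."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words[:-1], words[1:]))
    # Return a conditional-probability function over the trained counts
    return lambda w, prev: bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

# p = train_bigram(["how are you", "how do you do"])
# p("are", "how")  # C(how are) / C(how) = 1/2
```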
Block Diagram of Model-based Speech Recognition
[Figure: block diagram, including an "Information-based Case Grammar" component]
Basic Search Algorithms
What Is "Search"?
• The idea of search implies moving around, examining things, and making decisions about whether the sought object has yet been found
• Two classical problems in AI
  – Traveling salesman's problem: find a shortest-distance tour, starting at one of many cities, visiting each city exactly once, and returning to the starting city
  – N-queens problem: place N queens on an N x N chessboard in such a way that no queen can capture any other queen, i.e., there is no more than one queen in any given row, column, or diagonal
A Simple City-traveling Problem
A Simple City-traveling Problem (cont.)
The General Graph Search Algorithm
• OPEN list: stores the nodes waiting for expansion
• CLOSED list: stores the already expanded nodes
• If both steps 6(a) and 6(b) are omitted, graph search degenerates to tree search
• Steps 6(a) and 6(b) perform the bookkeeping (merging) process for nodes reached by more than one path
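A compact sketch of this loop (hypothetical names, not code from the text; the full Algorithm 12.1 also re-points parents when a cheaper path to an already-seen node is found, which is simplified away here):

```python
def graph_search(start, expand, is_goal):
    """General graph-search loop with OPEN/CLOSED bookkeeping.

    expand(node) yields successor nodes; is_goal(node) tests for a solution.
    Dropping the CLOSED/parent checks turns this into tree search, which may
    revisit the same node many times.
    """
    open_list = [start]     # nodes waiting for expansion
    closed = set()          # already expanded nodes
    parent = {start: None}  # for recovering the path (and merging duplicates)
    while open_list:
        node = open_list.pop(0)  # the ordering policy defines the strategy
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        for succ in expand(node):
            if succ not in closed and succ not in parent:  # merge duplicates
                parent[succ] = node
                open_list.append(succ)
    return None  # no solution
```

The pop policy is the whole story: popping the oldest node gives breadth-first behavior, popping the newest gives depth-first, and sorting OPEN by a heuristic gives the best-first searches discussed later.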
Blind Graph Search Algorithms
• If the aim of the search problem is to find an acceptable path instead of the best path, blind search is often used
• Blind search treats every node in the OPEN list the same and blindly decides the order of expansion without using any domain knowledge
• Blind search does not expand nodes randomly; it follows some systematic way to explore the search graph
  – Depth-first search
  – Breadth-first search
Depth-First Search
• Depth-first search picks an arbitrary alternative at every node visited
• The search sticks with this partial path and works forward from it
• Other alternatives at the same level are ignored completely
• The deepest nodes are expanded first, and nodes of equal depth are ordered arbitrarily
Depth-First Search (cont.)
• Depth-first search generates only one successor at a time
  – Graph search generates all successors at a time
• When depth-first search reaches a dead end, it backs up to the last decision point and proceeds with another alternative
  – Depth-first search can be dangerous because it might pursue a hopeless path that is actually an infinite dead end
  – A depth bound can be placed to constrain the nodes to be explored
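A minimal depth-first sketch with the depth bound discussed above; expand and is_goal are illustrative callbacks, not names from the text.

```python
def depth_first_search(start, expand, is_goal, depth_bound=50):
    """Depth-first search with a depth bound to avoid infinite dead ends.

    New successors are pushed onto a stack, so the deepest nodes are
    always expanded first.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()           # LIFO: deepest node first
        if is_goal(node):
            return path
        if len(path) <= depth_bound:       # depth bound on explored paths
            for succ in expand(node):
                if succ not in path:       # avoid cycles along this path
                    stack.append((succ, path + [succ]))
    return None
```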
Breadth-First Search
• Breadth-first search examines all the nodes on one level before considering any of the nodes on the next level (depth)
• Breadth-first search is guaranteed to find a solution if one exists
  – It might not find a shortest-distance path, but it is guaranteed to find the one with the fewest cities visited (a minimum-length path)
• It may be inefficient when all solutions leading to the goal node are at approximately the same depth
Breadth-First Search (cont.)
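For comparison, a minimal breadth-first sketch under the same illustrative interface:

```python
from collections import deque

def breadth_first_search(start, expand, is_goal):
    """Breadth-first search: expand all nodes at one depth before the next.

    Guaranteed to find a minimum-length path (fewest nodes visited) if a
    solution exists, though not necessarily a shortest-distance one.
    """
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()       # FIFO: shallowest node first
        if is_goal(node):
            return path
        for succ in expand(node):
            if succ not in visited:
                visited.add(succ)
                queue.append((succ, path + [succ]))
    return None
```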
Heuristic Graph Search
• Blind search finds only one arbitrary solution instead of the optimal solution
  – To find the optimal solution with depth-first or breadth-first search, the search needs to continue rather than stop when the first solution is discovered
    • After the search reaches all solutions, we can compare them and select the best
  – This exhaustive approach is known as British Museum search, or brute-force search
• Heuristic search takes advantage of heuristic information (domain-specific knowledge) during the search
  – Use the heuristic function to re-order the OPEN list in Step 7 of Algorithm 12.1
  – Some heuristics can reduce search effort without sacrificing optimality, while others can greatly reduce search effort but provide only sub-optimal solutions
  – Examples: best-first (or A*) search and beam search
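Beam search is only named above, so here is a minimal level-synchronous sketch (not from the slides): expand, score, is_goal, and beam_width are illustrative placeholders, and real decoders usually prune states inside a time-synchronous Viterbi pass rather than whole paths as done here.

```python
def beam_search(start, expand, score, is_goal, beam_width=3):
    """Beam search: a breadth-first variant that keeps only the best
    beam_width partial paths at each level. Fast but possibly sub-optimal,
    since the path to the true optimum may be pruned early.

    score(path): heuristic goodness of a partial path (higher is better).
    """
    beam = [[start]]
    while beam:
        # Expand every path in the current beam by one step
        candidates = [path + [succ] for path in beam for succ in expand(path[-1])]
        for path in candidates:
            if is_goal(path[-1]):
                return path
        # Prune: keep only the beam_width most promising partial paths
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return None
```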
Best-First Search
• Best-first search expands the most promising node first, since it offers the best hope of leading to the best path; hence the name best-first search
• A search algorithm is called admissible if it is guaranteed to find the optimal solution
  – Admissible best-first search is called A* search
• Evaluation function: f(N) = g(N) + h(N)
  – Special case: with h(N) = 0 and g(N) = depth of node N, best-first search reduces to breadth-first search
A* Search
• History of A* search in AI
  – The most widely studied best-first search (Hart, Nilsson, and Raphael, 1968)
  – Developed for additive cost measures (the cost of a path = the sum of the costs of its arcs)
• Properties
  – A* search can sequentially generate multiple recognition candidates
  – A* search needs a heuristic function that satisfies the admissibility condition
• Admissibility
  – The property that a search algorithm is guaranteed to find an optimal solution, if one exists
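A minimal A* sketch under the same illustrative interface as the earlier searches; expand yields (successor, arc cost) pairs, and h must satisfy the admissibility condition above.

```python
import heapq
from itertools import count

def a_star(start, expand, h, is_goal):
    """A* search: always expand the OPEN node with the smallest
    f(N) = g(N) + h(N), where g is the cost so far and h estimates the
    remaining cost. If h never overestimates (admissibility), the first
    goal popped from OPEN is optimal.
    """
    tie = count()  # tiebreaker so the heap never compares nodes directly
    open_heap = [(h(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        f, _, g, node, path = heapq.heappop(open_heap)
        if is_goal(node):
            return path, g
        for succ, cost in expand(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):  # found a cheaper path
                best_g[succ] = g2
                heapq.heappush(
                    open_heap, (g2 + h(succ), next(tie), g2, succ, path + [succ])
                )
    return None, float("inf")
```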
A* Search – 1st example
A* Search – 1st example (cont.)
[Figure: A* search tree annotated with f = g + h values such as 2+10.3, 3+8.5, (3+4)+10.3, (3+3)+5.7, (6+3)+2.8, (9+5)+7, and 9+3 along paths from S to the goal G through nodes A, C, and E]