Practical Learning Algorithms for Structured Prediction Models Kai-Wei Chang University of Illinois at Urbana-Champaign
Dream: Intelligent systems that are able to read, to see, to talk, and to answer questions.
Translation system; personal assistant system
Carefully Slide
小心: Carefully, Careful, Take Care, Caution
地滑: Slide, Landslip, Wet Floor, Smooth
Q: [Chris] = [Mr. Robin]?
Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book
Slide modified from Dan Roth
Complex Decision Structure
Co-reference Resolution
Scalability Issues
• Large amount of data
• Complex decision structure
Goal: Practical Machine Learning
• [Modeling] Expressive and general formulations
• [Algorithms] Principled and efficient
• [Applications] Support many applications
My Research Contributions
[Figure: research areas plotted by data size against problem complexity]
• Limited memory linear classifier [KDD 10, 11, TKDD 12]
• Latent representation for knowledge bases [EMNLP 13, 14]
• Linear classification [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
• Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]
My Research Contributions
LIBLINEAR [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
• Implements our proposed learning algorithms
• Supports binary and multiclass classification
Impact: > 60,000 downloads, > 2,600 citations in AI (AAAI, IJCAI), Data Mining (KDD, ICDM), Machine Learning (ICML, NIPS), Computer Vision (ICCV, CVPR), Information Retrieval (WWW, SIGIR), NLP (ACL, EMNLP), Multimedia (ACM-MM), HCI (UIST), System (CCS)
My Research Contributions
(Selective) Block Minimization [KDD 10, 11, TKDD 12]
• Supports learning from large data and streaming data
• KDD best paper (2010), Yahoo! KSC award (2011)
My Research Contributions
Latent Representation for KBs [EMNLP 13b, 14]
• Tensor methods for completing missing entries in KBs
• Applications: e.g., entity relation extraction, word relation extraction
My Research Contributions
Structured Prediction Models [ECML 13a, 13b, ICML 14, CoNLL 11, 12, AAAI 15]
• Design tractable, principled, domain-specific models
• Speed up general structured models
Structured Prediction
Assign values to a set of interdependent output variables.
Task: Part-of-speech tagging
  Input: They operate ships and banks.
  Output: Pronoun Verb Noun And Noun
Task: Dependency parsing
  Input: They operate ships and banks.
  Output: dependency tree over "Root They operate ships and banks ."
Task: Segmentation
  Input and output: (not recoverable)
Structured Prediction Models
Learn a scoring function: $\mathrm{Score}(\text{output } y \mid \text{input } x, \text{model } w)$
Linear model: $S(y \mid x, w) = \sum_i w_i \phi_i(x, y)$
Features: e.g., Verb-Noun, Mary-Noun
Input $x$: Mary had a little lamb
Output $y$: Noun Verb Det Adj Noun
Features are based on both input and output.
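A minimal sketch of the linear model, for illustration only. The slide names just two example features (Verb-Noun, Mary-Noun); the feature templates used here (word-tag and previous-tag/tag counts), the function names `features` and `score`, and the weight values are all assumptions, not taken from the talk.

```python
from collections import Counter

def features(x, y):
    """Hypothetical feature map phi(x, y): counts of word-tag and
    previous-tag/tag pairs, so features depend on input and output."""
    phi = Counter()
    prev = "<s>"  # assumed start-of-sentence marker
    for word, tag in zip(x, y):
        phi[f"{word}-{tag}"] += 1   # e.g. "Mary-Noun"
        phi[f"{prev}-{tag}"] += 1   # e.g. "Noun-Verb"
        prev = tag
    return phi

def score(x, y, w):
    """Linear score S(y | x, w) = sum_i w_i * phi_i(x, y)."""
    return sum(w.get(name, 0.0) * count
               for name, count in features(x, y).items())

x = ["Mary", "had", "a", "little", "lamb"]
y = ["Noun", "Verb", "Det", "Adj", "Noun"]
# Made-up weights; features without an entry get weight 0.
w = {"Mary-Noun": 1.5, "Noun-Verb": 0.5, "Verb-Noun": 0.25}

print(score(x, y, w))  # 2.0: only Mary-Noun and Noun-Verb fire with nonzero weight
```

Verb-Noun carries a weight but does not fire for this output, because no Noun follows a Verb in the tag sequence.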
Inference
Find the best-scoring output given the model: $\arg\max_y \mathrm{Score}(\text{output } y \mid \text{input } x, \text{model } w)$
The output space is usually exponentially large.
Inference algorithms:
• Specific: e.g., Viterbi (linear chain)
• General: integer linear programming (ILP)
• Approximate: e.g., belief propagation, dual decomposition
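To illustrate the specific case, here is a rough sketch of Viterbi inference for a first-order linear chain, checked against exhaustive enumeration of the exponentially large output space. It assumes the score decomposes into per-word emission and per-transition terms and that the input is non-empty; the names `viterbi`, `brute_force`, `emit`, `trans` and all weights are hypothetical.

```python
from itertools import product

def viterbi(x, tags, emit, trans, start="<s>"):
    """Return (best_score, best_tags) in O(len(x) * len(tags)^2) time."""
    # best[t] = (score, path) of the best prefix ending in tag t
    best = {t: (trans(start, t) + emit(x[0], t), [t]) for t in tags}
    for word in x[1:]:
        new = {}
        for t in tags:
            s, path = max(((best[p][0] + trans(p, t), best[p][1]) for p in tags),
                          key=lambda c: c[0])
            new[t] = (s + emit(word, t), path + [t])
        best = new
    return max(best.values(), key=lambda c: c[0])

def brute_force(x, tags, emit, trans, start="<s>"):
    """Score all len(tags)^len(x) outputs; feasible only for tiny inputs."""
    def total(y):
        prev, s = start, 0.0
        for word, tag in zip(x, y):
            s += trans(prev, tag) + emit(word, tag)
            prev = tag
        return s
    return max(((total(y), list(y)) for y in product(tags, repeat=len(x))),
               key=lambda c: c[0])

x = ["Mary", "had", "a", "little", "lamb"]
tags = ["Noun", "Verb", "Det", "Adj"]
# Made-up weights keyed by (kind, left, right); missing entries score 0.
w = {("emit", "Mary", "Noun"): 2.0, ("emit", "had", "Verb"): 2.0,
     ("emit", "a", "Det"): 2.0, ("emit", "little", "Adj"): 2.0,
     ("emit", "lamb", "Noun"): 2.0,
     ("trans", "Noun", "Verb"): 1.0, ("trans", "Verb", "Det"): 1.0,
     ("trans", "Det", "Adj"): 1.0, ("trans", "Adj", "Noun"): 1.0}
emit = lambda word, tag: w.get(("emit", word, tag), 0.0)
trans = lambda prev, tag: w.get(("trans", prev, tag), 0.0)

print(viterbi(x, tags, emit, trans))
print(brute_force(x, tags, emit, trans)[0])
```

With these weights both procedures agree on the tagging Noun Verb Det Adj Noun, but the dynamic program visits 5 × 4² transitions while enumeration scores 4⁵ = 1024 candidate outputs.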