
Efficient induction of probabilistic word classes with LDA



  1. Efficient induction of probabilistic word classes with LDA. Grzegorz Chrupała, Saarland University. IJCNLP 2011. G. Chrupala (Saarland Uni), Efficient word classes with LDA, IJCNLP 2011

  2. Word classes. Examples: Berlin, Bangkok, Tokyo, Warsaw; Sarkozy, Merkel, Obama, Berlusconi; Mr, Ms, President, Dr. Groups of words sharing syntax/semantics. Useful for generalization and abstraction.

  3. Word classes as features. Have been successfully used in: named entity recognition, syntactic parsing, sentence retrieval.

  4. Brown clustering. Brown et al. proposed their algorithm in 1992. Agglomerative, hard clustering algorithm. Greedy merges minimize the loss of mutual information (MI) between adjacent classes. Still the most commonly used word class type.
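
For reference, the quantity that Brown clustering is usually described as preserving is the average mutual information between the classes of adjacent words. The notation below is an assumption for illustration, not taken from the slides: p(c, c') is the probability that a word of class c is immediately followed by a word of class c', and p(c), p(c') are the corresponding marginals.

```latex
I(C) = \sum_{c, c'} p(c, c') \, \log \frac{p(c, c')}{p(c)\, p(c')}
```

At each agglomerative step, the pair of classes whose merge loses the least of this quantity is merged.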

  5. Brown’s weaknesses. (1) Time complexity: O(K²V).

  6. Brown’s weaknesses. (1) Time complexity: O(K²V). (2) Hard clustering: each word form is assigned to only one class, so separate classes are needed for: first name; last name; first name OR last name; last name OR city.

  7. Word class induction with LDA addresses both issues.

  8. LDA for topic modeling. For each topic z, draw φ_z from a Dirichlet. For each document d: draw a topic distribution θ_d from a Dirichlet; then repeat until all the words in d have been generated: draw a topic z from θ_d, then draw a word w from φ_z.
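
The generative story on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the symmetric hyperparameters `alpha` and `beta`, the toy vocabulary and the document lengths are all assumptions.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_topics, vocab, doc_lengths, alpha=0.5, beta=0.5, seed=0):
    """Generate documents following the LDA generative process."""
    rng = random.Random(seed)
    # For each topic z, draw a word distribution phi_z from a Dirichlet.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for length in doc_lengths:
        # For each document d, draw a topic distribution theta_d from a Dirichlet.
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(length):
            # Draw a topic z from theta_d, then a word w from phi_z.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=phi[z])[0])
        docs.append(words)
    return docs


# Hypothetical toy setup: 2 topics, 3 word types, documents of 5 and 3 words.
vocab = ["bank", "money", "river"]
docs = generate_corpus(2, vocab, [5, 3])
```

Document lengths are passed in because the process itself only says to repeat until all words of the document are generated.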

  9. LDA

  10. Topic vs word classes. Topics → Word classes; Documents → Word types; Words → Context features.

  11. Example: the word type Krzysztof treated as a "document" of context features, each a neighboring word marked L (left) or R (right): argues L, argues R, director L, director L, edits R, said R, Bledkowski R, Kieslowski R, Kieslowski R, Rutkowski R, Sikorski R, and L.
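
A rough sketch of how such per-word-type feature "documents" could be collected from tokenized text follows. It assumes that a context feature is simply the immediately preceding (L) or following (R) token. The function name and the two toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def context_features(sentences):
    """Map each word type to a Counter of (neighbor, side) context features."""
    features = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if i > 0:
                # Token immediately to the left of this occurrence.
                features[word][(tokens[i - 1], "L")] += 1
            if i + 1 < len(tokens):
                # Token immediately to the right of this occurrence.
                features[word][(tokens[i + 1], "R")] += 1
    return features


# Hypothetical toy corpus, not taken from the talk.
sentences = [
    ["director", "Krzysztof", "Kieslowski", "said"],
    ["Krzysztof", "Sikorski", "argues"],
]
feats = context_features(sentences)
```

Here `feats["Krzysztof"]` holds one count each for `("director", "L")`, `("Kieslowski", "R")` and `("Sikorski", "R")`. That bag of features plays the role a document's words play in topic modeling.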

  12. Generative process. For each class z, draw φ_z from a Dirichlet. For each word type d: draw a class distribution θ_d from a Dirichlet; then repeat: draw a word class z from θ_d, then draw a context feature w from φ_z.

  13. Induced distributions. θ_d: class distribution given word type. φ_z: feature distribution given class.
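
The slides do not say how θ_d and φ_z are estimated. Collapsed Gibbs sampling is one standard way to fit LDA, and the sketch below uses it purely as an illustration. The function name, the hyperparameters `alpha` and `beta`, the iteration count and the toy data are all assumptions.

```python
import random


def gibbs_lda(docs, num_classes, num_features, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; each doc is a list of feature ids."""
    rng = random.Random(seed)
    n_dz = [[0] * num_classes for _ in docs]                 # class counts per word type
    n_zw = [[0] * num_features for _ in range(num_classes)]  # feature counts per class
    n_z = [0] * num_classes                                  # total features per class
    assign = []
    # Random initial class assignment for every feature token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_classes)
            zs.append(z)
            n_dz[d][z] += 1
            n_zw[z][w] += 1
            n_z[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                n_dz[d][z] -= 1
                n_zw[z][w] -= 1
                n_z[z] -= 1
                # Conditional weight of each class given all other assignments.
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta)
                    / (n_z[k] + num_features * beta)
                    for k in range(num_classes)
                ]
                z = rng.choices(range(num_classes), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][w] += 1
                n_z[z] += 1
    # Smoothed point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + num_classes * alpha) for c in row]
        for row, doc in zip(n_dz, docs)
    ]
    phi = [
        [(c + beta) / (n_z[k] + num_features * beta) for c in n_zw[k]]
        for k in range(num_classes)
    ]
    return theta, phi


# Hypothetical toy data: 3 word types, 4 distinct context features, 2 classes.
toy_docs = [[0, 0, 1], [2, 3, 3], [0, 1, 1]]
theta, phi = gibbs_lda(toy_docs, num_classes=2, num_features=4)
```

Each row of `theta` is a class distribution for one word type, and each row of `phi` is a feature distribution for one class. In this sampler, every sweep evaluates all classes for every feature token.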

  14. Soft clustering. [Figure; words shown: Martin, Cameron, chief, Gingrich, Martin, Newt, Van, Scott, Roberts, Mr., Ms., John, Robert, President, Dr., David, Street, General, Texas, Fidelity, State, California]

  15. Context. Newt, Speaker • executive, operating says, Chairman • Clinton, Dole, J. Wall, West, East • County, AG, Journal

  16. Efficiency. Brown: O(K²V). LDA: O(KN). Scaling feature counts by 1/m reduces LDA runtime m times.
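
A small sketch of the count-scaling idea follows. It assumes N is the total number of context-feature tokens. The slide does not say how fractional counts are handled, so rounding up (to keep every observed feature) is an assumption here, as are the function name and the toy counts.

```python
import math
from collections import Counter


def scale_counts(counts, m):
    """Scale each feature count by 1/m, rounding up (rounding rule is an assumption)."""
    return Counter({feature: math.ceil(c / m) for feature, c in counts.items()})


# Hypothetical counts for one word type.
counts = Counter({"director_L": 9, "said_R": 3, "and_L": 1})
scaled = scale_counts(counts, 3)
tokens_before = sum(counts.values())
tokens_after = sum(scaled.values())
```

With this rounding rule, the toy example shrinks from 13 feature tokens to 5. That is somewhat less than a factor of 3, because features with low counts are kept at 1.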

  17. Testing efficiency in practice. 60M words of North American News Text. LDA and Brown with 100, 200, 500 and 1000 classes. LDA counts scaled by 1/3.

  18. Runtimes. [Plot: runtime in hours (y-axis, ticks 0.2 to 50.0) for Brown and LDA; x-axis ticks 50 to 500]

  19. Semi-supervised learning performance. Use word classes as features. Brown: different levels of the hierarchy. LDA: class distributions and context information. Explore several class granularities.
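
One way such features might be produced for a downstream tagger is sketched below. The slides do not give the exact feature templates, so everything here is an assumption: the function names, the prefix lengths standing in for "levels of hierarchy", and the probability threshold.

```python
def brown_prefix_features(bitstring, prefix_lengths=(4, 6, 10)):
    """Features from prefixes of a Brown cluster bit string (hypothetical template)."""
    return [f"brown{n}={bitstring[:n]}" for n in prefix_lengths]


def lda_class_features(theta_d, threshold=0.1):
    """Weighted features from a word type's class distribution (hypothetical template)."""
    return {f"lda{k}": p for k, p in enumerate(theta_d) if p >= threshold}


# Hypothetical inputs for a single word type.
brown_feats = brown_prefix_features("110100", prefix_lengths=(2, 4))
lda_feats = lda_class_features([0.7, 0.05, 0.25])
```

A hard Brown cluster gives one symbolic feature per hierarchy level. A soft class distribution can give several weighted features for the same word.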

  20. Fine-grained NER on BBN. Categories: animal, cardinal, age, date, duration, disease, building, highway-street, city, country, state-province, law, continent, region, money, nationality, political, ordinal, corporation, educational, government, percent, person, plant, vehicle, weight, chemical, drug, food, time.

  21. F1 error. [Plot: F1 error (y-axis, ticks 8 to 14) for Brown and LDA; x-axis ticks 50 to 1000]

  22. Morphological analysis. Columns: Token, Lemma, MSD, Gloss. Pero, pero, cc, "but"; cuando, cuando, cs, "when"; era, ser, vsii3s0, "he was"; niño, niño, ncms000, "boy"; le, el, pp3csd00, "to him"; gustaba, gustar, vmii3p0, "it pleased".

  23. MA results with Morfette. Brown: 500 classes. LDA: 50 classes on Spanish, 100 on French. [Bar chart: Baseline, Brown and LDA on French and Spanish; axis ticks 0 to 6]

  24. Semantic relation classification. Task defined at SemEval 2007 and 2010. Example: "The bowl was full of apples, pears and oranges" → content-container(pears, bowl).

  25. Relation inventory: cause-effect, instrument-agency, product-producer, content-container, entity-origin, entity-destination, component-whole, member-collection, communication-topic.

  26. Relation classification results. 500 Brown classes, 100 LDA classes. [Bar chart: Baseline, Brown and LDA; axis ticks 0 to 25]

  27. LDA-based relation classification (RC) would rank third in SemEval 2010, without PropBank, FrameNet, WordNet, NomLex, TextRunner, Cyc...

  28. To conclude: efficient induction of probabilistic word classes, which match or improve on hierarchical Brown classes.

  29. Thank you

  30. Relation classification. [Plot: baseline vs. LDA; y-axis ticks 20 to 50]
