LSA Spaces for Semantic Differences: John C. Martin Dissertation (PowerPoint presentation)

  1. Comparison of Hyper-dimensional LSA Spaces for Semantic Differences
      John C. Martin
      Dissertation Defense, 20 May 2016

  2. Overview
      • Review LSA model of learning
        – What is meaning?
      • Measures
      • Experiments
      • Semantic Measurement Model
      • Q & A

  3. The LSA Model of Learning
      • Orthogonal Axes
      • Dimensionality Reduction
      • Mapping System (Meaning)

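  4. Compositionality Constraint
      The meaning of a document is the sum of the meaning of its words:
      $D = q^T U_k \Sigma_k^{-1}$
      where q is the document's term vector, and U_k and Σ_k hold the rank-k term vectors and singular values.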
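
Read as the standard LSA fold-in, D = q^T U_k Σ_k^{-1} projects a document from its term vector q. Below is a minimal NumPy sketch under that assumption, using a toy count matrix (illustrative data, no term weighting) and hypothetical helper names. Because q^T U_k is a weighted sum of rows of U_k, folding in a whole document gives the same vector as folding in each of its words and adding the results, which is the compositionality constraint expressed in code.

```python
import numpy as np

def rank_k_svd(X, k):
    """Rank-k SVD of a terms x documents matrix X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Project a term vector q into the space: D = q^T U_k Sigma_k^{-1}."""
    return (q @ U_k) / s_k  # dividing by s_k applies the diagonal Sigma_k^{-1}

# Toy 5-term x 4-document count matrix (illustrative data only).
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 3, 0, 1],
    [0, 0, 2, 2],
    [1, 0, 0, 3],
], dtype=float)
U_k, s_k, Vt_k = rank_k_svd(X, k=2)

# Folding in an existing document reproduces its column of V_k^T.
d0 = fold_in(X[:, 0], U_k, s_k)
print(np.allclose(d0, Vt_k[:, 0]))  # True

# Compositionality: the document vector equals the sum of its word projections.
words = [fold_in(np.eye(5)[i] * X[i, 0], U_k, s_k) for i in range(5)]
print(np.allclose(d0, sum(words)))  # True
```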

  5. Compositionality Constraint Corollary
      The meaning of a word is defined by the documents in which it appears (and does not appear).

  6. Meaning
      The mapping system consists of:
      • Term Vector Dictionary
      • Singular Values
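
A sketch of the mapping system as the two parts listed on this slide, assuming the term vector dictionary maps each term to its row of U_k and the singular values are the top k values of Σ. The function names and toy vocabulary are hypothetical. Only these two parts are needed to project new text; the document vectors of the original space are not used.

```python
import numpy as np

def build_mapping_system(X, vocab, k):
    """Return (term_vectors, singular_values) for a rank-k LSA space.

    X is a terms x documents matrix whose rows are in the same order as vocab.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    term_vectors = {term: U[i, :k] for i, term in enumerate(vocab)}  # term -> row of U_k
    return term_vectors, s[:k]

def project(tokens, term_vectors, singular_values):
    """Sum the term vectors of the tokens, then scale by Sigma_k^{-1}.

    Tokens absent from the dictionary are skipped: the space assigns them no meaning.
    """
    total = np.zeros(len(singular_values))
    for tok in tokens:
        if tok in term_vectors:
            total += term_vectors[tok]
    return total / singular_values

# Illustrative toy space: 5 terms x 4 documents.
vocab = ["cell", "gene", "protein", "reading", "word"]
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 3, 0, 1],
    [0, 0, 2, 2],
    [1, 0, 0, 3],
], dtype=float)
term_vectors, singular_values = build_mapping_system(X, vocab, k=2)
vec = project(["cell", "gene", "cell", "unseen"], term_vectors, singular_values)
```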

  7. Motivation

  8. Objective
      Find a measure or set of measures that can quantify the difference between two spaces.

  9. Measures
      • Direct Comparison
      • Projected Content Comparison
      • Rotated Item Comparison

  10. Direct Comparison Measures

  11. Individual Space Measures
      • Document Count
      • Term Count
      • Non-zeroes
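
The three individual-space measures can be read directly off a space's term-by-document matrix. A minimal sketch, assuming "non-zeroes" means the non-zero cells of that matrix; the function name is hypothetical, and a real collection would normally be held in a sparse matrix rather than the small dense array used here.

```python
import numpy as np

def space_measures(X):
    """Size measures of one space from its terms x documents matrix X."""
    X = np.asarray(X)
    n_terms, n_docs = X.shape
    return {
        "documents": n_docs,
        "terms": n_terms,
        "non_zeroes": int(np.count_nonzero(X)),  # term/document cells with a count
    }

X = np.array([[2, 0, 1], [0, 0, 3], [1, 1, 0], [0, 4, 0]])
m = space_measures(X)
```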

  12. Distribution Analysis

  13. Term and Document Overlap
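
Term and document overlap between two spaces reduces to set operations on their vocabularies or document identifiers. A minimal sketch; the two ratios below (Jaccard, and the share of space 1's items also found in space 2) are illustrative choices, not definitions taken from the slides, and the helper name is hypothetical.

```python
def overlap(items_1, items_2):
    """Overlap between the terms (or documents) of two spaces.

    The two ratios are illustrative choices, not definitions from the slides.
    """
    a, b = set(items_1), set(items_2)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "coverage_of_1": len(shared) / len(a) if a else 0.0,
    }

r = overlap(["cell", "gene", "protein"], ["gene", "protein", "word", "reading"])
```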

  14. Projected Content Comparisons
      [Diagram: matched items projected into each space]

  15. Projected Item Distribution

  16. Three-Tuple Comparisons
      $(A, B, C)$, where $A = p_i$, $B = p_j$, $C = p_k$, $i \ne j \ne k$, $\forall p \in P$
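
The slide defines the tuples but not the relationship compared within each one. A sketch under one labeled assumption: the relationship of a tuple (A, B, C) is whether A is more similar (by cosine) to B than to C, and a change is a tuple whose relationship differs between the two spaces. Ordered tuples of distinct indices are enumerated with itertools.permutations; the helper names are hypothetical.

```python
from itertools import permutations
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def triple_relations(P):
    """For each ordered tuple (p_i, p_j, p_k) with distinct i, j, k, record
    whether p_i is closer to p_j than to p_k (assumed relationship)."""
    return {
        (i, j, k): cosine(P[i], P[j]) > cosine(P[i], P[k])
        for i, j, k in permutations(range(len(P)), 3)
    }

def triple_change_rate(P1, P2):
    """Fraction of tuples whose relationship differs between two spaces.
    Row i of P1 and row i of P2 are the same item projected into each space."""
    r1, r2 = triple_relations(P1), triple_relations(P2)
    changed = sum(r1[t] != r2[t] for t in r1)
    return changed / len(r1)

P1 = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
P2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
rate = triple_change_rate(P1, P2)  # every tuple flips in this toy case
```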

  17. Three-Tuple Relationship Changes

  18. Rotations and Transform Comparisons

  19. The Transform
      $A_1 = \mathrm{Project}(A, S_1)$
      $A_2 = \mathrm{Project}(A, S_2)$
      $A_1^T A_2 = U \Sigma V^T$
      $Q = U V^T$
      $\lVert A_1 Q - A_2 \rVert_F$
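
The transform reads as an orthogonal Procrustes fit: the anchor set A is projected into both spaces (A_1 and A_2), the SVD of A_1^T A_2 = U Σ V^T gives the rotation Q = U V^T, and ||A_1 Q - A_2||_F measures what no rotation can reconcile. A minimal NumPy sketch, assuming A_1 and A_2 are already-projected anchor items with matching rows; the function names are hypothetical. SciPy's scipy.linalg.orthogonal_procrustes solves the same minimization.

```python
import numpy as np

def procrustes_rotation(A1, A2):
    """Orthogonal Q minimizing ||A1 Q - A2||_F.

    A1, A2: the same anchor items projected into space 1 and space 2
    (same row order, k columns each).
    """
    U, _, Vt = np.linalg.svd(A1.T @ A2)  # A1^T A2 = U Sigma V^T
    return U @ Vt                        # Q = U V^T

def transform_residual(A1, A2):
    """Frobenius norm left after the best rotation, plus the rotation itself."""
    Q = procrustes_rotation(A1, A2)
    return np.linalg.norm(A1 @ Q - A2, "fro"), Q

# Synthetic check: space 2 is an exact rotation of space 1.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(20, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A2 = A1 @ R_true
resid, Q = transform_residual(A1, A2)
```

When space 2 is an exact rotation of space 1, the residual is near zero and Q recovers that rotation; any residual that remains reflects differences a rotation cannot explain.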

  20. Comparative Space Centroid Analysis
      [Diagram: centroids C1 and C2 of the two spaces]

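  21. Overlapping Term Vector Norm
      $\lVert T_1 Q - T_2 \rVert_F = \lVert T \rVert_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{k} t_{i,j}^2}$
      where T_1 and T_2 hold the vectors of the m terms shared by both spaces, k is the number of dimensions, and Q is the rotation from the transform.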
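
A sketch of the OTV-Norm, assuming each space's term vectors are available as a dictionary and Q has already been fitted on an anchor set. The shared terms are stacked in one common order so that row i of T_1 and T_2 refers to the same term; the helper names and toy vectors are hypothetical.

```python
import numpy as np

def shared_term_matrices(tv1, tv2):
    """Stack the vectors of terms present in both spaces, in a common (sorted) order."""
    shared = sorted(set(tv1) & set(tv2))
    T1 = np.array([tv1[t] for t in shared], dtype=float)
    T2 = np.array([tv2[t] for t in shared], dtype=float)
    return shared, T1, T2

def otv_norm(T1, T2, Q):
    """||T1 Q - T2||_F: square root of the sum of squared entries t_{i,j}."""
    T = T1 @ Q - T2
    return float(np.sqrt(np.sum(T ** 2)))

tv1 = {"cell": [1.0, 0.0], "gene": [0.0, 1.0], "protein": [1.0, 1.0]}
tv2 = {"cell": [1.0, 0.0], "gene": [0.0, 2.0], "word": [5.0, 5.0]}
shared, T1, T2 = shared_term_matrices(tv1, tv2)
value = otv_norm(T1, T2, np.eye(2))  # identity Q: treat the toy spaces as already aligned
```

Terms found in only one space ("protein", "word") drop out of the norm entirely, which is why the slides list accounting for non-overlapping terms as further research.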

  22. Projection/Anchor Sets

      Set        Documents   Unique Terms   Term Instances
      NICHD04        1,060          5,912           70,063
      T-500            500         16,317          123,668
      T-1000         1,000         24,319          252,372
      T-5000         5,000         49,995        1,281,749

  23. Control Experiment

  24. General Experiment

  25. General Experiment

  26. Grade Level Series Experiment

  27. Grade Level Series OTV-Norm

  28. Large Volume Experiment

  29. Large Volume Experiment

  30. Non-overlapping Series Experiment

  31. Non-Overlapping Series OTV-Norm

  32. Frozen Vocabulary Experiment

  33. OTV-Norm

  34. Semantic Measurement Model
      $\mathit{TC}\% \approx -0.207882 + 0.0507194\,(\mathit{OTVNorm}) - 0.339339\,(\mathit{TOR})$
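
Evaluating the fitted model is a single linear expression. The slide does not expand TC% or TOR, so the parameter names below mirror its abbreviations; the coefficients are copied from the slide and the function name is hypothetical.

```python
def predicted_tc_percent(otv_norm, tor):
    """Linear Semantic Measurement Model with the coefficients from slide 34."""
    return -0.207882 + 0.0507194 * otv_norm - 0.339339 * tor

baseline = predicted_tc_percent(0.0, 0.0)  # the intercept alone
```

With the signs as fitted, a larger OTV-Norm raises the predicted TC% and a larger TOR lowers it.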

  35. Summary of Contributions
      • Semantic differences are observable
        – Measurable
        – Quality based
      • Similarity not dependent on overlapping content
      • OTV-Norm & Semantic Measurement Model
        – Whole-space measurement

  36. Further Research
      • Refine the model
        – Anchor set selection/influence
        – Account for non-overlapping terms
        – Investigate non-linear model
      • Other questions raised

  37. Leverage for Answering Other Questions
      • Is it possible to identify key documents that affect the meaning of a space?
      • Do additional items added to a space have any impact?
      • Is there a point at which adding any items to a space makes no difference?
      • Is it possible to identify necessary knowledge that would align two spaces?

  38. Q&A

  39. Backup Slides

  40. Projection of New Content
      [Diagram: (1) text sources, (2) LSA space mapping information, (3) projection]

  41. Data
      • 42 Spaces
      • 592 Comparisons
      • 4 Projection Sets
      • 4 Anchor Sets
      • 26 Measures
      61,568 Data Items Collected
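
The collected total is consistent with one data item per comparison, per set, per measure, assuming the 4 projection sets and the 4 anchor sets are the same four sets listed under Projection/Anchor Sets (an inference; the slide does not state the breakdown):

```latex
592 \text{ comparisons} \times 4 \text{ sets} \times 26 \text{ measures} = 61{,}568 \text{ data items}
```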

  42. Distribution Analysis
