SLIDE 1 Comparison of Hyper-dimensional LSA Spaces for Semantic Differences
John C. Martin Dissertation Defense 20 May 2016
SLIDE 2 Overview
- Review LSA model of learning
– What is meaning?
- Measures
- Experiments
- Semantic Measurement Model
- Q & A
SLIDE 3 The LSA Model of Learning
- Orthogonal Axes
- Dimensionality Reduction
- Mapping System
Meaning
SLIDE 4
SLIDE 5
SLIDE 6
Compositionality Constraint
The meaning of a document is the sum of the meaning of its words
$D = q^{T} U_k \Sigma_k^{-1}$
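One common reading of this constraint is the standard LSA fold-in, in which a document vector is the weighted sum of its term vectors, rescaled by the singular values. The sketch below assumes that reading; the toy matrix and the names `fold_in`, `U_k`, and `s_k` are illustrative and do not come from the slides.

```python
import numpy as np

def fold_in(q, U_k, s_k):
    """Project a term-count vector q into the k-dimensional space.

    q @ U_k sums the term vectors weighted by term counts;
    dividing by s_k rescales each axis by its singular value.
    """
    return (q @ U_k) / s_k

# Toy term-by-document matrix (rows = terms, columns = documents); values are made up.
X = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]

# Folding in a document that is already in the matrix recovers
# its coordinates in the truncated space.
d0 = fold_in(X[:, 0], U_k, s_k)
```

Folding in an existing column is a convenient sanity check, because the result should match that document's row of the truncated right singular vectors.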
SLIDE 7
Compositionality Constraint Corollary
The meaning of a word is defined by the documents in which it appears (and does not appear)
SLIDE 8 Meaning
The mapping system consists of:
- Term Vector Dictionary
- Singular Values
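A minimal sketch of how these two pieces might be obtained, assuming the space is built with a truncated SVD of a term-by-document matrix. The vocabulary, the matrix values, and the helper `build_mapping` are hypothetical.

```python
import numpy as np

def build_mapping(X, k):
    """Return (term vectors, singular values) for a rank-k truncated SVD of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

# Hypothetical vocabulary and term-by-document counts.
vocab = ["cat", "dog", "fish", "bird"]
X = np.array([[2., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 3.]])

U_k, s_k = build_mapping(X, k=2)

# The "dictionary" maps each term to its k-dimensional vector.
term_vectors = {term: U_k[i] for i, term in enumerate(vocab)}
```

The columns of `U_k` are orthonormal, which corresponds to the orthogonal axes of the space.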
SLIDE 9
Motivation
SLIDE 10
Objective
Find a measure or set of measures that can quantify the difference between two spaces
SLIDE 11 Measures
- Direct Comparison
- Projected Content Comparison
- Rotated Item Comparison
SLIDE 12 Direct Comparison Measures
SLIDE 13 Individual Space Measures
- Document Count
- Term Count
- Non-zeroes
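These three counts can be read directly off a term-by-document matrix. The sketch below assumes rows are terms and columns are documents; the function name and the toy matrix are illustrative only.

```python
import numpy as np

def space_measures(X):
    """Basic size measures of a term-by-document matrix X."""
    n_terms, n_docs = X.shape
    return {
        "documents": n_docs,
        "terms": n_terms,
        "nonzeroes": int(np.count_nonzero(X)),
    }

# Toy matrix: 4 terms, 3 documents.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])

measures = space_measures(X)
```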
SLIDE 14
Distribution Analysis
SLIDE 15
Term and Document Overlap
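The slides do not say how overlap is computed. One plausible sketch, under that caveat, counts the shared items and divides by the size of the union; both the function and the choice of ratio are assumptions.

```python
def overlap(items_a, items_b):
    """Return (number of shared items, shared / union) for two collections."""
    a, b = set(items_a), set(items_b)
    shared = a & b
    union = a | b
    ratio = len(shared) / len(union) if union else 0.0
    return len(shared), ratio

# Hypothetical vocabularies of two spaces.
count, ratio = overlap(["cat", "dog", "fish"], ["dog", "fish", "bird"])
```

The same helper works for document identifiers as well as terms.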
SLIDE 16 Projected Content Comparisons
Matched items projected into each space
SLIDE 17
Projected Item Distribution
SLIDE 18 Three-Tuple Comparisons
$(A, B, C)$: $A = p_i$, $B = p_j$, $C = p_k$, where $i \neq j \neq k$, $\forall p \in P$
SLIDE 19
Three-Tuple Relationship Changes
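The slides do not define the relationship being tracked. The sketch below assumes one simple choice: for each tuple, ask whether the first item is closer (by cosine) to the second or to the third, and count the tuples for which the answer differs between the two spaces. The function names and the data are hypothetical.

```python
from itertools import combinations
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def tuple_change_fraction(P1, P2):
    """Fraction of three-tuples whose 'closer neighbour' differs between spaces.

    P1 and P2 hold the same projected items (one per row) in space 1 and space 2.
    """
    n = P1.shape[0]
    changed = total = 0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        for j, k in combinations(others, 2):
            closer_to_j_1 = cosine(P1[i], P1[j]) > cosine(P1[i], P1[k])
            closer_to_j_2 = cosine(P2[i], P2[j]) > cosine(P2[i], P2[k])
            total += 1
            changed += closer_to_j_1 != closer_to_j_2
    return changed / total

# Four made-up items projected into a 2-dimensional space.
P1 = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.3]])
P2 = P1.copy()
P2[3] = [0.1, 0.9]   # the last item moves in the second space

unchanged = tuple_change_fraction(P1, P1)
moved = tuple_change_fraction(P1, P2)
```

Because only relative similarities are compared, this kind of measure does not require the two spaces to share axes.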
SLIDE 20 Rotations and Transform Comparisons
[Diagram labels: 1, 2, 3 in each space, plus 2']
SLIDE 21 The Transform
$A_1 = \mathrm{Project}(A, S_1)$
$A_2 = \mathrm{Project}(A, S_2)$
$A_1^{T} A_2 = U \Sigma V^{T}$
$Q = U V^{T}$
$\lVert A_1 Q - A_2 \rVert_F$
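The transform on this slide has the form of the orthogonal Procrustes problem: take the SVD of the cross-product of the two projections, form the rotation from the singular vectors, and measure the Frobenius norm of what is left over. A minimal NumPy sketch follows, with made-up data and hypothetical function names.

```python
import numpy as np

def procrustes_rotation(A1, A2):
    """Orthogonal Q minimising the Frobenius norm of A1 @ Q - A2."""
    U, _, Vt = np.linalg.svd(A1.T @ A2)
    return U @ Vt

def procrustes_residual(A1, A2):
    Q = procrustes_rotation(A1, A2)
    return float(np.linalg.norm(A1 @ Q - A2, ord="fro"))

# Made-up projections: A2 is A1 rotated about the third axis,
# so the two "spaces" differ only by a rotation.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(6, 3))
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
A2 = A1 @ R

Q = procrustes_rotation(A1, A2)
res = procrustes_residual(A1, A2)
```

When the two projections differ only by a rotation, the recovered `Q` matches that rotation and the residual is numerically zero; any remaining residual reflects differences the rotation cannot explain.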
SLIDE 22 Comparative Space Centroid Analysis
[Diagram labels: centroids C1 and C2]
SLIDE 23 Overlapping Term Vector Norm
$\lVert T_1 Q - T_2 \rVert_F = \lVert T \rVert_F = \sqrt{\sum_{i=1}^{T} \sum_{j=1}^{k} t_{i,j}^{2}}$
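A sketch of how such a norm could be computed, assuming it is taken over the term vectors shared by both spaces after rotating one space onto the other. The term lists, the vectors, and the function `otv_norm` are hypothetical, and both spaces are assumed to have the same number of dimensions.

```python
import numpy as np

def otv_norm(terms1, vecs1, terms2, vecs2):
    """Frobenius norm of T1 @ Q - T2 over the terms common to both spaces."""
    shared = sorted(set(terms1) & set(terms2))
    row1 = {t: i for i, t in enumerate(terms1)}
    row2 = {t: i for i, t in enumerate(terms2)}
    # Align rows so that row r of T1 and T2 refers to the same term.
    T1 = vecs1[[row1[t] for t in shared]]
    T2 = vecs2[[row2[t] for t in shared]]
    # Orthogonal Procrustes rotation of space 1 onto space 2.
    U, _, Vt = np.linalg.svd(T1.T @ T2)
    Q = U @ Vt
    diff = T1 @ Q - T2
    # Square root of the sum of squared entries.
    return float(np.sqrt(np.sum(diff ** 2)))

rng = np.random.default_rng(1)
terms1 = ["cat", "dog", "fish", "bird"]
vecs1 = rng.normal(size=(4, 2))

# Space 2: same three shared terms, rotated, in a different order, plus one new term.
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[c, -s], [s, c]])
terms2 = ["fish", "cat", "dog", "horse"]
vecs2 = np.vstack([vecs1[[2, 0, 1]] @ R, rng.normal(size=(1, 2))])

same_meaning = otv_norm(terms1, vecs1, terms2, vecs2)

# Perturb one shared term in space 2; the norm should now be positive.
vecs2_changed = vecs2.copy()
vecs2_changed[0] += 1.0
different = otv_norm(terms1, vecs1, terms2, vecs2_changed)
```

Terms that appear in only one space ("bird", "horse") do not contribute in this sketch.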
SLIDE 24
SLIDE 25
SLIDE 26 Projection/Anchor Sets
Set       Documents   Unique Terms   Term Instances
NICHD04       1,060          5,912           70,063
T-500           500         16,317          123,668
T-1000        1,000         24,319          252,372
T-5000        5,000         49,995        1,281,749
SLIDE 27
Control Experiment
SLIDE 28
General Experiment
SLIDE 29
General Experiment
SLIDE 30
Grade Level Series Experiment
SLIDE 31
SLIDE 32
Grade Level Series OTV-Norm
SLIDE 33
Large Volume Experiment
SLIDE 34
Large Volume Experiment
SLIDE 35
Non-overlapping Series Experiment
SLIDE 36
Non-Overlapping Series OTV-Norm
SLIDE 37
Frozen Vocabulary Experiment
SLIDE 38
OTV-Norm
SLIDE 39 Semantic Measurement Model
$TC\% \approx -0.207882 + 0.0507194\,(\mathrm{OTVNorm}) - 0.339339\,(\mathrm{TOR})$
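The model is a linear combination of two predictors. In the sketch below the slide's symbols are read as TC%, OTVNorm, and TOR; that reading and the Python names are assumptions, the coefficients are copied from the slide, and the input values are made up.

```python
def predicted_tc_percent(otv_norm, tor):
    """Evaluate the linear model from the slide for given predictor values."""
    intercept = -0.207882
    coef_otv_norm = 0.0507194
    coef_tor = -0.339339
    return intercept + coef_otv_norm * otv_norm + coef_tor * tor

# Made-up predictor values, for illustration only.
example = predicted_tc_percent(otv_norm=10.0, tor=0.5)
```

With these coefficients the prediction rises as the norm grows and falls as the second predictor grows.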
SLIDE 40
SLIDE 41
SLIDE 42
SLIDE 43 Summary of Contributions
- Semantic differences are observable
– Measurable
– Quality based
- Similarity not dependent on overlapping content
- OTV-Norm & Semantic Measurement Model
– Whole-space measurement
SLIDE 44 Further Research
– Anchor set selection/influence
– Account for non-overlapping terms
– Investigate non-linear model
SLIDE 45 Leverage for Answering Other Questions
Is it possible to identify key documents that affect the meaning of a space?
Do additional items added to a space have any impact?
Is there a point at which adding any items to a space makes no difference?
Is it possible to identify necessary knowledge that would align two spaces?
SLIDE 46
Q&A
SLIDE 47
Backup Slides
SLIDE 48 Projection of New Content
[Diagram labels: Text Sources, Projection, Mapping Information, LSA Space]
SLIDE 49 Data
- 42 Spaces
- 592 Comparisons
- 4 Projection Sets
- 4 Anchor Sets
- 26 Measures
61,568 Data Items Collected
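The total is consistent with one value per comparison, per set, and per measure. That reading is an assumption and is not stated on the slide:

$$592 \times 4 \times 26 = 61{,}568$$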
SLIDE 50
Distribution Analysis
SLIDE 51