LSA Spaces for Semantic Differences John C. Martin Dissertation



slide-1
SLIDE 1

Comparison of Hyper-dimensional LSA Spaces for Semantic Differences

John C. Martin Dissertation Defense 20 May 2016

slide-2
SLIDE 2

Overview

  • Review LSA model of learning
      – What is meaning?

  • Measures
  • Experiments
  • Semantic Measurement Model
  • Q & A
slide-3
SLIDE 3

The LSA Model of Learning

  • Orthogonal Axes
  • Dimensionality Reduction
  • Mapping System

Meaning

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Compositionality Constraint

The meaning of a document is the sum of the meaning of its words

$D = q^{T} U_k \Sigma_k^{-1}$
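The compositionality constraint can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard LSA fold-in form $D = q^{T} U_k \Sigma_k^{-1}$, where $q$ is a document's term-count vector, $U_k$ holds the term vectors, and $\Sigma_k$ the singular values. The function names `build_space` and `project` and the toy matrix are hypothetical, not taken from the dissertation.

```python
import numpy as np

def build_space(term_doc, k):
    """Truncated SVD of a term-by-document count matrix (terms in rows)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term vectors, singular values, document vectors (one row per document).
    return U[:, :k], s[:k], Vt[:k, :].T

def project(q, U_k, s_k):
    """Fold a term-count vector q into the space: D = q^T U_k diag(s_k)^-1.

    Assumption: the inverse-singular-value scaling is the usual LSA fold-in form.
    """
    return (q @ U_k) / s_k

# Toy term-by-document counts (hypothetical data): 4 terms, 3 documents.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
U_k, s_k, V_k = build_space(X, k=2)

# Projecting a document already in the space recovers its document vector.
d0 = project(X[:, 0], U_k, s_k)
```

Because `project` is linear in `q`, the projection of two documents' combined counts equals the sum of their individual projections, which is the "sum of its words" property in vector form.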

slide-7
SLIDE 7

Compositionality Constraint Corollary

The meaning of a word is defined by the documents in which it appears (and does not appear)

slide-8
SLIDE 8

Meaning

The Mapping system consists of:

  • Term Vector Dictionary
  • Singular Values

slide-9
SLIDE 9

Motivation

slide-10
SLIDE 10

Objective

Find a measure or set of measures that can quantify the difference between two spaces

slide-11
SLIDE 11

Measures

  • Direct Comparison
  • Projected Content Comparison
  • Rotated Item Comparison
slide-12
SLIDE 12

Direct Comparison Measures


slide-13
SLIDE 13

Individual Space Measures

  • Document Count
  • Term Count
  • Non-zeroes
slide-14
SLIDE 14

Distribution Analysis

slide-15
SLIDE 15

Term and Document Overlap

slide-16
SLIDE 16

Projected Content Comparisons

Matched items projected into each space

slide-17
SLIDE 17

Projected Item Distribution

slide-18
SLIDE 18

Three-Tuple Comparisons

$(A, B, C)$: $A = p_i$, $B = p_j$, $C = p_k$, where $i \neq j \neq k$, $\forall p \in P$
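One way to compare three-tuples across two spaces is sketched below. The slide does not say which relationship is tracked, so this sketch assumes one: for each anchor item A and each pair (B, C), it checks whether A is closer (by cosine) to B or to C, and counts how often that ordering differs between the spaces. `P1` and `P2` are hypothetical matrices holding the same projected items, one row per item, in space 1 and space 2. Rows are assumed to be non-zero.

```python
from itertools import combinations
import numpy as np

def cosine_matrix(P):
    """Pairwise cosine similarities between the rows of P."""
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    return unit @ unit.T

def tuple_change_fraction(P1, P2):
    """Fraction of (A, B, C) tuples whose 'A is closer to B than to C'
    ordering differs between the two spaces (assumed relationship)."""
    s1, s2 = cosine_matrix(P1), cosine_matrix(P2)
    n = P1.shape[0]
    changed = total = 0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        for j, k in combinations(others, 2):
            total += 1
            if (s1[i, j] > s1[i, k]) != (s2[i, j] > s2[i, k]):
                changed += 1
    return changed / total

# Hypothetical items: the second and third items trade places in space 2.
P1 = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
P2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
same = tuple_change_fraction(P1, P1)
swapped = tuple_change_fraction(P1, P2)
```

The loop visits every anchor with every unordered pair of other items, so the cost grows cubically with the number of projected items.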

slide-19
SLIDE 19

Three-Tuple Relationship Changes

slide-20
SLIDE 20

Rotations and Transform Comparisons


slide-21
SLIDE 21

The Transform

$A_1 = \mathrm{Project}(A, S_1)$

$A_2 = \mathrm{Project}(A, S_2)$

$A_1^{T} A_2 = U \Sigma V^{T}$

$Q = U V^{T}$

$\lVert A_1 Q - A_2 \rVert_F$
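The transform on this slide has the form of the orthogonal Procrustes solution: take the SVD of $A_1^{T} A_2 = U \Sigma V^{T}$, set $Q = U V^{T}$, and measure the residual $\lVert A_1 Q - A_2 \rVert_F$. A minimal NumPy sketch follows, assuming `A1` and `A2` are the same anchor items projected into two spaces, one row per item. The function names and the random test data are hypothetical.

```python
import numpy as np

def procrustes_rotation(A1, A2):
    """Orthogonal Q minimizing ||A1 Q - A2||_F, from the SVD of A1^T A2."""
    U, _, Vt = np.linalg.svd(A1.T @ A2)
    return U @ Vt

def residual_norm(A1, A2, Q):
    """Frobenius norm of what remains after aligning A1 to A2."""
    return np.linalg.norm(A1 @ Q - A2, ord="fro")

# Hypothetical check: A2 is A1 under a known rotation, so Q should recover it.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
A2 = A1 @ R

Q = procrustes_rotation(A1, A2)
residual = residual_norm(A1, A2, Q)
```

When the two spaces differ only by an orthogonal change of axes, the residual is numerically zero. A larger residual reflects differences that no such alignment can remove.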

slide-22
SLIDE 22

Comparative Space Centroid Analysis

[Figure: centroids C1 and C2 of the two spaces]

slide-23
SLIDE 23

Overlapping Term Vector Norm

$\lVert T_1 Q - T_2 \rVert_F = \lVert T \rVert_F = \sqrt{\sum_{i=1}^{T} \sum_{j=1}^{k} t_{i,j}^{2}}$
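The overlapping term vector norm, $\lVert T_1 Q - T_2 \rVert_F$, can be sketched as follows. The sketch assumes each space's term vector dictionary is a mapping from term to a k-dimensional vector, and that the orthogonal matrix `Q` has already been obtained from the anchor-set transform. The dictionary layout and the name `otv_norm` are illustrative assumptions.

```python
import numpy as np

def otv_norm(terms1, terms2, Q):
    """Frobenius norm of T1 Q - T2 over the terms present in both spaces."""
    shared = sorted(set(terms1) & set(terms2))
    if not shared:
        raise ValueError("the two spaces share no terms")
    T1 = np.array([terms1[w] for w in shared], dtype=float)
    T2 = np.array([terms2[w] for w in shared], dtype=float)
    T = T1 @ Q - T2
    # Square root of the sum of squared entries t_ij.
    return np.sqrt(np.sum(T ** 2))

# Hypothetical dictionaries: "cat" and "dog" overlap; "fish" and "bird" do not.
space1 = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [1.0, 1.0]}
space2 = {"cat": [1.0, 0.0], "dog": [0.0, 0.0], "bird": [2.0, 2.0]}
value = otv_norm(space1, space2, np.eye(2))
```

Only the shared vocabulary contributes, so terms unique to one space have no effect on the value in this sketch.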

slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26

Projection/Anchor Sets

Set        Documents   Unique Terms   Term Instances
NICHD04    1,060       5,912          70,063
T-500      500         16,317         123,668
T-1000     1,000       24,319         252,372
T-5000     5,000       49,995         1,281,749

slide-27
SLIDE 27

Control Experiment

slide-28
SLIDE 28

General Experiment

slide-29
SLIDE 29

General Experiment

slide-30
SLIDE 30

Grade Level Series Experiment

slide-31
SLIDE 31
slide-32
SLIDE 32

Grade Level Series OTV-Norm

slide-33
SLIDE 33

Large Volume Experiment

slide-34
SLIDE 34

Large Volume Experiment

slide-35
SLIDE 35

Non-overlapping Series Experiment

slide-36
SLIDE 36

Non-Overlapping Series OTV-Norm

slide-37
SLIDE 37

Frozen Vocabulary Experiment

slide-38
SLIDE 38

OTV-Norm

slide-39
SLIDE 39

Semantic Measurement Model

$TC\% \approx -0.207882 + 0.0507194\,(\mathit{OTVNorm}) - 0.339339\,(\mathit{TOR})$
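The linear model can be evaluated directly. The short sketch below uses the coefficients from the slide, TC% ≈ −0.207882 + 0.0507194·OTVNorm − 0.339339·TOR. The function name and the input values are hypothetical and chosen only to show the arithmetic. TC% and TOR are used as the symbols appear on the slide.

```python
def predicted_tc_percent(otv_norm, tor):
    """Evaluate the slide's linear model for given OTVNorm and TOR values."""
    return -0.207882 + 0.0507194 * otv_norm - 0.339339 * tor

# Hypothetical inputs, not measurements from the experiments.
example = predicted_tc_percent(otv_norm=10.0, tor=0.5)
# -0.207882 + 0.507194 - 0.1696695 = 0.1296425
```

The signs show the direction of each effect: the prediction rises with OTVNorm and falls with TOR.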

slide-40
SLIDE 40
slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43

Summary of Contributions

  • Semantic differences are observable
      – Measurable
      – Quality based
  • Similarity not dependent on overlapping content
  • OTV-Norm & Semantic Measurement Model
      – Whole-space measurement

slide-44
SLIDE 44

Further Research

  • Refine the model
      – Anchor set selection/influence
      – Account for non-overlapping terms
      – Investigate non-linear model

  • Other questions raised
slide-45
SLIDE 45

Leverage for Answering Other Questions

  • Is it possible to identify key documents that affect the meaning of a space?
  • Do additional items added to a space have any impact?
  • Is there a point at which adding any items to a space makes no difference?
  • Is it possible to identify necessary knowledge that would align two spaces?

slide-46
SLIDE 46

Q&A

slide-47
SLIDE 47

Backup Slides

slide-48
SLIDE 48

Projection of New Content

[Diagram: Text Sources, Projection, Mapping Information, LSA Space]

slide-49
SLIDE 49

Data

  • 42 Spaces
  • 592 Comparisons
  • 4 Projection Sets
  • 4 Anchor Sets
  • 26 Measures

61,568 Data Items Collected (592 × 4 × 26)

slide-50
SLIDE 50

Distribution Analysis

slide-51
SLIDE 51