Vectors and Semantics
Peter Turney
November 2008

Vision of the Future
• future of SKDOU: from text to knowledge
  – input: web
  – output: knowledge
  – beyond search: QA with unconstrained questions and answers
  – 24/7 continuous automatic learning from the web
• what will that knowledge look like?
  – default assumption: a giant expert system, but generated automatically, with no hand-coding
• what will that knowledge look like?
  – my opinion: expert systems are missing something vital
  – expert systems are not a sufficient representation of knowledge
  – we need vectors
Outline
• symbolic versus spatial approaches to knowledge
  – logic versus geometry
• term-document matrix
  – latent semantic analysis; applications
• pair-pattern matrix
  – latent relational analysis; applications
• episodic versus semantic
  – some hypotheses about vectors and semantics
• conclusions
  – how to acquire knowledge; how to represent knowledge
Symbolic versus Spatial (1 of 3)
Symbolic AI
• symbolic approach to knowledge
  – logic, propositional calculus, graph theory, set theory, ...
  – GOFAI: good old-fashioned AI
• benefits
  – good for deduction, reasoning about entailment, consistency
  – crisp, clean, binary-valued
  – good for yes/no questions
    • does A entail B?
• costs
  – not so good for induction, learning theories from data
  – aliasing: noise due to analog-to-digital conversion
  – not good for questions about similarity
    • how similar is A to B?

Symbolic versus Spatial (2 of 3)
Spatial AI
• spatial approach to knowledge
  – vector spaces, linear algebra, geometry, ...
  – machine learning, statistics, feature spaces, information retrieval
• benefits
  – good for induction, learning theories from data
  – fuzzy, analog, real-valued
  – good for questions about similarity
    • similarity(A, B) = cosine(A, B) — see the sketch below
• costs
  – not so good for deduction, entailment, consistency
  – messy, lots of numbers
  – not convenient for communication
    • language is digital
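To make the contrast concrete, here is a minimal sketch of the spatial answer to "how similar is A to B?", written in Python with NumPy. The two vectors are invented toy data, not drawn from any corpus.

```python
# A minimal sketch of the spatial view of similarity: represent A and B as
# feature vectors and compare them by the cosine of the angle between them.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([2.0, 0.0, 1.0])  # hypothetical feature vector for concept A
b = np.array([1.0, 1.0, 1.0])  # hypothetical feature vector for concept B

# a graded, real-valued answer, where symbolic AI offers only yes/no
print(cosine(a, b))
```

The point of the sketch is the return type: a real number in [0, 1], not a truth value, which is exactly what the symbolic approach lacks.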
Symbolic versus Spatial (3 of 3)
Symbolic vs Spatial
• need to combine symbolic and spatial approaches
  – symbolic for communication and entailment
  – spatial for similarity and learning
• reference
  – Peter Gärdenfors. (2000). Conceptual Spaces: The Geometry of Thought. MIT Press.
Term-Document Matrix (2 of 9)
Technicalities
• weighting the elements
  – give more weight when a term t_i is surprisingly frequent in a document d_j
  – tf-idf = term frequency times inverse document frequency
  – hundreds of variations of tf-idf
• smoothing the matrix
  – problem of sparsity, small corpus
  – Singular Value Decomposition (SVD), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Nonnegative Matrix Factorization (NMF), ...
• comparing the vectors
  – many ways to compare two vectors
  – cosine, Jaccard, Euclidean, Dice, correlation, Hamming, ...
  – all three steps are sketched below
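Here is a minimal sketch of the three steps on a toy term-document matrix: a basic tf-idf weighting (one of the hundreds of variants), SVD truncation for smoothing, and cosine for comparison. The counts are invented; a real system would use sparse matrices over a large corpus.

```python
import numpy as np

# rows = terms, columns = documents (raw frequency counts; toy data)
X = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 2, 0, 3],
              [0, 0, 1, 1]], dtype=float)

# 1. weighting: one simple tf-idf variant
n_docs = X.shape[1]
df = np.count_nonzero(X, axis=1)             # document frequency per term
idf = np.log(n_docs / df)
W = X * idf[:, np.newaxis]                   # tf times idf

# 2. smoothing: truncate the SVD to k dimensions (here k = 2)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
W_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # low-rank approximation of W

# 3. comparing: cosine between two column (document) vectors
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(W_k[:, 0], W_k[:, 1]))          # similarity of documents 1 and 2
```

The same three steps, with different choices at each step, underlie all of the applications on the following slides.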
Term-Document Matrix (3 of 9)
Information Retrieval
• how similar is document d_1 to document d_2?
  – cosine of the angle between the d_1 and d_2 column vectors in the matrix
• how relevant is document d to query q?
  – make a pseudo-document vector to represent q
  – cosine of the angle between d and q
• references
  – Gerard Salton and Michael J. McGill. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
  – Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407.

Term-Document Matrix (4 of 9)
Word Similarity
• how similar is term t_1 to term t_2?
  – cosine of the angle between the t_1 and t_2 row vectors in the matrix
• evaluation on TOEFL multiple-choice synonym questions
  – 92.5%: highest score of any pure (non-hybrid) algorithm
  – 64.5%: average human score
• references
  – Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
  – Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.
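A minimal sketch of the TOEFL-style procedure: pick the choice word whose row vector is nearest (by cosine) to the stem word's row vector. The stem and choices (levied : imposed, believed, requested, correlated) are the well-known example from Landauer and Dumais; the small matrix is a hypothetical stand-in for rows of an SVD-reduced term-document matrix.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

vocab = ["levied", "imposed", "believed", "requested", "correlated"]
# rows = terms; in practice these come from the SVD of a real matrix
T = np.array([[0.9, 0.1, 0.2],
              [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.3],
              [0.3, 0.4, 0.8],
              [0.2, 0.3, 0.9]])

stem = "levied"
choices = ["imposed", "believed", "requested", "correlated"]
row = {w: T[i] for i, w in enumerate(vocab)}

# answer = the choice whose row vector is nearest the stem's
answer = max(choices, key=lambda w: cosine(row[stem], row[w]))
print(answer)  # "imposed" with these toy vectors
```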
Term-Document Matrix (5 of 9)
Essay Grading
• grade student essays
  – latent semantic analysis
  – commercial product: Pearson's Knowledge Technologies
• references
  – Rehder, B., Schreiner, M.E., Wolfe, M.B., Laham, D., Landauer, T.K., and Kintsch, W. (1998). Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25, 337-354.
  – Foltz, P.W., Laham, D., and Landauer, T.K. (1999). Automated essay scoring: Applications to educational technology. Proceedings of the ED-MEDIA '99 Conference, Association for the Advancement of Computing in Education, Charlottesville.

Term-Document Matrix (6 of 9)
Textual Cohesion
• measuring textual cohesion
  – latent semantic analysis
• reference
  – Foltz, P.W., Kintsch, W., and Landauer, T.K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.
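As a rough illustration only (not the Foltz et al. procedure, which uses LSA vectors), cohesion can be approximated as the average cosine between vectors for adjacent sentences. The sketch below uses raw term counts over a tiny hand-built vocabulary.

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

vocab = ["cat", "sat", "mat", "dog", "ran", "park"]

def sentence_vector(sentence):
    # raw term counts; LSA would map this into a reduced space instead
    words = sentence.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

text = ["the cat sat on the mat",
        "the cat ran to the park",
        "the dog ran in the park"]
vectors = [sentence_vector(s) for s in text]

# cohesion = mean cosine between each sentence and the next
cohesion = np.mean([cosine(vectors[i], vectors[i + 1])
                    for i in range(len(vectors) - 1)])
print(cohesion)  # higher = adjacent sentences share more content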
Term-Document Matrix (7 of 9)
Semantic Orientation
• measuring praise and criticism
  – latent semantic analysis
  – small set of positive and negative reference words
    • good, nice, excellent, positive, fortunate, correct, and superior
    • bad, nasty, poor, negative, unfortunate, wrong, and inferior
  – the semantic orientation of a word X is the sum of the similarities of X with the positive reference words, minus the sum of the similarities of X with the negative reference words
• reference
  – Turney, P.D., and Littman, M.L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346.

Term-Document Matrix (8 of 9)
Logic
• logical operations can be performed with linear algebra
  – t_1 OR t_2 = the vector space spanned by t_1 and t_2
  – t_1 NOT t_2 = the projection of t_1 onto the subspace orthogonal to t_2
  – bass NOT fisherman = bass in the sense of a musical instrument, not bass in the sense of a fish
• reference
  – Dominic Widdows. (2004). Geometry and Meaning. CSLI Publications.
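A minimal sketch of the semantic-orientation measure above: SO(X) = sum of similarities with the positive reference words minus the sum with the negative ones. The word vectors here are invented toy data and the reference sets are shortened; Turney and Littman derive real vectors from a corpus.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical low-dimensional vectors for a few words
vec = {"good": np.array([0.9, 0.1]), "excellent": np.array([0.8, 0.2]),
       "bad": np.array([0.1, 0.9]), "poor": np.array([0.2, 0.8]),
       "delightful": np.array([0.7, 0.3])}
positive = ["good", "excellent"]   # shortened reference sets
negative = ["bad", "poor"]

def semantic_orientation(word):
    return (sum(cosine(vec[word], vec[p]) for p in positive)
            - sum(cosine(vec[word], vec[n]) for n in negative))

print(semantic_orientation("delightful"))  # > 0 suggests praise
```

And a minimal sketch of vector negation in Widdows' sense: t_1 NOT t_2 is what remains of t_1 after removing its component along t_2, so the result is orthogonal to t_2. The "bass" and "fisherman" vectors are toy stand-ins.

```python
import numpy as np

def negate(t1, t2):
    t2_hat = t2 / np.linalg.norm(t2)
    return t1 - (t1 @ t2_hat) * t2_hat   # remove the t2 component from t1

bass = np.array([0.7, 0.7, 0.2])         # hypothetical "bass" vector
fisherman = np.array([0.0, 1.0, 0.1])    # hypothetical "fisherman" vector
print(negate(bass, fisherman))           # the fish-free sense of "bass"
```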
Term-Document Matrix (9 of 9)
Summary
• applications for a term-document (word-chunk) matrix
  – information retrieval
  – measuring word similarity
  – essay grading
  – textual cohesion
  – semantic orientation
  – logic
Pair-Pattern Matrix (1 of 8)
Pair-Pattern Matrix
• rows correspond to pairs of words
  – X : Y = mason : stone
• columns correspond to patterns
  – "X works with Y"
• an element corresponds to the frequency of the given pattern in a corpus, when the variables in the pattern are instantiated with the words in the given pair
  – "mason works with stone"
• a row vector gives the distribution of the patterns in which the given pair appears
  – a signature of the semantic relation between mason and stone
  – see the sketch below

Pair-Pattern Matrix (2 of 8)
Technicalities
• exactly the same as with term-document matrices
  – weighting the elements
  – smoothing the matrix
  – comparing the vectors
• many lessons carry over from term-document matrices
  – good weighting approaches
  – good smoothing algorithms
  – good formulas for comparing vectors
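A minimal sketch of the data structure: each row vector records how often its word pair fills each pattern, and comparing row vectors compares the relations the pairs express. All counts below are invented; a real matrix would be built from pattern frequencies in a large corpus.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# columns = patterns; rows = word pairs, with hypothetical corpus counts
patterns = ["X works with Y", "X cuts Y", "X eats Y"]
pairs = {
    "mason : stone":    np.array([12.0, 5.0, 0.0]),
    "carpenter : wood": np.array([10.0, 7.0, 0.0]),
    "bird : worm":      np.array([0.0, 0.0, 9.0]),
}

# a high cosine between row vectors suggests the pairs share a relation,
# i.e., mason : stone :: carpenter : wood is a plausible analogy
print(cosine(pairs["mason : stone"], pairs["carpenter : wood"]))  # high
print(cosine(pairs["mason : stone"], pairs["bird : worm"]))       # 0.0
```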