

  1. CSEP 517 Natural Language Processing Word Embeddings Luke Zettlemoyer (Slides adapted from Danqi Chen, Greg Durrett, Chris Manning, Dan Jurafsky)

  2. How to represent words? N-gram language models: P(w ∣ it is 76 F and), i.e. "It is 76 F and ___." yields a distribution over the vocabulary, e.g. [0.0001, 0.1, 0, 0, 0.002, …, 0.3, …, 0], with most of the mass on words like "red" and "sunny". Text classification: P(y = 1 ∣ x) = σ(θ⊺w + b), where w is a bag-of-words vector. "I like this movie." 👍: w(1) = [0, 1, 0, 0, 0, …, 1, …, 1]. "I don't like this movie." 👎: w(2) = [0, 1, 0, 1, 0, …, 1, …, 1]; the two differ in the "don't" feature.
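The bag-of-words classifier on this slide can be sketched as follows; the vocabulary, weights, and bias below are hypothetical illustrative values, not anything trained:

```python
import math

# Sketch of P(y = 1 | x) = sigmoid(theta . w + b) with binary
# bag-of-words features. Vocabulary and weights are made up.
vocab = ["i", "like", "this", "movie", "don't"]
theta = [0.0, 2.0, 0.0, 0.0, -3.0]   # "like" is positive, "don't" flips it
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(tokens):
    w = [1 if v in tokens else 0 for v in vocab]      # binary features
    score = sum(t * f for t, f in zip(theta, w)) + b  # theta . w + b
    return sigmoid(score)

print(predict("i like this movie".split()))        # > 0.5: positive
print(predict("i don't like this movie".split()))  # < 0.5: negative
```

Note that the only difference between the two inputs is the "don't" feature, which is exactly what the slide's w(1) vs. w(2) vectors illustrate.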

  3. Representing words as discrete symbols In traditional NLP, we regard words as discrete symbols: hotel, conference, motel, a localist representation. Words can be represented by one-hot vectors (one 1, the rest 0's): hotel = [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0], motel = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]. Vector dimension = number of words in the vocabulary (e.g., 500,000). Challenge: how do we compute the similarity of two words?
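A minimal sketch of why one-hot vectors cannot express similarity (the tiny vocabulary here is an assumption for illustration): any two distinct one-hot vectors have dot product 0, so "hotel" and "motel" look no more alike than "hotel" and "conference".

```python
# Toy vocabulary (hypothetical); real vocabularies have ~500,000 entries.
vocab = ["conference", "hotel", "motel"]

def one_hot(word, vocab):
    """Return the one-hot vector for `word` over `vocab`."""
    return [1 if w == word else 0 for w in vocab]

hotel = one_hot("hotel", vocab)
motel = one_hot("motel", vocab)

# Dot product of distinct one-hot vectors is always 0.
dot = sum(h * m for h, m in zip(hotel, motel))
print(dot)  # 0: one-hot vectors carry no notion of similarity
```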

  4. Representing words by their context Distributional hypothesis: words that occur in similar contexts tend to have similar meanings. "You shall know a word by the company it keeps" (J.R. Firth, 1957) • One of the most successful ideas of modern statistical NLP! These context words will represent "banking".

  5. Distributional hypothesis What does "tejuino" mean? C1: A bottle of ___ is on the table. C2: Everybody likes ___. C3: Don't have ___ before you drive. C4: We make ___ out of corn.

  6. Distributional hypothesis
     C1: A bottle of ___ is on the table.
     C2: Everybody likes ___.
     C3: Don't have ___ before you drive.
     C4: We make ___ out of corn.

                  C1  C2  C3  C4
     tejuino       1   1   1   1
     loud          0   0   0   0
     motor-oil     1   0   0   0
     tortillas     0   1   0   1
     choices       0   1   0   0
     wine          1   1   1   0

     "words that occur in similar contexts tend to have similar meanings"

  7. Words as vectors • We’ll build a new model of meaning focusing on similarity • Each word is a vector • Similar words are “nearby in space” • A first solution: we can just use context vectors to represent the meaning of words! • word-word co-occurrence matrix:
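The word-word co-occurrence matrix mentioned above can be built by counting, for each word, which words appear within a fixed window around it. A minimal sketch over an assumed toy corpus:

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count word-word co-occurrences within a +/- `window` context."""
    counts = Counter()
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts

# Hypothetical toy corpus, window of 1 word on each side.
tokens = "i like deep learning i like nlp i enjoy flying".split()
counts = cooccurrence(tokens, window=1)
print(counts[("i", "like")])   # 2: "i" appears next to "like" twice
print(counts[("like", "deep")])  # 1
```

Each row of the resulting matrix (all counts for one target word) is that word's context vector.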

  8. Words as vectors

     cos(u, v) = (u · v) / (‖u‖ ‖v‖) = Σ_{i=1}^{V} u_i v_i / ( √(Σ_{i=1}^{V} u_i²) · √(Σ_{i=1}^{V} v_i²) )

     What is the range of cos(·)?

  9. Words as vectors Problem: not all counts are equal; words can co-occur simply by chance • Solution: re-weight each count by how likely the two words are to co-occur by chance alone • PPMI = Positive Pointwise Mutual Information: PPMI(w, c) = max(0, log₂ P(w, c) / (P(w) P(c)))
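A minimal sketch of PPMI from raw co-occurrence counts (no smoothing; the counts below are hypothetical):

```python
import math

def ppmi(count_wc, count_w, count_c, total):
    """PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))).
    Probabilities estimated from raw co-occurrence counts."""
    if count_wc == 0:
        return 0.0
    p_wc = count_wc / total
    p_w = count_w / total
    p_c = count_c / total
    return max(0.0, math.log2(p_wc / (p_w * p_c)))

# A pair that co-occurs exactly at chance rate gets PPMI 0;
# one that co-occurs 5x more often than chance gets log2(5).
print(ppmi(10, 100, 100, 1000))  # 0.0
print(ppmi(50, 100, 100, 1000))  # ~2.32
```

Clipping at 0 (the "positive" in PPMI) discards pairs that co-occur less often than chance, which are usually unreliable with sparse counts.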

  10. Sparse vs. dense vectors • Still, the vectors we get from a word-word co-occurrence matrix are sparse (mostly 0's) and long (vocabulary-sized) • Alternative: represent words as short (50-300 dimensional), dense (real-valued) vectors • The focus of this lecture • The basis of modern NLP systems
