Word2Vec
Michael Collins, Columbia University
Motivation
◮ We can easily collect very large amounts of unlabeled text data
◮ Can we learn useful representations (e.g., word embeddings) from unlabeled data?
Bigrams from Unlabeled Data
◮ Given a corpus, extract a training set $\{(x^{(i)}, y^{(i)})\}$ for $i = 1 \ldots n$, where each $x^{(i)} \in \mathcal{V}$, $y^{(i)} \in \mathcal{V}$, and $\mathcal{V}$ is the vocabulary
◮ For example, given the sentence

  Hispaniola quickly became an important base from which Spain expanded its empire into the rest of the Western Hemisphere .

  and a window size of $\pm 3$, for $x = \textit{base}$ we get the pairs (base, became), (base, an), (base, important), (base, from), (base, which), (base, Spain)
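A minimal sketch of this pair-extraction step, assuming simple whitespace tokenization; the function name `extract_pairs` and its `window` argument are illustrative, not from the original:

```python
def extract_pairs(tokens, window=3):
    """Pair each word with every other word within +/- window positions."""
    pairs = []
    for i, x in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((x, tokens[j]))
    return pairs

sentence = ("Hispaniola quickly became an important base from which Spain "
            "expanded its empire into the rest of the Western Hemisphere .")
pairs = extract_pairs(sentence.split())
print([p for p in pairs if p[0] == "base"])
# [('base', 'became'), ('base', 'an'), ('base', 'important'),
#  ('base', 'from'), ('base', 'which'), ('base', 'Spain')]
```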
Learning Word Embeddings
◮ Given a corpus, extract a training set $\{(x^{(i)}, y^{(i)})\}$ for $i = 1 \ldots n$, where each $x^{(i)} \in \mathcal{V}$, $y^{(i)} \in \mathcal{V}$, and $\mathcal{V}$ is the vocabulary
◮ For each word $w \in \mathcal{V}$, define word embeddings $\theta'(w) \in \mathbb{R}^d$ and $\theta(w) \in \mathbb{R}^d$
◮ Define $\Theta'$, $\Theta$ to be the two matrices of embedding parameters
◮ Can then define
$$p(y^{(i)} \mid x^{(i)}; \Theta, \Theta') = \frac{\exp\{\theta'(x^{(i)}) \cdot \theta(y^{(i)})\}}{Z(x^{(i)}; \Theta, \Theta')}$$
where
$$Z(x^{(i)}; \Theta, \Theta') = \sum_{y \in \mathcal{V}} \exp\{\theta'(x^{(i)}) \cdot \theta(y)\}$$
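A minimal NumPy sketch of this model, assuming words are represented as integer indices into the two embedding matrices; the names `Theta`, `Theta_prime`, and `p_y_given_x` are illustrative:

```python
import numpy as np

V, d = 10000, 100  # illustrative vocabulary size and embedding dimension
rng = np.random.default_rng(0)
Theta = rng.normal(scale=0.1, size=(V, d))        # theta(w)  for every w in V
Theta_prime = rng.normal(scale=0.1, size=(V, d))  # theta'(w) for every w in V

def p_y_given_x(x, Theta, Theta_prime):
    """Softmax distribution p(y | x) over all y in the vocabulary."""
    scores = Theta @ Theta_prime[x]       # theta'(x) . theta(y) for every y
    scores -= scores.max()                # for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # division by Z(x; Theta, Theta')

probs = p_y_given_x(42, Theta, Theta_prime)
assert abs(probs.sum() - 1.0) < 1e-6
```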
Learning Word Embeddings (Continued)
◮ Can define
$$p(y^{(i)} \mid x^{(i)}; \Theta, \Theta') = \frac{\exp\{\theta'(x^{(i)}) \cdot \theta(y^{(i)})\}}{Z(x^{(i)}; \Theta, \Theta')}$$
where
$$Z(x^{(i)}; \Theta, \Theta') = \sum_{y \in \mathcal{V}} \exp\{\theta'(x^{(i)}) \cdot \theta(y)\}$$
◮ A first objective function that can be maximized using stochastic gradient ascent:
$$L(\Theta, \Theta') = \sum_{i=1}^{n} \log p(y^{(i)} \mid x^{(i)}; \Theta, \Theta')$$
$$= \sum_{i=1}^{n} \Bigl[ \theta'(x^{(i)}) \cdot \theta(y^{(i)}) - \underbrace{\log \sum_{y \in \mathcal{V}} \exp\{\theta'(x^{(i)}) \cdot \theta(y)\}}_{\text{Expensive!}} \Bigr]$$
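The following sketch (same conventions and illustrative names as above) computes this objective directly; the log-sum-exp over the whole vocabulary is the expensive term, costing $O(|\mathcal{V}| \cdot d)$ per training example:

```python
import numpy as np

def log_likelihood(xs, ys, Theta, Theta_prime):
    """L(Theta, Theta') = sum_i [ theta'(x_i) . theta(y_i) - log Z(x_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        scores = Theta @ Theta_prime[x]               # all |V| dot products
        m = scores.max()
        log_Z = m + np.log(np.exp(scores - m).sum())  # stable log Z(x): the bottleneck
        total += scores[y] - log_Z
    return total
```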
An Alternative: Negative Sampling
◮ Given a corpus, extract a training set $\{(x^{(i)}, y^{(i)})\}$ for $i = 1 \ldots n$, where each $x^{(i)} \in \mathcal{V}$, $y^{(i)} \in \mathcal{V}$, and $\mathcal{V}$ is the vocabulary
◮ In addition, for each $i$ sample $y^{(i,k)}$ for $k = 1 \ldots K$ from a "noise" distribution $p_n(y)$. E.g., $p_n(y)$ is the unigram distribution over words $y$
◮ A new objective function:
$$L(\Theta', \Theta) = \sum_{i=1}^{n} \log \frac{\exp\{\theta'(x^{(i)}) \cdot \theta(y^{(i)})\}}{1 + \exp\{\theta'(x^{(i)}) \cdot \theta(y^{(i)})\}} + \sum_{i=1}^{n} \sum_{k=1}^{K} \log \frac{1}{1 + \exp\{\theta'(x^{(i)}) \cdot \theta(y^{(i,k)})\}}$$
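A sketch of this objective under the same conventions (integer word indices; illustrative names). Note that the per-pair term $\log \frac{e^s}{1+e^s} = \log \sigma(s)$ and the per-noise-sample term $\log \frac{1}{1+e^s} = \log \sigma(-s)$, so each example costs $O(K \cdot d)$ instead of $O(|\mathcal{V}| \cdot d)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_sampling_objective(xs, ys, Theta, Theta_prime, unigram_probs, K=5, rng=None):
    """Each observed pair (x, y) contributes log sigma(theta'(x) . theta(y));
    each of K noise words sampled from p_n contributes
    log sigma(-theta'(x) . theta(y_noise))."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for x, y in zip(xs, ys):
        s = Theta[y] @ Theta_prime[x]
        total += np.log(sigmoid(s))             # log [ e^s / (1 + e^s) ]
        noise = rng.choice(len(unigram_probs), size=K, p=unigram_probs)
        for y_k in noise:
            s_k = Theta[y_k] @ Theta_prime[x]
            total += np.log(sigmoid(-s_k))      # log [ 1 / (1 + e^{s_k}) ]
    return total
```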