MIA - Master on Artificial Intelligence
Advanced Natural Language Processing
Similarity and Clustering
Outline
1 Similarity and Clustering
  Similarity
  Clustering
    Hierarchical Clustering
    Non-hierarchical Clustering
  Evaluation
The Concept of Similarity

Similarity, proximity, affinity, distance, difference, divergence.

We use distance when the metric properties hold:
  d(x, x) = 0
  d(x, y) > 0 when x ≠ y
  d(x, y) = d(y, x)  (symmetry)
  d(x, z) ≤ d(x, y) + d(y, z)  (triangle inequality)

We use similarity in the general case:
  Function: sim : A × B → S (where S is often [0, 1])
  Homogeneous: sim : A × A → S (e.g. word-to-word)
  Heterogeneous: sim : A × B → S (e.g. word-to-document)
  Not necessarily symmetric, nor satisfying the triangle inequality.
The Concept of Similarity

If A is a metric space, the distance in A may be used:
  D_euclidean(x, y) = |x − y| = √( Σ_i (x_i − y_i)² )

Similarity vs. distance:
  sim_D(A, B) = 1 / (1 + D(A, B))
  monotonic: min{ sim(x, y), sim(x, z) } ≤ sim(x, y ∪ z)
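A minimal sketch (assuming NumPy) of the Euclidean distance and the 1/(1 + D) conversion above:

  import numpy as np

  def euclidean(x, y):
      # L2 distance between two vectors
      return np.sqrt(np.sum((x - y) ** 2))

  def sim_from_dist(d):
      # turn a non-negative distance into a similarity in (0, 1]
      return 1.0 / (1.0 + d)

  x = np.array([1.0, 0.0, 2.0])
  y = np.array([0.0, 1.0, 2.0])
  print(euclidean(x, y))                 # ~1.414
  print(sim_from_dist(euclidean(x, y)))  # ~0.414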
Applications

Clustering, case-based reasoning, IR, ...
  Discovering related words - distributional similarity
  Resolving syntactic ambiguity - taxonomic similarity
  Resolving semantic ambiguity - ontological similarity
  Acquiring selectional restrictions/preferences
Relevant Information

Content (information about the compared units)
  Words: form, morphology, PoS, ...
  Senses: synset, topic, domain, ...
  Syntax: parse trees, syntactic roles, ...
  Documents: words, collocations, NEs, ...
Context (information about the situation in which similarity is computed)
  Window-based vs. syntactic-based
External knowledge
  Monolingual/bilingual dictionaries, ontologies, corpora
Vectorial methods (1)

L1 norm, Manhattan distance, taxi-cab distance, city-block distance:
  L1(x, y) = Σ_{i=1..N} |x_i − y_i|

L2 norm, Euclidean distance:
  L2(x, y) = |x − y| = √( Σ_{i=1..N} (x_i − y_i)² )

Cosine:
  cos(x, y) = (x · y) / (|x| · |y|) = Σ_i x_i y_i / ( √(Σ_i x_i²) · √(Σ_i y_i²) )
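A sketch of the three measures with NumPy (assumed here; any vector library works the same way):

  import numpy as np

  def l1(x, y):
      # Manhattan / city-block distance
      return np.sum(np.abs(x - y))

  def l2(x, y):
      # Euclidean distance
      return np.sqrt(np.sum((x - y) ** 2))

  def cosine(x, y):
      # cosine of the angle between x and y (a similarity; 1 - cosine gives a distance)
      return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

  x = np.array([1.0, 2.0, 0.0])
  y = np.array([2.0, 1.0, 1.0])
  print(l1(x, y), l2(x, y), cosine(x, y))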
Vectorial methods (2)

L1 and L2 norms are particular cases of the Minkowski measure:
  D_minkowski(x, y) = L_r(x, y) = ( Σ_{i=1..N} |x_i − y_i|^r )^(1/r)

Canberra distance:
  D_canberra(x, y) = Σ_{i=1..N} |x_i − y_i| / |x_i + y_i|

Chebyshev distance:
  D_chebyshev(x, y) = max_i |x_i − y_i|
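These measures are also available in SciPy; a sketch (assuming scipy is installed):

  import numpy as np
  from scipy.spatial import distance

  x = np.array([1.0, 2.0, 0.0])
  y = np.array([2.0, 1.0, 1.0])

  print(distance.minkowski(x, y, p=3))  # L_r with r = 3
  print(distance.cityblock(x, y))       # L_1 (r = 1)
  print(distance.euclidean(x, y))       # L_2 (r = 2)
  print(distance.canberra(x, y))        # note: SciPy uses |x_i| + |y_i| in the
                                        # denominator; equal to |x_i + y_i| for
                                        # non-negative vectors (e.g. counts)
  print(distance.chebyshev(x, y))       # max_i |x_i - y_i|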
Set-oriented methods (3): Binary-valued vectors seen as sets

  Dice:     S_dice(X, Y) = 2 · |X ∩ Y| / (|X| + |Y|)
  Jaccard:  S_jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|
  Overlap:  S_overlap(X, Y) = |X ∩ Y| / min(|X|, |Y|)
  Cosine:   cos(X, Y) = |X ∩ Y| / √(|X| · |Y|)

The above similarities are in [0, 1] and can be used as distances simply by subtracting: D = 1 − S.
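A sketch of the set-based similarities using plain Python sets (the example word sets are illustrative only):

  def dice(X, Y):
      return 2 * len(X & Y) / (len(X) + len(Y))

  def jaccard(X, Y):
      return len(X & Y) / len(X | Y)

  def overlap(X, Y):
      return len(X & Y) / min(len(X), len(Y))

  def set_cosine(X, Y):
      return len(X & Y) / (len(X) * len(Y)) ** 0.5

  X = {"the", "cat", "sat", "on", "mat"}
  Y = {"the", "dog", "sat", "on", "log"}
  print(dice(X, Y), jaccard(X, Y), overlap(X, Y), set_cosine(X, Y))
  print(1 - jaccard(X, Y))   # used as a distance: D = 1 - S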
Set-oriented methods (4): Agreement contingency table

                         Object i
                         1        0
  Object j      1        a        b        a + b
                0        c        d        c + d
                         a + c    b + d    p

  Dice:     S_dice(i, j) = 2a / (2a + b + c)
  Jaccard:  S_jaccard(i, j) = a / (a + b + c)
  Overlap:  S_overlap(i, j) = a / min(a + b, a + c)
  Cosine:   S_cosine(i, j) = a / √((a + b)(a + c))
  Matching coefficient:  S_mc(i, j) = (a + d) / p
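A sketch (assuming NumPy) of how the contingency counts a, b, c, d are obtained from two binary vectors:

  import numpy as np

  def contingency(i, j):
      # a, b, c, d counts for two binary vectors of equal length
      i, j = np.asarray(i, bool), np.asarray(j, bool)
      a = np.sum(i & j)     # i = 1 and j = 1
      b = np.sum(~i & j)    # i = 0 and j = 1
      c = np.sum(i & ~j)    # i = 1 and j = 0
      d = np.sum(~i & ~j)   # i = 0 and j = 0
      return a, b, c, d

  i = [1, 1, 0, 1, 0, 0]
  j = [1, 0, 0, 1, 1, 0]
  a, b, c, d = contingency(i, j)
  p = a + b + c + d
  print(2 * a / (2 * a + b + c))   # Dice
  print(a / (a + b + c))           # Jaccard
  print((a + d) / p)               # Matching coefficient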
Distributional Similarity

Particular case of vectorial representation where attributes are probability distributions:
  xᵀ = [x_1 ... x_N] such that ∀i, 0 ≤ x_i ≤ 1 and Σ_{i=1..N} x_i = 1

Kullback-Leibler divergence (relative entropy), non-symmetrical:
  D(q || r) = Σ_{y ∈ Y} q(y) log ( q(y) / r(y) )

Mutual information:
  I(A, B) = D(h || f · g) = Σ_{a ∈ A} Σ_{b ∈ B} h(a, b) log ( h(a, b) / (f(a) · g(b)) )
  (KL-divergence between the joint and the product distribution)
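A sketch of both quantities with NumPy (base-2 logs chosen here for bits; natural logs are equally common):

  import numpy as np

  def kl_divergence(q, r):
      # D(q || r) = sum_y q(y) log(q(y)/r(y)); assumes r(y) > 0 wherever q(y) > 0
      q, r = np.asarray(q, float), np.asarray(r, float)
      mask = q > 0                      # 0 * log 0 is taken as 0
      return np.sum(q[mask] * np.log2(q[mask] / r[mask]))

  def mutual_information(joint):
      # I(A;B) = D(h || f*g) for a joint distribution h(a,b) given as a 2-D array
      joint = np.asarray(joint, float)
      f = joint.sum(axis=1, keepdims=True)   # marginal of A
      g = joint.sum(axis=0, keepdims=True)   # marginal of B
      return kl_divergence(joint.ravel(), (f * g).ravel())

  print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
  print(mutual_information([[0.3, 0.1],
                            [0.1, 0.5]]))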
Semantic Similarity

Project objects onto a semantic space:
  D_A(x_1, x_2) = D_B(f(x_1), f(x_2))

Semantic spaces: an ontology (WordNet, CYC, SUMO, ...) or a graph-like knowledge base (e.g. Wikipedia).
  Not easy to project words, since the semantic space is composed of concepts, and a word may map to more than one concept.
  Not obvious how to compute distances in the semantic space.
WordNet
[Figure: examples of the WordNet concept hierarchy]
Distances in WordNet

WordNet::Similarity
http://maraca.d.umn.edu/cgi-bin/similarity/similarity.cgi

Some definitions:
  SLP(s_1, s_2) = shortest path length from concept s_1 to s_2
    (Which subset of arcs is used? antonymy, gloss, ...)
  depth(s) = depth of concept s in the ontology
  MaxDepth = max_{s ∈ WN} depth(s)
  LCS(s_1, s_2) = lowest common subsumer of s_1 and s_2
  IC(s) = −log P(s) = information content of s (given a corpus)
Distances in WordNet

  Shortest path length:  D(s_1, s_2) = SLP(s_1, s_2)
  Leacock & Chodorow:    D(s_1, s_2) = −log ( SLP(s_1, s_2) / (2 · MaxDepth) )
  Wu & Palmer:           D(s_1, s_2) = 2 · depth(LCS(s_1, s_2)) / ( depth(s_1) + depth(s_2) )
  Resnik:                D(s_1, s_2) = IC(LCS(s_1, s_2))
  Jiang & Conrath:       D(s_1, s_2) = IC(s_1) + IC(s_2) − 2 · IC(LCS(s_1, s_2))
  Lin:                   D(s_1, s_2) = 2 · IC(LCS(s_1, s_2)) / ( IC(s_1) + IC(s_2) )
  Gloss overlap: sum of squares of the lengths of word overlaps between glosses
  Gloss vector: cosine of second-order co-occurrence vectors of glosses
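A sketch of these measures using NLTK's WordNet interface (an alternative to the WordNet::Similarity web demo linked above); it assumes the NLTK wordnet and wordnet_ic data packages have been downloaded:

  from nltk.corpus import wordnet as wn, wordnet_ic

  # Concepts are synsets; a word may map to several of them.
  dog = wn.synset('dog.n.01')
  cat = wn.synset('cat.n.01')

  print(dog.path_similarity(cat))   # based on shortest path length
  print(dog.lch_similarity(cat))    # Leacock & Chodorow
  print(dog.wup_similarity(cat))    # Wu & Palmer

  # Information-content measures need corpus counts (here: Brown corpus IC file).
  brown_ic = wordnet_ic.ic('ic-brown.dat')
  print(dog.res_similarity(cat, brown_ic))   # Resnik
  print(dog.jcn_similarity(cat, brown_ic))   # Jiang & Conrath
  print(dog.lin_similarity(cat, brown_ic))   # Lin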
Distances in Wikipedia

  Measures using links, including measures used on WordNet but applied to the Wikipedia graph
    http://www.h-its.org/english/research/nlp/download/wikipediasimilarity.php
  Measures using the content of articles (vector spaces)
  Measures using Wikipedia categories
Clustering

Partition a set of objects into clusters.
  Objects: features and values
  Similarity measure
Utilities:
  Exploratory Data Analysis (EDA)
  Generalization (learning). Ex: having seen "on Monday", "on Sunday", infer "on Friday"
  Supervised vs. unsupervised classification
Object assignment to clusters:
  Hard: one cluster per object
  Soft: distribution P(c_i | x_j), degree of membership
Clustering

Produced structures:
  Hierarchical (set of clusters + relationships)
    Good for detailed data analysis; provides more information
    Less efficient; no single best algorithm
  Flat / non-hierarchical (set of clusters)
    Preferable if efficiency is required or for large data sets
    K-means: simple method, a sufficient starting point
    K-means assumes a Euclidean space; if that is not the case, EM may be used
Cluster representative, the centroid:
  μ = (1/|c|) · Σ_{x ∈ c} x
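A minimal K-means sketch with scikit-learn (an assumption; any implementation of Lloyd's algorithm behaves the same), clustering a few 2-D points into two groups:

  import numpy as np
  from sklearn.cluster import KMeans

  X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],    # one group
                [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])   # another group

  km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
  print(km.labels_)           # hard assignment: one cluster per object
  print(km.cluster_centers_)  # centroids: mu = (1/|c|) * sum of members
  # For soft assignments (degrees of membership) an EM-based model such as
  # sklearn.mixture.GaussianMixture can be used instead.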
Dendrogram

[Figure: single-link clustering of 22 frequent English words represented as a dendrogram: be, not, he, I, it, this, the, his, a, and, but, in, on, with, for, at, from, of, to, as, is, was]
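A sketch of how such a dendrogram can be produced with SciPy, assuming each word has some vector representation (random placeholder vectors here, just to illustrate the API, not the data behind the figure):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram
  import matplotlib.pyplot as plt

  words = ["be", "not", "he", "I", "it", "this", "the", "his"]
  rng = np.random.default_rng(0)
  vectors = rng.random((len(words), 10))   # placeholder word vectors

  Z = linkage(vectors, method='single')    # single-link agglomerative clustering
  dendrogram(Z, labels=words)
  plt.show()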
Hierarchical Clustering

  Bottom-up (agglomerative clustering)
    Start with individual objects, iteratively group the most similar.
  Top-down (divisive clustering)
    Start with all the objects, iteratively divide them maximizing within-group similarity.
Agglomerative Clustering (Bottom-up)

Input:  a set X = {x_1, ..., x_n} of objects
        a function sim : P(X) × P(X) → R
Output: a cluster hierarchy

  for i := 1 to n do c_i := {x_i} end
  C := {c_1, ..., c_n};  j := n + 1
  while |C| > 1 do
      (c_n1, c_n2) := argmax_{(c_u, c_v) ∈ C × C, c_u ≠ c_v} sim(c_u, c_v)
      c_j := c_n1 ∪ c_n2
      C := C \ {c_n1, c_n2} ∪ {c_j}
      j := j + 1
  end-while
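A direct Python transcription of the pseudo-code above (a sketch; sim here is group-average linkage based on negated Euclidean distance, one of several reasonable choices of cluster similarity):

  import numpy as np

  def sim(c1, c2, X):
      # group-average similarity: negated mean Euclidean distance between clusters
      return -np.mean([np.linalg.norm(X[i] - X[j]) for i in c1 for j in c2])

  def agglomerative(X):
      C = [frozenset([i]) for i in range(len(X))]   # c_i := {x_i}
      merges = []
      while len(C) > 1:
          # (c_n1, c_n2) := argmax over distinct pairs of sim(c_u, c_v)
          pairs = [(u, v) for u in C for v in C if u != v]
          cu, cv = max(pairs, key=lambda p: sim(p[0], p[1], X))
          cj = cu | cv                               # c_j := c_n1 U c_n2
          C = [c for c in C if c not in (cu, cv)] + [cj]
          merges.append(sorted(cj))                  # record the hierarchy
      return merges

  X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
  print(agglomerative(X))   # merges the two close pairs first, then everything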