Nearest Neighbor and Locality-Sensitive Hashing
• Nearest Neighbor
• Set Similarity
• Locality-Sensitive Hashing
• Document Similarity
Philip Bille
Nearest Neighbor
• Nearest Neighbor.
• Preprocess a collection of high-dimensional vectors V_1, V_2, ..., V_n to support
• NN(V): return all V_i such that sim(V, V_i) ≥ threshold t
• Applications.
• Classification
• Search
• Find similar items
• Recommendation systems
• ...
Nearest Neighbor
• Nearest Neighbor (Set version).
• Preprocess a collection of sets S_1, S_2, ..., S_n to support
• NN(S): return all S_i such that sim(S, S_i) ≥ t
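For reference, the query can always be answered by a linear scan; the rest of the lecture is about doing better. A minimal sketch (the name nn_bruteforce and the sim argument are illustrative, not from the slides):

```python
def nn_bruteforce(query, collection, sim, t):
    # Baseline linear scan: compare the query against every set in the collection.
    return [S for S in collection if sim(query, S) >= t]
```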
Set Similarity
Jaccard Similarity
• J(S, T) = |S ∩ T| / |S ∪ T|
[Figure: Venn diagram of the sets S and T]
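A direct transcription of the definition in Python (the function name is mine):

```python
def jaccard(S, T):
    # J(S, T) = |S ∩ T| / |S ∪ T| for two Python sets.
    return len(S & T) / len(S | T)

print(jaccard({"a", "b", "c", "d"}, {"c", "d", "e", "f"}))  # 2/6 ≈ 0.33
```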
Minhashing
• Pick a hash function f that maps elements to distinct integers.
• minhash h(S) = minimum value of f over the elements of S.
[Figure: elements of S and T with their hash values]
• Pr[h(S) = h(T)] = |S ∩ T| / |S ∪ T| = J(S, T)
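A small experiment illustrating the collision probability, assuming hash functions of the form x ↦ (a·hash(x) + b) mod p (a sketch, not the slides' construction):

```python
import random

def make_minhash(seed, p=2**61 - 1):
    # One minhash function: h(S) = min over x in S of (a * hash(x) + b) mod p.
    rng = random.Random(seed)
    a, b = rng.randrange(1, p), rng.randrange(0, p)
    return lambda S: min((a * hash(x) + b) % p for x in S)

S, T = {"a", "b", "c", "d"}, {"c", "d", "e", "f"}  # J(S, T) = 2/6 ≈ 0.33
trials = 1000
hits = sum(make_minhash(i)(S) == make_minhash(i)(T) for i in range(trials))
print(hits / trials)  # fraction of minhash collisions, should be close to J(S, T)
```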
Set Signatures
• Set signature.
• Pick k hash functions f_1, f_2, ..., f_k independently
• ⇒ k minhashes h_1, h_2, ..., h_k
• sig(S) = [h_1(S), h_2(S), ..., h_k(S)]
• Jaccard similarity estimation.
• J(S, T) ≈ (#positions where sig(S) and sig(T) agree) / k
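Continuing the sketch above (reusing the hypothetical make_minhash, and S and T from the previous snippet):

```python
def signature(S, k):
    # sig(S) = [h_1(S), h_2(S), ..., h_k(S)] from k independent minhash functions.
    return [make_minhash(seed)(S) for seed in range(k)]

def estimate_jaccard(sig_s, sig_t):
    # Fraction of signature positions on which the two signatures agree.
    return sum(a == b for a, b in zip(sig_s, sig_t)) / len(sig_s)

print(estimate_jaccard(signature(S, 100), signature(T, 100)))  # ≈ J(S, T)
```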
Nearest Neighbor
• Data structure.
• Signature matrix M with a column for each set S_1, S_2, ..., S_n and a row for each minhash h_1, h_2, ..., h_k; entry (j, i) = h_j(S_i).
• NN(S):
• Compute sig(S).
• Compare sig(S) with sig(S_1), ..., sig(S_n) using Jaccard estimation. Return all sets with similarity estimation ≥ t.
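A sketch of this data structure, reusing signature and estimate_jaccard from above (the helper names are mine; sets stands for the collection S_1, ..., S_n):

```python
def build_signatures(sets, k):
    # The signature matrix, stored column by column: column i is sig(S_i).
    return [signature(S, k) for S in sets]

def nn(query, sets, sigs, t, k):
    # Compare sig(query) against every column and keep those with estimate >= t.
    sig_q = signature(query, k)
    return [S for S, sig_s in zip(sets, sigs) if estimate_jaccard(sig_q, sig_s) >= t]
```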
Locality-Sensitive Hashing
Locality-Sensitive Hashing
• Idea.
• Filter all but a few candidates.
• Check candidates using set signature similarity estimation.
• (Optionally compute exact Jaccard similarity for candidates.)
• Goal.
• Balance false positives and false negatives.
• False positives = sets with similarity < t that become candidates.
• False negatives = sets with similarity ≥ t that do not become candidates.
Locality-Sensitive Hashing
• Banding.
• Partition the signature matrix M into b bands of r rows each.
• Store a dictionary for each band, mapping each column's band (its r minhash values) to the columns that contain it.
[Figure: signature matrix M partitioned into b = 5 bands of r rows]
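A sketch of the banded index, assuming signatures of length k = b·r (helper names are mine):

```python
from collections import defaultdict

def build_lsh_index(sigs, b, r):
    # One dictionary per band; key = the band's r minhash values,
    # value = indices of the columns (sets) that have that band.
    bands = [defaultdict(list) for _ in range(b)]
    for i, sig in enumerate(sigs):
        for j in range(b):
            key = tuple(sig[j * r:(j + 1) * r])
            bands[j][key].append(i)
    return bands
```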
Locality-Sensitive Hashing
• NN(S):
• Construct sig(S).
• Partition sig(S) into bands and look each band up in the corresponding dictionary.
• Make S_i a candidate if it matches S on some band.
[Figure: query signature sig(S) partitioned into the same b = 5 bands]
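And the corresponding query, continuing the sketch above:

```python
def lsh_candidates(sig_q, bands, r):
    # S_i becomes a candidate if its signature agrees with sig_q on at least one band.
    candidates = set()
    for j, band in enumerate(bands):
        key = tuple(sig_q[j * r:(j + 1) * r])
        candidates.update(band.get(key, ()))
    return candidates
```

The candidates can then be checked against the threshold t with estimate_jaccard (or exact Jaccard).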
Locality-Sensitive Hashing
• Analysis of banding. Suppose S and S_i have similarity s. What is the probability that S_i becomes a candidate?
• Probability identical on 1 row = s
• Probability identical on 1 band = s^r
• Probability at least 1 row in a band is not identical = 1 - s^r
• Probability no band is identical = (1 - s^r)^b
• Probability at least 1 band is identical = 1 - (1 - s^r)^b
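The final expression can be evaluated directly; the numbers below use the example b = 20, r = 5 from the next slide:

```python
def candidate_probability(s, r, b):
    # Probability that at least one of the b bands is identical.
    return 1 - (1 - s**r) ** b

for s in (0.2, 0.4, 0.6, 0.8):
    print(s, candidate_probability(s, r=5, b=20))
# Roughly 0.006, 0.186, 0.802, 0.9996: a sharp S-curve with its jump around s ≈ 0.55.
```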
Locality-Sensitive Hashing
• Choosing b and r.
• Threshold: the similarity at which the probability of becoming a candidate exceeds 1/2.
• Threshold ≈ (1/b)^(1/r)
• Example: b = 20, r = 5, k = br = 100 ⇒ threshold ≈ (1/20)^(1/5) ≈ 0.55.
[Figure: probability of becoming a candidate as a function of similarity s for b = 20, r = 5]
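A quick check of the rule of thumb, reusing candidate_probability from the previous sketch:

```python
b, r = 20, 5
threshold = (1 / b) ** (1 / r)
print(threshold)                               # ≈ 0.549
print(candidate_probability(threshold, r, b))  # ≈ 0.64 ≈ 1 - 1/e, just above 1/2
```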
Document Similarity
Documents as Sets
• Shingles.
• "I used to think I was indecisive, but now I'm not too sure."
• 3-shingles: ["I", "used", "to"], ["used", "to", "think"], ["to", "think", "I"], ["think", "I", "was"], ...
• Document = set of shingles.
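A sketch of word-level 3-shingling (no punctuation cleanup; the function name is mine):

```python
def shingles(text, k=3):
    # The set of all k consecutive-word shingles of the document.
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

doc = "I used to think I was indecisive, but now I'm not too sure."
for sh in sorted(shingles(doc)):
    print(sh)
```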