
Machine Learning for Efficient Neighbor Selection in Unstructured P2P Networks
Robert Beverly (MIT CSAIL, rbeverly@csail.mit.edu) and Mike Afergan (Akamai/MIT, afergan@alum.mit.edu)
USENIX SysML, 2007

Outline
1. Efficient Neighbor Selection: Problem Overview; Neighbor Selection and Self-Reorganization
2. Methodology: Datasets; Representing the Dataset; Learning Task
3. Results: Training Points; Prediction Results; Discussion

Efficient Neighbor Selection in Unstructured P2P Networks
Problem domain: unstructured P2P overlays, e.g., Kazaa and Gnutella.
Problem: self-reorganization in unstructured P2P overlays promises better performance, scalability, and resilience, but the cost of reorganizing may outweigh the benefit.
Neighbor selection problem:
- Choose neighbors efficiently, i.e., with few queries
- Choose neighbors effectively, i.e., with high success

Our Approach
- Support Vector Machines (SVMs) and feature selection for classification
- Simulate the algorithm using live P2P datasets
Results
- Predict "good" neighbors with over 90% accuracy using minimal knowledge of the neighbor's files or type
- Find neighbors capable of answering future queries
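As a minimal sketch of the classification step, the Python below trains a linear SVM by Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. It assumes each candidate neighbor is a 0/1 vector over file-name tokens, labeled +1 if it would answer the node's queries and -1 otherwise. The names `train_linear_svm` and `predict` and the parameters `lam` and `epochs` are hypothetical; the solver is a stand-in, not the SVM package used in this work, and feature selection is not shown.

```python
def train_linear_svm(X, y, lam=0.1, epochs=1000):
    """Pegasos-style subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).

    X: list of 0/1 feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        # Cyclic pass over the data; the original Pegasos samples at random.
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient step, only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (connect) or -1 (skip) for a candidate's token vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Appending a constant feature folds the bias into `w`, so it is regularized like the other weights and each update stays a single vector operation.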

Unstructured P2P Networks
- Simple, popular, and widely used; e.g., Gnutella is estimated at ≃ 3.5M nodes
- Typically used for file sharing
- Overlay structure: organic; nodes interconnect with minimal constraints, and nodes are dynamic
- Queries: flooded through the overlay; peers answer, and a peer-to-peer download is then initiated
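The flooding step can be sketched as a hop-limited breadth-first search. This is an illustration under assumptions: the overlay is an adjacency dict, the hop limit plays the role of Gnutella's TTL field, and a file matches when every query token appears in its name. `flood_query` and its parameters are hypothetical names.

```python
from collections import deque


def flood_query(overlay, files, origin, query_tokens, ttl):
    """Hop-limited flood of a query from `origin`.

    overlay: dict node -> list of neighbor nodes
    files: dict node -> list of file names
    query_tokens: set of lower-case query tokens
    Returns the nodes (other than origin) within `ttl` hops that hold a
    file whose name contains every query token.
    """
    seen = {origin}
    frontier = deque([(origin, 0)])
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if node != origin and any(
            query_tokens <= set(name.lower().split()) for name in files.get(node, [])
        ):
            hits.append(node)  # this peer would answer the query
        if hops == ttl:
            continue  # hop limit reached: do not forward further
        for nbr in overlay.get(node, []):
            if nbr not in seen:  # forward the query to each node at most once
                seen.add(nbr)
                frontier.append((nbr, hops + 1))
    return hits
```

Every forwarded copy is a message some peer must process, which is why learning about a prospective neighbor by querying it is costly.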


Self-Reorganization
- Because node connections are unconstrained, previous research suggests letting nodes self-reorganize their connections
- Reported benefits include improved query recall, efficiency, speed, scalability, resilience, and trust

Reorganization Paradox
[Figure: node N and nodes F1, F2, F3]

- How can a node determine in real time whether or not to attach to another node?
- Reorganization presents a paradox: the only way to learn about another node is to issue queries, but issuing queries reduces the benefit of reorganization.
- Our insight: use machine learning classification plus feature selection.


Live P2P Datasets
We want to evaluate potential algorithms on real data, so we used two Gnutella datasets:

Dataset          Nodes   Contains
Beverly et al.   1,500   Queries, files, timestamps
Goh et al.       4,500   Queries, files, timestamps

- Both were captured with a promiscuous UltraPeer
- Both datasets yield similar results


Data Preprocessing
- Nodes hold and advertise files, e.g., "Red Hot Chili Peppers - Californication.mp3"
- Nodes issue queries, e.g., "remember madonna i’ll" at timestamp 1051761774
- Remove non-alphanumerics, stop words, and single characters
- Per the Gnutella protocol, tokenize queries and file names on the remaining whitespace, giving token sets $q_i$ and $f_i$ for node $i$
- Let $N$ be the set of all nodes and $n = |N|$. Represent all unique query and file tokens as $Q = \bigcup_i q_i$ and $F = \bigcup_i f_i$
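A minimal sketch of this preprocessing, assuming ASCII text, lower-casing, and that each run of non-alphanumeric characters is replaced by a space (so "Californication.mp3" splits into two tokens). The stop-word list is a hypothetical placeholder, since the slides do not give the one used, and `tokenize` and `token_universe` are illustrative names.

```python
import re

# Hypothetical stop-word list: the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is"}


def tokenize(text):
    """Lower-case, replace non-alphanumeric runs with spaces, split on
    whitespace, and drop stop words and single-character tokens."""
    cleaned = re.sub(r"[^0-9a-z]+", " ", text.lower())
    return [tok for tok in cleaned.split() if len(tok) > 1 and tok not in STOP_WORDS]


def token_universe(strings_by_node):
    """Union of every node's tokens: Q when given queries, F when given file names."""
    universe = set()
    for strings in strings_by_node.values():
        for s in strings:
            universe.update(tokenize(s))
    return universe
```

Under these assumptions the example query "remember madonna i’ll" reduces to the tokens remember, madonna, and ll: the apostrophe splits "i’ll", and the single character "i" is dropped.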

Hypothetical Oracle
- The dataset includes all files and queries for every node
- We employ an oracle model in order to measure prediction accuracy
- For every potential connection, compute the utility $u_i(j)$; this work defines $u_i(j)$ simply as the number of queries from node $i$ matched by node $j$
- Form an $n \times n$ adjacency matrix $Y$ where $Y_{i,j} = \operatorname{sign}(u_i(j))$
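The oracle labels can be computed directly from these definitions. In this sketch, queries and file names are already tokenized into sets; the matching rule (a query is matched when all of its tokens occur in a single file name of node $j$) and the zero diagonal are assumptions, and `utility` and `adjacency_matrix` are hypothetical names. Because $u_i(j)$ is a non-negative count, $\operatorname{sign}(u_i(j))$ is 1 for a useful neighbor and 0 otherwise.

```python
def utility(queries_i, files_j):
    """u_i(j): number of node i's queries matched by node j.

    queries_i: list of token sets (one per query issued by i)
    files_j: list of token sets (one per file name held by j)
    Assumed matching rule: every query token occurs in a single file name.
    """
    return sum(1 for q in queries_i if any(q <= f for f in files_j))


def adjacency_matrix(nodes, queries, files):
    """n x n matrix Y with Y[i][j] = sign(u_i(j)); the diagonal is set to 0 (assumption)."""
    Y = []
    for i in nodes:
        row = []
        for j in nodes:
            u = 0 if i == j else utility(queries[i], files[j])
            row.append((u > 0) - (u < 0))  # sign(u): here 1 or 0, since u >= 0
        Y.append(row)
    return Y
```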

[Figure: (a) Adjacency Matrix, rows indexed by node i and columns by node j, with entries $y_{i,j} = \operatorname{sign}(u_i(j))$; (b) File Store Matrix, rows indexed by node i and columns by token index $x_1, x_2, x_3, \ldots, x_k$, with binary entries]

- Using all file store tokens $F$, we assign each token a unique index, where $|F| = k$
- Form an $n \times k$ file store matrix $X$ where $X_{i,j} = 1 \iff F_j \in f_i$
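A sketch of building $X$, assuming each node's file-store tokens $f_i$ are given as a set. Tokens are indexed in sorted order only to make the column assignment deterministic (any one-to-one indexing satisfies the definition), and `file_store_matrix` is a hypothetical name.

```python
def file_store_matrix(nodes, file_tokens):
    """Build the n x k binary file store matrix X.

    file_tokens: dict node -> set of that node's file-name tokens (f_i).
    Returns (vocab, X) where vocab[j] is token F_j and
    X[r][j] == 1 exactly when F_j is in the token set of nodes[r].
    """
    # F: all file-store tokens, indexed in sorted order so columns are deterministic.
    vocab = sorted(set().union(*(file_tokens[n] for n in nodes)))
    index = {tok: j for j, tok in enumerate(vocab)}
    X = [[0] * len(vocab) for _ in nodes]
    for r, node in enumerate(nodes):
        for tok in file_tokens[node]:
            X[r][index[tok]] = 1
    return vocab, X
```

Each row is a sparse binary vector, so a larger deployment would more likely store only the indices of the 1 entries; dense lists are used here for readability.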

