Comparing Graph Sampling Methods Based on the Number of Queries
Kenta Iwasaki, Kazuyuki Shudo
Tokyo Institute of Technology (Tokyo Tech)
IEEE SocialCom 2018, December 2018

1 / 10 Graph sampling ⊃ Crawling ⊃ Random walk
• Graph sampling techniques enable estimation of nodal and topological properties of online social networks (OSNs).
– Effective because the entire network is not available.
– Properties: degree distribution, clustering coefficient, …
– Note: Crawling (e.g. random walk) is possible, but uniform sampling is not.
• (Figure: a query with crawling on an OSN. A node ID is sent and the neighbor (friend) list of that node is returned; visited nodes accumulate in the sample node list, e.g. [1, 2, 4, 2, 7, …].)
• Queries can be the bottleneck of sampling performance because of
– API limits
– Communication latency, which is much larger than computation time.
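The crawling loop on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: `QueryCounter`, `crawl_random_walk`, and the toy `GRAPH` are hypothetical names, the OSN is replaced by a local adjacency dict, and the sketch assumes that a friend list already fetched is cached and costs no further query.

```python
import random

class QueryCounter:
    """Hypothetical stand-in for an OSN API: node ID in, friend list out."""

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self._cache = {}
        self.queries = 0

    def neighbors(self, node):
        # Assumption of this sketch: a cached friend list costs no query.
        if node not in self._cache:
            self.queries += 1
            self._cache[node] = list(self._adjacency[node])
        return self._cache[node]

def crawl_random_walk(api, start, steps, rng):
    """Simple random walk; returns the sample node list."""
    samples = [start]
    current = start
    for _ in range(steps):
        # One neighbor-list lookup per hop, then a uniform choice.
        current = rng.choice(api.neighbors(current))
        samples.append(current)
    return samples

# Toy undirected graph (assumed for illustration only).
GRAPH = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

api = QueryCounter(GRAPH)
samples = crawl_random_walk(api, start=1, steps=20, rng=random.Random(0))
print("sample size:", len(samples), "queries:", api.queries)
```

Even on this toy graph the sample node list keeps growing while the number of queries is capped by the number of distinct nodes, which is why the two quantities are worth reporting separately.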

2 / 10 Contribution: Query number standard
• Problem
– Sample size has been the standard for evaluating graph sampling techniques.
– Standards in studies:
  [Rasti 2009], [Ribeiro 2010], [Lee 2012]: length of sample node list (walk length)
  [Hardiman 2013]: "# of samples" (length of sample node list ???)
  [Gjoka 2011]: number of sample nodes
  (Figure: Fig. 4 in [Lee 2012])
• Contribution
– Query-number-based comparison shows different relative merits for sampling and estimation techniques.
– It reflects graph accessing cost better.

3 / 10 Graph sampling techniques
• Random walk-based techniques are effective for property estimation for OSNs.
– They enable unbiased sampling with Markov chain analysis.
• Our targets
– SRW-rw: Simple random walk w/ re-weighting
– NBRW-rw: Non-backtracking random walk w/ re-weighting
– MHRW: Metropolis-Hastings random walk
(Figure: example transition probabilities on a small graph for SRW (simple random walk; 1/3 = 1/degree for each neighbor), NBRW (non-backtracking random walk; 0 for the previous node), and MHRW (Metropolis-Hastings random walk).)
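The three walks differ only in how the next node is chosen. The sketch below shows the textbook transition rules and the usual 1/degree re-weighting for degree-biased walks. The function names and the toy `GRAPH` are assumptions for illustration, and the code is not taken from the presentation.

```python
import random

def srw_step(graph, current, rng):
    """SRW: move to a uniformly chosen neighbor (probability 1/degree each)."""
    return rng.choice(graph[current])

def nbrw_step(graph, current, previous, rng):
    """NBRW: like SRW, but never return to the previous node if avoidable."""
    candidates = [v for v in graph[current] if v != previous]
    if not candidates:  # degree-1 node: backtracking is the only option
        candidates = graph[current]
    return rng.choice(candidates)

def mhrw_step(graph, current, rng):
    """MHRW: propose a uniform neighbor, accept with min(1, d_cur / d_prop)."""
    proposal = rng.choice(graph[current])
    accept = min(1.0, len(graph[current]) / len(graph[proposal]))
    # On rejection the walk stays, and the same node is sampled again.
    return proposal if rng.random() < accept else current

def reweighted_mean_degree(graph, samples):
    """Re-weighting: SRW/NBRW visit nodes in proportion to degree, so each
    sample is weighted by 1/degree; for the mean degree this reduces to
    n / sum(1/d)."""
    return len(samples) / sum(1.0 / len(graph[v]) for v in samples)

# Toy undirected graph (assumed for illustration only).
GRAPH = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

rng = random.Random(0)
walk = [1]
for _ in range(1000):
    walk.append(srw_step(GRAPH, walk[-1], rng))
print("re-weighted mean degree estimate:", reweighted_mean_degree(GRAPH, walk))
```

MHRW needs no re-weighting because its stationary distribution is already uniform over nodes; the price is the self-loop on rejection.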

4 / 10 Sample size vs. query number
• The sample size (length of sample node list) obtained with 10,000 queries differs greatly among the techniques.
(Figure: sample size obtained with 10,000 queries for the simple, non-backtracking, and Metropolis-Hastings random walks. Graphs are from the Stanford Large Network Dataset Collection.)
• Rationale: MHRW can stay at the same node, so the sample node list grows without a query.
• Note that the sample size is not the only factor that determines estimation efficiency. E.g. NBRW reaches a wider variety of nodes, and it works better with Counting Triangles [Iwasaki 2018].
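The rationale above can be reproduced on a toy graph. The sketch below runs SRW and MHRW under the same query budget and compares the resulting sample sizes. It rests on two simplifying assumptions that are not taken from the slides: every move to a next hop costs exactly one query (no cache), and the degree of a proposed neighbor is known without an extra query. All names and the star graph are hypothetical.

```python
import random

def srw_step(graph, current, rng):
    return rng.choice(graph[current])

def mhrw_step(graph, current, rng):
    proposal = rng.choice(graph[current])
    accept = min(1.0, len(graph[current]) / len(graph[proposal]))
    return proposal if rng.random() < accept else current

def walk_until_budget(graph, start, budget, step, rng):
    """Walk until `budget` queries are spent; return the sample node list."""
    samples = [start]
    queries = 1  # fetching the friend list of the start node
    current = start
    while True:
        nxt = step(graph, current, rng)
        if nxt != current:
            # Moving to a next hop requires fetching its friend list.
            if queries == budget:
                break
            queries += 1
        # A rejected MHRW proposal appends the same node with no query.
        samples.append(nxt)
        current = nxt
    return samples

# Toy star graph: hub 0 with 20 leaves (assumed; extreme on purpose).
LEAVES = list(range(1, 21))
GRAPH = {0: LEAVES, **{leaf: [0] for leaf in LEAVES}}

BUDGET = 100
srw_samples = walk_until_budget(GRAPH, 0, BUDGET, srw_step, random.Random(0))
mhrw_samples = walk_until_budget(GRAPH, 0, BUDGET, mhrw_step, random.Random(0))
print("SRW sample size:", len(srw_samples))
print("MHRW sample size:", len(mhrw_samples))
```

On a leaf of the star, MHRW accepts the move back to the hub with probability 1/20, so it repeats the leaf many times per query, while SRW yields exactly one sample per query.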

5 / 10 Query issuing timings
1. For random walk
– When getting the neighbor (friend) list of the next hop
2. For property estimation
– Depends on each estimation technique
– E.g. when getting the neighbor (friend) lists of multiple neighbor nodes of a node to calculate the clustering coefficient of the node naively
(Figure: example graph with nodes labeled 1, 2, 3, 4 and a "Target" marker.)
To calculate the clustering coefficient, it is necessary to know how the neighbor nodes are connected to each other.
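The extra queries of the naive method can be made concrete. The sketch below computes the local clustering coefficient of one node and records every friend-list request it makes. `fetch_neighbors`, `naive_clustering`, and the toy `GRAPH` are hypothetical names; this is one straightforward reading of "naively", not necessarily the exact procedure used in the experiments.

```python
from itertools import combinations

# Toy undirected graph (assumed for illustration only).
GRAPH = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

calls = []  # every friend-list request is recorded here

def fetch_neighbors(node):
    """Hypothetical stand-in for one query to the OSN."""
    calls.append(node)
    return GRAPH[node]

def naive_clustering(fetch, node):
    """Local clustering coefficient: links among neighbors / possible links."""
    neighbors = fetch(node)
    k = len(neighbors)
    if k < 2:
        return 0.0
    # One additional query per neighbor to learn how neighbors interconnect.
    neighbor_sets = {v: set(fetch(v)) for v in neighbors}
    links = sum(1 for u, v in combinations(neighbors, 2) if v in neighbor_sets[u])
    return 2.0 * links / (k * (k - 1))

cc = naive_clustering(fetch_neighbors, 1)
print("clustering coefficient of node 1:", cc)
print("queries issued:", len(calls))
```

For a node of degree k this costs k queries on top of the one the walk already spent on the node itself, so the estimation cost grows with the degree of every sampled node.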

6 / 10 Experiments with sample size and query number standards
• Clustering coefficient estimated
• Estimation efficiency (precision / cost) compared on
1. Estimation techniques: Naïve method vs. Counting Triangles [Hardiman 2013]
   Counting Triangles does not require additional queries for property estimation.
2. Sampling (random walk) techniques: SRW vs. NBRW vs. MHRW

Graph     # of nodes   Average degree   Average clust. coeff.
Amazon    334,863      5.530            0.3967
DBLP      317,080      6.622            0.6324
Gowalla   196,591      9.668            0.2367
Graphs are from the Stanford Large Network Dataset Collection.
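A walk-based estimator in the style of Counting Triangles can be sketched as follows for a simple random walk. The formula is a reading of the estimator in [Hardiman 2013], reconstructed here rather than copied from the slides: for each interior step k, check whether the previous and next nodes of the walk are adjacent, weight the hit by 1/(d_k - 1), and normalize by the mean of 1/d over the walk. The adjacency check uses a friend list the walk has already fetched, which is why no additional queries are needed. Function names and the test graphs are assumptions.

```python
import random

def simple_random_walk(graph, start, length, rng):
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def counting_triangles_estimate(graph, walk):
    """Estimate the average clustering coefficient from an SRW sample list."""
    r = len(walk)  # requires r >= 3
    phi_sum = 0.0
    for k in range(1, r - 1):
        degree = len(graph[walk[k]])
        # phi_k = 1 if the previous and next nodes are adjacent.
        # graph[walk[k - 1]] was already fetched when the walk visited it.
        if degree > 1 and walk[k + 1] in graph[walk[k - 1]]:
            phi_sum += 1.0 / (degree - 1)
    phi = phi_sum / (r - 2)
    psi = sum(1.0 / len(graph[v]) for v in walk) / r
    return phi / psi

# Two toy graphs with known answers (assumed for illustration only):
# a 5-cycle has no triangles, a complete graph K4 has coefficient 1.
CYCLE5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
K4 = {i: [j for j in range(4) if j != i] for i in range(4)}

rng = random.Random(0)
est_cycle = counting_triangles_estimate(CYCLE5, simple_random_walk(CYCLE5, 0, 2000, rng))
est_k4 = counting_triangles_estimate(K4, simple_random_walk(K4, 0, 20000, rng))
print("5-cycle estimate:", est_cycle)
print("K4 estimate:", est_k4)
```

Using this estimator with NBRW or MHRW would need different weights, since those walks have different transition probabilities or stationary distributions; that adaptation is not attempted in this sketch.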

7 / 10 Naïve method vs. Counting Triangles [Hardiman 2013]
• Sampling with simple random walk (SRW)
• Relative merits are reversed.
– Similar results are shown with the other networks.
(Figure: results plotted against query number and against sample size; annotations: "Better", "Reversed".)

8 / 10 SRW vs. NBRW vs. MHRW
• Estimating with Counting Triangles
• Margins are much narrowed.
(Figure: results plotted against sample size and against query number; annotations: "Better", "Narrow".)
• Note: Our contribution includes Counting Triangles with MHRW.

9 / 10 SRW vs. NBRW vs. MHRW
• Estimating with Counting Triangles
• Relative merits are reversed for the DBLP graph.
(Figure: results plotted against sample size and against query number; annotations: "Better", "Reversed".)

10 / 10 Summary
• Query number standard (cf. sample size standard)
– for comparing graph sampling techniques
– for comparing property estimation techniques
– It reflects graph accessing cost better.
  • Accessing online social networks
  • Accessing a graph on storage and memory
• The two standards showed different relative merits for techniques.
