Outline Introduction Data Collection Within-Area Analysis Network-wide Metrics Conclusions
Structure and Dynamics of Research Collaboration in Computer Science
C. Bird, E. Barr, A. Nash, P. Devanbu, V. Filkov, Z. Su
Presented by Elina Weinbrand, 2014-03-26
Outline
1 Outline
2 Introduction
   Motivation
   Related work
3 Data Collection
4 Within-Area Analysis
   Degree Distribution
   Assortativity
   Longitudinal Assortativity
   Betweenness Centralization
   Community Structure
5 Network-wide Metrics
   Area Overlap
   Migration
6 Conclusions
Motivation
Computer science is a diverse and growing area of scholarly activity, with many sub-areas:
- Artificial Intelligence (AI)
- Computational Biology (CBIO)
- Cryptography (CRYPTO)
- Databases (DB)
- Graphics (GRAPH)
- Programming Languages (PL)
- Software Engineering (SE)
- Security (SEC)
- Theory (THEORY)
Motivation
There are many differences between the research areas:
- Old (e.g., THEORY) vs. newer (e.g., GRAPH)
- Many researchers (e.g., DB and GRAPH) vs. fewer (e.g., CRYPTO and SE)
- In a stable phase (e.g., THEORY) vs. growing rapidly (e.g., SEC)
Motivation
There are also other, more subtle, informal ("folkloric") differences in character and style between areas:
- Intellectually unified vs. split into several distinct, thriving groups
- Interacting strongly with other areas vs. more stand-alone
- Dominated by a few researchers vs. a more diffuse collaborative structure
- Older and younger researchers collaborating with each other vs. researchers collaborating primarily with others like themselves
Motivation
This paper begins to quantify and study these informal, folkloric differences, producing data that may provide "actionable intelligence" to interested parties such as researchers and funding agencies.
Related work
- Huang et al. Collaboration over time: characterizing and modeling network evolution. 2008.
- Backstrom et al. Group formation in large social networks: membership, growth, and evolution. 2006.
- M. Girvan and M. Newman. Community structure in social and biological networks. 2002.
DBLP
The dblp service was started at the database systems and logic programming (dblp) research group at the University of Trier, Germany, and initially focused on publications from that field of research. Over the years, dblp gradually expanded to cover all fields of computer science, while the acronym survived. At times, the label "Digital Bibliography & Library Project" has been adopted as a backronym for dblp.
DBLP
- DBLP is a publicly available bibliographic data source.
- DBLP is maintained through massive human effort, with special attention paid to issues such as author-name consistency.
- DBLP data is publicly available in easily parsed XML form at http://dblp.uni-trier.de/xml/
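To illustrate what "easily parsed" means in practice, here is a minimal sketch that extracts conference papers and their author lists from a DBLP-style XML fragment. It assumes records shaped like `<inproceedings key="...">` with `<author>`, `<title>`, `<booktitle>` and `<year>` children; the sample records, venue names and the `Paper`/`conference_papers` names are hypothetical. The real dump is very large and declares character entities in its DTD, so a production parser would need to stream the file and load that DTD (for example with lxml); this sketch does neither.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple

# Hypothetical excerpt in the general shape of DBLP records (no DTD entities).
SAMPLE = """<dblp>
  <inproceedings key="conf/example/AliceB08">
    <author>Alice Example</author>
    <author>Bob Example</author>
    <title>A Hypothetical Paper.</title>
    <booktitle>CONF-A</booktitle>
    <year>2008</year>
  </inproceedings>
  <article key="journals/example/Carol07">
    <author>Carol Example</author>
    <title>Another Hypothetical Paper.</title>
    <journal>Some Journal</journal>
    <year>2007</year>
  </article>
</dblp>"""


class Paper(NamedTuple):
    key: str
    authors: tuple[str, ...]
    venue: str
    year: int


def conference_papers(xml_text: str) -> Iterator[Paper]:
    """Yield conference papers (<inproceedings> records) with their authors."""
    root = ET.fromstring(xml_text)
    for rec in root.iter("inproceedings"):
        yield Paper(
            key=rec.get("key", ""),
            authors=tuple(a.text or "" for a in rec.findall("author")),
            venue=rec.findtext("booktitle", default=""),
            year=int(rec.findtext("year", default="0")),
        )


papers = list(conference_papers(SAMPLE))
print(papers[0].authors)  # ('Alice Example', 'Bob Example')
```

Journal articles are skipped here only because the study's areas are defined by conferences.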
Data Collection Steps
1 Define the list of researchers and solve the name-consistency problem
2 Define the research areas of computer science as sets of first-tier conferences, and assign papers to conferences
3 Create collaboration graphs
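Step 2 can be sketched as a lookup from venue to area. The sketch below assumes each area is a set of venue names and each venue belongs to exactly one area; the `AREA_VENUES` mapping and its venue names are hypothetical placeholders, not the conference lists used in the paper.

```python
from collections import defaultdict

# Hypothetical area -> venue mapping; placeholder names, NOT the paper's lists.
AREA_VENUES = {
    "SE": {"CONF-SE-1", "CONF-SE-2"},
    "PL": {"CONF-PL-1"},
    "DB": {"CONF-DB-1", "CONF-DB-2"},
}

# Invert to venue -> area (assumes each venue belongs to a single area).
VENUE_AREA = {v: area for area, venues in AREA_VENUES.items() for v in venues}


def assign_to_areas(papers):
    """Group (paper_id, venue) pairs by area; papers at unlisted venues are dropped."""
    by_area = defaultdict(list)
    for paper_id, venue in papers:
        area = VENUE_AREA.get(venue)
        if area is not None:
            by_area[area].append(paper_id)
    return dict(by_area)


papers = [("p1", "CONF-SE-1"), ("p2", "CONF-DB-2"), ("p3", "CONF-SE-2"), ("p4", "OTHER")]
print(assign_to_areas(papers))  # {'SE': ['p1', 'p3'], 'DB': ['p2']}
```

Papers at venues outside every area's list simply fall out of the within-area analysis.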
Collaboration graphs
Definition
Let $C(p)$ be a predicate (constraint) on papers that is true only for the publications we are interested in. Let $P$ be the set of all papers, $A$ the set of all authors, and $W(a, p)$ a predicate that is true if and only if author $a$ is an author (writer) of paper $p$. We then create the graph $G = (V, E)$ as follows:
$$V = \{\, a : a \in A,\ \exists p \in P,\ C(p) \wedge W(a, p) \,\}$$
$$E = \{\, (a, b) : a, b \in V,\ \exists p \in P,\ C(p) \wedge W(a, p) \wedge W(b, p) \,\}$$
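The set definitions translate almost directly into code. This sketch assumes papers are given as `(paper_id, authors)` pairs, models $W(a, p)$ as membership in the author list, and takes $C$ as an arbitrary Python predicate; the function name is hypothetical. It stores each undirected edge once as a sorted pair and omits self-pairs $(a, a)$, which is an assumption since the definition does not exclude them explicitly.

```python
from itertools import combinations


def collaboration_graph(papers, C):
    """Build G = (V, E) from (paper_id, authors) pairs filtered by predicate C."""
    V, E = set(), set()
    for paper in papers:
        _paper_id, authors = paper
        if not C(paper):
            continue
        # Every author of a qualifying paper is a vertex.
        V.update(authors)
        # Every pair of distinct co-authors of a qualifying paper is an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            E.add((a, b))
    return V, E


papers = [
    ("p1", ["alice", "bob", "carol"]),
    ("p2", ["carol", "dave"]),
    ("p3", ["erin"]),
]
# Hypothetical constraint: keep every paper except p2.
V, E = collaboration_graph(papers, lambda p: p[0] != "p2")
```

Note that "erin" appears in $V$ as an isolated vertex: a single-author paper satisfies the vertex condition but contributes no edge.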
Degree Distribution
The degree distributions of the sub-areas are almost identical up to a scaling factor, and thus do not discriminate well between areas.
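For reference, a degree distribution can be computed from a collaboration graph as sketched below, assuming an undirected graph given as a vertex set and a set of edge pairs (the function name is hypothetical). It returns the fraction of vertices with each degree, so isolated vertices count with degree 0.

```python
from collections import Counter


def degree_distribution(V, E):
    """Return {k: fraction of vertices with degree k} for an undirected graph."""
    degree = Counter({v: 0 for v in V})
    for a, b in E:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n = len(V)
    return {k: c / n for k, c in sorted(counts.items())}


dist = degree_distribution({"a", "b", "c", "d"}, {("a", "b"), ("a", "c"), ("b", "c")})
print(dist)  # {0: 0.25, 2: 0.75}
```

Normalizing by the number of vertices removes the effect of area size, which makes it easier to see whether two areas differ beyond scale.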
Assortativity
Assortative mixing in networks is the tendency of vertices to connect to similar vertices.
Definition
Define a set of properties over a graph's vertices, and label each vertex with its value for each property.
Let $e_{xy}$ be the fraction of all edges in the graph that start at a vertex labelled $x$ and end at a vertex labelled $y$; $e$ is known as the mixing matrix.
Let $a_x$ be the fraction of all edges in the graph incident to a vertex labelled $x$; for an undirected graph, $a_x = \sum_y e_{xy}$.
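A minimal sketch of the mixing matrix for an undirected graph follows. It assumes a `label` dictionary mapping each vertex to its property value and counts every undirected edge once in each direction, so $e$ is symmetric and its entries sum to 1; $a_x$ is then computed as the row sum $\sum_y e_{xy}$. The function name is hypothetical.

```python
from collections import defaultdict


def mixing_matrix(E, label):
    """Return (e, a) where e[(x, y)] is the fraction of edge ends from a vertex
    labelled x to one labelled y, and a[x] = sum over y of e[(x, y)]."""
    e = defaultdict(float)
    m2 = 2 * len(E)  # each undirected edge contributes two directed ends
    for u, v in E:
        e[(label[u], label[v])] += 1 / m2
        e[(label[v], label[u])] += 1 / m2
    a = defaultdict(float)
    for (x, _y), frac in e.items():
        a[x] += frac
    return dict(e), dict(a)


e, a = mixing_matrix({("u1", "u2"), ("u2", "u3")}, {"u1": 1, "u2": 1, "u3": 2})
```

Here half of the edge ends join two vertices labelled 1, so $e_{11} = 0.5$, $e_{12} = e_{21} = 0.25$, $a_1 = 0.75$ and $a_2 = 0.25$.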
Assortativity
Definition (cont.)
Assortativity is the Pearson correlation coefficient of the property values of any two vertices connected by an edge:
$$r = \frac{\sum_{xy} xy\,(e_{xy} - a_x a_y)}{\sigma_a^2}$$
where $\sigma_a^2$ is the variance of the distribution $a_x$.
Assortativity
The assortativity ranges from 1, which indicates that vertices are connected only to vertices with similar values of the property, to -1, which indicates a perfect negative correlation between the values of connected vertices.
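The coefficient and its two extremes can be checked with a small sketch of the formula $r = \sum_{xy} xy\,(e_{xy} - a_x a_y) / \sigma_a^2$. It assumes numeric property values (for example, a hypothetical career age per researcher) and an undirected edge set; the function name is hypothetical. The numerator is the covariance of the endpoint values and $\sigma_a^2 = \sum_x x^2 a_x - (\sum_x x a_x)^2$ their variance, so $r$ is exactly the Pearson correlation described above.

```python
from collections import defaultdict


def assortativity(E, label):
    """Scalar assortativity r = sum_xy x*y*(e_xy - a_x*a_y) / sigma_a^2."""
    m2 = 2 * len(E)
    e = defaultdict(float)
    for u, v in E:  # count each undirected edge in both directions
        e[(label[u], label[v])] += 1 / m2
        e[(label[v], label[u])] += 1 / m2
    a = defaultdict(float)
    for (x, _y), frac in e.items():
        a[x] += frac
    mean = sum(x * ax for x, ax in a.items())
    var = sum(x * x * ax for x, ax in a.items()) - mean ** 2
    if var == 0:
        raise ValueError("undefined: all edge endpoints share one value")
    num = sum(x * y * (e.get((x, y), 0.0) - a[x] * a[y]) for x in a for y in a)
    return num / var


# Like connects only to like: perfectly assortative.
print(assortativity({("u1", "u2"), ("u3", "u4")}, {"u1": 1, "u2": 1, "u3": 2, "u4": 2}))  # 1.0
# Every edge joins unlike values: perfectly disassortative.
print(assortativity({("u1", "u2")}, {"u1": 1, "u2": 2}))  # -1.0
```

For larger graphs, networkx's `numeric_assortativity_coefficient` can serve as a cross-check.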