PageRank-related methods for analyzing citation networks

Authors: Ludo Waltman and Erjia Yan. Presenter: Erjia Yan. Boğaziçi University, Istanbul. ISSI, June 29.


  1. PAGERANK-RELATED METHODS FOR ANALYZING CITATION NETWORKS
     Authors: Ludo Waltman and Erjia Yan
     Presenter: Erjia Yan
     Boğaziçi University, Istanbul
     ISSI, June 29

  2. Objectives
     – understanding PageRank
     – applications of PageRank in informetric research
     – tutorial: extracting journal citation networks from bibliographic data
     – tutorial: computing PageRank for journals in journal citation networks using Sci2 and MATLAB

  3. A comparison
     NON-RECURSIVE
     • journal impact factor
     • h-index
     • accumulative number of citations
     • accumulative number of publications
     • …
     RECURSIVE
     • PageRank and its variants
       – AuthorRank (Liu et al., 2005)
       – Y-factor (Bollen et al., 2006)
       – CiteRank (Walker et al., 2007)
       – FutureRank (Sayyadi & Getoor, 2009)
       – Eigenfactor (Bergstrom & West, 2008)
       – SCImago (SCImago, 2007)
       – weighted PageRank (Ding, 2011; Yan & Ding, 2011)
       – …

  4. A comparison (figure: NON-RECURSIVE vs. RECURSIVE)

  5. Observations and motivations
     • Observations
       – non-recursive methods take into account only the local structure of a citation network; thus, a citation originating from Nature or Science has the same weight as a citation originating from an obscure journal
     • Motivations
       – recursive methods take into account the global structure of a citation network, so that citations originating from highly cited nodes are given more weight than those originating from rarely cited nodes

  6. Basics of PageRank
     • Basics of PageRank
       – the concept was first proposed by Pinski and Narin in 1976 (influence weight); PageRank was introduced as a method for ranking web pages by Brin and Page in 1998
     • Formulation
       $$ p_i = \alpha \sum_{j \in B_i} \frac{p_j}{m_j} + (1 - \alpha) \frac{1}{n} $$
       – where p_i denotes the PageRank value of web page i, α denotes the damping factor parameter, B_i denotes the set of all web pages that link to web page i, m_j denotes the number of web pages to which web page j links, and n denotes the total number of web pages to be ranked.
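As a hedged illustration of the formulation p_i = α Σ_{j ∈ B_i} p_j / m_j + (1 − α) / n, consider a hypothetical three-page web that is not taken from the slides: page 1 links to pages 2 and 3, page 2 links to page 3, and page 3 links to page 1. Assuming α = 0.85, the constant term is (1 − α) / n = 0.15 / 3 = 0.05, and the three equations can be solved directly:

```latex
\begin{align*}
p_1 &= 0.85\, p_3 + 0.05 \\
p_2 &= 0.85\, \tfrac{p_1}{2} + 0.05 \\
p_3 &= 0.85 \left( \tfrac{p_1}{2} + p_2 \right) + 0.05 \\
\Rightarrow\quad p_1 &\approx 0.388, \qquad p_2 \approx 0.215, \qquad p_3 \approx 0.397
\end{align*}
```

In this toy case page 3 is linked to by two pages and ranks highest, while page 1 receives a single link, but that link comes from the top-ranked page, which links nowhere else, so page 1 ranks almost as high.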

  7. PageRank meanings
     • In other words…
       – the larger the number of web pages that link to web page i, the higher the PageRank value of web page i
       – the higher the PageRank values of the web pages that link to web page i, the higher the PageRank value of web page i
       – for those web pages that link to web page i, the smaller the number of other web pages to which these web pages link, the higher the PageRank value of web page i
       – the closer the damping factor parameter α is set to 1, the stronger the above effects

  8. Damping factor
     • On the damping factor
       – 1: PageRank is not guaranteed to converge
       – just below 1 (e.g., 0.9999): extremely sensitive to small changes in the network of links
       – 0.5: according to Chen et al. (2007), 0.5 is preferred for citation networks, based on the assumption that authors on average browse as far as two degrees of references (references and the references' cited references, thus 1 − 1/2 = 0.5)
       – 0.85: the default (roughly coincides with the "six degrees of separation": 1 − 1/6 ≈ 0.83)
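The arithmetic behind the 0.5 and 0.85 heuristics can be sketched as α = 1 − 1/d, where d is an assumed average browsing depth. The function name `damping_from_depth` is hypothetical and only illustrates the rule of thumb stated on the slide; it is not a method from the cited papers.

```python
def damping_from_depth(depth: float) -> float:
    """Return alpha = 1 - 1/depth for an assumed average browsing depth.

    Illustrative helper only: 'depth' is the assumed number of link
    (or reference) levels a reader follows on average.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 1.0 - 1.0 / depth


if __name__ == "__main__":
    # Two degrees of references (Chen et al., 2007) and six degrees of separation.
    for depth in (2, 6):
        print(depth, round(damping_from_depth(depth), 4))
```

Under this rule a depth of 2 gives exactly 0.5, while a depth of 6 gives about 0.83, which is close to, but not exactly, the conventional default of 0.85.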

  9. Applications
     • Applications
       – Analyzing journal citation networks
         • Y-factor; Eigenfactor; SCImago Journal Rank (SJR)
       – Analyzing author citation networks
         • SARA (science author rank algorithm)
       – Analyzing document citation networks
         • CiteRank

  10. TUTORIALS

  11. Tools and materials
     • Tools we need
       – Sci2: https://sci2.cns.iu.edu/user/index.php
       – Sci2 plugins: http://wiki.cns.iu.edu/display/SCI2TUTORIAL/3.2+Additional+Plugins
       – MATLAB or Octave: http://www.gnu.org/software/octave/
     • Data materials
       – http://www.pages.drexel.edu/~ey86/p/tutorial/

  12. Steps 1-5

  13. Steps 6-7
     • Step 6: merge individually downloaded files
       – on Windows systems, a command such as
           copy *.txt merged_data.txt
         can be entered in the Command Prompt
       – in the resulting file, make sure to remove all lines ‘FN Thomson Reuters Web of Knowledge VR 1.0’ except for the first one and all lines ‘EF’ except for the last one
     • Step 7: change file extension
       – change the extension of the text file that contains your bibliographic data from .txt to .isi
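Steps 6 and 7 can also be scripted. The following is a minimal sketch, not part of the original tutorial, that merges the downloaded files, keeps the file header only from the first file and a single ‘EF’ line at the end, and writes the result with the .isi extension. The function name `merge_isi_files` is hypothetical, and the sketch assumes that header lines start with the tags `FN ` and `VR `, that the files are UTF-8 encoded, and that they sit together in one directory.

```python
from pathlib import Path


def merge_isi_files(input_dir, output_path):
    """Merge downloaded .txt exports into one file (hypothetical helper).

    Assumptions: header lines start with 'FN ' or 'VR ', the end-of-file
    marker is a line containing only 'EF', and files are UTF-8 encoded.
    Returns the number of source files merged.
    """
    output_path = Path(output_path)
    sources = sorted(
        p for p in Path(input_dir).glob("*.txt")
        if p.resolve() != output_path.resolve()
    )
    merged = []
    for index, source in enumerate(sources):
        text = source.read_text(encoding="utf-8-sig")  # tolerate a BOM
        for line in text.splitlines():
            if index > 0 and line.startswith(("FN ", "VR ")):
                continue  # keep the header only from the first file
            if line.strip() == "EF":
                continue  # a single EF line is appended at the end
            merged.append(line)
    merged.append("EF")
    output_path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(sources)


# Example call (paths are placeholders):
# merge_isi_files("downloads", "downloads/merged_data.isi")
```

Writing directly to a file ending in .isi covers Step 7, so no separate rename is needed.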

  14. Steps 8-9

  15. Steps 10-12

  16. Step 13

  17. Steps 14-19

  18. Step 19

      function p = calc_PageRank(C, alpha, n_iterations)

      % Take care of dangling nodes.
      m = sum(C, 2);
      C(m == 0, :) = 1;

      % Create a row-normalized matrix.
      n = length(C);
      m = sum(C, 2);
      C = spdiags(1 ./ m, 0, n, n) * C;

      % Apply the power method.
      p = repmat(1 / n, [1 n]);
      for i = 1:n_iterations
          p = alpha * p * C + (1 - alpha) / n;
      end
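For readers without MATLAB or Octave, the same three steps (dangling nodes, row normalization, power method) can be sketched in plain Python. This is an illustrative port, not the authors' code: the name `calc_pagerank`, the default values for `alpha` and `n_iterations`, and the toy matrix are assumptions, and the dense list-of-lists representation is only suitable for small networks.

```python
def calc_pagerank(C, alpha=0.85, n_iterations=100):
    """Plain-Python sketch of the calc_PageRank routine (assumed port).

    C[i][j] is the number of citations from node i to node j.
    """
    n = len(C)

    # Take care of dangling nodes: a node with no outgoing citations
    # is treated as citing every node equally. Then row-normalize.
    rows = []
    for row in C:
        row = [float(x) for x in row]
        total = sum(row)
        if total == 0:
            row, total = [1.0] * n, float(n)
        rows.append([x / total for x in row])

    # Apply the power method, starting from a uniform distribution.
    p = [1.0 / n] * n
    for _ in range(n_iterations):
        p = [
            alpha * sum(p[j] * rows[j][i] for j in range(n)) + (1.0 - alpha) / n
            for i in range(n)
        ]
    return p


if __name__ == "__main__":
    # Hypothetical three-journal network: journal 0 cites journals 1 and 2,
    # journal 1 cites journal 2, journal 2 cites journal 0.
    toy = [[0, 1, 1],
           [0, 0, 1],
           [1, 0, 0]]
    print([round(x, 3) for x in calc_pagerank(toy)])
```

A fixed iteration count mirrors the MATLAB function; a convergence check on the change in `p` between iterations would be a reasonable alternative.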

  19. Steps 20-21
     The resulting PageRank scores for the journals

  20. Other citation network types
     • Author and document citation networks, and the corresponding PageRank calculations, can be obtained by extracting the appropriate networks in Sci2

  21. Thank you
     • Questions?
     • Any further questions can be directed to:
       – Erjia Yan, ey86@drexel.edu, or
       – Ludo Waltman, waltmanlr@cwts.leidenuniv.nl
