
Modeling co-authorship and citation networks


  1. Modeling co-authorship and citation networks. Analytical models: 1) Continuum models; 2) Master Equation method; 3) Rate Equation method; 4) Generating Function. Other models: 1) de Solla Price's modification of the Simon model; 2) TARL model; 3) Group-based Yule model.

  2. Why are there so many models? Different models emphasize different aspects of citation networks. No single model is able to reproduce the citation patterns of different datasets. Parameters have to be tuned to fit each dataset.

  3. Citation properties that we want to model: (1) Papers with more citations tend to be cited more. (2) After a certain period of time, the citations of a paper drop. (3) Papers are “rediscovered” after lying dormant for a certain period of time. The first is modeled by preferential attachment. The second has to be modeled by including an “aging” bias in the system.

  4. Continuum approach (Barabasi-Albert). Linear preferential attachment (B-A). Aging: exponential decay (Zhu et al.; P. Sen) and power-law decay (Zhu et al.; Sen et al.). Accelerating growth, applied to citation networks by P. Sen. (B-A) + link-length dependence (M-M-S). Multiplicative node fitness (B-B).
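
As a reminder of how the continuum approach works for linear preferential attachment, the standard Barabasi-Albert calculation is sketched below. The symbols $m$ (edges brought by each new vertex), $k_i$ (degree of vertex $i$) and $t_i$ (its birth time) are notation assumed here, not taken from the slides.

```latex
% Continuum rate equation for the degree of vertex i under linear
% preferential attachment; at time t the total degree is 2mt.
\frac{\partial k_i}{\partial t}
  = m\,\frac{k_i}{\sum_j k_j}
  = \frac{k_i}{2t},
\qquad k_i(t_i) = m

% Solution: older vertices (smaller t_i) accumulate more edges.
k_i(t) = m \left(\frac{t}{t_i}\right)^{1/2}

% With birth times uniformly distributed, this gives the degree distribution
P(k) = \frac{2m^2}{k^3}
```

The factor $k/2t$ is the same attachment probability used in the master equation on the later slides. The aging variants listed above modify it by a decaying function of the vertex age.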

  5. Master-Equation method (Dorogovtsev et al.). Initial attractiveness + preferential attachment (D-M-S). Edge inheritance (D-M-S). Aging included (D-M): the scale-free structure disappears at particular values (time function taken to be a power law). Generalized by P.S and K.B.H, which reinstates the scale-free structure.

  6. TARL model (Topic, Aging and Recursive Linking). An evolving bipartite network. Vertices: authors and articles. Edges: undirected (author-author), directed (author-article), directed (article-article). Assumptions: Each paper has a fixed number of authors and a fixed number of references. Each author and each paper has exactly one topic. Consumed-produced relationships among papers and authors are restricted to authors and papers within the same topic. A single fixed number of papers per author per year is assumed.

  7. TARL Model. The modeling process: A set of authors and a set of papers with randomly assigned topics are generated. A predefined number of coauthors sharing the same topic is randomly selected and assigned to each paper via ‘produced by’ links. ● All papers have authors, but there are authors without papers. ● Initially there are no coauthor or paper citation links, making it advantageous to start the model 1 year earlier than the period of interest.

  8. TARL Model. The modeling process (cont.): At each time step (a year), a specified number of authors is created and added to the set of existing authors. Each author in the new set randomly identifies a set of coauthors, reads a specified number of randomly selected papers from within his/her topic, and produces a specified number of new papers. Each new paper cites a fixed number of existing papers. To select the papers cited, authors consume (read) only a small set of papers because of time constraints.
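
The modeling process on the two slides above can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the TARL implementation. The function name, data layout and all default parameter values are hypothetical. Initial papers take the topic of a randomly chosen lead author, only newly added authors act as lead authors, and neither aging nor recursive following of references is included.

```python
import random


def simulate_tarl_sketch(years=5, n_topics=3, initial_authors=30,
                         initial_papers=30, new_authors_per_year=10,
                         papers_per_author=1, coauthors_per_paper=2,
                         papers_read=5, refs_per_paper=3, seed=0):
    """Toy topic-restricted author/paper growth (all defaults are illustrative)."""
    rng = random.Random(seed)
    authors, papers = [], []

    def add_author():
        # Every author gets exactly one randomly assigned topic.
        author = {"id": len(authors), "topic": rng.randrange(n_topics)}
        authors.append(author)
        return author

    def add_paper(lead, year, refs):
        # Coauthors are drawn at random from authors sharing the lead's topic.
        peers = [a["id"] for a in authors
                 if a["topic"] == lead["topic"] and a["id"] != lead["id"]]
        n_extra = min(coauthors_per_paper - 1, len(peers))
        team = [lead["id"]] + rng.sample(peers, n_extra)
        papers.append({"id": len(papers), "topic": lead["topic"],
                       "year": year, "authors": team, "refs": refs})

    # Year 0: initial authors and papers, with no citation links yet.
    for _ in range(initial_authors):
        add_author()
    for _ in range(initial_papers):
        add_paper(rng.choice(authors), year=0, refs=[])

    for year in range(1, years + 1):
        newcomers = [add_author() for _ in range(new_authors_per_year)]
        for lead in newcomers:
            for _ in range(papers_per_author):
                # The author reads a few earlier papers from the same topic ...
                pool = [p["id"] for p in papers
                        if p["topic"] == lead["topic"] and p["year"] < year]
                read = rng.sample(pool, min(papers_read, len(pool)))
                # ... and the new paper cites a fixed number of those read.
                refs = rng.sample(read, min(refs_per_paper, len(read)))
                add_paper(lead, year, refs)
    return authors, papers


if __name__ == "__main__":
    authors, papers = simulate_tarl_sketch()
    print(len(authors), "authors,", len(papers), "papers")
```

Because the year-0 papers carry no references, the earliest simulated years have fewer citation links than later ones, which is the reason given above for starting the model one year before the period of interest.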

  9. TARL Model. The probability of citing a paper written t years ago was fit by a Weibull distribution of the form $p(t) = C\,\frac{a}{b}\left(\frac{t}{b}\right)^{a-1}\exp\left[-\left(\frac{t}{b}\right)^{a}\right]$. The parameter b controls the rightward extension of the curve. As b increases, the probability of citing older papers increases. For the present purposes, a small value of b represents a strong aging bias that favors citing recently published papers.
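
The effect of b can be checked numerically with a short Python sketch. It assumes the usual two-parameter Weibull density multiplied by a scale factor. The function names, the argument `c`, the cutoff `max_age` and the parameter values are illustrative choices, not fitted values from the PNAS data.

```python
import math


def weibull_citation_weight(t, a, b, c=1.0):
    """Weibull-shaped weight for citing a paper that is t years old."""
    if t <= 0:
        return 0.0
    return c * (a / b) * (t / b) ** (a - 1) * math.exp(-((t / b) ** a))


def mean_cited_age(a, b, max_age=50):
    """Average age of a cited paper when ages 1..max_age are weighted as above."""
    ages = range(1, max_age + 1)
    weights = [weibull_citation_weight(t, a, b) for t in ages]
    return sum(t * w for t, w in zip(ages, weights)) / sum(weights)


if __name__ == "__main__":
    # Larger b shifts the weight toward older papers (weaker aging bias).
    for b in (2.0, 5.0, 10.0):
        print(f"b = {b:4.1f}  mean cited age = {mean_cited_age(2.0, b):.2f}")
```

The mean cited age grows with b, which matches the statement that a small b corresponds to a strong bias toward recent papers.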

  10. Model Validation. To validate the TARL model, a 20-year (1982–2001) data set of PNAS was used. The PNAS data set contains 45,120 regular articles. The number of unique authors for those papers is 105,915. Note that the citation counts, particularly for younger papers, are artificially low because they have not existed in the literature long enough to garner many citations. Table 1. PNAS statistics for each year: total number of papers (#p), unique authors (#a), references (#r), citations received per paper (#c), number of coauthors per paper (#ca), and number of citations within the PNAS data set (#cwin).

  11. Model Validation. The PNAS dataset suggested systematic deviations from a power law (the most cited papers are cited less often than predicted by a power law, and the less cited papers are cited more often than predicted) => AGING.

  12. Statistics. Total number of actual and simulated papers (#p) and authors (#a) (a), and received citations (#cwin) (b). The fit for the first 2 years is poor because the model has no initial citation links and no record of papers before 1981 (how to avoid this?).

  13. Simple model to incorporate age bias using the Weibull function. The probability that a vertex born at t = s has degree k at time t is p(k, s, t). The evolution of each individual vertex is then given by the master equation
      $$p(k, s, t+1) = p(k-1)\,p(t-s)\,p(k-1, s, t) + \left[1 - p(k)\,p(t-s)\right] p(k, s, t),$$
      where
      $$p(k) = \frac{k}{2t}, \qquad p(t-s) = C\,\frac{a}{b}\left(\frac{t-s}{b}\right)^{a-1}\exp\left[-\left(\frac{t-s}{b}\right)^{a}\right].$$

  14. Solving the difference-differential equation obtained from the master equation, we get the probability for the case a = 2:
      $$p(k, s, t) = \frac{C^{k}\,(t-s)^{2k-1}}{(k-1)!\,b^{2k-1}}.$$
      To obtain the distribution P(k) at a particular t, we sum over all possible values of s. To obtain the total citations, we compute the average k for each year and multiply it by the total number of papers in that year.
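
The master equation and the sum over s can also be explored numerically. The Python sketch below iterates the recursion for one birth time s, then averages over birth times to approximate P(k). Several things are assumptions made for illustration: one vertex is born per time step, every vertex starts with degree 1, degrees are truncated at `k_max`, the attachment probability is clamped to at most 1, and the values of `a`, `b` and `c` are arbitrary rather than fitted.

```python
import math


def weibull(age, a, b, c):
    """Weibull aging factor p(t - s) for a vertex of the given age."""
    if age <= 0:
        return 0.0
    return c * (a / b) * (age / b) ** (a - 1) * math.exp(-((age / b) ** a))


def evolve_degree_distribution(s, t_final, a=2.0, b=5.0, c=1.0, k_max=50):
    """Return [p(k, s, t_final) for k = 0..k_max]; index 0 is always 0."""
    if s < 1:
        raise ValueError("birth time s must be >= 1, since p(k) = k / (2t)")
    p = [0.0] * (k_max + 1)
    p[1] = 1.0  # a new vertex arrives with exactly one edge
    for t in range(s, t_final):
        scale = weibull(t - s, a, b, c) / (2.0 * t)
        new_p = [0.0] * (k_max + 1)
        for k in range(1, k_max + 1):
            gain = min(1.0, (k - 1) * scale) * p[k - 1]
            # No outflow from the truncation boundary, so probability is conserved.
            loss = min(1.0, k * scale) * p[k] if k < k_max else 0.0
            new_p[k] = gain + p[k] - loss
        p = new_p
    return p


def degree_distribution(t_final, a=2.0, b=5.0, c=1.0, k_max=50):
    """Average p(k, s, t_final) over birth times s = 1..t_final."""
    total = [0.0] * (k_max + 1)
    for s in range(1, t_final + 1):
        for k, value in enumerate(evolve_degree_distribution(s, t_final, a, b, c, k_max)):
            total[k] += value
    return [value / t_final for value in total]


def mean_degree(dist):
    """Average degree k of a distribution indexed by k."""
    return sum(k * value for k, value in enumerate(dist))


if __name__ == "__main__":
    dist = degree_distribution(t_final=20)
    print("P(1), P(2), P(3):", dist[1], dist[2], dist[3])
    print("mean degree:", mean_degree(dist))
```

Multiplying `mean_degree` for the vertices born in a given year by the number of papers in that year gives the total-citation estimate described above.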

  15. Initial conditions considered: At t = 1, the first vertex (paper) is created. Each new vertex comes with one edge. Constant increase in the number of papers with time. As a result, we get systematic departures from the power law, as observed in the PNAS data. However, due to our assumption that the first paper was created at t = 1, the total number of citations does not match the data.

  16. Much more remains to be done: Proper initialization has to be performed. Co-evolution of authors and articles has to be modeled. We still have to introduce a parameter into the model that captures the “rediscovery” of a dormant paper. Validation with different datasets.
