Network analysis in bibliometrics, Lovro Šubelj (PowerPoint presentation)


  1. network analysis in bibliometrics • Lovro Šubelj • University of Ljubljana, Faculty of Computer and Information Science • CWTS ’17

  2. Slovenia (“chicken”) • Pannonian plain, flat like NL :) • Alps ≤ 2864 m • Ljubljana • karst caves & wine • seaside < 50 km :(

  3. University of Ljubljana • since 1919 • 271st in CWTS Leiden Ranking 2017 • 26 members: 23 faculties & 3 academies • 40,110 students & 5,730 staff in 2016

  4. Faculty of Computer and Information Science • since 1996 (cs study since 1973) • ≈1,300 students & ≈180 staff • BSc, MSc, PhD: cs, prog, math, mm • research: cs, db, is, dm, ml, ai, nets

  5. network courses

  6. talk outline
     1. reliability of bibliographic databases
        Šubelj, L., Fiala, D., & Bajec, M. (2014). Scientific Reports, 4, 6496.
        Šubelj, L., Bajec, M., Boshkoska, B. M., et al. (2015). PLoS ONE, 10(5), e0127390.
     2. modeling paper citation networks
        Šubelj, L., & Bajec, M. (2013). In Proceedings of the LSNA ’13, pp. 527–530.
        Šubelj, L., Žitnik, S., & Bajec, M. (2014). In Proceedings of the NetSci ’14, p. 1.
     3. clustering paper citation networks
        Šubelj, L., Van Eck, N. J., & Waltman, L. (2016). PLoS ONE, 11(4), e0154404.

  7. reliability of bibliographic databases • databases are the basis for research & evaluation • databases can differ substantially: different databases often lead to quite different conclusions • content & structure can differ substantially: coverage, timespan, features, accuracy, acquisition etc. • only informal notions of their reliability exist • particular case: reliability of the structure of citation networks

  8. structure of citation networks • statistics of citation networks are mostly consistent, with some outliers • outliers are in most cases due to data acquisition • comparison over one statistic • comparison over many statistics? • same problem as in the machine learning community

  9. methodology of database comparison • network statistics → residuals → database ranks • mean ranks of databases over many statistics • residuals are used since the “true database” is not known: database reliability is seen as consistency with other databases
     step 1: studentized residuals of statistics $\hat{x}_{ij}$; two-tailed Student t-tests of $H_0: \hat{x}_{ij} = 0$ at P-value = 0.1 (Student t-distribution with d.f. $N-2$); outcomes $\forall \hat{x}_{ij}: H_0$ or $\exists \hat{x}_{ij}: H_1$
     step 2: pairwise Spearman correlations $\rho_{ij}$; two-tailed Fisher independence z-tests of $H_0: \rho_{ij} = 0$ at P-value = 0.01 (standard normal distribution); outcomes $\forall \rho_{ij}: H_0$ or $\exists \rho_{ij}: H_1$
     step 3: residuals mean ranks $R_i$; one-tailed Friedman rank test of $H_0: R_i = R_j$ at P-value = 0.1 ($\chi^2$-distribution with d.f. $N-1$); outcomes $H_0$ or $H_1$
     step 4: residuals mean ranks $R_i$; two-tailed Nemenyi post-hoc test of $H_0: R_i = R_j$ at P-value = 0.1 (studentized range distribution with d.f. $N$, 25)
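
The studentized-residual, Spearman/Fisher and Friedman parts of the methodology above can be sketched in Python as follows. This sketch assumes a table `stats[s][d]` giving the value of statistic s on database d. The leave-one-out standardized residual is a hypothetical stand-in for the studentized residuals on the slide. The Fisher z uses the plain √(n − 3) scaling. All function names are illustrative and are not the authors' code.

```python
import math
from statistics import mean, stdev


def loo_residuals(values):
    """Standardized residual of each database for one statistic.

    Assumption (hypothetical): since the 'true' value is unknown, database d is
    compared with the mean and standard deviation of the other databases.
    """
    res = []
    for d, x in enumerate(values):
        others = values[:d] + values[d + 1:]
        s = stdev(others)
        res.append((x - mean(others)) / s if s > 0 else 0.0)
    return res


def ranks(xs):
    """Ranks 1..k of xs (smallest value gets rank 1, ties get average ranks)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def mean_ranks(stats):
    """Mean rank R_d of each database; rank 1 = smallest |residual| (most consistent)."""
    rows = [ranks([abs(e) for e in loo_residuals(row)]) for row in stats]
    return [mean(row[d] for row in rows) for d in range(len(stats[0]))]


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var)


def fisher_pvalue(rho, n):
    """Two-tailed p-value of H0: rho = 0 from Fisher's z = atanh(rho) * sqrt(n - 3)."""
    if abs(rho) >= 1:
        return 0.0
    z = math.atanh(rho) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # closed form for even df
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))  # df = 1
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):  # extra terms for df = 3, 5, ...
        total += term
        term *= x / (2 * i + 1)
    return total


def friedman(R, n):
    """Friedman chi-squared statistic and p-value from mean ranks R over n statistics."""
    k = len(R)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in R) - k * (k + 1) ** 2 / 4)
    return chi2, chi2_sf(chi2, k - 1)
```

For example, with `stats = [[10 * c, 1.0 * c, 1.2 * c, 0.8 * c] for c in range(1, 6)]`, database 0 is an outlier on every statistic. In that case `mean_ranks(stats)` gives `[4.0, 2.0, 1.0, 3.0]`, and `friedman` returns a statistic of 15.0 with p ≈ 0.0018 (d.f. 3).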

  10. comparison of citation networks • comparison of different citation networks • results are robust to the selection of networks, statistics, patterns etc. • [figure: mean ranks (1 to 6) of paper citation networks (P → P) from WoS, DBLP, Cora, PubMed, arXiv and APS; P-value = 0.1] • comparison of different information networks
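
The mean-rank comparison at P-value = 0.1 can be sketched with the Nemenyi critical difference CD = q_α √(k(k+1)/(6n)). Here k is the number of databases and n the number of statistics. Two databases are taken to differ when their mean ranks differ by more than CD. The q values below are the usual Nemenyi constants for α = 0.10. The mean ranks in the example are hypothetical and are not the results shown on the slide.

```python
import math
from itertools import combinations

# Nemenyi critical values q_alpha at alpha = 0.10 for k = 2..10 compared items
# (studentized range statistic with infinite d.f., divided by sqrt(2)).
Q_ALPHA_010 = {2: 1.645, 3: 2.052, 4: 2.291, 5: 2.459, 6: 2.589,
               7: 2.693, 8: 2.780, 9: 2.855, 10: 2.920}


def critical_difference(k, n, q=Q_ALPHA_010):
    """Nemenyi critical difference for k items ranked over n statistics."""
    return q[k] * math.sqrt(k * (k + 1) / (6 * n))


def nemenyi(mean_ranks, n):
    """Return CD and the pairs whose mean ranks differ by more than CD."""
    cd = critical_difference(len(mean_ranks), n)
    pairs = [(a, b) for a, b in combinations(mean_ranks, 2)
             if abs(mean_ranks[a] - mean_ranks[b]) > cd]
    return cd, pairs


# Hypothetical mean ranks of six databases over n = 20 statistics.
example = {"db1": 1.5, "db2": 2.0, "db3": 3.5, "db4": 3.6, "db5": 4.4, "db6": 6.0}
cd, pairs = nemenyi(example, 20)
```

With these made-up ranks, CD ≈ 1.53. So db1 and db6 differ, while db2 and db3 (a gap of 1.5) do not.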

  11. comparison of bibliographic networks • A: paper citation networks (information networks) • B: author citation networks (social-information networks) • C: author collaboration networks (social networks) • [figure: mean ranks (1 to 6) of WoS, DBLP, Cora, PubMed, arXiv and APS in panels A (P → P), B (A ↔ A) and C (A − A); P-value = 0.1] • there is no “best” database!
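
Author citation (B) and collaboration (C) networks can be derived from paper-level data. This is a minimal sketch assuming a list of (citing, cited) paper pairs and a mapping from papers to authors. Each paper citation induces a citation between every author of the citing paper and every author of the cited paper. Each co-authored paper adds a collaboration link between every pair of its authors. Dropping author self-citations is a hypothetical choice, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def author_networks(citations, authors):
    """Derive weighted author networks from paper data.

    citations: iterable of (citing_paper, cited_paper) pairs (P -> P)
    authors:   dict paper -> list of author names
    Returns (author_citations, collaborations) as Counters of author pairs.
    """
    author_citations = Counter()           # A -> A, directed
    for citing, cited in citations:
        for a in authors.get(citing, []):
            for b in authors.get(cited, []):
                if a != b:                 # assumption: ignore self-citations
                    author_citations[(a, b)] += 1
    collaborations = Counter()             # A - A, undirected (sorted pairs)
    for names in authors.values():
        for a, b in combinations(sorted(set(names)), 2):
            collaborations[(a, b)] += 1
    return author_citations, collaborations
```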

  12. talk outline (repeated from slide 6)

  13. models of citation networks • generative models of citation networks to reason about structure, evolution, dynamics, future etc. • many possible applications in bibliometrics

  14. forest fire network model • each new node i forms links as follows: 1. i selects an initial ambassador a and links to a 2. i selects neighbors y, z of a and links to y, z 3. y, z are taken as new ambassadors of i
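
The forest fire steps above can be sketched in Python as follows. This sketch makes three assumptions: the network is undirected, the initial ambassador is chosen uniformly at random, and at each ambassador a geometrically distributed number of not-yet-visited neighbors is "burned", with mean p/(1 − p), as in the common forest fire formulation. The function name and parameter p are illustrative.

```python
import random


def forest_fire(n, p, seed=None):
    """Grow an undirected forest fire network with n nodes (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    adj = {0: set()}
    for i in range(1, n):
        ambassador = rng.randrange(i)      # 1. pick an initial ambassador a
        burned = {ambassador}              #    ...and link to it
        frontier = [ambassador]
        while frontier:
            node = frontier.pop()
            x = 0                          # geometric number of links, mean p / (1 - p)
            while rng.random() < p:
                x += 1
            fresh = [v for v in adj[node] if v not in burned]
            for v in rng.sample(fresh, min(x, len(fresh))):
                burned.add(v)              # 2. link to some neighbors of the ambassador
                frontier.append(v)         # 3. they become new ambassadors of i
        adj[i] = set(burned)
        for v in burned:
            adj[v].add(i)
    return adj
```

In this sketch, a larger p makes each new node link to more existing nodes, because the fire spreads further through the network.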

  15. forest fire citation model • each new paper i cites as follows: 1. i selects an initial paper a and cites a 2. i selects references y, z of a and cites y, z 3. y, z are taken as new reading for i • thus authors read all cited papers and vice versa • yet only ≈20% of references are read (Simkin & Roychowdhury, 2003)

  16. realistic citation model • each new paper i cites as follows: 1. i selects an initial paper a and may cite a 2. i selects references y, z of a and may cite y, z 3. some of these references are taken as new reading for i • read & cited papers are modeled independently
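
The realistic citation model above can be sketched in Python, with reading and citing decoupled. The sketch assumes that the initial paper and each reference of a read paper are cited with probability `p_cite`, and that each reference is independently added to the reading with probability `p_read`. These two parameters and the function name are hypothetical and chosen only for illustration; they are not the parameters of the published model. In this sketch a new paper may end up citing nothing.

```python
import random


def citation_model(n, p_cite, p_read, seed=None):
    """Grow a citation network: refs[i] is the set of older papers cited by i."""
    rng = random.Random(seed)
    refs = {0: set()}
    for i in range(1, n):
        a = rng.randrange(i)                     # 1. initial paper a is read...
        read, queue = {a}, [a]
        cited = {a} if rng.random() < p_cite else set()  # ...and may be cited
        while queue:
            paper = queue.pop()
            for r in refs[paper]:                # 2. scan references of a read paper
                if rng.random() < p_cite:        #    citing is decided independently
                    cited.add(r)
                if r not in read and rng.random() < p_read:
                    read.add(r)                  # 3. some references become new reading
                    queue.append(r)
        refs[i] = cited
    return refs
```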

  17. directed citation model • directed dynamics are much more complicated • the model reproduces WoS citation networks • clear optima (peaks) in the model parameters

  18. implications of the citation model • one read paper ≈ two (not five) cited papers!

  19. talk outline (repeated from slide 6)

  20. clustering citation networks • clustering papers based on direct citation relations: research areas or topics of papers • systematic comparison of a large number of methods: network clustering and partitioning • there is no “best” method!
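
Asynchronous label propagation on the undirected version of the direct citation network is one simple way to cluster papers. It is not necessarily one of the methods compared on the slide. In this Python sketch, each paper repeatedly adopts the most frequent label among its citing and cited neighbors. Keeping the current label on ties is a common variant that helps convergence. The function name and `max_iter` are illustrative.

```python
import random
from collections import Counter, defaultdict


def label_propagation(citations, seed=None, max_iter=100):
    """Cluster papers by asynchronous label propagation on direct citations.

    citations: iterable of (citing, cited) pairs; direction is ignored.
    Returns a list of clusters (sets of papers).
    """
    rng = random.Random(seed)
    nbrs = defaultdict(set)
    for u, v in citations:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    labels = {u: u for u in nbrs}          # every paper starts in its own cluster
    nodes = list(nbrs)
    for _ in range(max_iter):
        rng.shuffle(nodes)                 # random update order each sweep
        changed = False
        for u in nodes:
            counts = Counter(labels[v] for v in nbrs[u])
            best = max(counts.values())
            top = [lab for lab, c in counts.items() if c == best]
            if labels[u] in top:           # keep current label on ties
                continue
            labels[u] = rng.choice(top)    # adopt a most frequent neighbor label
            changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for u, lab in labels.items():
        clusters[lab].add(u)
    return list(clusters.values())
```

Label propagation is fast but randomized, so different seeds can give different partitions of real citation networks.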

  21. thank you! • network convexity: LCN2 seminar next Friday at 4pm in Snellius
