USING CITATION-MAPPING TO ASSESS ECONOMIC MODELS OF SCIENCE
Mike Thicke
PhD, IHPST, University of Toronto (2016)
Bard College, Bard Prison Initiative
mikethicke@gmail.com
www.mikethicke.com
INTRODUCTION
• Dissertation: consequences of importing economic ideas and methods into philosophy of science.
• Formal models of the division of cognitive labor in science substitute plausibility and robustness for empirical data.
• Without empirical data to establish representational or predictive accuracy, only weak inferences about science can be drawn.
• Citation analysis is one way to inform models with data.
• Two examples from my project on the cognitive division of labor (CDL) in climate science.
• Advantages and challenges of citation analysis.
TWO WAYS TO ASSESS MODELS WITH DATA
[Diagram: DATA → MODEL → DATA, annotated with Representational Accuracy and Predictive Accuracy.]
WEISBERG: REPRESENTATIONAL ACCURACY
• Volterra principle: "A general pesticide will increase abundance of prey and decrease abundance of predators."
• Data at the beginning: populations can be "described by coupled differential equations."
• Model explores consequences of that.
• Robustness analysis at the end confirms results of model.
SCHELLING: PREDICTIVE ACCURACY
• Racial segregation can result from "mild" racial preferences.
• Individuals move if too many neighbours are of a different race.
• Plausibility at beginning, confirmed by data at end.
ASSESSMENT IN FORMAL MODELS OF SCIENCE
[Diagram: DATA → MODEL → DATA, annotated with Representational Accuracy and Predictive Accuracy.]
ASSESSMENT IN FORMAL MODELS OF SCIENCE
[Diagram: the same schema with PLAUSIBILITY and ROBUSTNESS in place of data, annotated with Representational Accuracy and Predictive Accuracy.]
PLAUSIBILITY: THOMA ON WEISBERG & MULDOON
• Weisberg and Muldoon: research communities composed of mavericks and followers.
• Thoma: implausible that anyone would employ the follower strategy:
  • Scientists can easily learn about the success of nearby approaches without investigating themselves.
  • Why would anyone be motivated to duplicate work for no epistemic benefit?
ROBUSTNESS: WEISBERG & MULDOON ON KITCHER & STREVENS
• Kitcher & Strevens: self-interested scientists can achieve optimal divisions of labour between two research projects.
• Weisberg and Muldoon: result not robust to changes in scientists' knowledge of each other's work.
• As radius of vision decreases, community diverges from optimal allocation.
WHY IS DATA IMPORTANT?
• Robustness analysis epistemically significant only to the extent that the model is representationally accurate.
• Plausibility only weakly establishes representational accuracy.
• Plausibility epistemically significant only to the extent that the model is predictively accurate.
• Robustness only weakly establishes predictive accuracy.
• Even if plausibility + robustness are informative about target systems, impossible to establish magnitude of effects without data.
• To make normative claims about scientific practice, need to establish magnitudes.
MY PROJECT: COGNITIVE DIVISION OF LABOR IN CLIMATE SCIENCE
SUNDBERG'S CLAIMS
• Climate models are an obligatory passage point to climate policy.
• Data flows from experiments to models through parameterizations.
• Experimentalists often fail to translate their results into parameterizations that are useful to modelers.
• Climate science faces a coordination problem.
Sundberg, "Parameterizations as Boundary Objects on the Climate Arena" (2007).
RESEARCH QUESTIONS
• Is there really a coordination problem in climate science between modelers and experimentalists?
• What is the magnitude of this problem?
• If there is a problem, what is the cause?
  • Problem of education / communication?
  • Problem of incentives?
CITATION COUNTS: STANDARD DEVIATIONS ABOVE MEAN
[Figure: citation map of the Climate Model, Parameterization, and Aerosol literatures, with cross-literature citation counts expressed in standard deviations above the mean.]
PARAMETERIZATION → AEROSOL CITATIONS COMPARED TO PARAMETERIZATION → RANDOM CITATIONS
[Figure comparing the two citation flows: 571 vs. 240 citations; 6 SD vs. 1 SD above the mean.]
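The figure above reports cross-set citation counts relative to a random baseline. Below is a minimal sketch of one way such a comparison could be computed (not necessarily the method used here): count citations from one paper set into another, then compare against many random target sets of the same size. All function and variable names are hypothetical.

import random

def citations_into(source_refs, target_ids):
    """Count references from source papers that land in the target paper set."""
    return sum(1 for refs in source_refs for r in refs if r in target_ids)

def sd_above_random(source_refs, target_ids, all_ids, trials=1000, seed=0):
    """Observed cross-set citation count, in standard deviations above the
    mean of counts into random, equally sized sets of papers."""
    rng = random.Random(seed)
    observed = citations_into(source_refs, target_ids)
    baseline = [citations_into(source_refs,
                               set(rng.sample(sorted(all_ids), len(target_ids))))
                for _ in range(trials)]
    mean = sum(baseline) / trials
    sd = (sum((b - mean) ** 2 for b in baseline) / trials) ** 0.5
    return (observed - mean) / sd

# Example call (hypothetical identifiers):
# sd_above_random(parameterization_refs, aerosol_ids, all_paper_ids)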
MODELING THE CAUSE
• Assume there is a coordination problem. What is the cause?
• Observation: citation counts follow power laws.
• Hypothesis: rational scientists seeking to maximize citations will target papers narrowly.
• Paper quality is group-relative: widely-targeted papers will have medium quality for many groups, while narrowly-targeted papers will have high quality for one group and low quality for others.
• Maximizing quality relative to one group at the expense of others will maximize total citations (see the sketch below).
• It is easier to target a paper narrowly at one's own discipline.
• Few papers will be targeted outside of the home discipline.
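To make the point about narrow targeting concrete, here is a small illustrative calculation (not the author's model): assuming a long-tailed, convex mapping from group-relative quality to expected citations, a paper of high quality for one group and low quality for another collects more total citations than a paper of medium quality for both. The Lomax-style mapping and its parameter values are assumptions chosen for illustration.

# Illustrative arithmetic: a convex, long-tailed quality-to-citations mapping
# rewards concentrating quality on one group over spreading it evenly.
lam, kappa = 3.0, 1.5                       # assumed Pareto-style parameters

def expected_citations(quality):
    """Assumed quality-to-citations mapping (Lomax quantile function)."""
    return lam * ((1 - quality) ** (-1 / kappa) - 1)

broad = expected_citations(0.5) + expected_citations(0.5)    # medium quality for both groups
narrow = expected_citations(0.9) + expected_citations(0.1)   # high for one group, low for the other

print(f"broadly targeted : {broad:.1f} expected citations")
print(f"narrowly targeted: {narrow:.1f} expected citations")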
CITATIONS OF "AEROSOL" PAPERS

Papers              4997
Mean                5.6
Median              3
10th percentile     0
90th percentile     13
99th percentile     41
99.9th percentile   146

Very long tail.
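For context, a minimal sketch of how summary statistics like these could be computed from per-paper citation counts (e.g., exported from Web of Science records). The data array here is a short placeholder, not the real dataset.

import numpy as np

aerosol_citation_counts = np.array([0, 0, 1, 3, 3, 5, 13, 41, 146])  # placeholder data

print("papers :", len(aerosol_citation_counts))
print("mean   :", aerosol_citation_counts.mean())
print("median :", np.median(aerosol_citation_counts))
for p in (10, 90, 99, 99.9):
    print(f"{p}th percentile:", np.percentile(aerosol_citation_counts, p))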
A SIMPLE MODEL
• Q, Q_ω, Q_ψ ∈ (0, 1): quality, internal quality, external quality.
• C, C_ω, C_ψ: total, internal, and external citation counts.
• A ∈ {1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5}: degree of specialization (1/5 and 5 are high).
• q_ω,i = q_i^a and q_ψ,i = q_i^(1/a): specializing trades off between internal and external quality.
• c_ω,i = λ[(1 − q_ω,i)^(−1/κ) − 1] and c_ψ,i = λ[(1 − q_ψ,i)^(−1/κ) − 1], where κ, λ are parameters of a Pareto (long-tailed) distribution.
• c_i = c_ω,i + c_ψ,i: total citations is the sum of internal and external citations.
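A minimal sketch of the model above in code, assuming the Lomax (Pareto II) quantile function c = λ[(1 − q)^(−1/κ) − 1] maps quality to citations. The parameter values and the way specialization degrees are sampled are illustrative assumptions, not fitted to real data.

import numpy as np

rng = np.random.default_rng(0)
lam, kappa = 3.0, 1.5                                # assumed Pareto parameters
A = np.array([1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5])    # degrees of specialization

def total_citations(q, a):
    """Internal plus external citations for papers of quality q, specialization a."""
    q_internal = q ** a          # quality as judged by the home discipline
    q_external = q ** (1 / a)    # quality as judged by outside disciplines
    to_cites = lambda x: lam * ((1 - x) ** (-1 / kappa) - 1)
    return to_cites(q_internal) + to_cites(q_external)

q = rng.uniform(0, 1, size=5000)   # underlying paper qualities

for label, a in [("no specialization (a = 1)", np.ones(q.size)),
                 ("random a                 ", rng.choice(A, size=q.size)),
                 ("extreme a (1/5 or 5)     ", rng.choice([A[0], A[-1]], size=q.size))]:
    pct = np.percentile(total_citations(q, a), [10, 50, 90, 99])
    print(label, np.round(pct, 1))

Running a simulation like this for different choices of A is one way to produce percentile comparisons of the kind shown on the next slide.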
Percentile    10%   50%   90%   99%
Uniform Q      0     3    16    38
Random A       1     5    17    41
Extreme A      2     6    18    41
OTHER POSSIBLE MODELS
• Alternative causes (e.g., making papers useful to wider audiences takes more time).
• Alternative models of specialization.
• Agent-based simulations (papers accrue citations through time, papers take time to produce, authors have varying utility functions, authors have varying talent, authors discover papers through previous citations, adjustable reward structure).
CITATIONS AS DATA: ADVANTAGES
• Can parameterize / fit models with empirical data.
• Can test model predictions against empirical data.
• Can measure effect sizes.
CITATIONS AS DATA: CHALLENGES
• Time consuming.
• Long execution times.
• Data access can be difficult.
• Never get full coverage.
• Even with good datasets (e.g., Web of Science), tracking citations can be difficult.
• Messy data.
• Limited range of questions that can be answered.
• Don't have access to a counterfactual world (hard to use data at both ends of the model).
A WEB OF SCIENCE RECORD
[Slide showing a screenshot of a Web of Science record.]
COUNTERFACTUALS
• Model requires specifying the κ, λ parameters for each distribution:
  c_ω,i = λ[(1 − q_ω,i)^(−1/κ) − 1]
  c_ψ,i = λ[(1 − q_ψ,i)^(−1/κ) − 1]
• Currently based on real data. Alternatively, use regression.
• Can't double-dip: can't compare predictions with the same data used to parameterize the model.
• How to assess predictive accuracy?
• Need data other than citations at one end or the other, or substitute plausibility / robustness.
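One hedged sketch of how the distribution parameters might be estimated from observed citation counts rather than set by hand: a maximum-likelihood fit of a Lomax (Pareto II) distribution, whose quantile function matches the form used in the model. The observed_counts list below is placeholder data, not the real dataset.

from scipy import stats

observed_counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13, 41, 146]  # placeholder citation counts

# Fix the location at 0 so the fitted shape (kappa) and scale (lambda) match
# the quantile form c = lam * ((1 - q)**(-1/kappa) - 1) used in the model.
kappa_hat, _, lam_hat = stats.lomax.fit(observed_counts, floc=0)
print(f"kappa ≈ {kappa_hat:.2f}, lambda ≈ {lam_hat:.2f}")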
REFERENCES
• Weisberg, Michael. "Robustness Analysis." Philosophy of Science (2006).
• Thoma, Johanna. "The Epistemic Division of Labor Revisited." Philosophy of Science (2015).
• Weisberg, Michael, and Ryan Muldoon. "Epistemic Landscapes and the Division of Cognitive Labor." Philosophy of Science (2009).
• Muldoon, Ryan, and Michael Weisberg. "Robustness and Idealization in Models of Cognitive Labor." Synthese (2010).
• Sundberg, Mikaela. "Parameterizations as Boundary Objects on the Climate Arena." Social Studies of Science (2007).

Mike Thicke
mikethicke@gmail.com
www.mikethicke.com