Learning To Rank Academic Experts Catarina Moreira Outline - PowerPoint PPT Presentation

Instituto Superior Técnico Universidade Técnica de Lisboa Learning To Rank Academic Experts Catarina Moreira

Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 2/34

Expert Finding Information Retrieval Gerard Salton Ricardo Baeza-Yates Bruce Croft 3/34

State of the Art Problems Usage of Generative Probabilistic Models Heuristics are too simple and do not reflect expertise Heuristics only based on the documents’ textual contents 4/34

Contributions 1. Different Sets of Features to Estimate Expertise 2. Rank Aggregation Framework for Expert Finding 3. Learning to Rank (L2R) Framework for Expert Finding 5/34

Features: Hypothesis Multiple estimators of expertise, based on different sources of evidence, will enable the construction of more accurate and reliable ranking models! 7/34

Textual Similarity Term Frequency TF.IDF Inverse Document Frequency BM25 8/34
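The textual similarity features named on this slide can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the tokenized-document input, the function names, and the BM25 parameters `k1=1.2` and `b=0.75` are assumptions, and the slides do not say how document scores are attributed to authors.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Sum of tf * idf over the query terms, for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(sum(tf[t] * math.log(n_docs / df[t])
                          for t in query_terms if df[t]))
    return scores

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document (k1 and b are assumed defaults)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed idf that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus: three tokenized abstracts.
docs = [["expert", "finding", "expert"], ["graph", "theory"], ["expert", "search"]]
tfidf = tf_idf_scores(["expert"], docs)
bm25 = bm25_scores(["expert"], docs)
```

In both functions a document that never mentions the query term scores zero, and repeated mentions raise the score; BM25 additionally dampens the effect of repetition and of document length.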

Profile Information ✓ Number of Publications with(out) query topics ✓ Number of Journals with(out) query topics ✓ Years Between Publications with(out) query topics ✓ Average Number of Publications per year 9/34

Graphs ✓ Total/Max/Avg citations of the authors’ papers ✓ Total Number of Unique Collaborators ✓ Publications’ PageRank ✓ Academic Indexes 10/34

Hirsch Index (h-Index) 11/34
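As a reminder of the standard definition, an author has h-index h when h of their papers have at least h citations each. A minimal sketch, where the function name and the list-of-citation-counts input are assumptions for illustration:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers.
example = h_index([10, 8, 5, 4, 3])
```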

Other Indexes a-Index Contemporary h-Index (extension of h Index) Trend h-Index (extension of h Index) 12/34
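Of the indexes listed here, the a-Index is the simplest to illustrate: in its usual definition it is the average number of citations received by the papers in the h-core (the h most cited papers). The sketch below assumes that definition; the function names are hypothetical, and the contemporary and trend variants are left out because they depend on age-weighting parameters the slides do not give.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

def a_index(citations):
    """Average citation count over the h most cited papers (the h-core)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h

# Hypothetical author: h-index 4, h-core citations 10, 8, 5, 4.
example = a_index([10, 8, 5, 4, 3])
```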

Datasets DBLP - Computer Science Dataset - Covers journal and conference publications - Contains abstracts and citation links - All this information was processed and stored in a database 13/34

Datasets Arnetminer - Validation - Contains experts for 13 query topics - Experts collected from important Program Committees related to the query topics 14/34

Question How can we combine these features? 16/34

Answer Traditional IR techniques use frameworks inspired by traditional search engines to combine different sources of evidence! 17/34

Rank Aggregation Framework for Expert Finding 18/34

Data Fusion Algorithms ✓ Positional ✓ Based on the position that a candidate occupies in a ranked list ✓ Algorithms: Borda Fuse and Reciprocal Rank Fuse ✓ Score Aggregation ✓ Based on the score that a candidate achieved in a ranked list ✓ Algorithms: CombSUM, CombMNZ and CombANZ ✓ Majoritarian ✓ Based on pairwise comparisons between candidates ✓ Algorithms: Condorcet Fusion 19/34
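The score aggregation and positional families above can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the slides: each feature is assumed to yield either a dict of already normalized scores or an ordered list of candidates, a candidate missing from a list simply contributes nothing, `k=60` in the reciprocal rank formula is a commonly used constant rather than a value from the slides, and all function names are hypothetical. Condorcet Fusion is omitted for brevity.

```python
from collections import defaultdict

def comb_fuse(score_lists, variant="sum"):
    """CombSUM / CombMNZ / CombANZ over per-feature {candidate: score} dicts."""
    total = defaultdict(float)
    hits = defaultdict(int)  # number of lists in which the candidate appears
    for scores in score_lists:
        for cand, s in scores.items():
            total[cand] += s
            hits[cand] += 1
    if variant == "sum":
        return dict(total)
    if variant == "mnz":
        return {c: total[c] * hits[c] for c in total}
    if variant == "anz":
        return {c: total[c] / hits[c] for c in total}
    raise ValueError(f"unknown variant: {variant}")

def borda_fuse(rankings):
    """Each list awards n points to its first candidate, n - 1 to the next, and so on."""
    n = len({c for r in rankings for c in r})
    points = defaultdict(int)
    for r in rankings:
        for pos, cand in enumerate(r):
            points[cand] += n - pos
    return dict(points)

def reciprocal_rank_fuse(rankings, k=60):
    """Sum of 1 / (k + rank) over all lists (k is an assumed constant)."""
    fused = defaultdict(float)
    for r in rankings:
        for rank, cand in enumerate(r, start=1):
            fused[cand] += 1.0 / (k + rank)
    return dict(fused)

def ranking(fused):
    """Candidates by decreasing fused score, ties broken alphabetically."""
    return sorted(fused, key=lambda c: (-fused[c], c))

# Hypothetical scores from three features for candidates a-d.
score_lists = [{"a": 0.9, "b": 0.5, "c": 0.1}, {"a": 0.4, "b": 0.8}, {"b": 0.3, "d": 1.0}]
rankings = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
```

CombMNZ rewards candidates that many features agree on, while CombANZ averages the evidence and so can favor a candidate supported by a single strong feature.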

Results: Rank Aggregation (MAP, Mean Average Precision)
CombMNZ          48,43%  [-10,25%]
Cond. Fusion     43,82%
CombSUM          41,34%  [+6,00%]
Borda Fuse       39,99%  [+9,58%]
Rec. Rank Fuse   39,99%  [+9,58%]
CombANZ*         35,61%  [+23,06%]
* Sig. tests of 0.95 conf.
20/34
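For reference, Mean Average Precision (MAP), the metric reported in these results, can be computed as sketched below. The function names and the toy data are assumptions for illustration; the evaluation scripts used for the slides are not shown in the source.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant candidate appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, cand in enumerate(ranked, start=1):
        if cand in relevant:
            hits += 1
            total += hits / rank
    # Relevant candidates that were never retrieved count as zero precision.
    return total / len(relevant)

def mean_average_precision(runs):
    """Average of average_precision over (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical evaluation with two query topics.
runs = [(["a", "x", "b", "y"], {"a", "b"}), (["x", "a"], {"a"})]
map_score = mean_average_precision(runs)
```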

Impact of the Features with Condorcet Fusion (MAP, Mean Average Precision)
Graph                    43,86%  [-0,09%]
Text + Profile + Graph   43,82%
Profile + Graph          41,65%  [+4,95%]
Text + Graph*            39,08%  [+10,82%]
Profile*                 36,87%  [+15,86%]
Text + Profile*          32,67%  [+25,45%]
Text*                    29,75%  [+32,11%]
* Sig. tests of 0.95 conf.
21/34

Question How can we combine these features in an optimal way? 23/34

Answer IR literature focuses on machine learning techniques. They enable the combination of multiple estimators in an optimal way! 24/34

The L2R Framework For Expert Finding 25/34

L2R Algorithms ✓ Pointwise ✓ Input : single candidate ✓ Goal : use scoring functions to predict relevance ✓ Algorithms : Additive Groves ✓ Pairwise ✓ Input : pair of candidates ✓ Goal : loss function to minimize number of misclassified candidate pairs ✓ Algorithms : RankBoost, SVMrank and RankNet ✓ Listwise ✓ Input : list of candidates ✓ Goal : loss function which directly optimizes an IR metric ✓ Algorithm : SVMmap, Coordinate Ascent and AdaRank 26/34
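The pairwise goal above, minimizing the number of misclassified candidate pairs, can be made concrete with a small counting function. This only illustrates the quantity that pairwise methods try to reduce; it is not an implementation of RankBoost, SVMrank or RankNet, and the function name and toy data are assumptions.

```python
from itertools import combinations

def misordered_pairs(scores, labels):
    """Count pairs where the more relevant candidate does not get the higher score.

    Returns (errors, comparable_pairs); pairs with equal labels are skipped.
    """
    errors = 0
    pairs = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue
        pairs += 1
        better, worse = (i, j) if labels[i] > labels[j] else (j, i)
        if scores[better] <= scores[worse]:
            errors += 1
    return errors, pairs

# Hypothetical model scores and expert (1) / non-expert (0) labels for four candidates.
errors, pairs = misordered_pairs([0.9, 0.2, 0.5, 0.4], [1, 0, 0, 1])
```

A pointwise method would instead compare each score with its label in isolation, and a listwise method would evaluate the whole ordering at once, for example through MAP.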

Results: Learning to Rank (MAP, Mean Average Precision)
Additive Groves   89,40%
SVMmap            87,02%  [+2,66%]
SVMrank           83,11%  [+7,04%]
RankBoost*        78,40%  [+12,30%]
Coord. Ascent*    75,77%  [+15,25%]
RankNet*          65,30%  [+26,96%]
AdaRank*          64,78%  [+27,54%]
* Sig. tests of 0.95 conf.
27/34

Impact of the Features with Additive Groves (MAP, Mean Average Precision)
Text + Profile + Graph   89,40%
Text + Graph*            88,25%  [+1,29%]
Profile                  87,28%  [+2,37%]
Text + Profile*          87,14%  [+2,53%]
Text                     86,60%  [+3,13%]
Graph                    85,26%  [+4,63%]
Profile + Graph*         82,37%  [+7,86%]
* Sig. tests of 0.95 conf.
28/34

Comparison with State of the Art (MAP, Mean Average Precision)
Balog’s Model 2         39,15%  [+56,21%]
Deng’s AuthorRank       49,06%  [+45,12%]
Yang’s SVMrank          63,56%  [+28,90%]
Moreira’s Add. Groves   89,40%
29/34

Prototype

Conclusions ✓ Effectiveness of the Learning to Rank Framework ✓ Best algorithms: Additive Groves, SVMmap and SVMrank ✓ Effectiveness of the Rank Aggregation Approach ✓ Best algorithms: CombMNZ and Condorcet Fusion ✓ Effectiveness of the Proposed Features ✓ The full set of features performs best 32/34

Future Work ✓ Feature Selection Techniques (e.g., PCA) ✓ Expert Finding in an organizational environment (TREC dataset) ✓ Tasks beyond expert finding ✓ Natural Language Processing ✓ Geographic Information Retrieval 33/34

Publications ✓ C. Moreira, P. Calado and B. Martins, Learning to Rank for Expert Search in Digital Libraries of Academic Publications, In Proceedings of the 15th Portuguese Conference on Artificial Intelligence, 2011 ✓ C. Moreira, B. Martins and P. Calado, Using Rank Aggregation for Expert Search in Academic Digital Libraries, In Simpósio de Informática, INFORUM, 2011 ✓ C. Moreira, A. Mendes, L. Coheur and B. Martins, Towards the Rapid Development of a Natural Language Understanding Module, In Proceedings of the 11th Conference on Intelligent Virtual Agents, 2011 34/34
