Who is more likely to gain a large number of citations? Predicting the future influential researchers in a big scholarly network
MENU: 1 Introduction, 2 Dataset & Preprocessing, 3 Predicting, 4 Results and Conclusion
1 Introduction
Introduction
Whether a paper (or other work) is accepted or recognized often depends on its influence. One of the essential indicators of the influence of a scientific work is how often it is cited. In this project, we first introduce the threshold model to get a sense of how information diffuses under real conditions, and then introduce several regression models that fit the features to the citation counts in order to carry out the prediction.
2 Dataset & Preprocessing
Dataset
Citation Network Dataset: citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. It can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.
Preprocess
Fields in the raw data:
#* --- paperTitle
#@ --- Authors
#year ---- Year
#conf --- publication venue
#citation --- citation number
#! --- Abstract
…
Steps:
· Extraction: Extract the text features from the original text.
· Iteration: Iterate over the citations with the same author or publication venue to convert the feature into numeric values.
· Parsing: Use word2vec in Gensim to extract the features inside the titles and abstracts.
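The extraction step can be sketched as a small parser over the prefixes listed above. This is a minimal illustration in pure Python, not the project's actual script: the field names in `PREFIXES`, the function `parse_records`, and the sample records are all hypothetical, and lines with any other prefix are simply skipped.

```python
from typing import Dict, Iterator, List

# Prefixes listed on the slide, mapped to hypothetical field names.
PREFIXES = {
    "#*": "title",
    "#@": "authors",
    "#year": "year",
    "#conf": "venue",
    "#citation": "citations",
    "#!": "abstract",
}


def parse_records(lines: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per paper; a new '#*' line starts a new record."""
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Try longer prefixes first so a short prefix never shadows a longer one.
        for prefix in sorted(PREFIXES, key=len, reverse=True):
            if line.startswith(prefix):
                if prefix == "#*" and record:
                    yield record
                    record = {}
                record[PREFIXES[prefix]] = line[len(prefix):].strip()
                break
    if record:
        yield record


# Made-up sample in the layout described above (not real dataset entries).
sample = """#*Paper A
#@Alice,Bob
#year2001
#confKDD
#citation12
#!An abstract.

#*Paper B
#@Alice
#year2005
#confICML
#citation3
""".splitlines()

papers = list(parse_records(sample))
```

Values are kept as strings here; converting `year` and `citations` to numbers, and mapping authors or venues to numeric features, would belong to the later iteration step.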
Parsing
Analysis Results
We plot several diagrams to get a clearer picture of the preprocessed features.
3 Predicting
Threshold Model
Generally, an earlier publication will have less influence in the future. Each node v has an information acceptance threshold θ_v and is affected by all of its active neighbor nodes A(v). Node v will be activated when the combined influence of the nodes in A(v) reaches its threshold θ_v.
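A minimal sketch of this activation rule, assuming the standard linear threshold formulation in which each active neighbor u contributes a fixed weight to v and v activates once the summed weight reaches its threshold. The function name `activate`, the edge weights, and the toy graph are hypothetical, chosen only to illustrate the rule.

```python
from typing import Dict, Set


def activate(
    in_weights: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
    seeds: Set[str],
) -> Set[str]:
    """Run linear-threshold diffusion until no new node activates."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbours in in_weights.items():
            if v in active:
                continue
            # Sum the influence of the currently active neighbours A(v).
            influence = sum(w for u, w in neighbours.items() if u in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Toy graph: in_weights[v][u] is the (assumed) influence of u on v.
in_weights = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.2},
}
thresholds = {"b": 0.5, "c": 0.5, "d": 0.5}
result = activate(in_weights, thresholds, seeds={"a"})
```

In this toy run, "b" is activated directly by the seed, "c" only after "b" has become active, and "d" never reaches its threshold.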
Different Models
Linear Regression & NLR: Use regress() to obtain the weight array for multiple variables, then compute the predicted results. The NLR models add extra features formed by multiplying existing ones.
SVM: Use fitrsvm to fit the model and predict the result. The related errors are calculated as well.
Regression Tree: Use fitrtree to fit the model and predict the result. The related errors are calculated as well.
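The idea behind the linear and NLR models can be sketched as follows. This is a pure-Python stand-in rather than the regress() call named on the slide: `add_products` appends pairwise product features (one plausible reading of "multiplying others"), and `least_squares` solves the normal equations for the weight array. Both helper names and the synthetic data are assumptions for illustration.

```python
from itertools import combinations
from typing import List


def add_products(row: List[float]) -> List[float]:
    """Append every pairwise product x_i * x_j (i < j) to the feature row."""
    return row + [a * b for a, b in combinations(row, 2)]


def least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2.
raw = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0], [0.0, 1.0]]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in raw]

# A leading 1.0 provides the intercept column; products are added afterwards.
X = [[1.0] + add_products(r) for r in raw]
w = least_squares(X, y)  # weights for [1, x1, x2, x1*x2]
```

A plain linear model would skip `add_products` and fit only `[1, x1, x2]`; the product column is what lets the fit capture the interaction term in the synthetic target.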
Common Steps
· Data preprocessing: First we import the data ("output.csv") and extract the citation column as the target for training.
· Split: Then we split the data set into a training set and a testing set.
· Fitting: We choose different models to fit the features to the citations and obtain the optimal weight vector.
· Post-processing: Accumulate the citations of the same author to represent that author's future impact.
· Compare: Calculate the errors to compare the performance.
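The split, post-processing, and comparison steps above can be sketched as below. This is an illustration under stated assumptions, not the project's code: the 80/20 split ratio, the helper names, and the choice of RMSE as the error measure are all hypothetical, since the slides do not say which ratio or error metric was used.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split(rows: Sequence, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle deterministically and cut into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def accumulate_by_author(authors: List[str], citations: List[float]) -> Dict[str, float]:
    """Sum per-paper citation counts into one total per author."""
    totals: Dict[str, float] = defaultdict(float)
    for author, count in zip(authors, citations):
        totals[author] += count
    return dict(totals)


def rmse(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Root-mean-square error over the authors present in `actual`."""
    errors = [(predicted.get(a, 0.0) - v) ** 2 for a, v in actual.items()]
    return math.sqrt(sum(errors) / len(errors))


# Made-up per-paper values for three test papers by two authors.
authors = ["alice", "bob", "alice"]
actual_totals = accumulate_by_author(authors, [10, 4, 6])
predicted_totals = accumulate_by_author(authors, [8, 5, 7])
error = rmse(predicted_totals, actual_totals)

train, test = split(range(10))
```

Running the same `rmse` call on the per-author totals produced by each fitted model gives one number per model, which is what the comparison step needs.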
Specific Procedure
4 Results & Conclusion
Comparison of Performance
Conclusion
We quantify a researcher's impact as the number of times the researcher is cited. Comparing the performance of the different models, we find that the non-linear regression and SVM models give the better predictions among the four. In the future we expect to implement more sophisticated and accurate algorithms to study the prediction of a researcher's future impact in more depth.
2018.05