Text Classification and Sentiment Analysis
Alejandro Moreo
AFIRM, 16th January 2019
Overview
- The Toolkit: scikit-learn
- The Environment: Jupyter
- Guided Exercise: topic classification
- Hands-on Activities: sentiment classification
- Concluding Remarks
The Toolkit
Plan of the Hands-on Activities
We will explore scikit-learn's tools for text analysis and text mining, which implement the most important methods described in the lectures.
Guided exercise: text classification by topic
- Loading datasets: 20 Newsgroups
- Data preprocessing: n-gram extraction, stop-word removal, and stemming with NLTK
- Corpus representation: tf-idf vector representation
- Learning a classifier: Support Vector Machines
- Testing and evaluation of results
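The guided-exercise steps can be sketched end to end as follows. This is a minimal sketch resting on several assumptions not stated in the slides: a tiny hypothetical two-topic corpus stands in for 20 Newsgroups (the real data set is available through `sklearn.datasets.fetch_20newsgroups`, which downloads it on first use), NLTK's Porter stemmer is the chosen stemmer, features are unigrams plus bigrams, scikit-learn's built-in English stop-word list is used, and `LinearSVC` plays the role of the Support Vector Machine.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for two 20 Newsgroups topics.
train_docs = [
    "The rocket launch was delayed by bad weather at the space center",
    "NASA engineers tested the new orbiting satellite",
    "Astronauts aboard the space station repaired the solar panels",
    "The goalie stopped every shot in the hockey game",
    "Our team won the hockey playoffs after overtime",
    "The players skated hard during the final period",
]
train_labels = ["space", "space", "space", "hockey", "hockey", "hockey"]
test_docs = [
    "The satellite reached orbit after the launch",
    "The hockey team scored in overtime",
]
test_labels = ["space", "hockey"]

stemmer = PorterStemmer()
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token pattern


def stemmed_ngrams(doc):
    """Lowercase, tokenize, drop stop words, stem, and emit unigrams plus bigrams."""
    tokens = [
        stemmer.stem(tok)
        for tok in TOKEN_PATTERN.findall(doc.lower())
        if tok not in ENGLISH_STOP_WORDS
    ]
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


# The callable analyzer handles preprocessing; TfidfVectorizer builds the
# vocabulary from the training set and computes tf-idf weights.
vectorizer = TfidfVectorizer(analyzer=stemmed_ngrams)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)  # reuse the training vocabulary and idf

clf = LinearSVC()
clf.fit(X_train, train_labels)
predictions = clf.predict(X_test)

accuracy = accuracy_score(test_labels, predictions)
macro_f1 = f1_score(test_labels, predictions, average="macro")
print(f"accuracy={accuracy:.2f}  macro-F1={macro_f1:.2f}")
```

Passing a callable as `analyzer` makes `TfidfVectorizer` skip its own tokenization, so stop-word removal, stemming, and n-gram extraction all happen inside `stemmed_ngrams`. With only two test documents, the printed scores say nothing about real performance.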
Exercises
Participants will build and optimize their own sentiment classifier. Concretely, we will explore:
1. Feature Selection: χ²-based filtering
2. Weighting Functions: binary, tf, tf-idf, ...
3. Parameter Optimization: getting the most out of the classifier
4. Comparing Learners: logistic regression, k-NN, Naive Bayes, ...
5. Competition!
Let's get started!
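Exercises 1 to 4 can be combined in a single scikit-learn `Pipeline` tuned with `GridSearchCV`. The sketch below rests on stated assumptions: the labelled reviews are a tiny hypothetical toy set, the `k` values for χ² selection and the per-learner hyperparameter grids are arbitrary illustrative choices, and binary, tf, and tf-idf weighting are obtained from `CountVectorizer(binary=True)`, `CountVectorizer()`, and `TfidfVectorizer()` respectively.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews (1 = positive, 0 = negative).
train_docs = [
    "a wonderful film with a brilliant cast",
    "I loved the story and the acting was great",
    "great direction and a moving soundtrack",
    "brilliant and funny, one of the best movies this year",
    "the acting was superb and the plot was great",
    "an enjoyable and wonderful experience",
    "a boring film with a terrible script",
    "I hated the story and the acting was awful",
    "awful direction and a dull soundtrack",
    "terrible and boring, one of the worst movies this year",
    "the acting was poor and the plot was dull",
    "a tedious and awful experience",
]
train_labels = [1] * 6 + [0] * 6
test_docs = ["a brilliant and wonderful story", "a dull and terrible film"]
test_labels = [1, 0]

pipeline = Pipeline([
    ("vec", TfidfVectorizer()),     # weighting function (swapped by the grid)
    ("select", SelectKBest(chi2)),  # chi-squared feature filtering
    ("clf", LogisticRegression()),  # learner (swapped by the grid)
])

# Candidate weighting functions: binary, raw term frequency, tf-idf.
weightings = [CountVectorizer(binary=True), CountVectorizer(), TfidfVectorizer()]
shared = {"vec": weightings, "select__k": [5, "all"]}

# One sub-grid per learner, each with its own hyperparameters.
param_grid = [
    {**shared, "clf": [LogisticRegression()], "clf__C": [0.1, 1.0, 10.0]},
    {**shared, "clf": [KNeighborsClassifier()], "clf__n_neighbors": [1, 3]},
    {**shared, "clf": [MultinomialNB()], "clf__alpha": [0.1, 1.0]},
]

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(train_docs, train_labels)

# Best cross-validated macro-F1 reached by each learner.
best_per_learner = {}
results = search.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    name = type(params["clf"]).__name__
    best_per_learner[name] = max(score, best_per_learner.get(name, float("-inf")))
print(best_per_learner)

# GridSearchCV refits the best configuration on all training data by default.
test_predictions = search.predict(test_docs)
test_accuracy = accuracy_score(test_labels, test_predictions)
print(search.best_params_, f"test accuracy={test_accuracy:.2f}")
```

Replacing whole pipeline steps (`"vec"`, `"clf"`) inside the parameter grid lets a single cross-validated search compare weighting functions and learners at once. Stop-word removal is deliberately left out here, because scikit-learn's English stop-word list contains negations such as "not" that carry sentiment.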