Research Group Social Computing
Department of Informatics
Technical University of Munich

Preliminary Meeting of the NLP Lab Course SS2020
Master Lab Course - Machine Learning for Natural Language Processing Applications (IN2106, IN4249)
Gerhard Hagerer, M.Sc., Monika Wintergerst, M.Sc., Maximilian Wich, M.Sc., PD Dr. Georg Groh
Research Group Social Computing, Department of Informatics, Technical University of Munich
31.01.2020

Outline
1 Requirements
2 Registration
3 Procedure
4 Domains
− Opinion Mining
− Virtual Dietary Advisor
− Hate Speech Detection

Requirements
Minimum:
• Master's student in computer science, data engineering, or a similar field
• Sufficient English skills
• Basic programming and machine learning knowledge
Important:
• Hands-on experience in Python, Pandas, NumPy, and SciPy
• Basic knowledge of artificial neural networks
• Basic knowledge of natural language processing
Optimal:
• Practical experience with deep learning frameworks, such as PyTorch, TensorFlow, Theano, Keras, etc.

Registration
• By 12 Feb, send an email to ghagerer@mytum.de, maximilian.wich@tum.de, and monika.wintergerst@tum.de containing
− the subject "NLP Lab Course Registration - Domain X",
− your CV,
− your transcript of records,
− a ranked list (1., 2., 3.) of the domains you are interested in,
− a motivational statement (one paragraph).
• This email is considered when ranking the interested students for the course.
• By 12 Feb, you also have to register for the course in the matching system.
• By 20 Feb, you will (probably) be notified by the matching system about the status of your participation.
• By the end of February, you will be informed by me about the available topics.

Email template
Only emails following this format will be considered:
To: ghagerer@mytum.de; maximilian.wich@tum.de; monika.wintergerst@tum.de
Subject: NLP Lab Course Registration - Domain 1
Attachments: CV.pdf, Transcript.pdf
Text:
Hi,
I would like to participate in the NLP Lab Course. My domain priorities are:
1. Domain 1
2. Domain 2
3. Domain 3
Motivation: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. (max. 300 characters)
Relevant skills/experiences: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. (max. 300 characters)
Best,
Your Name

Procedure
Project teams:
• You are going to work in teams of 2 or 3 persons on one project topic.
• You can choose whom to work with on the project topic.
• Every project member has to report and work equally (no dirty business!).
Procedure:
• There will be one kickoff meeting at the beginning of the semester.
• There are going to be bi-weekly consulting and progress report sessions.
• You have to give a final project presentation and submit a report at the end of the semester.
Everything else will be announced at the beginning of the semester.

Domains
The course consists of three parts that are independent of each other in content and organization.
• Opinion Mining – Gerhard Hagerer
• Virtual Dietary Advisor – Monika Wintergerst
• Hate Speech Detection – Maximilian Wich
In your registration email, you have to tell us which domain you find most interesting by ranking all three domains from 1 (most interesting) to 3 (least interesting).
Please do not forget to mention your preferred domain in the subject!

Domains – Opinion Mining
Gerhard Hagerer, M.Sc.
Our case study:
[Figure: Internet → Web Crawler (relevance filter) → Review Collection → Opinion Mining System → Aspects + Sentiments]
• Research question:
− What are the consumer beliefs on organic food expressed in social media?
• In cooperation with TUM School of Management, Chair of Marketing and Consumer Research

Domains – Opinion Mining
Gerhard Hagerer, M.Sc.
Methodology:
[Figure: Samples by nationality]
• Deep pre-trained embedding models
− BERT, Universal Sentence Encoder, GloVe, fastText, word2vec, ...
− Semantically aligned multi-lingual embeddings (XLING, MUSE, ...)
− Derive meaningful document representations from these.
• Clustering techniques
− Optimization of semantic coherence
− Density-based vs. convex clustering
• Tasks
− Unsupervised Aspect Extraction
− Sentiment Analysis
− Neural Topic Modelling
− Cluster visualization
− Class coherence and overlapping analysis
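
The step from word embeddings to clustered document representations can be sketched roughly as follows. This is only a minimal illustration, not the method used in the course: the 2-d vectors in `TOY_VECTORS` are hypothetical stand-ins for real pre-trained embeddings, documents are represented by simple mean pooling, and `cluster` is a small k-means-style loop (a convex clustering) that assigns by cosine similarity.

```python
import math

# Hypothetical 2-d vectors standing in for real pre-trained embeddings.
TOY_VECTORS = {
    "organic": (0.9, 0.1), "healthy": (0.8, 0.2), "fresh": (0.85, 0.15),
    "expensive": (0.1, 0.9), "price": (0.2, 0.8), "cost": (0.15, 0.85),
}


def doc_vector(text):
    """Mean-pool the vectors of all known tokens; None if no token is known."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(points, k, iterations=10):
    """k-means-style clustering with cosine assignment and mean centroids."""
    centroids = list(points[:k])  # deterministic init: the first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = max(range(k), key=lambda i: cosine(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [max(range(k), key=lambda i: cosine(p, centroids[i])) for p in points]


docs = ["organic fresh food", "price cost", "healthy organic", "expensive price"]
vectors = [v for v in (doc_vector(d) for d in docs) if v is not None]
labels = cluster(vectors, k=2)
```

With real embeddings, the same structure applies, but the pooling strategy and the choice between density-based and convex clustering would need to be evaluated on the data.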

Domains – Virtual Dietary Advisor
Monika Wintergerst, M.Sc.
Areas of interest:
• Substitute recommendation
− Motivation: make favorite dishes healthier through small changes
− Use recipe texts, ontological knowledge
− Identify similar ingredient alternatives
• Dialog interaction
− Motivation: emulate a real dietician
− Detect a user's state of mind and react empathically
− Encourage self-reflection and mindfulness
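
One possible starting point for identifying similar ingredient alternatives from recipe texts is sketched below. It is an assumption for illustration, not the course's approach: `RECIPES` is hypothetical toy data, and two ingredients count as similar when they appear alongside the same other ingredients (Jaccard overlap of their co-occurrence sets). Ontological knowledge and healthiness are not modeled here.

```python
from collections import defaultdict

# Hypothetical toy data; real input would be ingredient lists parsed from recipes.
RECIPES = [
    {"butter", "flour", "sugar", "egg"},
    {"margarine", "flour", "sugar", "egg"},
    {"butter", "potato", "milk"},
    {"margarine", "potato", "milk"},
    {"tomato", "basil", "olive oil"},
]


def context_sets(recipes):
    """Map each ingredient to the set of ingredients it co-occurs with."""
    contexts = defaultdict(set)
    for recipe in recipes:
        for ingredient in recipe:
            contexts[ingredient] |= recipe - {ingredient}
    return contexts


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def substitutes(ingredient, recipes, top_n=3):
    """Rank other ingredients by how similar their recipe contexts are."""
    contexts = context_sets(recipes)
    target = contexts.get(ingredient, set())
    scored = [
        (jaccard(target, ctx), other)
        for other, ctx in contexts.items()
        if other != ingredient
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then name
    return [name for score, name in scored[:top_n] if score > 0]


butter_alternatives = substitutes("butter", RECIPES)
```

Note that pure co-occurrence also ranks ingredients that are merely used together, so a real system would need additional signals to separate true substitutes from companions.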

Domains – Hate Speech Detection | Overview
Maximilian Wich, M.Sc.
• Improve machine learning models for hate speech detection (e.g., integrating social media, identifying other relevant features besides text)
• Make predictions of machine learning models more transparent (explainable AI)

Domains – Hate Speech Detection | Topics
Maximilian Wich, M.Sc.
Potential topics/ideas:
• Multitask learning to combine data sets with different labeling schemes
− Problem: there are many hate speech data sets, but they use different labeling schemes
− Idea: train a multitask classifier (e.g., BERT) with shared layers based on several data sets
• Learning from weak supervision to increase the amount of training data without manual labeling
− Problem: we do not have enough training data
− Idea: train classifiers on available data, collect new data with these classifiers, and retrain the classifiers
• Classify hate speech based on stylistic elements (e.g., POS, usage of emojis, ...)
− Problem: implicit hate speech is often hard to identify
− Idea: use stylistic elements to find patterns in hate speech and train a classifier
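
The weak supervision idea (train on available data, label new data with the classifier, retrain) can be sketched as a self-training loop. This is a minimal sketch under stated assumptions: the tiny naive Bayes classifier, the toy texts, the labels "hate" and "ok", and the confidence threshold of 0.7 are all hypothetical choices for illustration; a project would more likely use a neural classifier and real data sets.

```python
import math
from collections import Counter


def train(examples):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts, doc_counts = {}, Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return (most likely label, its posterior probability)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(doc_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(labeled, unlabeled, threshold=0.7, rounds=3):
    """Repeatedly add confidently pseudo-labeled texts and retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            label, prob = predict(model, text)
            (confident if prob >= threshold else rest).append((text, label))
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pool = [text for text, _ in rest]
    return train(labeled), labeled, pool


# Hypothetical toy data for illustration only.
seed = [("you are stupid idiots", "hate"), ("have a nice day friends", "ok")]
unlabeled = ["stupid idiots everywhere", "nice day today", "the weather report"]
model, labeled_out, remaining = self_train(seed, unlabeled)
```

The threshold trades off the amount of new training data against the risk of reinforcing the classifier's own mistakes, which is one of the questions such a project would have to examine.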

Questions?