Topic Modelling (and Natural Language Processing) workshop, PyCon UK 2019

Apr 22, 2023

Topic Modelling (and Natural Language Processing) workshop
PyCon UK 2019
@MarcoBonzanini
github.com/bonzanini/topic-modelling

Nice to meet you
• Data Science consultant: NLP, Machine Learning, Data Engineering
• Corporate training: Python + Data Science
• PyData London chairperson

This tutorial
• Introduction to Topic Modelling
• Depending on time/interest: happy to discuss broader applications of NLP
• The audience (tell me about you):
  - new-ish to NLP?
  - new-ish to Python tools for NLP?

Motivation
Suppose you:
• have a huge number of (text) documents
• want to know what they’re talking about
• can’t read them all

Topic Modelling
• Bird’s-eye view on the whole corpus (dataset of docs)
• Unsupervised learning
  - pros: no need for labelled data
  - cons: how to evaluate the model?

Topic Modelling
Input:
  - a collection of documents
  - a number of topics K

Topic Modelling
Output:
  - K topics
  - their word distributions
Example topics:
  - movie, actor, soundtrack, director, …
  - goal, match, referee, champions, …
  - price, invest, market, stock, …

Distributional Hypothesis
• “You shall know a word by the company it keeps” (J. R. Firth, 1957)
• “Words that occur in similar context, tend to have similar meaning” (Z. Harris, 1954)
• Context approximates Meaning
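
To make "context approximates meaning" concrete, here is a minimal sketch that represents each word by the counts of its neighbouring words and compares words by cosine similarity. The toy sentences, the window size and the function names are assumptions made for illustration; they are not taken from the workshop material.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical): "cat" and "dog" appear in similar contexts.
sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the stock price fell",
]
vectors = context_vectors(sentences)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_price = cosine(vectors["cat"], vectors["price"])
```

On this toy corpus "cat" is closer to "dog" than to "price", because the first pair shares more context words.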

Term-document matrix

         Word 1   Word 2   Word N
Doc 1       1        7        2
Doc 2       3        0        5
Doc N       0        4        2
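
A matrix like the one above can be built with nothing more than word counts. The sketch below assumes whitespace tokenisation and lower-casing, which is far simpler than a real preprocessing pipeline; the documents and the function name are made up for illustration.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenised for word in tokens})
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix


# Hypothetical toy corpus: one row per document, one column per word.
docs = [
    "the match ended with a late goal",
    "the director praised the actor",
    "the market price of the stock fell",
]
vocab, matrix = term_document_matrix(docs)
```

Each row sums to the length of its document, and most entries are zero, which is why such matrices are usually stored in a sparse format in practice.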

Latent Dirichlet Allocation
• Commonly used topic modelling approach
• Key idea:
  - each document is a distribution over topics
  - each topic is a distribution over words

Latent Dirichlet Allocation
• “Latent” as in hidden: only words are visible, other variables are hidden
• “Dirichlet Allocation”: topics are assumed to be distributed with a specific probability (Dirichlet prior)
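
The two ideas above (documents as mixtures of topics, topics as distributions over words, with only the words observed) can be illustrated by running the generative story forwards. This is a sketch of how LDA assumes documents are produced, not of how a model is fitted; the hand-written topics, the `alpha` values and the function names are assumptions for illustration only.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet prior parameters, one per topic
    """
    theta = sample_dirichlet(alpha, rng)  # hidden: this document's topic mixture
    words = []
    for _ in range(length):
        # hidden: the topic assigned to this word position
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[z])
        weights = [topics[z][w] for w in vocab]
        # visible: the word itself
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


# Hand-written example topics (hypothetical, not learned from data).
topics = [
    {"movie": 0.4, "actor": 0.3, "director": 0.3},
    {"goal": 0.4, "match": 0.3, "referee": 0.3},
]
rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
```

Fitting a topic model is the reverse problem: given only `words` for many documents, estimate the hidden `theta` of each document and the word distribution of each topic.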

Topic Model Evaluation
• How good is my topic model? “Unsupervised learning”… is there a correct answer?
• Extrinsic metrics: what’s the task?
• Intrinsic metrics: e.g. topic coherence
• More interesting:
  - how useful is my topic model?
  - data visualisation can help to get some insights

Topic Coherence
• It gives a score of the topic quality
• Relationship with Information Theory (Pointwise Mutual Information)
• Used to find the best number of topics for a corpus
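
As a rough illustration of the link with Pointwise Mutual Information, the sketch below scores a topic by the average PMI of its top-word pairs, where PMI(w1, w2) = log(p(w1, w2) / (p(w1) p(w2))) and the probabilities are estimated as document frequencies. This is one simple variant among several coherence measures; the smoothing constant `eps`, the toy corpus and the function name are assumptions, not the definition used in the workshop.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words, using document co-occurrence."""
    doc_sets = [set(doc.lower().split()) for doc in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2 = prob(w1), prob(w2)
        if p1 == 0 or p2 == 0:
            continue  # skip words that never occur in the reference corpus
        # eps avoids log(0) when the pair never co-occurs
        scores.append(math.log((prob(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical reference corpus with a sports theme and a finance theme.
docs = [
    "goal match referee",
    "match referee champions",
    "price market stock",
    "invest market stock",
]
coherent = pmi_coherence(["match", "referee"], docs)
mixed = pmi_coherence(["match", "stock"], docs)
```

Words that tend to appear in the same documents get a positive score, while a "topic" mixing unrelated words gets a strongly negative one. To compare different numbers of topics, one would train a model for each candidate K and compare the average coherence across its topics.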
Demo
