
LDA and LSA for Topic Modeling on ORA
Joshua Uyheng
juyheng@cs.cmu.edu
CASOS Center, Institute for Software Research
Carnegie Mellon University
CASOS Summer Institute 2020
Center for Computational Analysis of Social and Organizational Systems
http://www.casos.cs.cmu.edu/
June 2020

Topic Models
• When we have a large amount of data, we would like to know if they can be grouped in a meaningful way
• “Topics” are a way of thinking of the clustering problem
– Data instances are “documents”
– Different documents use different “words”
– When documents use similar words in similar ways, they might belong to the same “topic”
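The idea that documents sharing words may share a topic can be sketched with plain word counts. The snippet below is only an illustration, not a topic model: it borrows three sentences from the deck's own examples, turns each into a bag of words, and compares them with cosine similarity. The function names `bag_of_words` and `cosine_similarity` are hypothetical helpers introduced here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Sentences taken from the deck's "Some examples" slide.
docs = [
    "Dogs like to run and play.",
    "Dogs like to chew on bones.",
    "Chemistry is the study of matter.",
]
bags = [bag_of_words(d) for d in docs]

# The two dog sentences share words; the chemistry sentence shares none.
sim_dogs = cosine_similarity(bags[0], bags[1])
sim_mixed = cosine_similarity(bags[0], bags[2])
print(f"dog sentences:     {sim_dogs:.2f}")
print(f"dog vs. chemistry: {sim_mixed:.2f}")
```

A topic model goes further than this pairwise comparison by grouping words and documents jointly, but the input is the same kind of word-count representation.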

Some examples
Literal texts:
Dogs like to run and play. Dogs are people’s best friend. Dogs like to chew on bones.
Biology is the study of living organisms. Chemistry is the study of matter. Psychology is the study of human behavior and mental processes.
One Direction will hold their concert next week. Did you buy the One Direction merchandise? Harry is my favorite One Direction member.
[Figure: More figurative “documents”]

LSA vs. LDA
• Latent Semantic Analysis or Latent Semantic Indexing
– Based on matrix factorization
– Big difference: You can have negative values
• Latent Dirichlet Allocation
– Based on probabilistic graphical model
– Big difference: Scores expressed as probabilities
• Both popular
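The contrast between the two models can be seen on the nine example sentences. The sketch below assumes scikit-learn, which is not part of the deck (ORA offers these models through its own interface); the choice of three topics, the default tokenizer with stop words kept, and the `random_state` values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# The nine example sentences from the deck.
docs = [
    "Dogs like to run and play.",
    "Dogs are people's best friend.",
    "Dogs like to chew on bones.",
    "Biology is the study of living organisms.",
    "Chemistry is the study of matter.",
    "Psychology is the study of human behavior and mental processes.",
    "One Direction will hold their concert next week.",
    "Did you buy the One Direction merchandise?",
    "Harry is my favorite One Direction member.",
]

# Document-term count matrix (stop words kept to keep the sketch short).
X = CountVectorizer().fit_transform(docs)

# LSA: truncated SVD of the count matrix; document scores may be negative.
lsa = TruncatedSVD(n_components=3, random_state=0)
doc_lsa = lsa.fit_transform(X)

# LDA: each row is a probability distribution over topics.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_lda = lda.fit_transform(X)

print("LSA document scores (signed):")
print(np.round(doc_lsa, 2))
print("LDA document-topic probabilities (rows sum to 1):")
print(np.round(doc_lda, 2))
```

On a corpus this small neither model should be expected to recover clean topics; the point is only the form of the output, signed scores for LSA versus per-document probabilities for LDA.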

Latent Semantic Analysis
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

Latent Dirichlet Allocation
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

In practice…
• There is no hard and fast way to decide which model is better
• A large factor in deciding on the quality and interpretation of a topic model is human judgment
• Many models will work for general purposes

In a network setting
1. Documents and words don’t have to be literal documents and words
• People can serve as “documents”
• Hashtags can serve as “words”
• Topics can represent tendencies of certain agents to invoke certain hashtags
2. We can visualize multiple kinds of connections between agents and concepts
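Treating people as “documents” and hashtags as “words” amounts to building an agent-by-hashtag count matrix, which can then be passed to LSA or LDA like any document-term matrix. The sketch below uses only the standard library; the agent names, hashtags, and usage records are hypothetical and invented purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (agent, hashtag) usage records; invented for illustration only.
records = [
    ("agent_a", "#nato"), ("agent_a", "#nato"), ("agent_a", "#exercise"),
    ("agent_b", "#nato"), ("agent_b", "#exercise"),
    ("agent_c", "#weather"), ("agent_c", "#weather"),
]

# Count how often each agent ("document") uses each hashtag ("word").
counts = defaultdict(Counter)
for agent, tag in records:
    counts[agent][tag] += 1

# Fix a row order (agents) and column order (hashtags) for the matrix.
agents = sorted(counts)
hashtags = sorted({tag for _, tag in records})
matrix = [[counts[a][h] for h in hashtags] for a in agents]

for agent, row in zip(agents, matrix):
    print(agent, dict(zip(hashtags, row)))
```

In this toy matrix, `agent_a` and `agent_b` have similar rows while `agent_c` does not, which is the kind of pattern a topic model would summarize as a shared tendency to invoke certain hashtags.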

Case of NATO Trident Juncture 2018
Uyheng, J., Magelinski, T., Villa-Cox, R., Sowa, C., & Carley, K. M. (2019). Interoperable pipelines for social cyber-security: Assessing Twitter information operations during NATO Trident Juncture 2018. Computational and Mathematical Organization Theory. Advance online publication.

Topics extracted
(Uyheng et al., 2019)

Topics for social cyber-security
(Uyheng et al., 2019)

