Process & Methodology (Quantitative)

Data source: ProQuest Newsstand
  Search filter on: NYT, WSJ, FT; keyword “analytics”; years 2004-2015 → 8102 articles
  Random sample with 33% stratification → 2352 articles

Pipeline: News corpus (NYT, WSJ, FT) → Sampled news corpus (NYT, WSJ, FT) → Text pre-processing → Natural Language Processing → Features and entities → Text analytics & evaluation of readability, complexity, and lexical diversity

Text pre-processing:
  Stop words: syntax vs. semantic words
  Stemming: words reduced to their root form
  DTM: document-term matrix representation of the corpus
  Sparsity: handling of zeros in the DTM

Analyses:
  Frequency analysis
  Hierarchical clustering
  Probabilistic topic models
  Statistical associations
  Words in context
  Named Entity Recognition
  Thematic seasonality analysis
  Lexicon richness & complexity: descriptive statistics on # of words, types of words, and # of sentences
  Document similarities: cosine distance

J.Bonilla | PhD Defense | 12/08/2016 Slide #22
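The pre-processing steps listed on this slide (stop words, stemming, DTM, sparsity) and the cosine-distance document similarity can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the thesis: the stop-word list, the `naive_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the three sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def build_dtm(docs):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def sparsity(matrix):
    """Fraction of zero cells in the DTM."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

def cosine_distance(u, v):
    """1 - cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

# Placeholder sentences, not taken from the actual corpus.
docs = [
    "Analytics is transforming the city",
    "The city invests in analytics",
    "Markets closed higher on Friday",
]
vocab, dtm = build_dtm(docs)
print(vocab)
print("sparsity:", round(sparsity(dtm), 3))
print("distance doc0-doc1:", round(cosine_distance(dtm[0], dtm[1]), 3))
print("distance doc0-doc2:", round(cosine_distance(dtm[0], dtm[2]), 3))
```

The first two sentences share two stemmed terms and so sit closer together than either does to the third, which shares none.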
Case: “analytics” + “CUSP” → Corpus: (202 articles; years 2011-15)
Slide #23
Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)
Frequency analysis: words for analysis
Slide #24
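A frequency analysis of this kind amounts to counting terms across the corpus and ranking them. The sketch below assumes a simple regex tokenizer and a hand-picked stop-word set. The function name `term_frequencies` and the two sample sentences are hypothetical placeholders, not material from the 202-article corpus.

```python
import re
from collections import Counter

def term_frequencies(docs, stop_words=frozenset()):
    """Count how often each lowercase word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

# Placeholder sentences, not taken from the actual corpus.
docs = [
    "The new center will study urban data",
    "Urban data and analytics at the new center",
]
freq = term_frequencies(docs, stop_words={"the", "and", "at", "will"})
top_terms = freq.most_common(4)
print(top_terms)
```

The highest-ranked terms are the natural candidates for the "words for analysis" carried into later steps.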
Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)
Hierarchical clustering: what is this corpus about?
Slide #25
Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)
Hierarchical clustering: what is this corpus about?
Terms in the dendrogram (stemmed): center, technology, nyu, univers, brooklyn, research, new, program, urban, inform, institute, york, will, one, work, innov, scienc, citi, school, include, progres, appli, engin
Slide #26
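Hierarchical clustering of terms can be illustrated with a small agglomerative procedure: start with each term in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes average linkage and cosine distance. The slides do not state which linkage or distance was used, and the term vectors are made-up counts over four hypothetical documents. It stops at `k` flat clusters rather than drawing the full dendrogram.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters.

    `vectors` maps a label to its numeric vector; returns a list of label sets.
    """
    clusters = [{label} for label in vectors]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

# Toy term vectors: each entry is a term's count in four hypothetical documents.
term_vectors = {
    "nyu": [3, 2, 0, 0],
    "univers": [2, 3, 0, 0],
    "citi": [0, 0, 4, 1],
    "urban": [0, 0, 3, 2],
}
groups = agglomerate(term_vectors, k=2)
print(groups)
```

Terms that occur in the same documents end up in the same cluster, which is what lets a dendrogram answer "what is this corpus about?".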
Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)
Probabilistic topic models: hidden patterns & emerging themes
Slide #27
Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)
Probabilistic topic models: hidden patterns & emerging themes
“director”, “koonin”, “president”, “sexton”, “faculty”, “people”, “researchers”, “professor”, “student”, “leaders” → HR
“cyber”, “tech”, “app”, “mobile”, “online”, “energy”, “data”, “climate”, “computer”, “wireless”, “campus” → INFRASTRUCTURE
“private”, “partnership”, “entrepreneurship” → GOVERNANCE
Slide #28
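The slides do not name the topic model or software used. As one common choice, the sketch below assumes latent Dirichlet allocation (LDA) fitted with a collapsed Gibbs sampler, written in plain Python. The function names, the hyperparameters `alpha` and `beta`, and the toy documents are assumptions for illustration. The toy documents reuse a few words from the slide but are not corpus data. Labels such as HR or INFRASTRUCTURE would be assigned by the analyst after inspecting each topic's top words, not produced by the model.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]  # tokens per topic in each doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic

def top_words(topic_word, n=3):
    """Return the n most frequent words for each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

# Toy tokenized documents (hypothetical, for illustration only).
docs = [
    ["director", "professor", "student", "faculty", "professor"],
    ["mobile", "wireless", "energy", "computer", "mobile"],
    ["student", "faculty", "director", "researchers"],
    ["energy", "computer", "wireless", "campus"],
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

In practice a maintained library implementation would be preferable to a hand-written sampler. The point here is only to show how per-topic word lists like those on the slide arise from token-level topic assignments.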