DIMACS Workshop Opening-Closing Comments, Stephen E. Fienberg

DIMACS Workshop Opening-Closing Comments
Stephen E. Fienberg
Department of Statistics & Center for Automated Learning and Discovery
Carnegie Mellon University
Pittsburgh, PA, U.S.A.

Some Integrative Themes
• Integrating diverse data sources
• Privacy/confidentiality
• Data across time and space
• Signal detection and setting cutoffs
• Datamining to the rescue?
• Models and methods of inference

Integrating Diverse Data Sources
• Public health data/non-traditional data
  – Grocery store sales
  – Pharmacy sales
  – School attendance records
• Matching records/identifiers?
  – Fellegi–Sunter and modern Bayesian embellishments
  – Capture-recapture methods for estimating population totals of exposure and infection
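As a minimal sketch of the capture-recapture idea, the two-list special case can be written in a few lines. It assumes the two lists are independent samples of the same closed population and that records have already been matched; the function names and the counts in the usage lines are hypothetical, not taken from the talk.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list estimate of the population total N.

    n1, n2: number of records on list 1 and list 2
    m:      number of records matched on both lists
    """
    if m <= 0:
        raise ValueError("need at least one matched record")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 120 cases on one list, 90 on another, 30 on both.
print(lincoln_petersen(120, 90, 30))   # 360.0
print(round(chapman(120, 90, 30), 1))  # 354.2
```

With more than two lists, or when the independence assumption is doubtful, log-linear or heterogeneity models are the usual next step.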

What Do the Following Populations Have in Common?
• People in the U.S.
• Fish
• Penguins
• People infected with HIV
• Homeless
• Prostitutes in Glasgow
• Adolescent injuries in Pittsburgh, PA
• Italians with diabetes
• WWW
• Atrocities in Kosovo

Multiple List Data for Query 140 (n = 159)

Northern Light:                 yes  yes  yes  yes   no   no   no   no
Lycos:                          yes  yes   no   no  yes  yes   no   no
HotBot:                         yes   no  yes   no  yes   no  yes   no
AltaVista  Infoseek  Excite
yes        yes       yes          1    0    2    0    0    0    1    0
yes        yes       no           2    0    3    2    0    0    0    2
yes        no        yes          1    0    2    1    0    0    3    4
yes        no        no           1    3    0    8    2    0    3   19
no         yes       yes          0    0    0    1    0    0    0    0
no         yes       no           0    0    1    1    0    0    5    4
no         no        yes          0    0    0    1    0    0    4   22
no         no        no           0    0    7   17    2    3   31    ?

? = pages found by none of the six search engines (unobserved count)

Simple Models Often Work
• Let the $y_{ij}$'s be independent r.v.'s, with $p_{ij} = \Pr\{y_{ij} = 1\}$ for page $i$ observed in list $j$, where
  $\log\{p_{ij}/(1 - p_{ij})\} = \theta_i + \beta_j$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, k$.
• If we take into account individual heterogeneity represented by $\{\theta_i\}$, samples are “independent.”
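The logit model above can be illustrated with a small simulation. This is only a sketch: the population size, the normal distribution used for the θ values, and the six β values are assumptions chosen for illustration, not estimates from the query data.

```python
import math
import random


def inclusion_prob(theta: float, beta: float) -> float:
    """p_ij from logit(p_ij) = theta_i + beta_j."""
    return 1.0 / (1.0 + math.exp(-(theta + beta)))


def simulate_lists(thetas, betas, rng):
    """Draw independent 0/1 indicators y_ij given theta_i and beta_j."""
    return [
        [1 if rng.random() < inclusion_prob(t, b) else 0 for b in betas]
        for t in thetas
    ]


rng = random.Random(0)
N = 1000                                          # assumed true population size
thetas = [rng.gauss(-2.0, 1.0) for _ in range(N)]  # assumed page "catchability"
betas = [0.0, 0.5, -0.5, 0.2, -0.2, 0.1]           # assumed list effects, k = 6

histories = simulate_lists(thetas, betas, rng)
n_observed = sum(1 for h in histories if any(h))   # pages on at least one list
print(f"true N = {N}, observed n = {n_observed}")
```

The gap between N and the observed n is the empty cell of the multiple-list table, which is what the model is used to estimate.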

Posterior Distribution of N for Query 140
[Figure: posterior density of N, plotted for N from 0 to 2500, with markers for the observed n = 159, the posterior median and quartiles (Q1, Q3), the GL* average (165), and the GL* max (322).]

Privacy/Confidentiality
• Matching records raises major issues of privacy and confidentiality
  – Can we integrate sources without identifiers?
  – Role of intermediaries for linkage and then application of disclosure limitation methods

Conceptual Confidentiality Kernel
[Diagram labels: Data Sources; Confidentiality Checks: I; Data Merger (record linkage); Kernel; Confidentiality Checks: II; Disclosure Risk Low?; Detection/Warning; Data Users.]

Time and Space
• Recording the timing of events is a crucial component of the data
• Data result in multivariate time series or point processes for events/purchases/reports
  – Multiple products purchased
  – Doctors' visits
  – School absences
• Spatial information makes data sparser
• Crude counts versus individual records

Supermarket Sales Records
All Products: 50,000
  …
  Dairy
  Health & Beauty: 2,050
    Analgesics: 650
    Cough & Cold: 850
    Stomach: 550
  Produce

Confounding Natural Periodicities

Signal Detection
• Adverse events → Discovery of cause
  – e.g., detecting signature of outbreak in response to anthrax attack
  – What about alternative explanations?

Setting Detection Cutoffs
• Fixed thresholds?
• Tradeoff between false positives and false negatives
• Nature of followup?
  – Back to privacy issues again
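The tradeoff between false positives and false negatives under a fixed threshold can be sketched numerically. The two normal distributions below (daily counts with and without an outbreak) and the candidate cutoffs are purely hypothetical; real surveillance counts are rarely this well behaved.

```python
from statistics import NormalDist

# Hypothetical daily sales counts: baseline versus outbreak conditions.
baseline = NormalDist(mu=100, sigma=10)
outbreak = NormalDist(mu=125, sigma=10)


def error_rates(cutoff: float) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) for 'alarm if count > cutoff'."""
    false_positive = 1.0 - baseline.cdf(cutoff)  # alarm with no outbreak
    false_negative = outbreak.cdf(cutoff)        # no alarm during an outbreak
    return false_positive, false_negative


for cutoff in (105.0, 112.5, 120.0):
    fp, fn = error_rates(cutoff)
    print(f"cutoff {cutoff:6.1f}: false positive {fp:.3f}, false negative {fn:.3f}")
```

Raising the cutoff lowers the false positive rate and raises the false negative rate; where to sit on that curve depends on the cost of followup.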

What Are We Looking For?
• Anticipating specific problems, e.g., in response to smallpox vaccination campaign
• Surveillance systems to measure everything

Datamining to the Rescue?
• Bad News:
  – For broad-based screening and surveillance, p >> n and we encounter the curse of dimensionality
  – Model selection on large numbers of features has major problems
• Good News:
  – For prediction we may be willing to settle for black box (or at least gray box) predictions
  – Datamining methods may turn out to be useful here but the jury is out
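One way to see why p >> n makes feature screening hazardous is to simulate pure noise. In this sketch every feature is independent of the response by construction, yet the best-looking feature still correlates strongly with it; the sizes n = 20 and p = 2000 are arbitrary choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n, p = 20, 2000  # few observations, many candidate features

response = [rng.gauss(0, 1) for _ in range(n)]
features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

# Screen all p noise features and keep the strongest apparent association.
max_abs_r = max(abs(pearson(f, response)) for f in features)
print(f"largest |correlation| among {p} noise features: {max_abs_r:.2f}")
```

Any selection procedure that picks the top feature here is fitting noise, which is the model selection problem the slide warns about.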

Models and Inference Methods
• Black box approaches (including simple “robust” methods) versus models for underlying phenomena
• Frequentist vs. Bayesian methods
  – Specifying likelihood is hard
  – Picking priors based on real information or for smoothing is relatively easy
• First get statistical tools that work, and then figure out how to move them into the field or to approximate
