Leveraging Artificial Intelligence and Big Data to Create Value
Dr. Sudha Ram
Director, INSITE Center for Business Intelligence and Analytics
Anheuser-Busch Professor of MIS, Entrepreneurship & Innovation
Professor of Computer Science
Eller College of Management
Email: ram@eller.arizona.edu
August 19, 2020
EROSS-2020
BIG DATA: From Petabytes to Zettabytes
Meaning of “BIG”
Big Data – Traditionally Defined: VOLUME, VARIETY, VELOCITY, VERACITY, VALUE
Diverse Sources of Data: Many Different Sources Generating Data
An Internet Minute
PARADIGM SHIFT!
• Sensors embedded in physical objects
• “Datafication” of the world
• IP protocol based communication
Health Internet of Things
Paradigm Shift
• Billions of users and objects leaving massive traces of activity
• Temporal and spatial dimensions
• “Laboratory” for understanding the pulse of humanity
QUEST for the HOLY GRAIL: Predicting the Future
INSITE Center for Business Intelligence and Analytics
• Interdisciplinary Research Center at University of Arizona
• www.insiteua.org
Creating a Smarter/Better World
• Data Science and Network Science
• Visualizations Using Time and Space
• Scalable techniques for network analysis and graph mining
• Predictive Modeling
• Train students in data science
• Work on interesting research projects with industry partners to solve real-world problems
RESEARCH PROJECTS: SOCIAL IMPLICATIONS
• Health Care
• Education
• News Media/Journalism
• Crowdfunding
• Crowdsourcing
• Internet of Things and Wearable devices
• Social Media
Leveraging Data Science
• Define a problem/challenge
• Identify signals
• Use data science methods
• Solve the problem
Repurposing Data is Key
PREDICTION MODELS
• Predict Emergency Department Visits in near Real Time Using Big Data
• Freshman Retention Prediction
• COVID-19 Research
Leverage Big Data
Big Data is not just about volume
• Social media
• Internet search
• Environmental sensors
• Wearable sensors
• Spatial and temporal dimensions
• Fine-grained spatial and temporal resolution
Focus on Asthma
• 25 million people affected in the United States
• 2 million emergency department (ED) visits
• 0.5 million hospitalizations
• 3,500 deaths
• 50 billion dollars in medical costs annually
• 11 million missed school days every year
• 14 million missed work days every year
Source: CDC Reports (2011, 2012)
Pediatric asthma ER Visits, USA, 2011
Our Research Objective
Develop robust models to predict asthma-related Emergency Department visits in near real time using Big Data
Partner: Parkland Center for Clinical Innovation
Joint work with Wenli Zhang, Dr. Yolande Pengetenze, Max Williams; funded in part by Parkland Center for Clinical Innovation
EXTRACTING SIGNAL from Noisy Data: true asthma-related tweets vs. tweets not actually related to asthma
Asthma Related Tweets
Asthma Keywords
• Asthma
• Inhaler
• Wheezing
• Sneezing
• Runny Nose
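A keyword list like this one can be turned into a simple stream filter. The sketch below is illustrative only: the keywords come from the slide, but the whole-word, case-insensitive matching rule and the function names are assumptions, not the project's actual collection pipeline.

```python
import re

# Keywords listed on the slide; the matching rule below is an assumption.
ASTHMA_KEYWORDS = ["asthma", "inhaler", "wheezing", "sneezing", "runny nose"]

# Whole-word, case-insensitive match on any keyword.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ASTHMA_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return the sorted, de-duplicated keywords found in a tweet's text."""
    return sorted({m.lower() for m in _PATTERN.findall(text)})


def in_asthma_stream(text):
    """True if the tweet mentions at least one keyword."""
    return bool(matched_keywords(text))


if __name__ == "__main__":
    for tweet in ["My Inhaler is empty and I keep wheezing", "Nice weather today"]:
        print(tweet, "->", matched_keywords(tweet))
```

A filter of this kind only selects candidate tweets; it cannot tell whether a mention is actually about asthma as a health event.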
Asthma-Related Twitter Stream: asthma-related tweets, United States (asthma stream, 11 Oct 2013 to 31 Dec 2013)
Extracting Signals
Distinguish tweets that are relevant to asthma from tweets that mention asthma in an irrelevant context.
1. Tweets indicating awareness of disease, e.g., “Hope I don’t get an asthma attack again today..”
2. Using disease as rhetoric, e.g., “He is so cute I think I got asthma”
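One common way to separate relevant from irrelevant mentions is a supervised text classifier. The slides do not say which classifier was used, so the sketch below swaps in a small multinomial Naive Bayes with add-one smoothing as a stand-in. The training tweets and their labels are invented for illustration and are not project data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training set (hypothetical labels, not project data).
TRAIN = [
    ("had an asthma attack last night and went to the er", "relevant"),
    ("my inhaler ran out and i am wheezing badly", "relevant"),
    ("asthma flaring up again need my inhaler", "relevant"),
    ("this song is so good it gave me asthma lol", "irrelevant"),
    ("he is so funny i got asthma from laughing", "irrelevant"),
    ("that movie was so cute i got asthma lol", "irrelevant"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(model.predict("wheezing all day need my inhaler"))
    print(model.predict("so cute i got asthma lol"))
```

In practice such a model would need a much larger hand-labeled sample and a held-out evaluation set before its output could be used as a signal.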
Emergency Room Visits and Tweets
Air Quality Sensor Data
• Identify and include AQI data from a specific geographic region.
• Collected pollution data from 27 air quality sites around the Dallas area.
• Selected sites closest to the zip codes of the ED asthma patients in our ED visits dataset.
Using this data, we calculated the daily average AQI for our model.
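The site selection and daily averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the site coordinates, zip code centroids, and readings are invented, and picking the single nearest site per zip code by great-circle distance is one plausible reading of "closest", not necessarily the rule the project used.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_site(zip_centroid, sites):
    """Return the id of the site closest to a (lat, lon) zip code centroid."""
    return min(sites, key=lambda s: haversine_km(*zip_centroid, *sites[s]))


def daily_average_aqi(readings, selected_sites):
    """Average AQI per day over the selected sites.

    readings: iterable of (date, site_id, aqi) tuples.
    """
    by_day = defaultdict(list)
    for day, site, aqi in readings:
        if site in selected_sites:
            by_day[day].append(aqi)
    return {day: sum(vals) / len(vals) for day, vals in by_day.items()}


# Hypothetical sites, zip centroids, and readings (not project data).
SITES = {"A": (32.78, -96.80), "B": (32.95, -96.73), "C": (33.20, -97.13)}
ZIP_CENTROIDS = [(32.80, -96.79), (32.94, -96.75)]
READINGS = [
    ("2013-10-11", "A", 40),
    ("2013-10-11", "B", 60),
    ("2013-10-11", "C", 100),
    ("2013-10-12", "A", 50),
]

selected = {nearest_site(z, SITES) for z in ZIP_CENTROIDS}
daily_aqi = daily_average_aqi(READINGS, selected)

if __name__ == "__main__":
    print(sorted(selected), daily_aqi)
```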
Pollutants
• CO : Carbon monoxide
• NO2 : Nitrogen dioxide
• O3 : Ozone
• Pb : Lead
• PM2.5 : Atmospheric particulate matter, diameter of 2.5 micrometres or less
• PM10 : Atmospheric particulate matter, diameter of 10 micrometres or less
• SO2 : Sulfur dioxide
EPA Pollution Sensor Data and Emergency Visits
Prediction Models Using Streaming Data
• Air Quality Sensor data streams
• Tweets
• Google Trends search data
• Machine Learning Techniques to predict number of ED visits per day with high accuracy
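Before any model can be trained, the daily streams have to be aligned on a common date key. The sketch below shows that join, followed by a deliberately simple single-feature least-squares fit. The slides do not name the machine learning techniques used, so the linear fit is a stand-in baseline, and every number in the toy data is invented.

```python
def join_daily(*streams):
    """Align daily streams on dates present in all of them.

    Each stream is a dict {date: value}; returns {date: (v1, v2, ...)}.
    """
    common = set(streams[0])
    for stream in streams[1:]:
        common &= set(stream)
    return {d: tuple(s[d] for s in streams) for d in sorted(common)}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented toy streams (not project data).
TWEETS = {"2013-10-11": 10, "2013-10-12": 20, "2013-10-13": 30, "2013-10-14": 40}
AQI = {"2013-10-11": 45.0, "2013-10-12": 52.0, "2013-10-13": 61.0, "2013-10-14": 70.0}
ED_VISITS = {"2013-10-11": 12, "2013-10-12": 22, "2013-10-13": 32}

table = join_daily(TWEETS, AQI, ED_VISITS)  # only dates with all three values
slope, intercept = fit_line(
    [row[0] for row in table.values()],  # daily tweet counts
    [row[2] for row in table.values()],  # daily ED visits
)
predicted_visits = slope * TWEETS["2013-10-14"] + intercept

if __name__ == "__main__":
    print(table)
    print(slope, intercept, predicted_visits)
```

A realistic model would use several features at once (tweet counts, pollutant levels, search volume) and would be evaluated on held-out days.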
Best Predictors
Successfully predicted with 80% accuracy
• # of asthma tweets
• CO
• NO2
• PM2.5
USEFUL for Public Health NOTIFICATION
I. Epidemiologic surveillance of asthma disease activity in the community, e.g., by the Department of Health and Human Services (DHHS)
II. Stakeholder notifications of community-level asthma disease activity and risk factors
Hospital/ED Preparedness: predicting asthma ED visits and staffing the ED accordingly
Targeted Patient Interventions: targeted patient interventions using patient address and geo-localization data for tweets, e.g., patient alerts about asthma risks and counseling on preventive methods.
Contributions
• Promising results
• Demonstrate the utility and value of linking big data from diverse sources in developing predictive models for non-communicable diseases
• Specific focus on asthma
• Relevant for other chronic conditions: diabetes, cardiac problems, obesity
Internet of Things and Big Data
Big Data for Improving Education
Internet of Things: Smart Cards, Wifi Logs, Mobile Apps
BUILDING A SMARTER CAMPUS
Combining Network Science and Machine Learning
Societal Challenge: Student Retention
Proactive prediction is very important
Social science theories point to:
• Social Interactions
• Regularity of Routine
Objective
• Predict freshman retention at the individual level
• Make proactive predictions before knowing first-term GPA
• Learn students’ behavioral patterns from their CatCard transactions
• Provide actionable suggestions for retention management
BIG DATA
Institutional Student Dataset
• ~7,000 full-time registered freshmen; 6,500 remain after removing international students for whom SAT scores or high school GPAs were not available
• 479 (7.37%) dropped out after Fall and 843 (12.98%) dropped out at the end of Spring
SmartCard Transaction Dataset
• 1.8 million transactions made by freshmen from Aug 2012 through May 2013
• 271 different locations, including restaurants, vending machines, printers, parking, and labs
Behavior and Interactions
Patterns and Differences
Movement and Behavior
COMPUTATIONAL and NETWORK SCIENCE APPROACH
• Fills gaps in behavioral and extant data-driven approaches
• New prediction approach: CatCard transactions yield implicit social networks and spatial sequences
• Proactive prediction: predicting retention before the end of the first semester with 90% recall
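The slides do not spell out how a social network is inferred from card transactions. One plausible construction, shown here purely as an assumption, links two students whenever they transact at the same location within a short time window. The sketch also includes the standard definition of recall (the share of actual dropouts that the model flags). The 60-second window, student ids, and transactions are all hypothetical.

```python
from collections import Counter, defaultdict


def co_occurrence_edges(transactions, window_seconds=60):
    """Build weighted edges between students who transact together.

    transactions: iterable of (student_id, location, unix_time).
    Returns a Counter mapping (student_a, student_b) to the number of times
    the pair used the same location within window_seconds of each other.
    """
    by_location = defaultdict(list)
    for student, location, t in transactions:
        by_location[location].append((t, student))

    edges = Counter()
    for events in by_location.values():
        events.sort()  # chronological order within one location
        for i, (t1, s1) in enumerate(events):
            for t2, s2 in events[i + 1:]:
                if t2 - t1 > window_seconds:
                    break  # later events are even further away in time
                if s1 != s2:
                    edges[tuple(sorted((s1, s2)))] += 1
    return edges


def recall(actual_dropouts, predicted_dropouts):
    """Fraction of actual dropouts that were predicted as dropouts."""
    actual = set(actual_dropouts)
    return len(actual & set(predicted_dropouts)) / len(actual)


# Hypothetical transactions (not project data).
TRANSACTIONS = [
    ("s1", "cafe", 0),
    ("s2", "cafe", 30),
    ("s3", "cafe", 500),
    ("s1", "lab", 1000),
    ("s2", "lab", 1020),
    ("s3", "gym", 0),
]

edges = co_occurrence_edges(TRANSACTIONS)

if __name__ == "__main__":
    print(dict(edges))
    print(recall(["a", "b"], ["a", "c"]))
```

Edge weights from a graph like this could feed features such as a student's number of ties, while the ordered list of locations per student gives the spatial sequence.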
COVID-19 Related Research Projects
What is Contact Tracing? Digital vs. Manual Methods
Three different methods:
a. Manual contact tracing
b. Manual with digital assistance from a Prompted Mobility Pathway, aka Memory Jogger
c. Digital: Bluetooth app for exposure notification
Memory Jogger using Wifi Logs
Working with Jeremy Frumkin, Research and Discovery Technologies
• Using Wifi network logs with CatCard data to support strategic efforts related to congestion tracking on campus and managing campus foot traffic
• Understanding movement patterns among campus spaces
• Complementing app-based and manual contact tracing efforts with the additional insights that can be gained through the Wifi logs
• Designing a Memory Jogger, a prompted mobility pathway tool, to enhance manual contact tracing
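A prompted mobility pathway can be thought of as a time-ordered list of the places a person's device was seen. The sketch below collapses consecutive Wifi log records at the same building into stays. The log schema (device id, building, ISO timestamp), the five-minute minimum stay, and the sample rows are all assumptions for illustration and do not describe the actual campus logs or tool.

```python
from datetime import datetime


def mobility_pathway(log_rows, device_id, min_minutes=5):
    """Collapse one device's Wifi records into stays per building.

    log_rows: iterable of (device_id, building, iso_timestamp).
    Returns [(building, first_seen, last_seen)] for stays lasting at least
    min_minutes, in chronological order.
    """
    rows = sorted(
        (datetime.fromisoformat(ts), building)
        for dev, building, ts in log_rows
        if dev == device_id
    )
    stays = []
    for ts, building in rows:
        if stays and stays[-1][0] == building:
            stays[-1][2] = ts  # extend the current stay
        else:
            stays.append([building, ts, ts])  # start a new stay
    return [
        (building, start, end)
        for building, start, end in stays
        if (end - start).total_seconds() >= min_minutes * 60
    ]


# Hypothetical log rows (not real campus data).
WIFI_LOG = [
    ("dev1", "Library", "2020-02-03T08:00:00"),
    ("dev1", "Library", "2020-02-03T08:20:00"),
    ("dev1", "Student Union", "2020-02-03T08:25:00"),
    ("dev1", "Gym", "2020-02-03T09:00:00"),
    ("dev1", "Gym", "2020-02-03T09:45:00"),
    ("dev2", "Library", "2020-02-03T08:10:00"),
]

pathway = mobility_pathway(WIFI_LOG, "dev1")

if __name__ == "__main__":
    for building, start, end in pathway:
        print(f"{building}: {start:%H:%M} to {end:%H:%M}")
```

A list like this could be shown to a person during a manual tracing interview to help them recall where they were and for how long.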
Traffic/Crowd Analysis
• Select date and time: Feb 3, 2020, 8 am-9 am
• Building traffic on campus between 8 am and 9 am
• Top ten traffic spots visualized and compared with the selected building (in red)
• User types
• Comparison of hourly traffic in the selected building
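The hourly traffic view can be approximated by counting distinct devices per building within the selected hour. The sketch below assumes the same hypothetical log schema of (device id, building, ISO timestamp); the rows are invented, and counting distinct devices is one possible definition of "traffic", not necessarily the one behind the dashboard.

```python
from collections import defaultdict
from datetime import datetime


def top_buildings(log_rows, day, hour, n=10):
    """Rank buildings by distinct devices seen during one hour of one day.

    log_rows: iterable of (device_id, building, iso_timestamp).
    day: date string such as "2020-02-03"; hour: 0-23.
    Returns up to n (building, device_count) pairs, busiest first.
    """
    devices = defaultdict(set)
    for dev, building, ts in log_rows:
        t = datetime.fromisoformat(ts)
        if t.date().isoformat() == day and t.hour == hour:
            devices[building].add(dev)
    counts = [(building, len(devs)) for building, devs in devices.items()]
    # Sort by count descending, then by name for a stable order on ties.
    return sorted(counts, key=lambda bc: (-bc[1], bc[0]))[:n]


# Hypothetical log rows (not real campus data).
LOG_ROWS = [
    ("d1", "Library", "2020-02-03T08:05:00"),
    ("d2", "Library", "2020-02-03T08:10:00"),
    ("d1", "Library", "2020-02-03T08:30:00"),
    ("d3", "Student Union", "2020-02-03T08:15:00"),
    ("d4", "Gym", "2020-02-03T09:05:00"),
    ("d5", "Library", "2020-02-04T08:05:00"),
]

ranking = top_buildings(LOG_ROWS, "2020-02-03", 8)

if __name__ == "__main__":
    print(ranking)
```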
To compare the three methods for contact tracing and exposure notification:
• How do the three contact tracing approaches differ in outcomes such as timeliness, coverage of contacts, and other metrics?
• How do these methods complement each other, and what are their relative strengths and weaknesses?
• How do these methods perform overall in preserving privacy while allowing for comprehensive contact tracing? What are the tradeoffs?
• How acceptable are these three strategies to the community, and what is an effective path to deploying comprehensive contact tracing?