Research Paper Project Proposal
Austin Wilson, Roberto Campos, Isaac Shah
Economic impact of epidemics and pandemics
Market losses!
https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/646195/EPRS_BRI(2020)646195_EN.pdf
● Market losses from a pandemic could be up to $500 billion
● Lower-middle income countries are impacted more than high income countries
Industries affected
● The healthcare industry sees a huge spike in costs when a pandemic occurs.
● The insurance industry is also hit, as more people seek medical care and file claims.
Industries affected
The agricultural industry is adversely impacted.
● In developed countries, the agriculture industry is incentivized to prioritize spending on infectious disease prevention.
● In less developed countries, agricultural companies are not incentivized to spend on reducing infectious disease risk.
● An outbreak originating in one of these less developed countries can result in travel and trade isolation.
Travel industry
● People do not want to travel to places where the disease is running rampant
● People don’t want to be on planes or ships where they think there might be an outbreak
● Estimated $2.8 billion loss to the Mexican travel industry from H1N1
Time Series Data Mining by Philippe Esling
● Data representation: how can a time series be represented, and what is its shape?
● Similarity measurement: how do we compare two time series objects? (see the sketch below)
● Indexing method: how can we speed up query time for big data?
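As a minimal sketch of the similarity-measurement point, the snippet below compares two equal-length series with plain Euclidean distance and with a simple dynamic time warping (DTW) distance; the toy arrays are placeholders, not project data.

```python
import numpy as np

def euclidean_distance(a, b):
    """Point-to-point distance; requires len(a) == len(b)."""
    return np.sqrt(np.sum((a - b) ** 2))

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with no window constraint."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Cost of the best warping path reaching cell (i, j).
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

s1 = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
s2 = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
print(euclidean_distance(s1, s2), dtw_distance(s1, s2))
```

DTW tolerates phase shifts between otherwise similar shapes, which matters when outbreaks hit different markets at different times.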
Clustering
● Whole series clustering tries to maximize the distance between different clusters while minimizing the variance within each cluster (sketched below)
● We can also use subsequence clustering, where we try to split a single time series into different clusters
● Classification is similar to whole series clustering: we are given sets of time series and a label for each set, and the task is to train a classifier to label new time series
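A rough sketch of whole-series clustering, assuming all series have the same length and are z-normalized so that k-means (with Euclidean distance) groups them by shape; the sine/cosine data here are synthetic stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
# Toy data: 20 noisy sine waves and 20 noisy cosine waves.
series = np.vstack(
    [np.sin(t) + 0.1 * rng.standard_normal(50) for _ in range(20)]
    + [np.cos(t) + 0.1 * rng.standard_normal(50) for _ in range(20)]
)
# z-normalize each series so clustering focuses on shape, not scale.
series = (series - series.mean(axis=1, keepdims=True)) / series.std(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(series)
print(labels)
```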
Segmentation
● Create an accurate approximation while reducing the dimensionality of the time series
● Want to keep the essential features and drop redundant or uninsightful features
Piecewise linear approximation
● One of the most successful approaches to segmentation over the years
● Try to split the time series up into segments
● Fit individual polynomial or linear curves to each segment
● Sliding windows
  ○ Keep growing a window until it exceeds an error threshold (a sketch follows this slide)
● Top-down
  ○ Recursively partition a data set until some stopping criterion is met
● Bottom-up
  ○ Start from the finest segments and iteratively merge segments
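A simplified sliding-window segmentation sketch: grow a window, fit a line to it, and start a new segment once the fit error exceeds a threshold. The RMSE criterion and the threshold value are our own assumptions, not the exact method from the paper.

```python
import numpy as np

def sliding_window_segments(y, max_error=0.5):
    """Return a list of (start, end) index pairs, inclusive, sharing breakpoints."""
    segments, start = [], 0
    n = len(y)
    while start < n - 1:
        end = min(start + 2, n)  # smallest window: two points
        good_end = end
        while end <= n:
            x = np.arange(start, end)
            slope, intercept = np.polyfit(x, y[start:end], 1)
            rmse = np.sqrt(np.mean((y[start:end] - (slope * x + intercept)) ** 2))
            if rmse > max_error:
                break            # window has grown past the error threshold
            good_end = end
            end += 1
        segments.append((start, good_end - 1))
        start = good_end - 1     # next segment begins at the last breakpoint
    return segments

# Toy series: a rise followed by a fall should yield roughly two segments.
y = np.concatenate([np.linspace(0, 5, 30), np.linspace(5, 0, 30)])
print(sliding_window_segments(y, max_error=0.2))
```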
Data-adaptive vs non-data-adaptive vs model-based
● Data-adaptive: parameters are modified based on the values of consecutive segments
● Non-data-adaptive: parameters of the transformation remain the same for every series (a PAA sketch follows this slide)
● Model-based: assume the time series has been produced by an underlying model and find the parameters of that model
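A small sketch of one non-data-adaptive representation, Piecewise Aggregate Approximation (PAA): the number and width of segments are fixed in advance and do not depend on the values of any particular series.

```python
import numpy as np

def paa(y, n_segments):
    """Mean of each of n_segments equal-width chunks (assumes len(y) is divisible)."""
    return y.reshape(n_segments, -1).mean(axis=1)

y = np.sin(np.linspace(0, 2 * np.pi, 60))
print(paa(y, 6))  # 6 averaged values approximate the 60-point series
```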
Data
● COVID - Detailed Novel Corona Virus 2019 Dataset
● COVID - South Korea: https://www.kaggle.com/kimjihoo/coronavirusdataset
● Stocks: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
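A minimal sketch of how the candidate datasets might be loaded and aligned with pandas; the file names and column names below ("Time.csv", "date", "confirmed", "aapl.us.txt", "Date", "Close") are assumptions about the Kaggle archives, not confirmed paths.

```python
import pandas as pd

# Hypothetical paths into the downloaded Kaggle archives.
covid_kr = pd.read_csv("coronavirusdataset/Time.csv", parse_dates=["date"])
stock = pd.read_csv("Stocks/aapl.us.txt", parse_dates=["Date"])

# Align both series on a common daily index for later comparison.
covid_daily = covid_kr.set_index("date")["confirmed"]
stock_daily = stock.set_index("Date")["Close"]
joined = pd.concat([covid_daily, stock_daily], axis=1, join="inner")
print(joined.head())
```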
Task Division + Task Perform

Research: Isaac S.
Study past events and find datasets that we can use to analyze the initial problems in past situations. Locate when the problem first initiated, when the situation plateaued, and when the situation returned to normal.

Tableau: Isaac
Visualization will be done before choosing which models to work with. Try to find trends that are visible. Seek patterns and similarities between events. Try to map each case on a US map and find whether there is a correlation with performance in the stock market.

Modeling: Roberto, Austin
Explore which types of models can be used to solve each problem. For example, should we use linear regression vs. logistic regression, and can we find which variables are important? Is a fully connected neural network a useful method for the problem we are currently analyzing? Should we use a CNN to identify important features? Can we use an SVM to categorize the different events from the past and categorize the current event, COVID-19? (A toy sketch of the SVM idea follows this slide.)

Data Pre-Processing: Roberto, Austin
Data pre-processing will play an important role. We have to analyze the types of data we will be inputting into each model. Different models require different processes.
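A toy sketch of the SVM idea described above: classify "events" from a few hand-picked time-series summary features. The features, labels, and numbers here are entirely made-up placeholders, not real epidemic or market data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical features per event: [peak drawdown %, days to plateau, recovery days].
X = rng.normal(loc=[[10, 30, 60]] * 50 + [[25, 90, 200]] * 50, scale=5)
y = np.array([0] * 50 + [1] * 50)  # 0 = mild event, 1 = severe event

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```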
Tools

Jupyter
Interactive notebook to visually present our models in detail.

Python
Our language of choice to pre-process data and create ML models. We are interested in using an ANN or CNN for our model. We will also consider simple linear or log-linear models.

R
Used in support of Python, as R is a great statistical tool that provides statistical inference. It can help us mathematically test whether the correlations we seek to establish actually hold.

Tableau
A versatile visualization tool for creating custom, robust graphs.
Progress + Experience

Initial design/case study/prototype/experiments
With the combined expertise of the team, we will be able to analyze and seek out data that can help us answer our problem statement. Once the data is gathered, quick visualizations will be rendered to gain further insights. All three members of the team have extensive knowledge of Tableau. Models can be easily prototyped with the Sklearn and TensorFlow libraries. Two members of the team have experience using these libraries and have access to consulting outside of the classroom.

Progress milestones: what will be completed by weeks 11 and 14
By weeks 11 and 14, the team will have developed visual aids and prototype models, and will begin refining them and addressing the specific details that need particular care, for example increasing the accuracy of our model.

Experience
● Modeling with Sklearn and TensorFlow
● Modeling with R
● Data pre-processing
● Tableau
● Google Colab for big data
Sources
● Economic impact of epidemics/pandemics: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6491983/
● Time Series Data Mining: https://www.researchgate.net/publication/261722458_Time-Series_Data_Mining
Thanks!