Nine Credit-Point Core Lecture on Information Retrieval & Data Mining
Tuesday 14–16 & Thursday 16–18 @ HS003 (E1.3)
Martin Theobald, Pauli Miettinen
IR&DM, WS'11/12 18 October 2011

Data Mining is…
• About finding new and interesting information from data
  – Association rules
  – Clusterings
  – Latent models
  – Classifiers

Data Mining: motivation
What to do with the information you’ve retrieved?
The "PHT" Pirate wanted all information of the world. But before he realized most of it was useless, he was already buried under it.
  - Stanisław Lem, The Cyberiad

Data Mining: definition
Data mining is the process of extracting hidden patterns from data.
  - Wikipedia
Data mining, in a broad sense, is the set of techniques for analyzing and understanding data.
  - Zaki & Meira: Fundamentals of Data Mining Algorithms
Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
  - Hand, Mannila & Smyth: Principles of Data Mining

Data Mining Applications
• Business intelligence
  – What do customers buy together?
  – What are the seasonal trends?
  – How to make more money?
• Scientific data analysis
  – Which genes cause diseases?
  – Which species co-inhabit areas?
  – What happens if the average temperature rises?
• And anything else where you have data…
  – Whom should Barack Obama persuade to vote for him?
  – Is there a problem on the International Space Station?

What do you need to do data mining?
• Data
• Domain knowledge
• Data mining techniques (the focus of this course)

The Techniques
• Frequent itemset mining & association rules
• Clustering
• Dimensionality reduction
• Matrix factorization & latent factor models
• Classifiers

Frequent itemset mining demo
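
The live demo itself is not captured in these notes. As a stand-in, here is a minimal sketch of what frequent itemset mining computes for the "what do customers buy together?" question. The baskets are hypothetical toy data, the function names are invented for illustration, and the brute-force enumeration is only the simplest possible approach, not necessarily the algorithm shown in the demo.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (market baskets); not data from the lecture.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support count} for itemsets found in >= min_support baskets.

    Brute force: enumerates every subset (up to max_size items) of each basket.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(basket)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def confidence(frequent, antecedent, consequent):
    """Confidence of the association rule antecedent -> consequent."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return frequent[both] / frequent[tuple(sorted(antecedent))]


frequent = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
print("butter -> milk:", confidence(frequent, ["butter"], ["milk"]))
```

In this toy data, every basket that contains butter also contains milk, so the rule butter -> milk has confidence 1, while bread -> milk holds in only three of the four bread baskets.
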
Clustering for Medical Data
• Temperament data
  – Individuals are assigned values on different scales
    • Fear of uncertainty, shyness, impulsiveness, etc.
  – Data is clustered (people with similar value combinations go to the same cluster)
  – Results:
    • 4 clusters are enough
    • Strong association between temperament and socio-economic status and education
    • Males and females cluster similarly, even if clustered independently
Wessman: Clustering methods in the Analysis of Complex Diseases, manuscript
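
To make "people with similar value combinations go to the same cluster" concrete, here is a minimal sketch using plain k-means. This is an assumption made for illustration: the slides do not say which clustering method the study used. The two scales and all scores are made up, and the starting centroids are picked by hand so the run is deterministic.

```python
import math

# Hypothetical scores on two temperament scales (shyness, impulsiveness);
# illustrative values only, not the data from the study.
people = [
    (1.0, 8.0), (2.0, 9.0), (1.5, 8.5),  # outgoing, impulsive
    (8.0, 2.0), (9.0, 1.0), (8.5, 1.5),  # shy, not impulsive
]


def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) from the given starting centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(people, centroids=[people[0], people[3]])
print(labels)
print(centroids)
```

Each resulting centroid can be read as a cluster profile, in the same way the study's clusters are described by their typical temperament traits.
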
Clustering for Medical Data: the four clusters
• Stable, persistent, not very impulsive: high socio-economic status and education
• Outgoing, impulsive, energetic: high socio-economic status and education
• No extreme scales: high hypomania and psychosis proneness
• Shy, pessimistic, prefer routines and privacy: low socio-economic status, high levels of depression and schizophrenia

Ecological Niche Modeling
• Goal: Describe the area species inhabit using bio-ecological variables
  – Temperature, rainfall, etc.
• Application: Forecast what happens to species if the bio-ecological environment changes
  – Consequences of global warming
• Data Mining Problem: classification
  – Classify the areas inhabited by species using the bio-ecological variables

Ecological Niche Modeling: European Elk
• Either
  – February’s max temperature is between -9.8°C and 0.4°C
  – July’s max temperature is between 12.2°C and 24.6°C
  – August’s average rainfall is between 56.85 mm and 136.46 mm
• Or
  – September’s average rainfall is between 183.27 mm and 238.78 mm
Galbrun & Miettinen: From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World. SDM ’11
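
The rule above can be read as a classifier over areas. The sketch below encodes it as a predicate under two assumptions that the slide does not state: the three conditions under "Either" are combined with AND, and the interval bounds are inclusive. The class, field names, and example areas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Climate:
    """Bio-ecological variables of one area (hypothetical field names)."""
    feb_max_temp_c: float
    jul_max_temp_c: float
    aug_avg_rain_mm: float
    sep_avg_rain_mm: float


def between(value, low, high):
    """True if low <= value <= high (inclusive bounds are an assumption)."""
    return low <= value <= high


def elk_rule(area: Climate) -> bool:
    """Predict whether the area matches the European Elk rule from the slide."""
    # "Either" branch: all three conditions must hold (assumed conjunction).
    either_branch = (
        between(area.feb_max_temp_c, -9.8, 0.4)
        and between(area.jul_max_temp_c, 12.2, 24.6)
        and between(area.aug_avg_rain_mm, 56.85, 136.46)
    )
    # "Or" branch: a single condition on September rainfall.
    or_branch = between(area.sep_avg_rain_mm, 183.27, 238.78)
    return either_branch or or_branch


# Made-up areas for illustration only.
cold_area = Climate(-5.0, 18.0, 80.0, 60.0)
warm_area = Climate(8.0, 30.0, 20.0, 40.0)
wet_area = Climate(8.0, 30.0, 20.0, 200.0)

for name, area in [("cold", cold_area), ("warm", warm_area), ("wet", wet_area)]:
    print(name, elk_rule(area))
```

Forecasting under a changed climate then amounts to re-evaluating the same predicate on the projected variable values for each area.
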