Wrapping It Up
Pauli Miettinen, Jilles Vreeken
24 July 2014 (TADA)

What did we do?
Introduction, Tensors, Information Theory, Mixed Grill, Wrap-up + <ask-us-anything>

Take Home: Overall
Overview of the hot topics in data mining that Pauli and Jilles think are cool: a strongly biased sample, by interest and available time. We wanted to give a general picture of what data mining is, what makes it special, and what's currently happening at the edge of human knowledge.

Key Take-Home Message
Data mining is descriptive, not predictive. The goal is to give you insight into your data, to offer (parts of) candidate hypotheses; what you do with those is up to you.

Take Home: Tensors
Multi-way extensions of matrices. Anything you can do with matrices you can do with tensors… only harder, and taking into account multi-way relationships.
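A minimal sketch of what "multi-way extension of a matrix" can look like in practice. NumPy is an assumption made here for illustration (the slides name no library), and the shapes, values, and the helper `unfold` are made up. The sketch unfolds a 3-way tensor into a matrix along one mode and builds a rank-1 tensor as an outer product of three vectors.

```python
import numpy as np

# A hypothetical 3-way tensor with shape (2, 3, 4).
T = np.arange(24).reshape(2, 3, 4)


def unfold(tensor, mode):
    """Matricize a tensor: rows follow the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# A rank-1 3-way tensor is the outer product of three vectors,
# the multi-way analogue of a rank-1 matrix a b^T.
a = np.array([1, 2])
b = np.array([1, 0, 1])
c = np.array([1, 1, 0, 2])
R = np.einsum("i,j,k->ijk", a, b, c)

print(unfold(T, 0).shape, unfold(T, 1).shape, unfold(T, 2).shape)
```

Unfolding is one common way to reuse matrix tools on tensors, though each unfolding only sees one mode against all the others at a time.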

Take Home: Decompositions
Different tensor decompositions reveal different types of patterns. The choice of the correct decomposition must be based on the application's needs; there's no silver bullet.

Take Home: Information Theory
Exploratory data analysis: wandering around your data, looking for interesting things, without being asked questions you cannot know the answer to. Questions like: What distribution should we assume? How many clusters/factors/patterns do you want? Please parameterize this Bayesian network?

Take Home: Interestingness
Interestingness is ultimately subjective. Still, to have algorithms that can find potentially interesting things, we somehow need to formalize it.

Take Home: Information Theory
Information Theory is a branch of statistics concerned with measuring information. Information = reduction of uncertainty. Uncertainty can be quantified in bits. Everything new you learn about your data allows you to compress it better.
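To make "uncertainty can be quantified in bits" concrete, here is a small sketch that computes the Shannon entropy of the empirical symbol distribution of a string. The function name and the two toy strings are illustrative assumptions, not taken from the lectures.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy of the empirical distribution, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


uniform = "abcd" * 4         # four symbols, all equally likely
skewed = "a" * 14 + "bc"     # almost always 'a'

print(entropy_bits(uniform))  # 2 bits per symbol: nothing to exploit
print(entropy_bits(skewed))   # well below 1 bit per symbol
```

The skewed string is more predictable, so an ideal code needs fewer bits per symbol for it; knowing the skew is exactly the kind of knowledge that lets you compress better.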

Take Home: MDL
The Minimum Description Length (MDL) principle: given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is the one that minimizes $L(M) + L(D \mid M)$, in which $L(M)$ is the length, in bits, of the description of $M$, and $L(D \mid M)$ is the length, in bits, of the description of the data when encoded using $M$.
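A toy sketch of two-part MDL model selection: the score of a model is the bits needed to describe the model plus the bits needed to describe the data given that model. Everything specific here is an assumption for illustration: the data, the model class (a fair coin versus biased coins on a grid of sixteenths), and the encoding costs chosen for naming a model.

```python
import math


def data_cost_bits(bits, p):
    """Bits needed to encode a 0/1 sequence under a Bernoulli(p) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))


def total_cost_bits(bits, p, model_bits):
    """Two-part score: model description length plus data-given-model length."""
    return model_bits + data_cost_bits(bits, p)


data = [1] * 90 + [0] * 10

# Hypothetical model costs: 1 bit to say "fair" or "biased",
# plus 4 bits to name one of the grid points k/16 for a biased coin.
candidates = {"fair": (0.5, 1.0)}
for k in range(1, 16):
    candidates[f"p={k}/16"] = (k / 16, 1.0 + 4.0)

best = min(candidates, key=lambda name: total_cost_bits(data, *candidates[name]))
print(best, round(total_cost_bits(data, *candidates[best]), 2))
```

The biased coin costs more bits to describe, but it pays for itself because the data become much cheaper to encode. With less skewed or much shorter data the fair coin would win instead.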

Take Home: Maximum Entropy
The principle of Maximum Entropy: given a set of testable statistics $C$, the best distribution $q^*$ is the $q$ that satisfies the statistics in $C$ while maximizing the entropy $H(q)$. $q^*$ is the most uniform, least biased distribution that corresponds with belief set $C$. It models your expectation, assuming you use $C$ optimally.
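As an illustration of the principle, the sketch below finds the maximum-entropy distribution over the faces of a die when the only testable statistic is the expected value. The example, the function name, and the bisection bounds are assumptions made here; the sketch relies on the standard result that under a single mean constraint the solution has the exponential form q(x) ∝ exp(λx), and it requires the target mean to lie strictly between the smallest and largest value.

```python
import math


def maxent_with_mean(values, target_mean, tol=1e-12):
    """Maximum-entropy distribution over `values` with the given expectation.

    Uses the exponential-family form q(x) proportional to exp(lam * x)
    and finds lam by bisection, since the mean is increasing in lam.
    """
    def dist(lam):
        weights = [math.exp(lam * v) for v in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(v * q for v, q in zip(values, dist(lam)))

    lo, hi = -50.0, 50.0  # assumed search range for lam
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


faces = [1, 2, 3, 4, 5, 6]
q_plain = maxent_with_mean(faces, 3.5)  # no surprise: the uniform distribution
q_high = maxent_with_mean(faces, 4.5)   # tilted toward the high faces
print([round(q, 3) for q in q_high])
```

With mean 3.5 the constraint carries no information beyond symmetry, so the result is uniform; with mean 4.5 the distribution is tilted just enough to satisfy the constraint and is otherwise as uniform as possible.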

Take Home: Graph Mining
Most graph mining approaches are global and predictive: 'Explain everything in one go'. Real graphs are too complex for that. Taking a local and descriptive approach allows for more detailed results, richer problems, easier formalization, and efficient solutions. Very little done so far; many cool open problems.

Take Home: Redescriptions
Redescriptions explain the same thing many times. An emerging topic that has not yet fully broken into the data mining canon. Can be seen as translation within a dataset.

Take Home: Dynamic Data
Data is rarely static, even though many algorithms expect that. Streaming algorithms work when data is too big to fit anywhere, while dynamic algorithms aim to adjust the answer with the changing data.
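A minimal sketch of the streaming idea: the summary is updated one value at a time and the data itself is never stored. The example uses Welford's one-pass method for mean and variance, which is a technique chosen here for illustration and not one named in the slides; the class name and the input values are made up.

```python
class RunningStats:
    """One-pass mean and variance (Welford's method) using O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; reported as 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.mean, stats.variance)
```

Because every update only touches three numbers, the same object keeps giving an up-to-date answer as new data arrives. Handling deletions or changes to past data, as dynamic algorithms do, would need more than this.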

Take Home: Assignments
"What the hell were they thinking??"
We wanted you to learn to
- read scientific papers without getting lost in details, quickly forming high-level pictures of complex ideas
- read critically, seeing through scientific sales pitches
- show independent thinking, make ideas your own
We were not disappointed.

Take Home: TADA
Data analysis
- is important, upcoming, but still very young
- aims to tackle impossible problems, such as finding interesting things in enormous search spaces
- is a weird mix of theory and practice: likes to be foundational, yet not afraid of ad hoc
- and, not unimportant, it's lots of fun.

Exam dates
The Exam
- type: oral
- when: September 11th
- time: individual
- where: E1.3, room 0.16
- what: all material discussed in the lectures, plus one assignment (your choice) per topic
The Re-Exam
- type: oral
- when: October 1st
- time: individual
- where: E1.3, room 001

Evaluation: I did not like
"Slides are not detailed enough for revision"

Evaluation: Suggestions
- "More ways for discussing assignment solution" More ways for understanding the suggestion?
- "Bit heavy course for 5 ECTS" Yes.
- "More details for practical stuff, like how and why" Maybe. Maybe not here.
- "More lectures with both lecturers" Really?

Things to do
Master thesis projects
- in principle: yes!
- in practice: depends on background, motivation, interests, and grades, plus on whether we have time
- interested? mail Pauli and/or Jilles
Student Research Assistant (HiWi) positions
- in principle: maybe!
- in practice: depends on background, grades, and in particular your motivation and interests
- interested? mail Jilles and/or Pauli, include CV and grades

Sample Topics – JV
Graphs
- characterising viruses
- realistic graph generators
- mining interesting subgraphs
- patterns in tweets
Causality
- did X cause Y?
- mining causal graphs
- what's the cause of this?
- predicting the future
Useful Patterns
- the Difference & the Norm
- privacy & data generation
- pattern-based indexing
- noise reduction
Rich Data & Text
- pattern-based topic models
- grammar & compression
- rich MaxEnt modelling
- outliers in rich data

Sample Topics – PM
Matrices
- tropical algebras
- Boolean algebras
- efficient algorithms
- good applications
Tensors
- new decompositions
- efficient algorithms
- applications
Theory
- approximability
- computational complexity
- practical results
- DM motivated
Redescriptions
- new algorithms
- new applications
- new formulations

Good Reads – PM
- Understanding Complex Datasets, D. Skillicorn (light reading on matrix and tensor decomps.)
- Matrix Computations, G.H. Golub & C. Van Loan (anything-but-light, reference book)
- Mining of Massive Datasets, Rajaraman, Leskovec & Ullman (work-in-progress textbook)

Good Reads – JV
- Data Analysis: a Bayesian Tutorial, D.S. Sivia & J. Skilling (very good, but skip the MaxEnt stuff)
- Elements of Information Theory, Thomas Cover & Joy Thomas (very good textbook)
- The Information, James Gleick (great light reading)

Teach us More!
Well, ok… but we are still thinking about what, if anything, to teach next semester. Options include:
- Information Theory (regular course, JV)
- Mining and Using Patterns (seminar/discussion, JV)
- Causal Inference (seminar/discussion, JV)
- Tensor Methods (seminar/discussion, PM)
- Redescription Mining (seminar/discussion, PM)
- Fixing It (or, Reproducible Science) (seminar/practical, PM & JV)
- Data Mining Lab (practical, PM & JV)

Algorithmic Data Analysis Group
…coming soon… a joint venture of the MPI groups on Data Mining and Exploratory Data Analysis: ada.mpi-inf.mpg.de
We'll include announcements of relevant talks and events, and cool new work by yours truly (maybe even a mailing list).

Question Time!

Privacy & Data Mining
"What is your opinion on privacy-preserving data mining? Have you ever worked with it? Do you think it is useful, or does it somehow contradict 'the spirit' of data mining?"

Text Mining
"Have you ever worked with text mining? Do you think considering grammar is necessary, or is mere statistics enough?"

Big Data
"Does Big Data exist?" "How big is Big Data?" "When is the data Big enough? Is more data always better?"

Mining Massive Data
MapReduce, Hadoop, Bigtable, Cassandra, Spark, Dremel, etc. Engineering or science? Essentially tricks, not magic, that work well for certain specific problems. For KDD 2014, at least 25 out of 150 presentations will be specifically aimed at 'large scale' stuff.

Mining the Cloud
"How about data analytics in the cloud?"

Social Network Analysis
Many, many, many papers about social network analysis. So far: lots of statistics, not much 'mining'. That is, most are about how to model a graph probabilistically, how to fit a given distribution. The Elephant in the Room: what is the 'graph' distribution? Nobody knows. Yet.

Graph Mining
This is the part where Pauli and Jilles may or may not say something about graphs.

Your Question Here!

Conclusions
This concludes TADA'14. We hope you enjoyed the ride.

Thank you!
