
TADA! Topics in Algorithmic Data Analysis, Jilles Vreeken, 24 April 2015

  1. TADA! Topics in Algorithmic Data Analysis. Jilles Vreeken, 24 April 2015

  2. Question of the Course: What are the hot topics in data mining that are cool*? (* and important to know)

  3. Question of the Course: How can we extract novel knowledge and insight from large data?

  4. Organization: This is an advanced lecture, with lectures, and reading, and assignments. Beware! This lecture will be well worth its 5 ECTS.

  5. I’m not afraid! You will be, you will be. I’m not afraid.

  6. I’m not afraid! You will be, you will be. Yes… I’m not afraid. You will be. You will be.

  7. Organization: This is an advanced lecture, with lectures, and reading, and assignments. Beware! This lecture will be well worth its 5 ECTS: a lot of reading, a lot of thinking; it’ll take quite some effort, but you’ll learn a lot.

  8. Reading Materials: We’ll mainly consider scientific articles. All will be available on the website: directly accessible from the MPI network, or using a login/password that you can get by email.

  9. Lectures: Meetings that cover the basic topics; format: ‘sit, listen, shut up interact’. Required reading: announced on the website; read at your own convenience but, strongly preferred, before the lecture.

  10. Exam: Type tba, most likely oral. Day and place tba, most likely in early August. Grading: final grade will be based on final exam and assignments.

  11. Assignments: general. 4 assignments. Grading scale: fail, pass, excellent. You may fail on one assignment; two fails and you fail the course. Every excellent gives 1/3 bonus point on final exam grade, with a maximum of 1 full point. You must pass the final exam to pass the course.

  12. Assignments: requirements. To be written in proper academic-style English. Use proper citations: you are given sources; you are encouraged to find additional sources; all sources must be mentioned; plagiarism: instant fail (at best).

  13. Assignments: format. Return assignment reports as PDF files by email: no .doc(x), .odt, .rtf, .txt, .xml, .html, .pages, .ps, .eps, .etc. No page limit! Probably most will need 3 to 5 pages; more is not necessarily better. Reports must clearly state on the first page: name, matriculation number, email address and topic.

  14. Assignments: returning. Assignment reports are to be returned by email: tada@mpi-inf.mpg.de. Deadline is at 1400 hours on the stated day: NO delays, no excuses, time based on mail time stamp. Submissions that I receive before the DL day I will ACK.

  15. Assignments: grading. Assignments are not for repeating what papers say; perhaps surprisingly, but I have already read the papers. You are expected to critically discuss the sources, build connections, point out differences, provide new insights, etc. Some assignments are marked as hard: this is because they are, and this will be taken into account when grading.

  16. News & Updates: Urgent and personal messages by email; everything else via the website.

  17. Question of the Course: How can we extract novel knowledge and insight from large data?

  18. 1st Paradigm: Empirical Science. For thousands of years, science was empirical: describing natural phenomena.

  19. 2nd Paradigm: Theoretical Science. The last few hundred years, science was theoretical: used models, generalizations, made predictions.

  20. 3rd Paradigm: Computational Science. The last decades, science was computational: complex models simulating complex phenomena.

  21. 4th Paradigm: Data-Intensive Science. Interesting phenomena are too complex to come up with good hypotheses. We need to unify theory, experimentation, and simulation: capture data, mine hypotheses, inspect and evaluate, generate extra data to select the best ones, iterate. An iterative procedure between world and model, scientist in the middle.

  22. Power laws

  23. Shopping Data: Which products are often bought together?

  24. Train Delays: Which trains are delayed because of other trains?

  25. Drug Discovery: What part of the molecule makes the drug work?

  26. More patterns than you can shake a stick at

  27. Pattern-based Modelling [figure: stemmed term patterns from a summary of the JMLR abstract database: support vector machin, svm, associ rule mine, nearest neighbor, frequent itemset mine, naïv bay, linear discrimin analysi, lda, cluster high dimension, state art, frequent pattern mine algorithm, synthet real; label: Mining Algorithm]

  28. Summarising: Which sales characterise your customers?

  29. Summarising

  30. Jilles Vreeken’s Professional Network as of April 21, 2015 Jilles Vreeken

  31. Google Flu

  32. Quite Healthy

  33. Patient Deceased

  34. Big Data, Bigger Data, Biggest Data

  35. No model is perfect

  36. Science has lots of data, not the tools to analyse it

  37. Social Science & the Web

  38. Astronomy. Sloan Sky Survey: 100TB between 2000 and 2008; 1 billion objects: 260M galaxies, 260M stars; non-trivial analysis: currently impossible.

  39. With Your Help! Maybe!
