
Advanced Analytics in Business [D0S07a] Big Data Platforms & Technologies [D0S06a] - PowerPoint PPT Presentation



  1. Advanced Analytics in Business [D0S07a] Big Data Platforms & Technologies [D0S06a] Setting the Scene The Data Science Process Supervised and Unsupervised Learning Introduction

  2. Overview: Setting the scene; Data scientists; Data quality; The analytics process model; Predictive versus descriptive analytics; Example applications

  3. Setting the scene

  4. Living in a data flooded world “DeepMind’s AI became a superhuman chess player in a few hours, just for fun. The descendant of DeepMind’s world champion Go program stretches its muscles in a new domain.” 2015 https://deepmind.com/blog/alphago-zero-learning-scratch/

  5. Living in a data flooded world 2017 https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/

  6. Living in a data flooded world 2019 https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/

  7. Living in a data flooded world http://www.nvidia.com/object/drive-px.html http://kevinhughes.ca/blog/tensor-kart

  8. Living in a data flooded world http://affinelayer.com/pixsrv/index.html

  9. Living in a data flooded world

  10. Living in a data flooded world, continued https://www.nature.com/articles/nature21056.epdf

  11. Living in a data flooded world, continued “An AI just beat top lawyers at their own game. A new study, conducted by legal AI platform LawGeex in consultation with law professors from Stanford University, Duke University School of Law, and University of Southern California, pitted twenty experienced lawyers against an AI trained to evaluate legal contracts. Competitors were given four hours to review five non-disclosure agreements (NDAs) and identify 30 legal issues, including arbitration, confidentiality of relationship, and indemnification. They were scored by how accurately they identified each issue. Unfortunately for humanity, we lost the competition, badly.” https://mashable.com/2018/02/26/ai-beats-humans-at-contracts/

  12. Living in a data flooded world, continued http://3dgan.csail.mit.edu/

  13. Living in a data flooded world, continued Image2Mesh: “Most of us take for granted the ability to effortlessly perceive our surrounding world and its objects in three dimensions. In general, we have great ideas about the 3D space only by looking at a single 2D image of an object, even when there are many possible shapes that could have produced the same image. We simply rely on assumptions and prior knowledge acquired throughout our lives for the inference. It is one of the fundamental goals of computer vision to give machines the ability to perceive their surroundings as we do, for the purpose of providing solutions to tasks such as self-driving cars, virtual and augmented reality, robotic surgery, to name a few.” https://arxiv.org/abs/1711.10669

  14. Living in a data flooded world, continued http://web.mit.edu/vondrick/tinyvideo/

  15. Living in a data flooded world, continued http://karpathy.github.io/2015/10/25/selfie/

  16. Living in a data flooded world, continued https://github.com/ipsingh06/ml-desnapify

  17. Living in a data flooded world, continued https://arxiv.org/abs/1701.04928

  18. Living in a data flooded world, continued https://www.theguardian.com/technology/2017/sep/07/new-artificial-intelligence-can-tell-whether-youre-gay-or-straight-from-a-photograph

  19. Living in a data flooded world, continued https://www.technologyreview.com/s/612775/algorithms-criminal-justice-ai/

  20. Living in a data flooded (real) world

  21. Data science and data scientists

  22. Data science: Data contains value and knowledge. But to extract this knowledge, you need to be able to store it, manage it, and analyze it. Terms often used interchangeably: Data Mining ≈ Big Data ≈ Data Analytics ≈ Data Science ≈ Knowledge Discovery ≈ Artificial Intelligence ≈ Deep Learning. Don't worry too much about this, and don't be too swayed by Venn diagrams or infographics.

  23. Data science: What even is this? https://vas3k.com/blog/machine_learning/?ref=hn

  24. Data scientists https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/big%20data%20the%20next%20frontier%20for%20innovation/mgi_big_data_exec_summary.ashx

  25. Data scientists “"I suspect AI today is like big data ten years ago" Exactly. Also as soon as big data came around nobody was doing just data, everyone was doing big data even if they had the same 10GB MySQL database they had from previous years. AI is a bit the same. Doing any analytics? - Now it's AI. Opening an excel spreadsheet and doing a curve fit: I am a data scientist doing AI. Doing any actual ML: not learning anymore but super deep learning.” (https://news.ycombinator.com/item?id=16366815) “If you want to command a multiyear, seven-figure salary, you used to have only four career options: chief executive officer, banker, celebrity entertainer, or pro athlete. Now there’s a fifth: artificial intelligence expert.” https://www.bloomberg.com/news/articles/2018-02-13/in-the-war-for-ai-talent-sky-high-salaries-are-the-weapons

  26. Data scientists https://www.techrepublic.com/article/why-data-scientist-is-the-most-promising-job-of-2019/ LinkedIn found... “Top careers in data science include core data scientist, researcher, and big data specialist.”

  27. Defining the data scientist. A data scientist should: have solid quantitative skills; be a good programmer; excel in communication and visualization skills; have a solid business understanding; be creative.

  28. What's analytics all about? Given lots of (possibly huge) data, discover patterns and models that are: Valid: they hold on new data with some certainty, i.e. generalizable (over time, seasonal effects, overfitting, sub-groups, regional differences…). Useful: it should be possible to act on them, i.e. actionable (business question, implementation, maintenance costs, ease-of-use…). Unexpected: non-obvious to the system, i.e. interesting (balance between trust and discovery, big and “weird” data). Understandable: humans should be able to interpret the pattern (black box vs. white box, trust, validity…).
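To make the "valid, i.e. generalizable" criterion concrete, here is a minimal sketch of a holdout check in plain Python. Everything in it is an illustrative assumption rather than part of the slides: the synthetic data (a label that follows `x > 0.5` with 20% of labels flipped), the hypothetical helpers `make_data` and `accuracy`, and the two toy models. A model that memorizes its training points looks perfect on the data it has seen and falls apart on new data, while a simple rule scores about the same on both.

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic points: x in [0, 1), label is (x > 0.5), flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < noise:
            y = not y  # label noise: the pattern is real but not perfect
        data.append((x, y))
    return data


def accuracy(predict, data):
    """Fraction of points for which the model's prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(2000, rng)
test = make_data(2000, rng)  # fresh data the models have never seen

# Model A: memorizes every training point and falls back to the
# majority training label for anything it has not seen before.
memory = dict(train)
majority = 2 * sum(y for _, y in train) >= len(train)


def memorizer(x):
    return memory.get(x, majority)


# Model B: a simple rule that captures the underlying pattern.
def threshold_rule(x):
    return x > 0.5


results = {
    "memorizer": (accuracy(memorizer, train), accuracy(memorizer, test)),
    "threshold": (accuracy(threshold_rule, train), accuracy(threshold_rule, test)),
}

for name, (train_acc, test_acc) in results.items():
    print(f"{name:10s} train={train_acc:.3f} test={test_acc:.3f}")
```

The gap between training and test accuracy is the signal to watch: a large gap suggests the pattern will not hold on new data, however good the training score looks.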

  29. Valid, generalizable https://www.gwern.net/Tanks “RL agent in Udacity self-driving car rewarded for speed learns to spin in circles” (https://twitter.com/mat_kelcey/status/886101319559335936) “NASA Mars mission planning, optimizing food/water/electricity consumption for total man-days survival, yields an optimal plan of killing 2/3 crew & keep survivor alive as long as possible”

  30. Useful, actionable: We can predict who will churn, and then what?

  31. Unexpected, interesting https://www.fastcompany.com/3063110/the-rise-of-weird-data Examples: optimizing right turns for UPS drivers; typing with proper capitalization indicates creditworthiness; users of the Chrome and Firefox browsers make better employees. But: not everything that is unexpected is interesting, or valid.

  32. Understandable "Why does your model predict fraud?" "Which attributes of the customer are important?" "If age goes up you're more at risk?" "I don't understand interaction effects" "What do you mean 'just trust us'?" ...

  33. Ethical? Models become increasingly complex… but they also rule many aspects of our lives, from credit scoring to employment, all the way down to predicting recidivism. People are becoming increasingly aware of what is being done with their data, and are becoming more protective of their privacy and of their right to challenge a model’s conclusion. The White House released a statement regarding the promises and dangers of analytics: “Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights”, and there are many more examples.

  34. Ethical?

  35. Ethical?

  36. The analytics process model

  37. CRISP-DM: Cross Industry Standard Process for Data Mining

  38. Others. SEMMA: Sample, Explore, Modify, Model, and Assess https://en.wikipedia.org/wiki/SEMMA The drivetrain approach: Jeremy Howard, Margit Zwemer and Mike Loukides https://www.oreilly.com/ideas/drivetrain-approach-data-products

  39. Challenges

  40. Challenges: Mapping the business question to a technique / setup (there is no one-size-fits-all). (Not) realizing the amount of effort required in pre-processing. Low amount of training data, either instances or features. Or too many features… Huge data imbalance, or not even labeled data. Quality of data, noise. Predicting the future is hard (who’d have thought!): many models extrapolate poorly towards the future (machines are naïve and lazy). Incorporating domain knowledge, explaining models. A strong validation / backtesting setup requires time and enough data. Organizational aspects, teams, management.
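The "huge data imbalance" challenge can be illustrated with a small sketch. The numbers below are assumptions chosen for illustration (10 positive cases among 1,000, for example fraud or churn), and `accuracy` and `recall` are hypothetical helper names, not taken from the slides. A "model" that always predicts the majority class scores a high accuracy while detecting none of the cases of interest, which is why accuracy alone can be misleading on imbalanced data.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model actually flags."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Assumed toy data set: 10 positives (1) among 1,000 instances.
y_true = [1] * 10 + [0] * 990

# A trivial baseline that never predicts the positive class.
always_negative = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)
baseline_recall = recall(y_true, always_negative)

print(f"accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")
```

Comparing any real model against such a trivial baseline, and reporting a metric that focuses on the rare class, is one simple way to keep an imbalanced evaluation honest.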
