
Syllabus: BDA17 Syllabus version C



  1. Syllabus Syllabus link

  2. • 1 syllabus BDA17 Syllabus version C • 2 Syllabus supplement about memos Memorandum format for assignments • 3 Overheads: Microsoft intro BDA day 1 Microsoft presentation • 4 Overheads: some sample problems BDA examples

  3. • 1 Main course Web site https://irgn452.wordpress.com • 2 Handouts page https://irgn452.wordpress.com/irgn452-big-data-analytics/handouts/ • 3 TritonED page • 5 Udacity course page https://classroom.udacity.com/courses/ud651/lessons/755618712/concepts/8140985970923

  4. • 1 Main text: DMBA-R, Shmueli, with R, Wiley (Bohn) • 2 Rattle text http://link.springer.com/book/10.1007/978-1-4419-9890-3 • 3 ISLR book http://link.springer.com/book/10.1007/978-1-4614-7138-7 • 4 Use R! series http://link.springer.com/search?facet-series=%226991%22&facet-content-type=%22Book%22 • 5 R for Stata users http://link.springer.com/book/10.1007/978-1-4419-1318-0 • 6 R in Action, Second Edition • 7 Library page for O’Reilly books http://ucsd.worldcat.org/title/r-cookbook/oclc/733755354?referer=br&ht=edition • Udacity course https://classroom.udacity.com/courses/ud651/lessons/755298985/concepts/8651687310923#

  5. • From Week 3 onward, all homework must be done using R. • Attend the weekly TA tutorials, which will cover both R and data analytics. These tutorials are recommended for all students. • Take a special evening R test in week 10. The test is open book, open computer. • Submit all code for your final project as a loadable R workspace.
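A minimal sketch of what a "loadable R workspace" could look like, using only base R. The object names (`sales`, `fit`) and the file name `final_project.RData` are hypothetical placeholders, not course requirements; `save.image()` is an alternative if the whole global environment should be stored.

```r
# Hypothetical project objects (placeholders, not course data)
sales <- data.frame(month = 1:3, revenue = c(10.5, 12.1, 9.8))
fit <- lm(revenue ~ month, data = sales)

# Save the named objects to a workspace file
# (file name is an assumption; a temp directory keeps the sketch side-effect free)
workspace_file <- file.path(tempdir(), "final_project.RData")
save(sales, fit, file = workspace_file)

# Check that the file loads cleanly into a fresh environment,
# which is roughly what a grader would do with load()
check_env <- new.env()
loaded_names <- load(workspace_file, envir = check_env)
print(loaded_names)
```

Loading into a fresh environment before submitting is a quick way to confirm that every object the code depends on was actually saved.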


  7. • Go through Udacity course, up to ggplot • DMBA Ch. 3: Can you reproduce pp 60 and 61? • Use ggplot2. • Reach Ch. 2
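As a starting point for the ggplot2 item above, here is a minimal sketch of a scatter plot. It assumes the ggplot2 package is installed and uses R's built-in `mtcars` data as a stand-in; it does not reproduce the figures on the DMBA pages mentioned.

```r
library(ggplot2)

# Scatter plot of fuel economy against weight, using the built-in mtcars data
# (a stand-in dataset, not the one used in the textbook)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "mtcars: mpg vs. weight")

# Printing the plot object renders it on the active graphics device
print(p)
```

The same pattern (data, aesthetic mapping, one or more geoms) carries over to other chart types by swapping `geom_point()` for, say, `geom_histogram()` or `geom_boxplot()`.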

  8. BIG DATA

  9. Traditional BI: What happened? Why did it happen? • Advanced Analytics: What will happen? How can we make it happen?

  10. [Diagram labels: Models Data Integration Visualization Mashups Predictions Applications Uncertainty Problems Effective Data Sources Data Credibility Applications] Drew Conway http://www.dataists.com/2010/09/the-data-science-venn-diagram/

  11. • Exploratory data analysis • Time-to-event models • GAM survival models CUSTOM VARIABLES CUSTOM DATA FORMAT (PMML) • ETL • Scoring for inference • Marketing channel data • Scoring for prediction • Behavioral variables • 5 billion scores per day • Promotional data per retailer • Overlay data

  12. Trends

  13. “I don't think the web would exist without open source and Linux. So there would have been no Google.” (Chris DiBona, Google) • 1996: 10x 4Gb Hard Drives • 2000: 5000 Linux PCs • Today: > 2 billion servers (estimated) • Image: http://commons.wikimedia.org/wiki/File:Google%E2%80%99s_First_Production_Server.jpg CC-BY-2.0

  14. • Cost Reduction (freedom to use / redistribute) • Time-to-market (freedom to share) • Innovation (freedom to tinker)

  15. • Most widely used data analysis software • Most powerful statistical programming language • Create beautiful and unique data visualizations • Thriving open-source community • Fills the talent gap www.revolutionanalytics.com/what-is-r

  16. New York Times, June 25 2009 (3 hours after Michael Jackson’s death)

  17. • TrueSkill Matchmaking System • Player Churn • Game design • In-game purchase optimization • Fraud detection • Player communities

  18. • R ≃ Stata • Across all fields • In economics, Stata dominates (not shown)

  19. Language Popularity • IEEE Spectrum Top Programming Languages: #9: R (IEEE Spectrum, July 2014) • Rexer Data Miner Survey • Source: blog.revolutionanalytics.com/popularity

  20. THE FUTURE: CLOUD DATA ANALYTICS SERVICES

  21. • Exposing the expertise of data scientists as APIs • Bringing the utility of data science to applications • Addressing the Data Science talent gap

  22. Azure: Huge infrastructure scale. 19 regions online…huge datacenter capacity around the world…and we’re growing § 100+ datacenters § One of the top 3 networks in the world (coverage, speed, connections) § 2x AWS and 6x Google number of offered regions § G Series: largest VM available in the market (32 cores, 448GB RAM, SSD…) § Regions on the map (legend: Announced, Operational): North Europe (Ireland), West Europe (Netherlands), Central US (Iowa), North Central US (Illinois), East US (Virginia), East US 2 (Virginia), West US (California), South Central US (Texas), US Gov (Iowa), US Gov (Virginia), Brazil South (Sao Paulo), East Asia (Hong Kong), SE Asia (Singapore), Japan East (Saitama), Japan West (Osaka), China North* (Beijing), China South* (Shanghai), India East (TBD), India West (TBD), Australia East (Sydney), Australia West (Melbourne) § * Operated by 21Vianet

  23. SQL Server 2016: Built-in in-database analytics • Example Solutions: Fraud detection, Sales forecasting, Warehouse efficiency, Predictive maintenance • Extensibility: R Integration, New R scripts, Analytic Library, Microsoft Azure Machine Learning Marketplace • Data Scientist: Interact directly with data • Data Developer/DBA: Manage data and analytics together • T-SQL Interface, Relational Data • Built-in to SQL Server

  24. [Chart comparing "R on a server pulling data via SQL" with "R on a server invoking RRE ScaleR inside the EDW"; axes: minutes, rows]
