www.bigbang-datascience.com
Agenda
BBDS Team
Data Science Portfolio
Data Science Process
Data Explosion
Why Data Science?
Career in Data Science
What is Data Science?
Machine Learning
Type of Analytics
Data Types
BBDS 12 Weeks Program & BBDS Programs
BBDS Team

Dr. Ying Xie
Tenured full professor and PhD advisor with intensive research and industrial experience in machine learning and deep learning
Currently serves as the Director of the Equifax Data Science Research Lab @KSU
Previously the chief scientist at Araicom Life Science
Has worked with numerous companies, such as LexisNexis and Emerson Climate Technology, on collaborative research

Shan Nabi
14+ years of experience in IT
7+ years of experience as an SAP consultant in Cloud Services
1+ years of experience in Data Science and related technologies
Master's degree (MS-IT)

Mo Medwani
12+ years of experience in IT (Service Delivery Management)
7+ years of experience in Data Analytics
3+ years of experience in Data Science and related technologies
3 Master's degrees (MBA, MS-IT & MS-Data Science)
Founder of Big Bang Data Science Solutions

Edward Bujak
25 years of experience in IT
18 years of experience in education: computer science, mathematics, engineering
2 Master's degrees (MS-Electrical Engineering & Computer Science, MS-Education)
Data Explosion
Some interesting facts about Data
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 PB of data.
Twitter generates 12 TB of data every day.
An Airbus A380 generates 10 TB every 30 minutes of flight.
NYSE generates a TB of data every month.
What do we do with so much data? Ignore it or use it.
How much data is getting generated?
The model has changed
Old Model: only a few companies (like news outlets) generated data; everyone else consumed it.
New Model: all of us generate data, and all of us consume data.
Opportunities for New Approach to Analytics
Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
In 2020, the world will generate 50x more data than it generated in 2011.
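The two headline figures ("2.5 quintillion bytes" and "2.5 exabytes, or 2.5 billion gigabytes") describe the same quantity. A minimal sketch of the arithmetic, assuming decimal (SI) units; the variable names are illustrative only:

```python
# Decimal (SI) byte units: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

# "2.5 quintillion bytes" per day, written as an exact integer.
daily_bytes = 25 * 10**17

daily_exabytes = daily_bytes / BYTES_PER_EB     # exabytes per day
daily_gigabytes = daily_bytes // BYTES_PER_GB   # gigabytes per day

# Airbus A380 figure: 10 TB per 30 minutes of flight, scaled to one hour.
a380_tb_per_hour = 10 * (60 // 30)

print(f"{daily_exabytes} EB/day = {daily_gigabytes:,} GB/day")
print(f"A380: {a380_tb_per_hour} TB per flight hour")
```

With binary units (1 GiB = 2**30 bytes) the numbers would differ slightly, so the unit convention is worth stating whenever such figures are quoted.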
Data: The Most Valuable Resource
“Data is the new oil.” - Clive Humby, CNBC
“In its raw form, oil has little value. Once processed and refined, it helps power the world.” - Ann Winblad
What is Data Science?
Data Science – A Definition
A decade after the term "data science" was first used, practitioners and academics still debate what it means.
Data science is the discipline that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
“The ability to take data - to be able to understand it, process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades.” - Hal Varian, Google’s Chief Economist
Data Science – A Definition
Source: Big Data University
Data Science – A Visual Definition
Multidisciplinary: from my perspective, a data scientist has a blend of many skills.
Statistics quantifies numbers
Data Mining explains patterns
Machine Learning predicts with models
Artificial Intelligence behaves and reasons
Data Science – A Definition
Data Science is a “concept of unifying statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data. - IBM
Data science overlaps with
Computer science: computational complexity, Internet topology and graph theory, distributed architectures such as Hadoop, data plumbing, data compression, computer programming (Python, Perl, R), and processing sensor and streaming data
Statistics: design of experiments including multivariate testing, cross-validation, stochastic processes, sampling…
Machine learning and data mining
Operations research: data science encompasses most of operations research as well as any techniques aimed at optimizing decisions based on analyzing data
Business intelligence: every BI aspect of designing/creating/identifying great metrics and KPIs, creating database schemas (be it NoSQL or not), dashboard design and visuals, and data-driven strategies to optimize decisions and ROI is data science
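Cross-validation, listed under Statistics, is one place where the statistical and programming sides meet. The sketch below shows one common variant, k-fold splitting, in plain Python. The function name `k_fold_indices` is hypothetical, and the split is sequential (no shuffling), which is an assumption made to keep the example short:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample lands in exactly one test fold.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data would usually be shuffled first, and a model would be fitted on each `train` set and scored on the matching `test` set.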
DS vs. Analytics Disciplines
Why Data Science?
Why Data Science?
Harvard Business Review: data scientist is the sexiest job of the 21st century.
LinkedIn: Statistical Analysis & Data Mining were the hottest skills that got recruiters’ attention in 2014.
Glassdoor ranked data scientist as the #1 job to pursue in 2016.
McKinsey: the US alone faces a shortage of 150,000+ data analysts and an additional 1.5 million data-savvy managers.
Salary trends have followed the impact of data science. With a national average salary of $118,000 (which increases to $126,000 in Silicon Valley), data science has become a lucrative career path where you can solve hard problems and drive social impact.
“Data Science”: an Emerging Field
“The future belongs to the companies and people that turn data into products.” - O’Reilly Radar report, 2011
Goal of Data Science: turn data into data products.
Types of Analytics
There are four distinct types of Analytics
Descriptive: explains what has happened
Diagnostic: suggests why it happened
Predictive: indicates what could happen
Prescriptive: recommends what should happen
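The four types are commonly called descriptive, diagnostic, predictive, and prescriptive analytics. A minimal sketch of how each question might be asked of the same data follows. The numbers, the `margin` parameter, and all function names are made up for illustration, and a simple least-squares line stands in for whatever model a real project would use:

```python
from statistics import mean

# Hypothetical toy data: monthly ad spend and sales (invented numbers).
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [110, 118, 131, 142, 150, 172]


def describe(values):
    """Descriptive: summarize what has happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def pearson(xs, ys):
    """Diagnostic: how strongly do two series move together?"""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Predictive: least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def recommend(options, slope, intercept, margin=0.3):
    """Prescriptive: pick the spend with the highest predicted profit."""
    return max(options, key=lambda s: margin * (slope * s + intercept) - s)


summary = describe(sales)                   # what has happened
r = pearson(ad_spend, sales)                # why it (may have) happened
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 30 + intercept           # what could happen at spend = 30
best_spend = recommend([20, 25, 30], slope, intercept)  # what should happen

print(summary, round(r, 3), round(forecast, 1), best_spend)
```

A high correlation only suggests an explanation; it does not prove that ad spend caused the sales, which is why the source describes this step as "suggests why".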
There are several areas of Analytics
Customer Analytics is a process that helps organizations make critical decisions and deliver options that customers anticipate.
Example: telecom companies these days use different marketing methods to retain their customers.

Financial Analytics helps financial executives explore different ways to answer specific finance-related business questions and forecast future financial situations.
Example: reading cash flow statements, balance sheets, and income statements comes under financial analytics.

Performance Analytics is the practice of using data and technology to study how your business is performing in order to continuously improve it.
Example: in HR management, employee performance is monitored on a regular basis against the expected outcome.

Risk Analytics foresees the uncertainties of the predicted future, which helps evaluate a project’s success or failure.
Example: in the banking industry, credit scores are built to predict an individual’s delinquency behavior and are used to represent the creditworthiness of each individual.
Data Science Portfolio
Data Scientist Profile (Competencies)
1. Quantitative skills, such as mathematics or statistics.
2. Technical aptitude, such as software engineering, machine learning, and programming skills.
3. Skeptical: this may be a counterintuitive trait, but it is important that data scientists can examine their work critically rather than in a one-sided way.
4. Curious & Creative: data scientists must be passionate about data and finding creative ways to solve problems and portray information.
5. Communicative & Collaborative: it is not enough to have strong quantitative skills or engineering skills. To make a project resonate, you must be able to articulate the business value in a clear way, and work collaboratively with project sponsors and key stakeholders.
Data Science Is a Team Sport
“Citizen Data Scientist”?
Market trends indicate the emergence of the “Citizen Data Scientist”.
Different Data Science Roles
Before we dive into what skills you need to become a data scientist, you should be aware that there are different roles in data science.

Role: Skills
Citizen Data Scientist: no R/Python coding background, but some statistical and analytical experience
Data Scientists: rely on their training in statistics and mathematical modeling
Business Analysts: rely more heavily on their analytical skills and domain expertise
Data Engineers: rely mostly on software engineering skills