CS 215: Data Interpretation and Analysis Fall 2017 Instructors: Ajit Rajwade & Suyash Awate

Where all do you analyze and interpret data?
(1) In Medicine: Examples
• Pathology reports
• Epidemiology studies
https://ethnomed.org/clinical/tuberculosis/firland/epidemiology-of-tb

Where all do you analyze and interpret data?
(2) In Sports
• Tournament data
• Player data
• Questions like: Which is the best team? Which is the best batsman? Which is the best batsman in a given age group?
http://i.dawn.com/primary/2015/02/54d32f884dfd0.jpg?r=1999182479

Where all do you analyze and interpret data?
(3) In Economics and Finance:
• Country-wise data

Gross Domestic Product (GDP) is the broadest quantitative measure of a nation's total economic activity. More specifically, GDP represents the monetary value of all goods and services produced within a nation's geographic borders over a specified period of time.
http://www.investinganswers.com/financial-dictionary/economics/gross-domestic-product-gdp-1223

List by the International Monetary Fund (2014)
Rank  Country/Region   GDP (Millions of US$)
1     United States    17,418,925
2     China            10,380,380
3     Japan             4,616,335
4     Germany           3,859,547
5     United Kingdom    2,945,146
6     France            2,846,889
7     Brazil            2,353,025
8     Italy             2,147,952
9     India             2,049,501
10    Russia            1,857,461
11    Canada            1,788,717
12    Australia         1,444,189
13    South Korea       1,416,949
14    Spain             1,406,855
15    Mexico            1,282,725
16    Indonesia           888,648
17    Netherlands         866,354
18    Turkey              806,108
19    Saudi Arabia        752,459
20    Switzerland         712,050

Where all do you analyze and interpret data?
(3) In Economics and Finance:
• Country-wise data
http://ihds.umd.edu/IHDS_files/02HDinIndia.pdf

Where all do you analyze and interpret data?
(3) In Economics and Finance:
• Region-wise data within a country

GDP of Indian states and union territories in 2014–15 (legend):
• over ₹ 14 lakh crore (US$220 billion)
• ₹ 10 lakh crore (US$160 billion) to ₹ 14 lakh crore (US$220 billion)
• ₹ 8 lakh crore (US$120 billion) to ₹ 10 lakh crore (US$160 billion)
• ₹ 6 lakh crore (US$93 billion) to ₹ 8 lakh crore (US$120 billion)
• ₹ 4 lakh crore (US$62 billion) to ₹ 6 lakh crore (US$93 billion)
• ₹ 2 lakh crore (US$31 billion) to ₹ 4 lakh crore (US$62 billion)
• ₹ 1 lakh crore (US$16 billion) to ₹ 2 lakh crore (US$31 billion)
• ₹ 0.5 lakh crore (US$7.8 billion) to ₹ 1 lakh crore (US$16 billion)
• ₹ 0.25 lakh crore (US$3.9 billion) to ₹ 0.5 lakh crore (US$7.8 billion)
• less than ₹ 0.25 lakh crore (US$3.9 billion)
Source: Wikipedia article

Where all do you analyze and interpret data?
(5) In many other fields:
• Weather forecasting
• Psephology
• Stock markets
• Industrial testing
• Market research (e.g. in industry and storehouses)

So what’s this course all about?
• Sounds like everything under the sun!
http://www.clipartpanda.com/clipart_images/clipart-sun-rays-clipart-1587813

What’s this course all about?
• A beginning course on probability and statistics
• A very useful base for future courses in machine learning, data mining, statistics, image processing and computer vision

What’s this course all about? Three sections:
• Data analysis: the process of gathering, displaying/visualizing and summarizing the data
• Probability: the “chance” that something happens
• Statistical inference: the science of drawing precise inferences from the gathered data, using tools from probability

Example in Toxicology
• Imagine I invent two new medicines to reduce blood pressure (BP).
• I test the two medicines on two groups of rats, A and B, respectively.
• I then periodically measure the BP of the rats in groups A and B.
• And I seek to determine which medicine is “better”.

Example in Toxicology: Data Analysis
• What should be the size of A and B?
• How should I pick the members of A and B? Example: can A be all males and B all females? Can A be all white rats and B all black rats?
• Once I acquire the BP measurements, how do I display them succinctly? How do I compute averages?

Example in Toxicology: Data Interpretation (or Statistical Inference)
• Let’s say the average BP of group A was much lower than that of group B after administering the two drugs.
• Does this mean the first medicine is more effective?
• Or was this just a matter of chance? (Example: if I flip an unbiased coin 50 times, I could end up with 30 heads just by chance!)
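To put a number on the coin-flip remark, the following minimal sketch computes the probability of seeing at least 30 heads in 50 flips of an unbiased coin from the binomial distribution. The helper name `prob_at_least` is introduced here only for illustration and is not part of the course material.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) when X counts heads in n independent flips with P(head) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of 30 or more heads in 50 flips of a fair coin
p30 = prob_at_least(30, 50)
print(f"P(at least 30 heads in 50 fair flips) = {p30:.4f}")
```

The result is roughly one in ten, so such an outcome is not rare enough to cast doubt on the coin.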

One more example
• Suppose your friend performs 10,000 independent tosses of an unbiased coin.
• Your friend reports 5,200 heads.
• Is your friend serious or joking?
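One way to approach this question, sketched here under the assumption that the tosses really are independent and fair, is to compare the reported count with the expected count in units of the standard deviation, and to compute the exact binomial probability of a result at least this extreme. The variable names are illustrative.

```python
from math import comb, sqrt

n, heads, p = 10_000, 5_200, 0.5

mean = n * p                     # expected number of heads
sd = sqrt(n * p * (1 - p))       # standard deviation of the number of heads
z = (heads - mean) / sd          # distance from the mean in standard deviations

# Exact P(X >= heads) for a fair coin; Python integers keep the sum exact
tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

print(f"mean = {mean:.0f}, sd = {sd:.0f}, z = {z:.1f}")
print(f"P(at least {heads} heads) = {tail:.2e}")
```

The reported count lies four standard deviations above the mean, and a fair coin produces a count this high only a few times in 100,000 such experiments, so some scepticism is reasonable.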

Course Information
• Instructors: Ajit Rajwade (first half) and Suyash Awate (second half)
• Lecture venue: CDEEP EEG 401 (GG Building, 4th Floor). Timings: Slot 10, Tue and Fri, 2:00 to 3:25 pm (i.e. post lunch and strong coffee :-) ). The class will be broadcast live to IIT Goa.
• Course webpage (for the first half): http://www.cse.iitb.ac.in/~ajitvr/CS215_Fall2017/

Descriptive Statistics Fall 2017 Instructor: Ajit Rajwade

Topic Overview
• Some important terminology
• Methods of data representation: frequency tables, graphs, pie-charts, scatter-plots
• Data mean, median, mode, quantiles
• Chebyshev’s inequality
• Correlation coefficient

Terminology
• Population: the collection of all elements which we wish to study. Example: data about the occurrence of tuberculosis all over the world.
• In this case, “population” refers to the set of people in the entire world.
• The population is often too large to examine/study.
• So we study a subset of the population, called a sample.
• In an experiment, we collect values for attributes of each member of the sample. Each member is also called a sample point.
• An example of a relevant attribute in the tuberculosis study would be whether or not the patient yielded a positive result on the serum TB Gold test.
• See http://www.who.int/tb/publications/global_report/en/ for more information.

Terminology
• Discrete data: data whose values are restricted to a finite set. E.g. letter grades at IITB, genders, marital status (single, married, divorced), income brackets in India for tax purposes.
• Continuous data: data whose values belong to an uncountably infinite set. E.g. a person’s height, the temperature of a place, the speed of a car at a time instant.

Methods of Data Representation/Visualization

Frequency Tables
• For discrete data having a relatively small number of values, one can use a frequency table.
• Each row of the table lists a data value followed by the number of sample points with that value (the frequency of that value).
• The values need not always be numeric!

The definition of an ideal course (per student perspective) at IITB ;-)
Grade  Number of students
AA     100
AB     0
BB     0
BC     0
CC     0
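As a small illustration, a frequency table for non-numeric values can be built by counting how often each value occurs in the sample. The grade list below is made-up sample data, not taken from the course.

```python
from collections import Counter

# Hypothetical sample: one letter grade per student
grades = ["AA", "BB", "BC", "BB", "CC", "AB", "BC", "BC", "BB", "AA"]

# Counter maps each distinct value to its frequency
freq = Counter(grades)

print("Grade  Number of students")
for grade in ["AA", "AB", "BB", "BC", "CC"]:
    print(f"{grade:<6} {freq[grade]}")
```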

Frequency Tables
• The frequency table can be visualized using a line graph, a bar graph or a frequency polygon.

Grade  Number of students
AA     5
AB     10
BB     30
BC     35
CC     20

A bar graph plots the distinct data values on the X axis and their frequency on the Y axis by means of the height of a thick vertical bar!
[Figure: bar graph. X axis: Marks; Y axis: Number of students]

A line diagram plots the distinct data values on the X axis and their frequency on the Y axis by means of the height of a vertical line!
[Figure: line diagram of the same grade data. X axis: Marks; Y axis: Number of students]

A frequency polygon plots the frequency of each data value on the Y axis, and connects consecutive plotted points by means of a line.
[Figure: frequency polygon of the same grade data. X axis: Marks; Y axis: Number of students]

Relative frequency tables
• Sometimes the actual frequencies are not important.
• We may be interested only in the percentage or fraction of those frequencies for each data value, i.e. the relative frequencies.

Grade  Fraction of number of students
AA     0.05
AB     0.10
BB     0.30
BC     0.35
CC     0.20
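Relative frequencies follow from the raw frequencies by dividing each count by the total number of sample points. The sketch below uses the grade counts 5, 10, 30, 35 and 20 from the frequency-table slides; the variable names are illustrative.

```python
# Frequencies from the grade table (number of students per grade)
freq = {"AA": 5, "AB": 10, "BB": 30, "BC": 35, "CC": 20}

total = sum(freq.values())

# Relative frequency = frequency / total number of sample points
rel_freq = {grade: count / total for grade, count in freq.items()}

print("Grade  Fraction of number of students")
for grade, fraction in rel_freq.items():
    print(f"{grade:<6} {fraction:.2f}")
```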

Pie charts
• For a small number of distinct data values which are non-numerical, one can use a pie-chart (it can also be used for numerical values).
• It consists of a circle divided into sectors, one for each data value.
• The area of each sector is proportional to the relative frequency of that data value.
[Figure: population of native English speakers. Source: https://en.wikipedia.org/wiki/Pie_chart]
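Since the area of a sector grows in proportion to its central angle, the angle for each data value is 360 degrees times its relative frequency. A minimal sketch, reusing the grade counts from the frequency-table slides as example data:

```python
# Frequencies from the grade table (number of students per grade)
freq = {"AA": 5, "AB": 10, "BB": 30, "BC": 35, "CC": 20}

total = sum(freq.values())

# Central angle of each sector, in degrees: 360 * relative frequency
angles = {grade: 360 * count / total for grade, count in freq.items()}

for grade, angle in angles.items():
    print(f"{grade}: {angle:.0f} degrees")
```

The angles add up to 360 degrees, so the sectors exactly fill the circle.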

Pie charts can be confusing
• A big no-no with too many categories.
http://stephenturbek.com/articles/2009/06/better-charts-from-simple-questions.html

Dealing with continuous data
• Often the data can take continuous values (e.g. the temperature of a place at a time instant, the speed of a car at a given time instant, the weight or height of an animal, etc.).
• In such cases, the range of data values is divided into intervals called bins.
• The frequency now refers to the number of sample points falling into each bin.
• The bins are often taken to be of equal length, though that is not strictly necessary.
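The binning step can be sketched as follows, assuming equal-length bins that are closed on the left and open on the right. The function `bin_counts` and the height values are hypothetical and serve only as an illustration.

```python
def bin_counts(values, low, width, num_bins):
    """Count how many values fall into each bin [low + i*width, low + (i+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        idx = int((v - low) // width)   # index of the bin containing v
        if 0 <= idx < num_bins:         # values outside all bins are ignored
            counts[idx] += 1
    return counts

# Hypothetical sample of heights in centimetres
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8, 169.9, 170.0, 172.5, 178.1, 184.6]

counts = bin_counts(heights_cm, low=150.0, width=10.0, num_bins=4)
for i, c in enumerate(counts):
    lo = 150.0 + 10.0 * i
    print(f"[{lo:.0f}, {lo + 10.0:.0f}): {c}")
```

With this convention a value that lies exactly on a boundary, such as 160.0, is counted in the bin that starts at that boundary.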
