COS868 - Probabilidade e Estatística para Aprendizado de Máquina (Probability and Statistics for Machine Learning)
Rosa M. M. Leão
First quarter of 2020
UFRJ - COPPE, Programa de Engenharia de Sistemas e Computação
Rosa Leão - 2020

What are probability and statistics? Why are they important?

What is probability?
Definition: probability is the study of the mathematical rules that govern random events.
What is randomness? Informally, a random event is an event whose outcome we cannot know without observing it.
Probability gives us information about these events.

What is statistics?
Definition: statistics is the science that defines how to collect and analyze random data.
Statistics is used to:
- design experiments;
- explore and analyze data (descriptive statistics);
- make inferences from the collected data (inferential statistics).

Experimental Design
The design of an experiment is crucial to making sure the collected data is useful. The adage 'garbage in, garbage out' applies here. A poorly designed experiment will produce poor-quality data, from which it may be impossible to draw useful, valid inferences.

Descriptive Statistics
Descriptive statistics include summary statistics such as the mean, median, and interquartile range. We can also visualize the data using graphical devices such as histograms, scatter plots, and the empirical CDF. These methods are useful both for communicating the data and for exploring it to gain insight into its structure, such as whether it might follow a familiar probability distribution.
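The summary statistics named above can be sketched in a few lines of Python using only the standard library. The sample values are hypothetical, invented purely for illustration, and `empirical_cdf` is a helper name introduced here, not something from the slides.

```python
import statistics

# Hypothetical sample, invented purely for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

mean = statistics.mean(data)
median = statistics.median(data)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range


def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(v <= x for v in sample) / len(sample)


print(mean, median, iqr, empirical_cdf(data, 3.5))
```

Note that different quartile conventions exist, so the interquartile range can differ slightly between tools.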

Inferential Statistics
The goal is to draw inferences about the world. Often this takes the form of specifying a statistical model for the random process by which the data arises, and then drawing inferences about the model parameters. For example, assuming gestational length follows a N(μ, σ) distribution, we'll use the data of the gestational lengths of, say, 500 pregnancies to draw inferences about the values of the parameters μ and σ.

Examples
Explosion of digital data: sensor signals, user behaviour, network measures, public records, social network data, surveillance tapes.
Sophisticated mathematical models turn this data into meaningful patterns and insights.
Applications: performance evaluation, reliability, recommendation.

Application Examples
Network planning: collected data on user demand and installed capacity feed the planning of the network.
Recommender systems: suggestions of products, films, music, and friends; suggestions of topics to study, exercises, and other classes.

Application Examples
Planning the onboard computer system of an aircraft: the requirement that the probability of system failure be below 10^-12 drives the hardware/software architecture of the system.
Inference systems: from patients' clinical data and the medications used, infer which population is most likely to have a certain type of disease and which type of medication is most effective in a given population.

Example of Study
To study the effectiveness of a new treatment for cancer, patients are recruited and then divided into an experimental group and a control group. The experimental group is given the new treatment and the control group receives the current standard of care. Data collected from the patients might include demographic information, medical history, initial state of cancer, progression of the cancer over time, treatment cost, the effect of the treatment on tumor size, longevity, and quality of life. The data will be used to make inferences about the effectiveness of the new treatment compared to the current standard of care.

Example of Study
Experimental design: it must specify the size of the study, who will be eligible to join, how the experimental and control groups will be chosen, how the treatments will be administered, whether or not the subjects or doctors know who is getting which treatment, and precisely what data will be collected, among other things.
Descriptive statistics: the goal is to obtain summary statistics for both groups.
Inferential statistics: for example, we want to test the hypothesis that the new treatment is more (or less) effective than the current one(s), and by how much.

How large should the sample be? How do we estimate the error?

What is a statistic?
Consider the data of 1000 rolls of a die. All of the following are statistics: the sample average of the 1000 rolls; the number of times a 6 was rolled; the sample variance of the 1000 rolls.
The probability of rolling a 6 is not a statistic, whether or not the die is truly fair. This probability is a property of the die (and the way we roll it) which we can estimate using the data. Such an estimate is given by the statistic 'proportion of the rolls that were 6'.
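As a rough illustration, the statistics listed above can be computed from simulated rolls. The simulation assumes a fair die and an arbitrary seed. Both are choices made here, not part of the slides.

```python
import random
import statistics

# Simulated data: 1000 rolls of a die assumed to be fair (arbitrary seed).
rng = random.Random(868)
rolls = [rng.randint(1, 6) for _ in range(1000)]

# Each of these is a statistic: it is computed from the data alone.
sample_average = statistics.mean(rolls)
count_sixes = rolls.count(6)
sample_variance = statistics.variance(rolls)

# This statistic estimates P(rolling a 6), which is a property of the die,
# not of the data.
proportion_sixes = count_sixes / len(rolls)

print(sample_average, count_sixes, sample_variance, proportion_sixes)
```

With a different seed the four numbers change, while the underlying probability of rolling a 6 does not.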

Screening for a disease
Suppose a screening test for a disease has a 1% false positive rate and a 1% false negative rate. Suppose also that the rate of the disease in the population is 0.002. Finally, suppose a randomly selected person tests positive. Then we have:
Hypothesis: H = 'the person has the disease'
Data: D = 'the test is positive'
What we want to know: P(H|D) = P(the person has the disease | a positive test)

Screening for a disease
P(hypothesis | data) = P(the person has the disease | a positive test)
P(H|D) = P(D|H) P(H) / P(D), where P(D) = P(D|H) P(H) + P(D|H^c) P(H^c) and H^c = 'the person does not have the disease'.
The term P(D|H) P(H) = 0.99 * 0.002 accounts for true positives (the person has the disease) and the term P(D|H^c) P(H^c) = 0.01 * 0.998 accounts for false positives (the person does not have the disease), so
P(H|D) = (0.99 * 0.002) / (0.99 * 0.002 + 0.01 * 0.998) = 0.166
Before the test we would have said the probability the person had the disease was 0.002. After the test we see the probability is 0.166. That is, the positive test provides some evidence that the person has the disease.
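The calculation above can be checked with a short sketch. The function name `posterior` and its parameter names are introduced here for illustration only.

```python
def posterior(prior, false_positive_rate, false_negative_rate):
    """P(disease | positive test) by Bayes' rule."""
    # P(D|H): probability a diseased person tests positive.
    sensitivity = 1.0 - false_negative_rate
    # Law of total probability: P(D) = P(D|H)P(H) + P(D|H^c)P(H^c).
    p_data = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_data


p = posterior(prior=0.002, false_positive_rate=0.01, false_negative_rate=0.01)
print(round(p, 3))  # 0.166
```

Because the disease is rare, false positives from the large healthy population outnumber true positives, which is why the posterior stays well below 1 even for an accurate test.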

Estimating Parameters
Suppose we know we have data consisting of values x_1, ..., x_n drawn from an exponential distribution. The question remains: which exponential distribution?
We are often faced with the situation of having random data which we know (or believe) is drawn from a parametric model, whose parameters we do not know. For example, in an election between two candidates, polling data constitutes draws from a Bernoulli(p) distribution with unknown parameter p. In this case we would like to use the data to estimate the value of the parameter p, as the latter predicts the result of the election.

Estimating Parameters
Our focus so far has been on computing the probability of data arising from a parametric model with known parameters. Statistical inference flips this on its head: we will estimate the probability of parameters given a parametric model and observed data drawn from it.

Maximum Likelihood Estimate
MLE answers the question: for which parameter value does the observed data have the biggest probability?
Definition: given data, the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P(data | p). That is, the MLE is the value of p for which the data is most likely.
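A minimal sketch of this definition, assuming Bernoulli(p) data summarized as k successes in n trials: evaluate the likelihood on a grid of candidate values of p and keep the maximizer. The counts below are hypothetical, and the grid search is an illustrative device rather than the method used in the course.

```python
def likelihood(p, k, n):
    """P(data | p) for k successes in n Bernoulli(p) trials.

    The binomial coefficient is omitted: it does not depend on p,
    so it does not change which p maximizes the likelihood.
    """
    return p**k * (1 - p) ** (n - k)


# Hypothetical data: 55 successes out of 100 trials.
k, n = 55, 100

# Candidate parameter values 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: likelihood(p, k, n))

print(p_hat)  # 0.55
```

The grid maximizer coincides with the sample proportion k/n, which is what the calculus argument gives for this model.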

Maximum Likelihood Estimate
1. The MLE for p turned out to be exactly the fraction of people who say that cilantro tastes like soap in our experiment.
2. The MLE is computed from the data. That is, it is a statistic.
3. Officially you should check that the critical point is indeed a maximum. You can do this with the second derivative test.
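The experiment's actual counts are not reproduced in this transcript, so the following derivation is a generic sketch: assume k of n respondents answered yes, with 0 < k < n, and work with the logarithm of the likelihood, which has the same maximizer because the logarithm is strictly increasing.

```latex
\begin{align*}
L(p) &= \binom{n}{k}\, p^{k} (1-p)^{n-k}, \\
\frac{d}{dp}\ln L(p) &= \frac{k}{p} - \frac{n-k}{1-p} = 0
  \;\Longrightarrow\; \hat{p} = \frac{k}{n}, \\
\frac{d^{2}}{dp^{2}}\ln L(p) &= -\frac{k}{p^{2}} - \frac{n-k}{(1-p)^{2}} < 0
  \quad \text{for } 0 < p < 1 .
\end{align*}
```

The second derivative is negative everywhere on (0, 1), so the critical point k/n is a maximum, which is the check asked for in item 3.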

Log Likelihood
Maximizing likelihood is the same as maximizing log likelihood, because the logarithm is a strictly increasing function.
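A small numerical check of this claim, again assuming Bernoulli data with hypothetical counts: the same grid point maximizes both functions. The last lines illustrate one practical reason to prefer the log likelihood, namely that the raw likelihood can underflow to zero in floating point for large samples.

```python
import math


def likelihood(p, k, n):
    """Likelihood of k successes in n Bernoulli(p) trials (constant omitted)."""
    return p**k * (1 - p) ** (n - k)


def log_likelihood(p, k, n):
    """Logarithm of the same likelihood."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


grid = [i / 1000 for i in range(1, 1000)]

# Hypothetical small sample: 7 successes in 20 trials.
k, n = 7, 20
argmax_lik = max(grid, key=lambda p: likelihood(p, k, n))
argmax_log = max(grid, key=lambda p: log_likelihood(p, k, n))
print(argmax_lik, argmax_log)  # 0.35 0.35

# Hypothetical large sample: the raw likelihood underflows to 0.0,
# while the log likelihood remains an ordinary finite number.
big_k, big_n = 2750, 5000
underflowed = likelihood(0.55, big_k, big_n)
finite_log = log_likelihood(0.55, big_k, big_n)
print(underflowed, finite_log)
```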
