

  1. Two methods we should use more often: partial correlation and robust regression. Sébastien Déjean, Institut de Mathématiques de Toulouse, www.math.univ-toulouse.fr/~sdejean/. Rencontre Ingénieurs-statisticiens, 2 February 2015, UT3 Paul Sabatier.

  2. CORRELATION

  3. Spurious correlations http://www.tylervigen.com/

  4. cor(x,y) = 0.849

  5. cor(x,z) = 0.883, cor_S(x,z) = 0.962 (Pearson vs. Spearman correlation)
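The gap between the two coefficients can be reproduced with a short R sketch. The data behind the slide are not given; here z is an assumed strictly monotone, non-linear transform of x. Spearman's coefficient is computed on ranks, so it is unaffected by any monotone transformation, while Pearson is weakened by the curvature.

```r
# Assumed setting (not the slide's actual data): z is a monotone but
# strongly non-linear function of x, so Pearson understates the
# association while Spearman, computed on ranks, does not.
set.seed(4)
x <- runif(100)
z <- exp(5 * x)                  # strictly increasing, hence rank-preserving

cor(x, z)                        # Pearson: well below 1
cor(x, z, method = "spearman")   # Spearman: 1, since the ranks are identical
```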

  6. cor(x,x2) = -0.156, cor_S(x,x2) = -0.168, MI(x,x2) = 0.65 (Pearson and Spearman correlations vs. mutual information)
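The point of this slide, a dependence invisible to both Pearson and Spearman but picked up by mutual information, can be sketched in a few lines of R. The data and the MI estimator behind the slide's numbers are not given; the histogram-based estimator below is a crude illustrative stand-in (the bioDist package cited on the next slide offers a packaged alternative).

```r
# Assumed setting: x2 is a symmetric, non-monotone transform of x, as the
# near-zero correlations on the slide suggest.
set.seed(1)
x  <- runif(1000, -1, 1)
x2 <- x^2                        # deterministic, but non-monotone in x

cor(x, x2)                       # near 0: no linear association
cor(x, x2, method = "spearman")  # near 0: no monotone association

# Crude plug-in mutual information from a 2-D histogram (in nats)
mi <- function(a, b, nbins = 10) {
  ja <- cut(a, nbins); jb <- cut(b, nbins)
  p  <- table(ja, jb) / length(a)
  px <- rowSums(p); py <- colSums(p)
  idx <- p > 0
  sum(p[idx] * log(p[idx] / outer(px, py)[idx]))
}
mi(x, x2)                        # clearly positive: x2 is a function of x
```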

  7. Numata J, Ebenhöh O, Knapp EW. Measuring correlations in metabolomic networks with mutual information. Genome Inform. 2008;20:112-22. R package: bioDist

  8. The MINE application http://www.exploredata.net/Downloads/MINE-Application

  9. Terry Speed, A Correlation for the 21st Century, Science, 16 December 2011, Vol. 334 no. 6062, pp. 1502-1503

  10. Partial correlation. "It is surprising to find that a statistical technique as powerful and as easy to obtain as partial correlation is not used more frequently in psychology. This technique makes it possible to assess the correlation between two variables after controlling for the confounding effect of one or more other variables." Pr Jacques Baillargeon, http://www.uqtr.uquebec.ca/~baillarg/srp-6001/cours3/partielle.htm Wikipedia: Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, …, Zn}, written ρXY·Z, is the correlation between the residuals RX and RY resulting from the linear regression of X with Z and of Y with Z, respectively.
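For a single controlling variable, the residual-based definition quoted above has a well-known closed form, r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The identity is easy to check in R; the simulated data below are illustrative, not the slide's.

```r
# Check that the residual-based definition of partial correlation matches
# the closed-form formula for one controlling variable z.
set.seed(2)
x <- runif(100)
y <- x + rnorm(100, 0, 0.1)
z <- x + rnorm(100, 0, 0.3)

# Definition via residuals of the two regressions on z
r.res <- cor(lm(x ~ z)$residuals, lm(y ~ z)$residuals)

# Closed-form expression from the three pairwise correlations
rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
r.formula <- (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))

all.equal(r.res, r.formula)   # TRUE: the two definitions coincide
```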

  11. http://plus.maths.org/content/coincidence-correlation-and-chance And talking of Jenny, there she is now, buying an ice cream from the local shop. With her family about to go out to Australia for a holiday, I ought to go and warn her that the more ice creams there are sold, the more shark attacks there are. Again, I've done my research quite thoroughly, and the numbers do not lie. Perhaps I should recommend an apple instead! Finally, let's pop into my local primary school to chat to the head teacher. I want to tell her about research I've uncovered which shows a clear and proven link between literacy levels and hand size in children. Bigger hands make better readers, it seems. With my son starting there in the autumn, maybe now is the time to set up some sort of hand-stretching programme - perhaps on Wednesday afternoons, now that PE's been scrapped? These examples may seem bizarre and improbable, but they are not the result of bad statistics. All the information is absolutely correct. Their strangeness comes from our own reasoning. We see two things changing together and our instinct is to assume that they are tied by cause and effect. Unfortunately, our instinct is often wrong. In all these examples a third "confounding" variable is actually the cause of two correlated variables. It is absolutely true that people who play loud music are more likely to suffer from acne, but only because teenagers make up a big part of both groups. Acne and loud music are certainly correlated. But correlation is not causation. The same thing is true with the sharks and ice cream. The number of shark attacks and ice creams sold both go up during the summer, with the good weather encouraging people both to go in swimming and to eat ice cream. And as for large hands? Older children are bigger, and can read better!

  12. R session: partial correlation via residuals, then with the ppcor package.

> x <- runif(100)
> y <- x + rnorm(100, 0, 0.1)
> z <- x + rnorm(100, 0, 0.3)
> pairs(data.frame(x, y, z))
> cor(data.frame(x, y, z))
      x     y     z
x 1.000 0.942 0.721
y 0.942 1.000 0.649
z 0.721 0.649 1.000
> res.y.x <- lm(y ~ x)$residuals
> res.z.x <- lm(z ~ x)$residuals
> cor(res.y.x, res.z.x)
[1] -0.126402
> res.y.z <- lm(y ~ z)$residuals
> res.x.z <- lm(x ~ z)$residuals
> cor(res.x.z, res.y.z)
[1] 0.8992353
> library(ppcor)
> pcor(data.frame(x, y, z))
$estimate
      x      y      z
x 1.000  0.899  0.426
y 0.899  1.000 -0.126
z 0.426 -0.126  1.000

R packages: ppcor, corpcor, parcor, ...
[Diagram: temperature as a confounder of ice cream sales and shark attacks]

  13. ROBUST REGRESSION

  14. ● no atypical points ● the red and blue curves are very close

  15. ● compared with the previous case, one point has been modified (top left) ● the red curve is "attracted" by this point: its slope is smaller so that the start of the line comes closer to this atypical point ● the blue curve remains almost unchanged from the previous case

  16. Least squares, least absolute deviations, least median of squares
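The three fitting criteria listed here can be compared on a small simulated dataset with one outlier. A minimal sketch, assuming only base R plus MASS (which ships with R): lqs() fits least median of squares, and least absolute deviations is minimised directly with optim() to avoid an extra dependency such as quantreg.

```r
# Compare the three criteria on simulated data with one atypical point.
library(MASS)
set.seed(3)
x <- sort(runif(20))
y <- x + 0.1 * rnorm(20)
y[2] <- 1                              # inject one outlier at low x

ls.fit  <- lm(y ~ x)                   # least squares: pulled by the outlier

lad.obj <- function(b) sum(abs(y - b[1] - b[2] * x))
lad.fit <- optim(c(0, 1), lad.obj)     # least absolute deviations

lms.fit <- lqs(y ~ x, method = "lms")  # least median of squares

coef(ls.fit)[2]    # slope dragged down by the outlier
lad.fit$par[2]     # slope closer to the true value 1
coef(lms.fit)[2]   # slope close to 1: the outlier is essentially ignored
```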

  17.
# Case 1: simple linear regression, no atypical value
> x1 <- sort(runif(20))
> y1 <- x1 + 0.1*rnorm(20)
> library(MASS)
> regr1 <- lm(y1 ~ x1)
> rob1 <- rlm(y1 ~ x1)
> plot(x1, y1, pch=16)
> abline(regr1, col="red", lwd=2)
> abline(rob1, col="blue", lwd=2)
> legend("bottomright",
    c("Classical regression", "Robust regression"),
    col=c("red","blue"), lty=1, lwd=2)

# Case 2: simple linear regression, one atypical value
> x2 <- x1
> y2 <- y1
> y2[2] <- 1
> regr2 <- lm(y2 ~ x2)
> rob2 <- rlm(y2 ~ x2)
> plot(x2, y2, pch=16)
> abline(regr2, col="red", lwd=2)
> abline(rob2, col="blue", lwd=2)
> legend("bottomright",
    c("Classical regression", "Robust regression"),
    col=c("red","blue"), lty=1, lwd=2)
