”Advanced” topics from statistics
Anders Ringgaard Kristensen
Advanced Herd Management
Slide 1

Outline
• Covariance and correlation
• Random vectors and multivariate distributions
  • The multinomial distribution
  • The multivariate normal distribution
• Hyper distributions and hyper parameters
• Commonly used hyper distributions
• Conjugate families
Slide 2

Covariance and correlation
Let X and Y be two random variables having expected values µ_X, µ_Y and standard deviations σ_X and σ_Y. The covariance between X and Y is defined as
• Cov(X, Y) = σ_XY = E((X − µ_X)(Y − µ_Y)) = E(XY) − µ_X µ_Y
The correlation between X and Y is
• Corr(X, Y) = σ_XY / (σ_X σ_Y)
In particular we have Cov(X, X) = σ_X² and Corr(X, X) = 1.
If X and Y are independent, then E(XY) = µ_X µ_Y and therefore:
• Cov(X, Y) = 0
• Corr(X, Y) = 0
Slide 3
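As a quick numerical check of these identities, here is a minimal sketch in Python with NumPy (the simulated variables and parameter values are our own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate two correlated random variables X and Y.
x = rng.normal(loc=5.0, scale=2.0, size=100_000)
y = 0.6 * x + rng.normal(loc=0.0, scale=1.0, size=100_000)

# Cov(X, Y) = E((X - mu_X)(Y - mu_Y)) = E(XY) - mu_X * mu_Y
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()

# Corr(X, Y) = sigma_XY / (sigma_X * sigma_Y)
corr = cov_def / (x.std() * y.std())

print(cov_def, cov_alt)  # the two forms of the covariance agree
print(corr)              # roughly 0.77 for these parameter choices
```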

Random vectors I
Some experiments produce outcomes that are vectors. Such a vector X is called a random vector. We write X = (X_1, X_2, …, X_k)'. Each element X_i in X is a random variable having an expected value E(X_i) = µ_i and a variance Var(X_i) = σ_i². The covariance between two elements X_i and X_j is denoted σ_ij. For convenience we may use the notation σ_ii = σ_i².
Slide 4

Random vectors II
A random vector X = (X_1, X_2, …, X_k)' has an expected value, which is also a vector. It has a ”variance”, Σ, which is a matrix: Σ is the k × k matrix whose (i, j) element is σ_ij = Cov(X_i, X_j). Σ is also called the variance-covariance matrix or just the covariance matrix. Since Cov(X_i, X_j) = Cov(X_j, X_i), we conclude that Σ is symmetric, i.e. σ_ij = σ_ji.
Slide 5

Random vectors III
Let X be a random vector of dimension k. Assume that E(X) = µ, and let Σ be the covariance matrix of X. Define Y = AX + b, where A is an m × k matrix and b is an m-dimensional vector. Then Y is an m-dimensional random vector with E(Y) = Aµ + b and covariance matrix AΣA' (compare with the corresponding rule for ordinary random variables: E(aX + b) = aµ_X + b and Var(aX + b) = a²σ_X²).
Slide 6
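The transformation rule E(Y) = Aµ + b and Cov(Y) = AΣA' is easy to verify by simulation; a small sketch (the matrix A, the vector b and the distribution of X are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-dimensional random vector X with known mean and covariance.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Linear transformation Y = AX + b, with A of shape m x k = 2 x 3.
A = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0]])
b = np.array([10.0, -5.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

print(y.mean(axis=0))           # empirical mean, close to A @ mu + b
print(A @ mu + b)
print(np.cov(y, rowvar=False))  # empirical covariance, close to A Sigma A'
print(A @ Sigma @ A.T)
```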

Multivariate distributions
The distribution of a random vector is called a multivariate distribution. Some multivariate distributions may be expressed by a certain function over the sample space. We shall consider 2 such common multivariate distributions:
• The multinomial distribution (discrete)
• The multivariate normal distribution (continuous)
Slide 7

The multinomial distribution I
Consider an experiment with categorical outcomes. Assume that there are k mutually exclusive and exhaustive outcomes.
• Rolling a die → 1, 2, 3, 4, 5, 6 (k = 6)
• Testing for somatic cell count c at cow level →
  • c ≤ 200,000;
  • 200,000 < c ≤ 300,000;
  • 300,000 < c ≤ 400,000;
  • 400,000 < c (k = 4)
Assume that the probability of category i is p_i and p_1 + p_2 + … + p_k = 1. The experiment is repeated n times. Let X = (X_1, X_2, …, X_k) be a random vector defined so that X_i is the total number of outcomes belonging to category i. The sample space of the compound n experiments is S = {x ∈ ℝ^k | x_1 + x_2 + … + x_k = n}.
Slide 8

The multinomial distribution II
The random vector X is then said to have a multinomial distribution with parameters p = (p_1, p_2, …, p_k)' and n. The probability distribution for x ∈ S is
• P(X = x) = (n! / (x_1! x_2! ⋯ x_k!)) p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}
The expected value is E(X) = np. The covariance matrix Σ has diagonal elements σ_ii = np_i(1 − p_i) and off-diagonal elements σ_ij = −np_i p_j for i ≠ j.
Slide 9
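The die example can be checked directly; a minimal sketch assuming SciPy's scipy.stats.multinomial, which implements the probability function, mean and covariance matrix shown on slide 9:

```python
import numpy as np
from scipy.stats import multinomial

# Rolling a fair die n = 60 times: k = 6 categories, p_i = 1/6.
n = 60
p = np.full(6, 1 / 6)
dist = multinomial(n, p)

# Probability of observing exactly 10 of each face.
print(dist.pmf([10, 10, 10, 10, 10, 10]))

# E(X) = n p, and the covariance matrix has
# sigma_ii = n p_i (1 - p_i) and sigma_ij = -n p_i p_j for i != j.
print(dist.mean())  # array of 10s
print(dist.cov())
```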

The multivariate normal distribution I
A k-dimensional random vector X with sample space S = ℝ^k has a multivariate normal distribution if it has a density function given as
• f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−½ (x − µ)' Σ⁻¹ (x − µ))
The expected value is E(X) = µ, and the covariance matrix is Σ.
Slide 10

The multivariate normal distribution II
[Figure: the density function of the 2-dimensional random vector X = (X_1, X_2)'.] What is the sign of Cov(X_1, X_2)?
Slide 11

The multivariate normal distribution III
Conditional distribution of subset:
• Suppose that X = (X_1, …, X_k)' is N(µ, Σ) and we partition X into two sub-vectors X_1 = (X_1, …, X_j)' and X_2 = (X_{j+1}, …, X_k)'. We partition the mean vector µ and the covariance matrix Σ accordingly and write
  µ = (µ_1', µ_2')',  Σ = ( Σ_11 Σ_12 ; Σ_21 Σ_22 )
• Then X_1 ∼ N(µ_1, Σ_11) and X_2 ∼ N(µ_2, Σ_22)
Slide 12
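The covariance-sign question on slide 11 can be explored numerically; a sketch using scipy.stats.multivariate_normal (the mean and covariance values are invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # positive covariance between X1 and X2

dist = multivariate_normal(mean=mu, cov=Sigma)

# With Cov(X1, X2) > 0 the density concentrates along the diagonal
# x1 = x2, so points on the diagonal have much higher density.
print(dist.pdf([1.0, 1.0]))   # on the diagonal: relatively high
print(dist.pdf([1.0, -1.0]))  # off the diagonal: much lower
```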

The multivariate normal distribution IV
Conditional distribution, continued:
• The matrix Σ_12 = Σ_21' contains the covariances between elements of the sub-vector X_1 and the sub-vector X_2.
• Moreover, for X_1 = x_1 the conditional distribution of X_2 is N(ν, Φ), where
  • ν = µ_2 + Σ_21 Σ_11⁻¹ (x_1 − µ_1)
  • Φ = Σ_22 − Σ_21 Σ_11⁻¹ Σ_12
Slide 13

The multivariate normal distribution V
Example:
• Let X_1, X_2, …, X_5 denote the first five lactations of a dairy cow.
• It is then reasonable to assume that X = (X_1, X_2, …, X_5)' has a 5-dimensional normal distribution.
• Having observed e.g. X_1, X_2 and X_3 we can predict X_4 and X_5 according to the conditional formulas on the previous slide (see the sketch below).
Slide 14

Hyper distributions: Motivation I
Until now, our approach to distributions has been under the assumption that a parameter has a fixed (but often unknown) value. When we for instance observe the number s of sows conceiving out of n inseminated, we have tested the hypothesis that s is drawn from a binomial distribution with parameter p = p_0, where p_0 is some fixed value (e.g. 0.84). For analysis of production results this is not really a problem.
Slide 15
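Returning to the lactation example on slide 14, here is a sketch applying the conditional formulas from slide 13 (all means, covariances and observed yields below are invented placeholders, not real herd data):

```python
import numpy as np

# Illustrative mean vector and covariance matrix for the first five
# lactations of a dairy cow (kg of milk); the numbers are made up.
mu = np.array([7000.0, 7800.0, 8200.0, 8300.0, 8300.0])
Sigma = 1e5 * np.array([[4.0, 2.4, 2.0, 1.8, 1.6],
                        [2.4, 4.5, 2.6, 2.2, 2.0],
                        [2.0, 2.6, 5.0, 2.8, 2.4],
                        [1.8, 2.2, 2.8, 5.5, 3.0],
                        [1.6, 2.0, 2.4, 3.0, 6.0]])

x1 = np.array([7500.0, 8100.0, 8600.0])  # observed lactations 1-3

# Partition into sub-vector 1 (lactations 1-3) and sub-vector 2 (4-5).
mu1, mu2 = mu[:3], mu[3:]
S11, S12 = Sigma[:3, :3], Sigma[:3, 3:]
S21, S22 = Sigma[3:, :3], Sigma[3:, 3:]

# nu  = mu2 + S21 S11^{-1} (x1 - mu1)
# Phi = S22 - S21 S11^{-1} S12
K = S21 @ np.linalg.inv(S11)
nu = mu2 + K @ (x1 - mu1)
Phi = S22 - K @ S12

print(nu)   # predicted (conditional mean of) lactations 4 and 5
print(Phi)  # remaining conditional covariance
```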

Hyper distributions: Motivation II
Even though it is interesting to analyze whether an achieved conception rate is ”satisfactory”, the real challenge is to use the information achieved for planning the future. In planning the future, predictions play an important role. If we mate 50 sows, how many can we expect to farrow in 115 days? A central question in production planning.
Slide 16

Hyper distributions: Motivation III
Prediction of number of farrowings:
• ”Naïve”: Assume that you know the conception rate with certainty, e.g. p = 0.84. With 50 matings this gives us an expected number of farrowings of np = 50 × 0.84 = 42.
• ”Semi-naïve”: Take the binomial variation into account. By use of the binomial probability function we can calculate the probability of any s ≤ 50 (see the sketch below).
• ”Correct”: Take the uncertainty about the true value of p into account.
Slide 17

Binomial variation of prediction
[Figure: the binomial distribution of the predicted number of farrowings out of 50 matings.]
Slide 18
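The ”semi-naïve” calculation can be reproduced with the binomial probability function; a minimal sketch using scipy.stats.binom:

```python
from scipy.stats import binom

# Number of farrowings s out of n = 50 matings, assuming the
# conception rate p = 0.84 is known with certainty.
n, p = 50, 0.84
dist = binom(n, p)

print(dist.mean())          # expected number of farrowings: 42.0
print(dist.pmf(42))         # probability of exactly 42 farrowings
print(dist.cdf(38))         # probability of 38 or fewer
print(dist.interval(0.95))  # central 95% prediction interval
```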

Full variation of prediction
[Figure: the distribution of the predicted number of farrowings when uncertainty about the conception rate is also taken into account.]
Slide 19

Can we take uncertainty into account?
In planning (prediction) we should take uncertainty into account. Can we do that? Yes, we can specify a distribution for the parameter (in this case the conception rate). Such a distribution for a parameter is called a hyper distribution.
Slide 20

Hyper distributions
In all kinds of planning, representation of uncertainty is important. We use hyper distributions to specify our belief in the true value. Hyper distributions learn over time: as observations are collected, our uncertainty may decrease. This is reflected in the hyper distribution. The parameters of a hyper distribution are called hyper parameters.
Slide 21

Conjugate families
Most standard distributions have corresponding families of hyper distributions that are suitable for representation of our belief in parameter values. If the family of hyper distributions is closed under sampling, it is called a conjugate family. To be ”closed under sampling” means that if our prior belief in a parameter value is represented by a distribution from a certain family, then the posterior distribution after having taken further observations belongs to the same family.
Slide 22

Conjugate families: Binomial I
The conjugate family for the probability parameter of a binomial distribution is the family of Beta distributions. The sample space of a Beta distribution is S = ]0; 1[. The Beta distribution has two parameters α and β. The density function is (where Γ is the gamma function):
• f(p) = (Γ(α + β) / (Γ(α)Γ(β))) p^{α−1} (1 − p)^{β−1}
The expectation and variance are
• E(p) = α / (α + β)
• Var(p) = αβ / ((α + β)²(α + β + 1))
Slide 23

Conjugate families: Binomial II
Assume that our prior belief in the probability of a binomial distribution is Beta(α, β) and we take a new observation from the binomial distribution in question. If we observe s successes out of n trials, the posterior belief given the new observation is Beta(α + s, β + n − s). When we specify a Beta prior we may think of it this way:
• What is our initial guess about the parameter? This guess is put equal to α/(α + β).
• How certain are we about our guess? As if we had observed n = 10, 100 or 1,000 trials (cases)? We then put α + β = n, and from these two equations we can determine the parameters α and β (see the sketch below).
Slide 24
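A numerical sketch of the elicitation recipe and the conjugate update above; the prior guess 0.84 and the observed counts are illustrative, and scipy.stats.betabinom is used for the ”correct” predictive distribution discussed on slide 17, which mixes binomial variation with parameter uncertainty:

```python
from scipy.stats import beta, betabinom

# Prior elicitation: initial guess 0.84, certainty "as if" we had
# already seen 100 cases, so alpha + beta = 100.
guess, n_equiv = 0.84, 100
a = guess * n_equiv        # alpha = 84
b = (1 - guess) * n_equiv  # beta  = 16

prior = beta(a, b)
print(prior.mean(), prior.std())

# Conjugate update: observe s = 38 conceptions out of n = 50 matings.
s, n = 38, 50
posterior = beta(a + s, b + n - s)
print(posterior.mean())  # pulled toward the observed rate 38/50 = 0.76

# Predictive distribution for the next 50 matings: a beta-binomial,
# combining binomial variation with uncertainty about p.
predictive = betabinom(50, a + s, b + n - s)
print(predictive.mean(), predictive.std())
```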
