Rank correlation coefficients and their generalizations for interval data
Karol Opara and Olgierd Hryniewicz
Systems Research Institute, Polish Academy of Sciences
„Statystyka Matematyczna” (Mathematical Statistics) Conference, Będlewo, 28 October 2016
Introduction
Measuring dependence: quantifying and testing dependence is one of the major tasks of statistics.
Imprecise data: how can correlation coefficients be computed for imprecise data?
Opara K. and Hryniewicz O. (2016). Computation of general correlation coefficients for interval data. International Journal of Approximate Reasoning 73, pp. 56–75.
Karol Opara, Olgierd Hryniewicz (SRI PAS) Interval Correlation Coefficients 29 November 2016 2 / 46
Crisp correlation coefficients
General (crisp) correlation coefficient
We have a set of $n$ objects characterized by two properties, $x$ and $y$.
To any pair of individuals, say the $i$-th and the $j$-th, one can assign an $x$-score $a_{ij} = -a_{ji}$ and a $y$-score $b_{ij} = -b_{ji}$.
Kendall (1955) describes a general correlation coefficient $\Gamma$ as
\[ \Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{1} \]
Scores $a_{ij}$ and $b_{ij}$ are taken to be zero if $i = j$.
Pearson’s r
General correlation coefficient
\[ \Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{2} \]
Pearson’s $r$ is based on variate values
\[ a_{ij} = x_j - x_i \tag{3} \]
\[ b_{ij} = y_j - y_i \tag{4} \]
Rank coefficients ρ and τ
General correlation coefficient
\[ \Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^2 \, \sum_{i,j=1}^{n} b_{ij}^2}} \tag{5} \]
Spearman’s $\rho$ is based on ranks $p_i$ (of $x_i$) and $q_i$ (of $y_i$)
\[ a_{ij} = p_j - p_i \tag{6} \]
\[ b_{ij} = q_j - q_i \tag{7} \]
Kendall’s $\tau$ is based on $\pm 1$ scores
\[ a_{ij} = \begin{cases} +1 & \text{if } p_i < p_j \\ -1 & \text{if } p_i > p_j \end{cases} \tag{8} \]
\[ b_{ij} = \begin{cases} +1 & \text{if } q_i < q_j \\ -1 & \text{if } q_i > q_j \end{cases} \tag{9} \]
Variants $\tau_a$ and $\tau_b$ resolve ties in the data differently, with $|\tau_a| \le |\tau_b|$.
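The three score choices above can be plugged into a single routine. The following is a minimal pure-Python sketch, not the authors' implementation; the function names (`general_gamma`, `difference_scores`, `sign_scores`, `ranks`) are hypothetical, and the rank helper assumes data without ties.

```python
import math

def general_gamma(a, b):
    """General coefficient Gamma from two n x n antisymmetric score matrices."""
    n = len(a)
    num = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    sum_a2 = sum(v * v for row in a for v in row)
    sum_b2 = sum(v * v for row in b for v in row)
    return num / math.sqrt(sum_a2 * sum_b2)  # undefined for constant data

def difference_scores(v):
    # a[i][j] = v[j] - v[i]; zero on the diagonal by construction
    return [[vj - vi for vj in v] for vi in v]

def sign_scores(v):
    # +1 if v[i] < v[j], -1 if v[i] > v[j], 0 otherwise
    return [[(vj > vi) - (vj < vi) for vj in v] for vi in v]

def ranks(v):
    # Ranks 1..n; assumption: no ties in v
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson_r(x, y):
    return general_gamma(difference_scores(x), difference_scores(y))

def spearman_rho(x, y):
    return general_gamma(difference_scores(ranks(x)), difference_scores(ranks(y)))

def kendall_tau(x, y):
    return general_gamma(sign_scores(x), sign_scores(y))
```

For example, with `x = [1, 2, 3, 4, 5]` and `y = [2, 1, 4, 3, 5]`, two of the ten pairs are discordant, so `kendall_tau(x, y)` is 0.6, while `spearman_rho(x, y)` and `pearson_r(x, y)` are both 0.8 because the values coincide with their ranks.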
Relations between rank correlation coefficients
Daniels’ inequality
\[ -1 \le 3\tau - 2\rho \le 1 \tag{10} \]
refined by Durbin and Stuart
\[ \tfrac{3}{2}\tau - \tfrac{1}{2} \le \rho \le \tfrac{1}{2} + \tau - \tfrac{1}{2}\tau^2 \quad \text{for } \tau > 0 \tag{11} \]
\[ \tfrac{1}{2}\tau^2 + \tau - \tfrac{1}{2} \le \rho \le \tfrac{3}{2}\tau + \tfrac{1}{2} \quad \text{for } \tau < 0 \tag{12} \]
Fredricks and Nelsen (2007) proved that, in the limiting case of dependence weakening towards independence,
\[ 2\rho \to 3\tau \tag{13} \]
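To see how tight these constraints are for a given τ, the bounds can be evaluated directly. This is a small sketch under the assumption that the Durbin and Stuart bounds take their standard form, with upper bound 1/2 + τ − τ²/2 for τ ≥ 0 and lower bound τ²/2 + τ − 1/2 for τ < 0; the function name `rho_range_given_tau` is hypothetical.

```python
def rho_range_given_tau(tau):
    """Return ((daniels_lo, daniels_hi), (ds_lo, ds_hi)) for Spearman's rho."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    # Daniels: -1 <= 3*tau - 2*rho <= 1, solved for rho
    daniels = ((3 * tau - 1) / 2, (3 * tau + 1) / 2)
    # Durbin-Stuart refinement (assumed standard form, see lead-in)
    if tau >= 0:
        durbin_stuart = (1.5 * tau - 0.5, 0.5 + tau - 0.5 * tau ** 2)
    else:
        durbin_stuart = (0.5 * tau ** 2 + tau - 0.5, 1.5 * tau + 0.5)
    return daniels, durbin_stuart
```

At τ = 0 both pairs of bounds give ρ ∈ [−0.5, 0.5]; at τ = ±1 the Durbin and Stuart range collapses to the single point ρ = ±1.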
[Figure: Possible values of ρ and τ (Nelsen, 1991)]
Relation between Pearson’s r and Kendall’s τ
For elliptical distributions (e.g. bivariate normal) (Frahm et al., 2003)
\[ \tau = \frac{2}{\pi} \arcsin(r) \tag{14} \]
Typically (Hauke and Kossowski, 2011):
\[ \text{large } r \Rightarrow \text{large } \tau \text{ and } \rho \tag{15} \]
\[ \text{small } r \not\Rightarrow \text{small } \tau \text{ and } \rho \tag{16} \]
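Relation (14) is easy to evaluate numerically. The sketch below simply applies the formula and its inverse; the helper names `tau_from_r` and `r_from_tau` are hypothetical, and the conversion is only meaningful under the elliptical-distribution assumption stated above.

```python
import math

def tau_from_r(r):
    """Kendall's tau implied by Pearson's r, following (14)."""
    return 2.0 / math.pi * math.asin(r)

def r_from_tau(tau):
    """Inverse of (14): Pearson's r implied by Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)
```

For instance, r = 0.5 corresponds to τ = 1/3, so in this setting τ is noticeably smaller in magnitude than r for moderate correlations.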
Computational formulas for crisp Kendall’s τ
Counting concordant and discordant pairs
\[ C = \operatorname*{card}_{i \ne j} \{ (x_i - x_j) \cdot (y_i - y_j) > 0 \} \tag{17} \]
\[ D = \operatorname*{card}_{i \ne j} \{ (x_i - x_j) \cdot (y_i - y_j) < 0 \} \tag{18} \]
\[ \tau = \frac{C - D}{n \cdot (n - 1)} \tag{19} \]
Denœux et al. (2005) imposed linear orders $L_X$ and $L_Y$ on each variate and counted the number of pairs ordered the same way by both of them
\[ \tau = \tau(L_X, L_Y) = \frac{4 \operatorname{card}\{ L_X \cap L_Y \}}{n (n - 1)} - 1 \tag{20} \]
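Both formulas can be checked against each other on a small sample. The following is a minimal sketch, assuming data without ties and representing each linear order as the set of index pairs (i, j) with the i-th value smaller than the j-th; the function names are hypothetical.

```python
def kendall_tau_counts(x, y):
    """Return (C, D, tau) using ordered pairs i != j, as in (17)-(19)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    C = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    D = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return C, D, (C - D) / (n * (n - 1))

def kendall_tau_orders(x, y):
    """Kendall's tau from the intersection of two linear orders, as in (20)."""
    n = len(x)
    L_X = {(i, j) for i in range(n) for j in range(n) if x[i] < x[j]}
    L_Y = {(i, j) for i in range(n) for j in range(n) if y[i] < y[j]}
    return 4 * len(L_X & L_Y) / (n * (n - 1)) - 1
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 1, 4, 3, 5]`, the ordered-pair counts are C = 16 and D = 4 (each unordered pair is counted twice), and both routines give τ = 0.6.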
Interpretations
Kendall’s τ: a function of the minimal number of transpositions required to order a sample.
Spearman’s ρ: an interpretation in terms of concordances exists, but it requires a bivariate sample and a pair of independent random variables with the same marginals as the original ones (Kendall, 1955; Nelsen, 1991).
Copulas
Sklar’s theorem: let $F(x)$ and $G(y)$ be continuous CDFs and let $H(x, y)$ be the two-dimensional CDF of a bivariate random variable with marginals $F$ and $G$. There exists a unique function $C$, called a copula, such that
\[ H(x, y) = C(F(x), G(y)) \]
Copulas are invariant under order-preserving transformations such as ranking, $X \to F(X)$.
Rank-based measures of association are properties of copulas.
Spearman’s $\rho$ has a geometric interpretation in terms of copulas as a scaled proportion of the volume of the $[0, 1]^3$ cube under the copula surface. Nelsen (1992) gives a similar interpretation for Kendall’s $\tau$.
Gaussian copula, whose parameter $\rho_G$ equals Pearson’s $r$ for normal marginals
\[ C(u_1, u_2; \rho_G) = \Phi_N\left( \Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho_G \right) \tag{21} \]
Copulas and correlation
Population versions of Kendall’s $\tau$ and Spearman’s $\rho$ can be defined in terms of copulas (Pearson’s $r$ cannot be)
\[ \tau(X, Y) = 4 \, E\left[ C(F(X), G(Y)) \right] - 1 \tag{22} \]
Genest and MacKay (1986) used the CDF $K(t)$ of a random variable $T = C(U_1, U_2)$, where $U_1$ and $U_2$ are random variables uniformly distributed on $[0, 1]$, to show that
\[ \tau = 3 - 4 \int_0^1 K(t) \, dt \tag{23} \]
Kendall’s $\tau$ as the difference between the probabilities of concordance and discordance (for the population version of the statistic)
\[ \tau = P\left( (x_1 - x_2)(y_1 - y_2) > 0 \right) - P\left( (x_1 - x_2)(y_1 - y_2) < 0 \right) \tag{24} \]
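As a sanity check of (22) and (23), consider the independence copula $C(u_1, u_2) = u_1 u_2$. This worked example is an illustration added here, not part of the original slides; it uses the known distribution function $K(t) = t - t \ln t$ of $T = U_1 U_2$ for independent uniform variables. Both formulas should give $\tau = 0$.

```latex
% Formula (22): E[C(U_1, U_2)] = E[U_1 U_2] = E[U_1] E[U_2] = 1/4
\tau = 4 \cdot \tfrac{1}{4} - 1 = 0

% Formula (23): K(t) = t - t \ln t for the independence copula
\int_0^1 K(t)\,dt
  = \int_0^1 t\,dt - \int_0^1 t \ln t\,dt
  = \tfrac{1}{2} - \left(-\tfrac{1}{4}\right)
  = \tfrac{3}{4},
\qquad
\tau = 3 - 4 \cdot \tfrac{3}{4} = 0
```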
Interval correlation coefficients
Interval correlation coefficients
For interval data $z \in [z_L, z_U]$, $y \in [y_L, y_U]$ one obtains interval correlation coefficients $[\tau_L, \tau_U]$, defined as the extreme values of the crisp coefficient over all admissible realizations
\[ \tau_L([z_L, z_U], [y_L, y_U]) = \min_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y) \]
\[ \tau_U([z_L, z_U], [y_L, y_U]) = \max_{z \in [z_L, z_U],\; y \in [y_L, y_U]} \tau(z, y) \]
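A naive way to approach these optimization problems is to evaluate τ at every combination of interval endpoints. The sketch below does exactly that; it is an illustration, not the authors' algorithm. Because it searches only a subset of the admissible realizations, the returned pair is guaranteed only to lie inside the true interval [τ_L, τ_U], and the cost grows like 4^n, so it is usable for very small n only. The names `kendall_tau` and `endpoint_bounds_tau` are hypothetical, and τ is computed in its tie-ignoring form (C − D) / (n(n − 1)).

```python
from itertools import product

def kendall_tau(x, y):
    """Crisp Kendall's tau: (C - D) / (n (n - 1)) over ordered pairs i != j."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1))

def endpoint_bounds_tau(z_intervals, y_intervals):
    """Min and max of tau over all endpoint combinations.

    Each argument is a list of (lower, upper) tuples, one per object.
    """
    taus = [
        kendall_tau(zs, ys)
        for zs in product(*z_intervals)   # every choice of z endpoints
        for ys in product(*y_intervals)   # every choice of y endpoints
    ]
    return min(taus), max(taus)
```

For example, with crisp `z = [(1, 1), (2, 2), (3, 3)]` and `y = [(1, 3), (2, 2), (3, 3)]`, choosing the first y value as 1 gives τ = 1 and choosing it as 3 gives τ = 0, so the function returns `(0.0, 1.0)`.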
[Figure: Interval data]
[Figure: Cross-section (Kendall’s τ)]
[Figure: three rows of panels, for Pearson’s r, Spearman’s ρ and Kendall’s τ, each plotted against x and y on [−1, 1]]
Computational formulas for Pearson’s r
Mean product of standard scores
\[ r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \tag{25} \]
After studentization, with $x'_i = (x_i - \bar{x}) / s_x$ and $y'_i = (y_i - \bar{y}) / s_y$, this simplifies to a quadratic form
\[ r = \frac{1}{n - 1} \sum_{i=1}^{n} x'_i y'_i \tag{26} \]
or equivalently
\[ r = \frac{1}{2} \cdot \frac{1}{n - 1} \, [x'_1, \ldots, x'_n, y'_1, \ldots, y'_n] \begin{bmatrix} 0_n & I_n \\ I_n & 0_n \end{bmatrix} [x'_1, \ldots, x'_n, y'_1, \ldots, y'_n]^T \tag{27} \]
The matrix has eigenvalues $\pm 1$.
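The quadratic-form representation can be verified numerically. The sketch below assumes the block matrix is [[0, I], [I, 0]] with an overall factor 1/(2(n − 1)), and uses the sample standard deviation with divisor n − 1; the function names are hypothetical. Multiplying by the block matrix simply swaps the two halves of the stacked vector, so no explicit matrix is built.

```python
import math

def standardize(v):
    """Studentized scores (v_i - mean) / s, with s using divisor n - 1."""
    n = len(v)
    mean = sum(v) / n
    s = math.sqrt(sum((t - mean) ** 2 for t in v) / (n - 1))
    return [(t - mean) / s for t in v]

def pearson_r_quadratic_form(x, y):
    """Pearson's r as v M v^T / (2 (n - 1)) with M = [[0, I], [I, 0]]."""
    n = len(x)
    v = standardize(x) + standardize(y)   # stacked vector of length 2n
    Mv = v[n:] + v[:n]                    # M swaps the x' and y' halves
    return sum(a * b for a, b in zip(v, Mv)) / (2 * (n - 1))
```

For `x = [1, 2, 3, 4, 5]` and `y = [2, 1, 4, 3, 5]` this gives r = 0.8, matching the mean product of standard scores.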
Computation of interval correlation coefficients
Inner and outer bounds on correlation coefficients
Computational complexity: for large problems (say $n > 8$) only approximate solutions are feasible.
[Diagram: number line from −1 to 1 with lower outer bound ≤ ρ_L ≤ lower inner bound and upper inner bound ≤ ρ_U ≤ upper outer bound]
Possible values of ranks
\[ p_i \in \{ p_{i,L}, p_{i,L} + 1, \ldots, p_{i,U} \} \tag{28} \]
\[ q_i \in \{ q_{i,L}, q_{i,L} + 1, \ldots, q_{i,U} \} \tag{29} \]
Outer bounds for Spearman’s ρ
Crisp Spearman’s $\rho$ can be computed as
\[ \rho = \frac{12}{n (n^2 - 1)} \sum_{i=1}^{n} p_i q_i - \frac{3 (n + 1)}{n - 1} \tag{30} \]
Products of ranks can be bounded by
\[ p_i q_i \le p_{i,U} \, q_{i,U} \tag{31} \]
\[ p_i q_i \ge p_{i,L} \, q_{i,L} \tag{32} \]
Interval Spearman’s coefficient can be bounded by
\[ \rho \le \frac{12}{n (n^2 - 1)} \sum_{i=1}^{n} p_{i,U} \, q_{i,U} - \frac{3 (n + 1)}{n - 1} \tag{33} \]
\[ \rho \ge \frac{12}{n (n^2 - 1)} \sum_{i=1}^{n} p_{i,L} \, q_{i,L} - \frac{3 (n + 1)}{n - 1} \tag{34} \]
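These outer bounds are cheap to compute once the lowest and highest admissible rank of each object is known. The sketch below takes those rank limits as given (how they are derived from the intervals is not shown here) and applies the bounds directly; the function names are hypothetical. The bounds are returned unclipped, so they may fall outside [−1, 1].

```python
def spearman_rho_from_ranks(p, q):
    """Crisp Spearman's rho from two rank vectors without ties."""
    n = len(p)
    total = sum(pi * qi for pi, qi in zip(p, q))
    return 12 * total / (n * (n * n - 1)) - 3 * (n + 1) / (n - 1)

def spearman_outer_bounds(p_lo, p_hi, q_lo, q_hi):
    """Outer bounds on rho from per-object lowest and highest ranks."""
    n = len(p_lo)
    scale = 12 / (n * (n * n - 1))
    shift = 3 * (n + 1) / (n - 1)
    lower = scale * sum(a * b for a, b in zip(p_lo, q_lo)) - shift
    upper = scale * sum(a * b for a, b in zip(p_hi, q_hi)) - shift
    return lower, upper
```

As an example, take n = 3 with crisp ranks p = (1, 2, 3) and q-ranks where the first two objects may each have rank 1 or 2: `spearman_outer_bounds([1, 2, 3], [1, 2, 3], [1, 1, 3], [2, 2, 3])` returns `(0.0, 1.5)`. The two admissible rankings q = (1, 2, 3) and q = (2, 1, 3) give ρ = 1 and ρ = 0.5, so the outer bounds enclose the true interval [0.5, 1] but are not tight.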