An evolutionary analysis of association patterns

Alfonso Iodice D’Enza (1), Francesco Palumbo (2)
Correspondence Analysis and Related Methods (CARME 2011), Rennes, 8-11 February 2011
(1) Università di Cassino; (2) Università degli Studi di Napoli

Outline
1. Introduction
2. Notation
3. Criterion
4. Procedure
5. Example

Background

A common approach to finding patterns of association in high-dimensional, sparse data is to combine dimension reduction and clustering techniques.

Qualitative data                                                                          | Quantitative data
multiple correspondence analysis and clustering [Hwang et al. (2006)]                     | Tandem analysis [Arabie and Hubert (1994)]
non-symmetric correspondence analysis and clustering [Palumbo and Iodice D’Enza (2010)]   | Factorial K-means [Vichi and Kiers (2001)]

Aim and scope

This contribution presents a dynamic clustering procedure for high-dimensional binary data arranged into successive batches: the first data batch is used to determine a ‘starting’ solution, which is updated as further data batches are processed.

The problem is two-fold:
- clustering very large data sets, or data produced at a high rate (data flows);
- performing a comparative analysis of data stratified according to time or space.

Notation and data structures

- $n$: number of statistical units;
- $p$: number of binary attributes;
- $K$: number of groups of statistical units;
- $Z_j$, $j = 1, \ldots, p$: Bernoulli-distributed attribute with parameter $\pi_j$ (with $z$ indicating success and $\bar{z}$ failure);
- $X = (X_1, X_2, \ldots, X_K)$: multinomially distributed random vector with parameters $(n; \pi_1, \pi_2, \ldots, \pi_K)$, where the $\pi_k$ ($k = 1, \ldots, K$) are unknown.

Criterion

Cross-classification table $F$ of $X$ and a single binary attribute $Z$:

          | $Z = \bar{z}$ | $Z = z$  | total
$X = 1$   | $f_{11}$      | $f_{12}$ | $f_{1+}$
...       | ...           | ...      | ...
$X = K$   | $f_{K1}$      | $f_{K2}$ | $f_{K+}$
total     | $f_{+1}$      | $f_{+2}$ | $n$

Criterion

The qualitative variance, or heterogeneity, of $X$ can be defined by the Gini index

\[ G(X) = 1 - \sum_{k=1}^{K} \left( \frac{f_{k+}}{n} \right)^{2} = 1 - \sum_{k=1}^{K} \frac{f_{k+}^{2}}{n^{2}}. \]

The variation of $X$ within the categories of the variable $Z$ is obtained by averaging $G(X \mid z)$ and $G(X \mid \bar{z})$, weighted by the column proportions $f_{+h}/n$:

\[ G(X \mid Z) = \sum_{h=1}^{2} \frac{f_{+h}}{n} \left[ 1 - \sum_{k=1}^{K} \frac{f_{kh}^{2}}{f_{+h}^{2}} \right] = 1 - \frac{1}{n} \sum_{k=1}^{K} \sum_{h=1}^{2} \frac{f_{kh}^{2}}{f_{+h}}. \]
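As a small numerical illustration of the two quantities on this slide, the sketch below computes $G(X)$ and $G(X \mid Z)$ for a $K \times 2$ table. The counts and the function names `gini` and `conditional_gini` are made up for the example; they are not taken from the presentation.

```python
def gini(counts):
    """Gini heterogeneity 1 - sum_k (f_k / n)^2 of a list of category counts."""
    n = sum(counts)
    return 1.0 - sum((f / n) ** 2 for f in counts)


def conditional_gini(table):
    """G(X|Z) for a K x 2 table given as rows [f_k1, f_k2].

    Each column's Gini index is weighted by the column proportion f_+h / n.
    """
    n = sum(sum(row) for row in table)
    total = 0.0
    for h in range(len(table[0])):
        column = [row[h] for row in table]
        f_plus_h = sum(column)
        if f_plus_h > 0:  # an empty column contributes nothing
            total += (f_plus_h / n) * gini(column)
    return total


# Hypothetical counts for K = 3 groups crossed with one binary attribute.
table = [[30, 10], [10, 30], [20, 20]]
g_x = gini([sum(row) for row in table])   # heterogeneity of X
g_x_given_z = conditional_gini(table)     # heterogeneity of X within Z
print(g_x, g_x_given_z, g_x - g_x_given_z)
```

With these counts the three groups are equally sized, so $G(X) = 2/3$, while conditioning on $Z$ lowers the heterogeneity to $11/18$.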

Criterion

The variation of $X$ explained by the categories of $Z$ is

\[ G(X) - G(X \mid Z) = 1 - \sum_{k=1}^{K} \frac{f_{k+}^{2}}{n^{2}} - 1 + \frac{1}{n} \sum_{k=1}^{K} \sum_{h=1}^{2} \frac{f_{kh}^{2}}{f_{+h}} = \frac{1}{n} \sum_{k=1}^{K} \sum_{h=1}^{2} \frac{f_{kh}^{2}}{f_{+h}} - \frac{1}{n} \sum_{k=1}^{K} \frac{f_{k+}^{2}}{n}. \]

Criterion

In the case of $p$ binary attributes, the criterion to be maximized is

\[ \sum_{j=1}^{p} \left( G(X) - G(X \mid Z_j) \right), \]

that is, the sum of the variances of $X$ explained by each of the attributes $Z_j$.
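The summed criterion can be evaluated directly from a group assignment and a binary data matrix. The following is a minimal sketch under the assumption that groups are coded as integers `0..K-1` and attributes as 0/1 values; the function names and the six-unit data set are hypothetical.

```python
def explained_heterogeneity(labels, z, K):
    """G(X) - G(X|Z) for one binary attribute z (list of 0/1 values)."""
    n = len(labels)
    f = [[0, 0] for _ in range(K)]          # K x 2 table: f[k][h]
    for k, value in zip(labels, z):
        f[k][value] += 1
    row_sums = [sum(row) for row in f]      # f_{k+}
    col_sums = [sum(f[k][h] for k in range(K)) for h in (0, 1)]  # f_{+h}
    within = 0.0
    for h in (0, 1):
        if col_sums[h] > 0:
            within += sum(f[k][h] ** 2 for k in range(K)) / col_sums[h]
    return within / n - sum(r ** 2 for r in row_sums) / n ** 2


def criterion(labels, data, K):
    """Sum over the p attributes of the heterogeneity of X explained by Z_j."""
    p = len(data[0])
    return sum(
        explained_heterogeneity(labels, [row[j] for row in data], K)
        for j in range(p)
    )


# Hypothetical example: 6 units, 2 attributes, K = 2 groups.
labels = [0, 0, 0, 1, 1, 1]
data = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
value = criterion(labels, data, K=2)
print(value)
```

Here the first attribute separates the two groups perfectly and contributes $1/2$ (all of $G(X)$), whereas the second is only weakly related to the grouping and contributes $1/18$.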

Algebraic formalization

Quantity to maximize:

\[ \operatorname{tr}\left[ \frac{1}{n} F \Delta^{-1} F^{T} - \frac{1}{n^{2}} X^{T} \mathbf{1}\mathbf{1}^{T} X \right] \equiv \operatorname{tr}\left[ \frac{1}{n} X^{T} Z \Delta^{-1} Z^{T} X - \frac{1}{n^{2}} X^{T} \mathbf{1}\mathbf{1}^{T} X \right], \]

where $X$ is an $(n \times K)$ matrix with $x_{ik} = 1$ if unit $i$ is assigned to group $k$, $F = [F_1 \ldots F_p] = X^{T} Z$, $\Delta = \operatorname{diag}(Z^{T} Z)$, and $\mathbf{1}$ is an $n$-dimensional vector of ones.

Eigenvalue decomposition:

\[ \frac{1}{n} \left[ X^{T} Z \Delta^{-1} Z^{T} X - \frac{1}{n} X^{T} \mathbf{1}\mathbf{1}^{T} X \right] U = \Lambda U. \]
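To make the matrix form concrete, the sketch below builds the $K \times K$ matrix inside the trace from a group assignment and 0/1 data. It assumes that $Z$ is the $n \times 2p$ indicator coding with one failure and one success column per attribute, so that each $F_j$ is a $K \times 2$ table; the slide does not spell this coding out, and all function names are hypothetical.

```python
def disjunctive_coding(data):
    """n x 2p coding: for each attribute, a failure column then a success column."""
    return [[v for x in row for v in (1 - x, x)] for row in data]


def criterion_matrix(labels, data, K):
    """K x K matrix (1/n) F D^-1 F^T - (1/n^2) X^T 1 1^T X, with F = X^T Z."""
    n = len(labels)
    Z = disjunctive_coding(data)
    m = len(Z[0])
    F = [[0] * m for _ in range(K)]          # F = X^T Z, accumulated row by row
    for k, z_row in zip(labels, Z):
        for c, v in enumerate(z_row):
            F[k][c] += v
    delta = [sum(col) for col in zip(*Z)]    # diagonal of Z^T Z for 0/1 columns
    sizes = [labels.count(k) for k in range(K)]   # X^T 1, the group sizes
    return [
        [
            sum(F[a][c] * F[b][c] / delta[c] for c in range(m) if delta[c] > 0) / n
            - sizes[a] * sizes[b] / n ** 2
            for b in range(K)
        ]
        for a in range(K)
    ]


# Hypothetical example: 6 units, 2 attributes, K = 2 groups.
labels = [0, 0, 0, 1, 1, 1]
data = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
M = criterion_matrix(labels, data, K=2)
trace = sum(M[k][k] for k in range(2))
print(M, trace)
```

Under this assumed coding, the first term of the trace accumulates over all $p$ attributes while the subtracted term enters once, so the trace equals $\sum_j (G(X) - G(X \mid Z_j))$ plus $(p - 1) \sum_k f_{k+}^{2} / n^{2}$; for $p = 1$ the two quantities coincide.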

Back to the aim

This contribution presents a dynamic clustering procedure for high-dimensional binary data arranged into successive batches: the first data batch is used to determine a ‘starting’ solution, which is updated as further data batches are processed.

The problem is two-fold:
- clustering very large data sets, or data produced at a high rate (data flows);
- performing a comparative analysis of data stratified according to time or space.

The overall procedure

The proposed procedure consists of three phases:

phase 1, analysis of the starting batch: the i-FCB (iterative factorial clustering of binary data) procedure is applied to obtain the starting solution [Palumbo and Iodice D’Enza (2010)];
phase 2, new batch processing: incoming statistical units are assigned to the K groups;
phase 3, updating process: all the quantities are updated according to the new data.

Phases 2 and 3 are repeated for each new data batch.

Phase 1: starting batch

The i-FCB iterative algorithm runs over the following steps:

step 0: pseudo-random generation of matrix $X$;
step 1: an eigenvalue decomposition is performed on the matrix resulting from expression 1, obtaining the matrix $\Psi$ such that

\[ \Psi = \left[ Z \Delta^{-1} Z^{T} - \frac{1}{n} \mathbf{1}\mathbf{1}^{T} \right] X U \Lambda^{\frac{1}{2}}; \qquad (1) \]

step 2: matrix $X$ is updated according to a squared-Euclidean-distance-based non-hierarchical clustering algorithm (k-means) on the projected statistical units (the rows of $\Psi$).

Steps 1 and 2 are iterated until the stopping rule is met: the quantity in 1 does not significantly increase from one iteration to the next.
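The three steps can be sketched with NumPy as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes $Z$ is the $n \times 2p$ indicator coding, takes the trace of the $K \times K$ matrix from the algebraic formalization as the monitored quantity, and uses a plain Lloyd-type k-means started from the current groups. The function name `ifcb_sketch`, the tolerance, and the toy data are all assumptions.

```python
import numpy as np


def ifcb_sketch(B, K, max_iter=50, tol=1e-9, seed=0):
    """Illustrative i-FCB-style loop on an n x p binary array B.

    Returns the final group labels and the criterion value at each iteration.
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(B)
    n = B.shape[0]
    Z = np.hstack([1 - B, B]).astype(float)   # assumed n x 2p indicator coding
    delta = Z.sum(axis=0)                      # diagonal of diag(Z^T Z)
    delta[delta == 0] = 1.0                    # guard against empty categories
    labels = rng.integers(0, K, size=n)        # step 0: pseudo-random X
    history = []
    for _ in range(max_iter):
        X = np.eye(K)[labels]                  # n x K membership matrix
        F = X.T @ Z                            # K x 2p cross-tabulation
        g = X.sum(axis=0)                      # group sizes, X^T 1
        M = (F / delta) @ F.T / n - np.outer(g, g) / n**2
        history.append(float(np.trace(M)))
        if len(history) > 1 and history[-1] - history[-2] <= tol:
            break                              # no significant increase
        lam, U = np.linalg.eigh(M)             # step 1: eigendecomposition
        W = U * np.sqrt(np.clip(lam, 0.0, None))   # U Lambda^{1/2}
        # Psi = [Z D^-1 Z^T - (1/n) 1 1^T] X U Lambda^{1/2}, without n x n matrices
        Psi = (Z / delta) @ (F.T @ W) - np.outer(np.ones(n), g @ W) / n
        for _ in range(20):                    # step 2: k-means on rows of Psi
            centroids = np.vstack([
                Psi[labels == k].mean(axis=0) if np.any(labels == k)
                else Psi[rng.integers(n)]      # re-seed an empty group
                for k in range(K)
            ])
            d2 = ((Psi[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
    return labels, history


# Toy data: two blocks of units with opposite response patterns.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 50)
probs = np.where(group[:, None] == 0,
                 [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.9, 0.9, 0.9])
B = (rng.random((100, 6)) < probs).astype(int)
labels, history = ifcb_sketch(B, K=2)
print(history)
```

The loop stops at the first iteration where the monitored quantity fails to increase by more than `tol`, so the returned labels correspond to the last recorded value rather than to a guaranteed maximum.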

Convergence of the criterion

[Figure: number of iterations versus value of the criterion, 1000 repetitions; panels: unstructured data, structured data.]

Phase 3: updating process

- update of the number of units: $n^{*} = n + n^{+}$;
- update of the cross-tabulation block matrix: $F^{*} = F + F^{+}$, with $F^{+} = X^{+T} Z^{+}$;
- update of the diagonal matrix of margins: $\Delta^{*} = \Delta + \Delta^{+}$, with $\Delta^{+} = \operatorname{diag}(Z^{+T} Z^{+})$;
- update of the eigenvalue decomposition:

\[ \frac{1}{n^{*}} \left[ F^{*} (\Delta^{*})^{-1} F^{*T} - \frac{1}{n^{*}} f^{*} f^{*T} \right] U^{*} = \Lambda^{*} U^{*}, \]

where $f^{*}$ is the row-margin vector of the $F^{*}$ matrix.
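Because the updated quantities are simple sums, a new batch can be folded in without revisiting earlier data. The sketch below illustrates the first three updates, storing $F$ as a $K \times 2p$ table (consistent with $F = X^{T} Z$) and assuming the group labels of the incoming units are already available from phase 2. The function names and the tiny batches are hypothetical.

```python
def batch_summaries(labels, data, K):
    """Return (n, F, delta) for one batch, with F of size K x 2p."""
    p = len(data[0])
    F = [[0] * (2 * p) for _ in range(K)]
    for k, row in zip(labels, data):
        for j, x in enumerate(row):
            F[k][2 * j + x] += 1     # column 2j: failure, column 2j+1: success
    # Each unit belongs to exactly one group, so the margins diag(Z^T Z)
    # are the column sums of F.
    delta = [sum(F[k][c] for k in range(K)) for c in range(2 * p)]
    return len(labels), F, delta


def update(state, batch):
    """n* = n + n+, F* = F + F+, Delta* = Delta + Delta+."""
    n, F, delta = state
    n_plus, F_plus, delta_plus = batch
    F_star = [[a + b for a, b in zip(r, s)] for r, s in zip(F, F_plus)]
    delta_star = [a + b for a, b in zip(delta, delta_plus)]
    return n + n_plus, F_star, delta_star


# Hypothetical starting batch and incoming batch (K = 2, p = 2).
start_labels, start_data = [0, 0, 1, 1], [[1, 0], [1, 1], [0, 0], [0, 1]]
new_labels, new_data = [1, 0], [[0, 1], [1, 0]]

state = update(
    batch_summaries(start_labels, start_data, 2),
    batch_summaries(new_labels, new_data, 2),
)
print(state)
```

The updated state matches what would be obtained by summarizing the concatenated data in one pass, which is why only $n$, $F$ and $\Delta$ need to be kept between batches.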

Application: synthetic data

The number of binary attributes is p = 12: V1, V2, ..., V12.

- starting block: 200 statistical units described by uncorrelated items;
- first block: 100 statistical units with V1, V2, V3 highly correlated, and 100 statistical units with V10, V11, V12 highly correlated;
- second block: 400 statistical units described by uncorrelated items;
- third block: 100 statistical units with V4, V5, V6 highly correlated, and 100 statistical units with V7, V8, V9 highly correlated.

Notes: the number of clusters is K = 3. Synthetic data are obtained using the R package bindata, by Leisch.

Visualization of the results

A common visualization support

The procedure produces a different factorial plane for each update. To visualize the evolving association structure of the attributes as new data come in, a three-way multidimensional scaling (MDS) is used.

MDS visualization

For the starting matrix $F$ and for each of its updates $F^{*}$, a matrix of chi-square distances among attributes is computed. A three-way MDS is then performed on the resulting three-way distance matrix, using the R package smacof by de Leeuw and Mair.
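One way to obtain such a distance matrix is the usual chi-square distance between column profiles of a contingency table, where squared profile differences are weighted by the inverse row masses. The sketch below assumes that each column of a $K \times m$ table is treated as one item to compare and that all row and column totals are positive; the slide does not state the exact variant used, and the function name and table are hypothetical. The three-way MDS step itself is not reproduced here.

```python
import math


def chi_square_distances(table):
    """Chi-square distances between the column profiles of a K x m table.

    Assumes every row total and every column total is positive.
    """
    K, m = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_sum = [sum(table[k][c] for k in range(K)) for c in range(m)]
    profiles = [[table[k][c] / col_sum[c] for k in range(K)] for c in range(m)]
    D = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            d2 = sum(
                (profiles[a][k] - profiles[b][k]) ** 2 / row_mass[k]
                for k in range(K)
            )
            D[a][b] = D[b][a] = math.sqrt(d2)
    return D


# Hypothetical K = 2 table with four columns.
table = [[0, 3, 1, 2], [3, 0, 2, 1]]
D = chi_square_distances(table)
print(D)
```

Computing one such matrix for the starting table and one for each update gives the stack of distance matrices that a three-way MDS takes as input.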


Application: real-world data

The ‘retail’ data set

The retail market basket data set is supplied by an anonymous Belgian retail supermarket store. The data were collected over three non-consecutive periods, covering approximately 5 months in total. The total number of receipts (statistical units) collected is n = 88163; the number of products (binary attributes) is p = 28549.
