Sequential Detection and Isolation of a Correlated Pair

Sequential Detection and Isolation of a Correlated Pair
Anamitra Chaudhuri
Department of Statistics, University of Illinois, Urbana-Champaign
Joint work with Georgios Fellouris
2020 IEEE International Symposium on Information Theory
Los Angeles, California, 21-26 June 2020

Introduction

Motivation
– Quickest inference about the underlying dependence structure.
– Applications: environmental monitoring, sensor networks, fault detection in power grids, neural coding, etc.
– In this context,
  - data are observed sequentially and the sample size is not fixed in advance,
  - there are multiple hypotheses regarding the dependence structure.
– Goal: stop sampling as quickly as possible and identify the true hypothesis while controlling the probabilities of error.

Related works
– Detection and isolation of the correlation structure in a p-variate Gaussian random vector.
– p = 2: sequential hypothesis testing for the correlation coefficient ρ of a bivariate Gaussian
  - Binary hypothesis testing [Choi, 1971, Kowalski, 1971, Pradhan and Sathe, 1975, Wolde-Tsadik, 1976, Wald, 1945, ...]
  - Two-sided version [Woodroofe, 1979]
– p > 2: sequential multiple testing and design
  - Observation from only one component is taken at each time, temporal dependence [Heydari and Tajer, 2017]
– Sequentially observed data from independent streams, simultaneous testing of multiple binary hypotheses [Song and Fellouris, 2017]

Goal
In this work,
– data from all sources are observed sequentially,
– the observations are independent over time,
– at most one pair of components is correlated.
Goal:
– stop sampling as quickly as possible,
– identify the correlated pair, if there is one,
– control three kinds of error:
  - False Alarm: detecting a correlated pair when there is none.
  - Missed Detection: failing to detect a correlated pair when there is one.
  - Wrong Isolation: identifying the wrong correlated pair when there is one.

Problem formulation

Problem Setup
– $p$ information sources: $\{X_i(t) : t \in \mathbb{N}\}$, $i = 1, \ldots, p$.
  - For a fixed source $i \in \{1, \ldots, p\}$, $X_i(t) \overset{\text{iid}}{\sim} N(0, 1)$, $t \in \mathbb{N}$.
  - The set of all (unordered) pairs: $\mathcal{E} := \{(i, j) : 1 \le i < j \le p\}$.
  - At each time $t \in \mathbb{N}$, $\mathrm{Corr}(X_k(t), X_l(t)) = \rho_e$, where $e = (k, l) \in \mathcal{E}$.
– Given a user-specified value $\rho^* \in (0, 1)$, we perform multiple testing
  - for each $e \in \mathcal{E}$, $H_0 : \rho_e = 0$ vs. $H_1 : |\rho_e| = \rho^*$,
  - when at most one of the $\binom{p}{2}$ nulls should be rejected.
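To make the setup concrete, the following is a minimal Python sketch (standard library only) that draws observations from this model. The names `all_pairs` and `sample_observation`, the 0-based indexing, and the way the correlated coordinate is constructed are illustrative assumptions, not part of the talk.

```python
import math
import random
from itertools import combinations


def all_pairs(p):
    """All unordered pairs (i, j) with 0 <= i < j < p (0-based version of the pair set)."""
    return list(combinations(range(p), 2))


def sample_observation(p, pair=None, rho=0.0, rng=random):
    """Draw one vector X(t) with N(0, 1) marginals and at most one correlated pair.

    pair=None gives p independent sources; otherwise the two sources in
    `pair` have correlation `rho` and all other sources are independent.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    if pair is not None:
        k, l = pair
        # Mix X_k into X_l so that Corr(X_k, X_l) = rho while X_l stays N(0, 1).
        x[l] = rho * x[k] + math.sqrt(1.0 - rho * rho) * x[l]
    return x
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.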

Problem Setup
– $\mathcal{F}_t = \sigma(X(1), \ldots, X(t))$, where $X(t) = (X_1(t), X_2(t), \ldots, X_p(t))$.
– A sequential test $(\tau, d)$ consists of:
  - an $\{\mathcal{F}_t\}$-stopping time, $\tau$, at which we stop sampling,
  - and an $\mathcal{F}_\tau$-measurable decision rule $d$, which denotes the subset of pairs declared to be correlated upon stopping.
– Since there is at most one correlated pair, let
  - $P_0$: probability measure when all sources are independent,
  - $P_{e+}$ (resp. $P_{e-}$): probability measure when the pair $e$ has correlation $\rho^*$ (resp. $-\rho^*$) and all other sources are independent.

Problem Setup
– $\Delta(\alpha, \beta, \gamma)$: the class of sequential tests $(\tau, d)$ for which
  - False alarm: $P_0(d \neq \emptyset) \le \alpha$,
  - Missed detection: for all $e \in \mathcal{E}$, $P_{e+}(d = \emptyset),\ P_{e-}(d = \emptyset) \le \beta$,
  - Wrong isolation: for all $e \in \mathcal{E}$, $P_{e+}(d \neq \emptyset,\ d \neq \{e\}),\ P_{e-}(d \neq \emptyset,\ d \neq \{e\}) \le \gamma$.
– Problem: find $(\tau, d) \in \Delta(\alpha, \beta, \gamma)$ that minimizes $E[\tau]$ under $P_0$ and under $P_{e+}$, $P_{e-}$ for every $e \in \mathcal{E}$, to a first-order asymptotic approximation as $\alpha, \beta, \gamma \to 0$.

Notation and Statistics
– For each $e \in \mathcal{E}$, the likelihood ratios
$$\Lambda_{e+}(n) := \frac{dP_{e+}}{dP_0}(\mathcal{F}_n), \qquad \Lambda_{e-}(n) := \frac{dP_{e-}}{dP_0}(\mathcal{F}_n).$$
– Mixture likelihood ratio statistic for the two-sided testing problem:
$$\Lambda_e(n) := \frac{\Lambda_{e+}(n) + \Lambda_{e-}(n)}{2}.$$
– At time $n$, the ordered mixture likelihood ratio statistics are
$$\Lambda_{(1)}(n) \ge \ldots \ge \Lambda_{(K)}(n), \quad \text{and} \quad \Lambda_{i_k}(n) \equiv \Lambda_{(k)}(n), \quad k = 1, \ldots, K := \binom{p}{2}.$$
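For standard bivariate Gaussian data these statistics have a closed form: one observation $(x, y)$ contributes $-\tfrac12 \log(1-\rho^2) - \frac{\rho^2 (x^2 + y^2) - 2\rho x y}{2(1-\rho^2)}$ to the log-likelihood ratio of correlation $\rho$ against independence. The sketch below computes the log of the mixture statistic for a single pair; the function names are hypothetical, and working on the log scale is an implementation choice made here for numerical stability.

```python
import math


def log_lr_increment(x, y, rho):
    """Log-likelihood ratio of one observation (x, y): correlation rho vs. independence."""
    s = 1.0 - rho * rho
    return -0.5 * math.log(s) - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)


def log_mixture_lr(xs, ys, rho_star):
    """log of (Lambda_{e+}(n) + Lambda_{e-}(n)) / 2 for the pair with samples xs, ys."""
    lp = sum(log_lr_increment(x, y, rho_star) for x, y in zip(xs, ys))
    lm = sum(log_lr_increment(x, y, -rho_star) for x, y in zip(xs, ys))
    m = max(lp, lm)  # log-sum-exp trick to avoid overflow
    return m + math.log(math.exp(lp - m) + math.exp(lm - m)) - math.log(2.0)
```

Because the mixture averages over $+\rho^*$ and $-\rho^*$, flipping the sign of one coordinate leaves the statistic unchanged, which is what makes it suitable for the two-sided alternative.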

Proposed Procedure

Proposed Rule
Inspired by the gap-intersection rule proposed in [Song and Fellouris, 2017], our proposed procedure is $(\tau^*, d^*)$, where
– $\tau^* := \min\{\tau_1, \tau_2\}$, with
  - $\tau_1 := \inf\{n \ge 1 : \Lambda_{(1)}(n) \le 1/A\}$,
  - $\tau_2 := \inf\{n \ge 1 : \Lambda_{(1)}(n) \ge B,\ \Lambda_{(1)}(n)/\Lambda_{(2)}(n) \ge C\}$.
– $d^* := \emptyset$ if $\tau_1 < \tau_2$, and $d^* := i_1(\tau^*)$ if $\tau_2 < \tau_1$.
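The rule can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: the name `gap_rule`, the 0-based pairs, the use of `None` for the empty decision, and the behaviour when the data run out are all choices made here. All comparisons are done on the log scale.

```python
import math
from itertools import combinations


def log_lr_increment(x, y, rho):
    """One-observation log-likelihood ratio: correlation rho vs. independence."""
    s = 1.0 - rho * rho
    return -0.5 * math.log(s) - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)


def log_mix(lp, lm):
    """log of (exp(lp) + exp(lm)) / 2, computed stably."""
    m = max(lp, lm)
    return m + math.log(math.exp(lp - m) + math.exp(lm - m)) - math.log(2.0)


def gap_rule(stream, p, rho_star, A, B, C):
    """Run the rule on an iterable of length-p observations.

    Returns (tau, decision): decision is None for "no correlated pair"
    or a 0-based pair (i, j). Raises RuntimeError if the data run out first.
    """
    pairs = list(combinations(range(p), 2))
    lp = dict.fromkeys(pairs, 0.0)  # running log Lambda_{e+}(n)
    lm = dict.fromkeys(pairs, 0.0)  # running log Lambda_{e-}(n)
    for n, x in enumerate(stream, start=1):
        for e in pairs:
            i, j = e
            lp[e] += log_lr_increment(x[i], x[j], rho_star)
            lm[e] += log_lr_increment(x[i], x[j], -rho_star)
        # Order the log mixture statistics from largest to smallest.
        stats = sorted(((log_mix(lp[e], lm[e]), e) for e in pairs), reverse=True)
        top, best = stats[0]
        second = stats[1][0] if len(stats) > 1 else -math.inf
        if top <= -math.log(A):  # tau_1: largest statistic at or below 1/A
            return n, None
        if top >= math.log(B) and top - second >= math.log(C):  # tau_2
            return n, best
    raise RuntimeError("stream ended before the rule stopped")
```

Since $A, B > 1$, the two stopping conditions cannot hold at the same time, so the order of the two checks does not matter.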

Illustration
[Figure: two panels of sample paths, log(statistic) versus sample size (0 to 30), for the pairs (1,2), (2,3) and (3,1), with the thresholds log(B), log(C) and −log(A) and the time at which sampling stops marked. One panel uses $\Sigma = \begin{pmatrix} 1 & 0.8 & 0 \\ 0.8 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ (pair (1,2) correlated), the other $\Sigma = I_3$ (no correlated pair).]

Error Control
Recall, $K = \binom{p}{2}$.
Theorem. For any $A, B, C > 1$, we have
$$P_0(d^* \neq \emptyset) \le K/B,$$
$$P_{e+}(d^* = \emptyset) = P_{e-}(d^* = \emptyset) \le 1/A,$$
$$P_{e+}(d^* \neq \emptyset,\ d^* \neq \{e\}) = P_{e-}(d^* \neq \emptyset,\ d^* \neq \{e\}) \le (K-1)/C.$$
In particular, $(\tau^*, d^*) \in \Delta(\alpha, \beta, \gamma)$ when
$$A = \frac{1}{\beta}, \quad B = \frac{K}{\alpha}, \quad \text{and} \quad C = \frac{K-1}{\gamma}. \tag{1}$$
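Setting each bound in the theorem equal to its target level ($K/B = \alpha$, $1/A = \beta$, $(K-1)/C = \gamma$) gives the thresholds directly. A small helper, with a hypothetical name, might look like this:

```python
from math import comb


def gap_rule_thresholds(p, alpha, beta, gamma):
    """Thresholds (A, B, C) that make the three error bounds meet alpha, beta, gamma."""
    K = comb(p, 2)            # number of unordered pairs
    A = 1.0 / beta            # missed detection:  1 / A      <= beta
    B = K / alpha             # false alarm:       K / B      <= alpha
    C = (K - 1) / gamma       # wrong isolation:  (K - 1) / C <= gamma
    return A, B, C
```

For example, `gap_rule_thresholds(10, 1e-2, 1e-2, 1e-3)` uses $K = 45$ pairs.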

Asymptotic Upper Bound
– For each $e \in \mathcal{E}$, the KL information numbers
$$D_0 := E_0[-\log \Lambda_{e+}(1)] = E_0[-\log \Lambda_{e-}(1)], \qquad D_1 := E_{e+}[\log \Lambda_{e+}(1)] = E_{e-}[\log \Lambda_{e-}(1)].$$
– Let $x \wedge y := \min\{x, y\}$, $x \vee y := \max\{x, y\}$.
Lemma. Let $e \in \mathcal{E}$. As $A, B, C \to \infty$ we have
$$E_0[\tau^*] \le \frac{\log A}{D_0}\,(1 + o(1)),$$
$$E_{e-}[\tau^*],\ E_{e+}[\tau^*] \le \left( \frac{\log B}{D_1} \bigvee \frac{\log C}{D_0 + D_1} \right)(1 + o(1)).$$
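For the bivariate Gaussian model the two information numbers have closed forms, which follow from the Gaussian KL divergence: $D_1 = -\tfrac12 \log(1 - \rho^{*2})$ and $D_0 = \frac{\rho^{*2}}{1 - \rho^{*2}} + \tfrac12 \log(1 - \rho^{*2})$. The sketch below evaluates them together with the first-order sample-size approximations. The function names are hypothetical, and the pairing of $\log B$ with $D_1$ and of $\log C$ with $D_0 + D_1$ is the one under which the bound matches the growth rates of the two stopping statistics; treat it as an assumption of this sketch.

```python
import math


def kl_numbers(rho_star):
    """Return (D0, D1) for a standard bivariate Gaussian pair with correlation rho_star."""
    s = 1.0 - rho_star ** 2
    d1 = -0.5 * math.log(s)                   # KL(correlated || independent)
    d0 = rho_star ** 2 / s + 0.5 * math.log(s)  # KL(independent || correlated)
    return d0, d1


def first_order_sample_sizes(A, B, C, rho_star):
    """Leading terms of the expected sample size under P_0 and under P_{e+} / P_{e-}."""
    d0, d1 = kl_numbers(rho_star)
    under_null = math.log(A) / d0
    under_alt = max(math.log(B) / d1, math.log(C) / (d0 + d1))
    return under_null, under_alt
```

These are only leading-order terms; the $(1 + o(1))$ factor in the lemma is ignored.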

Asymptotic Optimality

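Universal Lower Bound
– Let
$$h(x, y) := x \log\left(\frac{x}{1-y}\right) + (1-x) \log\left(\frac{1-x}{y}\right), \qquad x, y \in (0, 1).$$
Lemma. If $\alpha, \beta, \gamma \in (0,1)$ are such that $\alpha + \beta < 1$ and $\beta + 2\gamma < 1$, $e \in \mathcal{E}$, and $(\tau, d) \in \Delta(\alpha, \beta, \gamma)$, then
$$E_0[\tau] \ge \frac{h(\alpha, \beta)}{D_0},$$
$$E_{e+}[\tau],\ E_{e-}[\tau] \ge \frac{h(\beta, \alpha)}{D_1} \bigvee \frac{h(\beta+\gamma, \gamma) \vee h(\gamma, \beta+\gamma)}{D_0 + D_1}.$$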
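The function $h$ is easy to evaluate numerically, which helps to see why the lower bound under $P_0$ matches the upper bound $\log A / D_0$ with $A = 1/\beta$: as $\alpha, \beta \to 0$, $h(\alpha, \beta)$ behaves like $|\log \beta|$. The sketch below uses hypothetical function names and the closed form of $D_0$ for the bivariate Gaussian model, $D_0 = \frac{\rho^{*2}}{1-\rho^{*2}} + \tfrac12 \log(1-\rho^{*2})$.

```python
import math


def h(x, y):
    """h(x, y) = x log(x / (1 - y)) + (1 - x) log((1 - x) / y) for x, y in (0, 1)."""
    return x * math.log(x / (1.0 - y)) + (1.0 - x) * math.log((1.0 - x) / y)


def null_lower_bound(alpha, beta, rho_star):
    """Lower bound h(alpha, beta) / D0 on the expected sample size under P_0."""
    s = 1.0 - rho_star ** 2
    d0 = rho_star ** 2 / s + 0.5 * math.log(s)
    return h(alpha, beta) / d0


def ratio_to_log(alpha, beta):
    """h(alpha, beta) / |log beta|; approaches 1 as alpha and beta shrink."""
    return h(alpha, beta) / abs(math.log(beta))
```

Evaluating `ratio_to_log` at progressively smaller error levels shows the ratio moving toward 1, which is the sense in which the two bounds agree to first order.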

Simulation Study

An Alternate Rule
– An alternate rule $(\tau_{int}, d_{int})$ is a modification of the intersection rule proposed in [De and Baron, 2012], where
  - $\tau_{int} := \inf\{n \ge 1 : 0 \le p(n) \le 1 \text{ and } \Lambda_e(n) \notin (1/A, B) \text{ for all } e \in \mathcal{E}\}$,
  - $d_{int} := \emptyset$ if $p(\tau_{int}) = 0$, and $d_{int} := i_1(\tau_{int})$ otherwise,
  - $p(n) = |\{e \in \mathcal{E} : \Lambda_e(n) > 1\}|$.
– $(\tau_{int}, d_{int}) \in \Delta(\alpha, \beta, \gamma)$ when the thresholds are $A = \frac{1}{\beta}$ and $B = \max\left\{\frac{K}{\alpha}, \frac{K-1}{\gamma}\right\}$.
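For comparison, here is a sketch of this modified intersection rule in the same style. As before, the name `intersection_rule`, the 0-based pairs, the use of `None` for the empty decision, and the error raised when the data run out are assumptions made for illustration. Unlike the proposed rule, it waits until every pair's statistic has left the interval $(1/A, B)$.

```python
import math
from itertools import combinations


def log_lr_increment(x, y, rho):
    """One-observation log-likelihood ratio: correlation rho vs. independence."""
    s = 1.0 - rho * rho
    return -0.5 * math.log(s) - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)


def log_mix(lp, lm):
    """log of (exp(lp) + exp(lm)) / 2, computed stably."""
    m = max(lp, lm)
    return m + math.log(math.exp(lp - m) + math.exp(lm - m)) - math.log(2.0)


def intersection_rule(stream, p, rho_star, A, B):
    """Return (tau, decision); decision is None or the 0-based pair with the largest statistic."""
    pairs = list(combinations(range(p), 2))
    lp = dict.fromkeys(pairs, 0.0)
    lm = dict.fromkeys(pairs, 0.0)
    lo, hi = -math.log(A), math.log(B)
    for n, x in enumerate(stream, start=1):
        for e in pairs:
            i, j = e
            lp[e] += log_lr_increment(x[i], x[j], rho_star)
            lm[e] += log_lr_increment(x[i], x[j], -rho_star)
        stats = {e: log_mix(lp[e], lm[e]) for e in pairs}
        n_positive = sum(1 for v in stats.values() if v > 0.0)  # p(n): statistics above 1
        all_outside = all(v <= lo or v >= hi for v in stats.values())
        if n_positive <= 1 and all_outside:
            if n_positive == 0:
                return n, None
            return n, max(pairs, key=lambda e: stats[e])
    raise RuntimeError("stream ended before the rule stopped")
```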

Illustration
[Figure: the same two panels of sample paths, log(statistic) versus sample size (0 to 30), for the pairs (1,2), (2,3) and (3,1), with the thresholds log(B), log(C) and −log(A), now marking both the time at which the proposed rule stops and the time at which the (modified) intersection rule stops. One panel uses $\Sigma = \begin{pmatrix} 1 & 0.8 & 0 \\ 0.8 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, the other $\Sigma = I_3$.]

Comparison
– $p = 10$, $\rho^* = 0.7$, $\alpha = \beta = 10^{-2}$, $\gamma = 10^{-3}$.
– Only one pair is correlated, with correlation coefficient $\rho$; all other pairs are uncorrelated.
– The value of $\rho$ is varied over the interval $(-0.9, 0.9)$.
[Figure: expected sample size (axis from 20 to 100) of the intersection rule and of the proposed rule versus the true value of the correlation in the correlated pair (ticks at −0.7, 0.0, 0.7).]

Summary

Summary
– Proposed the problem of quick detection and isolation of a correlated pair in a Gaussian random vector.
– Sequential multiple testing that controls three kinds of error: false alarm, missed detection and wrong isolation.
– Goal: minimize the average sample size subject to the three error constraints.
– Proposed a very simple rule based on the mixture likelihood ratios of the pairs and established its asymptotic optimality.
– Compared the proposed rule numerically with an alternative rule and showed that the proposed rule performs significantly better, especially when the true value of the correlation is much higher.

References

References
Choi, S. C. (1971). Sequential test for correlation coefficients. Journal of the American Statistical Association, 66(335):575–576.
De, S. K. and Baron, M. (2012). Sequential Bonferroni methods for multiple hypothesis testing with strong control of family-wise error rates I and II. Sequential Analysis, 31(2):238–262.
Heydari, J. and Tajer, A. (2017). Quickest search for local structures in random graphs. IEEE Transactions on Signal and Information Processing over Networks, 3(3):526–538.
