Level sets estimation of random compact sets
P. Heinrich, R. S. Stoica and C. V. Tran
Université Lille 1 - Laboratoire Paul Painlevé
Workshop in Spatial Statistics and Image Analysis in Biology
Avignon, May 9-11 2012
Introduction : motivating example
Level sets : a tool for compact random sets averaging
Estimation of level sets
Examples of application
Conclusions and perspectives
A practical application (1)
Pattern detection in spatial data :
◮ the data $d$ : image analysis, epidemiology, galaxy catalogues
◮ detect and characterise the pattern “hidden” in the data : objects, cluster pattern or filamentary network
◮ hypothesis : the pattern is the outcome $\gamma$ of a stochastic process $\Gamma$
◮ possible solution in this context : probabilistic modelling and maximisation
A practical application (2)
Gibbs modelling framework
◮ Markov random fields, marked point processes, etc.
◮ general structure of the probability density :
$$h(\gamma \mid \theta) = \frac{\exp\left[-U_d(\gamma \mid \theta) - U_i(\gamma \mid \theta)\right]}{\alpha(\theta)}$$
and also the necessary mathematical details so that everything is well defined ...
A practical application (3)
Gibbs modelling framework (continued)
◮ $U_d(\gamma \mid \theta)$ : this term is related to the location of the objects in the data field (inhomogeneous process)
◮ $U_i(\gamma \mid \theta)$ : this term is related to the interaction between objects and to the morphology of the pattern (prior model, regularisation term)
◮ $\alpha(\theta)$ : normalisation constant (not always available analytically)
◮ pattern estimator :
$$\hat{\gamma} = \arg\max_{\gamma \in \Omega} \{ h(\gamma \mid \theta) \} = \arg\min_{\gamma \in \Omega} \{ U_d(\gamma \mid \theta) + U_i(\gamma \mid \theta) \} \qquad (1)$$
A practical application (4)
Some concluding remarks
◮ simulated annealing algorithm : convergence towards the uniform distribution on the solution sub-space given by (1)
◮ the model parameters are not always known ...
◮ the convergence is difficult to establish
◮ ... or the solution is not always unique (continuous models and/or priors on the model parameters)
◮ ⇒ a real need to average the obtained solutions in order to obtain a much more robust one
Idea : use level sets as a tool for averaging random patterns
Level sets : basic notions and definitions (1)
Random compact sets and coverage function :
◮ $(\Omega, \mathcal{A}, P)$ : probability space
◮ $(W = [0,1]^d, \mathcal{B}, \nu)$ : measure space (... where the data field lives) with $\mathcal{B}$ the corresponding Borel $\sigma$-algebra and $\nu$ the Lebesgue measure
◮ $\mathcal{C}$ : the class of compact sets in $W$
A random compact set $\Gamma$ in $W$ is a random map from $\Omega$ to $\mathcal{C}$ that is measurable in the sense
$$\forall C \in \mathcal{C}, \quad \{\omega : \Gamma(\omega) \cap C \neq \emptyset\} \in \mathcal{A}$$
The coverage function is given by :
$$p(w) = P(w \in \Gamma)$$
Level sets : basic notions and definitions (2)
Level or Quantile sets : for $\alpha \in [0,1]$ the (deterministic) $\alpha$-level set is
$$Q_\alpha = \{ w \in W : p(w) > \alpha \}$$
or for simplicity $\{p > \alpha\}$.
Vorob’ev expectation : the Borel set $E_V\Gamma$ such that $\nu(E_V\Gamma) = E[\nu(\Gamma)]$ and
$$\{p > \alpha^*\} \subset E_V\Gamma \subset \{p \geq \alpha^*\},$$
where
$$\alpha^* = \inf\{\alpha \in [0,1] : \nu(Q_\alpha) \leq E[\nu(\Gamma)]\}.$$
The Vorob’ev expectation is the $\alpha^*$-level set that matches the mean volume of $\Gamma$.
Some known results and properties (1)
Figure: Behaviour of function $F(\alpha) = \nu(Q_\alpha)$ (vertical axis up to $\nu(W)$ ; horizontal axis from 0 to 1 with marks $\alpha_0$, $\alpha_1$, $\alpha_2$ ; labels $F^-(\alpha_0)$ and $F(\alpha_0)$ at $\alpha_0$)
Remarks :
◮ $F$ is càdlàg with constant regions (plateaux)
◮ constant regions of $p(w)$ ⇒ discontinuities of $\nu(Q_\alpha)$
◮ constant regions of $\nu(Q_\alpha)$ ⇒ discontinuities of $p(w)$
Some known results and properties (2)
Vorob’ev expectation :
◮ it is unique provided $F(\alpha) = \nu(Q_\alpha) = \nu(\{p > \alpha\})$ is continuous at $\alpha^*$ ; then we have $E_V\Gamma = \{p \geq \alpha^*\}$
◮ it minimises $B \mapsto E[\nu(B \triangle \Gamma)]$ under the constraint $\nu(B) = E[\nu(\Gamma)]$, where $\triangle$ is the symmetric difference (Molchanov, 05).
More generally, on level sets :
◮ $p(w)$ is not always available in an analytical closed form
◮ the level sets cannot be computed for all the points $w \in W$ ⇒ discretisation should be considered
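A short sketch of why thresholding the coverage function gives the minimiser. This is the standard Fubini argument, written out here as a reading aid and not taken from the slides ; $v$ is shorthand introduced here for the mean volume $E[\nu(\Gamma)]$.

```latex
\begin{align*}
E[\nu(B \triangle \Gamma)]
  &= \int_W E\,\bigl|\mathbf{1}_B(w) - \mathbf{1}_\Gamma(w)\bigr| \, d\nu(w) \\
  &= \int_B \bigl(1 - p(w)\bigr)\, d\nu(w) + \int_{W \setminus B} p(w)\, d\nu(w) \\
  &= E[\nu(\Gamma)] + \nu(B) - 2 \int_B p(w)\, d\nu(w).
\end{align*}
% Under the constraint \nu(B) = v = E[\nu(\Gamma)], the first two terms are fixed,
% so minimising the left-hand side amounts to maximising \int_B p \, d\nu
% over Borel sets of volume v.
```

With the volume fixed, the integral of $p$ over $B$ is largest when $B$ collects the points where $p$ is largest, that is, when $B$ lies between $\{p > \alpha^*\}$ and $\{p \geq \alpha^*\}$.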
Plug-in estimation (1)
Definition
◮ consider $n$ i.i.d. copies $\Gamma_1, \Gamma_2, \ldots, \Gamma_n$ of $\Gamma$
◮ the empirical counterpart of $p(w)$ :
$$p_n(w) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{w \in \Gamma_i\}$$
◮ the plug-in estimator :
$$Q_{n,\alpha} = \{p_n > \alpha\}$$
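A minimal sketch of the plug-in estimator, assuming each sampled set is already available as a boolean pixel mask over $W = [0,1]^2$. The function names and the random-disk toy model are hypothetical, introduced only for illustration ; they are not the authors' implementation.

```python
import numpy as np

def coverage_estimate(masks):
    """Empirical coverage p_n on the pixel grid: the fraction of the n
    sampled sets that cover each pixel."""
    masks = np.asarray(masks, dtype=bool)
    return masks.mean(axis=0)

def plug_in_level_set(p_n, alpha):
    """Plug-in estimator Q_{n,alpha} = {p_n > alpha} as a boolean mask."""
    return np.asarray(p_n) > alpha

def random_disk_mask(rng, m):
    """Hypothetical toy model: a disk in [0,1]^2 with random centre and
    radius, rasterised on an m x m grid of pixel centres."""
    x = (np.arange(m) + 0.5) / m
    xx, yy = np.meshgrid(x, x, indexing="ij")
    cx, cy = rng.uniform(0.3, 0.7, size=2)
    radius = rng.uniform(0.1, 0.3)
    return (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks = [random_disk_mask(rng, 64) for _ in range(200)]
    p_n = coverage_estimate(masks)
    q = plug_in_level_set(p_n, 0.5)
    # The mean of a boolean mask is its volume as a fraction of W.
    print("volume of the 0.5-level set:", q.mean())
```

Since $p_n$ only takes the values $k/n$, a threshold equal to one of these values is sensitive to floating-point rounding ; comparing integer counts avoids that.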
Plug-in estimation (2)
Properties : the problem has been studied in depth in the literature
◮ some references : (Molchanov, 87, 90, 98), (Cuevas, 97, 06) and many others
◮ $L^1$-consistency under weak assumptions → $p(w)$ does not need to be continuous
◮ Hausdorff distance : similar consistency results under some extra assumptions
◮ rates of convergence and asymptotic normality : regularity conditions on $p(w)$
Aim of our work
◮ plug-in estimator that takes into account the discretisation effects
◮ estimator for the Vorob’ev expectation → its definition contains another quantity that needs approximation ...
A new level-set estimator (1)
Discretisation : for any Borel set $B$ in $W$ and $r \in 2^{-\mathbb{N}}$, its corresponding grid approximation is
$$B^r = \bigcup_{w \in B \cap r\mathbb{Z}^d} [w, w + r)^d.$$
Regularity : the “upper box counting dimension” of $\partial B$ is
$$\dim_{\mathrm{box}}(\partial B) = \limsup_{r \to 0} \frac{\log N_r(\partial B)}{-\log r},$$
with $N_r(\partial B) = \mathrm{Card}\{ w \in r\mathbb{Z}^d : [w, w + r)^d \cap \partial B \neq \emptyset \}$.
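The two definitions above can be sketched numerically. The code below assumes that the boundary $\partial B$ is available only as a finite cloud of sample points, so the count is an approximation of $N_r(\partial B)$, and that membership in $B$ is given by a vectorised indicator ; all function names are hypothetical.

```python
import numpy as np

def box_count(points, r):
    """Approximate N_r: number of cells [w, w+r)^d, w in r*Z^d, that
    contain at least one of the sample points."""
    cells = np.floor(np.asarray(points, dtype=float) / r).astype(np.int64)
    return len(np.unique(cells, axis=0))

def box_dimension_estimate(points, radii):
    """Slope of log N_r against -log r over the given radii: a finite-scale
    stand-in for the lim sup in the definition."""
    counts = [box_count(points, r) for r in radii]
    slope, _ = np.polyfit(-np.log(radii), np.log(counts), 1)
    return slope

def grid_volume(contains, r, d=2):
    """Volume of the grid approximation B^r, keeping the lower corners w
    in r*Z^d that lie in [0,1)^d. `contains` maps an array of corners of
    shape (N, d) to a boolean array of shape (N,)."""
    k = int(round(1 / r))
    axes = [np.arange(k) * r] * d
    corners = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    return contains(corners).sum() * r ** d

if __name__ == "__main__":
    # Toy boundary: a horizontal segment, whose box dimension is 1.
    t = np.linspace(0.0, 1.0, 1000, endpoint=False)
    segment = np.column_stack([t, np.full_like(t, 0.5)])
    radii = 2.0 ** -np.arange(1, 6)
    print("estimated dimension:", box_dimension_estimate(segment, radii))
```

The radii are taken in $2^{-\mathbb{N}}$, as on the slide ; with a finite point cloud the estimate degrades once $r$ falls below the spacing of the samples.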
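A new level-set estimator (2)
Proposition. Assume that $\dim_{\mathrm{box}}(\partial B) < d$. For all $\varepsilon > 0$, there exists $r_\varepsilon$ such that
$$0 < r < r_\varepsilon \;\Rightarrow\; \nu(B^r \triangle B) \leq r^{\,d - \dim_{\mathrm{box}}(\partial B) - \varepsilon}.$$
Proposition. Assume that $\dim_{\mathrm{box}}(\partial \Gamma) \leq d - \kappa$ with probability one for some $\kappa > 0$. For all $\alpha$ such that $\nu(\{p = \alpha\}) = 0$,
(i) with probability 1,
$$\lim_{\substack{r \to 0 \\ n \to \infty}} \nu\left(Q^r_{n,\alpha} \triangle Q_\alpha\right) = 0$$
(ii) for all $\varepsilon > 0$,
$$E\left[\nu\left(Q^r_{n,\alpha} \triangle Q_\alpha\right)\right] \leq r^{\kappa} + 2e^{-2n\varepsilon^2} + F(\alpha - \varepsilon) - F(\alpha + \varepsilon).$$
The proof is an extension of the result in (Cuevas, 06).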
Vorob’ev expectation estimator (1)
Kovyazin’s mean : the empirical counterpart of the Vorob’ev expectation, that is, the Borel set $K_n$ such that
$$\nu(K_n) = \frac{1}{n} \sum_{i=1}^{n} \nu(\Gamma_i)$$
and
$$\{p_n > \alpha^*_n\} \subset K_n \subset \{p_n \geq \alpha^*_n\},$$
where
$$\alpha^*_n = \inf\{\alpha \in [0,1] : \nu(\{p_n > \alpha\}) \leq \nu(K_n)\}.$$
Theorem. Assume that $\nu(\{p = \alpha^*\}) = 0$. Then, with probability one,
$$\lim_{n \to \infty} \nu(K_n \triangle E_V\Gamma) = 0.$$
The proof revisits the result given by (Kovyazin, 86).
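A minimal sketch of Kovyazin's mean on a pixel grid, assuming the sampled sets are given as boolean masks and volumes are measured in pixels. On a grid the mean volume can only be matched up to rounding, and the pixels taken from $\{p_n = \alpha^*_n\}$ are chosen in an arbitrary (array) order ; both are simplifications made here, and `kovyazin_mean` is a hypothetical name.

```python
import numpy as np

def kovyazin_mean(masks):
    """Return (alpha_star, K) with {p_n > alpha*} ⊂ K ⊂ {p_n >= alpha*}
    and a pixel volume as close as possible to the mean sample volume."""
    masks = np.asarray(masks, dtype=bool)
    n = masks.shape[0]
    counts = masks.sum(axis=0)                         # n * p_n, kept as integers
    target = masks.reshape(n, -1).sum(axis=1).mean()   # mean volume in pixels

    # p_n takes values k/n, so the infimum defining alpha* is reached at
    # the smallest k with #{p_n > k/n} <= target (k = n always qualifies).
    for k in range(n + 1):
        if (counts > k).sum() <= target:
            break
    alpha_star = k / n

    flat = (counts > k).ravel()                        # start from {p_n > alpha*}
    ties = np.flatnonzero((counts == k).ravel())       # pixels with p_n = alpha*
    missing = int(round(target)) - int(flat.sum())
    flat[ties[:missing]] = True                        # fill up to the target volume
    return alpha_star, flat.reshape(counts.shape)

if __name__ == "__main__":
    # Three nested "intervals" on a 1 x 4 grid.
    masks = [[[1, 1, 1, 0]], [[1, 1, 0, 0]], [[1, 0, 0, 0]]]
    alpha_star, K = kovyazin_mean(masks)
    print(alpha_star, K.astype(int))
```

Working with the integer counts $n\,p_n$ instead of $p_n$ itself keeps the comparisons with $k/n$ exact.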
Vorob’ev expectation estimator (2)
Grid approximation of $K_n$ : this is the estimator we propose, that is, the Borel set $K_{n,r}$ such that
$$\{p_n > \alpha^*_{n,r}\}^r \subset K_{n,r} \subset \{p_n \geq \alpha^*_{n,r}\}^r,$$
where
$$\alpha^*_{n,r} = \inf\{\alpha \in [0,1] : \nu(\{p_n > \alpha\}^r) \leq \nu(K_n)\}.$$
Some remarks :
◮ quite strong assumption : $\nu(K_n)$ is computed exactly ...
◮ an alternative idea may consider directly the discretisation of $\Gamma$ or $K_n$, but this does not guarantee a mean volume equal to $\nu(K_n)$ ...
◮ still, in practice ...
Consistency of the Vorob’ev estimator
Theorem. Assume that $\dim_{\mathrm{box}}(\partial \Gamma) \leq d - \kappa$ with probability one for some $\kappa > 0$, and that $\nu(\{p = \alpha^*\}) = 0$ and $\nu(\{p = \beta^*\}) = 0$ with
$$\beta^* = \sup\{\alpha \in [0,1] : \nu(\{p > \alpha\}) \geq E[\nu(\Gamma)]\}.$$
Then, we have almost surely
$$\lim_{\substack{r \to 0 \\ n \to \infty}} \nu(K_{n,r} \triangle E_V\Gamma) = 0.$$
Proof. We write that
$$\nu(K_{n,r} \triangle E_V\Gamma) \leq \nu(K_{n,r} \triangle K_n) + \nu(K_n \triangle E_V\Gamma)$$
and use Theorem 1 and two lemmas to conclude. For technical details, a draft is available on request ...
Cosmic filaments : simulated annealing detection (Stoica, Martinez and Saar, 07, 10)
Figure: a) Original data. b) Cylinder configuration detected.
Cosmic filaments : level sets averaging
Figure: a) Behaviour of the level set volume. b) Estimated Vorob’ev expectation.
Epidemiology (veterinary context)
Disease : sub-clinical mastitis for dairy herds
◮ points → farms location
◮ to each farm → disease score (continuous variable)
◮ clusters pattern detection : regions where there is a lack of hygiene or rigour in farm management
Figure: The spatial distribution of the farms outlines almost the entire French territory (INRA Avignon).
Epidemiology : sub-clinical mastitis data (Stoica, Gay and Kretzschmar, 07)
Figure: Disease data scores and coordinates for the year 1996 : a) cluster pattern (disk configuration) detected ; b) Level sets.
Conclusion :
◮ estimator including the discretisation effects
◮ averaging the shape of the pattern ...
Perspectives :
◮ ... provided the model is correct ...
◮ relax hypotheses
◮ what is the variance of the pattern ?
Acknowledgements : this work was done together with wonderful co-authors and also with the help of some very generous people ... Some of them are with us today :) ...