
Statistical modelling of a terrorist network with the latent class - PowerPoint PPT Presentation



  1. Statistical modelling of a terrorist network with the latent class model and Bayesian model comparisons Murray Aitkin and Duy Vu, and Brian Francis murray.aitkin@unimelb.edu.au duy.vu@unimelb.edu.au b.francis@lancaster.ac.uk School of Mathematics and Statistics, University of Melbourne and Department of Mathematics and Statistics, University of Lancaster, UK BOB 2015 Terrorist network – p. 1

  2. Statistical modelling of social networks
     Work supported by the Australian Research Council, 2004-7 and 2012-15.
     Aim: to evaluate latent class modelling by maximum likelihood and Bayesian methods for the analysis of social network and criminal career data.
     Participants: Murray Aitkin, Pip Pattison, Brian Francis, Duy Vu.
     Main contribution: identifying subgroups, their number and membership, by latent class modelling and Bayesian model comparison.
     Two examples:
     • Social network of the Natchez, Mississippi women: see Aitkin, M., Vu, D. and Francis, B.J. (2014). Statistical modelling of the group structure of social networks. Social Networks 38, 74-87.
     • Noordin Top terrorist network: Aitkin, M., Vu, D. and Francis, B.J. (2016), to appear in an RSS journal (A or C).

  3. Where do network data come from?
     For social networks (networks of people or other social creatures), the data come from either
     • direct observation, or
     • indirect data gathering through newspapers and other recording instruments.
     These sources of information provide the evidence of connections among actors. The connections are represented mathematically, and are analysed through properties of the mathematical structure.

  4. Facebook friendship network (Wikipedia)

  5. Unipartite and bipartite networks The Facebook network is a unipartite network: it represents direct connections between the Facebook users. These connections may be directed (A likes B does not imply that B likes A) or undirected or reciprocal (A and B are connected through something, like Facebook). We discuss bipartite networks, in which the connections between actors are through their joint participation in events.

  6. The Natchez social network

  7. The adjacency matrix
     To perform any analysis we need to re-express the table elements mathematically through a link, or tie, variable $Y_{ij}$: the presence of woman $i$ at event $j$ defines $Y_{ij} = 1$, and her absence from the event defines $Y_{ij} = 0$. We use $n$ to denote the number of rows (women) and $r$ to denote the number of columns (events). The resulting table is expressed as an $n \times r$ matrix, called the adjacency matrix, denoted by $Y$.
     Marginal totals (T) have been added to the table, giving the total number of events attended by each woman, and the total number of women attending each event. We see that women vary in their propensity to attend events, and events vary in their attractiveness to women. We also give the marital status of woman $i$ in a variable $x_i$, coded 1 for married and 0 for unmarried.

  8. Two-mode network data (x = marital status, W = woman, E = event, T = total)

      x  W\E   1   2   3   4   5   6   7   8   9  10  11  12  13  14    T
      1    1   1   1   1   1   1   1   0   1   1   0   0   0   0   0    8
      0    2   1   1   1   0   1   1   1   1   0   0   0   0   0   0    7
      0    3   0   1   1   1   1   1   1   1   1   0   0   0   0   0    8
      0    4   1   0   1   1   1   1   1   1   0   0   0   0   0   0    7
      0    5   0   0   1   1   1   0   1   0   0   0   0   0   0   0    4
      0    6   0   0   1   0   1   1   0   1   0   0   0   0   0   0    4
      0    7   0   0   0   0   1   1   1   1   0   0   0   0   0   0    4
      0    8   0   0   0   0   0   1   0   1   1   0   0   0   0   0    3
      0    9   0   0   0   0   1   0   1   1   1   0   0   0   0   0    4
      0   10   0   0   0   0   0   0   1   1   1   0   0   1   0   0    4
      0   11   0   0   0   0   0   1   0   1   1   1   0   1   0   0    5
      0   12   0   0   0   0   0   0   0   1   1   1   0   1   1   1    6
      1   13   0   0   0   0   0   0   1   1   1   1   0   1   1   1    7
      1   14   0   0   0   0   0   1   1   0   1   1   1   1   1   1    8
      1   15   0   0   0   0   0   0   1   1   0   1   1   1   0   0    5
      1   16   0   0   0   0   0   0   0   1   1   0   0   0   0   0    2
      1   17   0   0   0   0   0   0   0   0   1   0   1   0   0   0    2
      1   18   0   0   0   0   0   0   0   0   1   0   1   0   0   0    2
           T   3   3   6   4   8   9  10  14  12   5   4   6   3   3   90

  9. Two-mode network data, zeros suppressed

      x  W\E   1   2   3   4   5   6   7   8   9  10  11  12  13  14    T
      1    1   1   1   1   1   1   1       1   1                        8
      0    2   1   1   1       1   1   1   1                            7
      0    3       1   1   1   1   1   1   1   1                        8
      0    4   1       1   1   1   1   1   1                            7
      0    5           1   1   1       1                                4
      0    6           1       1   1       1                            4
      0    7                   1   1   1   1                            4
      0    8                       1       1   1                        3
      0    9                   1       1   1   1                        4
      0   10                           1   1   1           1            4
      0   11                       1       1   1   1       1            5
      0   12                               1   1   1       1   1   1    6
      1   13                           1   1   1   1       1   1   1    7
      1   14                       1   1       1   1   1   1   1   1    8
      1   15                           1   1       1   1   1            5
      1   16                               1   1                        2
      1   17                                   1       1                2
      1   18                                   1       1                2
           T   3   3   6   4   8   9  10  14  12   5   4   6   3   3   90
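The construction of $Y$ and its margins can be sketched in a few lines of Python. This is only an illustration: the names `ROWS`, `Y`, `row_totals` and `col_totals` are hypothetical, and row 1 is entered so that it agrees with the printed row and column totals.

```python
# Attendance matrix Y: 18 women (rows) by 14 events (columns), one string per woman.
# Assumption: row 1 is entered so that it matches the printed margins (T).
ROWS = [
    "11111101100000", "11101111000000", "01111111100000",
    "10111111000000", "00111010000000", "00101101000000",
    "00001111000000", "00000101100000", "00001011100000",
    "00000011100100", "00000101110100", "00000001110111",
    "00000011110111", "00000110111111", "00000011011100",
    "00000001100000", "00000000101000", "00000000101000",
]
Y = [[int(c) for c in row] for row in ROWS]
n, r = len(Y), len(Y[0])          # n women, r events

# Marginal totals: events attended per woman, women attending per event.
row_totals = [sum(row) for row in Y]
col_totals = [sum(Y[i][j] for i in range(n)) for j in range(r)]
total_ties = sum(row_totals)

print(n, r, total_ties)
print(row_totals)
print(col_totals)
```

The row totals show the variation in women's propensity to attend, and the column totals the variation in event attractiveness.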

  10. [Figure: original matrix, 18 actors by 14 events]

  11. [Figure: randomly shuffled matrix, 18 actors by 14 events]

  12. Probability models for actors and events
     Analysis needs to allow for uncertainty in the behaviour of actors: even if they form an established group with other actors, this does not mean that they all attend the same events. We consider the presence or absence of an actor at an event as a random process: attendance is determined by a possibly large number of factors unknown to us, so we represent the process outcome as a Bernoulli random variable. The probability that actor $i$ attends event $j$ ($Y_{ij} = 1$) is $p_{ij}$, and the probability that actor $i$ does not attend event $j$ ($Y_{ij} = 0$) is $1 - p_{ij}$. We want to bring the actor and event structures into the event attendance probability in some way.

  13. Models
     The "null" model is a single-parameter model, giving the same constant probability $p_{ij} = p$ that every actor attends every event, independently across events and actors: all actors have the same attendance probability, and all events have the same attraction probability.
     The Rasch model has a parameter for each actor and a parameter for each event:
     • Each actor $i$ has a propensity $\theta_i$ to attend any event.
     • Each event $j$ has an attractiveness $\phi_j$ to any actor.
     • Actors attend events independently.
     • The Rasch model is a main effect or additive exponential random graph model (ERGM), in events and actors, on the logit-transformed probability scale:
       $$\operatorname{logit} p_{ij} = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_i + \phi_j .$$
     It has no subgroup structure.
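As a rough illustration of these two models, the sketch below computes the maximum likelihood estimate of $p$ under the null model (the overall proportion of ties) and the Rasch attendance probability for given $\theta_i$ and $\phi_j$. The function names are hypothetical, and the figures 90, 18 and 14 are the grand total and dimensions of the Natchez table.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))

def null_model_fit(total_ties, n, r):
    """Null model: one common attendance probability p for all n*r cells.

    Returns the maximum likelihood estimate of p and the maximised
    Bernoulli log-likelihood.
    """
    cells = n * r
    p_hat = total_ties / cells
    loglik = (total_ties * math.log(p_hat)
              + (cells - total_ties) * math.log(1.0 - p_hat))
    return p_hat, loglik

def rasch_prob(theta_i, phi_j):
    """Rasch model: logit p_ij = theta_i + phi_j."""
    return inv_logit(theta_i + phi_j)

p_hat, loglik = null_model_fit(90, 18, 14)
print(p_hat, loglik)
print(rasch_prob(0.3, -1.2))   # illustrative parameter values only
```

The values passed to `rasch_prob` are arbitrary; fitting $\theta_i$ and $\phi_j$ to data is not shown here.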

  14. The latent class model
     This model specifies a $K$-class latent structure for actors. The $K$ classes are distinguished by $K$ sets of event attendance parameters $q_{jk}$, different among classes, but identical within classes. The proportion of actors in class $k$ is $\pi_k$; $\theta_K = (K, \{\pi_k\}, \{q_{jk}\})$.
     The class structure is unobserved; it is implied and identified by the actors' different patterns of event attendance. The (observed data) likelihood $L(\theta_K)$, the probability of the observed data, is given by
       $$\Pr[\{y_{ij}\} \mid k, i] = \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$$
       $$\Pr[\{y_{ij}\} \mid i] = \sum_{k=1}^{K} \pi_k \left[ \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right]$$
       $$L(\theta_K) = \Pr[\{y_{ij}\}] = \prod_{i=1}^{n} \left\{ \sum_{k=1}^{K} \pi_k \left[ \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right] \right\} .$$
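The observed-data likelihood above can be evaluated on the log scale as in the following sketch. The function names and the layout `q[k][j]` (class first, event second) are assumptions made for illustration, and the log-sum-exp step is a standard numerical device rather than something stated on the slide.

```python
import math

def class_loglik(y_i, q_k):
    """log Pr[{y_ij} | k, i]: independent Bernoulli terms over events j.

    Assumes every q_jk lies strictly between 0 and 1.
    """
    return sum(math.log(q if y else 1.0 - q) for y, q in zip(y_i, q_k))

def latent_class_loglik(Y, pi, q):
    """log L(theta_K) for attendance rows Y, class proportions pi, and q[k][j]."""
    total = 0.0
    for y_i in Y:
        # log(pi_k) + log Pr[{y_ij} | k, i] for each class k
        terms = [math.log(pi_k) + class_loglik(y_i, q_k)
                 for pi_k, q_k in zip(pi, q)]
        # log-sum-exp over classes avoids underflow in the mixture sum
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

# Toy example with 2 classes and 3 events (values are illustrative only).
Y_toy = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
print(latent_class_loglik(Y_toy, [0.5, 0.5],
                          [[0.9, 0.6, 0.1], [0.1, 0.6, 0.9]]))
```

With $K = 1$ and a common $q_{j1} = p$ the function reduces to the null-model Bernoulli log-likelihood.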

  15. Analysis with the complete data likelihood
     Bayesian analysis is greatly simplified by introducing counterfactual missing data: the class identification of each actor. We define $Z_{ik} = 1$ if actor $i$ belongs to class $k$, and zero otherwise, with (prior) probability $\pi_k$. If the complete data $y_{ij}$ and $Z_{ik}$ were observed, the complete data likelihood $CL(\theta_K)$ for the $K$-class model would be
       $$CL(\theta_K) = \Pr[\{y_{ij}\}, \{Z_{ik}\}] = \Pr[\{y_{ij}\} \mid \{Z_{ik}\}] \cdot \Pr[\{Z_{ik}\}]$$
       $$= \left\{ \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right]^{Z_{ik}} \right\} \cdot \left\{ \prod_{i=1}^{n} \prod_{k=1}^{K} \pi_k^{Z_{ik}} \right\}$$
       $$= \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}} \right]^{Z_{ik}} .$$
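A minimal sketch of the complete data log-likelihood follows. It assumes the indicators $Z_{ik}$ are stored as one class label `z[i]` per actor (equivalent to a one-hot row of $Z$), and the function name and `q[k][j]` layout are hypothetical.

```python
import math

def complete_data_loglik(Y, z, pi, q):
    """log CL(theta_K), with z[i] = k meaning Z_ik = 1.

    Only the term for each actor's own class contributes, because the
    exponent Z_ik is zero for every other class.
    """
    total = 0.0
    for y_i, k in zip(Y, z):
        total += math.log(pi[k])
        total += sum(math.log(q_jk if y else 1.0 - q_jk)
                     for y, q_jk in zip(y_i, q[k]))
    return total

# Toy example: one actor, two events, assigned to class 0.
print(complete_data_loglik([[1, 0]], [0], [0.5, 0.5],
                           [[0.8, 0.2], [0.5, 0.5]]))
```

Unlike the observed-data likelihood, no sum over classes appears, which is what makes the conditional distributions used in MCMC simple.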

  16. MCMC analysis
     MCMC iterates between making
     • random draws of the $Z_{ik}$ given the current parameter draws, and
     • random draws of the parameters given the current $Z_{ik}$ draws.
     With flat or non-informative priors on the parameters and the $Z_{ik}$, the conditional distributions can be inferred from the complete data likelihood:
     • the $Z_{ik}$ given the parameters are multinomial with probabilities proportional to $\pi_k \prod_{j=1}^{r} q_{jk}^{y_{ij}} (1 - q_{jk})^{1 - y_{ij}}$;
     • the parameters given the $Z_{ik}$:
       ◦ the $\pi_k$ are Dirichlet with parameters $Z_{+k} = \sum_{i=1}^{n} Z_{ik}$;
       ◦ the $q_{jk}$ are Beta with parameters $\sum_{i=1}^{n} Z_{ik} y_{ij}$, $\sum_{i=1}^{n} Z_{ik} (1 - y_{ij})$.
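The two alternating draws can be sketched with the Python standard library as below. This is an illustrative Gibbs sampler under stated assumptions, not the authors' implementation: all function names are hypothetical, class labels `z[i]` stand in for the indicators $Z_{ik}$, and a constant `prior = 1.0` (a uniform prior) is added to every Dirichlet and Beta parameter so that the draws stay proper when a class is empty.

```python
import math
import random

EPS = 1e-12  # keeps drawn probabilities strictly inside (0, 1)

def draw_memberships(Y, pi, q, rng):
    """Draw each actor's class with probability proportional to
    pi_k * prod_j q_jk^y_ij * (1 - q_jk)^(1 - y_ij)."""
    z = []
    for y_i in Y:
        logw = [math.log(pi_k) + sum(math.log(q_jk if y else 1.0 - q_jk)
                                     for y, q_jk in zip(y_i, q_k))
                for pi_k, q_k in zip(pi, q)]
        m = max(logw)
        weights = [math.exp(v - m) for v in logw]
        z.append(rng.choices(range(len(pi)), weights=weights)[0])
    return z

def draw_parameters(Y, z, K, rng, prior=1.0):
    """Draw pi from a Dirichlet and each q_jk from a Beta, given labels z.

    Assumption: `prior` is added to every count (uniform prior).
    """
    r = len(Y[0])
    counts = [z.count(k) for k in range(K)]
    # Dirichlet draw via normalised Gamma variates
    g = [rng.gammavariate(c + prior, 1.0) for c in counts]
    pi = [v / sum(g) for v in g]
    q = []
    for k in range(K):
        members = [y_i for y_i, z_i in zip(Y, z) if z_i == k]
        q_k = []
        for j in range(r):
            attended = sum(y_i[j] for y_i in members)
            draw = rng.betavariate(attended + prior,
                                   len(members) - attended + prior)
            q_k.append(min(max(draw, EPS), 1.0 - EPS))
        q.append(q_k)
    return pi, q

def gibbs(Y, K, n_iter, seed=0):
    """Alternate parameter and membership draws; return all draws."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in Y]   # random starting labels
    draws = []
    for _ in range(n_iter):
        pi, q = draw_parameters(Y, z, K, rng)
        z = draw_memberships(Y, pi, q, rng)
        draws.append((pi, q, z))
    return draws

Y_toy = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
draws = gibbs(Y_toy, K=2, n_iter=200, seed=1)
print(draws[-1][2])   # class labels from the final iteration
```

The sketch ignores burn-in, convergence checks and the relabelling of classes between iterations, all of which matter in a real analysis.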
