ALPHATECH, Inc.
Performance Metrics for Group-Detection Algorithms
Presented at Interface 2004, May 29, 2004
Jim White (jim.white@alphatech.com), Sam Steingold, Connie Fournelle
Introduction
• What is the group-detection problem?
• Evaluating Group-Detection Algorithms (GDAs) with synthetic data
• Performance metrics
• Some performance-evaluation results
Introduction to Group Detection
[Figure: putative links plus link-quality parameters feed the GDA, which outputs groups. Each putative link is a list of proteins, e.g. 1. P45, P671, P7; 2. P456, P73; 3. P7, P55, P873, P1356, P561, P3; ...]
• Each link is a list of proteins that were observed to be working together or interacting, probably because they belong to a larger group of interacting proteins
  - Groups may be cellular processes, bio subsystems, ...
• Links are noisy fragments of evidence, possibly much smaller than the generating groups
Groups and Links
[Figure: population of entities partitioned into Group 1 through Group n, plus orphans]
• Entities (proteins)
  - Exchangeable
  - May belong to more than one group
• Orphan entities
  - Don't belong to any groups
• Groups (processes, systems)
  - Independent, may overlap
  - Generate links
• Observed links
  - Either group-generated or clutter
  - Each group-generated link is produced by one of the groups
• Link-quality parameters
  - PI = prior probability that a link is clutter (independent of the groups)
  - PR = prior probability that an entity in a group-generated link is noise (not in the group)
How Links Are Generated
• Each link is either clutter or is generated by one of the groups
• With probability PI, the link is clutter: N entities are selected from the population by uniform random sampling
• With probability 1 - PI, the link is group-generated: a group is randomly selected, then that group is randomly sampled N times
• Noise is then added to each group-generated link: each entity in the link has probability PR of being replaced by an entity from outside the group
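A minimal Python sketch of this generative model may help make it concrete. The function name make_link, the fixed link size N, and sampling the chosen group with replacement are illustrative assumptions; the slides only specify PI, PR, and the clutter / group-generated structure.

import random

def make_link(population, groups, N, PI, PR, rng=random):
    """Generate one synthetic link of size N (sketch, not the original code)."""
    if rng.random() < PI:
        # Clutter link: N entities drawn uniformly from the population.
        return rng.sample(population, N)

    # Group-generated link: pick a group, then sample it N times.
    group = rng.choice(groups)
    link = [rng.choice(group) for _ in range(N)]

    # Add noise: each entity is replaced, with probability PR,
    # by an entity from outside the group.
    in_group = set(group)
    outside = [e for e in population if e not in in_group]
    return [rng.choice(outside) if rng.random() < PR else e for e in link]

# Example scenario (matches the synthetic universe used later):
# 10,000 proteins, 5 groups of 20, links of size 4, 10% clutter, 10% noise.
population = list(range(10_000))
groups = [population[i * 20:(i + 1) * 20] for i in range(5)]
links = [make_link(population, groups, N=4, PI=0.1, PR=0.1) for _ in range(256)]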
Evaluating GDAs with Synthetic Data
[Figure: synthetic groups generate noisy synthetic links, which feed the GDA under test; the GDA outputs are compared with the synthetic groups by a statistical-analysis system that produces the performance metrics]
• System performance depends on both the GDA and the information content of the links
Testing with Synthetic Data Can Answer Important Questions
• How many links are needed?
• Is link size critical?
• How sensitive is performance to noise and clutter?
• How does performance vary with # of groups and group size?
• What are typical scenarios in which the GDA does very well?
• What are problem scenarios in which the GDA underperforms?
• Testing with synthetic data provides a rational basis for planning follow-up tests with real data.
Performance Metrics
Input-Output Model for Analyzing Detection System Performance
[Figure: actual group membership x enters the detection system (link data & GDA), which produces the detector output y; P(y,x) = P(y|x)P(x)]
• Indicator variables (x, y) for membership of a generic entity in a generic group
  - x = 1 if the entity actually belongs to the group, x = 0 otherwise
  - y = 1 if the detector assigns the entity to the group, y = 0 otherwise
• Four probabilities characterize detection performance (the joint distribution):
  P(x=0,y=0), P(x=0,y=1), P(x=1,y=0), P(x=1,y=1)
• Input SNR = P(x=1)/P(x=0); Output SNR = P(x=1|y=1)/P(x=0|y=1)
Performance in a 3-D World
• The four probabilities in P(x,y) sum to 1, so detection performance lives in a 3-D world
• A nice parameterization:
  - Pg = P(x=1), prior probability of an entity belonging to the group (group prevalence)
  - Fn = P(y=0|x=1), false-negative rate (miss rate)
  - Fp = P(y=1|x=0), false-positive rate (false-alarm rate)
• Classical performance metrics are functions of Fn, Fp, and Pg:
  - Error rate: Pe = P(x ≠ y) = Pg Fn + (1 - Pg) Fp
  - Detection probability: Tp = P(y=1|x=1) = 1 - Fn (recall, sensitivity)
  - Positive predictive value: PV+ = P(x=1|y=1) (precision)
  - Negative predictive value: PV- = P(x=0|y=0)
  - Bayes factor: G1 = posterior odds favoring x=1 divided by prior odds favoring x=1
  - Signal-to-noise ratios: SNRout = G1 SNRin, where SNRin = P(x=1)/P(x=0)
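As a worked illustration, the classical metrics above can be computed as a small function of (Pg, Fn, Fp). This is a Python sketch, not code from the talk; the function name and returned fields are our own choices, and it assumes 0 < Pg < 1 with Fp > 0 so that all ratios are finite.

def classical_metrics(Pg, Fn, Fp):
    """Classical detection metrics from group prevalence Pg,
    false-negative rate Fn, and false-positive rate Fp."""
    Tp = 1.0 - Fn                                  # recall / sensitivity
    p_y1 = Pg * Tp + (1.0 - Pg) * Fp               # P(y = 1)
    ppv = Pg * Tp / p_y1                           # precision, P(x=1 | y=1)
    npv = (1.0 - Pg) * (1.0 - Fp) / (1.0 - p_y1)   # P(x=0 | y=0)
    error_rate = Pg * Fn + (1.0 - Pg) * Fp         # P(x != y)
    snr_in = Pg / (1.0 - Pg)                       # prior odds of x = 1
    snr_out = ppv / (1.0 - ppv)                    # posterior odds given y = 1
    bayes_factor = snr_out / snr_in                # G1
    return {"recall": Tp, "precision": ppv, "npv": npv,
            "error_rate": error_rate, "snr_in": snr_in,
            "snr_out": snr_out, "bayes_factor": bayes_factor}

# The two groups compared later on the "Comparing Deficiency with Error Rate" slide:
print(classical_metrics(Pg=1e-1, Fn=1e-4, Fp=1e-4)["precision"])  # ~0.999
print(classical_metrics(Pg=1e-4, Fn=1e-4, Fp=1e-4)["precision"])  # 0.5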
Proficiency Metric Avoids a Limitation of Classical Metrics
• No single classical metric is sensitive to both Fn and Fp as the input SNR goes to 0
  - The analyst must consider two metrics simultaneously to measure performance
    - ROC curves
    - Fn and Fp
    - Precision and recall
  - Juggling two metrics complicates algorithm optimization and the interpretation of detection performance
  - The usual focus on error rate can be very misleading
• The Proficiency metric from information theory is never blind
  - Proficiency = I(x,y) / H(x)
  - I(x,y) = the amount of information about x that is provided by y (the mutual information)
  - H(x) = the amount of information about x that is required to achieve ideal error-free performance (the entropy of x)
  - 0 ≤ Proficiency ≤ 1
  - Deficiency is defined as 1 - Proficiency
Definitions of I(X,Y) and H(X)
• Mutual information of the joint distribution P(x,y):
  I(X,Y) = Σx Σy P(x,y) log { P(x,y) / [ P(x) P(y) ] }
• Entropy of the marginal distribution P(x):
  H(X) = - Σx P(x) log P(x)
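The proficiency follows directly from these two definitions. Below is a minimal Python sketch for the binary detection problem, parameterized by (Pg, Fn, Fp) as on the earlier slides; the helper names are our own, and any logarithm base works because proficiency is a ratio of quantities in the same units.

from math import log

def _plogq(p, q):
    """p * log(p / q), with the convention 0 * log(0 / q) = 0."""
    return 0.0 if p == 0.0 else p * log(p / q)

def proficiency(Pg, Fn, Fp):
    # Joint distribution P(x, y) of truth x and detector output y.
    joint = {(1, 1): Pg * (1 - Fn), (1, 0): Pg * Fn,
             (0, 1): (1 - Pg) * Fp, (0, 0): (1 - Pg) * (1 - Fp)}
    px = {1: Pg, 0: 1 - Pg}
    py = {1: joint[1, 1] + joint[0, 1], 0: joint[1, 0] + joint[0, 0]}
    I = sum(_plogq(joint[x, y], px[x] * py[y])       # mutual information I(X,Y)
            for x in (0, 1) for y in (0, 1))
    H = -sum(_plogq(px[x], 1.0) for x in (0, 1))     # entropy H(X)
    return I / H

print(round(proficiency(Pg=0.002, Fn=0.0, Fp=0.0), 6))   # -> 1.0 for a perfect detector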
Proficiencies and ROC Curves
[Figure: ROC curves at constant proficiency, plotted as recall vs. log10(false-positive rate) for P(x=1) = 0.002, with the precision = 0.5 contour marked]
• Each curve has constant proficiency
• Red box
  - Recall > 0.75
  - Precision > 0.5
  - Contains operating points such that Proficiency > 0.56
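As a quick numerical check of the red box, reusing the proficiency() sketch from the previous slide and assuming the corner operating point recall = 0.75, precision = 0.5 at P(x=1) = 0.002:

Pg, recall, precision = 0.002, 0.75, 0.5

# precision = Pg*recall / (Pg*recall + (1 - Pg)*Fp), solved for Fp:
Fp = Pg * recall * (1 - precision) / ((1 - Pg) * precision)

print(round(proficiency(Pg, Fn=1 - recall, Fp=Fp), 2))   # -> 0.56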
Comparing Deficiency with Error Rate
• A detection system is looking for two groups: a large one and a small one
  - The group sizes are 1/10 and 1/10,000 of the population
  - The detection system has low error rates: Fn = Fp = 1/10,000
• The deficiency metric shows that the smaller group is harder to find, while the error rate is the same for both
  - Deficiencies: 0.0026 vs 0.136
  - Error rates: 0.0001 vs 0.0001 (insensitive to changes in group prevalence)
• The precision metric (and the output SNR) track the performance difference in this case
  - Precision: 0.999 vs 0.5
  - Output SNR: 999 vs 1
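The deficiencies quoted above can be reproduced with the same proficiency() sketch:

for Pg in (1e-1, 1e-4):   # large group, then small group
    deficiency = 1 - proficiency(Pg, Fn=1e-4, Fp=1e-4)
    print(f"Pg = {Pg}: deficiency = {deficiency:.4f}")
# -> about 0.0026 for the large group and 0.136 for the small one,
#    even though both have error rate 0.0001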
Example of Performance Evaluation Using Synthetic Data
Sensitivity to Number of Experiments, Using CMU’s K-Groups Algorithm (J. Schneider & A. Moore)
[Figure: group proficiency (%) vs. orphan proficiency (%), with curves for 2 to 512 experiments; P(x=1) = 0.002]
• Synthetic universe
  - 10,000 proteins
  - 5 groups, each containing 20 proteins
• Synthetic links
  - Link = proteins observed to interact or work together
  - Size = 2, 4, or 6 proteins
  - One link per experiment
• Link quality
  - 10% clutter links
  - 10% noise in group links
• Evaluation objective
  - Determine proficiency vs. # of experiments
• Google "autonlab" to get the k-groups software
  - Unsupervised detection
More Noise or Clutter
[Figure: two panels of group proficiency (%) vs. orphan proficiency (%), for PR = 0.2 (twice the noise) and PI = 0.2 (twice the clutter), with curves for 2 to 512 experiments]
Less Noise or Clutter
[Figure: two panels of group proficiency (%) vs. orphan proficiency (%), for PR = 0.05 (half the noise) and PI = 0.05 (half the clutter), with curves for 2 to 512 experiments]
Summary
• Group detection
  - Is distinct from clustering
  - Looks for small groups of interacting entities in large populations
• Proficiency metric
  - Is a rigorous information-theoretic performance measure
  - Much safer than using just error rate or accuracy
  - May be used when tuning the parameters in machine-learning algorithms that use supervised learning
  - Simplifies the interpretation of performance evaluations based on synthetic or labeled real data
Appendix
Proficiency and Area Under ROC Curve
[Figure: relationship between proficiency and area under the ROC curve, for P(x=1) = 0.1]
Finding Scientific Teams Doing Research on Aerosols, Using CMU’s K-Groups Algorithm (J. Schneider & A. Moore)
[Figure: group proficiency (%) vs. orphan proficiency (%), with curves for different numbers of groups requested (15, 20, 30, 40)]
• Synthetic links
  - Authors of 504 research papers published over the last 3 years
• Synthetic universe
  - 10,000 scientists, engineers, and mathematicians
• Link quality
  - 10% clutter: links that were not generated by a single aerosol research team
  - 10% noise: percentage of authors that were actually not on the research team that wrote the paper
• Synthetic ground truth
  - Twenty research teams
• Test objective
  - Determine proficiency vs. the number of groups to find (an input to K-Groups)
• Underestimating the number of research teams is worse than overestimating it