Bayesian Kernel Methods for Non-Gaussian Distributions

Cameron MacKenzie and Theodore Trafalis
School of Industrial Engineering, University of Oklahoma
INFORMS Annual Meeting, November 9, 2010

Current Bayesian kernel methods
• Combine Bayesian probability with Support Vector Machines (SVM)
• n data points, m attributes
• X is an n x m matrix
• y is an n x 1 vector of 0’s and 1’s
• θ(X) is a function of X used to predict y

Posterior = Likelihood x Prior / P(y):

$$P(\theta(X) \mid y) = \frac{P(y \mid \theta(X)) \, P(\theta(X))}{P(y)}$$

MacKenzie and Trafalis 2

Support Vector Machines and the idea of kernel methods

[Figure: a map Φ from input space to feature space]

$$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$$
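
The kernel identity on this slide can be checked numerically. The sketch below is illustrative only: the quadratic kernel, the explicit feature map `phi`, and the two sample points are assumptions chosen for the example and do not come from the talk.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the quadratic kernel on 2-D inputs (assumed example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in input space without building phi."""
    return dot(x, z) ** 2

# Hypothetical sample points.
x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = quadratic_kernel(x, z)   # evaluated in input space
feature_value = dot(phi(x), phi(z))     # evaluated in feature space
```

Both values agree up to floating-point rounding, which is the point of the kernel idea: the inner product in feature space is obtained without ever constructing the feature vectors.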
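
Gaussian distributions
Refs: Schölkopf and Smola, 2002; Bishop and Tipping, 2003

Posterior ∝ Likelihood x Prior:

$$P(\theta(X) \mid y) \propto P(y \mid \theta(X)) \, P(\theta(X))$$

Logistic likelihood:

$$P(y_i = 1 \mid \theta(x_i)) = \frac{\exp(\theta(x_i))}{1 + \exp(\theta(x_i))}$$

Normal prior:

$$E[\theta(X)] = 0, \qquad \operatorname{cov}(\theta(x_1), \ldots, \theta(x_n)) = K$$

where K is the n x n kernel matrix.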
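
As a small illustration of the logistic likelihood above, the following sketch evaluates P(y_i = 1 | θ(x_i)) for a given latent value. The function name and the numerically stable branching are choices made for this example, not part of the talk.

```python
import math

def logistic_likelihood(theta_xi):
    """P(y_i = 1 | theta(x_i)) = exp(theta) / (1 + exp(theta)).

    Branching on the sign avoids overflow in math.exp for large |theta|.
    """
    if theta_xi >= 0:
        return 1.0 / (1.0 + math.exp(-theta_xi))
    e = math.exp(theta_xi)
    return e / (1.0 + e)

# At the prior mean of the latent function, theta(x_i) = 0,
# the two classes are equally likely.
p_at_prior_mean = logistic_likelihood(0.0)
```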

What’s new
• Beta distributions as priors
• Adaptation of beta-binomial updating formula
• Comparison of beta kernel classifiers with existing SVM classifiers
• Online learning

Beta distribution

$$\theta \sim \mathrm{Beta}(\alpha, \beta), \qquad E[\theta] = \frac{\alpha}{\alpha + \beta}$$

Shape of beta density functions

[Figure: six panels of beta density functions plotted against θ on [0, 1]: Beta(1,1), Beta(3,3), Beta(10,10), Beta(5,1), Beta(2,6), Beta(15,6)]
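
Beta-binomial conjugate
• Prior: $\theta \sim \mathrm{Beta}(\alpha, \beta)$
• Likelihood: $Y \sim \mathrm{Binomial}(n, \theta)$, where n is the number of trials
• Posterior: $\theta \mid Y = y \sim \mathrm{Beta}(\alpha + y, \; \beta + n - y)$, where y is the number of ones and n - y is the number of zeros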
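
The conjugate update above amounts to adding counts to the two beta parameters. A minimal sketch follows, with function names and the example counts (7 ones in 10 trials, uniform Beta(1,1) prior) chosen here for illustration.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_binomial_update(alpha, beta, n, y):
    """Posterior Beta parameters after observing y ones in n trials."""
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    # Ones are added to alpha, zeros (n - y) are added to beta.
    return alpha + y, beta + (n - y)

# Hypothetical data: a uniform Beta(1, 1) prior updated with 7 ones in 10 trials.
post_alpha, post_beta = beta_binomial_update(1.0, 1.0, n=10, y=7)
post_mean = beta_mean(post_alpha, post_beta)
```

With these numbers the posterior is Beta(8, 4), so the posterior mean moves from the prior mean of 0.5 toward the observed frequency of ones.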
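
Applying beta-binomial to data mining
• Prior: $\theta(x_i) \sim \mathrm{Beta}(\alpha_i, \beta_i)$
• Posterior:

$$\theta(x_i) \mid y \sim \mathrm{Beta}\Big(\alpha_i + \sum_{j : y_j = 1} K(x_j, x_i), \;\; \beta_i + w \sum_{j : y_j = 0} K(x_j, x_i)\Big)$$

  where w is a weight on the zero-class sum, tied to the number of zeros in the training set
• Kernel:

$$K(x_j, x_i) = \exp\left(-\frac{\lVert x_j - x_i \rVert_2^2}{2\sigma^2}\right)$$

  where σ is the parameter to be tuned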
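
The kernel-weighted update can be sketched as follows: kernel values take the place of the binomial counts, so nearby training points with y = 1 raise α and nearby points with y = 0 raise β. This is a sketch under stated assumptions, not the authors' implementation: the Gaussian kernel with bandwidth `sigma`, the scalar `zero_weight` (left at 1.0 because the slide does not pin down its exact form), and the toy training points are all hypothetical.

```python
import math

def gaussian_kernel(x_j, x_i, sigma):
    """Gaussian kernel exp(-||x_j - x_i||^2 / (2 sigma^2)); sigma is tuned by the user."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_j, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_beta_posterior(x_i, X_train, y_train, alpha_i, beta_i, sigma, zero_weight=1.0):
    """Posterior Beta parameters for theta(x_i) from kernel-weighted class sums.

    zero_weight is an assumed scalar weight on the zero-class sum.
    """
    ones = sum(gaussian_kernel(x_j, x_i, sigma)
               for x_j, y_j in zip(X_train, y_train) if y_j == 1)
    zeros = sum(gaussian_kernel(x_j, x_i, sigma)
                for x_j, y_j in zip(X_train, y_train) if y_j == 0)
    return alpha_i + ones, beta_i + zero_weight * zeros

# Hypothetical toy data: two ones near the query point, one zero far away.
X_train = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
y_train = [1, 1, 0]
a, b = kernel_beta_posterior((0.0, 0.1), X_train, y_train, 1.0, 1.0, sigma=1.0)
p_one = a / (a + b)  # posterior mean E[theta(x_i)]
```

Because the distant zero contributes almost nothing to β, the posterior mean for this query point ends up well above 0.5.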

Data sets

Data set       Attributes   Data points    Ones    Zeros   Training set   Tuning set   Testing set
Parkinson              22           195     147       48             98           58            39
Tornado                83        10,816     721   10,095            541          271           541
Colon Cancer        2,000            62      22       40             31           19            12
Spam                   57         4,601   1,813    2,788            460          230           460
Transfusion             4           748     178      570            150           74           524

Each training, tuning, and testing set is randomly sampled 100 times.

Testing on data sets

Data set       Ones in data set   Rate (%)   Beta prior   Weighted SVM   Regular SVM
Parkinson      75%                TP rate            86             91            98
                                  TN rate            95             76            75
Tornado        7%                 TP rate            80             87            59
                                  TN rate            97             91            99
Colon Cancer   35%                TP rate            87             78            77
                                  TN rate            85             93            95
Spam           39%                TP rate            85             85            85
                                  TN rate            85             93            95
Transfusion    24%                TP rate            71             69            24
                                  TN rate            61             64            94
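
For reference, the TP (true positive) and TN (true negative) rates reported above can be computed from 0/1 labels as in the sketch below. The function name and the toy labels are assumptions for illustration; the labels are not taken from any of the data sets in the talk.

```python
def tp_tn_rates(y_true, y_pred):
    """Return (TP rate, TN rate) for 0/1 labels.

    TP rate = correctly predicted ones / actual ones
    TN rate = correctly predicted zeros / actual zeros
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / n_pos, tn / n_neg

# Hypothetical imbalanced example: a classifier that always predicts 0.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
always_zero = [0] * len(y_true)
rates_always_zero = tp_tn_rates(y_true, always_zero)
```

A classifier that always predicts the majority class gets a perfect TN rate and a TP rate of zero, which is why both rates are reported side by side for imbalanced data sets.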

Online learning

Updated probabilities for one data point from tornado data. Each trial uses 100 data points to update the prior.

y = 0
         Weighted likelihood     Weighted likelihood     Unweighted likelihood
Trial    α      β      E[θ]      α      β      E[θ]      α      β      E[θ]
Prior    1      1      0.5       0.7    9.3    0.070     0.7    9.3    0.07
1        1.00   1.13   0.47      0.70   9.43   0.069     0.70   16.03  0.04
2        1.02   1.42   0.42      0.72   9.72   0.069     0.72   21.82  0.03
3        1.02   1.93   0.35      0.72   10.23  0.066     0.72   27.47  0.03
5        1.08   2.41   0.31      0.78   10.71  0.068     0.78   38.13  0.02
10       1.24   3.95   0.24      0.94   12.25  0.071     0.95   66.24  0.01

y = 1
         Weighted likelihood     Weighted likelihood     Unweighted likelihood
Trial    α      β      E[θ]      α      β      E[θ]      α      β      E[θ]
Prior    1      1      0.5       0.7    9.3    0.07      0.7    9.3    0.07
1        1.01   1.00   0.50      0.71   9.30   0.07      0.71   9.30   0.07
2        1.01   1.00   0.50      0.71   9.30   0.07      0.71   9.30   0.07
3        1.10   1.00   0.52      0.80   9.30   0.08      0.81   9.30   0.08
5        1.16   1.00   0.54      0.86   9.30   0.08      0.88   9.38   0.09
10       1.49   1.01   0.60      1.19   9.31   0.11      1.22   9.41   0.11
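
The online scheme in the tables can be sketched as repeated updating, where the posterior from one trial becomes the prior for the next. This is a sketch under stated assumptions rather than the authors' code: the Gaussian kernel, the constant scalar `zero_weight`, and the tiny two-point batches are hypothetical (the talk uses 100 points per trial), while the Beta(0.7, 9.3) prior is taken from the tables.

```python
import math

def gaussian_kernel(x_j, x_i, sigma):
    """Gaussian kernel exp(-||x_j - x_i||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_j, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def update(x_i, batch, alpha, beta, sigma, zero_weight=1.0):
    """One kernel beta update of (alpha, beta) for x_i from a batch of (x_j, y_j) pairs."""
    for x_j, y_j in batch:
        k = gaussian_kernel(x_j, x_i, sigma)
        if y_j == 1:
            alpha += k
        else:
            beta += zero_weight * k  # zero_weight: assumed constant weight on zeros
    return alpha, beta

def online_updates(x_i, batches, alpha, beta, sigma, zero_weight=1.0):
    """Apply batches in sequence; each posterior is the prior for the next trial."""
    history = [(alpha, beta, alpha / (alpha + beta))]
    for batch in batches:
        alpha, beta = update(x_i, batch, alpha, beta, sigma, zero_weight)
        history.append((alpha, beta, alpha / (alpha + beta)))
    return history

# Hypothetical batches for a query point at the origin.
batches = [
    [((0.0, 0.2), 0), ((2.0, 2.0), 1)],
    [((0.1, 0.0), 0), ((0.3, 0.1), 0)],
]
history = online_updates((0.0, 0.0), batches, alpha=0.7, beta=9.3, sigma=1.0)
```

Each entry of `history` holds (α, β, E[θ]) after a trial, mirroring one row of the tables. With a constant weight, processing the batches one at a time gives the same parameters as a single update on all the data, so no earlier data has to be stored.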

Conclusions
• Adapting the beta-binomial updating rule to a kernel-based classifier can create a fast and accurate data mining algorithm
• User can set prior and weights to reflect imbalanced data sets
• Results are comparable to weighted SVM
• Online learning combines previous and current information

Questions

cmackenzie@ou.edu