Bias Also Matters: Bias Attribution for Deep Neural Network Explanation
Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes, University of Washington, Seattle
Explain DNNs as a linear model per data point
• A DNN with piecewise-linear activations such as ReLU, when applied to a data point x, equals a linear model f(x) = wx + b.
• The gradient term, i.e., w in f(x), has been widely studied to explain the DNN output on a given data point.
• The bias b, however, is usually overlooked.
Bias contains important information about DNNs
• Decomposition of a DNN for every data point x:
  f(x) = W_m σ_{m-1}(W_{m-1} σ_{m-2}(... σ_1(W_1 x + b_1) ...) + b_{m-1}) + b_m,
  where W_ℓ and b_ℓ are the weight matrix and bias term of layer ℓ, and σ_ℓ is the corresponding activation.
• The bias term b of the per-data-point linear model, though a single scalar per output, results from a complicated process involving both the weights and biases of all DNN layers.
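The per-data-point linear decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the poster's code: the layer sizes and random weights below are made up, and the key fact shown is that at a fixed input x each ReLU acts as a 0/1 mask, so the whole network collapses to one affine map W_x x + b_x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-layer ReLU MLP (sizes and weights are illustrative only).
shapes = [(5, 4), (6, 5), (3, 6)]
Ws = [rng.standard_normal(s) for s in shapes]
bs = [rng.standard_normal(s[0]) for s in shapes]

def forward(x):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(W @ h + b, 0.0)  # ReLU hidden layers
    return Ws[-1] @ h + bs[-1]          # linear output layer

def local_linear(x):
    """Return (W_x, b_x) such that forward(x) == W_x @ x + b_x."""
    W_x = np.eye(len(x))
    b_x = np.zeros(len(x))
    h = x
    for i, (W, b) in enumerate(zip(Ws, bs)):
        pre = W @ h + b
        # Activation pattern at x; the final layer has no ReLU.
        mask = np.ones_like(pre) if i == len(Ws) - 1 else (pre > 0).astype(float)
        h = mask * pre
        W_x = mask[:, None] * (W @ W_x)      # fold this layer into the weight term
        b_x = mask * (W @ b_x + b)           # fold this layer into the bias term
    return W_x, b_x

x = rng.standard_normal(4)
W_x, b_x = local_linear(x)
print(np.allclose(forward(x), W_x @ x + b_x))  # True
```

Note that b_x is exact only at this particular x: a different input can flip the activation pattern and hence change both W_x and b_x.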
Bias is important for DNN performance
• A linear model with the gradient term only may produce wrong predictions; the bias term corrects them.

Dataset  | Train Without Bias | Train With Bias, Test All | Test Only wx | Test Only b
CIFAR10  | 87.0               | 90.9                      | 71.5         | 62.2
CIFAR100 | 62.8               | 66.8                      | 40.3         | 36.5
FMNIST   | 94.1               | 94.7                      | 76.1         | 24.6

Our method, "Bias Backpropagation (BBp)", explicitly attributes the bias term to each input feature.
Bias Backpropagation (BBp)
• Start from the final layer and attribute the bias in a backpropagation style.
• For every layer:
  - Receive the bias attribution from the layer above.
  - Combine the received attribution with the effective bias of this layer.
  - Attribute the combined term to the input of this layer.
• The sum of the attributions on all input features exactly recovers f(x).

Algorithm 1: Bias Backpropagation (BBp)
Input: x, {W_ℓ}_{ℓ=1}^m, {b_ℓ}_{ℓ=1}^m, {σ_ℓ(·)}_{ℓ=1}^m
1: Compute {W^x_ℓ}_{ℓ=1}^m and {b^x_ℓ}_{ℓ=1}^m for x by Eq. (5)   // data-point-specific weight/bias
2: β_m ← b^x_m                                                     // β_ℓ holds the accumulated attribution
3: for ℓ ← m down to 2 do
4:   for p ← 1 to d_ℓ do
5:     Compute α_ℓ[p] by Eq. (15)-(17) or Eq. (18)                 // compute attribution scores
6:     B_ℓ[p, q] ← α_ℓ[p, q] × β_ℓ[p], ∀q ∈ [d_{ℓ-1}]             // attribute to the layer input
7:   for q ← 1 to d_{ℓ-1} do
8:     β_{ℓ-1}[q] ← b^x_{ℓ-1}[q] + Σ_{p=1}^{d_ℓ} B_ℓ[p, q]        // combine with bias of layer ℓ-1
9: return β_1 ∈ R^{d_in}
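The backward pass above can be sketched in NumPy under two loudly labeled assumptions: the network is a made-up toy, and the proportional splitting rule marked below is a simple stand-in for the poster's Eq. (15)-(18), which are not reproduced here. The sketch keeps the one property that is checkable: the input attributions sum exactly to the effective bias of the chosen logit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU MLP (illustrative weights; the last layer is linear).
shapes = [(5, 4), (6, 5), (3, 6)]
Ws = [rng.standard_normal(s) for s in shapes]
bs = [rng.standard_normal(s[0]) for s in shapes]

def bbp(x, c):
    """Attribute the effective bias of logit c back to the input features of x."""
    # Forward pass: cache the data-point-specific (masked) weights and biases.
    Wx, bx, hs, h = [], [], [x], x
    for i, (W, b) in enumerate(zip(Ws, bs)):
        pre = W @ h + b
        mask = np.ones_like(pre) if i == len(Ws) - 1 else (pre > 0).astype(float)
        Wx.append(mask[:, None] * W)
        bx.append(mask * b)
        h = mask * pre
        hs.append(h)
    # Backward pass: start from the final-layer bias and push it down.
    v = np.zeros(len(h)); v[c] = 1.0   # downstream linear coefficients for logit c
    beta = v * bx[-1]                  # accumulated attribution at the top layer
    for l in range(len(Ws) - 1, -1, -1):
        # Stand-in splitting rule (NOT the poster's Eq. (15)-(18)): split beta[p]
        # over the layer's inputs q in proportion to |contribution of q to p|.
        contrib = np.abs(Wx[l] * hs[l][None, :]) + 1e-12
        alpha = contrib / contrib.sum(axis=1, keepdims=True)
        phi = alpha.T @ beta           # attribution onto the inputs of layer l
        if l == 0:
            return phi                 # attribution over the input features
        v = Wx[l].T @ v
        beta = v * bx[l - 1] + phi     # fold in the effective bias of layer l-1

def forward_and_Wx(x):
    """Output and per-example weight term, for checking conservation."""
    W_x, h = np.eye(len(x)), x
    for i, (W, b) in enumerate(zip(Ws, bs)):
        pre = W @ h + b
        mask = np.ones_like(pre) if i == len(Ws) - 1 else (pre > 0).astype(float)
        h = mask * pre
        W_x = mask[:, None] * (W @ W_x)
    return h, W_x

x = rng.standard_normal(4)
c = 1
phi = bbp(x, c)
out, W_x = forward_and_Wx(x)
# Conservation: attributions sum to f(x)[c] minus the gradient term (W_x @ x)[c].
print(np.isclose(phi.sum(), out[c] - W_x[c] @ x))  # True
```

Any splitting rule whose rows of alpha sum to 1 preserves this conservation property; the specific choice of rule is what distinguishes the three BBp variants.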
Examples of Attribution Results on Images
[Figure: attribution maps (raw and normalized) from integrated gradients, the three BBp variants (bias.1, bias.2, bias.3), and the gradient, for images labeled Teddy Bear, Brambling, Longhorn Beetle, Fireguard, Folding Chair, Fountain Pen, and Piggy Bank.]
Bias Attribution of various layers
[Figure: bias.1, bias.2, and bias.3 attribution maps computed from all layers vs. all layers except the first 2, 4, or 6, alongside the original images.]
• We can use BBp to analyze biases of different layers.
• Bias from lower layers results in more noise in the attribution.
• Bias from deeper layers reveals high-level features (e.g., head parts of the dog and the bird).
• "bias.1(2,3)" correspond to the three variants of BBp.
Quantitative evaluation on the MNIST digit flip test
• Mask input image pixels based on their attribution scores.
• Check how the predictions change.
• Measure the log-odds of the target vs. the source class before and after masking pixels.
• BBp is class-sensitive and comparable to methods such as integrated gradients and DeepLift.
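The flip test above can be sketched with a toy linear classifier. Everything below is a stand-in, not the poster's actual setup: the model is random, and the class-sensitive attribution rule is a simple gradient-times-input surrogate rather than BBp itself. For a linear model, the gain in target-vs-source log-odds after masking equals exactly the masked attribution mass.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_odds(logits, target, source):
    """log(p_target / p_source) under softmax; equals logits[target] - logits[source]."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.log(p[target] / p[source])

# Toy 10-class linear classifier on flattened 8x8 "images" (random weights).
W, b = rng.standard_normal((10, 64)), rng.standard_normal(10)
x = rng.random(64)
source = int(np.argmax(W @ x + b))   # predicted (source) class
target = (source + 1) % 10           # class we try to flip the prediction to

# Class-sensitive attribution toward source relative to target
# (a stand-in for BBp / integrated-gradient scores).
attrib = (W[source] - W[target]) * x

before = log_odds(W @ x + b, target, source)

# Mask the k pixels most responsible for "source rather than target".
k = 16
idx = np.argsort(attrib)[-k:]
masked = x.copy()
masked[idx] = 0.0
after = log_odds(W @ masked + b, target, source)

# For a linear model the log-odds gain equals the masked attribution mass.
print(np.isclose(after - before, attrib[idx].sum()))  # True
```

A class-sensitive attribution method should therefore raise the target-vs-source log-odds more than a class-agnostic one when the same number of pixels is masked.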
Thank you!
• For more details, please come to our poster session: Wednesday 06:30-09:00 PM, Pacific Ballroom #147.