Learning Context-dependent Label Permutations for Multi-label Classification
Jinseok Nam, Amazon Alexa AI
Joint work with Young-Bum Kim, Eneldo Loza Mencía, Sunghyun Park, Ruhi Sarikaya and Johannes Fürnkranz
Multi-label Classification (MLC)
• Goal: learn a function f that maps instances to a subset of labels
  Label set {Sea, Desert, Building, Sky, Cloud, Mountain}; example: an image x is mapped by f to {Sky, Cloud, Mountain}
• It is important to take label dependencies into account.
• Joint probability of labels:
  P(y_1, y_2, \dots, y_L \mid x) = \prod_{i=1}^{L} P(y_i \mid y_{<i}, x)
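As a quick sanity check of the factorization, here is the expansion written out for a toy case with L = 3 labels; this worked form is an illustrative addition, not part of the original slide.

```latex
% Chain-rule factorization of the joint label distribution for L = 3:
% each label is conditioned on the input x and all previously ordered labels.
\begin{align*}
P(y_1, y_2, y_3 \mid x) = P(y_1 \mid x)\, P(y_2 \mid y_1, x)\, P(y_3 \mid y_1, y_2, x)
\end{align*}
```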
Maximization of the joint probability
• Traditional approaches for minimizing subset 0/1 loss:
• (Probabilistic) classifier chain (Dembczyński et al., ICML 2010; Read et al., MLJ 2011)
  Y = {Sea, Desert, Building, Sky, Cloud, Mountain}
  1. Create a chain over the L labels, e.g. Desert → Sea → Cloud → Mountain → Sky → Building, with classifiers f_1, …, f_6
  2. Train L independent classifiers, each given the input and the partial label vector of the preceding labels as additional input features (e.g. f_4 also receives Desert = 0, Sea = 1, Cloud = 0)
  (A small sketch of this idea follows below.)
• Limitations
  • Error propagation at test time
  • Effect of the label order in the chain
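A minimal runnable sketch of the classifier-chain idea using scikit-learn's ClassifierChain on synthetic data; this is an illustration only, not the setup from the cited papers, and the base classifier and the two label orders compared here are assumptions.

```python
# Classifier chain sketch: each classifier sees the input features plus the
# (predicted) labels of the previous positions in the chain.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_classes=6, n_labels=3,
                                      random_state=0)

# Two different label orders: the order changes which partial label vector
# each classifier is conditioned on, and hence the resulting predictions.
for order in (list(range(6)), list(reversed(range(6)))):
    chain = ClassifierChain(LogisticRegression(max_iter=1000),
                            order=order, random_state=0)
    chain.fit(X, Y)
    print(order, chain.score(X, Y))  # subset accuracy varies with the order
```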
Recurrent Neural Networks for MLC
• Learning from the set of relevant labels in a sequential manner (Nam et al., NIPS 2017)
• The number of relevant labels is much smaller than the total number of labels
  Example decoding: hidden states h_0, h_1, …, h_5 emit Sea, Building, Sky, Mountain, END, with each predicted label fed back as input to the next step
• Question: the effect of the label permutation remains! How do we determine the target label permutation?
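A minimal PyTorch sketch of the RNN label-sequence decoder idea: the decoder emits one relevant label per step, conditioned on the instance encoding and the previously predicted labels, and stops at an END token. The architecture details (GRU cell, single-layer encoder, teacher forcing) are assumptions for illustration, not the exact model of Nam et al. (NIPS 2017).

```python
import torch
import torch.nn as nn

class RNNLabelDecoder(nn.Module):
    def __init__(self, input_dim, num_labels, hidden_dim=128):
        super().__init__()
        self.end_id = num_labels                  # extra index for the END token
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.label_emb = nn.Embedding(num_labels + 1, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_labels + 1)

    def forward(self, x, target_seq):
        # target_seq: (batch, T) relevant-label indices, ending with END
        h = torch.tanh(self.encoder(x))           # init hidden state from the instance
        prev = torch.full((x.size(0),), self.end_id,
                          dtype=torch.long, device=x.device)
        logits = []
        for t in range(target_seq.size(1)):
            h = self.rnn(self.label_emb(prev), h) # condition on previous label
            logits.append(self.out(h))
            prev = target_seq[:, t]               # teacher forcing with the targets
        return torch.stack(logits, dim=1)         # (batch, T, num_labels + 1)
```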
Target label permutations for RNN training
• Static label permutation for all instances
  • Arbitrary label sequence randomly chosen at the beginning
  • Label frequency distribution: freq2rare, rare2freq (see the ordering sketch below)
  • Label structures (e.g., pairwise label dependencies)
  ➜ Suboptimal choice; the model learns from only one permutation
• Different label permutations for individual instances
  • Choosing randomly every time
  • Learning from all possible label permutations
  ➜ More robust to the effect of the label permutation; high computational complexity
We need MLC algorithms that learn context-dependent label permutations efficiently!
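An illustrative helper for the static "freq2rare" target ordering: sort each instance's relevant labels by global label frequency, most frequent first (reverse the sort key for "rare2freq"). The function name and interface are assumptions, not the authors' code.

```python
import numpy as np

def freq2rare_sequences(Y):
    """Y: binary label matrix of shape (n_samples, n_labels).
    Returns, per instance, its relevant label ids ordered frequent-to-rare."""
    freq = Y.sum(axis=0)                  # global frequency of each label
    order = np.argsort(-freq)             # most frequent label first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))   # rank[l] = position of label l
    return [sorted(np.flatnonzero(y), key=lambda l: rank[l]) for y in Y]
```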
Model-based label permutation
[Figure: (1) the model samples a label sequence and errors are computed against the true target label set (true-positive, false-positive, and false-negative predictions); (2) the parameters are updated from the sampled sequence. Example: true target label set {1, 2, 3, 4, 5}; sampled target label permutation 2 1 4 3 5.]
Policy gradient

\nabla_\theta J(\theta) = \mathbb{E}_{P_\tau}\!\left[ \sum_{i=0}^{T-1} \nabla_\theta \log P_\theta(a_i \mid s_i)\,\bigl(R_i - b(s_i)\bigr) \right]

• Label policy distribution P_\theta(a_i \mid s_i)
• Model prediction evaluation
• Model parameter updates
Example: generated label permutation 2 1; true target label set {1, 2, 3, 4, 5}
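A hedged REINFORCE-style sketch of the policy-gradient update above: sample a label sequence from the model, score it against the true label set, and weight each step's log-probability by (return − baseline). The per-step reward (+1 for a relevant label, −1 otherwise) and the constant baseline are illustrative assumptions, not the paper's exact choices.

```python
import torch

def reinforce_loss(step_log_probs, sampled_labels, true_label_set, baseline=0.0):
    # step_log_probs: (T,) tensor of log P_theta(a_i | s_i) for the sampled actions
    # sampled_labels: list of T sampled label ids
    # true_label_set: set of relevant label ids for this instance
    rewards = torch.tensor([1.0 if a in true_label_set else -1.0
                            for a in sampled_labels])
    # R_i = sum of rewards from step i onward (reverse cumulative sum)
    returns = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
    advantages = returns - baseline                     # R_i - b(s_i)
    # Minimizing this loss follows the policy gradient in expectation.
    return -(step_log_probs * advantages.detach()).sum()
```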