Classification from Pairwise Similarity and Unlabeled Data
Han Bao (1,2), Gang Niu (2), Masashi Sugiyama (2,1)
(1) The University of Tokyo, Japan / (2) RIKEN, Japan
July 13th, 2018

Gentle Start: Binary Classification
Training data: $\{(x_i, y_i)\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} p(x, y)$, where $x_i \in \mathbb{R}^d$ and each point is labeled as $y_i \in \{+1, -1\}$
Goal: find a classifier $f : \mathbb{R}^d \to \mathbb{R}$
Method: empirical risk minimization (ERM), i.e., minimize the classification error
Figure: positive data ($y = +1$) and negative data ($y = -1$), separated by the boundary $f(x) = 0$
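A minimal sketch of ERM for this setting, under assumptions the slide does not state: a linear model f(x) = w·x + b, the logistic loss as a smooth surrogate for the classification error, plain gradient descent, and synthetic Gaussian data. The names `erm_linear_logistic` and `classification_error` are hypothetical and chosen only for illustration.

```python
import numpy as np


def erm_linear_logistic(X, y, lr=0.1, n_steps=500):
    """Fit f(x) = w @ x + b by minimizing the empirical logistic risk.

    Assumption: the logistic loss log(1 + exp(-y f(x))) stands in for the
    0-1 classification error, which is not differentiable.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_steps):
        margins = y * (X @ w + b)
        # Derivative of log(1 + exp(-z)) w.r.t. z is -1 / (1 + exp(z)).
        coeff = -y / (1.0 + np.exp(margins))
        w -= lr * (X.T @ coeff) / n
        b -= lr * coeff.mean()
    return w, b


def classification_error(w, b, X, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    return float(np.mean(np.sign(X @ w + b) != y))


# Synthetic training data (an assumption for illustration):
# positives centered at (+2, +2), negatives at (-2, -2).
rng = np.random.default_rng(0)
n_per_class = 100
X_pos = rng.normal(loc=+2.0, scale=1.0, size=(n_per_class, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

w, b = erm_linear_logistic(X, y)
train_error = classification_error(w, b, X, y)
print("training error:", train_error)
```

The learned boundary is the set of points where `X @ w + b` equals zero, matching f(x) = 0 on the slide.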

Motivation: Pairwise Information in Classification
Classification of sensitive matters, e.g., politics, religion, opinions on racial issues
Explicit labels are hard to obtain
Instead, ask "Which person do you share the same belief as?" to learn that two people share the same property
cf. randomized response technique [Warner 1965]
Image: http://leanintokyo.org/wp-content/uploads/2017/12/MeToo.jpg
[Warner 1965] Warner, S. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69, 1965.

Related: Semi-supervised Clustering
Clustering from similar pairs $S = \{(x_i, x'_i)\}$ (same class), dissimilar pairs $D = \{(x_i, x'_i)\}$ (different class), and unlabeled data $U = \{x_i\}$ [Wagstaff+ ICML2001; many other papers]
Offspring of unsupervised clustering
Problem: the cluster assumption (manifold assumption, low-density separation) does not hold for many datasets
[Wagstaff+ ICML2001] Wagstaff, K., Cardie, C., Rogers, S., and Schrödl, S. Constrained k-means clustering with background knowledge. In ICML, pp. 577-584, 2001.
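To make the three data sources S, D, and U concrete, the sketch below builds them from a toy dataset. It is only an illustration under stated assumptions: pairs are drawn uniformly at random, the hidden labels are used solely to decide whether a pair is similar or dissimilar, and the helper name `make_pairwise_data` is hypothetical. How such pairs arise in practice is not specified on the slide.

```python
import numpy as np


def make_pairwise_data(X, y, n_pairs, n_unlabeled, rng):
    """Build similar pairs S, dissimilar pairs D, and unlabeled points U.

    Assumption: the labels y are hidden from the learner and are used here
    only to sort each sampled pair into S (same class) or D (different class).
    """
    n = len(X)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    same = y[i] == y[j]
    S = [(X[a], X[b]) for a, b in zip(i[same], j[same])]
    D = [(X[a], X[b]) for a, b in zip(i[~same], j[~same])]
    # Unlabeled points carry no class or pair information at all.
    U = X[rng.integers(0, n, size=n_unlabeled)]
    return S, D, U


# Toy data (an assumption): the class is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] > 0, 1, -1)

S, D, U = make_pairwise_data(X, y, n_pairs=30, n_unlabeled=20, rng=rng)
print(len(S), "similar pairs,", len(D), "dissimilar pairs,", len(U), "unlabeled points")
```

Note that the learner sees only S, D, and U; no individual point comes with its own class label.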