Multi-Label Learning with Highly Incomplete Data via Collaborative Embedding
Yufei Han¹, Guolei Sun², Yun Shen¹, Xiangliang Zhang²
1. Symantec Research Labs   2. King Abdullah University of Science and Technology
Outline • Introduction and Problem Definition • Our Methods • Experimental Results
Multi-Label Classification in Cyber Security
• Multi-class classification: f(x) = c1? or c2? or c3? (e.g., f(x) = apple, f(x) = banana, or f(x) = orange); each instance gets exactly one class
• Multi-label classification: f(x) = {c1? and c2? and c3?}; each instance may carry several labels at once
Keywords: multi-label classification, collaborative embedding, incomplete features
Existing popular solutions
• Binary relevance
  – Constructs one binary classifier per label, independently
  – Does not consider label dependency
• Label power-set
  – Converts the problem into multi-class classification over label subsets
  – Labels A, B: {}, {A}, {B}, {A,B}
  – 2^n subsets: with 40 labels, 2^40 = 1,099,511,627,776 classes
• Classifier chains
  – Learn L binary classifiers by forming the training problems (x_i, y_1, ..., y_{j-1}) → y_j ∈ {0, 1}
  – Only capture the dependency of y_j on y_1, ..., y_{j-1}
(A code sketch of binary relevance vs. a classifier chain follows below.)
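Not from the original slides: a minimal scikit-learn sketch contrasting binary relevance and a classifier chain on a synthetic multi-label problem. The dataset generator, base classifier, and metric are arbitrary illustrative choices.

```python
# Binary relevance vs. classifier chain on a synthetic multi-label problem.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=1000, n_features=20,
                                       n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Binary relevance: one independent binary classifier per label.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

# Classifier chain: label j is predicted from (x, y_1, ..., y_{j-1}).
cc = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0).fit(X_tr, Y_tr)

print("Binary relevance micro-F1:", f1_score(Y_te, br.predict(X_te), average="micro"))
print("Classifier chain micro-F1:", f1_score(Y_te, cc.predict(X_te), average="micro"))
```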
Use Case of Multi-label Classification
• Train a prediction model for a given product
• Training data: one row per machine-day, with incomplete signature counts as features and incomplete labels as targets
[Slide figure: side-by-side feature and label matrices over machine-days, with missing entries shown as "?"]
Our Problem: A Tale of Two Cities
• Multi-label learning with incomplete feature values and weak labels
  – Training data X ∈ R^{N×D} (N instances with D features) is partially observed: Ω_{i,j} = 1 if X_{i,j} is observed, otherwise Ω_{i,j} = 0
  – Label assignment Y ∈ {0, 1}^{N×M} (M is the label dimension) is a positive-unlabeled matrix:
    • Y_{i,j} = 1 indicates that instance X_{i,:} is positively labeled with the j-th label
    • Y_{i,j} = 0 indicates that the entry is unobserved (not necessarily negative)
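A small illustrative sketch of this notation (the sizes, observation rate, and positive-label retention rate are made-up assumptions), simulating the observation mask Ω and a positive-unlabeled label matrix from a fully observed pair:

```python
# Simulate the setting in the problem definition (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 100, 30, 8
X_full = rng.standard_normal((N, D))              # hypothetical full feature matrix
Y_full = (rng.random((N, M)) < 0.3).astype(int)   # hypothetical ground-truth labels

Omega = (rng.random((N, D)) < 0.6).astype(int)    # Omega[i,j] = 1 iff X[i,j] is observed
X_obs = X_full * Omega                            # unobserved feature entries zeroed out

# Positive-unlabeled labels: keep only a fraction of the true positives;
# everything else becomes 0 ("unobserved", not necessarily negative).
keep = (rng.random((N, M)) < 0.5).astype(int)
Y_pu = Y_full * keep
```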
Our Problem: A Tale of Two Cities
• Feature matrix: corrupted / incomplete data
  – Limited coverage of sensors
  – Privacy control
  – Failure of sensors
  – Partial responses
• Label matrix: weak supervision
  – Semi-supervised information
  – Positive-unlabeled / partially observed supervision
  – Weak pairwise / triple-wise constraints
• The classification model must be learned from both
Existing Approaches

| Methods            | Feature Values | Labels                | Transductive / Inductive |
|--------------------|----------------|-----------------------|--------------------------|
| BiasMC (ICML'15)   | Complete       | Positive (weak)       | Both                     |
| WELL (AAAI'10)     | Complete       | Positive (weak)       | Transductive             |
| LEML (ICML'14)     | Complete       | Positive and negative | Inductive                |
| CoEmbed (AAAI'17)  | Complete       | Positive and negative | Transductive             |
| MC-1 (NIPS'10)     | Missing        | Positive and negative | Transductive             |
| DirtyIMC (NIPS'15) | Noisy          | Positive and negative | Both                     |
| Our study          | Missing        | Positive (weak)       | Both                     |
Outline • Introduction and Problem Definition • Our Methods • Experimental Results
Collaborative Embedding: A Transfer Learning Approach
• Incomplete feature matrix X (signatures of security events): low-rank, least-squares-based matrix factorization, X ≈ UV^T
• Partially observed label matrix Y (security event class): cost-sensitive logistic matrix factorization, Y ≈ φ(WH^T), i.e., a logit model plus a regularizer R(W)
• The two factorizations are coupled through a shared embedding space
Feature Matrix Completion
• Low-rank completion of the partially observed feature matrix:
  U*, V* = argmin_{U,V} ‖ Ω ⊙ (X − UV^T) ‖_F^2   (⊙: element-wise product)
  – U: projected features of the data instances
  – V: spanning basis defining the projection subspace
(See the sketch below.)
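A minimal NumPy sketch of the masked least-squares objective above, optimized with plain gradient descent; the rank, step size, and iteration count are assumptions, and the paper's actual solver may differ.

```python
# Masked low-rank objective: min_{U,V} || Omega * (X - U V^T) ||_F^2
import numpy as np

def masked_factorize(X, Omega, rank=10, lr=1e-3, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    U = 0.1 * rng.standard_normal((N, rank))
    V = 0.1 * rng.standard_normal((D, rank))
    for _ in range(n_iter):
        R = Omega * (U @ V.T - X)   # residual restricted to observed entries
        U -= lr * (R @ V)           # gradients up to a constant factor of 2
        V -= lr * (R.T @ U)
    return U, V

# Usage sketch: U, V = masked_factorize(X_obs, Omega); X_hat = U @ V.T
```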
Collaborative Embedding: A Transfer Learning Approach
• Next: the label side of the shared embedding, i.e., the cost-sensitive logistic matrix factorization Y ≈ φ(WH^T) with regularizer R(W)
Label Matrix Reconstruction
• Cost-sensitive logistic matrix factorization on the positive-unlabeled class assignment matrix:
  W*, H* = argmin_{W,H} Σ_{i,j} Γ_{i,j} log( 1 + exp( (1 − 2Y_{i,j}) X_{i,:}(WH^T)_{:,j} ) ) + λ( ‖W‖^2 + ‖H‖^2 )
  – Γ_{i,j} = α for observed and positively labeled entries (Y_{i,j} = 1)
  – Γ_{i,j} = 1 − α for unobserved, thus unlabeled, entries (Y_{i,j} = 0)
• Predicted label assignment: Ŷ = I( X(WH^T) )
(A code sketch follows below.)
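A hedged NumPy sketch of this cost-sensitive logistic factorization, again optimized with plain gradient descent; `alpha`, the rank, the regularization weight, and the step size are assumed hyper-parameters, and the exact optimizer in the paper may differ.

```python
# Cost-sensitive logistic matrix factorization on a positive-unlabeled Y.
# Logit for entry (i, j) is X[i, :] @ (W @ H.T)[:, j]; Gamma down-weights
# the unlabeled (Y = 0) entries relative to the observed positives.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cs_logistic_mf(X, Y, alpha=0.9, rank=10, lam=1e-2, lr=1e-3, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    M = Y.shape[1]
    W = 0.1 * rng.standard_normal((D, rank))
    H = 0.1 * rng.standard_normal((M, rank))
    Gamma = np.where(Y == 1, alpha, 1.0 - alpha)   # cost-sensitive weights
    S = 1.0 - 2.0 * Y                              # -1 for positives, +1 for unlabeled
    for _ in range(n_iter):
        XW = X @ W
        Z = XW @ H.T                               # N x M logits
        G = Gamma * S * sigmoid(S * Z)             # d(loss)/d(Z)
        grad_W = X.T @ (G @ H) + 2 * lam * W
        grad_H = G.T @ XW + 2 * lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H

# Usage sketch: W, H = cs_logistic_mf(X_hat, Y_pu); scores = sigmoid(X_hat @ W @ H.T)
```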
ColEmbed: Collaborative Embedding
• Collaborative embedding as a solution to learning with incomplete features and weak labels, combining:
  – Feature completion
  – Label completion
  – Functional feature extraction
  – Tolerance to residual error
(A hedged sketch of one way to combine the two objectives follows below.)
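One plausible way to evaluate a combined objective built from the two components on the previous slides, shown only to make the coupling concrete; the trade-off weight `c` and feeding the completed matrix UV^T into the label term are assumptions, not necessarily the paper's exact formulation.

```python
# Combined objective sketch: feature-completion loss + weighted label loss.
import numpy as np

def colembed_loss(X, Omega, Y, U, V, W, H, c=1.0, alpha=0.9, lam=1e-2):
    X_hat = U @ V.T                                   # completed feature matrix
    feat_loss = np.sum((Omega * (X - X_hat)) ** 2)    # feature completion term
    Gamma = np.where(Y == 1, alpha, 1.0 - alpha)      # cost-sensitive weights
    S = 1.0 - 2.0 * Y
    Z = X_hat @ W @ H.T                               # logits from completed features
    label_loss = np.sum(Gamma * np.logaddexp(0.0, S * Z))  # log(1 + exp(S * Z))
    reg = lam * (np.sum(W ** 2) + np.sum(H ** 2))
    return feat_loss + c * (label_loss + reg)
```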
Upper Bound of Reconstruction Error
• Provable reconstruction of the missing label entries
  – M, D: the number of labels and the dimensionality of the feature vectors
  – N: the number of training samples
  – t: the upper bound on the spectral norm of H
  – the maximum L2-norm of the row vectors in X
• The label reconstruction error is of the order of 1/(NM(1 − ))
ColEmbed-L
• Linear collaborative embedding: f(X̂) = X̂S^T
• Flexible for both transductive and inductive settings (see the sketch below)
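A sketch of inductive prediction with the linear model f(X̂) = X̂S^T: a new, partially observed instance is first completed by projecting its observed coordinates onto the learned basis V, then scored. Taking S^T = WH^T from the factorization above, and the least-squares projection itself, are assumptions for illustration.

```python
# Inductive prediction with a linear map from completed features to label scores.
import numpy as np

def inductive_predict(x_obs, omega, V, S):
    # x_obs: (D,) feature vector with zeros at unobserved positions
    # omega: (D,) 0/1 observation mask, V: (D, K) basis, S: (M, D) linear model
    idx = omega.astype(bool)
    # Complete the instance: embedding u minimizing ||x_obs[idx] - V[idx] @ u||^2
    u, *_ = np.linalg.lstsq(V[idx], x_obs[idx], rcond=None)
    x_hat = V @ u                       # completed feature vector
    return x_hat @ S.T                  # multi-label scores f(x_hat)

# Usage sketch: S = (W @ H.T).T  (shape M x D); scores = inductive_predict(x, mask, V, S)
```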
ColEmbed-NL
• Non-linear embedding: a linear combination of random feature expansions of the completed features
• Ali Rahimi and Benjamin Recht, Random Features for Large-Scale Kernel Machines, NIPS 2007
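A sketch of the random Fourier feature map from Rahimi and Recht that this non-linear variant builds on, approximating an RBF kernel so that a linear model on φ(X̂) behaves like a kernel model on X̂; the bandwidth and number of random features are assumptions.

```python
# Random Fourier features approximating an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    Wr = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)                 # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ Wr + b)

# Usage sketch: Phi = random_fourier_features(X_hat); fit the linear model on Phi
```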
Training Process
• Stochastic gradient descent: scales to large matrix factorization problems
• Non-linear case: the same stochastic updates, applied on top of the random feature expansion
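An illustrative stochastic-gradient step for the masked feature factorization, sampling a mini-batch of observed entries and updating only the touched rows of U and V; the batch size and learning rate are assumptions, and the full ColEmbed update would also include the label term.

```python
# One SGD step over a mini-batch of observed entries of the feature matrix.
import numpy as np

def sgd_step(X, obs_idx, U, V, lr=0.01, batch=256, rng=None):
    rng = rng or np.random.default_rng()
    rows, cols = obs_idx                          # indices where Omega == 1
    pick = rng.integers(0, len(rows), size=batch)
    i, j = rows[pick], cols[pick]
    err = (U[i] * V[j]).sum(axis=1) - X[i, j]     # residual on the sampled entries
    gU = err[:, None] * V[j]
    gV = err[:, None] * U[i]
    np.subtract.at(U, i, lr * gU)                 # scatter updates; duplicates are summed
    np.subtract.at(V, j, lr * gV)

# Usage sketch: rows, cols = np.nonzero(Omega); then call sgd_step repeatedly
```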
Outline • Introduction and Problem Definition • Our Methods • Experimental Results
Empirical Study
• The empirical study aims to answer the following questions:
  – Is it really helpful to reconstruct features and labels simultaneously?
  – Do transductive and inductive classification present consistently high precision?
  – Does the proposed method classify better than the state-of-the-art approaches?
  – Does the proposed method scale well?
Methods to Compare
• Baselines assuming complete feature values:
  – BiasMC (transductive) and BiasMC-I (inductive), by PU-learning
  – LEML (cost-sensitive binomial loss), needs both positive and negative labels
  – LEML (least-squares loss)
  – WELL, weak labels
  – CoEmbed, needs both positive and negative labels
• Baselines handling missing or noisy feature values:
  – MC-1, needs both positive and negative labels
  – DirtyIMC, needs both positive and negative labels
• For baselines requiring complete features, the incomplete feature matrix is first completed using the convex low-rank matrix completion approach, denoted MC-Convex
Evaluation Data Sets
• Public benchmark data sets
• Real-world IoT device event detection data
Feature Reconstruction
• Lower error in estimating the missing feature values, compared to the baseline method
Transductive Classification Accuracy • Higher classification accuracy than baseline methods
Inductive Classification Accuracy • Higher classification accuracy than baseline methods
On Real-world Security Data
• Consistently better performance in classifying real-world security data, compared to the baseline methods
• Evaluated in both transductive and inductive test modes
Efficiency Evaluation
• Run time in seconds grows linearly w.r.t. the number of instances
Takeaway
• Collaboratively reconstructing missing feature values and learning missing labels is beneficial for both tasks.
• Our proposed method is applicable in both transductive and inductive classification settings.
• Our proposed method outperforms the state-of-the-art approaches.
Future Work • Learning with incomplete data streams • Deep Neural Nets as a more powerful functional mapping between features and labels • Structured feature / label missing patterns • Further extension to multi-task learning