Task Understanding From Confusing Multi-task Data
  1. Task Understanding From Confusing Multi-task Data
  Yizhou JIANG, Shangqi GUO, Feng CHEN, Xin SU (Tsinghua University)

  2. Motivation: From Narrow AI to AGI
  • Narrow AI: a specific task in a fixed environment.
  • Multi-task learning: comprehensive problems in different semantic spaces. For the same four fruit images:
    Task 1 (Color): "Yellow", "Yellow", "Red", "Green"
    Task 2 (Name): "Banana", "Apple", "Lemon", "Apple"
    Task 3 (Taste): "Sweet", "Sour", "Sweet", "Sour"
  • Label annotation exists in natural raw data; task annotation, a manual task definition, does not.
  • AGI problem: how can we learn task concepts from original raw data?

  3. Confusing Supervised Learning (CSL)
  • Without task annotation, multi-task data contains mapping conflicts: the same input appears with labels from different tasks ("Apple", "Red", "Sweet"), so no single mapping fits all pairs.
  • Pipeline: Confusing Data → De-confuse / Task Understanding (deconfusing function) → Multi-Task Learning (mapping functions).
  • CSL: learning task concepts by reducing mapping conflicts.

  4. Method: CSL-Net
  Two networks are trained alternately.
  • Mapping-Net training (deconfusing function $h$ fixed):
    $\min_{g_k} L_{map}(g_k) = \sum_{i=1}^{m} \left\| \big(y_i - g_k(x_i)\big)\, h_k(x_i, y_i) \right\|^2, \quad k = 1, \dots, n$
  • Deconfusing-Net training (mapping functions $g_k$ fixed), regressing onto the temporary one-hot assignment $\hat{h}$ obtained by $\arg\min_k$ over the mapping errors:
    $\min_{h} L_{dec}(h) = \sum_{i=1}^{m} \left\| h(x_i, y_i) - \hat{h}(x_i, y_i) \right\|^2$
  [Figure: alternating Mapping-Net / Deconfusing-Net training diagram.]
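A minimal PyTorch sketch of the two components (layer sizes, dimensions, and the task count are placeholders of mine, not the paper's architecture):

```python
import torch
import torch.nn as nn

N_TASKS = 3           # assumed task count (e.g. color / name / taste)
X_DIM, Y_DIM = 16, 8  # placeholder input and label dimensions

class MappingNet(nn.Module):
    """One mapping function g_k per task; returns all n predictions."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(X_DIM, 64), nn.ReLU(), nn.Linear(64, Y_DIM))
            for _ in range(N_TASKS)
        )

    def forward(self, x):                                   # x: (batch, X_DIM)
        return torch.stack([g(x) for g in self.heads], 1)   # (batch, n, Y_DIM)

class DeconfusingNet(nn.Module):
    """h(x, y): task-assignment weights for each (input, label) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(X_DIM + Y_DIM, 64), nn.ReLU(), nn.Linear(64, N_TASKS))

    def forward(self, x, y):
        # Softmax gives a soft assignment; training pushes it toward one-hot
        # by regressing onto hard temporary targets (slides 12-13).
        return torch.softmax(self.net(torch.cat([x, y], -1)), -1)
```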

  5. Motivation: From Narrow AI to AGI
  • AI success: performance exceeding human level on various problems.
  • Narrow AI: a specific task in a fixed environment.

  6. Motivation: From Narrow AI to AGI
  • Multi-task learning: comprehensive problems in different semantic spaces. For the same four fruit images:
    Task 1 (Color): "Yellow", "Yellow", "Red", "Green"
    Task 2 (Fruit): "Banana", "Apple", "Lemon", "Apple"
    Task 3 (Taste): "Sweet", "Sour", "Sweet", "Sour"
  • Label annotation exists in natural raw data; task annotation, a manual task definition, does not.
  • AGI problem: how can we learn task concepts from original raw data?

  7. Confusing Data
  • Multiple tasks cannot be represented by a single mapping function.
  • Task understanding is vital for multi-task learning.
  • Confusing data: multi-task data without task annotation. Labels from different semantic spaces ("Sweet", "Green", "Yellow", "Apple", "Banana", "Red", "Lemon", "Sour") are mixed together, and the mixed task is confusing!
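Concretely, a confusing dataset is just (input, label) pairs with the task column dropped; a toy illustration (values are illustrative, not a real dataset):

```python
# Multi-task data WITH task annotation (what CSL never sees):
annotated = [
    ("img_banana", "task_color", "Yellow"),
    ("img_banana", "task_name",  "Banana"),
    ("img_lemon",  "task_taste", "Sour"),
]

# Confusing data: the task column is removed, so the same input can
# legitimately appear with labels from different semantic spaces.
confusing = [(x, y) for (x, _task, y) in annotated]
# -> [("img_banana", "Yellow"), ("img_banana", "Banana"), ("img_lemon", "Sour")]
```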

  8. Comparison of Existing Methods
  A novel learning problem!
  • Supervised learning & latent-variable learning: the mapping is confused by conflicting labels.
  • Multi-task learning: task annotation is needed.
  • Multi-label learning: multiple labels are allocated to each sample.
  • Confusing supervised learning: neither task annotation nor sample allocation is given.

  9. Confusing Supervised Learning (CSL)
  • Without task annotation, multi-task data contains mapping conflicts.
  • Pipeline: Confusing Data → De-confuse / Task Understanding (deconfusing function) → Multi-Task Learning (mapping functions), recovering Task 1 (Color), Task 2 (Fruit), and Task 3 (Taste) from the mixed fruit labels.

  10. Learning Objective: Risk Functional of the CSL Model
  • Traditional supervised learning fits a single mapping $g(x)$; on confusing data the minimal risk stays positive: $\min R(g^*) > 0$.
  • Confusing supervised learning fits mapping functions $g$ together with a deconfusing function $h(x, y)$; the joint risk can reach zero: $\min R(g^*, h^*) = 0$.
  [Figure: X-Y plots contrasting the single-mapping fit with the deconfused multi-mapping fit.]
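Written out, a risk functional consistent with the losses on slide 13 would look as follows (a sketch of mine; the paper's exact normalization may differ), with one-hot $h_k(x, y) \in \{0, 1\}$ and $\sum_k h_k(x, y) = 1$:

```latex
R(g, h) = \int \sum_{k=1}^{n}
          \left\| \big(y - g_k(x)\big)\, h_k(x, y) \right\|^2 \mathrm{d}F(x, y)
```

Each sample contributes loss only through the task it is assigned to, so $R$ can reach zero exactly when every sample is allocated to a mapping that reproduces its label; a single mapping ($n = 1$, $h \equiv 1$) cannot achieve this on confusing data.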

  11. Feasibility: Loss → 0
  • Wrong allocation of confusing samples leads to unavoidable loss.
  [Figure: fitting the confusing samples with two functions leaves Loss > 0 (✗); with three functions the samples separate cleanly and Loss ≈ 0 (✓).]
  • Task concepts are driven by the global loss: the empirical risk should go toward 0!
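A one-line check of the unavoidable-loss claim (my own arithmetic, not from the deck): if one input $x$ carries two conflicting labels $y_1 \neq y_2$, a single function can output only one value $c = g(x)$, and

```latex
\min_{c} \left[ (c - y_1)^2 + (c - y_2)^2 \right]
  = \frac{(y_1 - y_2)^2}{2} > 0 ,
\qquad \text{attained at } c = \tfrac{y_1 + y_2}{2} ,
```

so the empirical risk of any single mapping is bounded away from zero, while allocating the two labels to two different mappings removes the conflict entirely.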

  12. Training Target & CSL-Net
  • Optimization target: minimize the CSL risk jointly over the Mapping-Net and the Deconfusing-Net.
  • Expected result: each mapping function recovers one true task, and the deconfusing function allocates every sample to its task.
  • Constraint: the output of the Deconfusing-Net is one-hot!
  • Difficulty: approximating the one-hot constraint with a softmax leads to a trivial solution, so joint back-propagation is not available. Instead, training alternates with a hard temporary assignment, sketched below.
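A sketch of that hard assignment step in PyTorch (shapes follow the MappingNet sketch on slide 4; this is my reading of the slide, not released code):

```python
import torch
import torch.nn.functional as F

def temporary_assignment(x, y, mapping_net):
    """One-hot target h_hat: assign each (x, y) pair to the task whose
    current mapping g_k reproduces the label best (argmin squared error)."""
    with torch.no_grad():
        preds = mapping_net(x)                               # (batch, n, Y_DIM)
        errs = ((preds - y.unsqueeze(1)) ** 2).sum(dim=-1)   # (batch, n)
        best = errs.argmin(dim=1)                            # (batch,)
        return F.one_hot(best, num_classes=errs.shape[1]).float()
```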

  13. Training Algorithm of CSL-Net
  The two networks are trained in alternating stages.
  • Stage 1, Mapping-Net training ($h$ fixed): each sample's squared error is weighted by its task assignment,
    $\min_{g_k} L_{map}(g_k) = \sum_{i=1}^{m} \left\| \big(y_i - g_k(x_i)\big)\, h_k(x_i, y_i) \right\|^2, \quad k = 1, \dots, n$
  • Stage 2, Deconfusing-Net training ($g_k$ fixed): each sample gets a temporary one-hot assignment $\hat{h}(x_i, y_i)$ (the $\arg\min_k$ of the mapping errors), and the Deconfusing-Net is regressed onto it,
    $\min_{h} L_{dec}(h) = \sum_{i=1}^{m} \left\| h(x_i, y_i) - \hat{h}(x_i, y_i) \right\|^2$
  [Figure: data flow between Mapping-Net training and Deconfusing-Net training.]
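Combining the two stages, a hedged sketch of one training round, reusing the MappingNet, DeconfusingNet, and temporary_assignment sketches above (optimizers and batching are placeholders):

```python
import torch

def train_round(x, y, mapping_net, deconf_net, opt_map, opt_dec):
    # Stage 1: Mapping-Net training. The assignment h is detached, so each
    # task head g_k only fits the samples currently allocated to it.
    h = deconf_net(x, y).detach()                       # (batch, n)
    preds = mapping_net(x)                              # (batch, n, Y_DIM)
    errs = ((y.unsqueeze(1) - preds) ** 2).sum(dim=-1)  # (batch, n)
    loss_map = (h * errs).sum()
    opt_map.zero_grad()
    loss_map.backward()
    opt_map.step()

    # Stage 2: Deconfusing-Net training. Regress h(x, y) onto the one-hot
    # temporary assignment h_hat; the mapping heads stay fixed.
    h_hat = temporary_assignment(x, y, mapping_net)
    loss_dec = ((deconf_net(x, y) - h_hat) ** 2).sum()
    opt_dec.zero_grad()
    loss_dec.backward()
    opt_dec.step()
    return loss_map.item(), loss_dec.item()
```

Note that the slide's $L_{map}$ squares the product $(y_i - g_k(x_i))\, h_k$; with one-hot assignments, weighting the error by $h$ or $h^2$ is identical, which the sketch exploits.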

  14. Experiment: Function Regression
  • Plain supervised learning fails to fit multiple functions at once.
  • An incorrect task number leads to confused fitting results.
  • CSL-Net learns reasonable task concepts and completes the multi-task mappings.
  [Figure: fitted curves at successive stages of training.]
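The regression setup can be reproduced with synthetic confusing data: each x is labeled by one of several hidden functions, with no record of which one (the three functions below are placeholders of mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
tasks = [np.sin, np.cos, lambda x: 0.3 * x]   # hypothetical hidden functions

x = rng.uniform(-3.0, 3.0, size=500)
task_id = rng.integers(len(tasks), size=500)  # hidden; never shown to the learner
y = np.array([tasks[k](xi) for k, xi in zip(task_id, x)])

# The learner sees only (x, y): no single curve can fit all three functions,
# which is why plain supervised regression fails on this data.
```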

  15. Experiment: Pattern Recognition
  • Each sample carries the classification result of only one task:
    Color: "Red", "Green", "Yellow" / Name: "Apple", "Banana", "Lemon" / Taste: "Sweet", "Sour", "Spicy"
  • Two learning goals: task understanding and multi-task classification.
  • Two evaluation metrics: task understanding and multi-task classification.

  16. Experiment: Pattern Recognition
  • Results on two confusing supervised datasets.

  17. Experiment: Pattern Recognition
  • Feature visualization of the Deconfusing-Net.
  [Figure: feature embeddings before and after deconfusing, for two datasets.]
  • The Deconfusing-Net separates confusing samples into reasonable task groups.

  18. Conclusion
  • A novel learning problem for general raw data:
    - Task annotation is unknown in natural raw data.
    - The task concept must be understood from raw data (confusing data).
  • A novel learning paradigm: Confusing Supervised Learning.
    - Deconfusing function: allocates samples to tasks.
    - Mapping functions: realize the multi-task mappings.
    - Global risk functional: the overall risk of representing the raw data.
  • A novel network: CSL-Net.
    - An alternating two-stage training algorithm realizes the one-hot task constraint.
  • A novel application: a learning system toward general intelligence.
    - The agent autonomously defines task concepts and learns multi-task mappings without manual task annotation.

  19. Thanks!
  Contact: Xin Su, Tsinghua University, suxin16@mails.tsinghua.edu.cn
