DM2C: Deep Mixed-Modal Clustering


  1. DM2C: Deep Mixed-Modal Clustering. Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang. Institute of Information Engineering, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS; Key Lab. of BDKM, CAS; Peng Cheng Lab.

  2. Why multiple modalities? • Multi-modal data is ubiquitous. • The information shared among multiple modalities helps us understand the data.

  3. Supervised Learning under Multiple Modalities • Supervision comes from class labels and modality pairing. • Modality pairing: a sample in modality A and another sample in modality B represent the same instance. • Manual annotation is expensive and laborious; with multiple modalities, labeling is even more complicated than for single-modal data. • We therefore turn to unsupervised learning under multiple modalities, since it works without data labels.

  4. Mixed-modal Setting: Fully-unsupervised Learning • Traditional unsupervised multi-modal learning still requires extra pairing information among modalities for feature alignment, e.g. partial modality pairing, 'must/cannot-link' constraints, co-occurrence frequency... • Mixed-modal data: each instance is represented in only one modality. Figure 1: Examples of multi-modal and mixed-modal data with two modalities.

  5. Mixed-modal Clustering: The Goal • Dataset D = {x_i}_{i=1}^n mixed from two modalities. • D → {x_i^(a)}_{i=1}^{n_a} ∪ {x_j^(b)}_{j=1}^{n_b}, where n = n_a + n_b. • Mixed-modal clustering aims at learning unified representations for the modalities and then grouping the samples into k categories.
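The dataset notation above can be made concrete with a few lines of Python; the modality names and toy feature vectors here are assumptions for illustration only, not the paper's data:

```python
# Hypothetical mixed-modal dataset: each sample carries exactly one modality.
dataset = [
    ("image", [0.2, 0.7]),
    ("text",  [0.9, 0.1, 0.4]),
    ("image", [0.5, 0.3]),
    ("text",  [0.6, 0.8, 0.2]),
    ("image", [0.1, 0.9]),
]

# Partition D into the two modality-specific subsets D_a and D_b.
D_a = [x for (m, x) in dataset if m == "image"]
D_b = [x for (m, x) in dataset if m == "text"]

n, n_a, n_b = len(dataset), len(D_a), len(D_b)
assert n == n_a + n_b  # n = n_a + n_b, as on the slide
print(n_a, n_b)        # → 3 2
```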

  6. How to Learn Unified Representations? Choice 1: learn a joint semantic space for all the modalities • hard to find the correlation among all the modalities when pairing information is not available. Choice 2: learn the translation across the modalities • easy to obtain the cross-modal mappings under the guidance of cycle-consistency • modality unifying: transforming all the samples into a specific modality space.

  7. Framework: Overview Figure 2: Overview of the proposed method. Modules • Modality-specific auto-encoders: learn latent representations for each modality. • Cross-modal generators: learn mappings across modalities with unpaired data. • Discriminators: distinguish whether a sample is mapped from another modality space.

  8. Framework: Module I Modality-specific auto-encoders. Latent representations for each modality are learned by single-modal data reconstruction: (1) L_rec^A(Θ_{AE_A}) = ‖x_i^(a) − Dec_A(Enc_A(x_i^(a)))‖_2^2, L_rec^B(Θ_{AE_B}) = ‖x_i^(b) − Dec_B(Enc_B(x_i^(b)))‖_2^2.
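Equation (1) is a per-sample squared reconstruction error. A minimal pure-Python sketch, where the linear maps `enc_a`, `dec_a`, and `dec_bad` are illustrative stand-ins for the paper's networks:

```python
def enc_a(x):                      # Enc_A: toy encoder, halves each coordinate
    return [0.5 * v for v in x]

def dec_a(z):                      # Dec_A: exact inverse of enc_a
    return [2.0 * v for v in z]

def dec_bad(z):                    # a deliberately imperfect decoder
    return [2.0 * v + 0.5 for v in z]

def rec_loss(x, enc, dec):
    """Squared L2 reconstruction error ||x - Dec(Enc(x))||_2^2."""
    x_hat = dec(enc(x))
    return sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))

x_a = [1.0, -2.0, 3.0]
print(rec_loss(x_a, enc_a, dec_a))    # → 0.0 (perfect reconstruction)
print(rec_loss(x_a, enc_a, dec_bad))  # → 0.75 (0.5^2 error per coordinate)
```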

  9. Framework: Module II Cross-modal generators. Generators produce fake samples that are transformed from the other modality rather than originally lying in a specific modality space. Mappings across modalities are constrained by cycle-consistency: (2) L_cyc^A(Θ_{G_AB}, Θ_{G_BA}) = E_{z_a∼X_A}[‖z_a − G_BA(G_AB(z_a))‖_1], L_cyc^B(Θ_{G_AB}, Θ_{G_BA}) = E_{z_b∼X_B}[‖z_b − G_AB(G_BA(z_b))‖_1].
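The cycle-consistency term in Eq. (2) penalizes a round trip A→B→A that does not return to the starting latent code. A toy sketch, assuming simple shift maps for the generators (illustrative only):

```python
def G_AB(z):           # generator A→B latent mapping (toy shift, illustrative)
    return [v + 1.0 for v in z]

def G_BA(z):           # generator B→A, here the exact inverse of G_AB
    return [v - 1.0 for v in z]

def cyc_loss(z, fwd, bwd):
    """L1 cycle-consistency error ||z - bwd(fwd(z))||_1 for one latent code."""
    z_cycled = bwd(fwd(z))
    return sum(abs(a - b) for a, b in zip(z, z_cycled))

z_a = [0.25, -0.5]
print(cyc_loss(z_a, G_AB, G_BA))         # → 0.0: G_BA perfectly inverts G_AB
print(cyc_loss(z_a, G_AB, lambda z: z))  # → 2.0: an identity "inverse" is penalized
```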

  10. Framework: Module III Discriminators. Discriminators distinguish whether a sample is mapped from another modality space. Games between generators and discriminators: (3) L_adv^A(Θ_{G_BA}, Θ_{D_A}) = E_{z_a∼X_A}[D_A(z_a)] − E_{z_b∼X_B}[D_A(G_BA(z_b))], L_adv^B(Θ_{G_AB}, Θ_{D_B}) = E_{z_b∼X_B}[D_B(z_b)] − E_{z_a∼X_A}[D_B(G_AB(z_a))].
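Equation (3) is a Wasserstein-style critic objective: the discriminator scores genuine modality-A latent codes high and translated ones low, with expectations estimated over finite batches. A toy sketch in which the linear critic `D_A`, the generator `G_BA`, and all batch values are made up:

```python
def D_A(z):                         # toy critic for the modality-A latent space
    return sum(z)

def G_BA(z):                        # toy generator mapping B-latents into A-space
    return [0.5 * v for v in z]

def adv_loss_A(real_zs, b_zs, G, D):
    """E_{z_a}[D_A(z_a)] - E_{z_b}[D_A(G_BA(z_b))] over two finite batches."""
    real = sum(D(z) for z in real_zs) / len(real_zs)
    fake = sum(D(G(z)) for z in b_zs) / len(b_zs)
    return real - fake

real = [[1.0, 1.0], [2.0, 0.0]]      # latent codes from X_A (illustrative)
fake_src = [[0.0, 0.0], [2.0, 2.0]]  # latent codes from X_B (illustrative)
print(adv_loss_A(real, fake_src, G_BA, D_A))  # → 1.0
```

The discriminator ascends this quantity while the generator descends it, which is the min-max game stated on the slide.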

  11. Framework: Objective Function (4) min_{Θ_{G_AB}, Θ_{G_BA}, Θ_{AE_A}, Θ_{AE_B}} max_{Θ_{D_A}, Θ_{D_B}} L_adv^A + L_adv^B + λ_1 (L_cyc^A + L_cyc^B) + λ_2 (L_rec^A + L_rec^B)
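Putting the modules together, the full objective (4) is a weighted sum of the three loss pairs. A tiny arithmetic sketch; the λ weights and all loss values below are made up for illustration, since the paper treats them as tunable hyperparameters:

```python
# Hypothetical trade-off weights and already-computed component losses.
lambda_1, lambda_2 = 8.0, 0.25

L_adv = 1.0 + 0.5        # L_adv^A + L_adv^B
L_cyc = 0.125 + 0.125    # L_cyc^A + L_cyc^B
L_rec = 4.0 + 4.0        # L_rec^A + L_rec^B

# Generators and auto-encoders minimize this total; the discriminators
# maximize the adversarial terms only.
total = L_adv + lambda_1 * L_cyc + lambda_2 * L_rec
print(total)  # → 5.5
```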

  12. Thank You for Your Attention! See you at the poster session! Wed Dec 11th 10:45AM – 12:45PM @ East Exhibition Hall B+C #63
