
SLIDE 1

What deep generative models can do for you: Opportunities, challenges, and open questions


Giulia Fanti Carnegie Mellon University

SLIDE 2


Vyas Sekar, Chen Wang, Alankar Jain, Zinan Lin, Hao Liang, Kiran Thekumparampil, Sewoong Oh

SLIDE 3

Common ML tools in networking

  • Classification: classifying network traffic
  • Reinforcement learning: traffic engineering
  • Unsupervised methods: clustering signals

SLIDE 4

This talk: Generative models


  • What are generative models?
  • Why are they relevant now?
  • How can they be useful in networking?
  • What are the limitations?
SLIDE 5

What is a generative model?

  • Models the joint probability distribution q(x) of a dataset
  • Example (autoregressive time series model):

x[t] = f(x[0:t-1], θ) + n[t]

where n[t] is noise, θ are learned parameters, and f is the model. Key design questions: how do we pick f, and how do we combine the noise?
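To make the abstract form concrete, here is a minimal sketch in Python, with f chosen as a linear function of the most recent value (an AR(1) process). The function names, the choice of f, and the noise scale are illustrative assumptions, not the talk's model:

```python
import numpy as np

def sample_series(f, theta, length, noise_std=0.1, seed=0):
    """Sample x[t] = f(x[0:t-1], theta) + n[t] with Gaussian noise n."""
    rng = np.random.default_rng(seed)
    x = np.zeros(length)
    for t in range(1, length):
        x[t] = f(x[:t], theta) + rng.normal(scale=noise_std)
    return x

# Illustrative choice of f: depend only on the most recent value (AR(1)).
ar1 = lambda history, theta: theta * history[-1]

series = sample_series(ar1, theta=0.9, length=100)
```

Swapping in a different f (e.g., a neural network) changes the model class without changing the sampling recipe.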

SLIDE 6

How are they used in the networking community?

1. Use domain knowledge to extract high-level insights
2. Design a parametric model to capture those insights
3. Use data to populate the parameters

Example: network traffic has temporal patterns, so model it as

x[t] = sin(θt) + n[t]

with a period (2π/θ) of 1 day.

Melamed (1993), Denneulin et al (2004), Swing, BURSE, Hierarchical bundling, Di et al (2014), …
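Step 3 ("use data to populate parameters") can be sketched as a least-squares fit, assuming the daily frequency is known. The data here is synthetic and all names are illustrative:

```python
import numpy as np

# Synthetic "traffic" with a 1-day period (24 samples/day, hourly data).
rng = np.random.default_rng(0)
t = np.arange(24 * 14)                      # two weeks of hourly samples
theta = 2 * np.pi / 24                      # known daily frequency
traffic = 3.0 * np.sin(theta * t + 0.5) + rng.normal(scale=0.2, size=t.size)

# Fit x[t] ~ a*sin(theta*t) + b*cos(theta*t): linear least squares in (a, b).
A = np.column_stack([np.sin(theta * t), np.cos(theta * t)])
(a, b), *_ = np.linalg.lstsq(A, traffic, rcond=None)

amplitude = np.hypot(a, b)                  # recovered amplitude, close to 3.0
```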

SLIDE 7

Problems with this approach

  • Requires a new design for every type of data → poor flexibility
  • Doesn't capture properties that were not explicitly modeled → poor fidelity

SLIDE 8

Deep generative models

1. Design a neural network to produce data of the right dimensionality
2. Use data to populate the parameters θ ∈ ℝ^d

SLIDE 9

Generative Adversarial Networks (GANs): Breakthrough in generative modeling

  • Prior approaches
    • Likelihood-based
    • Heavily rely on domain knowledge
  • GANs
    • Adversarial learning
    • Limited a priori assumptions

SLIDE 10

Generative Adversarial Networks (GANs)

[Diagram: noise z → Generator G → generated sample → Discriminator D → REAL / FAKE]
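The game the diagram depicts is the standard GAN minimax objective (not written out on the slide): the discriminator D tries to separate real from generated samples while the generator G tries to fool it:

```latex
\min_G \max_D \;\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\big[\log D(x)\big]
\;+\;
\mathbb{E}_{z \sim p_z}\!\big[\log\big(1 - D(G(z))\big)\big]
```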

SLIDE 11

How can we use these tools in networking?

  • Sharing synthetic data
  • Discovering malicious inputs to black-box systems
  • Understanding complex datasets


SLIDE 12

Sharing synthetic data

Use case 1


Vyas Sekar, Chen Wang, Alankar Jain, Zinan Lin

github.com/fjxmlzn/DoppelGANger

SLIDE 13

Key stumbling block: Access to data

Enterprises Researchers

[Diagram: enterprises (Division A, Division B) hold data that researchers cannot access]

Consequences: unreproducible research, limited potential, collaborative opportunities go untapped.

SLIDE 14

(Not a new) idea: Synthetic data models

[Diagram: enterprises train generative models and share them with researchers through a data clearinghouse (ISAC, ISAO)]

SLIDE 15

Two main problems

  • Fidelity: does generated data match the real data?
  • Privacy: the model must not leak business secrets or user data

SLIDE 16

Existing methods

[Chart: existing methods on the fidelity vs. privacy trade-off: anonymized raw data, expert-designed parametric models, machine-learned models, and DoppelGANger]

Generating synthetic time series data with GANs

SLIDE 17

What kinds of data are we interested in?

Multi-dimensional time series with metadata (e.g., U.S., mobile traffic)

SLIDE 18

Datasets: Networking, security, and systems

  • Cluster traces
    • Google: task resource usage logs from 12.5k machines (2011)
    • IBM: resource usage measurements from 100k containers
  • Traffic measurements
    • Wikipedia web traffic: # daily views of Wikipedia articles (2016)
    • FCC Measuring Broadband America: Internet traffic and performance measurements from consumer devices around the country

SLIDE 19

DoppelGANger: Time series generation

[Architecture diagram: a metadata generator (MLP) produces attributes (A1, …, Am) from noise; a min/max generator (MLP) produces (min±max)/2; an RNN generates the time series from noise in batches (R1, …, RS), …, (RT−S+1, …, RT); a discriminator (1: real, 0: fake) judges the joint output, and an auxiliary discriminator (1: real, 0: fake) judges the metadata alone]

SLIDE 20

Part I: RNN + Batched Generation

[Figure: unbatched vs. batched RNN generation]
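A rough sketch of what batching buys: instead of one RNN step per sample, each step emits a batch of S samples, so a length-T series needs only T/S unrolled steps. Everything below (the toy cell, state size, batch size) is an illustrative stand-in, not DoppelGANger's actual architecture:

```python
import numpy as np

def batched_generate(step_fn, total_len, batch_size, state_dim=4,
                     rng=np.random.default_rng(0)):
    """Generate a length-`total_len` series, `batch_size` samples per RNN
    step, so the RNN unrolls total_len // batch_size times instead of
    total_len times (helps with long sequences)."""
    state = np.zeros(state_dim)
    out = []
    for _ in range(total_len // batch_size):
        state, samples = step_fn(state, rng.normal(size=state_dim), batch_size)
        out.extend(samples)
    return np.array(out)

def toy_step(state, noise, batch_size):
    """Illustrative stand-in for an RNN cell: update state, emit a batch."""
    new_state = np.tanh(state + noise)
    return new_state, np.resize(new_state, batch_size)

series = batched_generate(toy_step, total_len=12, batch_size=4)
```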

SLIDE 21

Challenge: Training on high-dynamic-range time series

[Figure: a time series with high dynamic range (x-axis: day)]

SLIDE 22

Part II: Auto normalization

  • Standard normalization: normalize by the global min/max
  • DoppelGANger: normalize each time series individually
    • Store each series' (min, max) as "fake" metadata
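A minimal sketch of the auto-normalization step, assuming each row of a NumPy array is one time series; the function names are illustrative:

```python
import numpy as np

def auto_normalize(series_batch):
    """Scale each time series (row) to [-1, 1] by its own min/max, and
    return the per-series (min, max) as "fake" metadata."""
    mins = series_batch.min(axis=1, keepdims=True)
    maxs = series_batch.max(axis=1, keepdims=True)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # guard constant series
    normalized = 2.0 * (series_batch - mins) / span - 1.0
    metadata = np.hstack([mins, maxs])               # "fake" metadata
    return normalized, metadata

def auto_denormalize(normalized, metadata):
    """Invert auto_normalize using the stored (min, max) metadata."""
    mins, maxs = metadata[:, :1], metadata[:, 1:]
    return (normalized + 1.0) / 2.0 * (maxs - mins) + mins

batch = np.array([[1.0, 5.0, 3.0],
                  [100.0, 900.0, 500.0]])            # high dynamic range
norm, meta = auto_normalize(batch)
```

Because every row lands in [-1, 1], the generator no longer has to cover several orders of magnitude; the dynamic range moves into the metadata.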

SLIDE 23


SLIDE 24

Challenge: Complex relationships in metadata

  • Need to capture the relation between metadata and time series
    • E.g., cable vs. mobile users
  • Straw man: a joint generator of metadata and time series
    • Problem: too hard for a single generator

[Histogram, before (single generator): count vs. time-series min value]

SLIDE 25

Part III: Decoupled Generation, Auxiliary Discriminator

  • Two stage decoupling
  • Generate metadata (using a standard MLP)
  • Generate measurements conditioned on metadata
  • Auxiliary discriminator for metadata alone

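The two-stage decoupling can be sketched as follows, with both generators replaced by trivial stand-ins (a real implementation would use a trained MLP and a conditional RNN); the cable/mobile split and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def metadata_generator(noise):
    """Stage 1 (stub MLP): noise -> metadata, e.g. a one-hot user type."""
    return np.eye(2)[int(noise[0] > 0)]          # [1,0]=cable, [0,1]=mobile

def series_generator(noise, metadata, length=24):
    """Stage 2 (stub RNN): generate measurements conditioned on metadata."""
    scale = 10.0 if metadata[0] == 1 else 1.0    # cable users: higher volume
    return scale * np.abs(np.tanh(np.cumsum(noise[:length])))

z = rng.normal(size=32)
meta = metadata_generator(z)                      # generate metadata first,
series = series_generator(z, meta)                # then condition on it
```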

SLIDE 26

Histogram of the min value per time series

[Histograms: count vs. time-series min value, without vs. with the auxiliary discriminator]

SLIDE 27

Putting it together

[Architecture diagram, repeated from Slide 19: metadata generator (MLP) → (A1, …, Am); min/max generator (MLP) → (min±max)/2; RNN generates (R1, …, RS), …, (RT−S+1, …, RT) from noise; a discriminator and an auxiliary discriminator each output 1: real, 0: fake]

SLIDE 28

Temporal Correlations

Microbenchmark


SLIDE 29

Predicting job failures in a compute cluster

Downstream task

  • Train on synthetic, test on real


SLIDE 30

Evaluating privacy

  • Protecting business secrets
    • Aggregate functions of the data
  • User privacy
    • Differential privacy
    • Robustness against membership inference


SLIDE 31

Differentially-private SGD kills fidelity in GANs

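For context, DP-SGD (the method behind this result) clips each example's gradient and adds Gaussian noise before the update; the noise needed for a meaningful privacy guarantee is what hurts GAN fidelity. A minimal sketch of one step, with illustrative clip norm, noise multiplier, and gradients:

```python
import numpy as np

def dp_sgd_step(per_example_grads, lr, params, clip_norm=1.0,
                noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """One differentially-private SGD step: clip each example's gradient
    to `clip_norm`, average, then add Gaussian noise scaled to the clip."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(clipped),
                       size=avg.shape)
    return params - lr * (avg + noise)

# Hypothetical gradients for a batch of 4 examples on 3 parameters.
grads = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.0]),
         np.array([0.1, 0.1, 0.1]), np.array([0.0, 0.0, 2.0])]
new_params = dp_sgd_step(grads, lr=0.1, params=np.zeros(3))
```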

SLIDE 32

Open questions: Synthetic data generation

  • Fidelity
    • Long sequences of data
    • Stateful protocols
  • Privacy
    • Differentially-private GANs
    • New privacy metrics?

SLIDE 33

Identifying malicious inputs to black-box systems

Use case 2


Vyas Sekar, Zinan Lin, Hao Liang

SLIDE 34

Black-box Devices and Systems Abound

IoT devices, servers / routers, control units in vehicles / manufacturing: NO source code / binary / protocol format / design doc

Towards Oblivious Network Analysis using GANs, HotNets'19, 11/14/2019

slide-35
SLIDE 35

Identifying Attack Packets is Hard

We want to identify attack packets, but do NOT have source code or a system description.

[Diagram: an attacker sends packets to a black-box system]

SLIDE 36

Motivating example

  • Packet classification
  • Vamanan et al [SIGCOMM 2010]
  • Singh et al [SIGCOMM 2013]
  • Yingchareonthawornchai et al [TON 2018]
  • Liang et al [SIGCOMM 2019]
  • Rashelbach et al [SIGCOMM 2020]
  • Many more…

Can an attacker identify many packets with high classification times?

[Figure: distribution of classification times]

SLIDE 37

Random packet generation

  • NeuroCuts, Liang et al [SIGCOMM 2019]

Can we generate many, diverse slow packets?

[Histogram: number of packets vs. classification time (ms) for 2,000 random packets, with a threshold separating fast from slow packets]

SLIDE 38

Common approaches

  • Fuzzing tools
  • Random sampling
  • Optimization of black-box functions
    • Bayesian optimization
    • Genetic algorithms
    • Simulated annealing


GANs can help!

SLIDE 39

Approach 1: Vanilla GAN

[Diagram: random packets → classification decision tree → training dataset of "fast" and "slow" packets (1% "slow") → GAN]

  • Challenge: too little training data
SLIDE 40

AmpGAN: Training with Feedback

[Diagram: random packets → classification decision tree → training dataset of "fast" and "slow" packets → AmpGAN generates packets with condition="slow", which are classified and fed back into the training dataset]
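The feedback loop can be sketched as follows, with the black-box classifier and the conditional GAN replaced by trivial stand-ins (the real system trains a conditional GAN on the labeled packets); every name and threshold here is illustrative:

```python
import random

def black_box_time(packet):
    """Stand-in for the black-box classifier's running time.
    Here: packets with a large first field are 'slow' (illustrative)."""
    return 10.0 if packet[0] > 0.9 else 1.0

def train_gan(dataset):
    """Stub generator: resample near known slow packets (placeholder for
    training a conditional GAN on the labeled dataset)."""
    slow = [p for p, label in dataset if label == "slow"] or [[1.0]]
    return lambda: [min(1.0, max(0.0, x + random.uniform(-0.05, 0.05)))
                    for x in random.choice(slow)]

random.seed(0)
threshold = 5.0
dataset = [(p, "slow" if black_box_time(p) > threshold else "fast")
           for p in ([random.random()] for _ in range(500))]

for _ in range(3):                       # feedback rounds
    generate = train_gan(dataset)
    for packet in (generate() for _ in range(100)):
        label = "slow" if black_box_time(packet) > threshold else "fast"
        dataset.append((packet, label))  # feed results back into training

slow_fraction = sum(1 for _, l in dataset if l == "slow") / len(dataset)
```

Each round amplifies the rare "slow" class in the training set, which is exactly what the vanilla GAN lacked.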

SLIDE 41

Results

[Histograms: number of packets vs. classification time (ms) for random packets and for AmpGAN packets, with the slow/fast threshold marked]

SLIDE 42

Results

[Chart: fraction of "slow" packets found vs. number of system calls for genetic algorithms, simulated annealing, generalized SA, Bayesian optimization, and AmpGAN; AmpGAN shows 2.5x and 10x jumps]

SLIDE 43

Open questions

  • Sequences of inputs
  • Can we use this to optimize systems, as well as find attacks?
    • E.g., CherryPick [NSDI 2017]


SLIDE 44

Extracting insights from unstructured data

Use case 3


Zinan Lin, Kiran Thekumparampil, Sewoong Oh

github.com/fjxmlzn/InfoGAN-CR

SLIDE 45
Disentangled GANs

  • A generator maps latent codes (c1, c2, …, cl) plus the remaining input noise e to images with l factors of variation (e.g., hair color, rotation, background, bangs)
  • Vanilla GANs: how do the ci's control the factors? The mapping is entangled
  • Disentangled GANs: each latent code ci controls a single factor

Code & Paper: https://github.com/fjxmlzn/InfoGAN-CR

SLIDE 46

Examples of Disentanglement

[Examples: changing a single latent code ci changes a single factor]

  • CelebA dataset: hair color, rotation, lighting, background, bangs
  • dSprites dataset: shape, scale, rotation, x-position, y-position

* CelebA example is generated by InfoGAN-CR. dSprites example is synthetic for illustration.

SLIDE 47

Our Solution: Contrastive Regularizer (CR)

  • Use two latent codes (c1, …, ci, …, cl) and (c1′, …, ci′, …, cl′) that are equal in the i-th coordinate to generate a pair of images
  • The shared coordinate fixes one factor: e.g., on dSprites, i = 1 gives the same shape, i = 2 the same x-position, i = 3 the same y-position
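Constructing such a pair is simple; a sketch, assuming latent codes drawn uniformly from [-1, 1] (the function name and ranges are illustrative):

```python
import numpy as np

def make_cr_pair(num_codes, shared_index, rng=np.random.default_rng(0)):
    """Draw two latent-code vectors that agree in exactly one coordinate,
    as used to train the contrastive regularizer."""
    c = rng.uniform(-1.0, 1.0, size=num_codes)
    c_prime = rng.uniform(-1.0, 1.0, size=num_codes)
    c_prime[shared_index] = c[shared_index]   # force the i-th codes equal
    return c, c_prime

c, c_prime = make_cr_pair(num_codes=5, shared_index=2)
```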

SLIDE 48

Intuition of Contrastive Regularizer (CR)

  • As on the previous slide, use latent-code pairs that share the i-th coordinate to generate image pairs
  • The contrastive regularizer (CR) must predict which index i the pair shares: a classification task!

SLIDE 49

InfoGAN-CR

InfoGAN-CR loss:

min over G, Q, H; max over D:   L_adv(G, D) − λ L_info(G, Q) − β L_CR(G, H)

where L_adv is the GAN adversarial loss, L_info is InfoGAN's mutual-information loss [1] (computed by an encoder Q), and L_CR is the classification accuracy of the contrastive regularizer H.

[Architecture diagram: input noise z and latent factors c feed the generator; generated images are scored by the GAN discriminator D and the InfoGAN encoder Q (which reconstructs ĉ); image pairs generated from c′ and c′′ are scored by the CR head]

[1] InfoGAN. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (NeurIPS 2016)

SLIDE 50

Key open question

  • Can disentanglement help us make sense of networking data?
    • Reverse-engineer protocols
    • Categorize complex data patterns
    • It has not been tried in the networking domain
  • Time series disentanglement
    • Himberg, Hyvärinen, Esposito (2004)


SLIDE 51

Take-home messages

  • Deep generative models show promise for networking applications
  • Synthetic data generation
  • Identifying malicious packets for black-box systems
  • Extracting structural insights from data
  • They often cannot be applied off the shelf
  • New architectures
  • Data pre-processing pipelines
  • Training mechanisms
