
Combining Machine and Automata Learning for Network Traffic Classification

Zeynab Sabahi, Fatemeh Ghassemi, and Zahra Alimadadi. TTCS 2020, Tehran, Iran.


  1. In the name of God, the Most Gracious, the Most Merciful. Combining Machine and Automata Learning for Network Traffic Classification. Zeynab Sabahi, Fatemeh Ghassemi, and Zahra Alimadadi. TTCS 2020, Tehran, Iran.

  2. Network Traffic Classification: What and Why?
     Given an interleaved packet trace, we want to detect which applications are running.
     This supports network management tasks:
     - Anomaly detection
     - Balancing bandwidth usage
     - Firewalling, gateways, and so on

  3. Network Traffic Classification: How?
     - Port-based classification: ineffective, because applications use random or non-standard ports.
     - Payload inspection: useless for encrypted traffic.
     - Statistical methods: use flow/packet statistical features; fast but less accurate, and they ignore the temporal relation among flows.
     - Behavioral classification: specific to the category of application.

  4. Our Solution
     Intuition:
     - A network application is a program that calls different well-known protocols such as HTTP, TCP, SSL, and TLS.
     - Each application has its own network communication language, defined by how it calls these protocols.

  5. Research Goals
     - Automatically learning the network language of each application, without access to its source code.
     - Classifying an interleaved packet trace of applications according to the learned languages.

  6. Research Goals (continued): the network language of each application is modeled as a k-TSS language.

  7. Outline
     - Introduction
     - Preliminary: k-TSS Language
     - NeTLang Framework
     - Evaluation
     - Conclusion

  8. Formal Foundation: k-TSS Language
     - A k-testable language in the strict sense (k-TSS) is a regular language defined by a sliding window of size k. Its learning is decidable.
     - Its words are determined by three sets of allowed prefixes, suffixes, and segments.

  9. Formal Foundation: k-TSS Language (example). Word: abaaabababb, window of size 3 at [aba]aabababb.
     Segments = {aba}, Prefixes = {}, Suffixes = {}

  10. Formal Foundation: k-TSS Language (example, continued). Window at a[baa]abababb.
     Segments = {aba, baa}, Prefixes = {}, Suffixes = {}

  11. Formal Foundation: k-TSS Language (example, continued). Window at ab[aaa]bababb.
     Segments = {aba, baa, aaa}, Prefixes = {}, Suffixes = {}

  12. Formal Foundation: k-TSS Language (example, continued). The window has covered the whole word, and the prefix [ab]aaabababb is recorded.
     Segments = {aba, baa, aaa, aab, bab, abb}, Prefixes = {ab}, Suffixes = {}

  13. Formal Foundation: k-TSS Language (example, continued). The suffix abaaababa[bb] is recorded.
     Segments = {aba, baa, aaa, aab, bab, abb}, Prefixes = {ab}, Suffixes = {bb}
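
The window example above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `k_windows` is hypothetical, and words shorter than k are simply skipped here (in the formal definition they belong to a separate set of short strings).

```python
def k_windows(word, k):
    """Return (prefixes, suffixes, segments) seen by a size-k window over `word`."""
    if len(word) < k:
        # Simplification: too short for a full window, so nothing is recorded.
        return set(), set(), set()
    prefixes = {word[:k - 1]}                      # first k-1 symbols
    suffixes = {word[len(word) - (k - 1):]}        # last k-1 symbols
    segments = {word[i:i + k] for i in range(len(word) - k + 1)}
    return prefixes, suffixes, segments


prefixes, suffixes, segments = k_windows("abaaabababb", 3)
print(sorted(prefixes), sorted(suffixes), sorted(segments))
```

For the word abaaabababb and k = 3 this yields the prefix ab, the suffix bb, and the six segments listed on the slide.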

  14. Formal Definition of k-TSS Language
     - Definition 1 (k-test vector): Let $k > 0$. A k-test vector is a 5-tuple $Z = \langle \Sigma, I, F, T, C \rangle$ where:
       - $I \subseteq \Sigma^{k-1}$ is a set of allowed prefixes
       - $F \subseteq \Sigma^{k-1}$ is a set of allowed suffixes
       - $T \subseteq \Sigma^{k}$ is a set of allowed segments
       - $C \subseteq \Sigma^{<k}$ is a set of allowed short strings
     - Definition 2 (k-TSS language): Let $Z = \langle \Sigma, I, F, T, C \rangle$ be a k-test vector, for some $k > 0$. Then
       $L(Z) = [(I\Sigma^{*} \cap \Sigma^{*}F) - \Sigma^{*}(\Sigma^{k} - T)\Sigma^{*}] \cup C$
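
Definition 2 translates fairly directly into a membership test: a word is in the language if it is an allowed short string, or if it starts with an allowed prefix, ends with an allowed suffix, and contains no forbidden segment. The sketch below is illustrative only. The function name `accepts` and the dictionary keys `"I"`, `"F"`, `"T"`, `"C"` (allowed prefixes, suffixes, segments, short strings) are assumptions, and the vector used is the one from the abaaabababb example with an empty set of short strings.

```python
def accepts(word, z, k):
    """Check membership of `word` in the k-TSS language described by vector `z`."""
    if word in z["C"]:                  # allowed short string
        return True
    if len(word) < k - 1:               # cannot carry a prefix of length k-1
        return False
    starts_ok = word[:k - 1] in z["I"]
    ends_ok = word[len(word) - (k - 1):] in z["F"]
    # Every length-k window must be an allowed segment.
    segments_ok = all(word[i:i + k] in z["T"] for i in range(len(word) - k + 1))
    return starts_ok and ends_ok and segments_ok


# Vector taken from the window example (k = 3); the empty C is an assumption.
z = {
    "I": {"ab"},
    "F": {"bb"},
    "T": {"aba", "baa", "aaa", "aab", "bab", "abb"},
    "C": set(),
}
print(accepts("abaaabababb", z, 3), accepts("abba", z, 3), accepts("abbb", z, 3))
```

The word abba is rejected because its suffix ba is not allowed, and abbb because it contains the segment bbb.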

  15. Formal Definition of k-TSS Language (continued): What is the alphabet $\Sigma$? How should it be defined for the network domain?

  16. Outline
     - Introduction
     - Preliminary: k-TSS Language
     - NeTLang Framework
     - Evaluation
     - Conclusion

  17. Translating Network Concepts to Automata Learning
     Intuition: some packets always appear together, due to the control phase of protocols or to a specific functionality of the application.
     - A sequence of related packets corresponds to a symbol of the alphabet.
     - A packet trace of an application corresponds to a word of the language.
     - From the set of all packet traces of an application, its k-TSS language can be learned.

  18. NeTLang Framework
     - Network Traffic Language Learner: NeTLang
     - Architectural view (figure)

  19. NeTLang Framework (continued): the architectural view marks three modules: 1) Trace Generator, 2) Language Learner, 3) Classifier.

  20. 1) Trace Generator
     - In the figure, packets are colored according to their protocol.
     - The clustering algorithm is k-means++.
     - "Stats" denotes the statistical features, based on the length, number, and inter-arrival time (IAT) of packets.
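
To make the "Stats" step concrete, the sketch below computes three simple features for one flow: packet count, mean packet length, and mean inter-arrival time. The slide does not list the exact features, so the feature set, the function name `flow_stats`, and the `(timestamp, length)` input format are all assumptions chosen for illustration. Feature vectors of this kind would then be passed to the clustering step.

```python
from statistics import mean


def flow_stats(packets):
    """Compute illustrative features for a non-empty flow.

    `packets` is a list of (timestamp_seconds, length_bytes), sorted by time.
    """
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    # Inter-arrival times: gaps between consecutive packet timestamps.
    iats = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(packets),
        "mean_length": mean(lengths),
        "mean_iat": mean(iats) if iats else 0.0,  # single-packet flow has no gap
    }


# Hypothetical flow of three packets.
stats = flow_stats([(0.0, 100), (0.5, 300), (1.5, 200)])
print(stats)
```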

  21. 2) Language Learner
     - The k-TSS vector is learned by sliding a window of size k over the traces.
     - For the running example (k = 3):
       Σ = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
       T = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
       I = {SL-4 SL-2, T-10 TL-2}
       F = {SL-5 H-2, U-1 SL-3}
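
A learner of this kind can be sketched as a single pass over the symbolic traces. The code below is an assumption-laden illustration, not the NeTLang implementation: `learn_ktss` and the dictionary keys are hypothetical names, and the two input traces were reconstructed so that they are consistent with the vector on the slide (the slide itself does not show the training traces).

```python
def learn_ktss(traces, k):
    """Learn a k-test vector from a list of symbol sequences."""
    z = {"Sigma": set(), "I": set(), "F": set(), "T": set(), "C": set()}
    for trace in traces:
        trace = tuple(trace)
        z["Sigma"].update(trace)
        if len(trace) < k:
            z["C"].add(trace)                          # short string
        if len(trace) >= k - 1:
            z["I"].add(trace[:k - 1])                  # prefix of length k-1
            z["F"].add(trace[len(trace) - (k - 1):])   # suffix of length k-1
        for i in range(len(trace) - k + 1):
            z["T"].add(trace[i:i + k])                 # every length-k window
    return z


# Hypothetical traces, chosen to be consistent with the running example.
traces = [
    "SL-4 SL-2 T-1 U-0 U-1 SL-3".split(),
    "T-10 TL-2 T-1 SL-5 H-2".split(),
]
z = learn_ktss(traces, k=3)
print(len(z["Sigma"]), len(z["T"]), sorted(z["I"]), sorted(z["F"]))
```

Under these assumed traces the learner returns the 10 symbols, 7 segments, 2 prefixes, and 2 suffixes of the running example.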

  22. 3) Classifier. Inputs: the automata of the applications (App1, App2, ...) and the interleaved packet trace.

  23. 3) Classifier (continued). The trace generator module divides the interleaved packet trace into symbolic sub-traces using timing features.

  24. 3) Classifier (continued). Testing whether a sub-trace s1 is a word of an application's automaton (word inclusion) is not a suitable approach, because sub-traces are incomplete and the network is noisy.

  25. 3) Classifier (continued): window-based similarity. For a sub-trace s1 and an application App1, compare the k-test vectors
     $Z(App_1) = \langle \Sigma_1, I_1, F_1, T_1 \rangle$ and $Z(s_1) = \langle \Sigma, I, F, T \rangle$.

  26. 3) Classifier (continued): percentage change metric, where $X - Y$ is set difference and $|X|$ is the size of a set.
     $\Delta T = \frac{|T - T_1|}{|T|}$, $\Delta T_1 = \frac{|T_1 - T|}{|T_1|}$, $\Delta\Sigma = \frac{|\Sigma - \Sigma_1|}{|\Sigma|}$, $\Delta I = \frac{|I - I_1|}{|I|}$, $\Delta F = \frac{|F - F_1|}{|F|}$
     $\mathrm{distance}(s_1, App_1) = (\Delta T, \Delta T_1, \Delta\Sigma, \Delta I, \Delta F)$

  27. 3) Classifier (continued): in general, for a trace w and an application $App_i$:
     $D(Z(w), Z(App_i)) = (\Delta T, \Delta T_i, \Delta\Sigma, \Delta I, \Delta F)$

  28. 3) Classifier (continued). Compute distance(s1, App1), distance(s1, App2), ... and assign the sub-trace to the closest application: $\mathrm{Class}(s_1) = App_j$, where $\mathrm{distance}(s_1, App_j)$ is the minimum.

  29. 3) Classifier (continued). In general, $\mathrm{Class}(w) = j$ where $j = \arg\min_{i:\, App_i \in A} D(L(w), L(App_i))$.
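
The minimum-distance rule can be sketched as follows. Two points here are assumptions rather than facts from the slides: the five percentage-change components are collapsed into one number with a Euclidean norm (the slides only list the components), and a ratio over an empty set is taken to be 0. The names `ratio`, `distance`, and `classify`, and the toy vectors, are hypothetical.

```python
import math


def ratio(a, b):
    """|a - b| / |a|; defined as 0 for an empty `a` (assumption)."""
    return len(a - b) / len(a) if a else 0.0


def distance(z_w, z_app):
    """Combine the five percentage-change components into one number."""
    components = (
        ratio(z_w["T"], z_app["T"]),          # segments of w missing in the app
        ratio(z_app["T"], z_w["T"]),          # segments of the app missing in w
        ratio(z_w["Sigma"], z_app["Sigma"]),
        ratio(z_w["I"], z_app["I"]),
        ratio(z_w["F"], z_app["F"]),
    )
    # Assumption: Euclidean norm as the aggregate distance.
    return math.sqrt(sum(c * c for c in components))


def classify(z_w, app_vectors):
    """Return the name of the application with the smallest distance."""
    return min(app_vectors, key=lambda name: distance(z_w, app_vectors[name]))


# Toy vectors for illustration only.
z_w = {"Sigma": {"a", "b"}, "I": {"ab"}, "F": {"ba"}, "T": {"aba"}}
apps = {
    "App1": {"Sigma": {"a", "b"}, "I": {"ab"}, "F": {"ba"}, "T": {"aba"}},
    "App2": {"Sigma": {"c", "d"}, "I": {"cd"}, "F": {"dc"}, "T": {"cdc"}},
}
print(classify(z_w, apps))
```

Any other monotone aggregate of the components would fit the same `classify` function; only `distance` would change.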

  30. Classifier Result for the Running Example
     Z(w = SL-4 SL-2 T-10 TL-2 T-1 U-2):
     - Σ = {SL-2, TL-2, T-1, U-2, SL-4, T-10}
     - T = {SL-4 SL-2 T-10, SL-2 T-10 TL-2, T-10 TL-2 T-1, TL-2 T-1 U-2}
     - I = {SL-4 SL-2}
     - F = {T-1 U-2}
     Z(App):
     - Σ' = {H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1}
     - T' = {SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3}
     - I' = {SL-4 SL-2, T-10 TL-2}
     - F' = {SL-5 H-2, U-1 SL-3}
     Result: ΔT = 0.75, ΔT_i = 0.85, ΔΣ = 0.16, ΔI = 0, ΔF = 1
     D(L(w), L(App)) = 7484160099
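
The five component values of the running example can be checked by computing each ratio as the size of a set difference divided by the size of the first set. This is a verification sketch with hypothetical helper names (`parse`, `ratio`); it does not attempt to reproduce the final combined distance.

```python
def parse(items):
    """Turn 'A B, C D' into {('A', 'B'), ('C', 'D')}."""
    return {tuple(part.split()) for part in items.split(",") if part.strip()}


def ratio(a, b):
    """|a - b| / |a| for a non-empty set `a`."""
    return len(a - b) / len(a)


w = {
    "Sigma": parse("SL-2, TL-2, T-1, U-2, SL-4, T-10"),
    "T": parse("SL-4 SL-2 T-10, SL-2 T-10 TL-2, T-10 TL-2 T-1, TL-2 T-1 U-2"),
    "I": parse("SL-4 SL-2"),
    "F": parse("T-1 U-2"),
}
app = {
    "Sigma": parse("H-2, SL-2, SL-3, SL-4, SL-5, T-1, T-10, TL-2, U-0, U-1"),
    "T": parse("SL-2 T-1 U-0, SL-4 SL-2 T-1, T-1 SL-5 H-2, T-1 U-0 U-1, "
               "T-10 TL-2 T-1, TL-2 T-1 SL-5, U-0 U-1 SL-3"),
    "I": parse("SL-4 SL-2, T-10 TL-2"),
    "F": parse("SL-5 H-2, U-1 SL-3"),
}
deltas = {
    "dT": ratio(w["T"], app["T"]),              # 3 of 4 segments of w are new
    "dT_app": ratio(app["T"], w["T"]),          # 6 of 7 app segments are unseen
    "dSigma": ratio(w["Sigma"], app["Sigma"]),  # only U-2 is outside the alphabet
    "dI": ratio(w["I"], app["I"]),
    "dF": ratio(w["F"], app["F"]),
}
print({name: round(value, 3) for name, value in deltas.items()})
```

The exact ratios are 3/4, 6/7, 1/6, 0, and 1; the slide's 0.85 and 0.16 appear to be 6/7 and 1/6 truncated to two decimals.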

  31. Outline
     - Introduction
     - Preliminary: k-TSS Language
     - NeTLang Framework
     - Evaluation
     - Conclusion

  32. Dataset Description
     We divided the pcaps into three sets:
     - Train: 65%
     - Validation: 15%
     - Test: 20%
     Metrics:
     - Precision (P)
     - Recall (R)
     - F1-Measure $= \frac{2 \cdot P \cdot R}{P + R}$
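
For reference, the three metrics can be computed from per-class counts as in the sketch below. The function name and the example counts are hypothetical, and returning 0 when a denominator is zero is a convention adopted here, not something stated on the slide.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 8 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, round(f1, 4))
```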

  33. Evaluation Results: The Best Configurations on the Validation Set

     Parameter            Application Identification   Traffic Characterization
     Session Threshold    15                           15
     Inactive Timeout     5                            15
     Flow Duration        10                           10
     k                    3                            3

  34. Comparison with Statistical Classifiers: Application Identification (figure: precision, recall, and F1-measure)
