Presentation: Scalable Detection of Botnets Based on DGA
Presentation · June 2019 · DOI: 10.13140/RG.2.2.24134.32322
Authors: Mattia Zago, Manuel Gil Pérez, Gregorio Martinez Perez (University of Murcia)
Source: https://www.researchgate.net/publication/333652427 (uploaded by Mattia Zago on 07 June 2019)
SCALABLE DETECTION OF BOTNETS BASED ON DGA
EFFICIENT FEATURE DISCOVERY PROCESS IN MACHINE LEARNING TECHNIQUES
Speaker: Mattia Zago
Authors: M. Zago, M. Gil Pérez, G. Martínez Pérez
Available Online – Soft Computing – Q2, IF: 2.367
Zago, M., Gil Pérez, M. & Martínez Pérez, G. Soft Comput (2019). 10.1007/s00500-018-03703-8
OUR AGENDA FOR TODAY
Background & Motivation
- Subject localisation
- Relevance
- Objective
State of The Art
- Machine learning algorithms
- Feature sets and families
Analysis
- Exploratory feature analysis
- Classification results: binary problem, multiclass problem
Challenges
- Data
- Best practices
March 2019 Mattia Zago – Scalable Detection of Botnets based on DGA: Efficient Feature Discovery Process in ML Techniques
WHAT IS A BOTNET?
DGA: DOMAIN GENERATION ALGORITHM
Objective: analyse DNS queries to detect connections to malicious AGDs (algorithmically generated domains).
APPROACHES TO THE DETECTION – ML
STATE OF THE ART REGARDING ALGORITHMS
Since 2010: more than 100 articles identified, more than 30 works selected.
We have identified six comparison metrics:
- Machine Learning approach (either supervised or unsupervised)
- Type of application (e.g., binary or multiclass classifier, correlation, anomaly detection, etc.)
- Family of features used (i.e., either Context-Free or Context-Aware)
- Achieved results (poor, average, good or excellent)
- Real-time analysis (i.e., online detection, performance scalability, etc.)
- Comparisons with other works, approaches or algorithms
APPROACHES TO DGA DETECTION – FEATURES
LANGUAGE ANALYSIS
Usage of Natural Language Processing techniques to estimate if the domain is legit or not (i.e., test the randomness).
Context-Free: a feature that is related only to a FQDN and thus is independent of contextual information, including, but not limited to, timing, origin or any other environment configuration.
Examples:
- Length of the string
- Entropy
- Frequency analysis
- Vowels ratio
DNS QUERY ANALYSIS
Decode sniffed queries and responses and look for “troublesome” indicators that may suggest a regular pattern.
Context-Aware: a feature that is dependent on the specific malware sample execution, which is realised in a precise environment, with a specific configuration and in a particular time frame.
Examples:
- Num. of connections
- Num. of IP addresses
- Num. of NXDomains
- Longevity of domain
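The context-free examples listed on this slide (length, entropy, vowels ratio) can be computed from the domain string alone. The following is a minimal sketch, not the authors' implementation: the function names, the choice of character-level Shannon entropy, and the decision to count dots as part of the string are all assumptions made for illustration.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # p * log2(1/p) summed over every distinct character.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def context_free_features(fqdn: str) -> dict:
    """Features that depend only on the FQDN string (assumed definitions)."""
    name = fqdn.lower().rstrip(".")  # normalise case and a trailing root dot
    size = len(name)                 # assumption: dots count towards the length
    vowels = sum(1 for c in name if c in VOWELS)
    digits = sum(1 for c in name if c.isdigit())
    return {
        "length": size,
        "entropy": shannon_entropy(name),
        "vowel_ratio": vowels / size if size else 0.0,
        "digit_ratio": digits / size if size else 0.0,
    }


if __name__ == "__main__":
    for domain in ("example.com", "xj4k9qzt7w.net"):
        print(domain, context_free_features(domain))
```

Because none of these values need timing, origin or resolver state, they can be computed per query, which is what makes context-free features attractive for online detection.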
STATE OF THE ART REGARDING FEATURES
Used by (reference numbers): 3, 5, 9, 11, 14, 16, 24, 28, 33, 34, 38, 45, 46, 48, 49, 51, 52, 54, 56, 57, 58, 59, 65, 66

Code            Description                               Tot.
NLP-L-x         String length                             16
NLP-LDN         Number of domain levels                   3
NLP-R-NUM-x     Ratio of numerical characters             8
NLP-R-VOW-x     Ratio of vowel characters                 4
NLP-R-CON-x     Ratio of consonant characters             4
NLP-LANG        Language hypothesis                       2
NLP-LC-C        Longest consecutive consonant sequence    5
NLP-LC-V        Longest consecutive vowel sequence        1
NLP-LC-D        Longest consecutive number sequence       3
NLP-COV         Covariance matrix                         1
NLP-R-MC        Ratio of meaningful characters            3
NLP-LMS         Length of longest meaningful string       0
NLP-WLU         Number of “word-like” units               1
NLP-SQS         Domain squatting score                    1
NLP-LED         Levenshtein Edit Distance                 2
NLP-nG-FR       Frequency distribution (histogram)        4
NLP-nG-E        Entropy                                   11
NLP-nG-COV      Covariance                                1
NLP-nG-MEAN     Mean of frequencies                       1
NLP-nG-MED      Median of frequencies                     1
NLP-nG-VAR      Variance of frequencies                   1
NLP-nG-STD      Standard deviation of frequencies         1
NLP-nG-PRO      Pronounceability score                    3
NLP-nG-NORM     Normality score                           3
NLP-nG-PRT      Transition probability                    2
NLP-nG-PRA      Probability of appearance                 2
NLP-nG-PRI      Index probability                         2
NLP-nG-DST-KL   Kullback-Leibler divergence               2
NLP-nG-DST-JI   Jaccard Index measure                     4
NLP-nG-DST-TH   Distance - Threshold                      1
NLP-nG-DST-AF   Distance - Avg. frequency                 1
NLP-nG-DST-AC   Distance - Avg. count                     2

Total (per reference): 1 4 7 3 1 4 7 1 9 1 1 2 7 5 8 3 8 6 1 2 3 4 5 3
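Several rows of this table (frequency distribution, entropy, Jaccard Index) are computed over n-grams rather than single characters, assuming the "nG" in the feature codes stands for n-gram. The sketch below illustrates three of them for bigrams; the helper names and the tiny reference set of "legitimate" bigrams are hypothetical and stand in for whatever corpus a real study would use.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 2) -> list:
    """All overlapping character n-grams of the string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(text: str, n: int = 2) -> dict:
    """Relative frequency of each n-gram (the histogram feature)."""
    grams = ngrams(text, n)
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}


def ngram_entropy(text: str, n: int = 2) -> float:
    """Shannon entropy, in bits, of the n-gram frequency distribution."""
    freqs = ngram_frequencies(text, n)
    return sum(p * math.log2(1 / p) for p in freqs.values())


def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical reference bigrams; a real set would come from a corpus
    # of known legitimate domain names.
    reference = set(ngrams("google")) | set(ngrams("example"))
    for label in ("examples", "xkqzjvwp"):
        score = jaccard_index(set(ngrams(label)), reference)
        print(label, round(ngram_entropy(label), 3), round(score, 3))
```

A label whose bigrams rarely overlap with the reference set scores close to zero on the Jaccard Index, which is the intuition behind using it as a randomness indicator.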
STATE OF THE ART REGARDING FEATURES - EXPLORE
Scatter plot of 10,000 FQDNs
Axes: horizontal = length, vertical = entropy
Dots: green = legitimate, other colours = malware
STATE OF THE ART REGARDING FEATURES - EXPLORE
Scatter plot of 10,000 FQDNs
Axes: horizontal = length, vertical = entropy
Dots: light blue = legitimate, red = malware
EXAMPLE OF FEATURE ANALYSIS
Features that are interesting for their values: length of domain name (excluding TLD)
Features that are interesting for their shapes: longest consecutive consonant sequence
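The two features named on this slide can be sketched as follows. This is an illustrative approximation under stated assumptions: the TLD is taken to be the last dot-separated label (multi-label suffixes such as "co.uk" would need the Public Suffix List), the letter "y" is treated as a consonant, and the function names are hypothetical.

```python
import re

# Assumption: every ASCII letter other than a, e, i, o, u is a consonant.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+")


def strip_tld(fqdn: str) -> str:
    """Drop the last label, naively treating it as the TLD."""
    name = fqdn.lower().rstrip(".")
    head, sep, _tld = name.rpartition(".")
    return head if sep else name


def length_without_tld(fqdn: str) -> int:
    """Length of the domain name excluding the TLD (remaining dots included)."""
    return len(strip_tld(fqdn))


def longest_consonant_run(fqdn: str) -> int:
    """Length of the longest run of consecutive consonants before the TLD."""
    runs = CONSONANT_RUN.findall(strip_tld(fqdn))
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    for domain in ("example.com", "xkcdqwrt.net"):
        print(domain, length_without_tld(domain), longest_consonant_run(domain))
```

Dots and digits break a consonant run here, so a run never spans two labels; whether that matches the definition used in the paper is not stated on the slide.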