


  1. Query Log Analysis: Detecting Anomalies in DNS Traffic at a TLD Resolver
 Pieter Robberechts, Thesis Defence, Jun 30, 2017
 Promotor: Prof. Hendrik Blockeel; Co-promoter: Ronald Geens

  2. Outline: Goal and Context (current section), The QLAD System, Results, Conclusion

  3. DNS Belgium
 The .be ccTLD resolver
 • Domain name registry for .be/.vlaanderen/.brussels
   - Manage registration of domains
   - Provide infrastructure to answer queries
 • 1.5 million domains
 • 4 nameservers
 • 350 million queries / day
 Source: Highlights uit 2016 (Highlights from 2016). DNS Belgium.
 URL: https://www.dnsbelgium.be/sites/default/files/generated/files/documents/cijfers%20deel%201%20-%20980px_v04_NL.pdf

  4. DNS Belgium 
 Current situation
 pcap files
 - stored for 10 days
 - analysed post-mortem

  5. DNS Belgium 
 Current situation
 We believe that proactive and real-time analysis of this data could contribute to the resilience and security of DNS Belgium’s service.

  6. "Design and build a working query log analysis platform using available components and custom development, able to predict, detect and report on common attack and abuse patterns in an open architecture, allowing for future growth and improvement."

  7. Anomaly Detection 
 Challenges
 • Huge data volume
   ‣ efficiency and scalability!
   ‣ easy for attacks to stay under the radar
 • No labelled or clean training data
 • Wide range of attacks, under constant evolution
 • Specific nature of DNS traffic
   ‣ periodicity and trends
   ‣ few (typically two) packets per flow


  8. Anomaly Detection 
 Goal
 Design and implement a query log analysis platform that:
 • is able to detect suspicious behaviour and a wide range of attacks
 • is efficient enough to scan high-volume traffic
 • can detect low-volume anomalies
 • does not need any initial knowledge about the analysed traffic
 • is tuned to the unique nature of DNS traffic
 • allows for future growth and improvement


  9. Outline: Goal and Context, The QLAD System (current section), Results, Conclusion

  10. QLAD 
 Query Log Anomaly Detection
 • Focus on anomalies (≠ attacks/abuses)
 • Statistical techniques
 • Inspiration from network anomaly detection

  11. QLAD 
 System Overview
 • Data transformation: ENTRADA -- OR -- DSC
 • Anomaly detection: QLAD-global, QLAD-flow
 • Presentation: QLAD-UI

  12. Data Transformation 
 ENTRADA vs DSC
 [Diagram: ENTRADA: convert → archive + SQL; DSC: aggregate → archive; MongoDB API]

  13. Data Transformation 
 ENTRADA vs DSC
 ENTRADA
 • Stores all traffic
 • Allows a detailed analysis
 • SQL interface
 • Downside: storage cost and infrastructure
 DSC
 • Lightweight
 • No additional infrastructure
 • Downside: no detailed (log level) analysis
 Example DSC output (excerpt):
 "ClientAddr": [
   { "val": "195.238.24.111", "count": 1014 },
   { "val": "195.238.25.53", "count": 70 },
   { "val": "195.238.25.99", "count": 63 },
   { "val": "195.238.24.117", "count": 61 },
   { "val": "194.78.30.189", "count": 59 },
   { "val": "42.236.23.92", "count": 55 },
   { "val": "195.238.25.108", "count": 55 },
   { "val": "42.236.23.91", "count": 54 },
   { "val": "193.58.1.131", "count": 52 },
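To make the DSC format concrete, the sketch below parses an excerpt like the one on this slide into per-value counts, which is the form a distribution-based detector needs. It assumes the `ClientAddr` list sits inside a top-level JSON object; that wrapping object and the helper name `feature_counts` are illustrative assumptions, not part of DSC itself.

```python
import json
from collections import Counter

# Excerpt from the slide, wrapped in a top-level object (an assumption about
# the surrounding document; real DSC output contains more entries).
DSC_EXCERPT = """
{"ClientAddr": [
  {"val": "195.238.24.111", "count": 1014},
  {"val": "195.238.25.53", "count": 70},
  {"val": "195.238.25.99", "count": 63},
  {"val": "195.238.24.117", "count": 61},
  {"val": "194.78.30.189", "count": 59},
  {"val": "42.236.23.92", "count": 55},
  {"val": "195.238.25.108", "count": 55},
  {"val": "42.236.23.91", "count": 54},
  {"val": "193.58.1.131", "count": 52}
]}
"""


def feature_counts(document: str, feature: str) -> Counter:
    """Collapse a DSC-style list of {"val", "count"} entries into a Counter."""
    counts: Counter = Counter()
    for entry in json.loads(document)[feature]:
        counts[entry["val"]] += entry["count"]  # sum in case a value repeats
    return counts


clients = feature_counts(DSC_EXCERPT, "ClientAddr")
total = sum(clients.values())
top_addr, top_count = clients.most_common(1)[0]
print(f"{total} queries in excerpt; {top_addr} sent {top_count / total:.1%}")
```

In this excerpt the top address accounts for about 68% of the listed queries; only aggregates like these are available from DSC, which is why it cannot support log-level analysis.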

  14. QLAD-flow 
 Algorithm
 [Diagram: hash function h₁]
 Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
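The hash function h₁ on this slide splits the traffic into sketch cells, after which queries are counted per cell and time bin. The sketch below assumes records of `(timestamp, key)` pairs and uses a seeded SHA-256 as the hash; the function names, bin width and number of cells are illustrative choices, not taken from QLAD.

```python
import hashlib


def cell_of(key: str, seed: int, n_cells: int) -> int:
    """Hypothetical hash h_seed: map a key (e.g. a source IP) to a sketch cell."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_cells


def build_sketch(records, n_cells, t0, bin_seconds, n_bins, seed=0):
    """Return counts[cell][bin]: number of records per sketch cell and time bin."""
    counts = [[0] * n_bins for _ in range(n_cells)]
    for ts, key in records:
        b = int((ts - t0) // bin_seconds)
        if 0 <= b < n_bins:  # ignore records outside the analysed window
            counts[cell_of(key, seed, n_cells)][b] += 1
    return counts


# Documentation addresses (RFC 5737) used as example keys.
records = [(0.0, "192.0.2.1"), (1.5, "192.0.2.1"), (12.0, "198.51.100.7")]
sketch = build_sketch(records, n_cells=4, t0=0.0, bin_seconds=10, n_bins=2)
print(sketch)
```

Each row of `sketch` is a count time series for one cell; many keys share a cell, so a single heavy key shows up as an unusual series in its cell.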

  15. QLAD-flow 
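 Algorithm
 Level 1: 2 1 3 0 2 1 3 1 0 0 2 0 → α₁, β₁
 Level 2: 3 3 3 4 0 2 → α₂, β₂
 Level 3: 6 7 2 → α₃, β₃
 (each level sums pairs of consecutive counts from the level below; a gamma distribution with parameters αⱼ, βⱼ is fitted at every level j)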
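The aggregation on this slide can be reproduced in a few lines: each level sums non-overlapping pairs of counts from the level below, and a gamma distribution is fitted per level. The fit here uses the method of moments (mean αβ, variance αβ²) as a simple stand-in; the estimator used in QLAD or by Dewaele et al. may differ, and the function names are hypothetical.

```python
from statistics import fmean, pvariance


def aggregate_levels(counts, n_levels):
    """Level 1 is the raw series; each next level sums consecutive pairs."""
    levels = [list(counts)]
    for _ in range(n_levels - 1):
        prev = levels[-1]
        # a trailing unpaired value, if any, is dropped
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev) - 1, 2)])
    return levels


def fit_gamma_moments(values):
    """Method-of-moments estimate (alpha, beta) for a shape/scale gamma law."""
    m = fmean(values)
    v = pvariance(values, mu=m)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive mean and a positive variance")
    return m * m / v, v / m  # from mean = alpha*beta and var = alpha*beta**2


levels = aggregate_levels([2, 1, 3, 0, 2, 1, 3, 1, 0, 0, 2, 0], n_levels=3)
params = [fit_gamma_moments(level) for level in levels]
for j, (level, (a, b)) in enumerate(zip(levels, params), start=1):
    print(f"level {j}: {level} alpha={a:.3f} beta={b:.3f}")
```

With the slide's counts the higher levels come out as 3 3 3 4 0 2 and 6 7 2, matching the figure.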

  16. QLAD-flow 
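 Algorithm
 [Figure: one panel per level (Level 1, Level 2, Level 3) plotting the (α, β) estimates of the sketches; the distance of each sketch to the average point identifies the anomalous sketch]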
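A minimal version of this comparison: each sketch cell has one (α, β) pair per level, the average over all cells serves as the reference, and cells far from it are flagged. The squared z-score distance and the threshold below are assumptions for illustration; the referenced work may use a different distance (for instance a full Mahalanobis distance) and a different threshold.

```python
from statistics import fmean, pstdev


def sketch_distances(params):
    """params[k][j] = (alpha, beta) of sketch cell k at aggregation level j.

    Returns, per cell, the sum over levels of the squared z-scores of alpha
    and beta with respect to the average over all cells.
    """
    n_cells, n_levels = len(params), len(params[0])
    dists = [0.0] * n_cells
    for j in range(n_levels):
        for p in (0, 1):  # 0 = alpha, 1 = beta
            values = [params[k][j][p] for k in range(n_cells)]
            mu, sd = fmean(values), pstdev(values)
            if sd == 0:
                continue  # all cells agree, so this parameter carries no signal
            for k, v in enumerate(values):
                dists[k] += ((v - mu) / sd) ** 2
    return dists


def anomalous_cells(params, threshold):
    """Cells whose distance to the average exceeds a (hypothetical) threshold."""
    return {k for k, d in enumerate(sketch_distances(params)) if d > threshold}


# Five cells, one level; cell 4 has a clearly different gamma shape.
params = [[(1.0, 1.0)], [(1.1, 0.9)], [(0.9, 1.1)], [(1.0, 1.0)], [(5.0, 0.2)]]
print(anomalous_cells(params, threshold=4.0))
```

Summing over levels lets an anomaly that only shows at coarse time scales still raise the distance of its cell.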

  17. QLAD-flow 
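 Algorithm
 [Diagram: the traffic is hashed with several hash functions h1, h2, h3; the anomalous results are intersected (∩)]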
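Because a sketch cell mixes many keys, one hash function alone cannot tell which key is anomalous. Repeating the analysis with several independent hash functions and intersecting the results keeps only the keys that land in a flagged cell every time. The seeded SHA-256 hashes and the helper names are assumptions for this sketch.

```python
import hashlib


def cell_of(key: str, seed: int, n_cells: int) -> int:
    """Hypothetical hash h_seed used to assign a key to a sketch cell."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_cells


def anomalous_keys(keys, flagged_cells, n_cells):
    """flagged_cells[i] holds the cells flagged when hashing with h_i.

    A key survives only if it lands in a flagged cell for every hash function.
    """
    result = set(keys)
    for seed, flagged in enumerate(flagged_cells):
        result = {k for k in result if cell_of(k, seed, n_cells) in flagged}
    return result


keys = [f"192.0.2.{i}" for i in range(1, 101)]
suspect = "192.0.2.42"
# Pretend the detector flagged exactly the cells that contain the suspect.
flagged = [{cell_of(suspect, seed, 64)} for seed in range(3)]
found = anomalous_keys(keys, flagged, 64)
print(found)
```

With three hash functions over 64 cells, an unrelated key survives only if it collides with the suspect under all three, which for independent hashes happens with probability about 1 in 64³ per key.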

  18. QLAD-flow 
 Algorithm
 • Designed to analyse all TCP/IP traffic. [1]
   - Works with TCP/IP connection identifiers (src/dst port/address).
 • CZ.NIC extended it to handle the specifics of DNS traffic. [2]
 • Hash keys:
   - IP address: uses the source IP address; helps find suspicious traffic sources.
   - Query name: the first domain name of the query is extracted; helps find suspicious traffic from legitimate sources.
   - ASN: uses the network identifier; helps find traffic from suspicious networks.
 [1] Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
 [2] Mikle, O., Slany, K., Vesely, J., Janousek, T., & Survy, O. (2011). Detecting Hidden Anomalies in DNS Communication.
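The three hash keys could be derived from a query record roughly as follows. Two points are assumptions: "first domain name" is read here as the name directly below the TLD (www.example.be → example.be), and the ASN comes from a small hypothetical prefix table built from documentation ranges; a real deployment would need an actual IP-to-ASN database.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical prefix -> ASN table (RFC 5737 prefixes, RFC 5398 documentation ASNs).
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


@dataclass
class Query:
    src_ip: str
    qname: str


def qname_key(qname: str) -> str:
    """Assumed reading of 'first domain name': the label directly under the TLD."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def asn_key(src_ip: str) -> Optional[int]:
    """Look up the source address in the hypothetical prefix table."""
    addr = ipaddress.ip_address(src_ip)
    for network, asn in ASN_TABLE.items():
        if addr in network:
            return asn
    return None  # unknown network


def hash_keys(q: Query) -> dict:
    """One key per QLAD-flow variant: source IP, query name, ASN."""
    return {"ip": q.src_ip, "qname": qname_key(q.qname), "asn": asn_key(q.src_ip)}


keys = hash_keys(Query("192.0.2.7", "www.Example.be."))
print(keys)
```

Each key would then be fed to the same sketch analysis, so one algorithm yields per-source, per-domain and per-network views of the traffic.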

  19. QLAD-flow 
 Shortcomings
 Some attacks span a lot of flows, e.g. a DoS with spoofed IP addresses: the attack traffic is spread over many source addresses, so no single key stands out. QLAD-flow is unable to detect these.

  20. QLAD-global 
 Algorithm
 Observation: each traffic anomaly causes changes in the distribution of one or more traffic features.
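A common way to turn such a distribution into one number per time window is its entropy, which the QLAD-global pipeline computes per feature. A minimal Shannon entropy over per-value counts (such as the per-client counts DSC produces) is shown below; the normalised variant, dividing by log₂ of the number of observed values, is an illustrative addition, and the function names are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Entropy in bits of a distribution given as per-value counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def normalized_entropy(counts):
    """Entropy divided by its maximum log2(n), n = number of observed values."""
    n = sum(1 for c in counts if c > 0)
    return shannon_entropy(counts) / math.log2(n) if n > 1 else 0.0


spread = [40, 35, 30, 25]  # queries spread over a few clients
flood = [900, 35, 30, 25]  # one client dominates the window
print(shannon_entropy(spread), shannon_entropy(flood))
```

A flood from a single client lowers the client-address entropy, whereas traffic from many spoofed sources raises it; both show up as a change in the entropy series even when no single flow is large.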

  21. QLAD-global 
 Algorithm
 Input: ENTRADA -- OR -- DSC
 Features: TLD, SLD, qtype, rcode, client, ASN, country, response size
 (1) Get new entropies (per feature)
 (2) Run detector (models: EMA, Kalman, ...)
 (3) Report anomalies (timestamp, features with anomaly)
 (4) Update models
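Steps (2) to (4) can be sketched for a single feature with an exponential moving average, one of the models listed on this slide. This version tracks an exponentially weighted mean and variance and reports a value when it deviates by more than k standard deviations; the class name, smoothing factor, k and warm-up length are hypothetical, and the Kalman alternative is not shown.

```python
class EMADetector:
    """Hypothetical per-feature detector: EWMA mean/variance with a k-sigma rule."""

    def __init__(self, alpha=0.1, k=3.0, warmup=10):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean = None
        self.var = 0.0
        self.n = 0

    def step(self, x: float) -> bool:
        """Check x against the current model (steps 2-3), then update it (step 4)."""
        anomalous = False
        if self.mean is None:
            self.mean = x  # the first observation initialises the model
        else:
            if self.n >= self.warmup and abs(x - self.mean) > self.k * self.var ** 0.5:
                anomalous = True
            # Incremental exponentially weighted mean and variance; the model is
            # updated even on anomalies (skipping them is a common alternative).
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.n += 1
        return anomalous


# Entropy of one feature per time window: stable, then a sudden jump.
series = [5.0, 5.1] * 10 + [8.0]
detector = EMADetector()
flags = [detector.step(h) for h in series]
print(flags.index(True))
```

Running one such detector per feature means a report can list exactly which features were anomalous at a given timestamp, as step (3) requires.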

  22. QLAD-UI 
 Rationale
 • Automatic classification is challenging
   - wide range of anomalies
   - subtle differences
 • Rely on the user
 • Visualise anomalies with the relevant traffic

  23. QLAD-UI 
 Implementation
 • Database: HDFS (staging, warehouse) and MongoDB
 • Data API: Node.js API (Thrift API towards HDFS, Mongoose towards MongoDB)
 • User interface: React + Flux + Grommet + D3.js

  24. Outline: Goal and Context, The QLAD System, Results (current section), Conclusion

  25. Data 
 Description of the evaluation dataset
 • Sunday 12 to Monday 13 February 2017
 • 1 server
 • 42 GB
 • 58,345,819 queries

  26. Results 
 Detected anomalies
 Columns: QLAD-global | QLAD-flow (source IP) | QLAD-flow (query name) | Total (unique)
 Caching resolver 12 2 12 1 2 3 Benign anomaly Email marketing 8 2 8 Spam sender 3 3 Domain enumeration 5 2 5 Reflection attack 1 1 1 Broken resolver or script 1 1 DoS attack 3 2 1 3 Unknown 1 1 1 False Positive 11 35 15 9 36 TOTAL

  27. Outline: Goal and Context, The QLAD System, Results, Conclusion (current section)

  28. Conclusion 
 Achieved results
 QLAD (ENTRADA / DSC, QLAD-flow, QLAD-global, QLAD-UI) is a winning combination!

  29. Conclusion 
 Future (ongoing) work
 • Anomaly ≠ attack / abuse => filtering needed
   - Can this be automated?
 • Additional / alternative algorithms
   - rule based
   - clustering
 • Student job

  30. Thanks! Any questions?

  31. Appendix 
 Gamma Distribution
 The shape parameter $\alpha$ controls the evolution of $\Gamma_{\alpha,\beta}$ from a highly asymmetric, stretched-exponential shape ($\alpha \to 0$) to a Gaussian shape ($\alpha \to +\infty$). Hence $1/\alpha$ can be read as a measure of the departure of $\Gamma_{\alpha,\beta}$ from the normal distribution $\mathcal{N}(\alpha\beta, \alpha\beta^2)$.
 The scale parameter $\beta$ mostly acts as a multiplicative factor: if $X \sim \Gamma_{\alpha,\beta}$, then $\gamma X \sim \Gamma_{\alpha,\gamma\beta}$.
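For reference, the shape/scale gamma law assumed here (it matches the mean αβ and variance αβ² of the normal approximation above) has the density and moments below. The last line gives the method-of-moments estimators that follow directly from those moments; they are one possible way to obtain (α, β) per aggregation level, not necessarily the estimator QLAD uses.

```latex
\begin{aligned}
\Gamma_{\alpha,\beta}(x) &= \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad x > 0,\\
\mathbb{E}[X] &= \alpha\beta, \qquad \operatorname{Var}[X] = \alpha\beta^{2}, \qquad \operatorname{skew}[X] = \frac{2}{\sqrt{\alpha}},\\
\hat{\alpha} &= \frac{\bar{x}^{2}}{s^{2}}, \qquad \hat{\beta} = \frac{s^{2}}{\bar{x}}.
\end{aligned}
```

The squared skewness equals 4/α, which makes the reading of 1/α as a departure-from-Gaussianity measure concrete; scaling X by γ multiplies the mean by γ and the variance by γ², consistent with Γ_{α,γβ}.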
