Query Log Analysis: Detecting Anomalies in DNS Traffic at a TLD Resolver
Pieter Robberechts
Thesis Defence, Jun 30, 2017
Promoter: Prof. Hendrik Blockeel
Co-promoter: Ronald Geens
Outline
• Goal and Context
• The QLAD System
• Results
• Conclusion
DNS Belgium
The .be ccTLD resolver
Domain name registry for .be/.vlaanderen/.brussels
- Manage registration of domains
- Provide infrastructure to answer queries
Key figures: 1.5 million domains, 4 nameservers, 350 million queries / day
Source: Highlights uit 2016. DNS Belgium. URL: https://www.dnsbelgium.be/sites/default/files/generated/files/documents/cijfers%20deel%201%20-%20980px_v04_NL.pdf
DNS Belgium Current Situation pcap files - stored for 10 days - analysed post-mortem
DNS Belgium Current Situation
We believe that proactive and real-time analysis of this data could contribute to the resilience and security of DNS Belgium’s service.
"Design and build a working query log analysis platform using available components and custom development, able to predict, detect and report on common attack and abuse patterns in an open architecture, allowing for future growth and improvement."
Anomaly Detection Challenges
• Huge data volume
‣ efficiency and scalability!
‣ easy to stay under the radar
• No labelled or clean training data
• Wide range of attacks, under constant evolution
• Specific nature of DNS traffic
‣ periodicity and trends
‣ few (typically two) packets per flow
Anomaly Detection Goal
Design and implement a query log analysis platform that:
• is able to detect suspicious behaviour and a wide range of attacks
• is efficient enough to scan high volume traffic
• can detect low volume anomalies
• does not need any initial knowledge about the analysed traffic
• is tuned to the unique nature of DNS traffic
• allows for future growth and improvement
The QLAD System
QLAD: Query Log Anomaly Detection
• Focus on anomalies (≠ attacks/abuses)
• Statistical techniques
• Inspiration from network anomaly detection
QLAD System Overview
DATA TRANSFORMATION: ENTRADA -- OR -- DSC
ANOMALY DETECTION: QLAD-global, QLAD-flow
PRESENTATION: QLAD-UI
Data Transformation ENTRADA vs DSC
ENTRADA: convert, archive + SQL
DSC: aggregate, archive, MongoDB API
Data Transformation ENTRADA vs DSC
ENTRADA
• Stores all traffic
• Allows a detailed analysis
• SQL interface
• Storage cost and infrastructure
DSC
• Lightweight
• No additional infrastructure
• No detailed (log level) analysis
"ClientAddr": [
  { "val": "195.238.24.111", "count": 1014 },
  { "val": "195.238.25.53", "count": 70 },
  { "val": "195.238.25.99", "count": 63 },
  { "val": "195.238.24.117", "count": 61 },
  { "val": "194.78.30.189", "count": 59 },
  { "val": "42.236.23.92", "count": 55 },
  { "val": "195.238.25.108", "count": 55 },
  { "val": "42.236.23.91", "count": 54 },
  { "val": "193.58.1.131", "count": 52 },
QLAD-flow Algorithm
[Figure: hash function h₁]
Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
QLAD-flow Algorithm
Level 1: 2 1 3 0 2 1 3 1 0 0 2 0 1 → α₁, β₁
Level 2: 3 3 3 4 0 2 → α₂, β₂
Level 3: 6 7 2 → α₃, β₃
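The multiresolution step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: adjacent counts are summed in pairs (a trailing unpaired value is dropped, which reproduces the numbers on the slide), and the Gamma parameters are estimated with the method of moments, which may differ from the estimator used by Dewaele et al. or by QLAD-flow. The function names `aggregate`, `gamma_moments` and `multiresolution_fit` are hypothetical.

```python
from statistics import mean, pvariance


def aggregate(counts):
    """Sum adjacent pairs of counts; a trailing unpaired value is dropped."""
    return [counts[i] + counts[i + 1] for i in range(0, len(counts) - 1, 2)]


def gamma_moments(counts):
    """Method-of-moments estimate (assumption, not necessarily QLAD's estimator).

    Uses mean = alpha * beta and variance = alpha * beta^2.
    Returns (alpha, beta), or None when the series is constant or all zero.
    """
    m = mean(counts)
    v = pvariance(counts)
    if m == 0 or v == 0:
        return None
    return m * m / v, v / m


def multiresolution_fit(counts, levels=3):
    """Return [(level, series, (alpha, beta)), ...] for each aggregation level."""
    fits = []
    series = list(counts)
    for level in range(1, levels + 1):
        fits.append((level, series, gamma_moments(series)))
        series = aggregate(series)
    return fits


if __name__ == "__main__":
    level1 = [2, 1, 3, 0, 2, 1, 3, 1, 0, 0, 2, 0, 1]  # Level 1 counts from the slide
    for level, series, params in multiresolution_fit(level1):
        print(level, series, params)
```

Each sketch gets one (α, β) pair per level, so a sketch is summarised by a short vector of parameters rather than by its full time series.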
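QLAD-flow Algorithm
[Figure: three panels (Level 1, Level 2, Level 3), each plotting the α and β estimates of every sketch. "+" marks the average (Avg); the anomalous sketch lies at a large Distance from it.]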
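A minimal sketch of the comparison shown in this figure, assuming a simple standardised squared distance to the per-level average, averaged over the levels. The exact distance and threshold used by QLAD-flow are not given on the slide, so `average_distances`, `anomalous_sketches` and the `threshold` parameter are illustrative assumptions.

```python
from statistics import mean, pstdev


def average_distances(params):
    """params[s][j] is the (alpha, beta) estimate of sketch s at level j.

    Returns, per sketch, the squared standardised distance to the average
    (alpha, beta) of all sketches, averaged over the levels.
    """
    n_levels = len(params[0])
    totals = [0.0] * len(params)
    for j in range(n_levels):
        alphas = [sketch[j][0] for sketch in params]
        betas = [sketch[j][1] for sketch in params]
        mean_a, mean_b = mean(alphas), mean(betas)
        # Guard against a zero spread when all sketches agree exactly.
        std_a = pstdev(alphas) or 1.0
        std_b = pstdev(betas) or 1.0
        for s, (a, b) in enumerate(zip(alphas, betas)):
            totals[s] += ((a - mean_a) / std_a) ** 2 + ((b - mean_b) / std_b) ** 2
    return [t / n_levels for t in totals]


def anomalous_sketches(params, threshold):
    """Indices of sketches whose average distance exceeds the (assumed) threshold."""
    return [s for s, d in enumerate(average_distances(params)) if d > threshold]


if __name__ == "__main__":
    # Seven similar sketches and one that deviates at every level (made-up values).
    params = [[(2.0, 1.0)] * 3 for _ in range(7)] + [[(0.2, 10.0)] * 3]
    print(anomalous_sketches(params, threshold=6.0))
```

Because the average here includes the outlier itself, a robust estimate of the reference (for example a median) may be preferable in practice.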
QLAD-flow Algorithm
[Figure: the anomalous sketches found under the hash functions h₁, h₂ and h₃ are intersected (∩).]
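The intersection step can be illustrated as follows. The sketch assumes that each hash function is a seeded SHA-256 hash reduced modulo the number of sketches, and that a key is reported only if it falls into an anomalous sketch under every hash function. Both `sketch_index` and `suspicious_keys` are hypothetical helpers, not QLAD-flow's actual interface.

```python
import hashlib


def sketch_index(key, seed, n_sketches):
    """Map a key to a sketch; different seeds play the role of h1, h2, h3."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_sketches


def suspicious_keys(keys, anomalous_by_seed, n_sketches):
    """Keys that land in an anomalous sketch under every hash function.

    anomalous_by_seed maps each seed to the set of sketch indices that were
    flagged as anomalous for that hash function.
    """
    return {
        key
        for key in keys
        if all(
            sketch_index(key, seed, n_sketches) in flagged
            for seed, flagged in anomalous_by_seed.items()
        )
    }


if __name__ == "__main__":
    keys = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"]
    seeds = (1, 2, 3)
    # Pretend the sketches holding 203.0.113.9 were flagged for every seed.
    flagged = {s: {sketch_index("203.0.113.9", s, 16)} for s in seeds}
    print(suspicious_keys(keys, flagged, 16))
```

Innocent keys that share a sketch with the anomalous key under one hash function are unlikely to share it under all of them, which is why the intersection narrows the result down.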
QLAD-flow Algorithm
• Designed to analyse the whole TCP/IP traffic. [1]
- Works with TCP/IP connection identifiers (src/dst port/address).
• CZ.NIC extended it to meet DNS traffic specifics. [2]
• Hash keys:
- IP address
  - Uses the source IP address.
  - Helps find suspicious traffic sources.
- Query name
  - First domain name of the query is extracted.
  - Helps find suspicious traffic from legitimate sources.
- ASN
  - Uses network identifier.
  - Helps find traffic from suspicious networks.
[1] Dewaele, G., Fukuda, K., Borgnat, P., Abry, P., & Cho, K. (2007). Extracting Hidden Anomalies using Sketch and Non Gaussian Multiresolution Statistical Detection Procedures. Proc. ACM SIGCOMM Workshop on Large-Scale Attack Defense (LSAD’07), 1–8.
[2] Mikle, O., Slany, K., Vesely, J., Janousek, T., & Survy, O. (2011). Detecting Hidden Anomalies in DNS Communication.
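As an illustration of the three hash keys, the following sketch derives them from a single query record. Several details are assumptions made for the example: the record layout (`src`, `qname`), reading "first domain name" as the last two labels of the query name, and the prefix-to-ASN table, which uses documentation address ranges and a documentation AS number rather than real routing data.

```python
import ipaddress


def query_name_key(qname, labels=2):
    """Keep the last `labels` labels, e.g. 'www.example.be.' -> 'example.be'."""
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def asn_key(ip, prefix_table):
    """Longest-prefix match of an IP address against a {prefix: asn} table."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in prefix_table.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, asn)
    return best[1] if best else None


def hash_keys(record, prefix_table):
    """Return the three candidate hash keys for one (hypothetical) query record."""
    return {
        "ip": record["src"],
        "qname": query_name_key(record["qname"]),
        "asn": asn_key(record["src"], prefix_table),
    }


if __name__ == "__main__":
    table = {"192.0.2.0/24": 64496}  # documentation prefix and AS number
    record = {"src": "192.0.2.10", "qname": "www.Example.be."}
    print(hash_keys(record, table))
```

Running the detector once per key type gives three views of the same traffic: by source, by queried domain and by network.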
QLAD-flow Shortcomings
Some attacks span many flows, e.g. a DoS attack with spoofed IP addresses.
QLAD-flow is unable to detect these.
QLAD-global Algorithm
Observation: each traffic anomaly causes changes in the distribution of one or more traffic features
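One common way to summarise the distribution of a feature in a single number is its Shannon entropy. The sketch below computes it from per-value counts; the function name `shannon_entropy` and the example counts are made up for illustration.

```python
from math import log2


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical query counts per client address in two time windows.
    normal_window = [50, 48, 52, 47, 51]
    flood_window = [5000, 48, 52, 47, 51]  # one client dominates
    print(shannon_entropy(normal_window), shannon_entropy(flood_window))
```

When a single value starts to dominate, the entropy drops; when traffic spreads over many new values (for example spoofed sources), it rises. Either change shows up as a shift in the entropy time series.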
QLAD-global Algorithm
Input: ENTRADA -- OR -- DSC
1. GET NEW ENTROPIES for the features TLD, SLD, qtype, rcode, client, ASN, country, response size
2. RUN DETECTOR
- EMA
- Kalman
- ...
3. REPORT ANOMALIES
- timestamp
- features with anomaly
4. UPDATE MODELS
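The detect, report and update loop can be sketched with an exponential moving average (EMA) model per feature. This is a simplified illustration, not QLAD-global's implementation: the class name `EmaDetector`, the smoothing factor `alpha`, the threshold `k`, the warm-up length and the choice to update the model even on anomalous points are all assumptions.

```python
from math import sqrt


class EmaDetector:
    """Flags a value that deviates more than k standard deviations from an EMA."""

    def __init__(self, alpha=0.2, k=3.0, warmup=5):
        self.alpha = alpha    # smoothing factor (assumed value)
        self.k = k            # threshold in standard deviations (assumed value)
        self.warmup = warmup  # observations to see before flagging anything
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def step(self, x):
        """Run the detector on one new entropy value, then update the model."""
        if self.mean is None:
            self.mean = x
            self.seen = 1
            return False
        deviation = x - self.mean
        anomalous = (
            self.seen >= self.warmup
            and abs(deviation) > self.k * sqrt(self.var)
        )
        # Update the exponentially weighted mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.seen += 1
        return anomalous


if __name__ == "__main__":
    # Hypothetical entropy series for one feature (e.g. client), one value per window.
    series = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.1, 1.0]
    detector = EmaDetector()
    for t, value in enumerate(series):
        if detector.step(value):
            print(f"anomaly at window {t}: feature entropy {value}")
```

In a full pipeline there would be one such model per feature, and the report for a window would list the timestamp together with every feature whose model raised a flag.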
QLAD-UI Rationale
• Automatic classification is challenging
- wide range of anomalies
- subtle differences
• Rely on user
• Visualise anomalies with relevant traffic
QLAD-UI Implementation
DATABASE: HDFS (staging, warehouse), MongoDB
DATA API: Thrift API, Mongoose, Node.js API
USER INTERFACE: React + Flux + Grommet + D3.js
Results
Data
Description of the evaluation dataset
Sunday 12 to Monday 13 February 2017
1 server
42 GB
58,345,819 queries
Results
Detected anomalies
Columns: QLAD-flow (source IP), QLAD-flow (query name), QLAD-global, Total (unique)
Values are listed per row in column order; blank cells are omitted.
Caching resolver: 12, 2, 12
Benign anomaly: 1, 2, 3
Email marketing: 8, 2, 8
Spam sender: 3, 3
Domain enumeration: 5, 2, 5
Reflection attack: 1, 1, 1
Broken resolver or script: 1, 1
DoS attack: 3, 2, 1, 3
Unknown: 1, 1, 1
False Positive: 11
TOTAL: 35, 15, 9, 36
Conclusion
Conclusion Achieved results QLAD - ENTRADA / DSC - QLAD-flow - QLAD-global - QLAD-UI is a winning combination!
Conclusion Future (ongoing) work • Anomaly ≠ attack / abuse => filtering needed Can this be automated? • Additional / alternative algorithms - rule based - clustering • Student job
Thanks! Any questions?
Appendix
Gamma Distribution
The shape parameter $\alpha$ controls the evolution of $\Gamma_{\alpha,\beta}$ from a highly asymmetric stretched exponential shape ($\alpha \to 0$) to a Gaussian shape ($\alpha \to +\infty$).
--> $1/\alpha$ can be read as a measure of the departure of $\Gamma_{\alpha,\beta}$ from the normal distribution $\mathcal{N}(\alpha\beta, \alpha\beta^{2})$.
The scale parameter $\beta$ mostly acts as a multiplicative factor (if $X$ is $\Gamma_{\alpha,\beta}$, then $\gamma X$ is simply $\Gamma_{\alpha,\gamma\beta}$).
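For reference, these statements follow from the standard shape/scale form of the Gamma density, written here with the slide's symbols α (shape) and β (scale). The skewness line is a textbook property added to make the "departure from normal" reading concrete; in the first line, Γ(α) denotes the gamma function, not the distribution.

```latex
\begin{align*}
f_{\alpha,\beta}(x) &= \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > 0, \\
\mathbb{E}[X] &= \alpha\beta, \qquad \operatorname{Var}[X] = \alpha\beta^{2}, \\
\operatorname{skew}[X] &= \frac{2}{\sqrt{\alpha}} \;\longrightarrow\; 0 \quad (\alpha \to +\infty), \\
X \sim \Gamma_{\alpha,\beta} &\;\Longrightarrow\; \gamma X \sim \Gamma_{\alpha,\gamma\beta} \qquad (\gamma > 0).
\end{align*}
```

The mean and variance match the parameters of the limiting normal distribution, and the vanishing skewness shows why a large α (small 1/α) means a nearly Gaussian shape.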