Fighting Malware With GPUs In Real Time
Peter Kovac
kovac@avast.com
www.avast.com
Introduction
• Sensor network
  – A few hundred million user machines
• Hundreds of thousands of new files every day
  – Most of them clean files
  – Thousands identified as malware
Legacy system setup
- Sample collected by sensor network
- Sent to submit servers
- Analyzed internally
- New detection created and sent to clients
[Diagram: Sensor network, Submit servers, Scavenger, Malware Analysts, Automatic file-based detections]
Challenges
• Huge amounts of data
• Expensive pre-processing
• High latency in data processing
[Diagram: Sensor network, Submit servers, Scavenger, Malware Analysts]
Medusa
• GPU-accelerated database
  – Fixed size binary records (file fingerprint/metadata)
• Nearest neighbor queries
• Rule matching queries
• Classification of unknown records
Database record
• One record = constant vector of over 100 attributes
  • the “file fingerprint”
• Each attribute has a data type and semantic

Attribute             Data Type      Semantic
sha256                32 byte array  CHECKSUM
pe_sect_cnt           uint16_t       VALUE
pe_sect_rawoff_entry  uint32_t       OFFSET

• The complete contents of the vector are kept secret
  • static and dynamic features of PE executables
Data mapping
DISK / RAM (layout: R1:A1 R1:A2 R1:A3)
• One block of data
• Records in random order
• index: key -> position on disk
GPU (layout: R1:A1 R2:A1 R3:A1)
• Block of data per attribute
• Records in same order as
• Columnar database
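The two layouts on this slide can be illustrated with a small sketch. It is not Medusa's storage code: the three attribute names are borrowed from other slides, and the helper `to_columnar` and the record keys are hypothetical. It turns row-wise records into one block per attribute, with every block sharing the same record order, and keeps a key -> position index.

```python
# Hypothetical three-attribute record; the real fingerprint has over 100 attributes.
ATTRIBUTES = ["pe_sect_cnt", "pe_sect_rawoff_entry", "file_length"]


def to_columnar(records):
    """Convert row-wise records {key: (a1, a2, a3)} into one list per attribute."""
    keys = list(records)  # fixes a single record order shared by all columns
    columns = {
        name: [records[key][i] for key in keys]
        for i, name in enumerate(ATTRIBUTES)
    }
    index = {key: position for position, key in enumerate(keys)}
    return keys, columns, index


# Row-wise layout: all attributes of one record sit next to each other.
rows = {
    "r1": (5, 0x400, 0x1000),
    "r2": (3, 0x200, 0x800),
    "r3": (5, 0x600, 0x2000),
}

# Columnar layout: all values of one attribute sit next to each other.
keys, columns, index = to_columnar(rows)
```

With this layout a scan over one attribute reads a single contiguous block, which is the access pattern a per-attribute GPU kernel wants.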
Nearest neighbor query
• Compound distance function
• Data type and semantic determine the partial distance function

Data Type      Semantic  Partial distance function
32 byte array  CHECKSUM  RETURN_ZERO
uint16_t       VALUE     EQUAL_RET32
uint32_t       OFFSET    LOG

• Each partial distance function = one kernel function
• Over 100 kernels for every NN query
• Intermediate results kept in the “Scratchpad”
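A minimal sketch of how a compound distance could be assembled from the table above. The slide gives only the names of the partial distance functions, so their bodies here are assumptions: `EQUAL_RET32` is taken to return a fixed penalty of 32 on mismatch, and `LOG` to return the base-2 logarithm of the absolute difference. The type tag `bytes32` and the `SCHEMA` list are hypothetical.

```python
import math


def return_zero(a, b):
    # Assumed: checksums contribute nothing to similarity.
    return 0


def equal_ret32(a, b):
    # Assumed: 0 when equal, a fixed penalty of 32 otherwise.
    return 0 if a == b else 32


def log_distance(a, b):
    # Assumed: logarithm of the absolute difference (0 when equal).
    return int(math.log2(abs(a - b) + 1))


# (data type, semantic) selects the partial distance function.
PARTIAL = {
    ("bytes32", "CHECKSUM"): return_zero,
    ("uint16_t", "VALUE"): equal_ret32,
    ("uint32_t", "OFFSET"): log_distance,
}

# Hypothetical schema: (attribute name, data type, semantic).
SCHEMA = [
    ("sha256", "bytes32", "CHECKSUM"),
    ("pe_sect_cnt", "uint16_t", "VALUE"),
    ("pe_sect_rawoff_entry", "uint32_t", "OFFSET"),
]


def compound_distance(query, record):
    """Sum the partial distances over all attributes in the schema."""
    return sum(
        PARTIAL[(dtype, semantic)](query[name], record[name])
        for name, dtype, semantic in SCHEMA
    )
```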
The Scratchpad
• Large array
  – 4 bytes of storage per record
• NN query workflow:
    memset scratchpad to 0
    For each kernel:
        add partial distance to scratchpad [no synchronization required!]
    thrust::stable_sort_by_key on scratchpad and an index table
• Scratchpad data can be reused
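The workflow above can be mimicked on the CPU as a rough sketch. The inner loop stands in for a GPU kernel in which each thread would handle one record; since every record owns its own scratchpad slot, that is presumably why no synchronization is needed. Python's stable `sorted` stands in for `thrust::stable_sort_by_key`. The function `nn_query` and the toy kernels are hypothetical.

```python
def nn_query(columns, kernels, query, k):
    """Return the k closest records as (record position, distance) pairs."""
    n = len(next(iter(columns.values())))
    scratchpad = [0] * n                    # "memset scratchpad to 0"
    for name, kernel in kernels.items():    # one pass ("kernel") per attribute
        column = columns[name]
        for i in range(n):                  # on a GPU: one thread per record
            scratchpad[i] += kernel(query[name], column[i])
    # Stable sort of the index table keyed by the accumulated distances.
    index_table = sorted(range(n), key=scratchpad.__getitem__)
    return [(i, scratchpad[i]) for i in index_table[:k]]


# Toy data: two attributes, three records.
columns = {"a": [1, 5, 3], "b": [10, 10, 20]}
kernels = {
    "a": lambda q, v: abs(q - v),
    "b": lambda q, v: 0 if q == v else 32,
}
nearest = nn_query(columns, kernels, {"a": 3, "b": 10}, k=2)
```

Because the sort is stable, records with equal distance keep their original relative order.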
Rule matching query
• Rule: “pe_sect_cnt == 5 AND file_length >= 0x1000”
• Data type and operator determine matching function

Data Type  Operator  Matching function
uint64_t   ==        EQUAL<T>, T = uint64_t
uint16_t   >=        GREATER<T>, T = uint16_t
uint32_t   RANGE     RANGE<T>, T = uint32_t

• Scratchpad used for intermediate results
  • kernels add 1 for each failure on a rule
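As a sketch of the failure-counting idea, the example rule from this slide can be evaluated over columnar data as follows. Each condition makes one pass over its column and adds 1 to the scratchpad for every record that fails it; records whose counter stays at 0 match the whole rule. The `MATCHERS` table and `match_rule` are hypothetical, and treating RANGE as an inclusive interval is an assumption.

```python
import operator

# Operator -> matching function; RANGE is assumed to be inclusive on both ends.
MATCHERS = {
    "==": operator.eq,
    ">=": operator.ge,
    "RANGE": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def match_rule(columns, rule):
    """Return positions of records that satisfy every condition of the rule."""
    n = len(next(iter(columns.values())))
    scratchpad = [0] * n
    for attribute, op, operand in rule:      # one pass ("kernel") per condition
        matches = MATCHERS[op]
        for i, value in enumerate(columns[attribute]):
            if not matches(value, operand):
                scratchpad[i] += 1           # count one failure for record i
    return [i for i, failures in enumerate(scratchpad) if failures == 0]


# The rule from the slide: pe_sect_cnt == 5 AND file_length >= 0x1000
rule = [("pe_sect_cnt", "==", 5), ("file_length", ">=", 0x1000)]
columns = {
    "pe_sect_cnt": [5, 3, 5],
    "file_length": [0x1000, 0x2000, 0x800],
}
matching = match_rule(columns, rule)
```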
Medusa as a classifier
• Medusa holds 3 tables with different types of records
  • Clean (30+ million records)
  • Malware (2-3 million recent threats)
  • Undecided (1-2 million recent samples)
• NN query on all tables = cluster of most similar records
• Instance based learning
  • Variation on a kNN classifier
  • Safeguards against special cases (min/max distance)
• What is the undecided table used for?
  • Rule generator!
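One way to picture the kNN variation is the sketch below. It is an illustration under stated assumptions, not Medusa's classifier: the vote is a plain majority over the k nearest records from all tables, and the only safeguard shown is a hypothetical `max_distance` cutoff that discards neighbours too far away to be meaningful. The slide does not say how the min/max distance safeguards actually work.

```python
import heapq
from collections import Counter


def classify(tables, distance, query, k=3, max_distance=50):
    """Label a query by majority vote among its k nearest records."""
    candidates = [
        (distance(query, record), label)
        for label, records in tables.items()
        for record in records
    ]
    nearest = heapq.nsmallest(k, candidates)
    # Assumed safeguard: ignore neighbours farther than max_distance.
    votes = Counter(label for dist, label in nearest if dist <= max_distance)
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


# Toy one-dimensional "fingerprints" with absolute difference as distance.
tables = {
    "clean": [0, 1, 2],
    "malware": [100, 101],
    "undecided": [50],
}


def toy_distance(a, b):
    return abs(a - b)
```

For example, `classify(tables, toy_distance, 1)` votes among three clean neighbours, while a query far from every table falls through to "unknown".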
Medusa as a rule generator
• Analysis of the cluster from classifier
  • Attributes with same/similar values
• Construct a typical record representing the cluster
  • Usually about 60-80% of attributes can be used
[Figure: typical representative of the cluster]
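A simple way to build such a representative is sketched below. The slide does not define "same/similar", so this version assumes an attribute is usable when one value is shared by at least a fraction `min_share` of the cluster; both the threshold and the function name are hypothetical.

```python
from collections import Counter


def typical_representative(cluster, min_share=0.8):
    """Keep each attribute whose most common value covers min_share of the cluster."""
    representative = {}
    for attribute in cluster[0]:
        values = Counter(record[attribute] for record in cluster)
        value, count = values.most_common(1)[0]
        if count / len(cluster) >= min_share:
            representative[attribute] = value
    return representative


# Toy cluster: pe_sect_cnt is shared by 4 of 5 records, file_length by none.
cluster = [
    {"pe_sect_cnt": 5, "file_length": 0x1000},
    {"pe_sect_cnt": 5, "file_length": 0x1200},
    {"pe_sect_cnt": 5, "file_length": 0x1400},
    {"pe_sect_cnt": 5, "file_length": 0x1600},
    {"pe_sect_cnt": 4, "file_length": 0x1800},
]
representative = typical_representative(cluster)
```

Attributes that vary across the cluster, such as `file_length` here, are left out of the representative.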
Medusa as a rule generator
• Rules describing the typical representative
  • Too specific: up to 100 conditions
• Find a subset (~20) of the conditions
  • No hits on clean records
  • Maximize hits on malware/undecided records
• Billions of billions of possible combinations
  • Stochastic approach works better than greedy methods
  • Requires thousands of scans over all data
• On average this process takes 3 seconds
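The search described above can be sketched with the simplest stochastic strategy, uniform random sampling of condition subsets. This is a stand-in, since the slide does not say which stochastic method Medusa uses. Conditions are modelled as equality tests on the representative's attributes, and `hits`, `search_rule`, and all parameters are hypothetical.

```python
import random


def hits(conditions, records):
    """Count records that satisfy every (attribute, value) condition."""
    return sum(
        all(record.get(attribute) == value for attribute, value in conditions)
        for record in records
    )


def search_rule(representative, clean, targets, subset_size,
                iterations=200, seed=0):
    """Sample condition subsets; keep the best one with no hits on clean data."""
    rng = random.Random(seed)
    conditions = sorted(representative.items())
    best, best_hits = None, -1
    for _ in range(iterations):
        candidate = rng.sample(conditions, subset_size)
        if hits(candidate, clean) > 0:       # hard constraint: no clean hits
            continue
        score = hits(candidate, targets)     # objective: malware/undecided hits
        if score > best_hits:
            best, best_hits = candidate, score
    return best, best_hits


representative = {"a": 1, "b": 2, "c": 3}
clean = [{"a": 1, "b": 9, "c": 3}]
targets = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 7, "b": 2, "c": 3},
    {"a": 1, "b": 0, "c": 3},
]
best, best_hits = search_rule(representative, clean, targets, subset_size=2)
```

Every candidate costs one scan over the clean table and one over the target tables, which matches the slide's point that the search needs thousands of scans over all data.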
Evo-gen
• Final result from the rule generator
• Hundreds released daily
• See Libor Morkovsky’s poster (P5313) for more details
GPUs for heavy lifting
• CPUs: 4x Intel Xeon @ 2.40GHz
• GPUs: 4x Nvidia GTX Titan
• Dataset: 17 GB of file fingerprints

Operation            CPU     GPU    Speedup
Rule matching query  ~220ms  ~10ms  ~22x
256-NN query         1300ms  100ms  13x
Rule generation      >60s    ~3s    >20x

• Medusa on CPU:
  • Not used: slow, does not scale well
  • Scaling problems: synchronization, faulty nodes…
Improvements
• Reduced data size
• Pre-processing on client machines
• Automated generic detections
• Better supporting tools for analysts
[Diagram: Sensor network, Submit servers, Scavenger, Medusa, Malware Analysts]
Patient Zero
• First user to encounter an unknown threat
• Might be unprotected!
• Evo-gen time to release: ~minutes
A sample has to go through internal systems before we can detect it.
Going real time
• Real time classifier for low prevalence samples
[Diagram: Medusa, RT Medusa]
Real time classifier
• Hundreds of thousands of queries daily
• Steady growth since deployment in Avast 2015
• Time constrained caching to avoid load spikes
• Response time ~120ms only possible thanks to GPUs
[Diagram: Avast users, RT Medusa]
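"Time constrained caching" is read here as a cache whose entries expire after a fixed time to live, which is an assumption about the slide's meaning. The sketch shows how such a cache could keep repeated queries for the same fingerprint away from the GPU classifier during a spike. The class `TTLCache` and its injectable clock are hypothetical.

```python
import time


class TTLCache:
    """Cache results per key and recompute them once they are older than ttl."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]              # still fresh: no classifier call
        value = compute(key)             # expired or missing: query the classifier
        self._entries[key] = (now, value)
        return value
```

A caller would wrap the expensive classifier query in `get_or_compute`, so that many clients asking about the same sample within the time window trigger a single classification.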
Questions?
Contact me: kovac@avast.com