Hunting malware using its fingerprints Piotr Białczak
About me ▷ Piotr Białczak ▷ Senior Researcher, CERT Polska/NASK / Warsaw University of Technology ▷ piotr.bialczak@cert.pl ▷ Twitter: @bialczakp
Malware hunting: a cat-and-mouse game?
Hunting malware ▷ Identification based on network infrastructure is not as effective as we would wish ▷ Constant change of IPs and domains ▷ Dedicated evasion mechanisms (DGA, fast flux) ▷ Unknown threats remain a problem
What if we could identify something constant and distinguishable?
Malware fingerprinting ▷ Provide a mechanism to identify threats ▷ Pinpoint rarely changing elements ▷ Make fingerprints unique ▷ Give good generalization ▷ Be easy to share
Different notions of malware fingerprinting ▷ Identification of malware families ○ Directly providing the family name ○ More like a multiclass malware detector ▷ Generic feature extractor ○ Providing a representation of selected features ○ Can be labeled with a malware family name ▷ I will focus on the second group
Network traffic fingerprints ▷ Mainly extraction of selected protocol fields ▷ Choosing fields that are hard to change ▷ Presentation in two forms: full and concise ▷ Source: https://commons.wikimedia.org/wiki/File:Ipv4_header.svg
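To make the "full and concise" forms concrete, here is a minimal sketch that reads a few fields from a raw IPv4 header (the structure pictured on this slide). The chosen fields (TTL, Don't Fragment flag, header length, protocol) and the function name `ipv4_fingerprint` are assumptions for illustration, not the field set of any particular tool; the concise form is simply an MD5 of the full form, in the same spirit as JA3 and HASSH.

```python
import hashlib
import struct


def ipv4_fingerprint(packet: bytes) -> tuple[str, str]:
    """Return (full, concise) fingerprints for a raw IPv4 header.

    Hypothetical field choice: TTL, DF flag, IHL and protocol, which tend
    to reflect OS/network-stack defaults rather than the destination.
    """
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # Unpack the fixed 20-byte IPv4 header in network byte order.
    (ver_ihl, _tos, _total_len, _ident, flags_frag,
     ttl, proto, _csum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = ver_ihl & 0x0F           # header length in 32-bit words
    df = (flags_frag >> 14) & 0x1  # Don't Fragment bit (bit 14 of the field)
    # Full form: human-readable list of the chosen field values.
    full = f"ttl={ttl}|df={df}|ihl={ihl}|proto={proto}"
    # Concise form: fixed-length hash of the full form, easy to share.
    concise = hashlib.md5(full.encode()).hexdigest()
    return full, concise
```

The observed TTL shrinks with every hop, so a real fingerprinter such as p0f estimates the initial TTL rather than using the raw value as this sketch does.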
Examples of usage scenarios ▷ Grouping activities ▷ Identification of malware families ▷ Identification of different operations in a single family ▷ Correlation of similar behavior between families
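Several of these scenarios reduce to grouping observations by fingerprint. The sketch below assumes a hypothetical list of `(sample_id, family, fingerprint)` records; fingerprints shared by more than one family point either to correlated behavior or to collisions, which is why the two cannot be told apart from the fingerprint alone.

```python
from collections import defaultdict


def group_by_fingerprint(observations):
    """Group hypothetical (sample_id, family, fingerprint) records by fingerprint."""
    groups = defaultdict(list)
    for sample_id, family, fingerprint in observations:
        groups[fingerprint].append((sample_id, family))
    return dict(groups)


def shared_across_families(groups):
    """Return fingerprints seen in more than one family, with those families."""
    shared = {}
    for fingerprint, members in groups.items():
        families = {family for _, family in members}
        if len(families) > 1:
            shared[fingerprint] = sorted(families)
    return shared
```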
Popular Tools
JA3 ▷ Fingerprinting TLS clients/servers ▷ Decimal values of chosen fields in Client Hello messages (Server Hello for servers, JA3S), then an MD5 hash ▷ Extensively supported ▷ Available: https://github.com/salesforce/ja3 ▷ Source: https://engineering.salesforce.com/open-sourcing-ja3-92c9e53c3c41
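A minimal sketch of the JA3 client string, assuming the Client Hello values have already been parsed out of the packet (parsing is omitted). It follows the field order documented in the JA3 README (SSLVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats), joins values with "-" and fields with ",", and drops GREASE values (RFC 8701) before hashing. The function name `ja3` is our own.

```python
import hashlib

# GREASE values that JA3 ignores: 0x0a0a, 0x1a1a, ..., 0xfafa.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string and its MD5 hash from parsed Client Hello values."""
    def join(values):
        # Decimal values joined by '-', with GREASE values removed.
        return "-".join(str(v) for v in values if v not in GREASE)

    fields = [str(version), join(ciphers), join(extensions),
              join(curves), join(point_formats)]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()
```

For example, `ja3(769, [47, 53, 5], [0, 10, 11], [23, 24, 25], [0])` yields the string `769,47-53-5,0-10-11,23-24-25,0` plus its MD5.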
HASSH ▷ Fingerprinting SSH clients/servers ▷ Algorithm name lists from chosen fields in SSH_MSG_KEXINIT messages (then MD5 hash) ▷ Available here: https://github.com/salesforce/hassh
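A minimal sketch of the HASSH client string, assuming the SSH_MSG_KEXINIT name-lists have already been parsed: key exchange, encryption, MAC and compression algorithms (client-to-server lists for a client fingerprint), each comma-separated in the order sent, with the four lists joined by ";" and then hashed with MD5. The function name `hassh` is our own.

```python
import hashlib


def hassh(kex, enc, mac, comp):
    """HASSH-style fingerprint from parsed SSH_MSG_KEXINIT name-lists.

    Each argument is a list of algorithm names in the order the peer sent them.
    """
    hassh_string = ";".join(",".join(names) for names in (kex, enc, mac, comp))
    return hassh_string, hashlib.md5(hassh_string.encode()).hexdigest()
```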
FATT ▷ Fingerprint All The Things ▷ Supports JA3 and HASSH ▷ Also RDFP, gQUIC and HTTP header fingerprints ▷ Works on live network traffic and pcap files via pyshark ▷ Available here: https://github.com/0x4D31/fatt/
Other great tools ▷ TCP/IP, HTTP: https://lcamtuf.coredump.cx/p0f3/ ▷ TLS, DTLS, SSH, HTTP, TCP: https://github.com/cisco/mercury ▷ Identification framework: https://github.com/rapid7/recog
SMTP fingerprinting ▷ A more complex approach ▷ Fingerprinting the SMTP implementation (dialect) ○ Exchanged messages (including their case) ○ SMTP extensions ○ IMF fields ▷ Detection of spambots ▷ Stringhini et al., B@bel: Leveraging Email Delivery for Spam Mitigation ▷ See also SISSDEN.eu
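To show why message case and formatting carry dialect information, here is a deliberately simplified, hypothetical sketch: it keeps each client command exactly as sent, replaces variable parts (addresses, the HELO/EHLO hostname) with placeholders, and hashes the resulting sequence. It is not the B@bel method, and the function names are our own.

```python
import hashlib
import re

# Anything inside angle brackets (e.g. <user@example.com>) is treated as variable.
_ADDR = re.compile(r"<[^>]*>")


def command_template(line: str) -> str:
    """Normalize one client command while preserving case, spacing and punctuation."""
    line = _ADDR.sub("<ADDR>", line.rstrip("\r\n"))
    verb, sep, arg = line.partition(" ")
    if verb.upper() in ("HELO", "EHLO") and arg:
        # The hostname varies per bot instance; keep only its presence.
        arg = "HOST"
    return verb + sep + arg


def smtp_dialect_fingerprint(client_lines):
    """Return (full, concise) fingerprints of a client's command sequence."""
    full = "|".join(command_template(line) for line in client_lines)
    return full, hashlib.md5(full.encode()).hexdigest()
```

Two bots sending `MAIL FROM:<x>` and `mail from: <x>` then end up with different fingerprints, even though both commands are valid SMTP.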
SMTP dialects ▷ Unfortunately, there is no open-source tool ▷ Source: https://sissden.eu/blog/analysis-of-smtp-dialects
DEMO
hfinger
Problems with current HTTP fingerprinting ▷ Collisions between families ▷ Collisions with benign software ▷ Limited analysis of URL and payload
hfinger features ▷ URL ○ URL length ○ Number of directories, variables ○ File extension ○ URL parts lengths ▷ Header structure ○ Request method ○ Protocol version ○ Header order ○ Popular headers' values ▷ Payload ○ Length ○ Entropy ○ Presence of non-ASCII characters
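A rough sketch of how the URL and payload features listed above could be computed. The exact definitions here (counting non-empty path segments as directories, Shannon entropy over payload bytes rounded to two decimals, extension taken from the last segment) are our assumptions, not hfinger's actual implementation.

```python
import math
import posixpath
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_features(url: str) -> dict:
    """Hypothetical URL features for a request target such as '/a/b?x=1'."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    variables = parse_qsl(parts.query, keep_blank_values=True)
    last = segments[-1] if segments else ""
    return {
        "url_length": len(url),
        "directories": len(segments),
        "variables": len(variables),
        "extension": posixpath.splitext(last)[1].lstrip("."),
        "segment_lengths": [len(s) for s in segments],
    }


def payload_features(body: bytes) -> dict:
    """Hypothetical payload features: length, byte entropy, non-ASCII presence."""
    n = len(body)
    counts = Counter(body)
    # Shannon entropy in bits per byte; 0.0 for an empty payload.
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "length": n,
        "entropy": round(entropy, 2),
        "non_ascii": any(b > 0x7F for b in body),
    }
```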
Fingerprint creation
Example request:
POST /dir1/dir2?var1=val1 HTTP/1.1
Host: 127.0.0.1:8000
Accept: */*
User-Agent: My UA
Content-Length: 9
Content-Type: application/x-www-form-urlencoded

misc=test
▷ Direct representation of features, for easier interpretation
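The example request can be turned into a readable header-structure fingerprint. In this hypothetical sketch, header names are abbreviated to the first two letters of each hyphen-separated part (loosely mirroring the value encoding on the next slide) and combined with the method and protocol version; hfinger's real abbreviation scheme and field layout may differ.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into method, target, version, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return method, target, version, headers, body


def abbreviate(name: str) -> str:
    """Hypothetical: 'User-Agent' -> 'us-ag'."""
    return "-".join(part[:2].lower() for part in name.split("-"))


def header_order(headers) -> str:
    return ",".join(abbreviate(name) for name, _ in headers)


def request_fingerprint(raw: str) -> str:
    method, _target, version, headers, _body = parse_request(raw)
    proto = version.split("/", 1)[1]  # 'HTTP/1.1' -> '1.1'
    return f"{method[:2]}|{proto}|{header_order(headers)}"
```

Because the header order is kept as plain text rather than a hash, an analyst can see directly which headers differ between two fingerprints.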
Encoding ▷ Analyzed popular headers: Connection, Accept-Encoding, Content-Encoding, Cache-Control, TE, Accept-Charset, Content-Type, Accept, Accept-Language, User-Agent ▷ Encoding of popular media-type values: "application/javascript":"ap-ja", "application/json":"ap-js", "application/octet-stream":"ap-os", "application/pdf":"ap-pd", "application/x-octet-stream":"ap-x-o-s", "application/x-www-form-urlencoded":"ap-x-w-f-u", "audio/mpeg":"au-mp", "binary":"bi", "image/gif":"im-gi", "multipart/form-data":"mu-f-d", "octet/binary":"oc-bi", "octet-stream":"oc-st", "text/csv":"te-cs", "text/html":"te-ht", "text/plain":"te-pl", "text/xml":"te-xm"
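The value encoding above works naturally as a lookup table. This sketch copies the mapping from the slide; stripping parameters such as `; charset=UTF-8` and passing unknown values through unchanged are our own assumptions, not necessarily what hfinger does.

```python
# Encodings of popular media-type values, as listed on the slide.
CONTENT_TYPE_CODES = {
    "application/javascript": "ap-ja",
    "application/json": "ap-js",
    "application/octet-stream": "ap-os",
    "application/pdf": "ap-pd",
    "application/x-octet-stream": "ap-x-o-s",
    "application/x-www-form-urlencoded": "ap-x-w-f-u",
    "audio/mpeg": "au-mp",
    "binary": "bi",
    "image/gif": "im-gi",
    "multipart/form-data": "mu-f-d",
    "octet/binary": "oc-bi",
    "octet-stream": "oc-st",
    "text/csv": "te-cs",
    "text/html": "te-ht",
    "text/plain": "te-pl",
    "text/xml": "te-xm",
}


def encode_content_type(value: str) -> str:
    """Encode a Content-Type value; unknown types pass through (assumption)."""
    media_type = value.split(";", 1)[0].strip().lower()  # drop parameters
    return CONTENT_TYPE_CODES.get(media_type, media_type)
```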
Different report modes ▷ Configurable report modes, depending on the desired level of information ▷ Default ○ Tries to balance uniqueness and generalization ▷ Informative ○ More features presented ▷ All features ○ Full information, but also more distinct fingerprints
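The trade-off between modes can be illustrated by selecting different feature subsets. The feature names and the subset assigned to each mode below are hypothetical, not hfinger's real configuration; the point is that fewer features generalize better (two requests collide) while more features separate them.

```python
# Hypothetical feature subsets per report mode.
MODES = {
    "default": ["method", "version", "header_order", "content_type"],
    "informative": ["method", "version", "header_order", "content_type",
                    "url_length", "variables"],
    "all": ["method", "version", "header_order", "content_type",
            "url_length", "variables", "payload_entropy"],
}


def fingerprint(features: dict, mode: str = "default") -> str:
    """Join the selected features into a '|'-separated fingerprint."""
    return "|".join(str(features[name]) for name in MODES[mode])
```

Two requests that differ only in URL length share a fingerprint in the `default` mode but not in `informative`.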
More info ▷ https://github.com/CERT-Polska/hfinger ▷ Working prototype stage, so expect some changes ▷ Written in Python; uses TShark for pcap parsing and HTTP reassembly ▷ Comments, issues and PRs are welcome
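A hypothetical sketch of using TShark from Python to pull HTTP request fields out of a pcap. The options (`-r`, `-Y`, `-T fields`, `-e`) and field names are standard TShark/Wireshark ones, but this is not hfinger's actual invocation; running `extract_requests` requires TShark to be installed.

```python
import subprocess

# Wireshark display-filter field names to extract for each HTTP request.
FIELDS = ["http.request.method", "http.request.uri", "http.user_agent"]


def tshark_command(pcap_path: str) -> list:
    """Build a TShark command printing one tab-separated line per HTTP request."""
    cmd = ["tshark", "-r", pcap_path, "-Y", "http.request", "-T", "fields"]
    for field in FIELDS:
        cmd += ["-e", field]
    return cmd


def parse_fields(output: str) -> list:
    """Parse TShark '-T fields' output (tab-separated by default) into dicts."""
    return [dict(zip(FIELDS, line.split("\t"))) for line in output.splitlines() if line]


def extract_requests(pcap_path: str) -> list:
    result = subprocess.run(tshark_command(pcap_path),
                            capture_output=True, text=True, check=True)
    return parse_fields(result.stdout)
```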
hfinger demo
Conclusion ▷ Fingerprints are helpful in hunting malware ▷ Until malware developers start to evade fingerprinters ▷ Problem of generalization vs collisions ▷ Some popular protocols are already covered, but not all of them ▷ Give hfinger a try :-)
Presentation template: slidescarnival.com