Theory, Code and Result - DeepSec

Malware Attribution Theory, Code and Result

Who am I?
• Michael Boman, M.A.R.T. project
• Have been “playing around” with malware analysis “for a while”
• Working for FireEye
• This is a HOBBY project that I use my SPARE TIME to work on

Agenda
• Theory behind Malware Attribution
• Code to conduct Malware Attribution analysis
• Result of analysis

Theory



Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010 http://www.youtube.com/watch?v=k4Ry1trQhDk

What am I trying to do? Move this way: Binary → Human

• Binary
• Blacklists
• Net Recon / Command and Control
• Developer Fingerprints
• Tactics, Techniques, Procedures
• Social Cyberspace (DIGINT)
• Physical Surveillance (HUMINT)
• Human

Physical Surveillance (HUMINT) · Social Cyberspace (DIGINT) · Developer Fingerprints · Tactics, Techniques, Procedures · Blacklists · Net Recon / Command and Control

• Actions / Intent
• Installation / Deployment
• CNA (spreader) / CNE (search & exfil tool)
• COMS
• Defensive / Anti-forensic
• Exploit
• Shellcode
• DNS, Command and Control Protocol, Encryption

Steps
• Step 0: Gather malware
• Step 1: Extract metadata from binary
• Step 2: Store metadata and binary in MongoDB
• Step 3: Analyze collected data

Step 0: Gather malware
• VirusShare (virusshare.com)
• Malware Domain List (www.malwaredomainlist.com/mdl.php)
• OpenMalware (www.offensivecomputing.net)
• MalShare (www.malshare.com)
• CleanMX (support.clean-mx.de/clean-mx/viruses)

Step 1: Extract metadata from binary

Development Steps (diagram; stages: Source, Machine, Binary)
• Core “backbone” sourcecode
• Tweaks & Mods
• 3rd party sourcecode
• 3rd party libraries
• Compiler
• Time
• Runtime libraries
• Paths
• MAC Address
• Malware
• Packing

Step 1: Extract metadata from binary

• Hashes (for sample identification)
  • md5, sha1, sha256, sha512, ssdeep etc.
• File type / Exif / PEiD
  • Compiler / Packer etc.
• PE Headers / Imports / Exports etc.
• Virustotal results
• Tags
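The cryptographic hashes in the list above can be computed with Python's standard library alone. The following is a minimal sketch, not the project's actual code; the function name `compute_hashes` is made up for illustration, and ssdeep is left out because it needs a third-party binding.

```python
import hashlib


def compute_hashes(data: bytes) -> dict:
    """Return hex digests of a sample for the hashes named on the slide.

    ssdeep is intentionally omitted: it is not part of the standard library.
    """
    algorithms = ("md5", "sha1", "sha256", "sha512")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}
```

Calling `compute_hashes(open(path, "rb").read())` on a sample gives a dictionary that can be stored alongside the binary and used as a lookup key.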

Identifying compiler / packer
• PEiD
• Python
  • peutils.SignatureDatabase().match_all()
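A rough sketch of how the `peutils` call named above might be used. It assumes the third-party `pefile` package (which ships `peutils`) is installed and that a PEiD-style signature file is available; the file name `UserDB.TXT` and both function names are assumptions for illustration, not taken from the talk.

```python
def flatten_matches(matches):
    """Turn a match_all() result into a flat list of signature names."""
    names = []
    for match in matches or []:  # match_all() may return None when nothing matches
        if isinstance(match, (list, tuple)):
            names.extend(str(item) for item in match)
        else:
            names.append(str(match))
    return names


def identify_packers(sample_path, signature_path="UserDB.TXT"):
    """Match a PE file's entry point against a PEiD signature database."""
    import pefile   # third-party; imported here so the helper above works without it
    import peutils  # distributed together with pefile

    signatures = peutils.SignatureDatabase(signature_path)
    pe = pefile.PE(sample_path)
    return flatten_matches(signatures.match_all(pe, ep_only=True))
```

With `ep_only=True` only the bytes at the entry point are checked, which is fast but will miss signatures that sit elsewhere in the file.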

PE Header information
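One PE header field that is useful for attribution is the compile timestamp. As an illustration only (the function name `pe_compile_time` is hypothetical, and the project itself may read this value through a PE parsing library instead), the field can be read with the standard library:

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Read TimeDateStamp from the COFF file header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # After the signature: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

The timestamp is written by the linker and can be forged or zeroed, so it is a hint rather than proof.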

VirusTotal Results

Tags
• User-supplied tags to identify sample source and behavior
• analyst / analyst-system supplied

Step 2: Store metadata and binary in MongoDB

Components
• Modified VXCage server
  • Stores malware & metadata in MongoDB instead of FS / ORDBMS
  • Collects a lot more metadata than the original
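To make the storage step concrete, here is a sketch of what a per-sample metadata document could look like before it is handed to MongoDB. Every field name below is hypothetical and chosen for illustration; the schema of the modified VXCage server is not shown in the slides.

```python
import hashlib
from datetime import datetime, timezone


def build_sample_document(data: bytes, tags):
    """Build a metadata dict for one sample (hypothetical field names)."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "size": len(data),
        "tags": sorted(set(tags)),          # de-duplicated, stable order
        "created_at": datetime.now(timezone.utc),
    }

# With the third-party pymongo driver, a document like this could be stored via
# collection.insert_one(document), and the binary itself via gridfs.GridFS(db).put(data).
```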

VXCage REST API
• /malware/add
  • Add sample
• /malware/get/
  • Download sample. If no local sample, search other repos
• /malware/find
  • Search for sample by md5, sha256, ssdeep, tag, date
• /tags/list
  • List tags
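The endpoints listed above could be called from Python roughly as follows. This is a sketch under assumptions: the base URL `http://localhost:8080`, the use of POST with form-encoded fields for `/malware/find`, and the helper names are not stated in the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: address of a local VXCage server
SEARCH_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}  # keys named on the slide


def build_find_request(base_url=BASE_URL, **criteria):
    """Prepare a /malware/find request (assumed: POST, form-encoded body)."""
    unknown = set(criteria) - SEARCH_KEYS
    if unknown:
        raise ValueError(f"unsupported search keys: {sorted(unknown)}")
    body = urlencode(criteria).encode("ascii")
    return Request(f"{base_url}/malware/find", data=body, method="POST")


def build_tags_request(base_url=BASE_URL):
    """Prepare a /tags/list request."""
    return Request(f"{base_url}/tags/list", method="GET")
```

A prepared request can be sent with `urllib.request.urlopen(request)`; building and sending are kept separate so the request can be inspected first.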

Step 3: Analyze collected data

Identifying development environments
• Compiler / Linker / Libraries
• Strings
• Paths
• PE Translation header
• Compile times
• Number of times a piece of software has been built

Cataloging behaviors
• Packers
• Encryption
• Anti-debugging
• Anti-VM
• Anti-forensics

Result

Have I seen you before?

• Detects similar malware (based on SSDEEP fuzzy hashing)

Different MD5, 100% SSDeep match

SSDEEP Analysis (3007)

SSDEEP Analysis (851)

Challenges
• Party handshake problem:
  • 707k samples analyzed and counting (resulting in over 250 billion compares!)
• Need a better target (pre-)selection
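The handshake problem is plain arithmetic: n samples need n(n-1)/2 pairwise comparisons, which for roughly 707k samples is on the order of 250 billion. One possible pre-selection (a suggestion here, not something the slides claim to use) relies on the ssdeep hash format `blocksize:hash1:hash2`: two hashes are only comparable when their block sizes are equal or differ by a factor of two, so all other pairs can be skipped. The function names are made up for illustration.

```python
def pair_count(samples: int) -> int:
    """Number of unordered pairs among `samples` items: n * (n - 1) / 2."""
    return samples * (samples - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """Extract the leading block size from a 'blocksize:hash1:hash2' string."""
    return int(ssdeep_hash.split(":", 1)[0])


def comparable(hash_a: str, hash_b: str) -> bool:
    """True when two ssdeep hashes have equal or adjacent (x2) block sizes."""
    a, b = block_size(hash_a), block_size(hash_b)
    return a == b or a == 2 * b or b == 2 * a
```

Grouping samples by block size first means each sample is only compared against its own bucket and the two neighbouring ones.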

What compilers / packers are common?
1. "Borland Delphi 3.0 (???)", 54298
2. "Microsoft Visual C++ v6.0", 33364
3. "Microsoft Visual C++ 8", 28005
4. "Microsoft Visual Basic v5.0 - v6.0", 26573
5. "UPX v0.80 - v0.84", 22353

Are there any unidentified packers?
• How to identify a packer
  • PE section is empty in binary, is writable and executable
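The heuristic above can be expressed as a check on two section header fields. This sketch takes "empty in binary" to mean a raw data size of zero, which is an interpretation of the slide; the flag values are the standard PE section characteristics, and the function name is hypothetical.

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written to


def looks_like_unpack_target(raw_size: int, characteristics: int) -> bool:
    """Flag a section that has no data on disk but is writable and executable."""
    is_executable = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    is_writable = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    return raw_size == 0 and is_writable and is_executable
```

With the third-party `pefile` package, the inputs would come from `section.SizeOfRawData` and `section.Characteristics` for each entry in `pe.sections`.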

How common are antidebugging techniques?
• 31622 out of 531182 PE binaries use IsDebuggerPresent (6 %)
• Packed executables are not counted
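The percentage above follows from a simple count over the stored import lists. A minimal sketch, assuming each sample's imports are available as a list of function names (the data layout and function names are assumptions):

```python
def uses_import(imports, symbol="IsDebuggerPresent"):
    """True when the import list of one sample contains `symbol`."""
    return symbol in imports


def percentage(part: int, total: int) -> float:
    """Share of `part` in `total`, in percent."""
    return 100.0 * part / total


def share_using(samples, symbol="IsDebuggerPresent") -> float:
    """Percentage of samples (each a list of import names) importing `symbol`."""
    hits = sum(1 for imports in samples if uses_import(imports, symbol))
    return percentage(hits, len(samples))
```

For the figures on the slide, `percentage(31622, 531182)` is just under 6. A packed sample hides its real imports until it is unpacked, which is why packed executables fall outside this count.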

Analysis Coverage (diagram; stages: Source, Machine, Binary)
• Core “backbone” sourcecode
• Tweaks & Mods
• 3rd party sourcecode
• 3rd party libraries
• Compiler
• Time
• Runtime libraries
• Paths
• MAC Address
• Malware
• Packing

Future

What am I trying to do in the future

• Binary
• Blacklists
• Net Recon / Command and Control
• Developer Fingerprints
• Tactics, Techniques, Procedures
• Social Cyberspace (DIGINT)
• Physical Surveillance (HUMINT)
• Human

Expand scope of analysis +network +memory +os changes +behavior

What am I trying to do in the future
• More automation
• More modular design
• Solve the “Big Data” issue I am getting myself into (Hadoop?)
• More pretty graphs

Thank you
• Michael Boman
• @mboman
• http://blog.michaelboman.org
• Code available at https://github.com/mboman/vxcage