Benchmarking Vulnerability Detection Tools for Web Services
ICWS 2010
Nuno Antunes, Marco Vieira {nmsa, mvieira}@dei.uc.pt
CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
Outline
The problem
Benchmarking Approach
Benchmark for SQL Injection vulnerability detection tools
Benchmarking Example
Conclusions and Future Work
Web Services
Web Services are becoming a strategic component in a wide range of organizations
Web Services are extremely exposed to attacks
Any existing vulnerability will most probably be uncovered/exploited
Hackers are moving their focus to applications' code
Both providers and consumers need to assess services’ security
Common vulnerabilities in Web Services
300 Public Web Services analyzed
Vulnerability detection tools
Vulnerability Scanners
Easy and widely used way to test applications in search of vulnerabilities
Use fuzzing techniques to attack applications
Avoid the repetitive and tedious task of doing hundreds or even thousands of tests by hand
Static Code Analyzers
Analyze the code without actually executing it
The analysis varies depending on the tool's sophistication
Provide a way to highlight possible coding errors
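As a minimal illustration (not code from the benchmark's workload; the class, table, and column names are hypothetical), the kind of flaw both families of tools target is a SQL query built by string concatenation. A scanner would fuzz the customerId parameter with payloads such as ' OR '1'='1, while a static code analyzer flags the concatenation itself:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CustomerQueries {

        // Vulnerable: client input is concatenated into the SQL string, so a
        // payload like "' OR '1'='1" changes the query logic (SQL Injection).
        public ResultSet getCustomerVulnerable(Connection con, String customerId)
                throws SQLException {
            Statement stmt = con.createStatement();
            return stmt.executeQuery(
                    "SELECT * FROM customer WHERE c_id = '" + customerId + "'");
        }

        // Not vulnerable: a parameterized query keeps data separate from SQL code.
        public ResultSet getCustomerSafe(Connection con, String customerId)
                throws SQLException {
            PreparedStatement ps =
                    con.prepareStatement("SELECT * FROM customer WHERE c_id = ?");
            ps.setString(1, customerId);
            return ps.executeQuery();
        }
    }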
Using vulnerability detection tools…
Tools are often expensive
Many tools can generate conflicting results
Due to time constraints or resource limitations, developers have to:
Select a tool from the set of tools available
Rely on that tool to detect vulnerabilities
However…
Previous work shows that the effectiveness of many of these tools is low
How to select the tools to use?
How to select the tools to use?
Existing evaluations have limited value:
By the limited number of tools used
By the limited representativeness of the experiments
Developers urgently need a practical way to compare alternative tools concerning their ability to detect vulnerabilities
The solution: Benchmarking!
Benchmarking vulnerability detection tools
Benchmarks are standard approaches to evaluate and compare different systems according to specific characteristics
Evaluate and compare the existing tools
Select the most effective tools
Guide the improvement of methodologies
Just as performance benchmarks have contributed to improving the performance of systems
Benchmarking Approach
Workload:
Work that a tool must perform during the benchmark execution
Measures:
Characterize the effectiveness of the tools
Must be easy to understand
Must allow the comparison among different tools
Procedure:
The procedures and rules that must be followed during the benchmark execution
Workload
Services used to exercise the vulnerability detection tools
Domain defined by:
Class of web services (e.g., SOAP, REST)
Types of vulnerabilities (e.g., SQL Injection, XPath Injection, file execution)
Vulnerability detection approaches (e.g., penetration testing, static analysis, anomaly detection)
Different types of workload can be considered:
Real workloads
Realistic workloads
Synthetic workloads
Measures
Computed from the information collected during the benchmark run
Relative measures
Can be used for comparison or for improvement and tuning
Different tools report vulnerabilities in different ways
Precision
Recall
F-Measure
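These measures follow the standard definitions, computed from the true positives (TP), false positives (FP), and false negatives (FN) observed during the benchmark run; the F-Measure is the harmonic mean of precision and recall:

\[
\text{Precision} = \frac{TP}{TP + FP}
\qquad
\text{Recall} = \frac{TP}{TP + FN}
\qquad
\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]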
Procedure
Step 1: Preparation
Select the tools to be benchmarked
Step 2: Execution
Use the tools under benchmarking to detect vulnerabilities in the workload
Step 3: Measures calculation
Analyze the vulnerabilities reported by the tools and calculate the measures
Step 4: Ranking and selection
Rank the tools using the measures
Select the most effective tool
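A minimal sketch (Java 16+) of how Steps 2 through 4 could be automated, assuming hypothetical Tool and Result types that do not come from the presentation: each tool is run against the workload, its reports are matched against the list of known vulnerabilities, and the tools are then ranked by F-Measure:

    import java.util.Comparator;
    import java.util.List;
    import java.util.Set;

    // Hypothetical interface: a tool analyzes the workload services and reports
    // the identifiers of the vulnerabilities it claims to have found.
    interface Tool {
        String name();
        Set<String> detect(List<String> workloadServices);
    }

    final class BenchmarkRunner {

        record Result(String tool, double precision, double recall, double fMeasure) {}

        // Steps 2 and 3: run one tool and compute its measures against the
        // known (seeded) vulnerabilities of the workload.
        static Result evaluate(Tool tool, List<String> workload, Set<String> knownVulns) {
            Set<String> reported = tool.detect(workload);
            long tp = reported.stream().filter(knownVulns::contains).count();
            long fp = reported.size() - tp;
            long fn = knownVulns.size() - tp;
            double precision = (tp + fp) == 0 ? 0 : (double) tp / (tp + fp);
            double recall = (tp + fn) == 0 ? 0 : (double) tp / (tp + fn);
            double f = (precision + recall) == 0 ? 0
                    : 2 * precision * recall / (precision + recall);
            return new Result(tool.name(), precision, recall, f);
        }

        // Step 4: rank the tools by F-Measure, best first.
        static List<Result> rank(List<Tool> tools, List<String> workload, Set<String> knownVulns) {
            return tools.stream()
                    .map(t -> evaluate(t, workload, knownVulns))
                    .sorted(Comparator.comparingDouble(Result::fMeasure).reversed())
                    .toList();
        }
    }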
A Benchmark for SQL Injection Vulnerability Detection Tools
This benchmark targets the following domain:
Class of web services: SOAP web services
Type of vulnerabilities: SQL Injection
Vulnerability detection approaches: penetration testing, static code analysis, and runtime anomaly detection
Workload composed of code from standard benchmarks:
TPC-App
TPC-W*
TPC-C*
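For illustration only, a workload service in this domain is a SOAP operation whose implementation issues SQL queries over client-supplied input. The JAX-WS-style sketch below is hypothetical (the service, table, and column names and the JDBC URL are invented, loosely in the style of the TPC-based services listed in the workload table that follows); it is not the actual benchmark code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.jws.WebMethod;
    import javax.jws.WebService;

    // Hypothetical sketch of a workload operation: a JAX-WS SOAP service whose
    // implementation queries the database with client-supplied input.
    @WebService
    public class ItemLookupService {

        // Placeholder connection string; the real services have their own setup.
        private static final String DB_URL = "jdbc:postgresql://localhost/workload";

        @WebMethod
        public String itemTitle(String itemId) throws SQLException {
            try (Connection con = DriverManager.getConnection(DB_URL);
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT i_title FROM item WHERE i_id = ?")) {
                ps.setString(1, itemId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("i_title") : null;
                }
            }
        }
    }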
Workload

Benchmark   Service Name          Vuln. Inputs   Vuln. Queries   LOC    Avg. Cyclomatic Complexity
TPC-App     ProductDetail         0              0               121    5
TPC-App     NewProducts           15             1               103    4.5
TPC-App     NewCustomer           1              4               205    5.6
TPC-App     ChangePaymentMethod   2              1               99     5
TPC-C       Delivery              2              7               227    21
TPC-C       NewOrder              3              5               331    33
TPC-C       OrderStatus           4              5               209    13
TPC-C       Payment               6              11              327    25
TPC-C       StockLevel            2              2               80     4
TPC-W       AdminUpdate           2              1               81     5
TPC-W       CreateNewCustomer     11             4               163    3
TPC-W       CreateShoppingCart    0              0               207    2.67
TPC-W       DoAuthorSearch        1              1               44     3
TPC-W       DoSubjectSearch       1              1               45     3
TPC-W       DoTitleSearch         1              1               45     3
TPC-W       GetBestSellers        1              1               62     3
TPC-W       GetCustomer           1              1               46     4
TPC-W       GetMostRecentOrder    1              1               129    6
TPC-W       GetNewProducts        1              1               50     3
TPC-W       GetPassword           1              1               40     2
TPC-W       GetUsername           0              0               40     2
            Total                 56             49              2654   14
Enhancing the workload
To create a more realistic workload, we created new versions of the services
This way, for each web service we have:
One version without known vulnerabilities
One version with N vulnerabilities
N versions with one vulnerable SQL query each
This accounts for:
Services + versions: 80
Vulnerable inputs: 158
Vulnerable lines: 87
Step 1: Preparation
The tools under benchmarking:

Provider         Tool                        Technique
HP               WebInspect                  Penetration testing
IBM              Rational AppScan            Penetration testing
Acunetix         Web Vulnerability Scanner   Penetration testing
Univ. Coimbra    VS.WS                       Penetration testing
Univ. Maryland   FindBugs                    Static code analysis
SourceForge      Yasca                       Static code analysis
JetBrains        IntelliJ IDEA               Static code analysis
Univ. Coimbra    CIVS-WS                     Anomaly detection

Vulnerability scanners: VS1, VS2, VS3, VS4
Static code analyzers: SA1, SA2, SA3
Step 2: Execution
Results for Penetration Testing

Tool   % TP     % FP
VS1    32.28%   54.46%
VS2    24.05%   61.22%
VS3    1.9%     0%
VS4    24.05%   43.28%
Step 2: Execution
Results for Static Code Analysis and Anomaly Detection

Tool   % TP     % FP
CIVS   79.31%   0%
SA1    55.17%   7.69%
SA2    100%     36.03%
SA3    14.94%   67.50%
Step 3: Measures calculation
Benchmarking results
Tool      F-Measure   Precision   Recall
CIVS-WS   0.885       1           0.793
SA1       0.691       0.923       0.552
SA2       0.780       0.640       1
SA3       0.204       0.325       0.149
VS1       0.378       0.455       0.323
VS2       0.297       0.388       0.241
VS3       0.037       1           0.019
VS4       0.338       0.567       0.241
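As a sanity check, these values follow directly from the Step 2 rates; for example, for CIVS-WS (0% false positives and 79.31% of the known vulnerabilities detected):

\[
\text{Precision} = 1, \qquad \text{Recall} = 0.793, \qquad
\text{F-Measure} = \frac{2 \cdot 1 \cdot 0.793}{1 + 0.793} \approx 0.885
\]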
Step 4: Ranking and selection
Rank the tools using the measures
Select the most effective tool
Inputs (vulnerability scanners):

Criteria    1st    2nd              3rd    4th
F-Measure   VS1    VS4              VS2    VS3
Precision   VS3    VS4              VS1    VS2
Recall      VS1    VS2/VS4 (tied)          VS3

Queries (static code analyzers and anomaly detection):

Criteria    1st    2nd    3rd    4th
F-Measure   CIVS   SA2    SA1    SA3
Precision   CIVS   SA1    SA2    SA3
Recall      SA2    CIVS   SA1    SA3
Benchmark properties
Portability
Non-intrusiveness
Simple to use
Repeatability
Representativeness
Conclusions and future work
We proposed an approach to benchmark the effectiveness of vulnerability detection tools for web services
A concrete benchmark was implemented
Targeting tools able to detect SQL Injection vulnerabilities
A benchmarking example was conducted
Results show that the benchmark can be used to assess and compare different tools
Future work includes:
Extending the benchmark to other types of vulnerabilities
Applying the benchmarking approach to define benchmarks for other types of web services
Questions?
Nuno Antunes
Center for Informatics and Systems, University of Coimbra
[email protected]
Benchmark Representativeness
Influenced by the representativeness of the workload
May not be representative of all SQL Injection patterns
However, what is important is to compare tools in a relative manner
To verify this, we replaced the workload with a real workload
Consisting of a small set of third-party web services
Benchmark Representativeness
Inputs:

Criteria    1st               2nd    3rd    4th
F-Measure   VS1               VS4    VS2    VS3
Precision   VS3/VS4 (tied)           VS2    VS1
Recall      VS1               VS4    VS2    VS3

Queries:

Criteria    1st                2nd    3rd    4th
F-Measure   CIVS               SA2    SA1    SA3
Precision   CIVS               SA1    SA2    SA3
Recall      SA2/CIVS (tied)           SA1    SA3