PERFORMANCE EVALUATION OF FIREWALLS IN GIGABIT-NETWORKS

Rainer Funke, Andreas Grote, Hans-Ulrich Heiss
Department of Computer Science, University of Paderborn
33095 Paderborn, Germany
e-mail: {rainer, grote, heiss}@upb.de

Keywords: High speed networks, packet filtering, firewall, performance evaluation

ABSTRACT: With the increasing use of the Internet and the services provided on top of it, such as the WWW or electronic commerce, there is a growing demand for larger bandwidth and also for improved security. Both goals are in conflict, since network security partly relies on screening data traffic, which implies considerable overhead and may slow down throughput. This paper presents the results of a measurement study of packet screen performance in a Gigabit environment.

1 INTRODUCTION

Firewalls are widely used tools to increase the security of computer networks. By definition, a firewall separates two partitions of a network by controlling the traffic crossing the border. Usually firewalls are used to secure the intranet of an organization against attacks from the public Internet, but they are also employed to separate subnets of an intranet from each other if these have different security requirements. There are basically two types of firewalls: packet screens and proxy servers. A packet screen is a firewall that controls each packet of inbound and outbound traffic, analyzing the header and deciding whether to pass the packet or to reject it. It does not look into the payload of the packet, so there is no possibility to control the contents, which depend on the application-level protocol, such as HTTP, FTP or SMTP. To control traffic for specific applications, an application-level gateway or proxy server has to be used. A proxy server does not allow direct connections from an inside client to an outside server or vice versa. Instead, the inside client connects to the proxy and the proxy acts on behalf of the client: to the outside server, the proxy represents the client, and to the client, the proxy represents the server. Through this mediation, the proxy can perform arbitrary control specific to the corresponding type of application. Both types of firewalls can be combined.

In the following, we concentrate on packet filtering at the IP level, since this is usually a built-in feature of today's network components such as routers. If an organization has exactly one link to the Internet, then it is reasonable to secure this link by a packet-screen firewall. This means that all traffic from and to the outside is controlled to some extent. The control is based on filter rules, also called access lists, which - depending on source address, target address, and TCP or UDP port number - permit or deny the transmission of the packet. Since subnet masks for addresses and comparison operators for port numbers can be used, the rules can be defined very flexibly (Fig. 1).

Rule  Direction  Source addr.  Target addr.  Protocol  Source port  Target port  Action
1     In         External      Internal      TCP       >1023        25           Permit
2     Out        Internal      External      TCP       25           >1023        Permit
3     Any        Any           Any           Any       Any          Any          Deny

Figure 1: Example for filter rules
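To make the first-match semantics of such an access list concrete, here is a minimal Python sketch of a packet screen evaluating the rules of Fig. 1 top-down (our own illustration; the Packet fields and the "Any" wildcard convention are assumptions, not taken from any particular product):

    # Minimal sketch of sequential, first-match packet screening.
    from dataclasses import dataclass

    @dataclass
    class Packet:
        direction: str   # "In" or "Out"
        src: str         # "Internal"/"External" stands in for address matching
        dst: str
        proto: str       # "TCP", "UDP", ...
        sport: int
        dport: int

    # One tuple per rule of Fig. 1; port fields are predicates so that
    # comparisons like ">1023" can be expressed.
    RULES = [
        ("In",  "External", "Internal", "TCP", lambda p: p > 1023, lambda p: p == 25,  "Permit"),
        ("Out", "Internal", "External", "TCP", lambda p: p == 25,  lambda p: p > 1023, "Permit"),
        ("Any", "Any",      "Any",      "Any", lambda p: True,     lambda p: True,     "Deny"),
    ]

    def screen(pkt: Packet) -> str:
        """Walk the access list top-down; the first matching rule decides."""
        for direction, src, dst, proto, sport_ok, dport_ok, action in RULES:
            if direction not in ("Any", pkt.direction):
                continue
            if src not in ("Any", pkt.src) or dst not in ("Any", pkt.dst):
                continue
            if proto not in ("Any", pkt.proto):
                continue
            if sport_ok(pkt.sport) and dport_ok(pkt.dport):
                return action
        return "Deny"  # default policy if no rule matches

    # Example: inbound SMTP delivery from an external host is permitted.
    print(screen(Packet("In", "External", "Internal", "TCP", 40000, 25)))  # Permit

The sequential scan makes the per-packet cost grow linearly with the length of the list, which is exactly the effect examined in the measurements below.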

However, the more selective and sophisticated the policy, the longer the lists of rules that must be examined sequentially, top-down, for each packet. With increasing length of the list, the overhead for packet screening grows and may eventually slow down the packet throughput. As long as packet screening is used on slow Internet links (2 Mbit/s) or within LANs with standard Ethernet (10 Mbit/s), most commercial firewalls can cope with the traffic. Due to the widespread usage of Internet-based services and their multimedia content, link speeds for Internet access of larger organizations are significantly higher. German universities, for instance, are currently using 34 Mbit/s or even 155 Mbit/s up-links to the German research network B-WIN, and a few Gigabit lines have been set up recently. A previous study [Ellermann and Benecke 1998] showed that when using a workstation for packet screening on a 155 Mbit/s ATM link, there is a clear performance degradation for small and medium size packets. However, it is still an open question whether firewalls using dedicated and up-to-date network devices can operate at the Gigabit level or whether bandwidth has to be sacrificed for security. To address these questions, we conducted a performance study.

The background of our study is an intended closer cooperation between two universities 40 km apart and connected by a Gigabit link. The data traffic is a mix of the usual Internet protocols (HTTP, FTP, SMTP), but also includes high-performance metacomputing applications and video streaming; the latter results from a close cooperation of the audio-visual media centers of the two universities with shared access to digitized media.

2 TEST ENVIRONMENT

2.1 LOAD GENERATION

To simulate Internet traffic large enough to saturate a Gigabit connection, a considerable amount of hardware is needed. To generate the required load, we use a cluster computer (Siemens hpcLine) consisting of 96 dual-processor Pentium II systems with 450 MHz clock speed and 512 MB memory running Solaris. The machine is normally used for parallel computing and to that end is connected by SCI technology (Scalable Coherent Interface) in a 2D torus, but it additionally has Fast Ethernet links that can be connected to network devices. Since all nodes are located in one large cabinet, setting up the different configurations was relatively convenient.

2.2 MEASUREMENT TOOL

The measurements were done using the NetPerf program (version 2.1) from Hewlett Packard. NetPerf is based on the client-server model and consists of two programs, netperf and netserver, the former representing the client and the latter the server. When netperf is started, it first establishes a control connection to the remote system. This connection is used to pass test configuration information and results to and from the remote system. Regardless of the type of test being run, the control connection is a TCP connection using BSD sockets. Once the control connection is up and the configuration information has been passed, a separate connection is opened for the measurement itself, using the APIs and protocols appropriate for the test. The test is then performed and the results are displayed.
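As a rough illustration of this client/server measurement principle, the following self-contained Python sketch measures bulk TCP throughput over the loopback interface. It is our own toy version of a stream test, not NetPerf code; the port number, buffer size, and duration are arbitrary choices:

    # Toy bulk-transfer throughput measurement in the spirit of a TCP
    # stream test (illustration only, not NetPerf's implementation).
    import socket, threading, time

    PORT = 50007          # assumed free local port
    BUF = 64 * 1024       # 64 KB send/receive buffer

    def server(results):
        with socket.socket() as s:
            s.bind(("127.0.0.1", PORT))
            s.listen(1)
            conn, _ = s.accept()
            total, t0 = 0, time.time()
            while (chunk := conn.recv(BUF)):   # b"" on close ends the loop
                total += len(chunk)
            results["mbit_s"] = total * 8 / (time.time() - t0) / 1e6
            conn.close()

    results = {}
    t = threading.Thread(target=server, args=(results,))
    t.start()
    time.sleep(0.2)                      # give the server time to listen
    with socket.socket() as c:
        c.connect(("127.0.0.1", PORT))
        payload = b"x" * BUF
        end = time.time() + 2.0          # send at full speed for two seconds
        while time.time() < end:
            c.sendall(payload)
    t.join()
    print(f"measured throughput: {results['mbit_s']:.1f} Mbit/s")

NetPerf itself additionally separates the control connection from the measurement connection, as described above.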

2.3 THE PACKET FILTERING HARDWARE

As packet filter we used a Cisco 5505 with NetFlow Feature Card, which provides so-called Multi-Layer Switching (MLS). Without going into detail, it consists of two components: a Route Switch Module (RSM), which performs the usual IP routing and also the packet screening, and a Switching Engine (SE), which contains a cache for accepted flows. A flow is a unidirectional sequence of packets between a particular source and destination that share the same protocol and transport-layer information. If there are two connections between the same client and server, one consisting of HTTP packets and the other of FTP packets, they establish two different flows. Whenever a flow is established by a first packet, an entry in the MLS cache of the SE is created. All following packets that belong to the same flow are switched directly inside the switch without going to the router (Fig. 2). Since the routing processor can be configured as a packet filter by defining an access control list, only the first packet needs to be examined with regard to the filter rules; all following packets of that flow are of the same type and can be switched without further access checks, based on the cache entry. It should be noted, however, that packet filtering assisted by MLS also has some disadvantages: no logging is possible, and filter rules can only be used as output access lists, which in some cases is a serious drawback.

Figure 2: Multilayer switching (the first packet of a flow goes through the Route Switch Module; follow-up packets are handled by the Switching Engine with MLS cache)
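The flow-cache idea can be summarized in a few lines of Python. This is a minimal sketch of the mechanism as just described; the real Switching Engine implements it in hardware, and its data structures are not public:

    # Sketch of MLS-style flow caching: only the first packet of a flow is
    # checked against the access list; later packets hit the cache.
    from collections import namedtuple

    Packet = namedtuple("Packet", "src dst proto sport dport")
    flow_cache = {}   # 5-tuple -> "Permit"/"Deny", filled on first packets

    def forward(pkt, screen):
        """screen() is the (slow) sequential rule check of the route
        processor; it is consulted only on a cache miss."""
        key = (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
        action = flow_cache.get(key)
        if action is None:                 # first packet: go to the RSM
            action = screen(pkt)           # sequential access-list lookup
            flow_cache[key] = action       # create the MLS cache entry
        return action                      # follow-ups are switched directly

    # Example with an accept-all screen; the second call hits the cache.
    pkt = Packet("10.0.0.1", "10.0.1.1", "TCP", 40000, 25)
    print(forward(pkt, lambda p: "Permit"))   # miss -> rule check
    print(forward(pkt, lambda p: "Permit"))   # hit  -> switched directly

If there are more active flows than the cache can hold, first packets of evicted flows must go through the rule check again; this is the effect measured in section 3.4.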

The Catalyst 5505 switch provides 5 slots for different cards, e.g. line cards with 12 or 24 FastEthernet ports each. The slots are connected to a backplane consisting of 3 busses of 1.2 Gbit/s each. The busses are connected to each other by crossbars for load-sharing purposes, summing up to a combined maximum backplane bandwidth of 3.6 Gbit/s. Bus arbitration takes place locally in each line card and centrally between the line cards.

3 EXPERIMENTS

3.1 SWITCH THROUGHPUT

Our first experiment aimed at the internal performance, to find out whether the 3.6 Gbit/s can be achieved and how bus arbitration and load balancing influence the performance. We connected up to 96 nodes to the FastEthernet ports of the switch (Figure 3).

Figure 3: Configuration for measurement of internal switch throughput (up to 96 dual Pentium II, 450 MHz, nodes attached to the Cisco 5505 LAN switch)

Figure 4 shows the bandwidth within the switch: a box represents a line card with 24 FastEthernet ports; 12 nodes were sending and 12 were receiving data at the maximum rate. The arrows indicate the data flow, and the numbers give the data rate measured in Mbit/s. In test (a) only one line card was busy and had the entire backplane bandwidth at its disposal. We measured more than 900 Mbit/s, which is close to the bandwidth of one of the three busses; allowing for some protocol overhead, this is roughly the nominal bandwidth. Similar results were measured when doing the same with two or three line cards simultaneously (b): each line card gets the bandwidth of one of the busses. However, if four line cards are busy (c), there is no equal sharing of the total bandwidth; two of them get two thirds and the other two share one third.

Figure 4: Internal throughput across backplane busses: traffic within the line cards

When traffic goes between the line cards, as shown in Figure 5, we again see that the total bandwidth per line card is limited to the bandwidth of one of the busses. In none of these four configurations could the maximum bandwidth of the backplane be approached.

Figure 5: Internal throughput across backplane busses: traffic between the line cards

3.2 PACKET FILTERING THROUGHPUT

Since we were interested in both throughput and latency, we performed two different tests:

- TCP-Stream (TCP-S): This test is suitable for measuring bulk data transfer performance. A unidirectional stream of data is sent at maximum speed; the test shows how fast a node can send or receive data.

- TCP-Connect-Request-Response (TCP-CRR): To get performance data for round-trip latencies, the TCP-RR test can be used. When receiving a request packet, the server acknowledges the receipt by sending a reply packet; the client waits for this confirmation and then sends the next request. Such a request/response pair is called a transaction, and NetPerf measures the transaction rate, from which the round-trip time can be calculated (see the short conversion after this list). The CRR test is a modification of the RR test: for each transaction, a new TCP connection is set up on a port number chosen from a given interval, so that in addition to the request/response packet pair all packets necessary to set up the TCP connection are included. The CRR test is especially suited to model HTTP-type data traffic.
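Since the client waits for each response before issuing the next request, exactly one transaction is outstanding at a time, and the measured transaction rate translates directly into a round-trip time. A quick conversion in Python (the example rate is made up):

    # Convert a measured RR/CRR transaction rate into round-trip latency.
    # With one transaction outstanding at a time, RTT = 1 / rate.
    rate_trans_per_s = 4000.0          # made-up example value
    rtt_ms = 1000.0 / rate_trans_per_s
    print(f"{rtt_ms:.2f} ms round trip at {rate_trans_per_s:.0f} transactions/s")  # 0.25 ms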

The test environment for the packet filtering performance is shown in Figure 6. Since the switch had to have routing functionality, one slot was occupied by a Route Switch Module (RSM), leaving three slots for line cards with 24 FastEthernet ports each. In total, up to 72 nodes could be attached.

Figure 6: Test configuration for packet filtering (up to 72 dual Pentium II, 450 MHz, nodes attached to the Cisco 5505 LAN switch with RSM)

The first packet filtering test aimed at the achievable throughput for bulk data with large packet sizes. First we switched packet filtering off and used the 3.6 Gbit/s backplane to simulate a Gigabit connection. Figure 7 shows the performance of layer 2 switching (no routing required) compared to that of layer 3 switching (with routing assisted by MLS). Half of the nodes were sending, the other half receiving; the packet size was 64 Kbyte. With 72 nodes, the throughput was close to 2 Gbit/s, and the MLS caching was efficient enough that layer 3 switching achieved almost the same throughput. With MLS caching switched off, the throughput was constantly 145 Mbit/s, independent of the number of nodes.

Figure 7: Throughput [Mbit/s] with maximum packet size, as a function of the number of nodes (layer 2 switching vs. MLS)

Packet filtering had only little impact on the data rate, as can be seen from Figure 8. The measurement was taken using 24 nodes, sending and receiving large packets. The filter rules were chosen such that the entire list had to be examined for each packet. Without MLS caching (RSM), the data rate starts at 145 Mbit/s and drops with increasing length of the access control list, as could be expected. With caching on (MLS), however, packet filtering had no impact on the throughput.

Figure 8: Throughput as a function of the filter rule size (number of filter rules, i.e. length of the access control list), for MLS and RSM

3.3 PACKET FILTERING LATENCY

Throughput is only one aspect; for many Internet applications, small packets and high interaction rates are characteristic. The CRR test as described above sets up a large number of TCP connections at a high rate. Request and response packets were 1 byte long, and latency is defined as round-trip latency.

Figure 9: Latency [ms] as a function of the load (number of nodes), for RSM and MLS

Again, the caching mechanism helped to keep the latency low, independent of the number of nodes. The same can be seen from Figure 10, where the latency is shown as a function of the length of the access control list. While normal routing and packet filtering (RSM) suffered considerable performance degradation, the caching variant (MLS) could handle even twice as many nodes without problems.

Figure 10: Latency [ms] as a function of the number of filter rules (length of access control list), with and without cache (MLS vs. RSM, 24 and 48 nodes)


3.4 CACHE EFFICIENCY

To evaluate the efficiency of the MLS cache, each node set up a large number of short connections to establish a huge number of flows with bad locality for the cache. This could be done using the CRR test of NetPerf, where each newly opened TCP connection uses a new port number from a large interval. In parallel to these short-lived connections there were long-lived TCP connections between single pairs with continuous data flow. With 24 nodes and 1500 different ports per node, the throughput started to drop significantly, indicating that some active flows could not be found in the cache. This measurement confirms a note in the documentation of the switch that the number of entries should be no larger than 32K. To control this behavior, the aging of entries can be influenced by a parameter: by default, an entry ages out after 256 seconds. Another way to control the cache is fastaging, which means that an entry ages out when the corresponding flow has had no more than a specific number of packets within fastagingtime seconds since its creation. As can be seen in Figure 11, the fastaging mechanism prevents the long-lived TCP connections from being removed from the cache.
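A minimal Python sketch of the two aging policies, based on our reading of the description above (the constants and the exact eviction conditions are illustrative assumptions, not Cisco's documented algorithm):

    # Sketch of MLS cache aging (illustrative parameter values).
    AGING_TIME = 256      # default: an idle entry ages out after 256 seconds
    FASTAGING_TIME = 32   # assumed fastaging window
    FASTAGING_PKTS = 3    # flows with at most this many packets age out fast

    class Entry:
        """One MLS cache entry (our own minimal model)."""
        def __init__(self, now):
            self.created = now     # time the flow was established
            self.last_used = now   # time of the last packet of the flow
            self.packets = 0       # packets seen so far

    def expired(e, now):
        # Normal aging: remove entries that have been idle too long.
        if now - e.last_used > AGING_TIME:
            return True
        # Fastaging: remove entries whose flow produced only a few packets
        # within FASTAGING_TIME seconds of its creation (short-lived
        # connections), so long-lived busy flows keep their cache entries.
        if now - e.created > FASTAGING_TIME and e.packets <= FASTAGING_PKTS:
            return True
        return False

    # A short-lived CRR-style flow is evicted quickly ...
    short = Entry(now=0.0); short.packets = 2
    print(expired(short, now=40.0))   # True: fastaging strikes
    # ... while a busy long-lived stream stays cached.
    busy = Entry(now=0.0); busy.packets = 10_000; busy.last_used = 39.0
    print(expired(busy, now=40.0))    # False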

Figure 11: Effect of replacement of cache entries: throughput [trans/s] as a function of the number of ports per node (800, 1500, 3000), for TCP and CRR tests with and without fastaging

3.5 TWO ROUTERS WITH GIGABIT TRUNKS

Finally, we connected two routers with one and two gigabit trunks and performed TCP-stream and CRR tests. 24 nodes on either side communicated pairwise with each other. To operate the gigabit trunks we used the proprietary Cisco protocol ISL as well as IEEE 802.1Q (dot1q). For both protocols, a TCP stream with 900 Mbit/s could be measured. The transaction rate (Figure 13) showed a small drop compared to a local connection over the 3.6 Gbit/s backplane and was the same with one and with two trunks operating; we have no explanation why the second trunk did not result in an improvement. A transaction consisted of a complete TCP connect/disconnect cycle with one request/response (1 byte payload). The performance was unaffected by any packet filtering, due to MLS caching.

Figure 12: Configuration for gigabit trunk test (two Cisco 5505 switches acting as firewalls, connected by 1 Gbit/s trunks, with nodes attached at 100 Mbit/s)

Figure 13: Gigabit trunks connecting two routers: transaction rate [trans/s] for no trunk, 1 trunk ISL, 2 trunks ISL, and 1 trunk dot1q


4 CONCLUSION


The tests showed that packet filtering, which represents a considerable contribution to network security, can actually be performed almost at wire speed even for gigabit links. However, the MLS that provides this efficiency has some limitations that may not be tolerable in all situations.



REFERENCES

Ellermann, U. and Benecke, C. 1998. "Firewalls for ATM Networks." In Proc. INFOSEC'COM 98 (Paris, France, June 4-5, 1998).

Cheswick, W.R. and Bellovin, S.M. 1994. Firewalls and Internet Security: Repelling the Wily Hacker. Addison-Wesley, Reading, Mass.

Cisco Systems Inc. 1998. Catalyst 5000 Series Switching System. Technical Documentation. Cisco Systems Inc., San Jose, CA.