On Distributed Intrusion Detection Systems Design for High Speed Networks

OUISSEM BEN FREDJ, HASSEN SALLAY, ADEL AMMAR, MOHSEN ROUACHED, KHALED AL-SHALFAN, MAJDI BEN SAAD
Department of Computer Science
College of Computer and Information Sciences
Al-Imam Mohamad Ibn Saud University
SAUDI ARABIA
{ obenfredj, hassen_sallay, aammar, mrouached, kshalfan, majdi.bensaad }@amansystem.com; http://www.amansystem.com/people/

Abstract: This article states the need for High Performance Computing (HPC) in Distributed Intrusion Detection Systems (DIDS) and discusses the design requirements of such a system. Since high-speed networks are the performance key in HPC, the article studies the mapping of the different requirements onto the software and hardware features of high-speed networks. The study results in several recommendations for the design of an IDS over a high-speed network, from the communication protocol and the programming model that should be adopted, to the way the system should handle the communication flow, memory management, and data transfer between IDS sensors.

Key-Words: DIDS, Architecture, Communication layer, HPC, High Speed Network, RDMA
1 Introduction
For a long time, NIDS have had a centralized architecture in which the server is located behind the Internet firewall or the Internet proxy. However, the introduction of high-speed networks, the growth of Internet-dependent applications, and the increasing number of attack types have made the IDS a bottleneck of the network. Since the end of the last century, several works have tried to move beyond the centralized architecture. Thus, several distributed NIDS (DNIDS) architectures have been introduced [4,5,6,7,8,9,10,11], refined, or abandoned. Each of these works has its advantages and its drawbacks. However, all the DNIDS architectures are based on high-performance computing (HPC) components. In this paper we study the network architectures and the communication layers of recent HPC architectures that are used, or could be used, in the design of a DNIDS.
2 The DNIDS design requirements
Common DNIDS are located at one or more of three specific points in the network [4]. The first point is connected to the spanning port of the DMZ switch, where the entire traffic that passes through the switch is replicated; at this point, all the traffic that flows through the DMZ switch can be analyzed. The second point is connected to the server switch, immediately after the internal firewall. The NIDS there analyzes the traffic which flows between the internal servers and the hosts present in the intranet. The third point is connected to the client switch, immediately after the internal router. The NIDS at this point inspects the traffic which flows between any two hosts present in the intranet. When several NIDS exist in the network, each one is called a sensor. The sensors may work independently of each other or collaborate. In the latter case, all three network intrusion detection systems may pass the intrusion data to a central NIDS management system, where the intrusion data can be analyzed. The number of NIDS may grow if the organization is composed of several distributed branches or subnets. Each branch may encompass one or several NIDS types in one or more locations. There are three types of distributed NIDS. The first type is a collaborative IDS, where several NIDS are dispatched across the network and exchange information to deal with distributed attacks and to maintain a global security map [8,10,11]. This type requires communication optimization over a regular network technology such as Ethernet or Gigabit Ethernet. The second type is a powerful version of a centralized NIDS. This type is needed when a large volume of traffic must be
deeply analyzed and a single server cannot deal with the whole traffic. The need for this type increases when the NIDS works in active mode and must decide an action for each packet. The architecture of such a NIDS requires high-computing systems, parallel communications and/or distributed communications over a high-speed network [9,12,13,14,15,16]. The idea is to be able to pipeline or parallelize regular NIDS actions. The third DNIDS type is a hybrid of the two previous types; this kind is deployed in very large organizations. This paper assumes that a DNIDS is composed of a set of sensors, each of which is a server. The sensors may communicate with each other using a peer-to-peer, broadcast, or multicast paradigm. They may work in parallel or in independent mode. Under these assumptions, a DNIDS has several requirements:
- Real-time: the IDS must handle all the traffic of the link it supervises. This is a major requirement. It becomes hard to meet when the IDS runs in active mode and a large attack is detected; in that case, the IDS may spend a long time processing the attack analysis and actions and thus become a bottleneck. The architecture must allow real-time properties and actions.
- Portability: especially if the NIDS includes client agents. The agents should be able to run on a wide variety of architectures; they should support different host hardware component types and different network interconnects.
- Transparency: the portability requirement may lead to managing several software modules. This management should be done transparently to the final user (the NIDS administrator).
- Flexibility: the flexibility aims at making the DNIDS easy to modify when adding other sensors, other software modules, or other hardware support.
- Security: the IDS is a critical component of the network. A successful attack on the IDS can throw the whole organization into disarray, in particular if the IDS runs in active mode.
- Adaptivity: the IDS must adapt itself to handle the traffic when an attack is detected. The IDS architecture should support adaptive actions such as creating other cooperative processes, loading other modules, or migrating processes.
- Performance: transparency and portability are known to decrease the performance of a system. A trade-off between portability and performance must be studied. Since an IDS has real-time requirements, it should stress the performance issue even at the expense of portability.
- Scalability: the architecture must be scalable, so that it is easy to add another sensor or to increase the traffic that must be handled. This is a key point for the success of a large-scale IDS.
- User interface: since an IDS is intended to be used by system administrators and regular users rather than software developers, the IDS should offer a simple user interface, an intuitive administration panel, and statistical reports.

The next sections analyze the different HPC architectures from both the hardware and the software points of view, in order to identify the optimal architecture and the recommendations that guarantee the above requirements.
3 The Communication Protocol
One way to compare communication models is to classify them according to the sender-receiver synchronization mechanism required to perform data exchanges. There are three synchronization modes: the full synchronization mode, the rendez-vous mode, and the asynchronous mode.

With the full synchronization mode, the sender has to ensure that the receiver is ready to receive incoming data, which means that flow control is required. A credit scheme may be used to implement flow control: before a host sends a packet, it checks for credits regarding the receiver, where a credit represents a packet buffer in the receiver's memory. Credits can be handed out in advance by pre-allocating buffers for specific senders, but if a sender runs out of credits it must block until the receiver sends new credits (a minimal sketch of this scheme appears at the end of this section).

The rendez-vous mode discharges the duty of flow control to the application. For example, VIA [3], VAPI, and MX all require that a receive request be posted before the message enters the destination node; otherwise, the message is dropped and/or NACKed. Another solution is transfer redirection, which consists in pre-allocating a default redirectable receive buffer for short messages whenever a sender does not know the final receive buffer address. Later, when the receiver posts the receive buffer, data is copied from the default buffer to the receive buffer.

The asynchronous mode breaks all synchronization constraints between senders and receivers. The completion of the send operation does not require the intervention of the receiver process to take a complementary action. This mode allows overlapping computation and communication, zero-copy without synchronization, deadlock avoidance, and an efficient use of the network (since messages do not block on switches waiting for the receive operation). As a consequence, the asynchronous mode provides high throughput and low latency, in addition to flexibility (since the synchronized modes can be implemented on top of the asynchronous mode).

As discussed previously, each synchronization mode has advantages and drawbacks. However, the asynchronous mode guarantees high throughput and low latency. It is well suited to DNIDS applications, since the sensors work independently and communicate with each other and with the sensor manager only occasionally. Henceforth, the asynchronous mode should
be adopted as the communication model for a DNIDS. The remainder of this paper focuses on the one-sided communication protocol, which is suited to the asynchronous mode. RWAPI [2] is based on the asynchronous mode and the one-sided communication protocol (Remote-Write).
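To make the credit scheme mentioned above concrete, the following C fragment gives a minimal sketch. It is an illustration only, not code from RWAPI or any cited library; the names send_with_credit, credit_grant, and struct peer are hypothetical, and the actual transmission is elided.

/* Minimal sketch of credit-based flow control (hypothetical API):
 * each credit stands for one pre-allocated packet buffer in the
 * receiver's memory. */
#include <stdbool.h>
#include <stdio.h>

#define INITIAL_CREDITS 8   /* buffers pre-allocated by the receiver */

struct peer {
    int credits;            /* send permissions left for this receiver */
};

/* Returns true if the packet may be sent now; false means the sender
 * must block (or queue the packet) until the receiver grants credits. */
static bool send_with_credit(struct peer *p)
{
    if (p->credits == 0)
        return false;       /* out of credits: wait for a grant */
    p->credits--;           /* consume one receive buffer */
    /* ... enqueue the packet for transmission here ... */
    return true;
}

/* Called when the receiver frees buffers and returns credits. */
static void credit_grant(struct peer *p, int n)
{
    p->credits += n;
}

int main(void)
{
    struct peer p = { .credits = INITIAL_CREDITS };
    for (int i = 0; i < 10; i++) {
        if (!send_with_credit(&p)) {
            /* sender would block here; simulate a grant of 4 credits */
            credit_grant(&p, 4);
            send_with_credit(&p);
        }
    }
    printf("remaining credits: %d\n", p.credits);
    return 0;
}

In a real implementation, credit_grant would be driven by credit-return messages from the receiver, and a sender out of credits would block or queue packets rather than retry immediately.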
4 The Programming Model
There are many programming models that use the asynchronous mode. The best suited to RDMA is the one-sided model: the completion of a send (resp. receive) operation does not require the intervention of the receiver (resp. sender) process to take a complementary action. RDMA is usually used to copy data to (from) the remote user space directly. Suppose that the receiver process has allocated a buffer to hold incoming data and the sender has allocated a send buffer. Prior to the data transfer, the receiver must have sent its buffer address to the sender. Once the sender knows the destination address, it initiates a direct-deposit send. This task does not interfere with the receiver process, which keeps on doing computation, testing whether new messages have arrived, or blocking until an incoming-message event arises. The one-sided programming model is simple and flexible, and can be used as a high-level interface or as a middleware between a high-level library such as MPI and the network level. The one-sided scheme can be achieved either with one-sided reads or with one-sided writes. A remote read requires at least two messages: the first to inform the remote DMA engine (the remote network interface, the remote OS, or the remote process) about the requested data, and the second to send the data itself. A remote write requires only one message. Hence, the remote write is simpler and more flexible.
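The remote-write exchange just described can be sketched as follows. This is a simulation within a single process: memcpy stands in for the NI's DMA engine, and rdma_write is a hypothetical name, not a real RDMA API call.

/* Sketch of the one-sided remote-write pattern (simulated in one
 * process; memcpy models the NI's RDMA engine). */
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 256

/* Step 1: the receiver allocates a buffer and advertises its address
 * to the sender (in a real system this address travels in a small
 * setup message). */
static char recv_buf[BUF_SIZE];

/* Step 2: the sender deposits data directly at the advertised address;
 * the receiver process takes no complementary action. */
static void rdma_write(void *remote_addr, const void *src, size_t len)
{
    memcpy(remote_addr, src, len);      /* models the DMA transfer */
}

int main(void)
{
    const char msg[] = "sensor alert: port scan detected";
    void *advertised = recv_buf;        /* address sent to the sender */

    rdma_write(advertised, msg, sizeof msg);   /* direct deposit */

    /* The receiver polls or blocks on an event queue; here it simply
     * reads the deposited data. */
    printf("receiver sees: %s\n", recv_buf);
    return 0;
}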
5 Memory management
Memory allocation precedes any data transfer. It consists in reserving memory areas to store the data to be sent and/or received. The way allocated areas are managed influences performance. Since DMA should be used to transfer data, the main constraint is that physical addresses are required rather than virtual ones. Therefore, transfer routines must provide the physical addresses of all pages of a message to the DMA engine. This is a tricky task, because a message that is contiguous in the virtual address space is not necessarily contiguous in physical memory. A virtual-to-physical translation table built at allocation time can be used; later, at the send (resp. receive) step, the translation table is used to gather (resp. scatter) data from (resp. to) memory to (resp. from) the network interface. GM2 [1] and MX add some optimizations to the use of the translation table: the table is stored in the user's memory, so as to be able to translate the whole memory, and a small cache table is created in the NIC memory. The cache table contains virtual-to-physical translations of the most used pages. To avoid page swapping, allocated buffers have to be locked. Another solution consists in splitting the message to send into several smaller messages whose size is less than the size of a page. Yet another solution consists in managing physical addresses directly (introduced in [2]), without operating system intervention. The idea is to allocate a contiguous physical buffer and to map it into a contiguous virtual address range of the user process; thus, just one translation is needed. Its most important advantage is the avoidance of scatter/gather operations at transfer time. FreeBSD provides a kernel function that allocates contiguous physical buffers. Linux does not provide such a mechanism; however, there are two methods to allocate contiguous physical memory. The first one is to change the kernel policy by modifying the Linux source code. The second one consists in allocating memory at boot time: a driver maps the whole contiguous physical memory into a contiguous virtual area, and a function is then used to search the set of free spaces for a contiguous memory area that fits the requested size. Note that this function can be executed in user space without any call to the operating system. Memory allocation is not a step of the communication's critical path, but the policy used to manage memory has an important impact on data transfers. Our goal is to reduce the time spent in virtual-to-physical translation by using contiguous physical memory allocations. Note that VAPI, for example, offers a limited contiguous physical memory allocation for small and medium chunks of memory. If the required allocations are very large, the DNIDS must manage its own physical memory as introduced above, which may require much development time; alternatively, the DNIDS may rely on virtual-to-physical memory page translations, which is less efficient but does not involve kernel patches.
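As a modest illustration of two of the techniques above, the following POSIX C sketch pins a buffer with mlock so its pages cannot be swapped out during a DMA transfer, and splits a message into page-sized chunks so that each chunk lies within one physical page. The virtual-to-physical lookup itself is left to the NIC driver; nothing here is specific to RWAPI.

/* Sketch: pin a buffer against swapping, then split a message into
 * page-sized chunks for a DMA engine without a translation cache. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 3 * page + 100;            /* message spanning pages */
    char *buf = malloc(len);
    if (!buf) return 1;
    memset(buf, 0xAB, len);

    /* Pin the pages: a DMA engine must not see them swapped out. */
    if (mlock(buf, len) != 0)
        perror("mlock");                    /* may require privileges */

    /* Split into chunks no larger than one page, as suggested when a
     * full translation table is unavailable. */
    for (size_t off = 0; off < len; off += page) {
        size_t chunk = (len - off < (size_t)page) ? len - off : page;
        /* each (buf + off, chunk) piece would be handed to the NI */
        printf("chunk at offset %zu, size %zu\n", off, chunk);
    }

    munlock(buf, len);
    free(buf);
    return 0;
}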
6 Host memory - NI data transfer
According to the one-sided scheme, the NI must communicate with the host memory in three cases. The first case is when the user process informs the NI of a new send, i.e. when the user process sets up a send descriptor to be used by the NI to send a message. The second and third cases are when sending and when receiving messages. For traditional message-passing systems, the user process must also provide a receive descriptor to the NI. There are three methods for communication between the host memory and the NI: PIO, WC, and DMA. With Programmed IO (PIO), the host processor writes (resp. reads) data to (resp. from) the I/O bus. This method is extremely fast; however, only one or
two words can be transferred at a time, resulting in many bus transactions. Throughput differs for writes and reads, mainly because writing across a bus is usually a little faster than reading. Write combining (WC) enhances write PIO performance by enabling a write buffer for non-cached writes, so that the affected data transfers can occur at cache-line size instead of word size. Write combining may be used to communicate the DNIDS sensor states, statistics messages, and notifications. A Direct Memory Access (DMA) engine can transfer entire packets in large bursts and proceed in parallel with host computation. Because a DMA engine works asynchronously, the host memory involved, be it the source or the destination of a DMA transfer, must not be swapped out by the operating system; some communication systems pin buffers before starting a DMA transfer. Moreover, DMA engines must know the physical addresses of the memory pages they access. Choosing the suitable type of data transfer depends on the host CPU, the DMA engine, the transfer direction, and the packet size. A solution consists in classifying messages into three types: small, medium, and large. PIO suits small messages such as ACK messages between DNIDS sensors and the manager; write combining (when supported) suits medium messages such as DNIDS sensor states, notifications, alert messages, and statistics data; DMA suits large messages such as data forwarded between different sensor types. The formal definition of a medium message (and therefore of both small and large messages) changes according to the CPU, the DMA engine, and the transfer direction. Since DMA-related operations (initialization, transfer, virtual-to-physical translation) can be done by the NI or by the user process, a set of performance tests is the best way to define medium messages.
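The size-based classification can be expressed as a simple selection function. In the sketch below, the SMALL_MAX and MEDIUM_MAX thresholds are placeholders; as argued above, the real crossover points must be calibrated with performance tests on the target CPU and DMA engine.

/* Sketch of size-based selection among PIO, write combining and DMA.
 * Thresholds are placeholders to be replaced by measured values. */
#include <stdio.h>
#include <stddef.h>

enum xfer_method { XFER_PIO, XFER_WC, XFER_DMA };

#define SMALL_MAX   128     /* e.g. ACKs between sensors and manager */
#define MEDIUM_MAX 1024     /* e.g. sensor state, alerts, statistics */

static enum xfer_method choose_method(size_t msg_size, int wc_supported)
{
    if (msg_size <= SMALL_MAX)
        return XFER_PIO;                 /* few bus words: PIO wins  */
    if (msg_size <= MEDIUM_MAX && wc_supported)
        return XFER_WC;                  /* cache-line sized bursts  */
    return XFER_DMA;                     /* large bursts, asynchronous */
}

int main(void)
{
    size_t sizes[] = { 64, 512, 9000 };
    const char *names[] = { "PIO", "WC", "DMA" };
    for (int i = 0; i < 3; i++)
        printf("%zu bytes -> %s\n", sizes[i],
               names[choose_method(sizes[i], 1)]);
    return 0;
}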
7 Send and receive queue management
In order to ensure an asynchronous execution of communication routines, two queues shall be used: a send queue and a receive event queue. Unlike synchronous libraries, the asynchronous mode does not need a receive queue to specify receive buffers; it does, however, need a receive event queue which contains the list of incoming events. Queues allow asynchronous operation: the user process just appends a descriptor to the send queue to send a message, and once the operation is finished, the sender continues with the next send or with its computing task. The receive event queue is used to probe or poll for new receive events. The send queue contains a set of send descriptors provided by user processes and read by the NI at send time. A send descriptor specifies the outgoing buffer address, its size, the receiver node, and the receiver buffer address. Additional attributes can be specified to personalize the send operation (security level, reliability level). The NI uses send descriptors to initiate the send operation. Three steps are required. The first is the initialization of the send descriptor; this step is part of the transfer's critical path if the send is a point-to-point communication, whereas for collective operations (multicast, broadcast, ...) it can be done once for multiple send requests. The second step consists in appending the send descriptor to the send queue. The third step of the critical path is the polling performed by the NI on the send queue.
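A plausible shape for such a send queue is a ring buffer of descriptors written by the host and polled by the NI. The sketch below simulates both sides in one process; the field names and the valid-flag publication scheme are illustrative assumptions, not the RWAPI layout.

/* Sketch of a send queue as a ring buffer of send descriptors,
 * produced by the host and consumed (polled) by the NI. */
#include <stdint.h>
#include <stdio.h>

#define QUEUE_LEN 64

struct send_desc {
    void     *src_addr;     /* outgoing buffer address          */
    uint32_t  len;          /* message size                     */
    uint32_t  dest_node;    /* receiver node id                 */
    uint64_t  remote_addr;  /* receiver buffer address (RDMA)   */
    uint8_t   valid;        /* set last, so the NI only sees a  */
                            /* fully written descriptor         */
};

static struct send_desc queue[QUEUE_LEN];
static unsigned head, tail;       /* producer: host, consumer: NI */

/* Host side: appending a descriptor is the whole send operation. */
static int post_send(const struct send_desc *d)
{
    unsigned next = (head + 1) % QUEUE_LEN;
    if (next == tail) return -1;  /* queue full */
    queue[head] = *d;
    queue[head].valid = 1;        /* publish to the NI */
    head = next;
    return 0;
}

/* NI side (simulated): poll the queue for work. */
static void ni_poll(void)
{
    while (tail != head && queue[tail].valid) {
        printf("NI sends %u bytes to node %u\n",
               queue[tail].len, queue[tail].dest_node);
        queue[tail].valid = 0;
        tail = (tail + 1) % QUEUE_LEN;
    }
}

int main(void)
{
    struct send_desc d = { .src_addr = 0, .len = 128, .dest_node = 2 };
    post_send(&d);
    ni_poll();
    return 0;
}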
8 Data transfer
In order to avoid bottlenecks and to use the available resources efficiently, a data transfer should take into account the message size, the DNIDS architecture, the sensor architecture (processor speed, PCI bus speed, DMA characteristics), the NIC properties (transfer mode, memory size), and the network characteristics (routing policy, route dispersion, ...). Many studies have measured network traffic to determine message sizes; they show that small messages are predominant (about 80% of messages are smaller than 200 bytes). Small messages may correspond to control and notification messages and IDS sensor alerts. Moreover, the one-sided scheme requires additional small messages to send receive-buffer addresses. Thus, it is worthwhile to distinguish between small and large messages. As discussed earlier, the maximum size for small messages should be determined by performance evaluation.
Fig. 1: Short send with the one-sided communication protocol.

For the transfer of small messages (see Fig. 1), neither a send buffer address nor a receive buffer address is required. Therefore, it is possible to store the content of a small message in the send descriptor itself. To send such a message, seven operations are performed: (1) the sender sets up the send descriptor (including the data); (2) the sender informs the NI about the send descriptor; (3) the NI copies the necessary data from the host memory; (4) the NI sends the message to the network; (5) the remote NI receives the message and appends it to the receive event queue; finally,
(6) the receiver process reads the data from the receive descriptor and (7) informs the NI that the receive has completed successfully. For the transfer of large messages (see Fig. 2), the sender has to specify both the send buffer address and the remote buffer address, and ten steps are involved: (1) the sender prepares the send buffer; (2) the sender sets up the send descriptor and writes it to the NI memory using PIO; (3) the sender informs the NI about the send descriptor; (4) the NI reads the send descriptor; (5) the NI copies data from the host memory according to the send descriptor; (6) the NI sends the message to the network; (7) the remote NI receives the message and writes the incoming data to its final destination; (8) the remote NI appends a receive descriptor to the receive event queue; (9) the receiver process reads the receive descriptor and (10) informs the NI that the receive has completed successfully.
Fig. 2: Sending a large message with the one-sided communication protocol.

It is beneficial to distinguish between small and large messages in order to use the hardware efficiently and to implement an efficient one-sided scheme. Finally, data transfer is the most important step of the communication, so care should be taken when writing its routines.
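The small/large distinction suggests a send-descriptor layout in which small payloads are embedded in the descriptor itself (as in Fig. 1) while large messages carry buffer addresses for the RDMA write (as in Fig. 2). The following sketch is a hypothetical illustration; the INLINE_MAX limit of 128 bytes is a placeholder to be set by measurement.

/* Sketch contrasting the two send paths: inline data for small
 * messages, source/remote addresses for large (RDMA) messages. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INLINE_MAX 128

struct send_desc {
    uint32_t len;
    uint32_t dest_node;
    union {
        uint8_t inline_data[INLINE_MAX]; /* small: payload embedded */
        struct {
            void    *src_addr;           /* large: host source      */
            uint64_t remote_addr;        /* large: receiver buffer  */
        } rdma;
    } u;
};

static void post_message(uint32_t node, const void *data, uint32_t len,
                         uint64_t remote_addr)
{
    struct send_desc d = { .len = len, .dest_node = node };
    if (len <= INLINE_MAX) {
        memcpy(d.u.inline_data, data, len);  /* steps (1)-(2), Fig. 1 */
        printf("inline send: %u bytes\n", len);
    } else {
        d.u.rdma.src_addr = (void *)data;    /* steps (1)-(3), Fig. 2 */
        d.u.rdma.remote_addr = remote_addr;
        printf("rdma send: %u bytes to 0x%llx\n",
               len, (unsigned long long)remote_addr);
    }
    /* the descriptor would now be appended to the send queue */
}

int main(void)
{
    char small[64] = "alert", big[4096] = {0};
    post_message(2, small, sizeof small, 0);
    post_message(2, big, sizeof big, 0x7f0000100000ULL);
    return 0;
}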
9 Communication control

Communication control focuses on how to retrieve messages from the network device: how should the NI inform the user process about the completion of a receive? An implementer can choose between interrupts and polling. The interrupt-driven approach lets the network device signal the arrival of a message by raising an interrupt. This is a familiar approach, in which the kernel is involved in dispatching interrupts to applications in user space. The alternative model for message handling is polling: the network device does not actively interrupt the CPU, but merely sets some status bits to indicate the arrival of a message. The application is required to poll the device status regularly; when a message is detected, the polling function returns the receive descriptor describing the message. Quantifying the difference between interrupts and polling is difficult because of the large number of parameters involved: hardware (cache sizes, register windows, network adapters), operating system (interrupt handling), runtime support (thread packages, communication interfaces), and application (polling policy, message arrival rate, communication patterns). Executing a single poll is typically much cheaper than taking an interrupt, because a poll executes entirely in user space without any context switch. However, comparing the cost of a single poll to the cost of a single interrupt does not provide a sufficient basis for statements about application performance: each time a poll fails, the user program wastes a few cycles. Thus, coarse-grain parallel computing favors interrupts, while fine-grain parallelism favors polling. For applications containing unprotected critical sections, interrupts lead to nondeterministic bugs, while polling leads to safe runs. Moreover, for asynchronous communications, polling can lead to substantial overhead if the frequency of arrivals is low enough that the vast majority of polls fail to find a message; with interrupts, the overhead occurs only when there are arrivals.
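A common way to combine the two mechanisms is to poll for a bounded number of iterations and then fall back to blocking on an interrupt-driven event. The sketch below illustrates the idea; event_queue_poll and event_queue_wait are hypothetical stubs standing in for real NI status checks and interrupt waits.

/* Sketch of a hybrid receive strategy: spin-poll briefly (good for
 * fine-grain traffic), then block on an interrupt (good when
 * arrivals are rare). */
#include <stdbool.h>
#include <stdio.h>

#define POLL_BUDGET 1000   /* failed polls tolerated before blocking */

/* Hypothetical primitives: a real system would check NI status bits
 * and sleep on an interrupt-signalled condition, respectively. */
static bool event_queue_poll(void) { return false; }
static void event_queue_wait(void) { /* blocks until NI interrupt */ }

static void wait_for_message(void)
{
    for (int i = 0; i < POLL_BUDGET; i++) {
        if (event_queue_poll())
            return;                 /* message found while spinning */
    }
    event_queue_wait();             /* arrivals are sparse: sleep   */
}

int main(void)
{
    wait_for_message();
    printf("message handled\n");
    return 0;
}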
10 Conclusion

This paper presented several design issues and recommendations for the high-speed communication protocols of a DNIDS. First, we classified communication models into three synchronization modes: the full synchronization mode, the rendez-vous mode, and the asynchronous mode. The asynchronous mode removes all synchronization constraints between the sender and the receiver. Moreover, this mode fits one-sided programming models, which offer a simple programming interface and high performance. In addition, both the asynchronous mode and the one-sided scheme take advantage of the DMA feature to achieve RDMA communication. Note that the one-sided programming model permits the implementation of all message-passing applications, and that RDMA-write alone is sufficient to implement both modes. We opted for the asynchronous mode, in which the completion of the send operation does not require the intervention of the receiver process to take a complementary action, and we recommended the use of the one-sided protocol for its efficiency and its flexibility. The communication library presented earlier that adopts these two paradigms (the one-sided protocol and the asynchronous communication mode) is RWAPI; therefore, we opted for RWAPI for the design and implementation of a DNIDS. The preceding sections analyzed the different steps of the communication and gave recommendations for designing and implementing an efficient DNIDS over RWAPI. The memory allocation step precedes any data transfer, but it affects the implementation and the performance of the data transfer routines; network protocol implementers should avoid the virtual-to-physical translation of memory page addresses while transferring data. Communication routines should take into account that a large number of small messages may be exchanged; these messages carry the memory addresses used in RDMA communications. Thus, a distinction between small and large messages is worthwhile: for small messages, PIO or the write-combining method is usable, while DMA is suitable for large messages. In order to implement asynchronous communication, send requests should be stored in a send queue; the send routine just needs to append a send descriptor to this queue. It is preferable to store the send queue in the network interface memory, which spares the network interface from polling send descriptors in the host memory. Symmetrically, the receive completion descriptors should be stored in a receive queue in host memory, which spares the process from polling the network interface. Finally, to achieve good communication control, the programmer should use both interrupts and polling: polling is suited to fine-grain applications, while interrupts fit coarse-grain ones. The programmer should avoid all unprotected critical sections and should use interrupts to signal exceptions such as a buffer overflow or a security problem.
References
[1] GM: A Message-Passing System for Myrinet Networks, version 2.0.12, 2001.
[2] O. Ben Fredj and É. Renault, "The Critical Path of Communication Model Analysis for Performant Implementations of High-Speed Interfaces over the Myrinet Interconnect," in Proceedings of CIC 2006, Las Vegas, Nevada, USA, June 2006, B. J. d'Auriol, H. R. Arabnia, and A. Pescapè, eds., pp. 192-198.
[3] Virtual Interface Architecture Specification, Version 1.0, published by Compaq, Intel, and Microsoft, December 1997.
[4] K. B. Chandradeep, "A Scheme for the Design and Implementation of a Distributed IDS," in Proceedings of the First International Conference on Networks & Communications, 2009, pp. 265-270.
[5] Chu, Y., Li, J., and Yang, Y., "The Architecture of the Large-scale Distributed Intrusion Detection System," in Proceedings of PDCAT 2005, IEEE Computer Society, Washington, DC, pp. 130-133.
[6] Silva, P. F., Westphall, C. B., Westphall, C. M., and Assunção, M. D., "Composition of a DIDS by integrating heterogeneous IDSs on grids," in Proceedings of the 4th International Workshop on Middleware for Grid Computing (MCG '06), Melbourne, Australia, November 2006, vol. 194, ACM, New York, NY, p. 12.
[7] Wang, Y., Behera, S. R., Wong, J., Helmer, G., Honavar, V., Miller, L., Lutz, R., and Slagell, M., "Towards the automatic generation of mobile agents for distributed intrusion detection systems," Journal of Systems and Software, 79(1), January 2006, pp. 1-14.
[8] Kannadiga, P. and Zulkernine, M., "DIDMA: A Distributed Intrusion Detection System Using Mobile Agents," in Proceedings of SNPD-SAWN 2005, IEEE Computer Society, Washington, DC, pp. 238-245.
[9] Zimmermann, J. and Mohay, G., "Distributed intrusion detection in clusters based on non-interference," in Proceedings of the 2006 Australasian Workshops on Grid Computing and E-Research, vol. 54, 2006.
[10] Zhou, C. V., Leckie, C., and Karunasekera, S., "A Survey of Coordinated Attacks and Collaborative Intrusion Detection," Computers & Security (Elsevier), July 2009.
[11] Zaman, S. and Karray, F., "Collaborative architecture for distributed intrusion detection systems," in Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, IEEE Press, Piscataway, NJ, 2009, pp. 37-43.
[12] Chen, K., Yu, F., Xu, C., and Liu, Y., "Intrusion Detection for High-Speed Networks Based on Producing System," in Proceedings of WKDD 2008, IEEE Computer Society, Washington, DC, pp. 532-537.
[13] Kruegel, C., Valeur, F., Vigna, G., and Kemmerer, R., "Stateful Intrusion Detection for High-Speed Networks," in Proceedings of the 2002 IEEE Symposium on Security and Privacy, IEEE Computer Society, p. 285.
[14] Zaidi, A., Kenaza, T., and Agoulmine, N., "IDS Adaptation for an Efficient Detection in High-Speed Networks," in Proceedings of the Fifth International Conference on Internet Monitoring and Protection, 2010, pp. 11-15.
[15] Schuff, D. L., Choe, Y. R., and Pai, V. S., "Conservative vs. optimistic parallelization of stateful network intrusion detection," in Proceedings of PPoPP '07, ACM, New York, NY, 2007, pp. 138-139.
[16] Vasiliadis, G., Antonatos, S., Polychronakis, M., Markatos, E. P., and Ioannidis, S., "Gnort: High Performance Network Intrusion Detection Using Graphics Processors," in Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection, 2008, pp. 116-134.