A Study of the Impact of Compression and Binary Encoding on SOAP Performance*

Alex Ng¹, Paul Greenfield², Shiping Chen²

¹ Department of Computing, Macquarie University, North Ryde, NSW 2109, Australia
² CSIRO ICT Centre, PO Box 17, North Ryde, NSW 1670, Australia

* This work was supported by the Australian Government through the Advanced Networks Program of the Department of Communications, Information Technology and the Arts as part of the CeNTIE project.

Abstract

SOAP Web Services have been very successful as a set of protocols for platform-independent service invocation. However, the use of text-based XML encoding increases processing overhead, storage requirements and bandwidth consumption compared to traditional binary protocols. This problem is made worse by the use of Base64 encoding for binary data. The inefficiencies that come from the use of text-based XML have led to a number of proposals aimed at improving performance through binary encoding or compression. This paper presents the results of studies into the performance impact of current and proposed mechanisms used to encode binary data, and into the effectiveness of compression, for typical SOAP Web Services.

1. Introduction

The W3C Web Services Architecture Requirements draft [1] states that the top-level goals of the Web Services Architecture are "Interoperability, Reliability, Integration with the World Wide Web, Security, Scalability and Extensibility, and Management and Provisioning". XML Web Services meet most of these requirements, but the verbosity inherent in XML encoding may cause performance problems in some classes of applications and in resource-constrained environments.

The performance characteristics of SOAP have been reported in earlier studies [4,7,22], and a number of performance enhancement mechanisms have been proposed [4,10,26,27,29] to fix perceived problems in the current standards. The majority of these proposals are incompatible with current text-based XML implementations and require specially written protocol handlers or agents at both the sender and recipient ends.

One approach commonly suggested for improving the performance of XML is to use a binary rather than a text-based encoding format [29]. These binary formats are not without their own problems, and Pal et al. [25] argue against standardising a binary representation of XML, raising the issues of multiple binary representations (big- or little-endian) and interoperability with existing standards such as the Infoset [6]. Girardot [9] argues that efficient compression could solve the verbosity and bandwidth issues without needing to move away from text-based XML. A number of XML compression techniques are available [21], such as gZip, XMill [16] and Millau [9]. Although these techniques all produce good compression ratios, studies of their effectiveness as mechanisms for improving SOAP performance are not readily available.

Given our previous studies into SOAP performance [22], we wanted to examine both the likely effectiveness of these binary encoding mechanisms and how well they compare with the simple alternative of compressing text-based XML messages. Binary data, such as encrypted text and binary attachments, is becoming more important as the XML Web Services stack becomes richer, so we were also interested in examining whether the binary XML encoding proposals offered better performance than alternatives such as the recent MTOM standard [10]. Our studies focused on typical commercial uses of SOAP and Web Services, rather than on the use of SOAP for high-performance scientific computing and bulk data transfers.

The studies reported in this paper address the following questions:

(1) How effective is compression as a way of improving the performance of typical commercial Web Services applications? Our studies have shown that good implementations of SOAP are not unduly processor intensive but could well be limited by network bandwidth. This suggests that trading some processor time for reduced bandwidth requirements may be useful in some circumstances. Since there are contradictory claims about the benefits of XML compression [27], we wanted to know what level of performance improvement could be obtained by using a simple compression mechanism such as gZip.

(2) How effective are the proposals for binary encoding of XML likely to be in improving the performance or cost-effectiveness of typical commercial Web Services applications? Do the proposals for binary-encoded XML offer significantly better performance in bandwidth-constrained situations?

The rest of this paper is organised as follows: section 2 briefly discusses the binary encoding and compression mechanisms available or proposed for the SOAP protocol. Section 3 explains how we conducted the tests, including the decisions we made and the metrics used. Section 4 presents our analysis of the data collected in our measurements. Section 5 discusses related work, and our conclusions are presented in section 6.

2. Background

SOAP is often perceived as inefficient when compared to long-established binary protocols such as IIOP and DCE RPC. This perceived inefficiency is attributed to its use of text-based XML encoding and has led to proposals using binary encodings such as Fast Web Services [26]. The supposed inefficiency of XML encoding comes from two sources: the verbosity of XML, and the increase in the size of binary data that results from the use of Base64 encoding. The verbosity of XML comes from its use of text-based tags and 'readable' keywords, and is made worse by repetitive structures such as arrays and lists. The binary data carried within SOAP messages can be broadly classified into two classes: (1) inherently binary data, such as JPEG images; and (2) cryptographic data, such as signatures, certificates and encrypted content. This binary data is usually transferred using the Base64 encoding convention [14]. This convention maps binary data to 7-bit US-ASCII strings, turning each 24-bit group (3 octets) of input data into 4 output characters. This increases the size of the encoded binary data by 33% and results in greater processor and bandwidth requirements.

The use of the WS-Security standard [20] to secure Web Services results in the inclusion of a number of binary elements in SOAP messages. This standard uses a combination of XML Digital Signatures, XML Encryption and various security tokens to support authentication, message integrity and confidentiality. Many of these are based on cryptography and use binary data, such as cryptographic hashes, keys and X.509 certificates [12], that has to be carried inside SOAP messages. Figure 1 shows a SOAP message containing WS-Security header elements. The elements holding the binary security token (X.509 certificate), the digest value and the signature value contain binary data that is usually encoded using Base64.

Figure 1 – Example of a SOAP message containing WS-Security information in the SOAP Header and Body (figure not reproduced; the surviving fragment shows a header element with soapenv:actor="myActorURI" and soapenv:mustUnderstand="1" followed by Base64-encoded certificate, digest and signature values)

The question of how best to send binary data efficiently using the SOAP protocol has been debated in the Web Services community for some time. A number of standards for attaching binary data to SOAP messages have been proposed over the last few years. One of the earliest of these was SOAP Messages with Attachments (SwA) [24], which used the multipart/related MIME type to bind attachments to a SOAP envelope. SwA had efficiency problems that came from MIME's use of text strings to define the boundaries between message parts and, more importantly, MIME messages could not be modelled as an XML Infoset. The first of these problems was addressed by DIME [23], which used explicit lengths rather than boundary strings to delimit message parts. Incompatibility with the XML Infoset became more important with the advent of new standards such as WS-Security, and so DIME was supplanted by PASwA (Proposed Infoset Addendum to SwA), which solved the XML Infoset compatibility problem and used standard MIME encoding. PASwA was picked up by the W3C XMLP working group and heavily influenced the current MTOM standard.

The W3C MTOM (Message Transmission Optimization Mechanism) standard [10] selectively encodes portions of a SOAP message using XML-binary Optimized Packaging (XOP) [11] to serialise XML Infosets containing binary data efficiently. MTOM streams binary data as MIME message parts. At the Infoset level, before final message serialisation, the binary data can be accessed as canonical Base64-encoded text, so that other WS-* mechanisms, such as WS-Security signatures, continue to work properly. This allows seamless integration of MTOM with other WS-* protocols while still reducing the bandwidth required to send binary data over the wire.

W3C has also formed the Binary Characterization Working Group [29] to consider ways of improving the efficiency of XML through the use of binary encoding. Sun Microsystems has been working on the Fast Infoset Project [27], which uses ASN.1 and the Packed Encoding Rules (PER) [13] to produce binary-encoded XML. They have reported initial results showing that Fast Infoset applications perform two to three times faster than ones built using JAX-RPC. The primary concerns of the Web Services community about these binary XML proposals are interoperability, compatibility with other WS-* standards, and whether the problems being solved by binary encoding actually exist at all.

The bandwidth required for transferring XML messages can also be reduced by compression. Promising XML compression techniques include gZip, XMill, XGrind, Xpress and XComp. Nair [21] reports that XMill is the best performer for messages over 20KB, XComp is a better choice for messages smaller than 4KB, and gZip is best for messages between 4KB and 20KB in length. The studies reported in this paper used gZip to evaluate the effectiveness of compression because it was widely available, provided better compression rates for the message sizes being tested [21] and required no prior knowledge of the document structure.
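The two sources of overhead described above (Base64 expansion of binary data, and verbose, repetitive markup) can be illustrated with Python's standard `base64` and `gzip` modules. This is only a sketch under stated assumptions: the payloads are hypothetical (pseudo-random bytes stand in for JPEG or cryptographic data, and a repeated element stands in for an array-heavy message body) and are not the test messages used in this study.

```python
import base64
import gzip
import random


def base64_expansion(raw: bytes) -> float:
    """Ratio of Base64 output length to input length (4 characters per 3-octet group)."""
    return len(base64.b64encode(raw)) / len(raw)


def gzip_size(data: bytes) -> int:
    """Size in bytes of data after gzip compression at the default level."""
    return len(gzip.compress(data))


# Hypothetical payloads, not the paper's test messages.
# Pseudo-random bytes behave like already-compressed or cryptographic data.
binary_blob = random.Random(42).randbytes(3000)
# A repeated element mimics the repetitive tags of an array-heavy SOAP body.
repetitive_xml = (
    b"<customers>"
    + b"<customer><name>Smith</name><active>true</active></customer>" * 50
    + b"</customers>"
)

if __name__ == "__main__":
    encoded_blob = base64.b64encode(binary_blob)
    print(f"Base64 expansion: {base64_expansion(binary_blob):.3f}")  # 1.333
    print(f"gzip of raw binary: {gzip_size(binary_blob)} of {len(binary_blob)} bytes")
    print(f"gzip of Base64 text: {gzip_size(encoded_blob)} of {len(encoded_blob)} bytes")
    print(f"gzip of repetitive XML: {gzip_size(repetitive_xml)} of {len(repetitive_xml)} bytes")
```

On inputs like these, gzip cannot shrink the random binary itself, but it recovers most of the Base64 overhead, because Base64 text carries only 6 bits of information per 8-bit character; the repetitive markup shrinks far more.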

3. Experimental Platform and Design

The tests reported in this paper looked at the likely performance impact of compression and binary encoding for typical commercial uses of SOAP and Web Services. Mimoso [18] reports that most current Web Services projects are focused on integrating internal systems, and Kim and Rosu's [15] survey shows that about 92% of SOAP messages are smaller than 2KB. Our test scenario reflects these findings and models clients and servers exchanging messages about customers and products. We use a range of messages of varying complexity and size, and simply send requests from a client and wait for the server's response so that we can measure response times (latency). We also measure processor utilisation and bandwidth requirements. The test client and server were built using Microsoft's ASP.NET 1.1 SOAP product with the default Document/Literal encoding style. This combination had been shown to be the fastest implementation in our previous tests [22].

3.1 Overall Design

The tests reported here were based on a simple client-server application with effectively no application logic on either side. Tests in which signatures or encrypted data were sent did no security processing, such as decryption or signature checking, as we were only interested in the costs of encoding and transferring SOAP messages. The test setup consisted of the following:

1. A multi-threaded Web Service test driver written in Microsoft's C# language using the ASP.NET API. This client just constructs a SOAP request, sends it to the server and waits for the response.
2. A target Web Service, also written using C# and ASP.NET, that receives the test message, accesses its content and returns the original message with its direction field changed to "RESPONSE".

A configuration file was used to control the setup of the test environment, allowing us to specify parameters such as the number of client threads to be used, the duration of the test run, and the messages and encoding methods to be used.

Two identical Dell computers were used as the client and server systems. The hardware and software configuration of these systems was:
• 3GHz Intel Pentium 4 processor
• 1024 Mbytes of memory
• Intel PRO/1000MT network card
• Microsoft Windows Server 2003 Standard Edition
• Microsoft .NET Framework 1.1

Microsoft Visual Studio .NET 2003 and C# were used to develop the test driver and the Web Services used for all tests.

All the tests were run over the high-performance CeNTIE wide-area network [3] (1 Gigabit/second core network with 100Mbps access at the edges), with the client system located in Sydney and the server 300km away in Canberra. We have used this network configuration in previous tests. The delays caused by traversing a wide-area network give us a better picture of how the alternatives under test would perform in future large-scale application integration scenarios. Using this network configuration also lets us use our existing SOAP performance model to predict the likely performance of these technologies in low-bandwidth situations.

3.2 Test Scenario and Messages

We used twelve test messages in this study, allowing us to see how binary encoding and XML compression performed with messages of varying length and complexity. The first set of messages represents conventional text-based XML requests and responses. The first of these messages (simple) contains a single customer's account record and uses string, Boolean and datetime data items. The second message (medium) consists of twenty customer account records, representing a batch inquiry and subsequent update transaction. The third message (complex) consists of one customer record and 50 product details, representing an invoice or a customer statement.

The next set of messages (simpleJPG, mediumJPG and complexJPG) added a 14 Kbyte binary JPEG image, perhaps a photograph of a customer or product, to each of the first three messages. The third set of messages (simpleSigned, mediumSigned and complexSigned) were the same as the first set, but the message content was signed and an X.509 certificate was included in the message header. These tests let us look at how efficiently the various alternatives handled the small binary header elements used by WS-Security. The last set of messages (simpleEncrypt, mediumEncrypt and complexEncrypt) were encrypted forms of the first set of messages. These messages included X.509 certificates and other binary data in the header, as well as the binary encrypted message content.

We set up four groups of tests to evaluate the impact of compression and binary encoding on resource utilisation and performance. The first set of tests simply sent and received all twelve test messages as conventional text-based SOAP messages, with the binary content sent as Base64-encoded strings. These tests established baseline numbers for performance and for processor and bandwidth requirements.

The second set of tests looked at the effect of compression on SOAP performance and resource requirements. We did this by modifying our test configuration to call the SharpZipLib.dll gZip library [30] as part of the Web Services stack on both the client and server machines, and then repeating the tests run in the first scenario.

The third set of tests sent and received the same messages using Microsoft .NET Remoting with binary formatting, running over an HTTP transport. .NET Remoting supports both XML encoding (using the SOAP RPC/encoded style) and a binary equivalent. This flexibility gives us a commercial implementation of something akin to binary-encoded SOAP and showed us what could be expected from such a protocol.

The fourth set of tests looked at the likely effectiveness of MTOM for transferring binary data over SOAP. As we had no implementation of MTOM available to us, all we could do for these tests was translate our binary messages into the MTOM format by hand and calculate the resulting bandwidth requirements.

Overall, these tests let us evaluate the effectiveness and costs of the various proposals for transferring binary data over SOAP, and the relative effectiveness of compression as a way of reducing bandwidth requirements.

3.3 Performance Metrics and Measurement

The following performance measurements were taken during our tests and used to evaluate performance, resource usage and scalability.

(1) Latency. We measured the round-trip time taken to send a single message and receive a response, from the test driver to the server and back to the waiting test driver.

(2) Processor and Bandwidth Utilisation. We also measured processor utilisation on both the client and server machines and observed the bandwidth required to send and receive each test message.

4. Performance Analysis

4.1 Message Size Analysis

A shareware TCP traffic monitor program (tcptrace) [8] was used to measure the size of the messages passed from the client to the server. Table 1 shows the size of the messages seen on the wire and the size reduction achieved by each of the alternatives under test. As we did not have an implementation of MTOM available, we calculated the message sizes for MTOM by replacing the Base64-encoded parts of the messages with their binary equivalents and adding the required overheads for MIME and MTOM's XOP packaging, such as <xop:Include> elements. The table shows how effective binary encoding and compression are as ways of reducing message sizes, and so the bandwidth required for these messages.

Message            SOAP    SOAP+gZip      Remoting Binary   MTOM
Simple Base         874      958 (-10%)     720 (18%)       n/a
Simple +JPEG      21210    12395 (42%)    15647 (26%)     16608 (22%)
Simple +Signed     2424     1776 (27%)     2510 (-4%)      3130 (-29%)
Simple +Encrypt    3271     2072 (37%)     3457 (-6%)      3952 (-21%)
Medium Base        7599     2278 (70%)     3476 (54%)       n/a
Medium +JPEG      27935    13860 (50%)    18403 (34%)     23321 (17%)
Medium +Signed     9149     3316 (64%)     5275 (42%)     10012 (-9%)
Medium +Encrypt   12455    10068 (19%)    10168 (18%)     10666 (14%)
Complex Base      21160     2322 (89%)     6153 (71%)       n/a
Complex +JPEG     41498    14000 (66%)    21083 (49%)     36897 (11%)
Complex +Signed   22712     3240 (86%)     7955 (65%)     23508 (-4%)
Complex +Encrypt  31219    28004 (10%)    23881 (24%)     24380 (22%)

Sizes are in bytes; percentages are size reductions relative to the text-based SOAP message (negative values are increases).

Table 1 – Observed message sizes
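The percentage columns in Table 1 are consistent with a size reduction computed relative to the text-based SOAP message and rounded to a whole percent. Assuming that definition (the paper does not state it explicitly), the short Python sketch below reproduces the simple-message rows from the raw byte counts; the names `SIMPLE_SIZES` and `reduction_pct` are illustrative only.

```python
from typing import Dict, Optional

# Observed sizes in bytes for the simple message set, copied from Table 1.
# None marks the base message, which carries no binary data for MTOM to optimise.
SIMPLE_SIZES: Dict[str, Dict[str, Optional[int]]] = {
    "Base": {"SOAP": 874, "SOAP+gZip": 958, "Remoting": 720, "MTOM": None},
    "+JPEG": {"SOAP": 21210, "SOAP+gZip": 12395, "Remoting": 15647, "MTOM": 16608},
    "+Signed": {"SOAP": 2424, "SOAP+gZip": 1776, "Remoting": 2510, "MTOM": 3130},
    "+Encrypt": {"SOAP": 3271, "SOAP+gZip": 2072, "Remoting": 3457, "MTOM": 3952},
}


def reduction_pct(soap_bytes: int, alt_bytes: int) -> int:
    """Size reduction relative to text-based SOAP, rounded to a whole percent.

    A negative result means the alternative produced a larger message than plain SOAP.
    """
    return round(100 * (1 - alt_bytes / soap_bytes))


if __name__ == "__main__":
    for variant, sizes in SIMPLE_SIZES.items():
        soap = sizes["SOAP"]
        cells = []
        for encoding in ("SOAP+gZip", "Remoting", "MTOM"):
            size = sizes[encoding]
            cells.append("n/a" if size is None else f"{reduction_pct(soap, size)}%")
        print(f"{variant:9} SOAP={soap:6}  " + "  ".join(cells))
```

The negative values fall out naturally: a gzip stream or an MTOM package that is larger than the original text message shows up as a negative "reduction".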

There are two factors at play here, and their relative effects can clearly be seen in the results. As expected, the effectiveness of binary encoding and compression varies between the test scenarios. The JPEG image is already compressed, so further compression could not be expected to have much effect, and the cryptographic data is largely incompressible, mathematically random data. Compression can, however, be expected to undo the 33% increase in size that came from mapping binary data to Base64. For the simple message, the binary encoding used by .NET Remoting gave the most compact messages, followed by SOAP and then compressed SOAP. The message sizes were approximately the same, ranging from 720 to 958 bytes. The fact that gZip actually increased the size of the SOAP message shows that there is some intrinsic overhead in the gZip format that makes it less useful for small messages, confirming the results published by Nair [21]. For medium messages, the improved efficiency of binary encoding starts to become apparent, with the binary message being only about half the size of the text-based XML original. Compression is also quite effective here, giving a 70% reduction in message size. This trend was repeated for the complex message, with the binary message being only 30% of the size of its text-based equivalent. gZip compression was very effective here as well, reducing the bytes required on the wire from 21160 down to 2322. This test is probably a best case for compression, given that our complex test message contains arrays, which lead to a large number of repeated XML tags that gZip can compress effectively.

The JPEG test messages added 14KB of already-compressed binary data to the messages used in the previous tests. The use of Base64 encoding in the text-based form of these messages increased the size of the image by 33%, to around 20 Kbytes. The binary .NET Remoting protocol carried the image across as binary and so only added about 14KB to each of the messages. Compression was quite effective in reducing the size of the SOAP messages, more than undoing the effects of the Base64 encoding. Our calculations on the effect of MTOM encoding for the JPEG attachment showed that it would also be quite effective in this case, giving results similar to those observed with .NET Remoting.

The WS-Security signed message tests added three smaller binary components to the original messages, as well as another few hundred bytes of XML tags. Some of this added binary data was cryptographic hashes and keys. Overall, the introduction of WS-Security signatures added about 3K bytes to the text-based XML SOAP messages and 2K bytes to the .NET Remoting binary messages. gZip compression was able to compress the text-based XML messages to sizes that were smaller than .NET Remoting, perhaps due to its ability to compress non-cryptographic information inside the X.509 certificates, which would be carried across untouched as binary data by .NET Remoting. Our calculations showed that the MTOM message sizes were all greater than their text-based XML counterparts for the signed messages. Our signed messages have three small binary fields (Digest Value, Digital Signature and Binary Security Token, the X.509 certificate), and the savings that come from not using Base64 encoding are less than the fixed added costs of the MIME headers and XML protocol elements needed by MTOM.

The encrypted message tests add new XML header elements, including some binary-encoded ones, and a binary message body that increases in size as we go from simple to complex. The text-based message carries all of these binary components as Base64 strings. The encrypted message body is effectively a single random binary string that is basically incompressible. All of its internal structures and tags have gone as well, taking away much of the advantage of compression and binary encoding and leaving only the overhead introduced by Base64 encoding to be eliminated. This is reflected in the results, with most alternatives to text-based encoding offering only up to about a 20% reduction in message size, a long way from the 89% reduction achieved by compression for the complex message.

In summary, these tests showed that both compression and binary encoding offered significant reductions in message sizes compared to text-based XML. These reductions were greater for the medium and complex messages, which had larger numbers of repetitive XML tags and structures. Compression, binary encoding and MTOM all handled large binary attachments efficiently, undoing the effect of Base64 encoding. MTOM generally works well for large binary attachments, but the overhead of using MIME and XOP makes it less suitable for small binary inclusions such as cryptographic hashes and X.509 certificates. The encrypted messages were a problem for all the alternatives, as the single large encrypted binary body was largely incompressible and showed no internal structure or tags.
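The MTOM sizes in Table 1 were calculated by hand, by replacing Base64 text with its binary equivalent and adding MIME and XOP overheads. The hypothetical helper below sketches that kind of calculation; the overhead values are caller-supplied guesses for illustration, not the figures the authors used. It shows why a small 20-byte digest grows under MTOM while a 14KB image shrinks.

```python
import base64
from typing import Sequence


def estimate_mtom_size(
    text_size: int,
    base64_parts: Sequence[str],
    per_part_overhead: int,
    fixed_overhead: int = 0,
) -> int:
    """Rough size of an MTOM/XOP message derived from its text-based SOAP form.

    Each Base64 string is swapped for its decoded binary length, and a per-part
    overhead is added for the MIME part headers, the boundary and the <xop:Include>
    element left in the XML. fixed_overhead covers the MIME envelope around the
    root SOAP part. Both overheads are assumptions supplied by the caller.
    """
    size = text_size + fixed_overhead
    for part in base64_parts:
        size += len(base64.b64decode(part)) - len(part) + per_part_overhead
    return size


if __name__ == "__main__":
    digest = base64.b64encode(bytes(20)).decode("ascii")    # 20-byte hash -> 28 chars
    image = base64.b64encode(bytes(14000)).decode("ascii")  # 14KB image -> 18668 chars
    # Hypothetical overhead of 250 bytes per MIME part, chosen for illustration only.
    print(estimate_mtom_size(2000, [digest], per_part_overhead=250))  # 2242
    print(estimate_mtom_size(2000 + len(image), [image], per_part_overhead=250))  # 16250
```

With these made-up overheads the digest saves 8 bytes of Base64 padding but pays 250 bytes of packaging, while the image saves 4668 bytes and pays the same 250, which mirrors the pattern reported for the signed and JPEG messages.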

51

4.2 Latency Analysis

The latency tests measured the single-thread round-trip response time for the text-based SOAP, compressed SOAP and binary .NET Remoting tests. Figure 2 shows the average latency recorded for each message type in each of these tests.

Confirming the results seen in our previous tests [22], .NET Remoting HTTP/Binary proved to be the fastest technology overall, but text-based SOAP was competitive for all but the complex message type. The use of compression increased latencies in all cases, often by a factor of two. These latency results come from tests that were run over a lightly-loaded 1Gbps wide-area network with 100Mbps links from the test computers. This high-bandwidth test environment produces latency figures that are heavily influenced by the processor time needed on the client and server systems, rather than by the time that messages take to traverse the network. Changing to a lower-bandwidth edge network, such as a 1.5Mbps T1 link, would change these latency numbers considerably and significantly improve the competitiveness of bandwidth-conserving technologies such as compression and binary encoding. This issue is discussed later in section 4.4.

These latency tests all ran over HTTP using the Microsoft ASP.NET defaults, including having the client send the HTTP POST request headers and then wait for an HTTP 100 (Continue) response from the server before sending the actual SOAP message itself. This optional behaviour adds one network round trip (about 3-4ms) to the observed latencies.

.NET Remoting was about twice as fast as text-based SOAP for the complex message type. This test had a large number of XML tags in a repetitive array and showed up the performance impact of XML's verbosity. The latency figures obtained for the encrypted message set confirm that the performance of the SOAP implementation depends on the number of XML tags and structures that have to be transferred and parsed. In this case, the tags and XML structure of the message body have been subsumed into the encrypted binary string, and .NET Remoting no longer has a factor-of-two advantage over text-based SOAP, even though the SOAP message is about 20% larger because of the overhead of Base64 encoding.

Figure 2 – Results of latency tests (figure not reproduced: average latency in ms, on an axis from 0 to 90, for Remoting, SOAP and SOAPgZip across the Simple, Medium, Complex, 509Simple, 509Medium, 509Complex (signed), JpegSimple, JpegMedium, JpegComplex, EncSimple, EncMedium and EncComplex messages)
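Latency in these tests is the average of single-threaded, blocking round trips. A minimal Python timing loop for that kind of measurement is sketched below. It is not the authors' C# test driver: `call` is an assumed stand-in for one request/response exchange (for example, a hypothetical SOAP client stub), and discarding a few warm-up calls is our assumption, since the paper does not say whether any were excluded.

```python
import time
from statistics import mean
from typing import Callable


def average_latency_ms(
    call: Callable[[], object], iterations: int = 100, warmup: int = 5
) -> float:
    """Mean single-threaded round-trip time of `call`, in milliseconds.

    `call` must block until the response has been received, so each sample
    covers one full request/response exchange. Warm-up calls are not timed.
    """
    for _ in range(warmup):
        call()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


if __name__ == "__main__":
    # Stand-in for a remote call: sleeps for about 2 ms instead of sending a message.
    print(f"{average_latency_ms(lambda: time.sleep(0.002), iterations=20):.2f} ms")
```

Using `time.perf_counter` rather than wall-clock time keeps the samples immune to system clock adjustments during a long test run.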

In summary, these latency results show that the performance of text-based SOAP is actually comparable to binary-encoded alternatives, such as .NET Remoting, for messages that do not have complex structures and large numbers of XML tags. The overhead of Base64 encoding made little difference to the observed latencies. These results only reflect the performance observed on a lightly-loaded high-bandwidth network; similar tests run over a lower-bandwidth or busier network would show increased latencies for the larger text-based SOAP messages, especially those with Base64-encoded binary elements.

4.3 Processor Utilisation

We measured the processor utilisation on both the client and server systems while running these tests. The average processor utilisation figures observed ranged from about 10% to 20% for text-based SOAP and .NET Remoting, and up to about 30% for compressed SOAP. These results were all obtained using non-Hyperthreaded Intel P4 3.0GHz processors and a single-threaded driver application. These processor utilisation numbers can be converted into the processor time required per message by the following formula:

Proc time per call = proc utilisation / calls per sec

where processor utilisation is expressed as a fraction of one processor (for example, a utilisation of 0.1 at 100 calls per second corresponds to 1ms of processor time per call). This gives the following average processor time needed per call:

                   Simple   Medium   Complex
SOAP client        1.4ms    3.2ms    6.8ms
SOAP server        1.3ms    2.9ms    6.1ms
Comp. client       3.3ms    8.5ms    19.2ms
Comp. server       3.3ms    8.6ms    18.5ms
Remoting client    0.4ms    1.3ms    2.1ms
Remoting server    0.7ms    1.8ms    3.2ms

Table 2 – Processor time per request

These results show that SOAP is not a processor-intensive protocol, at least not for the Microsoft implementation we used in these tests (we have found that other implementations use considerably more processor resources, up to an order of magnitude more in some cases). These results were obtained using a fast wide-area network, and processor utilisation would decrease considerably if the tests were repeated over a slower network. This low processor utilisation suggests that compression might be a very good technique for improving the performance of SOAP over lower-bandwidth networks.

4.4 Performance Limits and Compression

Our discussion so far has concentrated on resource utilisation and performance (as measured by latency) on a high-bandwidth network. Our results show that on this network text-based SOAP offers a level of performance comparable to competing binary protocols, and that processor usage is quite low for the best implementations. Although this may change in the future, few users today have Gigabit networks available to run their Web Services applications, and this section looks at the likely performance of SOAP and its variants on more commonly available networks.

Our analysis of the components making up the observed SOAP latency for the medium messages is shown in Figure 3. This chart shows that on our 1Gbps network, processor time and network delays contribute about equally to latency in this case.

Figure 3 – Components of observed latency (figure not reproduced: breakdown of the medium-message latency into Client proc, Server proc, Light Speed and Network components)

Compression allows us to optimise overall performance and/or cost by trading off processor time against network bandwidth. Smaller messages traverse a network more quickly, as fewer bits need to be clocked in and out of network interfaces and switches along the path, so reducing latency. We can reduce the size of a message by compressing it, but this requires additional processor time that adds to latency, possibly negating the savings that came from using smaller messages in the first place.
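The trade-off can be made concrete with a deliberately naive model. This is our simplification, not the predictive model used for Table 3: each direction costs processor time plus message bits divided by the edge bandwidth, fixed propagation delay is identical for both variants and so cancels out, and the response is assumed to be the same size as the request (the target service echoes the message). Compression then pays off below the bandwidth at which the bits it saves take as long to transmit as the extra processor time it costs. The function name `break_even_bps` is illustrative.

```python
def break_even_bps(
    soap_bytes: int,
    gzip_bytes: int,
    soap_proc_ms: float,
    gzip_proc_ms: float,
    directions: int = 2,
) -> float:
    """Edge bandwidth (bits/s) below which compressed SOAP should beat plain SOAP.

    Naive model: latency = processor time + message bits / bandwidth per direction.
    directions=2 assumes the response is the same size as the request.
    """
    bits_saved = directions * (soap_bytes - gzip_bytes) * 8
    extra_proc_s = (gzip_proc_ms - soap_proc_ms) / 1000.0
    if bits_saved <= 0:
        return 0.0  # compression never shortens transit time
    if extra_proc_s <= 0:
        return float("inf")  # compression costs no extra processor time
    return bits_saved / extra_proc_s


if __name__ == "__main__":
    # Message sizes from Table 1; processor times (client + server) from Table 2.
    for name, soap, gz, soap_ms, gz_ms in [
        ("simple", 874, 958, 1.4 + 1.3, 3.3 + 3.3),
        ("medium", 7599, 2278, 3.2 + 2.9, 8.5 + 8.6),
        ("complex", 21160, 2322, 6.8 + 6.1, 19.2 + 18.5),
    ]:
        print(f"{name}: {break_even_bps(soap, gz, soap_ms, gz_ms) / 1e6:.1f} Mbit/s")
```

With the measured figures this gives no crossover for the simple message, about 7.7 Mbit/s for the medium message and about 12.2 Mbit/s for the complex one. That is the same order of magnitude as the crossovers of about 10 and 15 Mbps discussed below, but because it ignores TCP and HTTP behaviour it should not be expected to reproduce Table 3.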

Table 3 shows the performance (latency) predicted by this model for our base message types. These calculations show the performance trade-offs between processor time and network bandwidth quite clearly. In this environment, compression is not worthwhile just on performance grounds for the simple messages as the compressed messages are not smaller than the original messages (see section 4.1), so there is no reduction is network transit time, and the additional processor time needed for compression just adds to the latency. This situation changes for the medium and complex messages which compress quite well. In these cases, compressed SOAP is actually faster than uncompressed SOAP for bandwidths up to about 10Mbps for medium messages and 15Mbps for our complex messages (indicated by shaded cells in the table).

In the tests reported here, we were running on a lightly-loaded high-bandwidth network, so reducing message size had a small positive impact on latency that was overwhelmed by the negative impact of the additional processor time needed to compress the message. Our observations were that using gZip compression slowed down the overall performance in all test messages by at least 50%, indicating that compression was not effective in this environment for any of our test message types. Rather than repeating our tests with different network speeds, we built a predictive performance model based on our analysis of the factors contributing to the observed latency numbers. This model is based on a wide area network with a fast (multi-Gigabit) core and slower access links to the client and server systems at the edges. The model lets us vary the edge network bandwidth and predict the latency we would see for each of the protocols. Simple Bandwidth

Simple messages, predicted latency (ms)
Bandwidth      SOAP   gZip   Remoting
100Mbit/Sec       7     15          6
10Mbit/Sec       13     17          7
1Mbit/Sec        31     37         20
500Kbit/Sec      51     58         34

Medium messages, predicted latency (ms)
Bandwidth      SOAP   gZip   Remoting
100Mbit/Sec      16     26          8
10Mbit/Sec       28     30         13
1Mbit/Sec       147     72         67
500Kbit/Sec     279    118        127

Complex messages, predicted latency (ms)
Bandwidth      SOAP   gZip   Remoting
100Mbit/Sec      29     46         11
10Mbit/Sec       62     55         20
1Mbit/Sec       386    109        115
500Kbit/Sec     745    169        220

Table 3 - Calculated latencies for different bandwidths

Improved performance for large messages may not, of course, be the primary reason for using compression. Being able to push more SOAP traffic through existing network links, or reducing the cost of the bandwidth required to support a given workload, may well be more important in many cases. From these points of view, compression appears to be a useful technique for all but the least compressible messages: it reduces the required bandwidth without significantly degrading response times, and even offers bandwidth reductions over binary protocols.

5. Related Work

Earlier authors have examined the effectiveness of XML compression in a number of different scenarios. Cokus and Winkowski [5] reported on the use of compression for wireless applications using WBXML, MPEG-7 and ASN.1 PER. Cai et al. [2] analysed the performance of Zip/Gzip and XMill on XML messages, and of Zip on binary objects, using the TPC-H benchmark. They concluded that Zip achieved higher compression factors than XMill for smaller messages, while XMill was more effective for large messages.

Nair's [21] survey of five XML compression techniques (gZip, XMill, XGrind, Xpress and XComp) reports the time each technique takes to compress and the resulting compression ratios. Our study differs from these in that we report on the overall performance of SOAP, and on the effectiveness of compression and binary encoding as ways of improving performance or reducing bandwidth requirements.
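The surveys above report two quantities, compression ratio and compression time. The Python sketch below illustrates how both might be measured for a SOAP-style message using the standard library's `gzip` module. It also shows the 4/3 size growth that Base64 encoding [14] adds to binary content. The envelope layout, element names and item counts are hypothetical, and this is not the test harness used in this study.

```python
import base64
import gzip
import os
import time


def build_envelope(items: int, payload: bytes) -> str:
    """Hypothetical SOAP 1.1-style envelope: a repetitive body plus one Base64 element."""
    rows = "".join(
        f"<item><id>{i}</id><name>widget-{i}</name><qty>{i % 7}</qty></item>"
        for i in range(items)
    )
    blob = base64.b64encode(payload).decode("ascii")
    return (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><order>{rows}<data>{blob}</data></order></soap:Body>"
        "</soap:Envelope>"
    )


def measure(xml: str) -> tuple[int, int, float]:
    """Return (original bytes, gzip bytes, compression time in ms)."""
    raw = xml.encode("utf-8")
    start = time.perf_counter()
    packed = gzip.compress(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return len(raw), len(packed), elapsed_ms


if __name__ == "__main__":
    binary = os.urandom(3000)                  # random, effectively incompressible bytes
    print(len(base64.b64encode(binary)))       # Base64 turns 3 bytes into 4 chars: 4000
    for items in (0, 50, 500):
        size, packed, ms = measure(build_envelope(items, binary))
        print(f"{items:4d} items: {size} -> {packed} bytes ({packed / size:.0%}), {ms:.2f} ms")
```

As the number of repeated elements grows, the overall ratio should improve. The random Base64 portion can at best shrink back towards the size of the raw binary. Timings depend on the machine and on the compression level (`gzip.compress` defaults to level 9).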

6. Conclusion

The studies reported in this paper have looked at the performance of SOAP for a number of typical messages, including ones containing small and large binary elements. We studied the bandwidth requirements and performance of normal text-based SOAP, SOAP compressed using gZip, and a fast binary RPC protocol (Microsoft's .NET Remoting). We also calculated by hand the message sizes that would result from using the MTOM standard to encode the binary elements of these messages.

Looking at message size, and so indirectly at bandwidth requirements, we found that both compression and binary encoding were quite effective at reducing message size, especially when the SOAP message had many tags and a complex or repetitive structure. Both techniques were also quite effective at undoing the growth in message size that comes from using Base64 to encode binary content in normal SOAP. MTOM appears to be effective for large binary attachments. However, the overhead of adding MIME elements to the SOAP message means that it is not effective for messages with a number of smaller binary components, such as those introduced when WS-Security signatures are used.

Consistent with our previous studies, we found that SOAP is not a particularly slow protocol, at least in good implementations. Tests run on our high-bandwidth network showed that SOAP offered latencies competitive with a fast binary protocol in most cases, becoming noticeably slower only for large messages containing many XML tags. The added processor load of compression meant that using gZip in the Web Services stack degraded response times in all cases on our very fast network.

Scalability and reduced costs will often be more important than improving already adequate response times. Compression lets us trade increased processor utilisation for reductions in message size and required bandwidth. Our tests also showed that good SOAP implementations used little processor time to handle messages and parse their XML, possibly leaving enough processor capacity to perform compression at no increased cost. Compression appears to be a useful technique for all but the least compressible messages and the fastest networks. It reduces the required bandwidth without significantly degrading response times, and even offers bandwidth reductions over binary protocols.

Our next step in this work is to refine our models of SOAP performance and use these insights to find ways of improving the performance, scalability and effectiveness of SOAP.

References
[1] Austin, D., Barbir, A., Ferris, C., et al. Web Services Architecture Requirements, W3C Working Group Note, 11 February 2004
[2] Cai, M., Ghandeharizadeh, S., Schmidt, R., et al. A Comparison of Alternative Encoding Mechanisms for Web Services. In Proceedings of DEXA 2002, 2002
[3] CeNTIE Home Page, www.centie.org
[4] Chiu, K., Govindaraju, M., and Bramley, R. Investigating the Limits of SOAP Performance for Scientific Computing. In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002). Edinburgh, Scotland: IEEE, p. 246-254, 2002
[5] Cokus, M. and Winkowski, D. XML Sizing and Compression Study For Military Wireless Data. In Proceedings of XML 2002, 2002
[6] Cowan, J. and Tobin, R. XML Information Set (Second Edition), 4 February 2004 (on-line) Accessed 11 July 2004 http://www.w3.org/TR/xml-infoset/
[7] Davis, D. and Parashar, M. Latency Performance of SOAP Implementations. In Proceedings of IEEE Cluster Computing and the GRID 2002 (CCGRID'02). Berlin, Germany: IEEE, 2002
[8] Fell, S. tcptrace, PocketSOAP (on-line) http://www.pocketsoap.com/tcptrace/
[9] Girardot, M. and Sundaresan, N. Millau: an encoding format for efficient representation and exchange of XML over the Web, 15 February 2001 (on-line) Accessed 11 September 2003 http://www9.org/w9cdrom/154/154.html
[10] Gudgin, M., Mendelsohn, N., Nottingham, M., et al. SOAP Message Transmission Optimization Mechanism, W3C Proposed Recommendation, 16 November 2004 (on-line) Accessed 19 January 2005 http://www.w3.org/TR/soap12-mtom/
[11] Gudgin, M., Mendelsohn, N., Nottingham, M., et al. XML-binary Optimized Packaging, W3C Recommendation, 25 January 2005 (on-line) Accessed 15 February 2005 http://www.w3.org/TR/xop10/
[12] ITU-T. Recommendation X.509, Information technology - Open Systems Interconnection - The Directory - Authentication Framework, ITU, 1988
[13] ITU-T. Information technology - ASN.1 encoding rules: Specification of Packed Encoding Rules (PER), ITU-T Recommendation X.691, ITU-T, 14 July 2002

[14] Josefsson, S. RFC 3548 - The Base16, Base32, and Base64 Data Encodings, The Internet Society, July 2003
[15] Kim, S.M. and Rosu, M.C. A Survey of Public Web Services. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, p. 312-313, 2004
[16] Liefke, H. and Suciu, D. XMill: An Efficient Compressor for XML Data. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Dallas, USA, June 2000
[17] Ogbuji, U. Tip: Compress XML files for efficient transmission, 9 April 2004 (on-line) Accessed 1 November 2004 http://www-106.ibm.com/developerworks/xml/library/x-tipcomp.html
[18] Mimoso, M.S. Web services no longer 'emerging technology', 14 October 2004 (on-line) Accessed 15 October 2004 http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci1015867,00.html
[19] Mitra, N. SOAP Version 1.2 Part 0: Primer, W3C Recommendation, 24 June 2003 http://www.w3.org/TR/soap12-part0/
[20] Nadalin, A., Kaler, C., Hallam-Baker, P., et al. Web Services Security: SOAP Message Security 1.0 (WS-Security 2004), OASIS, March 2004
[21] Nair, S.S. XML Compression Techniques: A Survey (on-line) Accessed 23 September 2004 http://www.cs.uiowa.edu/~rlawrenc/research/Students/SN_04_XMLCompress.pdf
[22] Ng, A., Chen, S., and Greenfield, P. An Evaluation of Contemporary Commercial SOAP Implementations. In Proceedings of the 5th Australasian Workshop on Software and System Architectures (AWSA 2004). Melbourne, Vic.: Swinburne University of Technology, p. 64-71, 2004
[23] Nielsen, H.F., Sanders, H., Christensen, E., et al. Direct Internet Message Encapsulation (DIME), Internet Draft, Microsoft, 17 June 2002
[24] Nielsen, H.F. and Ruellan, H. SOAP 1.2 Attachment Feature, W3C Working Group Note, 8 June 2004 (on-line) Accessed 25 June 2004 http://www.w3.org/TR/soap12-af/#model
[25] Pal, S., Marsh, J., and Layman, A. A Case against Standardizing Binary Representation of XML. In Proceedings of the Workshop on Binary Interchange of XML Information Item Sets, 2003
[26] Sandoz, P., Pericas-Geertsen, S., Kawaguchi, K., et al. Fast Web Services, August 2003 (on-line) Accessed 27 August 2003 http://developer.java.sun.com/developer/technicalArticles/WebServices/fastWS/index.html
[27] Sandoz, P., Triglia, A., and Pericas-Geertsen, S. Fast Infoset, June 2004 (on-line) Accessed 15 June 2004 http://java.sun.com/developer/technicalArticles/xml/fastinfoset/
[28] Schmelzer, R. Will binary XML solve XML performance woes?, 22 November 2004 (on-line) Accessed 24 November 2004 http://searchwebservices.techtarget.com/tip/1,289483,sid26_gci1027726,00.html
[29] XML Binary Characterization Working Group Public Page (on-line) Accessed 22 June 2004 http://www.w3.org/XML/Binary/
[30] The Zip, GZip, BZip2 and Tar Implementation For .NET (web page) http://www.icsharpcode.net/OpenSource/SharpZipLib/Default.aspx
