Agnostic AJAX

Agnostic AJAX: Asynchronous JavaScript and Data

Clinton W. Smullen III, Stephanie A. Smullen
University of Tennessee at Chattanooga, Chattanooga, TN 37403, USA

Introduction

Despite the name AJAX (Asynchronous JavaScript and XML), it is well known that AJAX does not have to use XML as the data format for AJAX updates. This effort studies the use of four different data formats for AJAX updates, along with the use of gzip. The formats tested for AJAX updates are HTML, XML, JSON, and CSV. Comparisons are made based on the data size required to present the same information, the time needed by a browser to convert the data format to HTML for display, and the number of instructions needed to deliver the response to a query to an end user. These results should provide insight into the use of AJAX, and encourage developers to select AJAX update data formats appropriate to their applications, needs, and target clients.

The Application

The application studied in this paper is based on an existing application that supplies real-time class information extracted from a university student information system (SIS). The user specifies one or more selection criteria (such as department, course/section, meeting days, start time/end time, location, instructor) and the application returns a list of courses meeting the specified criteria and additional information about each of the courses (including the title and current enrollment). The application uses a three-tier model; the client communicates with the web server, which communicates with the database server. It is a production application, used daily by students and faculty, not a “test” application. The web server is Apache, and the application uses PHP5 and custom database code to connect with the legacy SIS database. All pages returned are validated XHTML 1.1.

Figure 1. Initial HTML page for the production application.

The initial page loaded by a user (see Figure 1) contains the HTML form used to prepare a query. There is a significant amount of “branding” overhead on this page; all of the University’s pages use the same layout, navigation items, style sheet, and graphics. These common elements consist of two graphical images, a CSS style sheet, and JavaScript supporting the common page navigation links, and total 15,573 bytes. These elements are linked to the HTML page and are static. For most browsers, they are downloaded once and cached, rather than being loaded with each query and response.

A typical user would first load the HTML page containing the query form (27KB) and the common elements (15.2KB). The user prepares a query and submits the query to a server application. The server application queries the SIS. The data extracted from the SIS is formatted as XML. The server process then reads the XML data and applies an XSLT transform to produce XHTML. The web server returns the XHTML as the response to the client. The page returned as a response to a query (see Figure 2) links to the common elements described above, contains the HTML formatted list of courses in answer to the query (or a message if no results are produced), and also contains the HTML form needed to make another query. As a result, even a query that produces no results has a response page of about 27KB (plus the linked common elements).

Figure 2. Results returned by standard production application.

Smullen and Smullen have compared the effects of using AJAX (with XML) to the HTML based application. In [SMU06] the effects on the client were investigated. In [SMU07] the effects of using AJAX on the server and on the request service times were studied. [SMU08] models the process, further analyzes the server effects and the network impact, and estimates the improvement for a typical user. For these studies, a test set of 122 queries was designed. These queries produce a range of responses, with HTML page sizes ranging from 27KB (a query that produced no results) to over 2MB (a query for all courses offered in the fall semester). Data on 13,260 queries have been collected and analyzed, totaling 2.7GB. A subset of the set of 122 test queries was used in this study as well.

Agnostic AJAX

In Figure 2 the results of the query are displayed in a table (bounded by the gold bars). The remainder of the page other than this table is always the same. If the AJAX application implements the same look-and-feel as the HTML application, then the only change is how the response data is retrieved, converted to HTML, and displayed in the table. Hence for this study the other elements (the common elements, graphics, navigation menu, etc.) will be ignored. This study focuses on the retrieval of a response to a query and the display of the results in an HTML table format to the user.

The format of the data sent by the server responding to an XMLHttpRequest does not have to be XML. To study the effects of using different AJAX update data formats, four different formats were selected: HTML, XML, JSON, and CSV. The HTML format data used an HTML table that contained the results for the query; the HTML table was extracted from the production application’s HTML response, with all other HTML eliminated. It is just a table, and will not validate by itself. The XML format data was the XML produced by the production application. JSON refers to the JavaScript Object Notation format (MIME type application/json; see http://json.org and RFC4627). CSV refers to the comma-separated-value format (MIME type text/csv; see RFC4180). Data adapters were written to convert the XML format data produced by the production application to JSON and CSV formats. Unnecessary white space was eliminated from each of the AJAX update formats to better standardize the size comparisons.

Another alternative was also studied: having the server return gzipped data. The GZIP data (MIME type application/x-gzip) was produced by gzipping the other data format files.
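To make the four update formats concrete, the following Python sketch serializes the same small table in each of them and reports the byte counts. It is only an illustration under stated assumptions: the records, the field names, the XML element names, and the array-of-arrays JSON layout are all hypothetical, and the study's own adapters and markup are not reproduced here. Because the bare table below carries none of the production markup, its HTML size relative to XML should not be read as a result of the study.

```python
import csv
import html
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical records and field names, for illustration only.
FIELDS = ["number", "title", "enrolled"]
ROWS = [
    ["CPSC 1100", "Fundamentals of Computer Science", "28"],
    ["CPSC 2100", "Software Design", "31"],
]


def to_html(rows):
    """Bare HTML table with no attributes or surrounding page."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def to_xml(rows):
    """One <course> element per row, one child element per field."""
    root = ET.Element("courses")
    for row in rows:
        course = ET.SubElement(root, "course")
        for name, cell in zip(FIELDS, row):
            ET.SubElement(course, name).text = cell
    return ET.tostring(root, encoding="unicode")


def to_json(rows):
    """Array of arrays; compact separators omit optional white space."""
    return json.dumps(rows, separators=(",", ":"))


def to_csv(rows):
    """One line per row, quoting only where the CSV rules require it."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


payloads = {
    "HTML": to_html(ROWS),
    "XML": to_xml(ROWS),
    "JSON": to_json(ROWS),
    "CSV": to_csv(ROWS),
}
sizes = {name: len(text.encode("utf-8")) for name, text in payloads.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

In this sketch CSV is the smallest and XML larger than JSON, because CSV adds only separators, JSON adds quotes and brackets, and XML repeats an element name twice per field.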

Summary statistics for the response sizes over the set of 122 queries are shown in Table 1. The full HTML size is labelled HTMLf; the linked common element sizes are not included in this value. The label HTMLgz represents the gzipped HTML table data. The other rows contain values for the four AJAX update formats (HTML, XML, JSON, and CSV). Figure 3 displays the response sizes sorted by HTMLf size, and graphically compares the sizes of the responses. The full HTML (HTMLf) and HTML AJAX update format (HTML) track closely, while the other formats, in decreasing size, are XML, JSON, CSV, and HTMLgz.

          Mean      Std Dev   Median   Min      Max
HTMLf     251,398   371,727   81,231   28,945   2,172,516
HTML      198,134   330,197   46,788   528      1,904,511
HTMLgz    12,722    19,210    3,241    224      105,443
XML       103,191   172,509   23,461   214      995,045
JSON      70,444    117,853   15,724   31       679,365
CSV       40,015    66,724    8,735    11       383,826

Table 1. Summary of response sizes in bytes for set of 122 queries.

A smaller response size is generally better. A smaller response lessens the download time and the network impact, and means the client must process fewer bytes. The impact on the server depends on whether the server has the smaller responses stored or must generate them for each request. Generating a smaller response may in fact negatively impact the server performance if the effort required to do so is greater than the effort required to send a larger, easier to obtain response.

Figure 3. Size in KB for six response formats for the 122 queries.

One measure of the effectiveness of a data format is the Byte Transfer Ratio (BTR); see [SMU06]. The BTR = (AJAX size / HTML size) * 100. This ratio expresses the size of the AJAX transfer relative to the HTML application's response to the same query. A Byte Transfer Ratio of 40% means that the size of the AJAX transfer is 40% of the size of the full HTML page displaying the same response (not including the linked common elements). A small BTR is better; it means the AJAX application is performing more efficiently than the HTML application. However, any value less than 100% represents a reduction in bytes transferred when using AJAX. Figure 4 plots the value of BTR (in percentage) versus the size of the full HTML response page in KB. The BTR savings for the various formats are not linear. These curves can be statistically fitted by functions of the form $(1 - e^{-U})$; see [SMU08] for details.

It can be seen from Figure 4 that while small BTR values (hence large percentage savings) can be found for the smaller HTML response sizes, the BTR levels off for larger HTML response sizes. For this set of data, the BTR levels off at approximately 88% for the HTML table format, 46% for XML, 31% for JSON, 18% for CSV, and 5% for HTMLgz format.
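As a worked check of the BTR definition, the short Python sketch below applies it to the Max column of Table 1. It assumes that the maximum in every row comes from the same (largest) query, which the table does not state explicitly; the function name is illustrative.

```python
def byte_transfer_ratio(ajax_bytes: int, html_bytes: int) -> float:
    """BTR = (AJAX size / HTML size) * 100, as defined in the text."""
    return ajax_bytes / html_bytes * 100


# Max column of Table 1, in bytes (assumed to describe one and the same query).
HTMLF_MAX = 2_172_516
AJAX_MAX = {
    "HTML": 1_904_511,
    "XML": 995_045,
    "JSON": 679_365,
    "CSV": 383_826,
    "HTMLgz": 105_443,
}

btr = {name: byte_transfer_ratio(size, HTMLF_MAX) for name, size in AJAX_MAX.items()}
for name, value in btr.items():
    print(f"{name}: BTR = {value:.1f}%")
```

Rounded to whole percentages, these values agree with the levels of about 88%, 46%, 31%, 18%, and 5% quoted for the larger responses.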

Figure 4. Byte Transfer Ratio versus full HTML size in KB for the 122 queries.

Figure 5 plots the size in KB for various AJAX update possibilities against the size in KB of the corresponding full HTML response page. These plots appear to be straight lines; Table 2 contains the regression coefficients supporting this conclusion. One way to interpret these results is as follows: if the size of the full HTML response increases by 1 KB, then the size of the XML update will increase by about 46% of that, JSON by about 32%, CSV by about 18%, and GZIP by about 5%. These results apply across the entire range of query sizes tested.

          Slope   Intercept   R^2
HTMLgz    0.05    -0.21       0.991
CSV       0.179   -4.98       0.9997
JSON      0.317   -9.04       0.9999
XML       0.464   -13.16      1
HTML      0.888   -24.59      1

Table 2. Regression coefficients for Figure 5.

Figure 5. AJAX update size in KB versus full HTML size in KB for 122 queries.
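The linear fits in Table 2 can be used to estimate an AJAX update size from the size of the full HTML page. The Python sketch below does this for the largest of the test queries reported later in Table 3 (389,211 bytes of full HTML). It assumes that the intercepts are expressed in KB and that 1 KB is 1024 bytes; neither is stated with the table, and the function name is illustrative.

```python
# (slope, intercept) pairs from Table 2; intercepts assumed to be in KB.
COEFFICIENTS = {
    "HTMLgz": (0.05, -0.21),
    "CSV": (0.179, -4.98),
    "JSON": (0.317, -9.04),
    "XML": (0.464, -13.16),
    "HTML": (0.888, -24.59),
}


def predict_update_kb(fmt: str, full_html_kb: float) -> float:
    """Estimate the AJAX update size in KB from the full HTML size in KB."""
    slope, intercept = COEFFICIENTS[fmt]
    return slope * full_html_kb + intercept


full_html_kb = 389_211 / 1024  # largest test query, assuming 1 KB = 1024 bytes
measured_kb = {"XML": 167_498 / 1024, "HTML": 320_822 / 1024}

for fmt, actual in measured_kb.items():
    estimate = predict_update_kb(fmt, full_html_kb)
    print(f"{fmt}: predicted {estimate:.1f} KB, measured {actual:.1f} KB")
```

Under these assumptions the XML and HTML estimates fall within about half a KB of the measured sizes, which is consistent with the high R^2 values reported.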

Performance

The results presented so far are based on size. Downloading fewer bytes is clearly an advantage: less work for the server, the network, and the client. However, making a decision solely on size may miss the complexity of the code needed in the client to reconstitute the HTML table from the update data format, and the work needing to be done on each update. To simplify the performance study, we focused on just the process of updating the results section of the page; all other elements were ignored. Since these elements were exactly the same for all of the variants considered, no comparative performance information was lost by doing this.

The data produced by the server application is semantically tabular: a simple, regular structure containing character data. No attributes or metadata were included with the data. Hence the results should be, in some sense, a “best case” comparison for the options considered.

Eight test queries were selected from the set of 122 for performance analysis. These queries were posted against the production system and the complete results (HTML and XML) were stored for each. Storing the results was necessary to ensure uniformity; otherwise the use of live data drawn from the production system could produce varying results for the same query done at different times. The set of eight queries included a query that returned no search results, one that returned five results, and one that returned 380 KB of data. The size of the HTML page returned for each of the eight queries (not including the common elements) is shown in the row labelled HTMLf in Table 3. Note that the first case (HSRV) returns 28.3 KB even though no query results were found.

Data files in the AJAX update formats discussed above were generated for each of the eight test queries. The comparative sizes for these are shown in Table 3. The gz format has a mandatory data header that, for very small files, may actually force the gz file to be larger than the original file; see the HSRV entries for JSON/JSONgz and CSV/CSVgz. Other than these small files, every data format shows a reduction in size from the full HTML (HTMLf) size.
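The effect of the mandatory gzip header on very small files is easy to reproduce. The Python sketch below compresses a tiny payload and a larger repetitive one with the standard gzip module; both payloads are hypothetical stand-ins rather than the study's data files, and the exact byte counts depend on the gzip implementation and on header contents such as a stored file name.

```python
import gzip

# Hypothetical payloads: a near-empty response and a large, regular one.
tiny = b"no results"
large = b"CPSC 1100,Fundamentals of Computer Science,28\n" * 500

tiny_gz = gzip.compress(tiny)
large_gz = gzip.compress(large)

print(f"tiny:  {len(tiny)} bytes -> {len(tiny_gz)} bytes gzipped")
print(f"large: {len(large)} bytes -> {len(large_gz)} bytes gzipped")
```

The gzip container adds a fixed header and trailer around the compressed data, so a payload of a few bytes grows, while a large and regular payload shrinks substantially.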

          HSRV    FLNG    BUSA    BMKT    HIST    BIOL     ENGL     u8-11
HTMLf     28945   33472   40532   52896   74042   123390   213142   389211
HTML      528     4541    10818   21790   40500   84454    164549   320822
HTMLgz    224     761     1055    1965    2679    4803     8420     21791
XML       214     2117    5450    11190   20744   44424    88223    167498
XMLgz     181     561     826     1438    1870    3378     6006     16821
JSON      31      1353    3642    7526    13832   30320    61243    115234
JSONgz    61      419     678     1254    1599    2999     5414     15172
CSV       11      784     2083    4207    7433    17101    35814    66999
CSVgz     41      308     541     1064    1412    2656     4890     13657

Table 3. Comparison of response sizes in bytes for set of 8 queries.

An XHTML page was written containing a results area, along with JavaScript to issue an XMLHttpRequest to fetch an AJAX update and retrieve the data. The AJAX code called a display function to convert the AJAX update to an HTML table and then display the table on the page in the results area. A separate display function was coded for each of the different AJAX update formats. The exact same XHTML page and JavaScript were used for each test, other than the display function and the format of the data sent by the server.

The code used is experimental only, not of production quality. It does not gracefully handle exceptions, performs no data validation, and implements no user interface. Since no user interface is implemented, the AJAX XMLHttpRequest calls are all synchronous. The code requests the stored data from the server, downloads the data and displays it. The code represents a minimal test bed to measure the performance of the AJAX update alternatives.

The XHTML and fixed JavaScript required 941 bytes. The sizes of the display code for the selected update formats in bytes are shown in Table 4. No special JavaScript was needed to handle the gzipped data, as unpacking the data was not performed by JavaScript. The browser handled unpacking the data and then passed it to the JavaScript code, which then processed it. The maximum size for any of the test bed pages, including all code, was less than 3 KB. Hence the differences in code sizes among the alternatives do not appear to be important performance factors in this case.
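The role of the per-format display functions can be sketched as follows. The study's display functions were written in JavaScript and are not reproduced in the paper; this is a Python analogue under stated assumptions, with hypothetical function names, element names, and payload layouts. Each function takes the text of an update and returns the HTML table to be placed in the results area.

```python
import csv
import html
import io
import json
import xml.etree.ElementTree as ET


def rows_to_table(rows):
    """Build a bare HTML table from a list of rows of strings."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def display_html(payload):
    """The HTML update is already a table, so no conversion is needed."""
    return payload


def display_csv(payload):
    """Parse CSV lines into rows, then build the table."""
    return rows_to_table(list(csv.reader(io.StringIO(payload))))


def display_json(payload):
    """Assumes the JSON update is an array of arrays of strings."""
    return rows_to_table(json.loads(payload))


def display_xml(payload):
    """Assumes one child element per row and one grandchild per field."""
    root = ET.fromstring(payload)
    return rows_to_table([[field.text or "" for field in record] for record in root])


# Hypothetical one-row update in each format.
updates = {
    "HTML": "<table><tr><td>CPSC 2100</td><td>Software Design</td><td>31</td></tr></table>",
    "CSV": "CPSC 2100,Software Design,31\n",
    "JSON": '[["CPSC 2100","Software Design","31"]]',
    "XML": "<courses><course><number>CPSC 2100</number>"
           "<title>Software Design</title><enrolled>31</enrolled></course></courses>",
}
display = {"HTML": display_html, "CSV": display_csv, "JSON": display_json, "XML": display_xml}
tables = {name: display[name](text) for name, text in updates.items()}
```

The HTML case does no conversion work at all, which mirrors the very small HTML display code size reported in Table 4, while the other formats must parse the update before the table can be built.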

                HTML   XML    JSON   CSV
Display code    18     1794   1511   1105
Total code      959    2735   2452   2046

Table 4. Test JavaScript code sizes in bytes.

Two performance measures were investigated: (a) the time needed in the browser for JavaScript to process the AJAX update, produce the HTML table, and assign it to the results area, and (b) a count of the total number of instructions executed by the processor.

The time needed by the JavaScript code running in the browser to convert the AJAX update data format to an HTML table was measured using a JavaScript profiler. Firefox 2.0.0.13 with Firebug 1.0 (http://www.getfirebug.com/) was used to profile the JavaScript code. Ten runs were made for each of the eight test queries in each of the eight data format possibilities, and the time required in each by the JavaScript display function was recorded. The averages of the display times (in milliseconds) for these ten runs are shown in Figure 6.

Figure 6. Average time in ms needed to convert data to HTML table.

Pin [PIN05] was used to collect instruction counts and thread counts. Pin is an instrumentation package that allows a Pintool, written using Pin’s API, to collect dynamic data about an application. Pin supports a wide range of investigations into program performance and bug detection, including complete program instruction traces and execution counts. For this work, Pin was used to record a count of the total number of instructions executed for the following task: load the Safari (for Windows) browser, download the test XHTML page and associated JavaScript from the server, load that page in the browser, issue the XMLHttpRequest for AJAX update data to the server, retrieve the data from the server, convert the AJAX update to an HTML table and display the table on the page in the results area, and close the browser. The Pintool collects the total instruction count needed to open a browser and deliver a response to the user.

A modified version of the example Pintool incount2_mt (see http://rogue.colorado.edu/Pin/) was used to collect thread and instruction counts for the eight test queries using each AJAX update possibility. The Safari 3.1 for Windows browser (http://www.apple.com/safari/) was used because the underlying WebKit browser (http://webkit.org/) is open source. Future plans include modifying the WebKit browser’s code to allow the collection of additional, more detailed information on the browser’s performance. Five runs were made for each of the test possibilities. The average number of instructions over the five runs for each possibility was used to produce Figure 7.

Figure 7. Instruction counts for AJAX update possibilities.

Conclusions

Table 2 and Figure 5 show that size reductions due to the use of different data formats are predictable and scalable with high confidence levels over a wide range of HTML response page sizes. The sample data shows reductions exhibiting uniform scaling over a range of HTML response sizes. This range covers a factor of more than 75 in size, from the smallest size to the largest.

Table 3 shows the savings in size due to the use of different AJAX update formats and gzip. Viewed as percentages of the full HTML size (HTMLf), this data shows that responses in size up to 12KB can be reduced by 98% or better. That is, a different format for the data may produce a response that is only 2% of the original size. The largest response size shows a reduction of over 96%, to 3.5% of the original HTMLf size. Without the use of gzip, the size reductions from choosing a different data format are up to 86% (using 14% or less of the original HTMLf size) for responses up to 12KB in size, with the largest response size showing a reduction of up to 83% (using 17% of the original HTMLf size). Hence adopting AJAX with any of the update data formats studied can significantly reduce the response sizes.
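The percentages quoted for the largest response can be recomputed directly from Table 3. The Python sketch below takes the last column of that table (the largest test query) and expresses each update size as a percentage of the full HTML size; the variable names are illustrative.

```python
# Last column of Table 3 (largest test query), sizes in bytes.
HTMLF = 389_211
UPDATE_SIZES = {
    "HTML": 320_822, "HTMLgz": 21_791,
    "XML": 167_498, "XMLgz": 16_821,
    "JSON": 115_234, "JSONgz": 15_172,
    "CSV": 66_999, "CSVgz": 13_657,
}

percent_of_full = {
    name: round(size / HTMLF * 100, 1) for name, size in UPDATE_SIZES.items()
}
for name, pct in percent_of_full.items():
    print(f"{name}: {pct}% of HTMLf, a reduction of {100 - pct:.1f}%")
```

The smallest gzipped update (CSVgz) comes to 3.5% of the full HTML size and the smallest update without gzip (CSV) to about 17%, matching the figures given for the largest response.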

        CSV      JSON     XML     HTML
HSRV    372.7%   196.8%   84.6%   42.4%
FLNG    39.3%    31.0%    26.5%   16.8%
BUSA    26.0%    18.6%    15.2%   9.8%
BMKT    25.3%    16.7%    12.9%   9.0%
HIST    19.0%    11.6%    9.0%    6.6%
BIOL    15.5%    9.9%     7.6%    5.7%
ENGL    13.7%    8.8%     6.8%    5.1%
8-11    20.4%    13.2%    10.0%   6.8%

Table 5. Gzip file size as a percentage of the source file size.

Using gzip provides an additional reduction for all but the trivial files. Table 5 summarizes the size of the gzipped file compared to the original file. A value of 39% means the gz file is only 39% of the size of the original; hence the reduction was 61%. As seen in Table 5, excluding the trivial HSRV files, the reduction provided by gzip is at least 60% and can be up to 95%. The larger files, when gzipped, can be as little as about 5% of the size of the original file. The greatest reductions generally take place in the larger files. Hence if the size of a response is the primary consideration, then for any data format gzip can provide a reduction.

Figure 6 suggests that the time used by the JavaScript code in the browser depends on the amount of work to be done; parsing XML takes more work, and hence uses more time, than does processing CSV and JSON. In all formats, the time needed to convert the gzip file for display is similar to, but slightly less than, that needed to convert the same data format not gzipped. For most of the cases examined the percentage difference here was about 1%. However, since the effort for unpacking is not accounted for by the JavaScript profiler (the JavaScript code does not perform the unpacking), there is an additional burden on the processor not shown in Figure 6. That extra effort is reflected in the instruction counts shown in Figure 7. The Pintool does show that the browser displaying the gzipped data formats used additional threads; these additional threads may account for the slightly better performance in JavaScript conversion times. Careful instrumentation inserted into the code for a browser is needed to better understand this. In general, Figure 6 suggests that for small to moderate sized responses, the exact AJAX data format does not affect the time needed to convert the data very much.
For the test queries, all formats performed similarly; the best time and the worst time were both within about 11% of the AJAX HTML table format time. For the moderate sized responses, the CSV format was the absolute fastest, while for the larger queries the CSVgz format was the absolute fastest. An examination of Figure 7 shows that when measured by instruction count, typically the CSV code executes the fewest instructions, followed by JSON and XML. Except for the largest files, the instruction counts are close together for a given format, within 5%. Unless the number of instructions is a critical factor, the update data formats CSV, JSON, and XML provide about the same performance. For the largest files, XML uses about 10% more instructions than does CSV, with JSON between these two, using about 5% more than does CSV.

The use of gz, however, does incur an execution penalty. Table 6 summarizes the increase in the number of instructions executed in each case when the gz data format is used. This shows that an average of 14% more instructions are needed when a gz data format is used when compared to the instructions used for the same data format when no gzipping is used. This penalty could be a factor when moving AJAX to devices with reduced processing power.
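The 14% average can be recomputed from the entries of Table 6. The Python sketch below averages all 32 entries and also reports the mean per data format; the per-format means are derived here for illustration and are not figures reported by the study.

```python
from statistics import mean

# Table 6 entries: percentage increase in instructions when gz is used.
# Each row lists the CSV, JSON, XML, and HTML values for one test query.
FORMATS = ["CSV", "JSON", "XML", "HTML"]
INCREASE = {
    "HSRV": [21.4, 15.4, 11.2, 16.5],
    "FLNG": [13.9, 7.7, 18.1, 5.5],
    "BUSA": [10.9, 11.3, 12.5, 7.5],
    "BMKT": [13.3, 14.8, 13.9, 14.4],
    "HIST": [14.2, 14.7, 10.8, 13.7],
    "BIOL": [18.8, 20.8, 21.3, 21.0],
    "ENGL": [11.6, 12.3, 8.7, 11.4],
    "8-11": [16.4, 16.4, 13.4, 16.2],
}

overall = mean(value for row in INCREASE.values() for value in row)
per_format = {
    fmt: mean(row[i] for row in INCREASE.values()) for i, fmt in enumerate(FORMATS)
}

print(f"overall mean increase: {overall:.1f}%")
for fmt, value in per_format.items():
    print(f"{fmt}: {value:.1f}%")
```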

        CSV     JSON    XML     HTML
HSRV    21.4%   15.4%   11.2%   16.5%
FLNG    13.9%   7.7%    18.1%   5.5%
BUSA    10.9%   11.3%   12.5%   7.5%
BMKT    13.3%   14.8%   13.9%   14.4%
HIST    14.2%   14.7%   10.8%   13.7%
BIOL    18.8%   20.8%   21.3%   21.0%
ENGL    11.6%   12.3%   8.7%    11.4%
8-11    16.4%   16.4%   13.4%   16.2%

Table 6. Percentage increase in instructions due to use of gz.

One extension of this work involves modifications to the code in WebKit to allow the collection of more detailed information about instruction counts and thread usage. This may allow a more complete picture of how the browser is using the processor resource during AJAX updates. Another extension is using the Pin ARM emulator on Ubuntu to collect instruction counts from the Google ARM emulator, Android, for the test cases. This will allow analysis of the performance of AJAX on ARM architectures.

Many readers will look for an answer to the question “Which data format should I use?”. As might be expected, we found no answer to this question that can be applied without regard to the circumstances. For client targets of desktop or laptop systems using modern browsers and connected to high speed networks, the use of AJAX with a gzipped CSV or JSON update data format provides minimal response sizes without excessive execution penalty. If the target client provides more limited processor resources, then the use of gzip should be carefully considered.

Bibliography

[PIN05] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, Kim Hazelwood, "Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation," Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (PLDI), Chicago, Illinois, USA, June 2005, pages 191-200.

[SMU06] C.W. Smullen III and S.A. Smullen, "Modeling AJAX Application Performance", 524-074, Proceedings of Web Technologies, Applications, and Services, WTAS 2006, July 17-19, 2006, Calgary, Alberta, Canada, ed. J. T. Yao. IASTED/Acta Press, Calgary, AB, Canada; ISBN 0-88986-575-2.

[SMU07] Smullen, C. and Smullen, S., "AJAX Application Server Performance", Proceedings of the IEEE SoutheastCon 2007 (CH37882), March 22-25, 2007, Richmond, Virginia, pp. 154-158; ISBN 1-880094-63-0.

[SMU08] Clinton W. Smullen III and Stephanie A. Smullen, "An Experimental Study of AJAX Application Performance", pp. 30-37, JOURNAL OF SOFTWARE (JSW), ISSN:1796-217X, Volume 3, Issue 3, March 2008.