Hidden Challenges on Web Application Design Eric Freeman
Katia Passos Cerise Wuthrich Catherine Stringfellow Department of Computer Science Midwestern State University Wichita Falls, TX 76308
Topic: Internet and Web-Based Systems
Corresponding author:
Catherine Stringfellow Associate Professor Department of Computer Science Midwestern State University 3410 Taft Blvd Wichita Falls, TX 76308 Phone: (940) 397-4578 Fax: (940) 397-4442 e-mail:
[email protected]
Hidden Challenges on Web Application Design Eric Freeman
Katia Passos Cerise Wuthrich Catherine Stringfellow Department of Computer Science Midwestern State University Wichita Falls, TX 76308
Abstract The evolution of data communication technology has increased Internet traffic with negative effects on the user’s response time. An existing mechanism providing such an improvement is Web caching, in which intermediary systems are used to temporarily store information that may be retrieved by multiple users or by a same user multiple times. Software engineers, designing applications for this new environment, depend on techniques that emphasize the satisfaction of the users’ requirements and the availability of computing resources on the processing site. In this paper, caching problems related to Web applications and heterogeneous network systems are presented. Basic design procedures are proposed in order to avoid those problems and examples of applying those concepts are shown, demonstrating their usefulness and influence in the system performance.
Introduction The evolution of data communication technology, with new hardware devices and enterprise applications, has increased Internet traffic with negative effects on the user’s response time. The added involvement of wireless networks has worsened the problem. Wireless networks due to their open communication medium are subject to high and unwanted interference. The growing network traffic further increases the interference and thus increases error rates, retransmission rates, bandwidth consumption and data retrieval times. Prolonged data retrieval times frustrate the user and require more intervention of the digital industry in the improvement of the network performance. An existing mechanism providing such an improvement is Web caching, in which intermediary systems are used to temporarily store information that may be retrieved by multiple users or by a same user multiple times. However, ignoring the existence of
1
Web caches at the application design stage may introduce critical errors in the behavior of designed system.
An analysis of a set of interactive Web applications will show that designers prefer using scripts to assemble Web pages after dynamically evaluating a user’ s request. However, this design approach results in dynamic Web pages that are not usually cached, and consequently do not benefit from the performance improvement provided by the cache [2]. Many studies have been conducted in this area focusing on the cache systems and developing algorithms to decide when to cache a dynamic page [6]. In this paper, a discussion on how heterogeneous systems and Web caching may affect the behavior of the application is presented, as well as basic ideas on how to design a Web application in such a way to avoid undesirable actions performed by those hidden resources.
Various architectures have been proposed for caching dynamic pages that also delete pages with invalid content. One such architecture is known as CachePortal [6]. Basically, this architecture proposes a front-end dynamic-web-content cache. This cache is kept current and fresh with the use of two modules, a sniffer and an invalidator, which interact with the web server, the application server, and the database management system.
The sniffer collects
information about user requests and database queries. Then the invalidator intelligently sends messages to the cache to discard stale cached pages that have been affected by changes in the database.
Software engineers, designing applications for the new Web environment, depend on techniques that emphasize the satisfaction of the users’ requirements and the availability of computing resources on the processing site. For example, Fernandez et al. provide a declarative query language to specify a site’ s content and a simple template language to specify a site’ s HTML representation [10]. They discuss the scripting language with which web sites are implemented and mention development tasks for web sites that include accessing data and developing a site’ s content, navigational links, and visual presentation.
2
New extensions to software engineering methods can model the processing site as a distributed environment that allows the designer to anticipate the use of communication networks and multiple systems. Bouguettaya et al. discuss a dynamic structure and system for describing data sharing [4]. Buhr explains how Use Case Maps (UCMs) can simplify complicated systems by allowing designers to view the entire system without getting bogged down in details. He also introduces initial information on the network medium scale as failures and time out actions [5]. Karunanithy et al. discuss a scenario where they provide personal information for mobile users with an assortment of end-user devices [12]. Mori, Paterno and Santoro discuss a tool that will help developers with different ways of presenting their tasks in a very detailed manner and to replicate their dynamic behavior [14]. This tool will also offer a complete area of support for the making and reviewing of joint applications, which then can better the design of interactive software applications. They also discuss an example of multiusers being able to access common artifacts. However, such techniques focus on systems that are easily identified either by the user or by the designer. Network devices that are transparent to the users, such as web caches and firewalls, are usually ignored in the design process and not considered as resources that can improve or downgrade the performance of the application.
Other studies in software engineering focus on application development and also do not consider the Web caching influence on the behavior of the application. Bernardi and Santucci discuss a definition of Web based hierarchical model libraries by using an object-oriented architecture. They also discuss different methods for developing reusable designs [3]. Cloyd describes her company’ s way of changing to use-driven processes and considers human factor methods [8]. Constantine and Lockwood discuss mostly user interface design and usability issues from software engineering that can be used in Web applications [9]. Hendrickson and Fowler discuss how the web involves more user interaction and hosts to a new generation of systems [11]. Knight and Dai suggest using an object-oriented approach to developing web applications, which provides easier testing, debugging and modifying of code [13]. Pertriu, Shousha, and Jalnapurkar discuss the development of a Layered Queueing Network from an UML description and its performance under different loads of a telecommunication product [17]. Petriu and Woodside discuss sources of weakness in an analysis that includes lack of knowledge and other aspects using model building from software design products [18].
3
The results of this study affect the way that applications are written in the implementation phase of the software project. The proposed technique has the benefit of preventing unknown behavior of the application when implemented in different systems. However, it has a slight impact in software performance. In order to identify such problems and establish the basis for the new coding rules, the next section presents a brief description of web caching and web application scripts, followed by a discussion on problems identified with the implementation of interactive web applications in a wireless system. The subsequent section reports experimental data collected during this study. The paper ends with a summary of the concepts and the observations presented in this study.
Background Web caching is commonly used to reduce the latency a Web user perceives when requesting a page on the Internet. One of the most common forms of Web caching has been to place a cache component in the user’ s system. Other choices are at institutional, regional and higher levels, as represented in figure 1. In a caching structure similar to the one in figure 1, the Web browser maintains in the host hardware a copy of the most recently visited Web pages. When a user requests information from a Web site, the browser attempts to fulfill the request from the pages located within its cache. When the request cannot be fulfilled from the local cache, a request to the remote host is initiated.
To avoid having users from a same organization initiate duplicate requests for the same Web page, a cache is also often placed at the point where an organization’ s computer network connects to the Internet, usually a firewall system. This helps to reduce the number of outgoing requests, reducing the user’ s response time and the data traffic on the external network.
A potential problem that must be considered at all levels of caching is the quality of the information of a stored Web page [1, 19]. If document X is potentially valid for several hours before it must be updated, then caching X will improve the overall system performance without having any negative consequence. However, if an object Y is more likely to have a short life
4
span and be meaningless if not frequently updated, caching Y could send invalid responses to the user accessing that information.
regional cache
Firewall (institutional cache)
server Browser (local cache)
Figure 1. Diagram showing possible placement of Web caches.
The hypertext transfer protocol (HTTP), the communication protocol used to access Web pages, defines cache management parameters that describe how to properly handle information located on a page, for example, if a page is ‘cacheable’ or not. Currently, some of the most common and significant parameters allow the applications to specify if a page is cacheable, the length of time the information can be kept in a cache, if the page should be cached only at the user’ s local cache, or not cached at all.
Applications that interact with the user in a way that generates new information, such as a flight reservation system or an online bookstore, are usually implemented by using scripting languages and database access. In this study, the scripting language chosen for the testing application was PHP (PHP Hypertext Preprocessor). PHP is considered to be the most popular Web application server [20]. PHP is a server-side scripting language, which means that the Web server, instead of the client, processes the code and the browser shows the result of that processing. PHP is used to define that access to the database and to build the web page to be sent as response to the user’ s request. The response web page is coded in HTML for the more 5
traditional Wired Web and eventually in wireless markup language (WML) when communicating with wireless devices such as a cell phone [7, 21]. A simple example of WML code is shown in figure 2.
For this particular study, the Web server was required to have the MySQL database software installed and running. By placing all the necessary information in the database, data accessed by the applications used in the experiments could be viewed through HTML Web pages or WML decks, thus reducing redundant efforts.
Select a Test
Figure 2. WML commands
Web application challenges Web caching is usually transparent to the user and to the application designer, except for the possible improvement in response time. The application designer, when planning the development of a system, usually will not have enough information to judge if a web cache is involved. Also, if this developer is not knowledgeable in network protocols, he or she will focus on the application functionality, i.e., the interface between the scripting language and the database and the assembly of the pre-defined response pages. In order to understand the consequences of ignoring the information on web caching, assume the following scenarios exemplifying the use of interactive applications:
6
Case 1 – A cell phone user arrives at the airport and finds out that her flight was cancelled. By accessing the Internet through the cell phone, such a user can immediately reschedule her flight (as seen in some TV commercials). In order to do that, the user needs to find a different flight, check for seat availability and make a reservation. A second passenger, with the same problem, borrows her cell phone and without disconnecting tries to re-schedule his flight to the same flight that she got the new reservation. A typical system will provide an answer for the second user that shows at least one less seat in the availability chart of the new flight (the seat reserved by the first user). However, if the caching system resident in the cell phone decides to store dynamic web pages, the second user may end up getting exactly the same answers as the first user, which would imply that both reserved the same seat (also assumed here is that no special dynamic caching technique was used to invalidate the pages after they were stored).
Case 2 – A student takes an online multiple-choice test and when verifying the score, finds out that one of the answers in the test key was wrong. Under these conditions, the student got a score of 85 points when it should really be 90. The student complains with the faculty member responsible for the test who immediately corrects the answer key in the database. A second student comes in and takes the same test in the same machine. By coincidence, this student answers exactly the same choices as the previous student. Even considering that the database with the answer key has been corrected, the student still gets an 85 point score. It is clear then that the machine had cached the previous responses from the server and was unable to distinguish between the two users.
When developing the application, the designer may not be able to obtain enough information to predict these situations.
However, a simple solution exists to avoid such a
problem. The web caching systems store and identify pages by their URL. The caching system, when receiving a URL, should be able to identify it as a dynamic page, if the URL includes any form of parameter passing. For example,
http://www.hotelselect.com/res/html/HInfo?hotel=FL555&sd=7
7
is a modified URL from an actual request for a hotel reservation system. If the browser is unable to recognize this URL as a dynamic request, then it will retrieve stored pages that were cached earlier with the same URL, introducing an error in the information generated by the system as described in the two case scenarios presented earlier.
Users, as well as developers, may look at solutions that investigate all the current characteristics of the network structure in which the system will be used. Such a solution is not sufficient to eliminate the possibility of errors, since future changes in the network may introduce new variables in this process. Another possibility is to resize the local cache to zero in all machines using the system. This would still leave a chance of future changes in the cache size. In addition, it would not solve the problem, if another cache existed in the path between the local system and the application server. This last option would also downgrade the performance of the Web browser for other applications that do not depend on dynamic pages.
A simple solution, which would work according to the HTTP definition, is the use of HTTP header parameters to specify the dynamic responses as non-cacheable. While this is probably the best approach, it still assumes that any cache system will be correctly configured, working under the expected HTTP standards. A more trivial solution would be to add to the list of parameters the time and date of the request, making each request unique with respect to the cache screening mechanism. The server side script would disregard the date/time information, if not necessary to the application, and proceed by preparing and sending a new answer. Unfortunately, this solution increases the amount of data being transmitted by the system. Other solutions would require a better software engineering methodology, which could result in a design that would take this problem into consideration, however that is beyond the scope of this paper.
Experiments
In order to verify the possibility of these catastrophic errors, an online wireless testing application was developed, involving several steps. First of all, a series of WML decks were written introducing the user to a school’ s home page and then linking to individual faculty
8
member’s pages. From a faculty Web page, the online testing application could be accessed through a hyperlink.
Next, the MySQL database was created with tables containing all
information about the tests including name of the test, objectives covered, questions, answer choices, and feedback responses. The WML code was written with a PHP script embedded into it, connecting it to the MySQL database to prompt the user to select which test to take.
A
portion of the experimental output on a generic cell phone is shown in figure 3, while a snapshot of the system running in a PDA is shown in figure 4. This application allows a student to take multiple-choice tests. The questions are displayed one at a time, along with answer choices.
Figure 3. Display of question with answer choices (Image courtesy Openwave Systems Inc.).
Each time the student answered a question, the answer was saved and the next question along with answer choices was retrieved. The application continued to display questions until it was determined that all questions for a particular test had been presented. At that time, the application showed each question number, the student’ s response to the question, and either a smiley face or a sad face. If the student answered incorrectly, a brief feedback response was presented as seen in figure 5. After all answer choices and feedbacks have been displayed, the student’ s test score was shown. The application was tested using the simulator embedded in OpenwaveTM SDK [15, 16]. This simulator assumes that any page is cacheable and therefore, it will retrieve from the cache dynamic pages that passed exactly the same parameters. During the experiment, the main
9
concern was if and how the various pages would be cached. When the simulator was initially started and a particular test with 20 questions was selected and taken, there were 25 server hits.
Figure 4. Testing system implemented in a PDA.
Figure 5. Display of feedback (Image courtesy Openwave Systems Inc.).
While testing the application, the home URL was set to the web site with the testing application. After taking the test once, the home button was selected and the same test was taken again with exactly the same answers. During this instance of test taking, there were 0 server hits or 100% cache hits. To further examine the caching process with the simulator, the test was taken once again, supplying all correct answers (score of 100) and without disconnecting before
10
taking the test a second time. Before taking the test the second time, the database in the Web server was updated and the correct answer choice on question 1 was changed.
Taking the test
again with the same 20 answers as the first time should result in a score of 95. Actual results show it still scored 100, because it used a page in the cache left over from the previous time the test was taken.
The same experiment was conducted using the simulator provided with Nokia Mobile Browser version 4.0. Though it was more difficult to determine when pages were retrieved from the server or from local cache, it appeared that the only time the Nokia Browser fetched pages from the local cache was upon using the Back button. When the simulation involving changing the database in between tests was done, there was no wrong answer since the Nokia Browser had not cached any pages with dynamic content.
In order to improve the test with the Openwave browser, meta tags were added to the header portion of the WML code specifying a zero value for maximum age, as seen in figure 6. This action was equivalent to assigning the page as non-cacheable. The experiment was conducted again and during the second test, all pages were retrieved from the server even though the answer list was identical.
Figure 6. HTTP header. Simulations were also conducted in a Palm m515, seen in figure 4. Unfortunately, there were many problems with browser simulators for Palms. There was only one, the Expedition WAP browser, which would work on both the simulator and the actual Palm device in its first try. There was no method for examining cache transactions on the simulator, but it was obvious when either the simulator or Palm made a server hit, because a timing icon appeared during the wait period. From this observation, the Expedition browser did not seem to cache any of the pages with dynamic content.
11
Another PDA software tested (after solving some installation problems) was the Neomar WAP browser. Unfortunately, there was no easy way again to verify the use of the local cache system except by measuring the response time. In this case, it also seemed that the browser always retrieved pages from the server.
These experiments helped to confirm the uncertain behavior of the caching systems. The current research in caching dynamic pages also suggests that future changes will be implemented in the systems and the developer will not be able to prepare for them. The use of unique parameters in the page requests seems to be the more stable and reliable solution at this point and worked well in experimental trials.
Summary The increasing use of the Internet and the World Wide Web has caused the proliferation of Web applications, which depend on data communications. The consequent heavy traffic in the network increases the user’ s response time. One of the solutions for this problem is the use of Web caching. As with other network solutions, such as security firewalls, web caching is usually transparent to the user and to the application developer. This paper shows that web caching systems may interfere with the behavior of an application and introduce invalid information to the user. Two simple mechanisms already available in the network protocols allowed the developer to guarantee the correctness, in any circumstance, of the behavior of the application with respect to web caching. Unfortunately those solutions contribute to increase the network traffic load. Additional study is now being conducted to allow the identification of new solutions to this problem and to evaluate the cost benefit of adopting those solutions.
Acknowledgements This work was partially supported by the Texas Advanced Research Program under Grant No. 003656-0108b-2001. Openwave and the Openwave logo are trademarks of Openwave Systems Inc. All rights reserved.
12
References
1
M. Arlitt, L. Cherkasova, J. Dilley, R. Friedrich and T. Jin, “Evaluating Content Management Techniques for Web Proxy Caches,” ACM SIGMETRICS Performance Evaluation Review, Vol. 27, No. 4, March 2000, pp. 3-11.
2
G. Barish and K. Obraczka, “World Wide Web Caching: Trends and Techniques,” IEEE Communications, Vol. 38, No. 5, May 2000, pp. 178-184.
3
F. Bernardi and J. F. Santucci, “Model Design Using Hierarchical Web-Based Libraries,” Proceedings of the Design Automation Conference, New Orleans, June 2002, pp. 14-17.
4
A. Bouguettaya, B. Benatallah, L. Hendra, M. Ouzzani and J. Beard, “Supporting Dynamic Interactions among Web-Based Information Sources,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 5, May 2000, pp. 779-801.
5
R. J. A. Buhr, “ Use Case Maps as Architectural Entities for Complex Systems,” IEEE Transaction on Software Engineering, Vol. 24, No. 12, December 1998, pp. 1131-1155.
6
K. S. Candan, W.-S. Li, Q. Luo, W.-P. Hsiung, and D. Agrawal, “ Enabling Dynamic Content Caching for Database-Driven Web Sites,” Proceedings of the ACM SIGMOD 2001, Santa Barbara, CA, Vol. 30, No. 2, May, 2001, pp. 532 - 543.
7
E. Castro, HTML for the World Wide Web, 4th Ed., Peachpit Press, Berkeley, CA, 2000.
8
M. H. Cloyd, “ Designing User-Centered Web Applications in Web Time,” IEEE Software, Vol. 18, No.1, January/February 2001, pp. 62-69.
9
L. L. Constantine and L. A. D. Lockwood, “ Usage-Centered Engineering for Web Applications,” IEEE Software, Vol. 19, No. 2, March/April 2002, pp. 42-50.
10
M. Fernandez, D. Florescu, A. Levy and D. Suciu, “ Declarative Specification of Web Sites with Strudel,” in the Very Large Data Base Journal, Vol. 9, No. 1, 2000, pp. 38-55.
11
E. Hendrickson and M. Fowler, “ The Software Engineering of Internet Software,” IEEE Software, Vol. 19, No.2, March/April 2002, pp.23-24.
12
K. Karunanithi, K. Haneef, B. Cordioli, A. Umar and R. Jain, “ Building Flexible Mobile Applications
for
Next
Generation
Enterprises,”
Proceedings
of
the
IEEE
Academia/Industry Working Conference on Research Challenges, Buffalo, NY, April 2000, pp.127-132.
13
13
A. Knight and N. Dai, “ Objects and the Web,” IEEE Software, Vol. 19, No. 2, March/April 2002, pp.51-59.
14
G. Mori, F. Paterno and C. Santoro, “ CTTE: Support for Developing and Analyzing Task Models for Interactive System Design,” IEEE Transactions on Software Engineering, Vol. 28, No. 8, 2002, pp. 797-813.
15
Openwave Mobile Access Gateway Edition, Openwave Systems, Inc., Redwood City, CA, Feb. 2002, pub. DSMAG-R5.1-004.
16
Openwave SDK WAP Edition 5.0, Openwave Systems, Inc., Redwood City, CA, Sep, 2001, pub. DSSDK-R1-001.
17
D. Petriu, C. Shousha and A Jalhapurkar, “ Architecture Based Performance Analysis Applied to a Telecommunication System,” IEEE Transactions on Software Engineering, Vol. 26, No. 11, November 2000, pp.1049-1065.
18
D. Petriu and M. Woodside, “ Analyzing Software Requirements Specifications for Performance,” Proceedings of the Workshop on Software and Performance WOSP ’02, Rome, Italy, July 2002.
19
L. Rizzo and L Vicisano, “ Replacement Policies for a Proxy Cache” IEEE/ACM Transactions on Networking, Vol. 8, No. 2, April 2000, pp. 158-170.
20
A. Royappa, “ The PHP Web Application Server,” The Journal of Computing in Small Colleges, Vol. 17, No.4, March 2000, pp. 201-211.
21
Wireless Application Protocol WAP 2.0. Technical White Paper, WAP Forum, Jan. 2002.
14