Extending HTTP Models to Web 2.0 Applications: The Case of Social ...

2011 Fourth IEEE International Conference on Utility and Cloud Computing

Extending HTTP Models to Web 2.0 Applications: The Case of Social Networks

Luca Caviglione
Institute of Intelligent Systems for Automation (ISSIA) - Genoa Branch, Italian National Research Council (CNR), Via de Marini 6, I-16149 Genova, Italy
e-mail: [email protected]

Abstract: Owing to their explosive growth, Web 2.0 technologies are now widespread. Moreover, they have been increasingly combined with the cloud model to produce highly sophisticated services, often with a pronounced social flavor. In this perspective, Social Networks (SNs) are a paradigmatic example: they allow users to share user-generated content while assuring a high degree of interactivity. Such facets introduce new usage patterns, which are reflected in new features of the produced HTTP traffic. Therefore, additional modeling is needed, since standard HTTP behavioral models cannot easily capture these new trends. This paper introduces an extension to standard HTTP behavioral models that considers the new elements generated by SN applications. To assess the correctness of the proposed approach, a traffic characterization of one of the most popular SNs is presented.

978-0-7695-4592-9/11 $26.00 © 2011 IEEE DOI 10.1109/UCC.2011.60

I. INTRODUCTION

Web 2.0 technologies make it possible to create and share content with increased “social” connectivity. They have been adopted to produce a new wave of Web-based applications, also to support more complex user requirements. We mention, among others: i) Virtual Control Rooms (VCRs) for accessing and controlling computing services and infrastructures, or remote instrumentation; ii) highly interactive Graphical User Interfaces (GUIs), locally operated from web browsers, to remotely issue commands and route back feedback for Software as a Service (SaaS) platforms [1]; and iii) additional features to enrich the Web, e.g., adding video streaming capabilities to standard Web pages. Furthermore, Web 2.0 aims at increasing the interactivity of pages and among users, becoming fundamental for the establishment of an Internet of People. Besides, Web 2.0 and cloud computing have been increasingly combined to produce highly sophisticated services, also with the purpose of supporting an increasingly connected user population [2].

To better comprehend such a “technological coalescence”, a possible synecdoche is represented by Social Networks (SNs), which are the paradigmatic example of how the whole process of using the Web has changed. Furthermore, they are experiencing rapid growth, also owing to their integration within legacy services, such as websites, blogs, wikis, and search engines. Mostly, SNs rely upon user-generated content (e.g., photos, text and videos) while allowing continuous interaction among participants [3]. This has been made possible by exploiting approaches not present in the original Web vision, such as Asynchronous JavaScript and XML (AJAX), which allows items to be dynamically updated in users’ web browsers in near real time.

Inevitably, to support this content-rich and interactive vision, web pages have to change accordingly. A typical web page is composed of several objects, which have to be retrieved to compose it entirely. Two types of objects exist: i) the main object containing the HTML document and ii) in-line object(s), i.e., those linked within the hypertext. Web 2.0 has a strong impact on in-line objects, which can embed additional services, for instance to support audio and video streaming, or complete tools for audio and video conferencing. Consequently, such changes are reflected in new traffic patterns [4]. Specifically, reference [5] estimates that for the HTTP traffic observed in 2007, the size of in-line objects is greater than that of the main object. Unfortunately, in-line objects are highly specialized, and each one has a set of well-defined peculiarities; thus it is hard to converge to a definitive model. As an example, reference [6] investigates the impact of the additional traffic produced by in-line objects embedding video sources within a web page. Further difficulties are due to the presence of mashups, which are entities generated by dynamically retrieving and merging content from different remote providers (see, as a possible example, reference [7]).

Therefore, we focus on how HTTP traffic changes when it is used to access SN-oriented applications, and on how to update classic behavioral models to capture such new features. To evaluate the correctness of the proposed extension, and to give a realistic traffic characterization, we selected Facebook (http://www.facebook.com) as an archetype, since it has more than 750 million active users [8] (other successful services include LinkedIn and the new Google+).

We point out that, to the best of the author’s knowledge, specific behaviors of SNs have never been investigated, measured or modeled, even if some preliminary work on AJAX-generated patterns has already been presented in reference [9]. In essence, the contributions of this work are: i) to extend classical HTTP behavioral models to capture specific features of SNs; ii) to perform a preliminary traffic analysis of SN applications; iii) to comprehend their importance in terms of features to support cloud services, also from the perspective of characterizing their network behaviors. As an additional contribution, the model we present differs from previous ones, since we introduce a new layer. This also guarantees enough generality for proper tuning to model other applications sharing the same underlying technologies and functionalities.

The remainder of the paper is structured as follows: Section II deals with classic HTTP models and investigates their limits in capturing features of SN applications, while Section III introduces their extension. To assess the correctness of the proposed model, Section IV presents an analysis of several traffic characteristics of the Facebook SN service. Finally, Section V concludes the paper and discusses future developments.

II. CLASSIC HTTP MODELS AND NEW BEHAVIORS

As said, SNs are characterized by a high degree of interactivity and an intrinsic sharing of information among participants, who connect to each other on a relationship basis (e.g., friendship or business partnership). Such platforms are mostly accessed through web browsers, even if ad-hoc client interfaces are available on mobile or handheld devices. Linked users are then notified about changes happening in their network of contacts (e.g., when new elements have been published). Direct interaction is also possible, typically via dedicated mail or Instant Messaging (IM) services. As a result, a user constantly receives and delivers stimuli, carried by HTTP traffic, even when idle or reading.

The most widely adopted HTTP traffic models implement a source with two states, i.e., ON and OFF. The ON state represents the request and the consequent download of objects, while the OFF state denotes an interval of inactivity (see, e.g., references [5] and [10]). The alternation of ON/OFF periods is reflected in a given distribution of the viewing time. However, in an SN, to achieve “responsiveness”, data is delivered to users regardless of such timings. To this aim, a possible solution is to exploit the aforementioned AJAX (via the XMLHttpRequest JavaScript object) to have a constant exchange of information between the browser and the server. Other techniques are grouped under the Comet hypernym, and try to avoid the limitations imposed by the classical page-by-page web model. Essentially, they are based upon a persistent (or long-held) HTTP connection enabling the web server to send additional data to the browser, without the need for further HTTP requests.

A quite standard development approach is to have two connections between the browser and the web server hosting the SN-based service. One connection is triggered by user actions, while the other conveys information to perform real-time updates. This can reduce the overall performance, since the HTTP 1.1 protocol specification [11] states that a browser should not have more than two simultaneous connections with a web server. Such a limit can cause the browser to wait indefinitely for “clearance” to begin a new connection, which blocks it from sending new requests. The drawback is usually circumvented by creating fictitious hostnames (i.e., aliases) for a given web server. Confirming the importance of supporting a continuous data exchange, HTML5 will natively support a communication channel via WebSocket [12]. A standardized solution will then make the workaround unnecessary. At the same time, future advancements such as WebSocket make the availability of proper models a mandatory requirement.

Summing up, the aforementioned techniques reduce the precision of classical models; specifically, they decrease the accuracy of the OFF states, where data is assumed to be absent. We recall that such paradigms differ considerably from the classical Web one, where data is exchanged through page-by-page patterns. Even if periodically refreshed pages have been considered for years, this behavior does not suffice to capture the more complex nature of Web 2.0. In fact, even earlier measurements of HTTP traffic do not reveal a one-to-one correspondence between requests and pages [13]. To cope with this more complex scenario, the basic entity for modeling HTTP has shifted from the page to the web-request: this assumption is especially suited to reflect the more interactive nature of Web 2.0 applications.
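As a minimal sketch of the classical two-state source recalled in this section, the following Python fragment alternates ON and OFF periods and assigns traffic only to the ON ones. The exponential durations, the mean values, the download rate and the function name `on_off_source` are illustrative assumptions, not parameters taken from references [5] or [10].

```python
import random


def on_off_source(n_cycles, mean_on_s=2.0, mean_off_s=30.0,
                  on_rate_kbyte_s=50.0, seed=0):
    """Return (state, duration_s, volume_kbyte) tuples for a two-state source."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    timeline = []
    for _ in range(n_cycles):
        # ON: request and download of objects (assumed exponential duration).
        on_s = rng.expovariate(1.0 / mean_on_s)
        timeline.append(("ON", on_s, on_s * on_rate_kbyte_s))
        # OFF: viewing time; the classical model assumes no data at all.
        off_s = rng.expovariate(1.0 / mean_off_s)
        timeline.append(("OFF", off_s, 0.0))
    return timeline


timeline = on_off_source(n_cycles=5)
off_volume = sum(volume for state, _, volume in timeline if state == "OFF")
print(off_volume)  # 0.0: exactly the assumption that AJAX and Comet traffic violates
```

The point of the sketch is the last line: in such a source the OFF volume is zero by construction, which is the assumption that continuous updates in an SN no longer satisfy.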

III. EXTENDING HTTP MODEL TO SN

To extend classical HTTP traffic models (see references [5] and [10]) to better capture the essence of an SN, we introduce a four-layer approach, with a new behavioral entity called looping, to emphasize its autonomous and repetitive characteristics. Figure 1 depicts the overall behavioral model.

[Figure 1 labels: Web Request, Viewing, ON, OFF, time axis t; layers Session, Page, Object, Looping; objects HTML, in-line object 1, in-line object 2, ..., in-line object N.]

Fig. 1. Overview of the behavioral model of a typical SN application.

In detail, the session layer models the ON and OFF states of two consecutive web-requests. As soon as a request is issued to the server, the main object is sent back. At this level of abstraction, the basic unit is the page. Next, the HTML code is parsed and all the in-line objects are retrieved. In Figure 1 we assume the page to be composed of N objects. We point out that how objects are retrieved is influenced by the specific version of HTTP (e.g., HTTP 1.0 typically opens a separate connection for each request, while HTTP 1.1 supports persistent connections and pipelining). Such layers are sufficient to model classic Web traffic, since when the “page” is complete, the user switches to OFF.

To take into account the features of an SN, we introduce the looping layer, which is an in-line object modeling the continuous data exchange between HTTP endpoints to update the status of the SN (e.g., a “friend” posts a new item or a new message arrives). Specifically, it compensates for the lack of accuracy of the OFF state, since it models the “latent” traffic produced autonomously. Depending on the specific application, the looping object fetches updates, which can be composed of images, videos and text. In addition, it can also contain information to notify running processes, such as IM, or widgets providing feedback to users. As depicted in Figure 1, HTTP traffic is produced according to the specific behavior of the looping entity, and it can be modeled again via the first three layers. When the user requests another page, the process is iterated. We underline that: i) if a page does not contain looping elements, a standard model suffices and ii) for the sake of simplicity, the looping object starts at the beginning of the OFF state, but it can be arbitrarily placed within the timeline.

A. Limits and Strengths of the Model

The heterogeneity of SNs, both in terms of features and technologies, forces us to rely on a “general” approach, which may require additional tuning. In fact, in the majority of SN applications, the produced traffic can be influenced by the number of individuals linked within a user’s network (i.e., more “friends” means more objects to be retrieved), even if a specific implementation can mitigate such a dependency. As a possible example, “friends” not displayed in the browser may not account for additional traffic for displaying content (which is carried through additional in-line objects), but may require “signaling” to update widgets displaying notifications. Then, to be accurate, the looping layer would need some additional “behavioral” parameters (e.g., the average number of known contacts), as well as the design of a suitable function $\gamma \in \Gamma$, i.e., $\tilde{v} = \gamma(u)$, where $\tilde{v}$ is the produced volume (e.g., in terms of in-line objects) and $u$ is the number of users composing a user’s SN. This is part of our ongoing research, and an interesting intersection between engineering and social sciences [14].

Additionally, as will be presented in Section IV, the proposed approach has been tested only against a well-defined set of features of SN applications (e.g., aspects related to GPS localization have been neglected). Therefore, a rigorous characterization of a broader set of applications is mandatory to effectively capture the complex nature of Web 2.0. At the same time, the model is general enough to be adapted to other services relying upon objects that autonomously produce traffic during OFF periods.

TABLE I. AVERAGES OF THE HTTP TRAFFIC PRODUCED BY THE LOOPING ELEMENT.

Request Type        No.     % Relative   % Total
GET                 1300    94.61
POST                74      5.39
Total               1374    100          49.62

Response Type       No.     % Relative   % Total
200 OK              1341    96.61
304 Not Modified    25      1.80
408 Req. Timeout    22      1.59
Total               1388    100          50.13

Other               No.     % Relative   % Total
-                   7       100          0.25
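To make the layered behavior of Section III more concrete, the following Python sketch lists the transfers of a single web-request: the main object (page layer), N in-line objects (object layer), and the periodic updates issued by the looping layer during the OFF state. The names `Event` and `web_request_cycle`, the fixed polling period and the main and in-line object sizes are hypothetical; the 680-byte update size simply reuses the average HTTP PDU size reported in Section IV.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: float     # offset from the web-request, in seconds
    layer: str        # "page", "object" or "looping"
    size_bytes: int   # size of the transferred item


def web_request_cycle(n_inline, off_duration_s, loop_period_s,
                      main_size=20_000, inline_size=5_000, update_size=680):
    """Events of one web-request: page, in-line objects, then looping updates."""
    # Page and object layers: the main HTML object, then N in-line objects.
    # Transfers are treated as instantaneous to keep the sketch short.
    events = [Event(0.0, "page", main_size)]
    events += [Event(0.0, "object", inline_size) for _ in range(n_inline)]
    # Looping layer: starts with the OFF state and fires every loop_period_s
    # (an assumed, strictly periodic behavior).
    n_updates = int(off_duration_s // loop_period_s)
    for k in range(1, n_updates + 1):
        events.append(Event(k * loop_period_s, "looping", update_size))
    return events


cycle = web_request_cycle(n_inline=3, off_duration_s=60, loop_period_s=10)
off_bytes = sum(e.size_bytes for e in cycle if e.layer == "looping")
print(len(cycle), off_bytes)  # 10 4080
```

Setting `off_duration_s` below `loop_period_s` yields no looping events, which corresponds to case i) above: a page without looping elements reduces to the standard model.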

B. Development of an Analytical Model

Behavioral models have proven to be very effective for representing HTTP traffic and for carrying out performance evaluation campaigns through synthetic traffic generators. At the same time, developing an analytical model could be beneficial, e.g., to easily perform numerical simulations. As said, SNs are highly heterogeneous, and developing a unique analytical model could be difficult. A possible general form, which roughly takes into account the presence of a looping element and is partially borrowed from reference [17], is:

$O(t) = C(t) + L_K(t - KT)$ for $K = 1, 2, 3, \ldots$,

where $O(t)$ is the overall traffic, $C(t)$ is the “classic” HTTP contribution (as observed in early works, e.g., see reference [10] and references therein) and $L_K(t - KT)$ is the contribution generated by looping with a “duty cycle” of $T$. Quantities are intended as throughputs in kbyte/s at time $t$. Interested readers can find its preliminary validation for generic Web 2.0 applications, as well as some measurements in satellite environments, in reference [17]. We underline that the proposed form is quite general; thus it can be used with a variety of Web 2.0 services, e.g., to model Web-based video platforms, by injecting the behavior of the in-line object retrieving the video into the looping.

IV. TRAFFIC CHARACTERISTICS

In this Section we present some characteristics of the traffic produced by the looping object exploited by Facebook during OFF periods. The data set is composed of 6-hour-long sessions, collected daily for 3 months. To conduct tests, we prepared an ad-hoc profile linked to 1,000 “friends” and subscribed to 150 groups.

Initially, let us analyze the average values of the HTTP traffic generated during OFF periods. A session is composed of 2,769 HTTP PDUs, and the detailed breakdown of the collected HTTP traffic is summarized in Table I. We highlight that 1,374 requests and 1,388 responses are performed during OFF periods, accounting for an exchanged HTTP traffic volume of 1.88 Mbytes. Also, we remark that during such periods users are supposed not to produce traffic, e.g., they are viewing the page or they have the focus on another application.

Figure 2 depicts the Cumulative Density Function (CDF) of the size of HTTP PDUs, as well as the 99% confidence bounds. The average packet size is equal to 680 bytes, and the resulting HTTP traffic is not predominantly composed of tiny packets, which, in the presence of a large user population, can impact the network infrastructure [15]. This is even more important if users access the service through IEEE 802.11 connectivity [16].
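As a quick consistency check of the averages of Table I and of the HTTP figures quoted in Section IV, the short Python fragment below re-derives the shares of requests, responses and other PDUs, and relates the exchanged volume to the average PDU size. It assumes that 1 Mbyte equals 10^6 bytes and that the 1.88 Mbytes refer to the 1,374 requests plus the 1,388 responses; both points are a reading of the text, not something the source states explicitly.

```python
# PDU counts per category, as reported in Table I.
counts = {"requests": 1374, "responses": 1388, "other": 7}

total_pdus = sum(counts.values())  # 2769 PDUs per session
shares = {name: round(100 * n / total_pdus, 2) for name, n in counts.items()}

# Assumption: decimal megabytes, volume spread over requests and responses.
volume_bytes = 1.88e6
mean_size = volume_bytes / (counts["requests"] + counts["responses"])

print(total_pdus)
print(shares)
print(round(mean_size, 1))
```

Under these assumptions the shares match the "% Total" column of Table I, and the mean size falls within one byte of the reported 680-byte average.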

[Figure 2: x-axis HTTP PDU Size [byte]; y-axis Cumulative Probability (log scale); curves: AJAX HTTP PDU CDF, Confidence Bounds.]

Fig. 2. Cumulative Density Function (CDF) of the size of HTTP PDUs generated by the looping layer (the cumulative probability is in log format).

A. Throughput and Volumes

Figure 3 portrays the throughput of all the data sent and received by the looping layer. The concurrent retrieval of objects allows high transmission rates (in our testbed, clients access the Internet through dedicated 10 Mbit/s links). Besides, peaks are due to the presence of large amounts of items, such as images or video previews published concurrently by several users composing the network. As regards exchanged volumes, Table II shows the distribution of the TCP PDUs carrying data triggered by looping-generated activities; the average size is 554.81 bytes. The overall number of packets is 11,623 and half of them are acknowledgements (ACKs), while the total exchanged volume is 6.45 Mbytes.

[Figure 3: x-axis Time [s]; y-axis Throughput [Mbit/s]; series: Ajax-generated.]

Fig. 3. Throughput of the traffic generated by the looping layer.

TABLE II. DISTRIBUTION OF TCP PDU SIZES OF THE LOOPING-RELATED TRAFFIC LOAD.

Size [bytes]      N. of Packets   %
40 - 79           5,987           51.51
80 - 159          81              0.70
160 - 319         599             5.15
320 - 639         1,000           8.60
640 - 1,279       988             8.50
1,280 - 2,560     2,968           25.54

B. Periodicity and Predictability

As shown, the continuous and regular “polling” performed by the looping entity may trigger data transfers via TCP or HTTP, e.g., to retrieve information about on-line users and to update proper widgets or IM services. To quantify the periodicity of such actions, Figure 4 presents the Power Spectral Density (PSD) plot of the overall traffic produced by looping. The three major peaks of energy represent data belonging to the “news feed” (i.e., the page summarizing in real time the changes happening within the SN), the buddy list and the presence information (e.g., if the user is on-line and the browser is idle), respectively. The PSD can be used: i) to tune the “duty cycle” of the looping element, ii) for application identification purposes, especially when privacy is a concern, iii) to have a scalable technique when, due to high loads, approaches à la Deep Packet Inspection (DPI) are impeded, and iv) for security purposes (see Section IV-E).

[Figure 4: x-axis Frequency [Hz]; y-axis Energy; labeled peaks: News Feed, Buddy List, Presence Update.]

Fig. 4. Power Spectral Density (PSD) plot of the traffic load generated by the looping layer through the AJAX-based paradigm.

C. A Simple Correction to Better Model OFF States

We briefly introduce a rough approximation of the impact of the looping over OFF states. Specifically, a rough approximation would be to properly increase the traffic generated during ON periods, to take into account data produced during OFF ones. As an example, by assuming the traffic as uniformly distributed, the impact of looping can be regarded as an additive flow, i.e., as a constant additional volume per hour of i) ∼1.07 Mbytes of TCP transfers distributed within 1,931 segments, and ii) ∼0.31 Mbytes of HTTP traffic, resulting in 466 HTTP requests.
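The constant additive flow of Section IV-C can be sketched in a few lines of Python. The fragment assumes that the per-hour volumes are obtained by spreading the session totals of Section IV (6.45 Mbytes of TCP traffic and 1.88 Mbytes of HTTP traffic) uniformly over a 6-hour session; this derivation, as well as the helper names `hourly_additive_flow` and `corrected_on_volume`, is an illustrative assumption rather than the author's stated procedure.

```python
SESSION_HOURS = 6          # length of one measurement session
TCP_VOLUME_MBYTE = 6.45    # total looping-related TCP volume per session
HTTP_VOLUME_MBYTE = 1.88   # total HTTP volume exchanged during OFF periods


def hourly_additive_flow(volume_mbyte, hours=SESSION_HOURS):
    """Constant per-hour volume under the uniform-distribution assumption."""
    return volume_mbyte / hours


def corrected_on_volume(on_volume_mbyte, elapsed_hours, additive_mbyte_per_hour):
    """Inflate the ON-period volume with the traffic attributed to OFF periods."""
    return on_volume_mbyte + additive_mbyte_per_hour * elapsed_hours


tcp_per_hour = hourly_additive_flow(TCP_VOLUME_MBYTE)
http_per_hour = hourly_additive_flow(HTTP_VOLUME_MBYTE)
print(round(tcp_per_hour, 3), round(http_per_hour, 3))

# Hypothetical example: 10 Mbytes of classic ON traffic observed over 2 hours.
print(corrected_on_volume(10.0, 2, tcp_per_hour))
```

With these inputs the per-hour volumes agree with the ∼1.07 Mbytes and ∼0.31 Mbytes quoted above to within rounding. The correction is deliberately coarse: it preserves the total volume but discards the periodic structure of the looping traffic.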

D. Impact on the CPU Usage

The presence of the looping element also accounts for additional CPU usage. To properly quantify such overhead, we performed a set of trials with different browsers. Collected values are reported in Table III. On average, the presence of a looping element results in an overhead in CPU usage. For instance, when using Firefox to access the service, the looping adds Δ = +21.877% with respect to using the same service with looping functionalities disabled. Even if this is not a huge overhead, it must be taken into account for producing realistic models, especially when performing investigations aimed at quantifying power consumption, e.g., for green networking [18]. However, such characterization considers the looping as a monolithic entity. As a possible improvement, future work will aim at relating different looping-based services (as reported in Section IV-B) to their own CPU usage.

TABLE III. AVERAGE CPU USAGE OVERHEAD OF DIFFERENT BROWSERS TO OPERATE THE LOOPING OBJECT.

Browser     CPU Usage Overhead %
Firefox     21.87
Opera       25.38
Chrome      15.08

E. Security Considerations

As already presented in the literature, Web 2.0 applications spawn security risks, particularly due to their increased interactivity via scripts, and data retrieval possibly from unverified sources [19]. Obviously, being founded on such a technological pool, SN applications inherit all the aforementioned issues. Specifically, the dynamic nature of the generated content makes injection-based attacks possible. The most popular ones are Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF). Therefore, when employing a looping-based model, security implications must be taken into account. Conversely, the presence of looping can also help to secure an application. In fact, the repetitive nature of its traffic patterns can be used to perform traffic identification and to reveal anomalies or detect possible misbehaviors. Lastly, we also performed a traffic investigation by accessing the service through the Hypertext Transfer Protocol Secure (HTTPS). Apart from the minor overhead introduced by the security mechanisms adopted by the protocol itself, we did not notice any relevant difference in terms of the presented behaviors.

V. CONCLUSIONS AND FUTURE WORK

In this paper we extended standard HTTP models to better reflect the more complex nature of Web 2.0, especially by considering SN applications. Besides, we introduced a new behavioral entity called looping, which takes into account the autonomous and repetitive characteristics of the logic devoted to assuring prompt updates and interactivity among users participating in an SN. Also, a traffic characterization of a popular social platform, i.e., Facebook, has been presented. Future work aims at refining the looping behavior, also with a view to defining and validating an analytical model to produce synthetic traffic generators and to allow performance evaluation through simulations. Nevertheless, SN applications are often accessed through mobile devices, also by using ad-hoc client interfaces. A part of our ongoing research is devoted to understanding the behavior of SNs over mobile devices, also quantifying whether ad-hoc applications account for additional unknown patterns. Lastly, better quantifying the relation between the number of users and the resulting load is also part of our ongoing research activities.

REFERENCES

[1] E. Knorr, “Software as a Service: The Next Big Thing”, Infoworld, March 2006, available on-line: http://www.infoworld.com/article/06/03/20/76103 12FEsaas 1.html (Accessed Sept. 2011).
[2] P. Banerjee, R. Friedrich, C. Bash, P. Goldsack, B. A. Huberman, J. Manley, C. Patel, P. Ranganathan, A. Veitch, “Everything as a Service: Powering the New Information Economy”, IEEE Computer, Vol. 44, No. 3, pp. 36-43, March 2011.
[3] A. C. Weaver, B. B. Morrison, “Social Networking”, IEEE Computer, Vol. 41, No. 2, pp. 97-100, Feb. 2008.
[4] W. Li, A. W. Moore, M. Canini, “Classifying HTTP Traffic in the New Age”, Extended Abstract for SIGCOMM’08 Poster.
[5] J. J. Lee, M. Gupta, “A new Traffic Model for Current user Web Browsing Behavior”, Intel, Santa Clara, CA, USA, 2007.
[6] Y. Chen, W. Lei, X. Zhang, “Traffic Model for HTTP Video Page”, Proceedings of the 3rd International Conference in Communications and Networking in China (ChinaCom’08), Hangzhou, China, Aug. 2008, pp. 432-436.
[7] J. Zhang, M. Karim, K. Akula, R. K. R. Ariga, “Design and Development of a University-Oriented Personalizable Web 2.0 Mashup Portal”, IEEE International Conference on Web Services (ICWS 08), pp. 417-424, Beijing, China, Sept. 2008.
[8] Facebook usage statistics, available on-line: http://www.facebook.com/press/info.php?statistics (Accessed Sept. 2011).
[9] F. Schneider, S. Agarwal, T. Alpcan, A. Feldmann, “The new Web: Characterizing AJAX Traffic”, Proceedings of the 9th International Conference on Passive and Active Network Measurement (PAM’08), pp. 31-40.
[10] P. M. Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation”, Proc. of the 1998 ACM SIGMETRICS joint Internat. Conf. on Measurement and Modeling of Computer Systems, WI, USA, June 1998, pp. 151-160.
[11] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, “Hypertext Transfer Protocol - HTTP/1.1”, RFC 2616, Network Working Group, IETF, June 1999.
[12] World Wide Web Consortium (W3C), “The WebSocket API”, September 2011, available on-line: http://dev.w3.org/html5/websockets/.
[13] H.-K. Choi, J. O. Limb, “A behavioral model of Web traffic”, Proceedings of the 7th International Conference of Network Protocols (ICNP ’99), Toronto, Canada, Oct. - Nov. 1999, pp. 327-334.
[14] L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan, “Group Formation in Large Social Networks: Membership, Growth, and Evolution”, Proceedings of 12th International Conference on Knowledge Discovery in Data Mining, New York, USA, 2006, pp. 44-54.
[15] C. Partridge, P. P. Carvey, E. Burgess, et al., “A Fifty-Gb/s IP router”, IEEE/ACM Transactions on Networking, vol. 6, no. 3, pp. 237-245, June 1998.
[16] L. Caviglione, “Traffic Analysis of an Internet Online Game Accessed Via a Wireless LAN”, IEEE Communications Letters, Vol. 10, No. 8, Oct. 2006, pp. 698-700.
[17] L. Caviglione, “Can Satellites Face Trends? The Case of Web 2.0”, Proceedings of the International Workshop on Satellite and Space Communications (IWSSC’09), Siena, Italy, Sept. 2009.
[18] A. Bianzino, C. Chaudet, D. Rossi, J. Rougier, “A Survey of Green Networking Research”, IEEE Communications Surveys & Tutorials, No. 99, pp. 1-18.
[19] G. Lawton, “Web 2.0 Creates Security Challenges”, IEEE Computer, Vol. 40, No. 10, pp. 13-16, Oct. 2007.