As shown in Table 1, the importance is divided into three levels. For each block, the feature vector contains 10 spatial features and 9 content features, such as the position and size of blocks, the length of text, the number of hyperlinks, etc. Based on these features, learning algorithms have been used to train a model to assign importance values. Experimental results show that a Support Vector Machine (SVM) with a radial basis function (RBF) kernel achieves the best performance, with a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to that of a person. Detailed performance results can be found in [19]. After obtaining the importance values, we normalize them so that their sum in a web page becomes 1.

Table 1 Block importance levels

Level 3: The most prominent part of a page, such as headlines, main content, etc.
Level 2: Useful information, but not very relevant to the topic of a page, such as navigation, directory, etc.; or information relevant to the theme of a page, but without prominent importance, such as related topics, topic index, etc.
Level 1: Noisy information such as ads, copyright, decoration, etc.

4.3 Other attributes

According to the definitions in Sect. 3, MPS, MPH and MPW are calculated from the total number of characters, the height of one character, and the longest word within each block, respectively. If tags that fix the aspect ratio exist within a block, the aspect ratio of this block is assumed to be not adjustable. Finally, we need to specify the alternative (ALT) for each information block. It can be obtained by computational information extraction or by manual input from the author. In this paper, we will not go into the details of text summarization algorithms. In our current implementation, if there are any header tags, such as <H1> or <H2>, inside a block, the text within them is used to form the alternative.

5 DRESS-based document layout adaptation

Based on the previously described DRESS, either defined by the author or automatically generated, the problem of web document layout adaptation can be better handled. In the following, we discuss the concept of information fidelity and an algorithm for finding the optimal web document layout under various constraints on display size.

5.1 Information fidelity

Information fidelity, introduced here, is the perceptual 'look and feel' of a modified version of a content object, a subjective comparison with the original version. The value of information fidelity ranges between 0 (lowest, all information lost) and 1 (highest, all information kept just as in the original). Information fidelity gives a quantitative evaluation of content representation. Hence, the optimal solution is the one that maximizes the information fidelity delivered to the end user via an adapted representation. The information fidelity of an individual adapted block is decided by various parameters, such as the spatial region of the display, the content reduction of text, and the color depth or compression ratio of an image. For a web page P consisting of several blocks, the resulting information fidelity is defined as a weighted sum of the information fidelity of all blocks in P. Straightforwardly, we can use the importance values from DRESS as the weights of the contributions to the perceptual quality of the whole page. Thus, the information fidelity of an adapted result is described as

IF(P) = \sum_{B_i \in P} IMP_i \cdot IF_{B_i}    (3)

This is used as the objective function of our adaptation algorithm. In this paper, we assume that the IF value only depends on the version of a content block, i.e., whether it is summarized or not. If a content block is replaced by its alternative, the IF value of this block is defined to be 0; otherwise it is 1.
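To make the objective concrete, the following is a minimal sketch (not the paper's implementation) of evaluating Eq. (3) under this 0/1 assumption; the block names and importance values are hypothetical.

```python
# Minimal sketch of Eq. (3) under the 0/1 assumption: a summarized block
# contributes 0, an unsummarized block contributes its (normalized) importance.
def information_fidelity(importances, unsummarized):
    """importances: dict block_id -> normalized IMP value (sums to 1 over the page).
    unsummarized: set of block ids kept in their original form."""
    return sum(imp for block, imp in importances.items() if block in unsummarized)

# Hypothetical page with four blocks whose importances sum to 1.
importances = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
print(information_fidelity(importances, {"A", "C"}))  # 0.6
```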
5.2 Problem definition

We introduce P' as the set of unsummarized information blocks in a web page P, P' ⊂ P = {B1, B2, ..., BN}. Thus, our mission is to find the block set P' that carries the largest information fidelity while meeting the display constraints. In order to ensure that all the content blocks can be included in the final presentation, the following constraint should be satisfied:

\sum_{B_i \notin P'} size(ALT_i) + \sum_{B_i \in P'} MPS_i \le Area    (4)

where Area is the size of the target area and size(ALT_i) is a function that returns the size of the display area needed by ALT_i. If constraint (4) is transformed to

\sum_{B_i \in P'} (MPS_i - size(ALT_i)) \le Area - \sum_{B_i \in P} size(ALT_i)    (5)

then the web layout optimization problem becomes

\max_{P'} \sum_{B_i \in P'} IMP_i \cdot IF_{B_i} = \max_{P'} \sum_{B_i \in P'} IMP_i
subject to \sum_{B_i \in P'} (MPS_i - size(ALT_i)) \le Area - \sum_{B_i \in P} size(ALT_i)    (6)

We can see that Eq. (6) is equivalent to the traditional NP-complete 0-1 knapsack problem. Equation (5) also implies that the display size should not be too small; otherwise, we will not be able to find a valid layout even when all the blocks have been summarized. In this rare case, sub-tree summarization will become necessary. Since Eq. (4) does not ensure that the MPH or MPW constraints will be satisfied, we solve the problem by a two-level approach. First we use a branch and bound algorithm to enumerate all possible block sets P', then for each block set we use a capacity ratio-based slicing algorithm to test whether a valid layout can be found. By this process, we search among all possible block sets P' to select the optimal one, i.e., the scheme that achieves the largest IF value while presenting an aesthetic view.

Fig. 3 The binary tree used for selecting optimal block sets

5.3 Page rendering algorithm

Since problem (6) is NP-complete, it is very time-consuming to simply try the possible solutions one by one. We design a branch-and-bound algorithm to select the block sets efficiently.

5.3.1 Block set selection

As shown in Fig. 3, we build a binary tree for block set selection in which the root node is a null set, implying that all the blocks are summarized, and in which:
• Each node denotes a set of unsummarized blocks;
• Each level represents the inclusion of an information block;
• Each bifurcation means the choice of keeping the original contents or making a summary of the block at the next level.

Thus, the height of this tree is N, the total number of blocks inside the web page, and each leaf node in this tree corresponds to a different possible block set P'. For each node in the binary block tree, there is a bound on the IF value achievable within its sub-tree. The lower bound is simply the IF value currently achieved when none of the unchecked blocks can be added, that is, the sum of the IF values of the blocks included in the current configuration. The upper bound is the sum of the IF values of all unchecked blocks after the current level, in other words, the sum of the IF values of all blocks in the web page except those summarized before the current level. We perform a depth-first traversal of this tree according to the following rules (a simplified sketch of this search is given after the list):
• Whenever the upper bound of a node is smaller than the best IF value currently achieved, the whole sub-tree of that node, including the node itself, is pruned.
• At the same time, for each node we check the area size constraint in Eq. (4) to verify its validity. If the constraint is violated, the node and its whole sub-tree are pruned, because including a new block will only increase the sum of the MPS values.
• If we arrive at a block set with an IF value larger than the current best IF value, we check the feasibility of this solution by running the layout algorithm in Sect. 5.3.2. If the solution is feasible, we replace the current best IF value with this one.
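The following is a simplified, hypothetical sketch of this depth-first branch-and-bound search; it uses the 0/1 IF assumption, encodes the area constraint in the form of Eq. (5), and treats the layout feasibility test of Sect. 5.3.2 as a pluggable callback, so it illustrates the pruning rules rather than reproducing the paper's actual implementation.

```python
# Simplified branch-and-bound over block sets (illustrative sketch only).
# blocks: list of (block_id, imp, mps, alt_size), sorted by decreasing imp.
# area: size of the target display region.
# layout_feasible: callback implementing the capacity ratio-based slicing test.
def select_blocks(blocks, area, layout_feasible):
    total_alt = sum(alt for _, _, _, alt in blocks)
    best = {"if": 0.0, "set": frozenset()}

    def dfs(i, chosen, used, current_if, remaining_if):
        nonlocal best
        # Upper-bound pruning: even keeping every remaining block cannot beat the best.
        if current_if + remaining_if <= best["if"]:
            return
        # A block set improving on the best is accepted only if a valid layout exists.
        if current_if > best["if"] and layout_feasible(chosen):
            best = {"if": current_if, "set": frozenset(chosen)}
        if i == len(blocks):
            return
        bid, imp, mps, alt = blocks[i]
        # Branch 1: keep block i unsummarized, if the area constraint (5) still holds.
        if used + (mps - alt) <= area - total_alt:
            dfs(i + 1, chosen | {bid}, used + (mps - alt),
                current_if + imp, remaining_if - imp)
        # Branch 2: summarize block i (replace it by its alternative).
        dfs(i + 1, chosen, used, current_if, remaining_if - imp)

    dfs(0, frozenset(), 0, 0.0, sum(imp for _, imp, _, _ in blocks))
    return best["set"], best["if"]
```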
By checking both the bound on the possible IF value and the layout validity of each block set, the computation cost is greatly reduced. We also use some other techniques to reduce the traversal time, such as arranging all the blocks in decreasing order of their importance values at the beginning of the search, since in many cases only a few blocks contribute the majority of the IF value.

Some web pages contain a large number of information blocks, so even the branch and bound algorithm will be slow. In this case, we use a greedy algorithm to generate a sub-optimal result instead of trying to find the optimal solution. The greedy algorithm simply tries to add the information blocks to the block set one by one in decreasing order of their importance. If a block cannot be added to the set, it is not considered again. This approach has proven to be efficient for traditional knapsack problems. In our implementation, when the number of blocks is larger than 15, the greedy algorithm is used to speed up the computation.

5.3.2 Capacity ratio-based slicing

When checking the feasibility of a block set P', we try to find an aesthetic layout that places all content blocks and summaries in the target area. The general problem of web layout optimization is similar to floor planning in VLSI design, which is known to be an NP-hard problem. However, as mentioned before, because the slicing tree structure remains constant during our adaptation process, only the slicing numbers need to be decided. We use a capacity ratio-based slicing method to deal with this problem in two steps, as shown in Fig. 4. First, we go through the slicing tree in a bottom-up manner to calculate the capacity, height and width constraints of each node. Then, the slicing numbers are computed top-down. The detailed procedure of the first step is shown in Fig. 5, where the capacity, minimal height and minimal width of each node stand for its display constraints. For a summarized block, these correspond to the values of its alternative, which can be calculated automatically from the summary text. The second step compares the capacities of the two sub-trees of each inner node and sets the slicing number of that node proportionally to them. This gives our algorithm its name, capacity ratio-based slicing. We also perform some adjustments on the slicing numbers to meet the display constraints. Figure 6 presents the algorithm details. The complexity of this layout algorithm is O(N), since both steps can be performed in O(N) time. Therefore, our capacity ratio-based slicing algorithm is fast enough to check the layout feasibility of each possible block selection.

Fig. 4 Capacity ratio-based slicing algorithm
Fig. 5 The algorithm for calculating the capacity, height and width constraints of each node
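As an illustration only (the actual procedures are those given in Figs. 5 and 6), a compact sketch of the two passes over a binary slicing tree might look like the following; the node structure and field names are assumptions, and the adjustments needed to satisfy the MPH and MPW constraints are omitted.

```python
# Sketch of capacity ratio-based slicing on a binary slicing tree (illustrative only).
class Node:
    def __init__(self, split=None, left=None, right=None, capacity=0):
        self.split = split        # '-' horizontal slice, '|' vertical slice, None for a leaf
        self.left, self.right = left, right
        self.capacity = capacity  # MPS (or ALT size if summarized) for leaves
        self.slicing = None       # slicing number, decided top-down
        self.rect = None          # (x, y, w, h) assigned display rectangle

def compute_capacity(node):
    """Bottom-up pass: the capacity of an inner node is the sum of its children."""
    if node.split is None:
        return node.capacity
    node.capacity = compute_capacity(node.left) + compute_capacity(node.right)
    return node.capacity

def assign_rect(node, x, y, w, h):
    """Top-down pass: the slicing number is proportional to the left child's capacity."""
    node.rect = (x, y, w, h)
    if node.split is None:
        return
    node.slicing = node.left.capacity / node.capacity
    if node.split == '-':  # horizontal slice: split the height
        hl = h * node.slicing
        assign_rect(node.left, x, y, w, hl)
        assign_rect(node.right, x, y + hl, w, h - hl)
    else:                  # vertical slice: split the width
        wl = w * node.slicing
        assign_rect(node.left, x, y, wl, h)
        assign_rect(node.right, x + wl, y, w - wl, h)
```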
An example is shown in Fig. 7. The MPS of the four content blocks in the page are 1000, 1000, 2500 and 2500, respectively. Suppose none of them is summarized in the adaptation; the slicing number of the root will then be 1000/(1000 + 1000 + 2500 + 2500) ≈ 0.14. That is to say, if the screen is 600 in height, we should split the region horizontally at a height of 600 × 0.14 = 84. This procedure is executed recursively until all the slicing numbers are decided. However, if block 1 has set its MPH to 100, then we will split the display at a height of 100 instead in order to meet this requirement.

Fig. 6 The algorithm for assigning the final display rectangle to each node
Fig. 7 a An example web layout with MPS labeled. b The corresponding slicing tree with slicing numbers labeled

5.3.3 Content accommodation

After layout optimization, each block, whether summarized or not, has been assigned a rectangular region for display. According to the slicing algorithm, the area of the assigned region for each node will be larger than its MPS value. The height and width of the display region will also meet the requirements of each block. We use the ADJ attribute of each information block to aid the content accommodation process. If a block is indicated to be adjustable, we fit it into the target rectangle by zooming and wrapping. Otherwise, we only zoom the block while maintaining its aspect ratio. Other content adaptation techniques, such as attention model based image adaptation [7], can also be integrated into this step. When a user follows the link of an alternative, a technique similar to the fisheye view [18] is adopted to highlight the corresponding block. We redo the adaptation process with a new constraint that this block should not be summarized. However, if no such solution is found, the user is navigated to a new page containing the original contents, displayed using the whole target area. Note that if the block is oversized, scrolling is inevitable. Therefore, we suggest that huge blocks be avoided in web design.

6 DRESS-based web browsing on small terminals

In order to study the performance of our approach, two user study experiments have been carried out. Experimental results are presented and discussed in this section.

6.1 Implementation of a DRESS browser

We have implemented a prototype of a DRESS-based web browser. With DRESS and the page rendering algorithms, the browser can dynamically select and summarize the proper parts of web pages, and then optimize the layout for the target area. We do not try to propose a new web page format, since HTML is already widely used on the Internet. In our prototype, the DRESS structure is stored as comments within the HTML files or style sheets. For example, the slicing tree structure of the web page in Fig. 1 is saved as the Polish expression "((A − (B|(C|(D − E)))) − F)", where "−" and "|" denote horizontal and vertical slicing, respectively. Other attributes of the DRESS are described in the same way, as shown in Fig. 8; a small sketch of parsing such an expression is given below.

Fig. 8 An example DRESS representation

We allow users to designate the desired target display size in the prototype. When browsing a web page, our prototype first parses the DRESS information inside it, and then generates a main result HTML file that exactly fits the target display. At the same time, the summarized block contents are saved in temporary HTML files which are accessible by following the links of the alternatives. These intermediary data can be saved in memory instead of files if the adaptation is performed at the client side.
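For illustration, here is a minimal, hypothetical sketch of how a Polish-expression string of this form (using ASCII '-' in place of the typographic minus) could be parsed back into a binary slicing tree; the paper does not prescribe this parser, and the tuple-based node representation is an assumption.

```python
# Hypothetical parser for slicing-tree Polish expressions such as
# "((A-(B|(C|(D-E))))-F)", where '-' means a horizontal slice and '|' a vertical one.
# A leaf is returned as its block name; an inner node as (operator, left, right).
def parse_slicing_expr(s):
    pos = 0

    def parse():
        nonlocal pos
        if s[pos] == '(':
            pos += 1                 # consume '('
            left = parse()
            op = s[pos]              # '-' or '|'
            pos += 1
            right = parse()
            pos += 1                 # consume ')'
            return (op, left, right)
        # Leaf: read a block name (any run of characters that is not an operator)
        start = pos
        while pos < len(s) and s[pos] not in '()-|':
            pos += 1
        return s[start:pos]

    return parse()

print(parse_slicing_expr("((A-(B|(C|(D-E))))-F)"))
# ('-', ('-', 'A', ('|', 'B', ('|', 'C', ('-', 'D', 'E')))), 'F')
```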
If no DRESS information is found in a web page, the algorithm described in Sect. 4 is applied to generate the DRESS representation. It is worth noticing that DRESS does not require any additional modification on client devices. The transformation can be done at either the content server or at intermediary proxies. For example, a DRESS-enabled proxy can detect whether a web page is in DRESS format and then transform it into non-DRESS pages according to the display size of the client device.

To illustrate our scheme, we use the same example page as in Fig. 1, whose DRESS is manually defined as in Fig. 8. Figure 9 shows the comparison results on three typical screen sizes without scrolling. As shown in Fig. 9a, c and e, when using web thumbnails on a Handheld PC (640 × 240), a SmartPhone (128 × 160), and a Pocket PC (240 × 320), almost all the contents are hardly recognizable because of excessive down-sampling. In contrast, our solutions in Fig. 9b, d and f provide a much better experience in both content reading and link navigation. For instance, in Fig. 9b and f, parts of the unimportant or oversized page contents are summarized, while the remaining parts are kept readable in their original version to deliver maximum information fidelity. If a user clicks the link on the left sidebar in Fig. 9f, a new layout is generated with the sidebar expanded, as shown in Fig. 9g. In Fig. 9d, the target area is too small to accommodate any unsummarized information blocks, so all blocks are presented as summaries in a table-of-contents style, while the original content is displayed to the user after the links are clicked. Note that in Fig. 9h the rotation factor, which is common among mobile devices, is taken into consideration and a better result is presented accordingly, with the left navigation sidebar added.

Fig. 9 The results of DRESS-based layout adaptation compared with thumbnail views

6.2 Browsing performance

We carried out a user study experiment to compare the DRESS-based approach with the widely studied thumbnail method. The thumbnail method is implemented based on the work in [8]. There were six participants. They were recruited from nearby universities and all of them were familiar with web browsing and mobile devices. Each user was asked to complete sixty web browsing tasks in the same order. All the tasks were based on pre-selected web pages and one question was given for each task. For each question, the subject tried to find the answer as quickly as possible. The web pages were selected to include a variety of page layouts. The users were evenly divided into two groups. One group used the DRESS-based interface first and then the thumbnail-based interface; the other group used the thumbnail-based interface first and then the DRESS-based interface. For each interface, they browsed half of the web pages. The display size was set to 240 × 320. In order to reduce any performance influence due to familiarity and experience, the subjects were first asked to try both interfaces on two sample tasks for a sufficient amount of time. All the page navigations and the corresponding timestamps were recorded for later analysis. The average browsing times for each subject are shown in Fig. 10. As we expected, the DRESS-based interface performed better than the thumbnail-based interface, though different people tended to devote different amounts of effort to finishing the tasks. On average, the DRESS-based interface reduced browsing time by 23.5%.
We expect that the performance could be further improved if the DRESS structures were manually edited. The subjects said that by using DRESS they could still read something without losing context information on small devices, while the thumbnail-based interface transformed the whole page into a small image. They also reported that the automatically generated summaries in DRESS were not very satisfactory, so we plan to improve them in the future.

Fig. 10 The average browsing time for the two interfaces

An experiment was also conducted to test the efficiency of our algorithm by logging the computational time costs while making 10 layout adaptations for each web page. Our test bed was a Dell Optiplex GX 270 with a 2.8 GHz Pentium 4 processor, 512 MB of RAM, and the Windows XP Professional system. In the test, we measured an average DRESS generation cost of 27.5 milliseconds and an average DRESS rendering cost of 35.2 µs. The results show that, even without code optimization, our algorithm is already fast enough to be used in real-time web browsing systems, whether on web servers, proxy servers, or even client devices.

6.3 Authoring performance

Though the DRESS representation can be automatically generated, it is difficult to achieve satisfactory results due to the complexity of web pages. As shown in Fig. 11, we provide an authoring tool in the DRESS browser to let users edit the automatic results. There are two modes in the DRESS browser: the browsing mode and the authoring mode. In the authoring mode, the user can click on any block and a dialogue will pop up for editing the properties of that block.

Another user study was carried out to test the authoring performance. Similar to the previous experiment, forty web pages were selected as the test data. Three subjects took part in this study. First, the web pages were transformed to the DRESS format using our automatic generation algorithm, and then the subjects were asked whether they were satisfied with the generated structures and summaries. If they were not satisfied with either the structure or the summary, they could use the authoring tool to revise the DRESS representation. All the subjects were taught to use the authoring tool for a sufficient amount of time before the experiment.

Fig. 11 The authoring interface for DRESS

Table 2 The performance for editing DRESS representations

Subject | Satisfied with structure (%) | Average time for adjusting structure (s) | Satisfied with summary (%) | Average time for adjusting summary (s)
1 | 80.0 | 116.0 | 37.5 | 226.3
2 | 87.5 | 150.6 | 47.5 | 238.8
3 | 85.0 | 125.3 | 30.0 | 189.6

The experimental results are shown in Table 2. We can see that users were more satisfied with the generated structures than with the summaries. For those unsatisfactory structures, the subjects spent less than three minutes adjusting the parameters. One of the subjects suggested that the editing procedure could be more convenient if the parameters could be edited with visual tools. For the summaries, only 38.3% of the results were satisfactory, and it also took a little more time to revise the summary texts. In our future work, we will consider using more advanced text summarization techniques to improve the performance of the summary generation module. In general, the subjects thought that it was quite easy to understand the usage of the authoring interface.
Typically, it took a few minutes to refine the DRESS representations when they were not satisfactory.

7 Conclusions

In this paper, we proposed a Document REpresentation for Scalable Structures (DRESS) to help information providers make composite documents, typically web pages, scalable to the display size in both logical and layout structure. By integrating techniques from computer aided design, DRESS allows easy negotiation between author and viewer, hides layout manipulation from the author while still keeping the resulting structure predictable, and represents contents in dynamic layouts to fit various user-preferred display sizes. An automatic method for creating DRESS representations has also been discussed in the paper. Experimental results show that the DRESS-based interface is better than the thumbnail-based interface and that the DRESS representation can be easily authored.

Acknowledgements We would like to express our special appreciation to Xin Fan, Xiaodong Gu, Yu Chen, and Steve Lin for their insightful suggestions.

References

1. Badros, G., Borning, A., Marriott, K., Stuckey, P.: Constraint cascading style sheets for the web. In: Proceedings of UIST'99, pp. 73–82. Asheville, USA (1999)
2. Bickmore, T.W., Schilit, B.N.: Digestor: device-independent access to the World Wide Web. In: Proceedings of WWW'97, pp. 655–663. Santa Clara, USA (1997)
3. Björk, S., Holmquist, L.E., Redström, J., Bretan, I., Danielsson, R., Karlgren, J., Franzén, K.: WEST: a web browser for small terminals. In: Proceedings of UIST'99, pp. 187–196. Asheville, USA (1999)
4. Borning, A., Lin, R.K., Marriott, K.: Constraint-based document layout for the web. ACM Multimedia Syst. J. 8(3), 177–189 (2000)
5. Buyukkokten, O., Garcia-Molina, H., Paepcke, A., Winograd, T.: Power browser: efficient web browsing for PDAs. In: Proceedings of CHI'00, pp. 430–437. The Hague, Netherlands (2000)
6. Chen, J.L., Zhou, B.Y., Shi, J., Zhang, H.J., Wu, Q.F.: Function-based object model towards website adaptation. In: Proceedings of WWW'01, pp. 587–596. Hong Kong, China (2001)
7. Chen, L.Q., Xie, X., Fan, X., Ma, W.Y., Zhang, H.J., Zhou, H.Q.: A visual attention model for adapting images on small displays. ACM Multimedia Syst. J. 9(4), 353–364 (2003)
8. Chen, Y., Ma, W.Y., Zhang, H.J.: Detecting web page structure for adaptive viewing on small form factor devices. In: Proceedings of WWW'03. Budapest, Hungary (2003)
9. Cohoon, J.P., Paris, W.D.: Genetic placement. IEEE Trans. Comput. Aided Design (6), 956–964 (1987)
10. Fuchs, M.: An evolutionary approach to support web page design. In: Proceedings of the 2000 Congress on Evolutionary Computation, pp. 1312–1319. Piscataway, USA (2000)
11. González, J., Rojas, I., Pomares, H., Salmerón, M., Prieto, A., Merelo, J.J.: Optimization of web newspaper layout in real time. Comput. Netw. 36(2–3), 311–321 (2001)
12. González, J., Rojas, I., Pomares, H., Salmerón, M., Merelo, J.J.: Web newspaper layout optimization using simulated annealing. IEEE Trans. Syst. Man Cybernet. B: Cybernet. 32(5), 686–691 (2002)
13. Gu, X.D., Chen, J.L., Ma, W.Y., Chen, G.L.: Visual based content understanding towards web adaptation. In: Proceedings of the 2nd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, pp. 164–173. Malaga, Spain (2002)
14. Han, R., Perret, V., Naghshineh, M.: WebSplitter: a unified XML framework for multi-device collaborative web browsing. In: Proceedings of CSCW'00, pp. 221–230. Philadelphia, USA (2000)
15. Hori, M., Kondoh, G., Ono, K., Hirose, S., Singhal, S.: Annotation-based web content transcoding. Comput. Netw. 33(1–6), 197–211 (2000)
16. Milic-Frayling, N., Sommerer, R.: SmartView: flexible viewing of web page contents. In: Proceedings of WWW'02. Honolulu, USA (2002)
17. Opera Small Screen Rendering. http://www.opera.com/products/smartphone/smallscreen/
18. Sarkar, M., Brown, M.H.: Graphical fisheye views. Commun. ACM 37(12), 73–84 (1994)
19. Song, R., Liu, H., Wen, J.R., Ma, W.Y.: Learning block importance models for web pages. In: Proceedings of WWW'04. New York, USA (2004)
20. Gupta, S., Kaiser, G., Neistadt, D., Grimm, P.: DOM-based content extraction of HTML documents. In: Proceedings of WWW'03. Budapest, Hungary (2003)
21. Wobbrock, J.O., Forlizzi, J., Hudson, S.E., Myers, B.A.: WebThumb: interaction techniques for small-screen browsers. In: Proceedings of UIST'02, pp. 205–208. Paris, France (2002)
22. Xie, X., Zeng, H.J., Ma, W.Y.: Enabling personalization services on the edge. In: Proceedings of ACM Multimedia 2002. Juan-les-Pins, France (2002)
23. Yang, Y.D., Zhang, H.J.: HTML page analysis based on visual cues. In: Proceedings of ICDAR'01. Seattle, USA (2001)