International Journal of Egyptian Computer Society, Volume 29, Number 2
WEB LAYOUT MINING Using NLP: A PARADIGM FOR INTELLIGENT WEB LAYOUT DESIGN
Imran Sarwar Bajwa (1), M. Abbas Choudhary (2)
(1) CISUC, Department of Informatics Engineering, University of Coimbra, 3030 Coimbra, Portugal
(2) Higher Education Commission of Pakistan, Islamabad, Pakistan
Abstract:
The problem in designing modern websites is to produce content that follows the latest trends and styles. Common website editors only help to draw an intended layout. The hard part is designing a web layout that matches current demand, trends and style. Such editors are useful when the user already has a specific layout in mind and knows web page layout principles well enough to judge which kinds of layouts are possible. For users with limited artistic and creative abilities in particular, it is intrinsically difficult to design a good layout from scratch that is acceptable in every respect. An automated system is therefore required that can mine the layouts of the desired type of website. The designed "Web Layout Mining (WLM)" system mines the most popular web layouts from the internet and helps design a web layout that is close to acceptable and has all the marks and features of modern requirements. The system is based on a rule-based algorithm that helps the user find sample layouts related to his website category. The user then chooses a desired web layout and designs his own from it, with the modifications and variations his requirements call for.
Keywords: Web layout mining, Human Computer Interaction, Information retrieval, Natural language processing.
1. INTRODUCTION
A successful web page design is typically based on the latest web design trends. The layout of a web page for an educational institute is quite different from the layout of a website designed for showbiz. Similarly, the layout of a sports web page is totally different from that of a website designed for commercial purposes [2]. Therefore, a designer who is going to build a business, showbiz, sports, educational or informative website, or a website for some other general content, should be well aware of the current trends and styles in that particular discipline.
A website for an educational institute such as a university or a college usually has a distinctive style, with a menu bar at the top holding links such as home, faculty, admissions, research and contact us. Typically these websites have both static and dynamic pages [5]. Such web pages do not have extensive menus or a great deal of content. Their main job is to show the standard of education at the institute, so they need a flexible design that promotes the institute's educational policies and research activities. These websites have more graphical and visual content than ordinary websites [1]. A person designing an educational website should know which content to highlight and which to prioritize. Business and showbiz websites are mostly more colourful and attractive than other websites [14]. They also have many menus and options and are often dynamic. These websites highlight their content using many images and colours. Because this content is heavy in terms of size, designers have to make sure the layout is efficient and effective enough for the website to be browsed easily and quickly [22]. Sports and newspaper websites are very simple and not very colourful, but they generate dynamic content such as updated scorecards and hourly sports, business and weather news. Other websites for general content also have their own distinctive web layout styles. A web layout is basically the arrangement of the various web contents on a web page, and it is a highly integral and significant component of the structural design of a website. Websites built for different purposes often have their own distinctive web layouts and designs. For example, business websites use more user forms and reports than informative websites, which have more menus and more graphics [5].
The introductory websites of educational institutes and universities are more regular, while commercial and showbiz websites are rather irregular and informal. Showbiz and personal web pages have more animated pictures, audio and video content than any other websites. Because of these differences, each type of website has its own set of requirements for design and development.
2. Problem statement
Websites are generally designed with WYSIWYG (What You See Is What You Get) editors [4]. WYSIWYG editors are word-processor-like software that display the page almost exactly as it will appear after publishing or printing [17]. It is very difficult for inexperienced web designers to create good-looking web page layouts, especially at a professional level. WYSIWYG editors only help to draw the intended layout. The problem is designing a web layout that matches current demand, trends and style [19]. For those with limited artistic and creative abilities in particular, it is intrinsically difficult to design a good layout from scratch that is acceptable in every respect.
The designed system, "Web Layout Mining (WLM): A new paradigm for Intelligent Web layout Design", mines the most popular web layouts from the internet and helps design a web layout that is close to acceptable and has all the marks and features of modern requirements. The system is based on a rule-based algorithm that helps the user find sample layouts related to his website category. The user then chooses a desired web layout and designs his own from it, with the modifications and variations his requirements call for. This is an effective way of designing attractive web pages with less effort and in less time.
3. Intelligent Web Designing
Software engineering has many fields, and web design is one of the most important of them because it has revolutionized communication, information interchange and business practices. Designing a successful and excellent website is a genuinely technical task. A web designer has to concentrate on several aspects, such as web content, web technology, web visuals and web economics [3]. Web content is the actual data, facts and figures placed on a web page, and it provides the building blocks of the whole website. Web technology provides the actual functionality of a website through user forms, reports, dynamic web content generation and other features [15]. The core functionality of a website depends on the particular web technology used to build it. Web visuals relate to the outlook, shape, look and feel of a website. This is the feature that principally attracts viewers and encourages them to browse that particular website. Web visuals may consist of static images, animated images, and audio and video streams for a better and longer-lasting impact. Finally, web economics supports the economic side of a website where required [3], allowing web users to perform business transactions through the web. In the early days of web design, websites were small, simple and static. There was less information and websites were typically specific, so the design was easy and straightforward. Nowadays the data and other aspects of a website have grown to an explosive size because of advances in technologies and requirements [6, 8, 11]. A website's success depends on various factors, such as its usefulness, correctness, usability and pleasant appearance.
Nearly all of these features are directly related to the structural design of a website. Successful and effective websites are useful to their users, and a website is useful if it offers both utility and usability.
• Utility describes whether the website's functionality lets a user meet his requirements and needs easily.
• Usability describes the ability to use the site's features to accomplish a particular goal.
• Correctness is also a noteworthy issue. The user should find precise and relevant information on a particular web page.
• Pleasant appearance is a main key to the success or failure of a particular website. The more pleasant the website, the better its chances of success and usefulness.
All four of these features ultimately relate to the layout design of a website and together largely determine its success. A website may fail because of a complex and unrealistic design [9]. An unrealistic design is one whose functions are so confusing that the website is not functionally useful. Usable sites are easy to learn and efficient, and they help users accomplish their tasks easily, satisfactorily and without errors [18]. Layout design is difficult because of its vast scope. It involves both tangible and intangible factors with a high degree of vitality and subjectivity.
4. Related Work
Web pages are generally designed with interactive WYSIWYG (What You See Is What You Get) applications, in which the user edits the document visually without explicitly typing HTML tags [23]. Many web design tools, such as GoLive, FrontPage, WebSphere, HomePageBuilder and Dreamweaver, are available for creating web pages [16]. This approach is useful when the user already has a specific layout in mind and is familiar with web page layout principles, that is, with what kinds of layouts are possible. However, WYSIWYG interfaces are not very helpful in the early stages of design, because editing in these interfaces does not support quick exploration of multiple possibilities [5]. Research in visual interface layout design began with the advent of new visual applications such as web layouts and graphical user interfaces for computer applications. Examples include UIDE [13] and ADDI [14]. Various methods and techniques have been defined to address the problem of automatic web layout generation. These applications typically support the design process and also allow domain-specific preferences to be incorporated [3]. However, they provide only part of the functionality, because the user is left to map the domain objects and their properties to the corresponding visual properties of the layout. WebStyler [6] generates an actual HTML file from a simple sketch, which helps users quickly obtain an HTML page corresponding to the input sketch.
DENIM [7] is a sketch-based design tool for the early stages of web design. Its authors' user study showed that the rapid sketching interface is effective for producing a design. However, DENIM is designed for professional designers who can easily derive more detailed web pages from their rough sketches.
5. Methods and Materials
The main emphasis of the research was first to search for the intended web layout and then to provide an easy interface for reusing any of the retrieved web layouts for one's own purpose. The designed system works like a conventional search engine that mines for appropriate web page layouts. Conventional search engines typically use keywords [9], while some others use other search methods such as SQL and natural-language queries [10]. In WLM, the user gives his query in simple natural language, and the system understands the query and searches for the desired web layouts. A list of the retrieved web layouts is provided to the user. The user selects the desired web layout, and the WLM system extracts only the layout of that website, excluding the textual and image content, so that the user can add his own content to personalize it.
The WLM system works in two halves. In the first half, the system reads the user's input text, analyzes it and extracts the necessary information. This information is then used to draw the sample web layouts. In the second half, if the user wants user forms to be drawn automatically, these can also be designed simply by providing information about the forms, such as how many text boxes are required, their names and other properties.
5.1. Mining a Web Layout
The designed Web Layout Mining (WLM) system first searches for the desired type of web layout. For example, suppose a web designer wants to build a news website. He is given the option of searching for his desired web layouts. As shown in Figure 1.0, the user writes "News Website" in the search toolbar. The user can also use other standard search engines, such as www.google.com and www.altavista.com, to search for appropriate web layouts. As shown in Figure 1.0, various layouts of news websites are returned to the user. Each returned link has two options.
The first option, [open link], opens the actual website. The second option, [personalize], lets the user personalize that web layout for his own website.
5.2. Personalizing a Web Layout
After choosing the particular web layout he wants to personalize, the user clicks the website link and it opens in a new window. This new window contains only the web layout of the selected page.
An algorithm has been designed to extract the web layout of the desired web page. It has the following steps.
Step 1 – Read the HTML code of the web page.
Step 2 – Find the HTML tags that define the page structure.
Step 3 – Ignore every character that lies outside these tags.
Step 4 – Create a new .html file that contains only the structure of the website, excluding all actual content such as images and text.
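The four steps above can be sketched with Python's standard-library `html.parser`. This is a minimal illustration under stated assumptions, not the WLM implementation. The set of structural tags kept (`STRUCTURAL_TAGS`) is hypothetical, because the paper's own tag list did not survive extraction. Text and images are simply dropped, and `extract_layout` and `save_layout` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical tag set: the source does not list which tags count as layout,
# so this sketch keeps common table- and block-level structure tags.
STRUCTURAL_TAGS = {"html", "head", "body", "table", "tr", "td", "th",
                   "div", "header", "nav", "main", "section", "footer"}


class LayoutExtractor(HTMLParser):
    """Collects structural tags (Step 2) and ignores all other input (Step 3)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names; attributes are deliberately dropped.
        if tag in STRUCTURAL_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in STRUCTURAL_TAGS:
            self.parts.append(f"</{tag}>")

    # handle_data is left as the no-op default, so text is discarded; <img>
    # is not in STRUCTURAL_TAGS, so images are discarded as well.


def extract_layout(html: str) -> str:
    """Step 1: read the page's HTML and return only its structural skeleton."""
    parser = LayoutExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def save_layout(html: str, path: str) -> None:
    """Step 4: write the content-free structure to a new .html file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(extract_layout(html))


if __name__ == "__main__":
    page = ('<table><tr><td><img src="logo.png">Daily News</td></tr>'
            '<tr><td>Headline text</td><td>Sports</td></tr></table>')
    print(extract_layout(page))
    # -> <table><tr><td></td></tr><tr><td></td><td></td></tr></table>
```

This sketch also discards attributes such as `width` or `style`. A fuller version might keep them, because they carry layout information as well.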
Figure 1.0: Automatically generated sample web-layout from user-given preferences
The extracted web layout produced by these steps is shown below.
Figure 2.0: Automatically generated sample web-layout from user-given preferences
5.3. HTML Code for Web-Layout
After extracting this information, the designed system can generate the related HTML code from it. A nested-tables technique is used to lay out the extracted information. For this particular example, the system generates the following code.
Code 1: Automatically generated HTML code (nested tables with five placeholder cells labelled "Text")
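The nested-tables technique can be illustrated with a short generator. The input format is an assumption: a list of levels (outer rows), each holding module names (cells). Each level becomes an inner table, and each module becomes a cell holding a "Text" placeholder. The function name `layout_to_html` is hypothetical, and this is a sketch rather than the code WLM actually emits.

```python
from html import escape


def layout_to_html(levels: list[list[str]]) -> str:
    """Render a page as nested tables: one outer row per level, one inner
    table per level holding that level's modules as cells.

    Hypothetical input: each inner list is one horizontal level of the page,
    and each string in it names a module placed on that level.
    """
    lines = ['<table width="100%" border="1">']
    for modules in levels:
        lines.append("  <tr>")
        lines.append("    <td>")
        # A separate inner table per level lets each level have its own
        # number of columns, independent of the other levels.
        lines.append('      <table width="100%">')
        lines.append("        <tr>")
        for name in modules:
            # The module name is kept in a title attribute; the visible
            # cell holds a "Text" placeholder for the user's own content.
            lines.append(f'          <td title="{escape(name)}">Text</td>')
        lines.append("        </tr>")
        lines.append("      </table>")
        lines.append("    </td>")
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    news_layout = [["header"], ["menu", "headlines", "weather"], ["footer"]]
    print(layout_to_html(news_layout))
```

The `width` and `border` attributes reflect the table-based page style of the period. A modern page would move them into CSS.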
The generated HTML code is stored in a new file. The designed system is flexible in analyzing the given text. In the given example, the levels and modules are defined horizontally (a layer first, then its particular modules), and the analysis succeeded. The system can also analyze text in which layers and modules are defined vertically (all layers are defined first, and the modules are then defined with reference to those layers).
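The horizontal and vertical orderings can both be normalized into the same layer-to-modules structure. This is a minimal sketch that assumes an earlier text-analysis step has already produced hypothetical `(kind, name, layer)` triples. A module with no explicit layer is attached to the most recently defined layer (horizontal style), and a module that names its layer is attached to that layer (vertical style).

```python
from typing import Optional

# Hypothetical output of the text-analysis step: (kind, name, layer) triples,
# where kind is "layer" or "module" and layer is the layer a module explicitly
# refers to, or None when the sentence names no layer.
Statement = tuple[str, str, Optional[str]]


def build_structure(statements: list[Statement]) -> dict[str, list[str]]:
    """Group modules under their layers, whichever order the text used."""
    structure: dict[str, list[str]] = {}
    current: Optional[str] = None
    for kind, name, layer in statements:
        if kind == "layer":
            structure.setdefault(name, [])
            current = name
        elif kind == "module":
            # Vertical style: the module refers to an already defined layer.
            # Horizontal style: it belongs to the most recently defined layer.
            target = layer if layer is not None else current
            if target is None or target not in structure:
                raise ValueError(f"module {name!r} refers to no defined layer")
            structure[target].append(name)
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return structure


horizontal = [("layer", "top", None), ("module", "logo", None),
              ("module", "menu", None), ("layer", "bottom", None),
              ("module", "footer", None)]
vertical = [("layer", "top", None), ("layer", "bottom", None),
            ("module", "logo", "top"), ("module", "menu", "top"),
            ("module", "footer", "bottom")]
print(build_structure(horizontal) == build_structure(vertical))  # True
```

Because both orderings produce the same structure, a later code-generation step does not need to know how the user phrased the description.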
6. How WLM System Works
The designed WLM system first searches for the desired web layouts and then helps the user personalize a particular one. The whole system can be divided into two major halves:
a- Searching Web-Layouts
b- Personalizing Web-Layouts
In the first half, the desired type of web layout is searched for on the World Wide Web. In the second half, the selected web layout is personalized for the user's own website. The steps performed during web layout mining are detailed below. The system is based on the structural design shown in Figure 3.0.
Figure 3.0: Structure of Automatic Web Layout Generation using Natural Language Processing Techniques. Searching stage: analyzing user input, searching web-layouts, finding the desired web-layouts. Personalizing stage: selecting and extracting a web-layout, making desirable changes, HTML code generation, personalized web layout page.
6.1. Analyzing User Input
This is the first phase, and it acquires the user's preferences as input text. The user states his requirements as paragraphs of text. This module reads the input text character by character and forms words by concatenating those characters. It implements the lexical phase, in which lexicons and tokens are generated.
6.2. Searching Web-Layouts
This phase reads the words (tokens) produced by the first module. The words are categorized into classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions and conjunctions, so that the text can be understood and processed further.
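The lexical phase (6.1) and the word-class step (6.2) can be sketched as follows. The tiny `LEXICON` and `CATEGORIES` tables are hypothetical stand-ins for the WLM rule base. Treating unknown words as nouns is a simplifying assumption rather than the paper's rule. The last function, which picks the website category used for the search, is also only an illustration.

```python
from typing import Optional

# Hypothetical mini-lexicon; the actual WLM rule base is not given here.
LEXICON = {
    "i": "pronoun", "we": "pronoun", "my": "pronoun", "it": "pronoun",
    "want": "verb", "need": "verb", "design": "verb", "build": "verb",
    "is": "helping verb", "am": "helping verb", "will": "helping verb",
    "a": "determiner", "an": "determiner", "the": "determiner",
    "to": "preposition", "for": "preposition", "with": "preposition",
    "and": "conjunction", "or": "conjunction",
    "simple": "adjective", "colourful": "adjective",
}

# Hypothetical category list, taken from the website types in the introduction.
CATEGORIES = {"news", "sports", "education", "business", "showbiz"}


def tokenize(text: str) -> list[str]:
    """Lexical phase: concatenate characters into words, splitting on the rest."""
    words: list[str] = []
    current: list[str] = []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def tag(words: list[str]) -> list[tuple[str, str]]:
    """Assign each token a word class; unknown words are assumed to be nouns."""
    return [(w, LEXICON.get(w, "noun")) for w in words]


def website_category(tagged: list[tuple[str, str]]) -> Optional[str]:
    """Return the first noun that names a known website category, if any."""
    for word, word_class in tagged:
        if word_class == "noun" and word in CATEGORIES:
            return word
    return None


tagged = tag(tokenize("I want to design a news website."))
print(website_category(tagged))  # news
```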
6.3. Selecting Web-layouts
This phase extracts objects such as the levels and modules of the web layout. Levels and modules are each represented by a particular HTML tag. Other attributes are extracted on the basis of the input provided by the preceding module.
6.4. HTML Code Generation
After the information required to draw the particular HTML tags has been extracted, this phase generates the actual code. The code divides the whole web page into component boxes, which are later used to hold content such as text and images.
6.5. Personalizing Web Layout
This is the final phase. It uses the information extracted in the previous phase to generate a new HTML file, in which the code generated in the previous phase is embedded. The resulting output is returned to the user according to his requirements.
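The final phase can be sketched as embedding the generated layout fragment in a complete HTML document and saving it as a new file. The following are all assumptions made for illustration: the `">Text<"` placeholder convention, filling boxes in document order with the user's own content, and the names `personalize` and `save_page`.

```python
from html import escape
from pathlib import Path

PLACEHOLDER = ">Text<"  # assumed marker left in each empty layout box


def personalize(layout_fragment: str, contents: list[str], title: str) -> str:
    """Fill the layout's placeholder boxes in order and wrap it in a full page.

    Boxes beyond len(contents) keep their placeholder; extra contents are ignored.
    """
    pieces = layout_fragment.split(PLACEHOLDER)
    filled = [pieces[0]]
    for i, rest in enumerate(pieces[1:]):
        # User content is HTML-escaped so it cannot break the layout markup.
        box = escape(contents[i]) if i < len(contents) else "Text"
        filled.append(f">{box}<{rest}")
    return (
        "<!DOCTYPE html>\n<html>\n"
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n{''.join(filled)}\n</body>\n</html>\n"
    )


def save_page(page: str, path: str) -> Path:
    """Write the personalized page to a new HTML file and return its path."""
    out = Path(path)
    out.write_text(page, encoding="utf-8")
    return out


if __name__ == "__main__":
    layout = "<table><tr><td>Text</td><td>Text</td></tr></table>"
    print(personalize(layout, ["Top story", "Weather"], "My News Site"))
```

Splitting on the placeholder, rather than calling `str.replace` repeatedly, ensures that user content which itself reads "Text" is never filled a second time.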
7. Conclusion
The designed system, "Automatic Web Layout Generation using Natural Language Processing Techniques", was started with two aims. The first was to support experts and save their time. The second was to provide a very simple interface for novice users who are not highly skilled in designing HTML pages or in using complex web design software. The user states his requirements and preferences in simple English text, and the application reads and analyzes that text. The desired HTML code is generated from the extracted information, and a new HTML file containing the newly generated web layout is produced. The approach is based on a newly designed rule-based framework that is capable of understanding the user's text and performing the desired task.
8. Future Work
The functionality of the system can be improved further, because the existing design can only design the web layout. Many other tasks remain, such as automatically adding content (text, images, etc.) to the web layout. Furthermore, user forms are very common nowadays, so more work is required on generating these user forms automatically.
9. References
[1] N. Karamanis and H. M. Manurung, "Stochastic text structuring using the principle of continuity", Proceedings of the Second International Conference on Natural Language Generation (INLG-2002), Ramapo Mountains, NY, 2002.
[2] I. S. Bajwa, M. A. Naeem, Riaz-Ul-Amin, M. A. Choudhary, "Speech Language Processing Interface for Object-Oriented Application Design using a Rule-based Framework", 4th International Conference on Computer Applications, Rangoon, Myanmar, 2006.
[3] A. R. Ahmad, O. Basir, K. Hassanein, "Intelligent Expert System for Decision Support in the Layout Design", Working Paper, Systems Design Engineering, University of Waterloo, 2004.
[4] G. Pant, P. Srinivasan, F. Menczer, "Crawling the Web", in M. Levene and A. Poulovassilis (Eds.), Web Dynamics, Springer-Verlag, 2004.
[5] Y. Hashimoto, T. Igarashi, "Retrieving Web Page Layouts using Sketches to Support Example-based Web Design", Proceedings of EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling, 2005.
[6] J. Lin, M. W. Newman, J. I. Hong, J. A. Landay, "DENIM: Finding a Tighter Fit Between Tools and Practice for Web Site Design", CHI Letters: Human Factors in Computing Systems, 2(1), pp. 510-517, 2000.
[7] M. A. Hearst, M. D. Gross, J. A. Landay, T. E. Stahovich, "Sketching Intelligent Systems", IEEE Intelligent Systems, 13(3), pp. 10-19, 1998.
[8] A. R. Ahmad, O. Basir, K. Hassanein, "Fuzzy Inferencing in the Web Page Layout Design", Proc. of the 1st Workshop on Web Services: Modeling, Architecture & Infrastructure, France, pp. 33-41, April 2003.
[9] W. C. Hu, Y. Chen, "An Overview of World Wide Web Search Technologies", Proc. of 5th World MultiConference on System, Cybernetics and Informatics, 2001.
[10] D. Florescu, A. Levy, A. Mendelzon, "Database Techniques for the World-Wide Web: A Survey", SIGMOD Record, 27(3), pp. 59-74, 1998.
[11] K. A. Dowsland, S. Vaid, W. B. Dowsland, "An algorithm for polygon placement using a bottom-left strategy", European Journal of Operational Research, Vol. 141 (Special issue on cutting and packing), pp. 371-381, 2002.
[12] J. Henderson, P. Merlo, I. Petroff, G. Schneider, "Using syntactic analysis to increase efficiency in visualising text collections", in S.-C. Tseng (Ed.), Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp. 335-341, 2002.
[13] J. Foley, W. Kim, S. Kovacevic, K. Murray, "UIDE: An Intelligent User Interface Design Environment", in J. W. Sullivan and S. W. Taylor (Eds.), Intelligent User Interface, ACM, NY, 1991.
[14] J. D. Gould, C. Lewis, "Designing for Usability: Key Principles and What Designers Think", Communications of the ACM, 28(3), pp. 300-311, 1985.
[15] Google, http://www.google.com/
[16] W. C. Hu, Y. Chen, "An Overview of World Wide Web Search Technologies", Proc. of 5th World MultiConference on System, Cybernetics and Informatics, 2001.
[18] G. Hjaltason, H. Samet, "Contractive Embedding Methods for Similarity Searching in Metric Spaces", Technical Report TR-4102, Computer Science Department, University of Maryland, 2000.
[19] M. Ivory, M. Hearst, R. Sinha, "Empirically Validated Web Page Design Metrics", ACM SIGCHI'01 Conference: Human Factors in Computing Systems, pp. 53-60, 2001.
[20] S. Y. Lee, F. J. Hsu, "2D C-String: A New Spatial Knowledge Representation for Image Database Systems", Pattern Recognition, 23(10), pp. 1077-1087, 1990.
[21] E. G. M. Petrakis, C. Faloutsos, K. L. Lin, "ImageMap: An Image Indexing Method Based on Spatial Similarity", IEEE Transactions on Knowledge and Data Engineering, 14(5), 2002.
[22] E. G. M. Petrakis, S. C. Orphanoudakis, "A Generalized Approach to Image Indexing and Retrieval Based on 2-D Strings", Intelligent Image Database Systems, World Scientific, pp. 197-218, 1996.
[23] Y. Rui, T. S. Huang, S. F. Chang, "Image retrieval: current techniques, promising directions and open issues", Journal of Visual Communication and Image Representation, 10, pp. 39-62, 1999.
[24] A. R. Ahmad, O. Basir, K. Hassanein, "Efficient Placement Heuristics for Genetic Algorithm based Layout Optimization", Working Paper, Systems Design Engineering, University of Waterloo, 2003.