Database Support for Multi-media Information in Web Based Applications
Ammar Benabdelkader, Hamideh Afsarmanesh, and L.O. Hertzberger
University of Amsterdam, WINS - Department of Computer Science, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Phone: +31 20 525 75 33, Fax: +31 20 525 74 90
E-Mail: {ammar, hamideh, bob}@wins.uva.nl
ABSTRACT: Storing, organizing, accessing, and maintaining distributed information (documents) for specific Web applications is a difficult and error-prone task when much of the critical information is not easily and directly accessible at run time, but is instead embedded in various static HTML resources. In this paper we introduce a new approach that addresses these problems along with others such as database versioning and document indexing. Database versioning is used to keep track of how information was updated over time, so that decisions taken in the past can be justified, while document indexing is a means of accelerating the search operation and making it more efficient. The system described in this paper follows a two-fold approach based on (1) the use of an object-oriented database behind Web applications, for better support of Web document evolution and of dynamic and flexible user interfaces, and (2) the development of several user-friendly interfaces through which users can easily and efficiently find multi-media information of interest on the World Wide Web.
Keywords: World Wide Web, software engineering, multi-media data, document indexing, database support for distributed Web applications.
1. Introduction
World Wide Web applications based on the Hypertext Transfer Protocol (HTTP) have become the main information exchange service, extending older services such as ftp, e-mail, telnet, and other distributed applications [PED 98, AFS 98]. Depending on the application domain, Web tools may have the scope of individual pages, complex static Web sites, or distributed Web applications consisting of many sites. Depending on the nature of the application, Web tools may support static publication-oriented hypertext, database-centric systems [DOK 98, GEL 97], or highly dynamic and interactive Web sites. Static hypertext applications are characterized by both static links and static pages, which are difficult to build, to maintain, and to keep up to date. Database-centric systems are characterized by both dynamic page creation and a dynamic link structure. Finally, for dynamic and extensible Web applications, more advanced tools may support different stages of the Web application life cycle. In such a dynamic and extensible system, for instance a Web-based marketing application, clients need to navigate efficiently through different Web sites and documents in order to:
- Check and update the information easily (add market research reports, modify company profiles when companies are bought or get new funding, add new products, etc.).
- Browse and visualize different kinds of information. For instance, a client wants to know which company makes the NokiaLD51 product. An easy way to find out is to search for it directly under the product category Cell Phone and see who makes it, what it costs, etc.
The client must be able to click on the product picture to see the specifications, and click on the company name to see its profile and check the company's sales for the previous year. This paper describes an example of a marketing Web application in which multi-media information about company characteristics and product manufacturing is gathered from different sources (Web sites, reports, etc.), classified, organized, and stored in a database, so that it can be accessed easily, efficiently, and dynamically. The paper also addresses the development of two main categories of tools to maintain and access this large heterogeneous database: (1) the Database Administrator interface (DBA), which directly manages and maintains the database schema and information, and (2) three Web-based interfaces that provide users with the information they are looking for in a consistent way and support them in accomplishing their activities. The remainder of this paper is organized as follows. Section 2 discusses the characteristics and requirements of distributed Web applications, with an overview of the different kinds of information and the end-user requirements for such applications. Section 3 presents the database design for the distributed Web application, including its database versioning and document indexing features, and enumerates the aspects that guided the choice of the database system and the major characteristics of a database supporting Web applications. Section 4 describes the server architecture design and implementation, including a strategy for using the available server-side scripting programs. Section 5 presents the database administrator interface, the three Web interfaces developed, and the main characteristics and requirements that must be considered in building user-friendly Web interfaces.
Section 6 describes the status of our work and the plan for future research and developments.
2. Web Application Characteristics and Application Domain Requirements
In the marketing application domain, for instance, the main part of the repository content for each Web application consists of data about the company that is a candidate for acquisition (factual information about the company such as business results, assets, patents, etc.). Figure 1 shows an example of a simplified repository for a company. The most important element in the application structure is the reference (document) that contains the information. These references are of different kinds and may include:
- Documents (Word, PostScript, text, etc.)
- Tables and reports (Excel, FrontPage, etc.)
- HTML pages
- Java applets
- Images (JPG, GIF, BMP, icons, FIG, EPS, etc.)
- Audio (WAV, RealAudio, MPG, etc.)
- Video (QT, VDO, MPEG, etc.)
- Animation (AVI, GIF, etc.)
- Applications (exe, bat, Java applets, etc.)
A good system must bring together current trends in computer science and technology with user specifications. Web technology makes it possible to model concepts in ways that are semantically meaningful to developers and users, which in turn enables a flexible and abstract interface to the system. Although these technologies are used in many problem domains, they must in particular be applicable to different application domains and properly support end-user requirements involving multimedia and/or complex data management. As such, the system must support:
- Large amounts of heterogeneous data that are easily accessible and understandable
- User-friendly Web interfaces
- Dynamic Web page generation (depending on the user request)
- Document search facilities
- Short response time for on-line requests
- Secure access (each client's access is limited to the section of the site relevant to his/her interest)
- System openness that supports different application domains
- Tracking of updated information (it must always be visible when the information was last updated, so that the information can be kept current).

[Figure 1: tree diagram of the repository of the example company WebStore. Root: WebStore (Company: Description, Tel/Fax, Address, etc.). Node labels, in order of appearance: Context, Business Model, Problem statement, Executive Summary, Product Road Map, Product Cost, Fab. Decision, Market Application Areas, Market Forecast & Timings, Supporting Market Data, Market Strategy, Products, Technology Assessment, Intellectual Property Positions, Cost of Goods, 3rd Party Solution, Channels, OEMs, VARs, Retail, People and Structures, Proposed Organization, Business Analysis, Discussion & Risks, Competitors, Conclusions & Recommendations, Next Steps. Leaf nodes are documents of types such as doc, pdf, html, txt, and audio (ra, mp3, rts, wav).]
Figure 1: Sample Data Classification for a Company
3. Database Support for Web Applications
Given the requirements of today's Web applications enumerated in Section 2, we identified the need for a database system behind the Web server to better support the various application domain activities. A Web-deployed database with well-designed user interfaces enables users to search large amounts of information in a consistent way. Many of today's Web applications lack dynamism and flexibility because they do not deploy a database back-end that supports these features. Even most applications that do use a database back-end do not take full advantage of it in distributed Web applications: in the most common situations, the database system serves only as a storage facility for textual data. Databases can do more than that. They can integrate indexing facilities, versioning features, flexibility and dynamism capabilities, and keyword management. They can also be augmented with external tools that automatically manage the data (e.g. create the index, generate keywords, ensure versioning) and use the relevant information (e.g. report generation). The following characteristics represent the major benefits of the approach and are taken into account in the system design and implementation:
- Support for both application versatility (dynamism) and extensibility; data can be updated very frequently (e.g. new documents, new applications, and new links).
- Dynamic generation of Web pages on the fly, based on the user request.
- Tracking of updated documents by exploiting the database versioning facilities.
- More convenient maintenance procedures.
- Intelligent and comprehensive search features that let users find exactly what they are looking for, while site updates are automated.
3.1 Database Design
The dynamism and flexibility of the system we are building depend mostly on the database design and on how open it is to supporting several application domains with different structures and different sizes. As described in Section 2, information in Web applications is stored as a set of inter-linked documents of different kinds grouped by domain of interest.
An efficient distributed Web environment that can be applied to different activities must meet requirements such as organizing and accessing data, short response times for on-line requests, and high data loading rates. To meet them, the database must support documents of different types (text, HTML, Word, Excel, PDF, images, audio, video, etc.) as well as a good set of relationships between the pieces of information within the same application, so that users can navigate between inter-linked elements. After analyzing the data requested and handled in distributed Web applications, we identified the need for a set of classes and a set of relationships between these classes. The classes and relationships defined in this section are comprehensive enough to support any kind of Web application with minor changes. The Document type, for example, is a general reference that can represent any kind of Web or non-Web information such as an HTML document, an image, an audio stream, an Excel table, etc. The Category type serves as a directory to hold and organize different information by subject of interest (e.g. organization and structure, manufacturing, products). The Company concept, however, consists of a set of global information about the application definition, the company description, and the parameter setup.

Company Application: Application_Name, Company_Name, Company_Address, Company_Description, Company_Tel, Company_Fax, etc.
Category: Category_Name, Categ_Location, etc.
Document: Document_Name, Document_Location, Document_Type, Document_Size, Document_Keywords, Creation_Date, Modification_Date, Document_Content
Relationships (relationship [cardinality] / inverse relationship [cardinality]):
  Company Application to Category: Application Categories [1,n] / Category of Application [0,1]
  Company Application to Document: Application Documents [0,n] / Document of Application [0,1]
  Category to Document: Category Documents [0,n] / Document of Category [0,1]
  Category to Category: Category SubSets [0,n] / SubSet of Category [0,1]
Figure 2: Base Schema definition for the Web application
3.1.1 Classes
Figure 2 shows a comprehensive database design for distributed Web applications, in which a set of classes and relationships are defined so that the three main classes work together. For the sake of simplicity, the examples presented here show only the most important classes. The database design also includes other classes, such as the Log File class, which keeps track of all document updates, and the Frequently Asked Questions (FAQ) class, which supports user questions. The database classes are described below:
1- Company: contains general and global information about the company and its application domain, such as the company name, application description, application parameters, etc.
2- Category (item): defines the categories of the company application used to describe the information. A category may, in turn, consist of a set of categories and/or a list of documents grouped by domain of interest.
3- Document: describes all the information about each document/reference stored in the database (e.g. name, type, size, location, etc.). A non-exhaustive list of its attributes includes:
- Document name
- Document keywords: used during the search process to retrieve the most relevant information based on specific keywords describing the document content.
- Document location: used by the system to link directly to the local or remote document source. The database stores a pointer to where the document is kept, not the document itself.
- Document type: storing the document type in the database helps considerably during query processing, since the document content does not have to be checked. For example, the user can query by document type through the user interface, and the Web pages can display a small representative icon next to each document, based on its type, to give the user a better idea of what the document is.
- Document size: gives the user an idea of the size of the document before he/she goes ahead and processes it.
- Document creation date: tells the user the date and time at which the document was created.
- Document last update date: gives the user an overview of recently updated documents.
3.1.2 Relationships
Relationships link objects together to form a network. For example, category and document objects are connected together and form a network of objects. With a database system (Section 3.3), the user can determine the documents in a category and also identify whether a given category is a component of another category. The user can also navigate back through the inverse links, which the database system establishes automatically. This capability offers several advantages:
- It provides fast navigational access.
- It allows constraints and actions to be associated with the link in either direction. It is possible, for example, to specify that a 'document' object is a component of at least one 'category' object.
- It always maintains the inverse link. Because the inverse relationship is itself defined, it can carry all the characteristics that can be specified for a relationship.
- It provides referential integrity. If a document object is deleted, the database system has all the information it needs to update the category and application objects.
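The inverse-link and referential-integrity behaviour described above can be illustrated with a minimal in-memory sketch. This is plain Python, not the database system's API: the class names follow Figure 2, while the method names (`add_document`, `add_subset`, `all_documents`, `delete_document`) and the file name `problem.ra` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through cyclic links
class Document:
    name: str
    doc_type: str
    category: Optional["Category"] = None  # inverse link: Document of Category [0,1]


@dataclass(eq=False)
class Category:
    name: str
    documents: List[Document] = field(default_factory=list)    # Category Documents [0,n]
    subsets: List["Category"] = field(default_factory=list)    # Category SubSets [0,n]
    parent: Optional["Category"] = None                         # SubSet of Category [0,1]

    def add_document(self, doc: Document) -> None:
        """Link a document and maintain the inverse link in the same step."""
        if doc.category is not None:
            doc.category.documents.remove(doc)
        self.documents.append(doc)
        doc.category = self

    def add_subset(self, sub: "Category") -> None:
        """Link a sub-category and maintain its inverse (parent) link."""
        if sub.parent is not None:
            sub.parent.subsets.remove(sub)
        self.subsets.append(sub)
        sub.parent = self

    def all_documents(self) -> List[Document]:
        """Navigate the links: own documents first, then those of sub-categories."""
        found = list(self.documents)
        for sub in self.subsets:
            found.extend(sub.all_documents())
        return found


def delete_document(doc: Document) -> None:
    """Referential integrity: deleting a document also updates its category."""
    if doc.category is not None:
        doc.category.documents.remove(doc)
        doc.category = None


# Example data loosely following Figure 1 (file names are illustrative).
context = Category("Context")
problem = Category("Problem statement")
context.add_subset(problem)
doc = Document("context.doc", "doc")
context.add_document(doc)
audio = Document("problem.ra", "audio")
problem.add_document(audio)
```

In a database with built-in inverse relationships, the bookkeeping done here by hand in `add_document`, `add_subset`, and `delete_document` would be handled by the database system itself.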
So that the three classes defined above work together, we have defined a set of relationships between them:
1- Application Categories [1,n]: links a company application to its immediate sub-categories, organized by domain of interest. As shown in Figure 1, the company application WebStore, for instance, has links to {context, market application area, products, intellectual property, competitors, etc.} as its Company Application Categories.
2- Category of Application [0,1]: links a category to the company application it belongs to (e.g. “Products” Category of Company Application “WebStore”).
3- Application Documents [0,n]: links a company application to the documents it immediately contains (e.g. “WebStore” Company Documents {readme.txt, philps.html, philips.gif, etc.}). These documents contain general information about the application or about the company itself, such as a short introduction, a logo image, and so on.
4- Document of Application [0,1]: links a document to the company it belongs to (e.g. “logo.gif” Document of Company “WebStore”).
5- Category Documents [0,n]: links a category (domain) to the list of documents it contains (e.g. “context” Category Documents {context.doc, context.pdf, etc.}).
6- Document of Category [0,1]: links a document to the category it belongs to (e.g. “context.doc” Document of Category “Context”).
7- Category SubSets [0,n]: links a global category to the list of sub-categories it contains (e.g. “context” Category SubSets {Problem statement, executive summary}).
8- SubSet of Category [0,1]: links a sub-category to the super-category it belongs to (e.g.
“Problem statement” SubSet of Category “Context”).
Defining these relationships between the different company items makes the querying process more dynamic and extensible, so that it supports different kinds of user requests at different levels. Based on these relationships, the user can easily navigate through a large amount of information and obtain the desired information, automatically linked together:
1- Given a company name, the system can retrieve all the company's categories (and sub-categories), documents, links, etc.
2- Given a category name, the system can retrieve all its sub-categories, the company it belongs to, and the kinds of documents it contains.
3- Given a document name, the system can retrieve all the information about the document (type, size, location, etc.), the company it was created for, and the kind of information it contains.
3.2 Document Indexing
In popular search tools, the search engine is driven by a robot or spider, a program that visits all the documents and records information about them in a database. The information recorded to describe a document usually takes one of two forms. Some robots simply grab the title of the document and any text that appears in it. Other robots use a more sophisticated technique that gleans the description and search keywords for a site from the META tags embedded in the head of the HTML document [SAT 97]. The information recorded for each document is used as an index for the database, in order to better support search facilities and short response times to user requests. Our approach to document indexing uses a three-step technique. As described below, step 1 covers most of the indexing techniques supported by current indexing software tools (e.g. Yahoo, Lycos). Step 2 adds more convenience to the database indexing mechanism by
integrating and using external indexing tools at run time for non-textual documents. Step 3, however, allows the user to manually improve the indexing keyword lists of documents whose format is not supported by commercial or non-commercial indexing facilities (e.g. some audio, video, and image formats):
1- The system automatically creates the first list of indexing keywords. At this level, the keyword list for each document consists of the document description, document title, document type/size, document ownership, etc. This keyword list can serve as a base for the search engine described in Section 5.3.
2- To further improve the search mechanism and the response time of the user interfaces, different external commercial and non-commercial indexing tools may be used at run time to extend the keyword list of each document. Several indexing software tools can be used, depending on the type of the document (text, Word, PDF, images, audio, video, etc.).
3- The administrator of the application can manually maintain and update the keyword list of any document in the database through the database interface. This is especially useful for document types that are not supported by any indexing software (e.g. some audio, video, and image formats).
New and updated documents are detected automatically by checking the size and/or the last update date of the document. However, to avoid overwriting the manual keyword improvements described in step 3, the automatic keyword process of steps 1 and 2 is activated only for newly added documents (references).
3.3 Matisse Database System
Given the variety of promising features supported by the Matisse database system (version 4.0), we chose it as the database for the work described in this paper.
Some of the supported features that help in developing distributed Web applications include [MTS-a 98, MTS-b 98, MTS-c 98, MTS-d 98, and MTS-e 98]:
- I/O Parallelism and Copy Semantics: The database server provides high-end parallelism for multimedia streaming and for large databases with a large number of users. The database server scales linearly as new CPUs or new disks are added. Objects are not updated in place; a new version of an object can be written to any available disk, with optimal load balancing across the disks.
- Collect Versions: The collect version mechanism runs automatically to reclaim disk space. It preserves the most recent version and the versions that have been explicitly saved.
- Transaction Model and Concurrency Control: Concurrency control is enforced by read and write database locks. The locking granularity is at the sub-object level, as the database server locks the relationship part and the attribute part of an object separately. However, transactions are not affected by version access queries; the latter can run concurrently without locking.
- Temporal Features (historical database versioning): The intrinsic versioning of the database server is the key underlying technology that differentiates it from other storage management systems. Intrinsic versioning is the automatic generation and control of object versions. When the value of an object changes, a new copy of the object is created, rather than its current version being updated. The database can be queried consistently as of a given time, without affecting current transaction processing and without locking any data.
- Disk Fault Tolerance: The MATISSE server provides disk fault tolerance through its page-level replication facility. When there is a disk failure, the database remains on-line and the MATISSE server automatically replicates the missing data pages on the remaining disks.
- Object-oriented database (ODMG compliant, ODBC driver, ODL, etc.)
- Distributed system (transaction management, security access, etc.)
- Support for large amounts of data with high-speed data storage and retrieval (up to 20 M/s)
- Several programming interfaces: C/C++, Java binding, SQL queries, ODBC driver, etc.
- Access control, SQL extensions, text search extensions, multi-media streaming, automatic version collection, etc.
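The versioning behaviour listed above (copy semantics, queries as of a given time, and version collection) can be illustrated with a small in-memory sketch. It does not use the Matisse API: the class `VersionedStore`, its method names, the logical clock, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass(frozen=True)
class Version:
    time: int   # logical commit time (a simple counter in this sketch)
    value: Any


class VersionedStore:
    """Toy store in which every write creates a new version instead of updating in place."""

    def __init__(self) -> None:
        self._clock = 0
        self._history: Dict[str, List[Version]] = {}
        self._saved: Set[int] = set()

    def write(self, key: str, value: Any) -> int:
        """Append a new version and return its logical time."""
        self._clock += 1
        self._history.setdefault(key, []).append(Version(self._clock, value))
        return self._clock

    def read(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the current value, or the value visible at time `as_of`."""
        for version in reversed(self._history.get(key, [])):
            if as_of is None or version.time <= as_of:
                return version.value
        raise KeyError(f"{key!r} has no version at time {as_of}")

    def save_time(self, time: int) -> None:
        """Mark a point in time whose versions must survive collection."""
        self._saved.add(time)

    def collect_versions(self) -> int:
        """Keep the latest version and those visible at saved times; return the number dropped."""
        removed = 0
        for key, versions in self._history.items():
            keep = {versions[-1].time}
            for saved in self._saved:
                visible = [v.time for v in versions if v.time <= saved]
                if visible:
                    keep.add(visible[-1])
            kept = [v for v in versions if v.time in keep]
            removed += len(versions) - len(kept)
            self._history[key] = kept
        return removed


store = VersionedStore()
t1 = store.write("WebStore/profile", {"sales": 10})
t2 = store.write("WebStore/profile", {"sales": 12})
t3 = store.write("WebStore/profile", {"sales": 15})
store.save_time(t1)
```

Reading with `as_of=t1` returns the profile as it stood after the first write, which is the kind of historical query used to justify past decisions. After `collect_versions()`, only the saved time and the current state remain queryable in this sketch.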
4. Web Interface Architecture
In this work, the structure of the Web is used as a guide for the search activity. Our system is made up of several components, which are grouped into three categories of system activity: HTML, HTTP, and the Internet server-side program. (1) The HTML code defines the way the user sees the program interface and is responsible for collecting user input. It is the window through which the user interacts with the Internet server-side program. (2) HTTP is the transport mechanism for sending data between the server-side program and the user's HTML form. (3) The Internet server-side program is responsible for understanding both the HTTP directions and the user requests. It takes the requests from the user and sends back valid and useful responses to the Web client who is clicking away on the HTML Web page. These three categories of activity have to work closely together to make the Internet application work on-line [HER 96].
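[Figure 3: the client machine submits an HTML input form as an HTTP request over the Internet or an intranet to the server-side machine, where CGI, IDC, or ASP programs access the database (DB) and produce an HTML page. The page is returned in the HTTP response and displayed by the Web browser as a dynamic Web page.]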
Figure 3: System Architecture Description
4.1.1 HTML Forms
Web forms enable a reader to return information to a Web server so that some action can be taken. For example, an optimized query can be formulated and executed from a set of keywords and the information collected from checkboxes, and the results sent back to the person who requested them. The HTML form has become the method of choice for sending data across the network because it makes it possible to set up a user interface using input tags. With an HTML form, input windows, pull-down menus, checkboxes, radio buttons, and more can be set up with very little effort. In addition, the data from all these data-entry methods is formatted and sent automatically on the user's behalf when he/she submits the form. The incoming data is usually processed by a script or program written in C, Perl, or another language that manipulates text files and information. The forms themselves are not hard
to code. They follow the same constructs as other HTML tags. What can be difficult is the program or script that takes the information submitted in a form and processes it.
4.1.2 HTTP Server
The Hypertext Transfer Protocol (HTTP) is the high-level protocol (sitting on top of the low-level TCP/IP protocol) that Web browsers and servers use to pass Web pages between them.
4.1.3 Internet Server-Side Program
HTML by itself makes nice windows, but doing anything more than looking pretty requires programming, and that programming must understand the Internet server-side environment. Before the server-side scripting program is started, the Web server has already created a special processing environment in which it operates. Setting up that environment includes translating the entire incoming HTTP request header into environment variables [LIT 97] that the server-side program can use to obtain all kinds of valuable information. In addition to system information, such as the current date, there is information about who is calling the scripting program, where the program is being called from, and possibly even state information that helps keep track of a single Web visitor's actions (state information is anything that records what the program did the last time it was called). Three general types of server-side programs are used in the current implementation: Common Gateway Interface (CGI) programs, the Internet Database Connector (IDC), and Active Server Pages (ASP) [LIT 97]. To implement the user interfaces, we combine these technologies within the same application in order to support the Web application requirements described in Section 2. Since the technologies can be combined and used together, the technology for each task is chosen on the basis of a detailed study of that task within the application.
The strategy followed in developing the user interfaces uses the Internet Database Connector (IDC - Section 4.1.3.1) as the first choice, because it is a fast and efficient way to link the database to the Web server (Log Record, select list, profile, etc.). When the user needs a more flexible environment in which he/she can freely specify various inputs, the application uses the Active Server Pages (ASP - Section 4.1.3.2) mechanism, since it is flexible and also dedicated to Web databases (e.g. Document Search, parameters for user interfaces). When there are other limitations, such as missing database connectivity components (e.g. OLE, ODBC driver), the application uses Common Gateway Interface (CGI - Section 4.1.3.3) scripting, which offers a very flexible programming environment. This strategy combines efficiency with short response times.
4.1.3.1 The Internet Database Connector (IDC)
Using IDC, the developer can design Web pages with dynamic content, but IDC does not offer a flexible programming environment. The Log Records section, for instance (see Figure 7), generates a dynamic page on the fly using the IDC mechanism, and the content of this page depends on the information available in the database. This mechanism is very easy to use and responds in a very short time. However, it is not very flexible: the user cannot specify parameters or manipulate the resulting information (the page has a static view). The HTX specification for this scripting mechanism includes no provision for controlling the formatting of fields. The only way to format the data differently is to use the database's conversion and formatting functions to convert the
data into a more appealing format before it merges with the HTX file (e.g. select price * quantity as total).
4.1.3.2 The Active Server Pages (ASP)
Active Server Pages is a technology built on top of the Microsoft Internet Information Server (IIS) family of Internet servers that provides a framework for producing dynamic Web applications. Unlike the IDC specification, ASP is not limited to database connectivity. In addition to its support for database publishing, ASP provides powerful server-side scripting in a variety of scripting languages, including VBScript and JavaScript (JScript). ASP also supports the use of server-side components and other features that make it easier to create robust client/server Web applications. This scripting technology is powerful and flexible enough for creating a wide range of interactive, data-driven Web applications.
4.1.3.3 The Common Gateway Interface (CGI)
CGI programming involves designing and writing programs that receive their starting commands from a Web page, usually a Web page that uses an HTML form to start the CGI program. CGI acts as a gateway or interface program between other, larger applications: it translates and formats the data being sent, and the translated data is passed to some type of database program. The database program performs the necessary operations on its database and returns the results to the CGI program. The CGI program can then reformat the returned data as needed for the Internet and return it to the requesting client. The Web server that started the CGI program creates some special information for the CGI program and expects some special responses back from it.
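The CGI round trip described above can be sketched as follows. This is an illustrative Python sketch and not the scripts used in the system: `QUERY_STRING` and `REQUEST_METHOD` are standard CGI environment variables, whereas the function `handle_request`, the in-memory `PRODUCTS` table standing in for the database, and the company name in it are hypothetical.

```python
import html
import os
import sys
from typing import Mapping
from urllib.parse import parse_qs

# Hypothetical stand-in for the product database behind the Web server.
PRODUCTS = {"NokiaLD51": {"company": "Example Corp", "category": "Cell Phone"}}


def handle_request(environ: Mapping[str, str]) -> str:
    """Build a complete CGI response: header lines, a blank line, then the HTML body."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        return "Status: 405 Method Not Allowed\r\nContent-Type: text/plain\r\n\r\nGET only"

    # The Web server passes the submitted form data in the QUERY_STRING variable.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    product = query.get("product", [""])[0]

    record = PRODUCTS.get(product)  # the "database program" step
    if record is None:
        body = f"<p>No product named {html.escape(product)} was found.</p>"
    else:
        body = (
            f"<h1>{html.escape(product)}</h1>"
            f"<p>Made by {html.escape(record['company'])} "
            f"({html.escape(record['category'])})</p>"
        )
    return "Content-Type: text/html\r\n\r\n<html><body>" + body + "</body></html>"


if __name__ == "__main__":
    # When run by a Web server as a CGI program, the request arrives through os.environ
    # and the response is written to standard output.
    sys.stdout.write(handle_request(os.environ))
```

Keeping the request handling in a function that takes the environment as a parameter makes the same logic easy to exercise without a Web server, for example with `handle_request({"QUERY_STRING": "product=NokiaLD51"})`.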
5. System Implementation
The implementation architecture adopted for the system combines several key technologies. It is based on an object-oriented database server as the back-end that provides dynamism and flexibility, and it combines different software technologies to develop user-friendly Web-based interfaces. Beyond supporting advanced features for distributed Web applications, the system was built taking into consideration the most important Web design requirement: guiding the visitor easily and quickly through the dynamic Web site. The database interface described in Section 5.1 is a powerful and friendly user interface for fully accessing and maintaining the database. For the development of the Web user interfaces, the system takes advantage of the available software technologies. In most cases several Web tools are available, each better suited to a specific task. As presented in Section 5.3, a different software technology is used at different stages of the system development depending on the task, usually chosen as the best trade-off between the user requests to be supported and a short response time. In terms of functionality, the system is designed and built to support two main activities: on the one hand, creating and maintaining the database server, and on the other hand, creating and running the Web user interfaces that access the server in a consistent way.
5.1 The Database Administrator Interface (DBA)
The database administrator interface (see Figure 4) is the most important component of the system, since it directly manages the database schema and information. This user-friendly interface, written in Visual C++ for the Windows NT environment, fully supports database management and can be accessed only by authorized administrators. The DBA framework supports two main operations: database set-up and database updating (maintenance). All the operations supported by this tool are performed automatically; the administrator does not have to do more than select an item or push a button. However, the administrator still has the possibility to access the database directly and manage its components manually.
Figure 4: The Database Administrator Interface
- The set-up operation allows the administrator to automatically create the schema, initialize the database, load new company profiles, generate a complete set of static Web pages for each company application, etc.
- Through the updating operation, the database administrator can load new company information into the database, automatically update information about existing companies, generate the historical and log databases, update the keyword list for each document that has been modified, and so on.
5.2 Web Design Requirement
Focusing on versatility and flexibility is a key issue in developing distributed Web applications. Furthermore, to build a robust and user-friendly Web site, the implementation strategy must take other Web requirements into consideration. These requirements include: incorporating the fundamental content areas in the site structure, making it easy for users to move around within the site, providing the most relevant information, and taking advantage of HTML and other manipulation techniques for a faster download time.
5.2.1 Organizing and Grouping Content Areas
This section discusses how to take the general Web design elements and group them into distinct, identifiable, and intuitive content areas. At the root of any Web site, there are certain fundamental content areas that are usually incorporated in the site structure. The following list offers an overview of common content areas that users tend to expect and use within a Web-based application [SAT 97].
• About the Application: users are curious to know exactly what the company does, what kind of people work there, who runs the show, and so on. These kinds of details provide users with a clear picture of what the application is about.
• Product Information: depending on what the site is about, areas that provide information about the products, programs, services, or events are needed. In the case of software promotion, for example, information about the software features, system requirements, updates, trial versions, free source code samples, and costs must be supplied.
• News/Press releases: a section of the site that deals with news items and announcements keeps users informed about what is new, what is happening today, and what developments are expected.
• FAQ: a question-and-answer format gives users a direct point of reference when navigating through the Web site. Some users may not know what they want; the FAQ furnishes them with ideas about how to deal with various issues regarding the site or product.
• Contact Information: when all is said and done, visitors need a way to get in touch with the company, because they invariably have questions about the Web site or the company's products and activities. An e-mail address, a mailing list, and telephone and fax numbers are given as contact information.
5.2.2 Direct Visitor Flow
The design of the Web site must make it easy for users to move around within the site. The application designer has the responsibility of providing information to the audience compellingly, easily, and quickly.
This means enabling users to cross-navigate between and within the sections of the Web site, reducing the number of screens and reloads between visitors and the information they want, and achieving cross-navigability by providing consistent site navigation through:
• Graphics: a common graphic or visual cue in the same position on different Web pages helps users orient themselves more easily. Users understand that they are in the same environment when they see these graphics.
• Search engine: a search mechanism enables users to go directly to the information they want, while the design elements mentioned earlier give them the option to navigate freely between the sections.
• Frames: frames enable users to peruse the content of the site without ever losing their way back to the other sections.
5.2.3 Taking Advantage of HTML Techniques
To deliver Web graphics with as much "pow" and speed as possible, the advantages of HTML and the other manipulation techniques must be taken into consideration:
• Create a perception of a faster download time: use a technique that loads a light GIF before the heavier and more detailed JPEG navigational menu.
• Keep image sizes down: smaller images make download time faster and potentially more rewarding.
• Compress the downloadable documents.
5.3 Web User Interfaces
As described earlier in Section 2, finding a piece of information located among many others, on the Web site or locally, is an error-prone and frustrating task for end users of distributed Web applications. The user needs to access and navigate through documents of different types and formats, check their content, and pick out relevant information to be executed, visualized, printed, or included in his/her final analysis report. We believe that such a user really needs an intelligent tool that takes him/her step by step to the most relevant information he/she is looking for.
Figure 5: Data Show Interface Results for WebStore Company
Figure 6: An Example of a Corporation Company Profile
In order to allow the user to access the information available about the company at different levels, three Web user interfaces were designed and developed. Access to these interfaces is restricted to a set of authorized users. By user we mean an expert in the domain of company profiles and patents who can navigate through all the information available in order to accomplish his/her activities (generating reports, browsing and checking information, etc.). These interfaces must provide the user with the right information he/she is looking for. Each time the user interacts with a Web interface, he/she receives in response a dynamic Web page based on the specifications of his/her request.
5.3.1 Data Show Interface
The Data Show interface provides the user with the complete information available about the company. As depicted in Figure 5, which shows a subset of raw data about the WebStore company, this information is presented to the user as a set of sub-categories, each containing a link to a set of sub-items and a list of documents of different kinds. Using this interface, users can check and visualize the content of any document regardless of its format.
5.3.2 Profile Show Interface
This interface allows a user to visualize and browse general information about the company, such as its description, address, number of employees, general business results, and so on. As presented in Figure 6, which shows an example profile for a corporation company, part of this hypertext document includes the generated business analysis report, and it contains links to different related Web sites.
5.3.3 Document Search Interface
With the explosive growth of information available through the Web, it is becoming increasingly difficult for users to find information of interest. Search engines are therefore becoming very popular and useful, because they allow the user to find information of interest on the World Wide Web [NOR 97]. However, most of today's popular search engines are textual and have several limitations: based on one or more keywords, they retrieve Web documents that contain those keywords [MIL 97], but they do not allow the user to find information on the Web based on other criteria such as content, type (images, Word, Excel, etc.), date, document ownership, etc. The Document Search interface described in this section is a powerful tool for searching and selecting a set of documents. As depicted in Figure 7, the user can use this interface to specify the information he/she needs; the search is based on different criteria: document name, document content, document type, document size, etc. As a result, the system returns a dynamic Web page whose view is restricted to the most relevant information.
The user then decides either to continue working with the resulting pages or to reformulate his/her query to get more relevant information, depending on his/her needs (extended/restricted results). These features clearly show how useful a database behind the Web browser is in ensuring the dynamism and flexibility of the system. As depicted in Figure 7, the user composes a query from a list of keywords and some check-boxes in a friendly Web interface; the hard work is done automatically by the system. Based on the user's entries, a simple or complex query is made up of a collection of keywords, possibly linked with Boolean operators, and sent to the server, which executes it and then creates and sends back a well-organized Web page in response to the user request. The checking query option within this interface allows query language experts to directly formulate and execute SQL/OQL queries based on their knowledge of the database structure.
Figure 7: Document Search Interface
The dynamic Web pages presented to the user consist of a set of links to the most relevant documents/references. With each link, additional information may be presented in the Web page, such as the size of the document, its list of keywords, etc., which can help the user get an early idea about each document (link). Document Search is the most important interface for this application, since it helps the end user reach his/her goal step by step. The two other interfaces are the Profile Show interface and the Data Show interface: the Profile Show interface is used for brief and general information about the company, while the Data Show interface provides a large amount of information and links concerning the company.
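The way a form entry can be turned into a query made up of keywords linked with Boolean operators can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the object-oriented database server, and the documents table, its columns, the sample rows, and the function names are hypothetical rather than taken from the actual schema.

```python
import sqlite3


def build_search_query(keywords, operator="AND", doc_type=None, max_size=None):
    """Return (sql, params) for a keyword search with optional filters."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses, params = [], []
    if keywords:
        # One LIKE test per keyword, linked with the chosen Boolean operator.
        tests = (" " + operator + " ").join(["content LIKE ?"] * len(keywords))
        clauses.append("(" + tests + ")")
        params.extend("%" + word + "%" for word in keywords)
    if doc_type is not None:
        # Corresponds to a document-type check-box in the form.
        clauses.append("doc_type = ?")
        params.append(doc_type)
    if max_size is not None:
        clauses.append("size <= ?")
        params.append(max_size)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = "SELECT name, doc_type, size FROM documents WHERE " + where + " ORDER BY name"
    return sql, params


def demo_documents():
    """Create an in-memory table with a few made-up document records."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE documents (name TEXT, doc_type TEXT, size INTEGER, content TEXT)"
    )
    connection.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        [
            ("report.doc", "word", 120, "market research report on cell phones"),
            ("logo.gif", "image", 15, "company logo"),
            ("sales.xls", "excel", 80, "cell phone sales figures"),
        ],
    )
    return connection


# Example: both keywords must match, and the document must be at most 100 units in size.
sql, params = build_search_query(["cell", "phone"], operator="AND", max_size=100)
for row in demo_documents().execute(sql, params):
    print(row)
```

Keeping the user's keywords as bound parameters instead of pasting them into the query text means that unusual input cannot change the structure of the query, which matters when queries are assembled automatically from a Web form.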
6. Conclusion and Future Work
In this paper, the strategy followed for the design and implementation of a robust system that satisfies the requirements of many distributed Web application domains and of multi-media information has been described in detail. This strategy involves (1) the analysis of the general end-user requirements and the information management functionalities required for the Web environment; (2) the design and development of a database to support multi-media information, as well as the necessary framework to automatically store organized data and maintain document evolution and the application's dynamism and flexibility; and (3) the development of three user-friendly interfaces that help Web users in marketing applications to easily navigate through multi-media information and find the items of interest, which can then be selected and used for different activities. The designed and developed application represents an important approach for users dealing with large amounts of distributed multi-media information. However, there are still several interesting issues to be addressed to better support the user in accomplishing his/her activities. In particular, one of the heaviest tasks in such an application is the collection of meaningful and up-to-date data. A solution could be the development of an intelligent agent (also called a spider or robot) that automatically visits specific Web sites and checks their contents against the information available in the database, in order to dynamically keep the database up to date. Similarly, data classification and organization is an error-prone and very difficult task for the user. Under the usual approach, it is a fully manual task.
The user, who should be an expert in the domain area, checks the content of each document and decides where this information should be placed and under which category it should be classified. The goal here is to build a knowledge base for decision making support and to develop the necessary set of adapter interfaces to semi-automatically support the data classification and organization process. All these aspects will be addressed in future extensions to this work.
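As a rough illustration of the spider idea proposed above as future work, the following sketch compares a digest of each page's current content with the digest recorded at the last visit. It is an assumption about how such an agent might detect changes, not a description of an existing component: the function names, URLs, and page contents are hypothetical, and the page-fetching step is passed in as a callable so that no network access is needed.

```python
import hashlib


def content_digest(content):
    """Return a SHA-256 hex digest of a page's text content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def find_stale_pages(stored_digests, fetch):
    """Return the URLs whose current content no longer matches the database."""
    stale = []
    for url in sorted(stored_digests):
        # fetch is any callable that returns the page text for a URL.
        if content_digest(fetch(url)) != stored_digests[url]:
            stale.append(url)
    return stale


# Hypothetical data: digests recorded in the database at the last visit.
stored = {
    "http://example.com/profile": content_digest("WebStore profile v1"),
    "http://example.com/products": content_digest("Product list v1"),
}

# Hypothetical current state of the site; the product page has changed.
current_site = {
    "http://example.com/profile": "WebStore profile v1",
    "http://example.com/products": "Product list v2",
}

stale_urls = find_stale_pages(stored, current_site.__getitem__)
print(stale_urls)
```

Only the pages reported as stale would then need to be reloaded into the database and re-indexed.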
7. Bibliography
[PED 98] A. J. H. Peddemors and L. O. Hertzberger. «A High Performance Distributed Database System for Enhanced Internet Services». In Proceedings of the 6th International High Performance Computing and Networking Conference HPCN98, April 98, Amsterdam, The Netherlands.
[ART 96] Arturo Crespo and Eric A. Bier. «WebWriter: A Browser-based Editor for Constructing Web Applications». In Proceedings of the 5th International World Wide Web Conference, May 96, Paris, France.
[DOK 98] D. A. Dokter, A. Allodi, and H. W. A. E. Kuipers. «Chimera: Multiple faces of Web-documents». In Proceedings of the 10th IFIP International Conference PROLAMAT 98, September 9-12, 1998, Trento, Italy.
[GEL 97] Hans-Werner Gellersen, Robert Wicke, and Martin Gaedke. «WebComposition: an object-Oriented support system for the Web engineering lifecycle». In Proceedings of the 6th International World Wide Web Conference, April 97, Santa Clara CA.
[AFS 98] H. Afsarmanesh, A. Benabdelkader, and L.O. Hertzberger. «Cooperative Information Management for Distributed Production Nodes». In Proceedings of the 10th IFIP International Conference PROLAMAT '98, Trento, Italy: Chapman & Hall Press.
[NOR 97] Nortel, Rick Kazman. «Web Query: searching and visualizing the Web through connectivity». In Proceedings of the 6th International World Wide Web Conference, April 97, Santa Clara CA.
[MUK 97] Sougata Mukherjea, Kyoji Hirata, and Yoshinori Hara. «Towards a multimedia World Wide Web Information Retrieval». In Proceedings of the 6th International World Wide Web Conference, April 97, Santa Clara CA.
[MIL 97] Tim Mills, Ken Moody, and Kerry Rodden. «Providing World Wide Access to Historical Sources». In Proceedings of the 6th International World Wide Web Conference, April 97, Santa Clara CA.
[HER 96] Eric Hermann. «Teach yourself CGI programming with Perl». 2nd edition, 1996.
[SAT 97] Andrew Satter, Ardith Ibanez, Bernie Dechaut, Pascal. «Creating Killer interactive Web sites». July 1997 (www.adj.com/killer).
[LIT 97] Paul Litwin. «Intranet & Web Databases». IDG Books, 1997.
[MTS-a 98] «Matisse OOS C API Reference». Copyright © 1996 ADB S.A. 3rd Edition - June 98, France.
[MTS-b 98] «Matisse OOS ODL programming». Copyright © 1996 ADB S.A. 3rd Edition - June 98, France.
[MTS-c 98] «Matisse OOS C++ Binding Reference». Copyright © 1996 ADB S.A. 3rd Edition - June 98, France.
[MTS-d 98] «Matisse OOS SQL User Guide». Copyright © 1996 ADB S.A. 3rd Edition - June 98, France.
[MTS-e 98] «Matisse OOS SQL Programming Guide». Copyright © 1996 ADB S.A. 3rd Edition - June 98, France.
8. Biographies
Hamideh Afsarmanesh is an assistant professor at the University of Amsterdam in the Netherlands. She has been involved in and has directed research in several European (ESPRIT and DUTCH-HPCN) and American funded projects. At the WINS faculty, she coordinates the research in the area of Cooperative and Federated Databases and Interoperable information management systems. She has served as Program Chairperson of international conferences and workshops in the area of information management and expert systems.
Ammar Benabdelkader has been a PhD student at the University of Amsterdam, the Netherlands, since February 1997. He is working in the area of Information Management Architecture to support Multi-Agent Distributed application domains. His research focuses on the design and prototypical development of information management systems, in particular the modelling constructs and the mechanisms to support the tasks of supervision and distributed control.