A Localness-Filter for Searched Web Pages

Qiang Ma¹, Chiyako Matsumoto², and Katsumi Tanaka¹

¹ Graduate School of Informatics, Kyoto University. Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan {qiang,tanaka}@dl.kuis.kyoto-u.ac.jp http://www.dl.kuis.kyoto-u.ac.jp

² Graduate School of Science and Technology, Kobe University. Rokkodai, Nada, Kobe, 657-8501, Japan {chiyako}@db.cs.scitec.kobe-u.ac.jp http://www.db.cs.scitec.kobe-u.ac.jp

Abstract. With the spread of the Internet, information about our daily life and our residential region is becoming more and more common on the WWW (World Wide Web). That is to say, there are many Web pages whose content is 'local' and may interest only the residents of a narrow region. Conventional information retrieval systems and search engines, such as Google[1] and Yahoo[2], are very useful for helping users find interesting information. However, it is not yet easy to find or exclude 'local' information about our daily life and residential region. In this paper, we propose a localness-filter for searched Web pages, which can discover and exclude information about our daily life and residential region in the searched Web pages. We compute the localness degree of a Web page by 1) estimating its region dependence, that is, the frequency of geographical words and the content coverage of the page, and 2) estimating the ubiquitousness of its topic, that is, whether it is usual information that appears every day and everywhere in our daily life.

1 Introduction

Everyone, both novice and expert, can access the vast amount of information available on the WWW and find what they are looking for. Conventionally, users input keywords or keyword-based user profiles to search for interesting information. Unfortunately, it is not always easy to specify keywords for the information of interest. That is to say, some kinds of information are hard to find with conventional information retrieval systems. For example, time-series documents, such as news articles, cannot be retrieved effectively because they are incoming information and it is impossible to specify the search keywords clearly. New retrieval mechanisms are necessary for such kinds of information. Ma[3,4] and Miyazaki[5] focused on the time-series feature

This research is partly supported by the Japanese Ministry of Education, Culture, Sports, Science and Technology under Grant-in-Aid for Scientific Research on "New Web Retrieval Services Based on Discovery of Web Semantic Structures", No. 14019048, and "Multimodal Information Retrieval, Presentation, and Generation of Broadcast Contents for Mobile Environments", No. 14208036.

X. Zhou, Y. Zhang, and M.E. Orlowska (Eds.): APWeb 2003, LNCS 2642, pp. 525–536, 2003. © Springer-Verlag Berlin Heidelberg 2003


of information and proposed some meaningful measures, called freshness and popularity, to fetch new valuable information from time-series documents.

On the other hand, with the rapid progress and spread of the World Wide Web (WWW), information about our daily life and residential region is becoming more and more common. However, it is not yet easy to discover local information on the Web with conventional methods either. Some portal Web sites[6,7] provide directory-type search services for regional information. In such portal Web sites, the regional information is managed manually, so the resources may be limited and some valuable local information may be missed. It is therefore necessary to define new retrieval criteria for discovering local information.

In our earlier work[8], we proposed a notion called localness degree for discovering local information from the Web. With the localness notion, we can estimate the dependence of a Web page on a particular region. In this paper, we refine our localness estimation methods and propose a localness filter for searched Web pages. The localness filter computes the localness degree of each Web page in the search results and selects more (or less) local information as the filtering result according to the user's intention. For example, if a user wants more local information, the localness filter returns the Web pages that have a high localness degree. On the other hand, if a user wants to exclude local information, the Web pages that have a lower localness degree are returned. In contrast to conventional systems, the localness filter is useful to:

– acquire local information, which is not easy to specify clearly in keywords,
– exclude local information, and
– discover local information over multiple regions.

We say a Web page is local when it interests only the residents (users) of some particular regions or organizations. From this perspective, we compute the localness degree of a Web page in two ways: a) estimating its region dependence, that is, the frequency of geographical words and the area of its content coverage, and b) estimating the ubiquitousness of its topic, that is, whether its information is usual information that appears everywhere and every day in our daily life.

(a) Region Dependence. If a page has high region dependence, it may describe something about a particular region and may interest the residents of that region. Therefore, its probability of being local information may be high. First, to estimate the region dependence, we compute the frequency of geographical words within a Web page. We assume that geographical words are region names and organization names. The administrative level of each geographical word is used as a weight when computing its frequency. If geographical words appear frequently, the localness degree is high. Second, we compute the content coverage of a Web page. We plot its geographical words on a map and compute the area of its content coverage based on the MBR (Minimum Bounding Rectangle)[9,10]. If this area is small and there are many geographical words, the localness degree of that Web page is considered to be high. In other words, we compute the density of geographical words over the content coverage to estimate the localness degree. If the density is high, the localness degree of that page should be high.


Moreover, we compute the document frequency of each geographical word to reduce the effect of common geographical words on the localness degree.

(b) Ubiquitousness of Topic. Some events occur everywhere and some events occur only in some particular regions. Information about the latter is a kind of scoop and may interest all users (of all regions). In other words, it may be global rather than local information. Therefore, if a page describes a scoop event (the latter type), it may not be local information. On the other hand, if a page describes a ubiquitous event (the former type, where the events are highly similar apart from their locations and times), it may interest only the users of some particular region because it is usual information. For instance, summer festivals are held everywhere in Japan. These events are similar although their locations and times may differ. People may be interested only in the summer festival of the town where they live (or of some place special to them, such as their hometown). In other words, the summer festival of a particular region may interest only its residents, and its localness degree could be high. To estimate the ubiquitousness of a Web page, we first compute the frequency of its usual words (words that are used often in our daily life) by referring to a pre-specified usual words dictionary. Second, we compare the page with other Web pages to compute their similarities, ignoring geographical words and proper nouns. If the frequency of its usual words and the number of similar pages (excluding geographical words and proper nouns) are high, the Web page may represent usual information, such as a ubiquitous topic or event. Therefore, its localness degree should be high.

The remainder of this paper is organized as follows. In Section 2, we give an overview of related work. In Section 3, we describe the mathematical definitions of the localness of a Web page and introduce the localness filter for local information retrieval. In Section 4, we show some preliminary evaluation results. Finally, we conclude this paper with some discussion and a summary in Section 5.

2 Related Work

Mobile Info Search (MIS)[11] is a project that proposes a mobile-computing methodology and services for utilizing local information from the Internet. Its system[12] exchanges information bi-directionally between the Web and the real world based on location information. Our research differs in that we define a new concept, the localness degree of a Web page, for discovering 'local' information from the Web automatically.

Buyukkokten et al.[13] discussed how to map a Web site to a geographical location, and studied the use of several geographical keys for assigning site-level geographical context. By analyzing "whois" records, they built a database that correlates IP addresses and hostnames with approximate physical locations. By combining this information with the hyperlink structure of the Web, they were able to make inferences about the geography of the Web at the granularity of a site.

Geographic Search[14] adds to traditional keyword searching the ability to search for Web pages within a particular geographic locale. To accomplish this, Egnor converted



street addresses found within a large corpus of documents to latitude-longitude coordinates using the freely available TIGER and FIPS data sources, and built a two-dimensional index of these coordinates. Egnor's system provides an interface that allows the user to augment a keyword search by restricting matches to within a certain radius of a specified address. Our research differs in that it considers how strongly a page is tied to an area. Moreover, our work focuses on how to discover local information from the Web contextually, using a page's content and its correlation with other pages.

In contrast to these systems and services, the main contribution of our work is that our system can estimate not only which region a Web page relates to, but also how 'local' (or how 'ubiquitous') a Web page is.

3 A Localness-Filter Based on Regional Dependence and Topic Ubiquitousness

In this section, we describe the notion of localness and the localness filter.

3.1 Computing Localness Degree

As mentioned above, saying that a Web page is local means that its information interests only the users of some particular regions or organizations. We compute the localness of a Web page in two ways: 1) region dependence and 2) ubiquitousness of its topic. In other words, if its content is closely tied to a particular region, the localness of that Web page is considered to be high. We also say that a Web page is local if its information (topic) is usual information that appears everywhere and every day in our daily life.

3.1.1 Region Dependence

We noticed that a Web page with a high dependence on a region includes much geographical information, such as the names of the country, state (prefecture), city, and so on. Therefore, we compute the frequency of these geographical words in a Web page to estimate its region dependence. The content coverage is another factor for estimating a Web page's region dependence. If a Web page describes information about a narrow region, its region dependence may be high. In other words, if the content coverage of a Web page is very narrow, we say that it is local.

(a) Frequency of Geographical Words. If a Web page has many detailed geographical words, it may depend deeply on some regions. Here, a 'detailed geographical word' is a geographical word that identifies a detailed location. Usually, we can use the administrative level of each geographical word (region name) as its detailedness¹. For example, "Xi'an" is more detailed than "China". When computing the frequency, we assign weight values to geographical words according to their detailedness. The simplest rule is to set up weight values in the following order: country name < organization name < state (prefecture) < city < town < street (road).

¹ In our current work, we consider the detailedness of geographical words based only on administrative levels. We also observe that population density is important. We will discuss this issue in our future work.

We also compute the document frequency of each geographical word within a corpus of Web pages (i.e., the Web pages in the same Web site as the page being estimated) to estimate the region dependence of a Web page. That is to say, if the document frequency of a geographical word contained in the page being estimated is high, that geographical word is a common one and affects the localness of the page less. In short, if a Web page includes many detailed geographical words that have low document frequencies, it is local. The formula for computing the localness degree based on the detailedness, frequency, and document frequency of geographical words is defined as follows.

$$\mathrm{local}_g(p) = \frac{\sum_{i=1}^{k} \mathrm{weight}(\mathrm{geoword}_i) \cdot \mathrm{tf}(\mathrm{geoword}_i) / \mathrm{df}(\mathrm{geoword}_i)}{\mathrm{words}(p)} \tag{1}$$

where $\mathrm{weight}(\mathrm{geoword}_i)$ is the detailedness of the geographical word $\mathrm{geoword}_i$, $\mathrm{words}(p)$ is the total number of words in page $p$ (excluding stop words), $k$ is the number of distinct geographical words in $p$, $\mathrm{tf}(\mathrm{geoword}_i)$ is the frequency of $\mathrm{geoword}_i$, and $\mathrm{df}(\mathrm{geoword}_i)$ is the document frequency of $\mathrm{geoword}_i$.
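As an illustration of Formula (1), the following minimal Python sketch computes the geographical-word localness of a tokenized page. The numeric weights in `LEVEL_WEIGHT`, the function name `local_g`, and the treatment of document frequency as a count of corpus pages containing the word are assumptions made for this example; the paper only fixes the ordering of the weights, not their values.

```python
from collections import Counter

# Hypothetical detailedness weights. Only their order follows the text:
# country < organization < state (prefecture) < city < town < street.
LEVEL_WEIGHT = {
    "country": 1, "organization": 2, "prefecture": 3,
    "city": 4, "town": 5, "street": 6,
}


def local_g(tokens, geo_levels, doc_freq):
    """Formula (1): sum of weight * tf / df over the distinct geographical
    words of a page, divided by the number of words in the page.

    tokens     -- words of the page, stop words already removed
    geo_levels -- maps each geographical word to its administrative level
    doc_freq   -- maps each geographical word to its document frequency,
                  assumed here to be a count >= 1
    """
    if not tokens:
        return 0.0
    # Term frequency of each distinct geographical word in the page.
    tf = Counter(t for t in tokens if t in geo_levels)
    total = sum(
        LEVEL_WEIGHT[geo_levels[word]] * count / doc_freq[word]
        for word, count in tf.items()
    )
    return total / len(tokens)


page = ["kyoto", "ramen", "shop", "kyoto", "japan", "festival"]
levels = {"kyoto": "city", "japan": "country"}
df = {"kyoto": 2, "japan": 10}
score = local_g(page, levels, df)  # (4*2/2 + 1*1/10) / 6
```

In this toy page the city name dominates the score because it is both more detailed and rarer in the corpus than the country name, which is the behavior the formula is meant to capture.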

Fig. 1. Example for Effects of Number and Sizes of Geographical Words on Localness (two panels, each plotting points with longitude on the x-axis and latitude on the y-axis)

(b) Content Coverage. We convert region names and organization names to location data, namely (latitude, longitude) pairs, and estimate the content coverage of the Web page based on the MBR (Minimum Bounding Rectangle)[9,10]. Each geographical word is converted to a two-dimensional point (latitude, longitude)². We plot all of these points on a map whose y-axis and x-axis are latitude and longitude, respectively. The content coverage of a Web page can then be approximated by the MBR that contains these points.

² In our current working phase, we use a Japanese location information database[15], which includes latitude and longitude data for the 3,252 cities, towns, and villages in Japan.



The number and sizes of the points plotted in the MBR are also used for estimating the localness. The size of a point represents the detailedness of a geographical word. A motivating example is illustrated in Fig. 1. Four points are plotted in the left MBR, and eight points are plotted in the right MBR. Even if the areas of the two MBRs are equal, their localness degrees should differ because the right one may describe more detailed information about the region. In short, if the content coverage is narrow and there are many detailed geographical words in it, the Web page is local. As mentioned before, the document frequency of each geographical word is also important for reducing the effect of common geographical words on the localness degree. The formulas are defined as follows.

$$\mathrm{local}_{de}(p) = \frac{\sum_{i=1}^{k} \mathrm{weight}(\mathrm{geoword}_i) \cdot \mathrm{tf}(\mathrm{geoword}_i) / \mathrm{df}(\mathrm{geoword}_i)}{\mathrm{MBR}(p)} \tag{2}$$

$$\mathrm{MBR}(p) = (lat_{max} - lat_{min})(long_{max} - long_{min}) \tag{3}$$
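Formulas (2) and (3) can be sketched in Python as follows. The function names and the `(weight, tf, df)` triple representation are hypothetical choices for this example. The sketch follows the paper's convention of replacing a zero area by 1 for a page with a single point; applying the same rule to any other degenerate case (for example, several points sharing one latitude) is an assumption of this sketch.

```python
def mbr_area(points):
    """Formula (3): area of the minimum bounding rectangle of a list of
    (latitude, longitude) points, in squared degrees.

    A zero area (a single point, or points on one horizontal or vertical
    line) is replaced by 1.0 so that it can serve as a denominator.
    """
    if not points:
        raise ValueError("at least one (latitude, longitude) point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    area = (max(lats) - min(lats)) * (max(lons) - min(lons))
    return area if area > 0 else 1.0


def local_de(geo_words, area):
    """Formula (2): density of weighted geographical words over the MBR.

    geo_words -- list of (weight, tf, df) triples, one per distinct
                 geographical word of the page
    area      -- content coverage MBR(p), e.g. from mbr_area()
    """
    return sum(weight * tf / df for weight, tf, df in geo_words) / area


points = [(35.0, 135.0), (35.5, 136.0)]
area = mbr_area(points)                             # 0.5 * 1.0
density = local_de([(4, 2, 2), (1, 1, 10)], area)   # (4 + 0.1) / 0.5
```

The same weighted sum as in Formula (1) appears in the numerator; only the normalizer changes, from the page length to the covered area.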

Here, $\mathrm{local}_{de}(p)$ is the localness degree, $\mathrm{weight}(\mathrm{geoword}_i)$ is the function that we use to estimate the detailedness of a geographical word, $\mathrm{tf}(\mathrm{geoword}_i)$ is the frequency of $\mathrm{geoword}_i$ within $p$, $\mathrm{df}(\mathrm{geoword}_i)$ is the document frequency of $\mathrm{geoword}_i$ within a page corpus, and $\mathrm{MBR}(p)$ is the content coverage of page $p$. The maximum latitude and longitude are $lat_{max}$ and $long_{max}$, and the minimum latitude and longitude are $lat_{min}$ and $long_{min}$, respectively. When only one point exists in a page, $\mathrm{MBR}(p)$ computed by Formula (3) is 0. When this exception is caught, we set $\mathrm{MBR}(p)$ to 1.

Some detailed place names have no match in the location information database[15]. In this case, we downgrade the detailedness of the place name to find approximate location data. For example, if the location data of "C street, B city, A state" is not found, we could use the location data of "B city, A state" as an approximate match.

Different places may share the same name. To avoid a mismatch, we analyze the page's context to determine the location data. For example, to match "Futyu city" to its correct location data, we can examine the page content. If "Hiroshima prefecture" appears near "Futyu city", we could use the location data of "Futyu city, Hiroshima prefecture" for "Futyu city" in this page. On the other hand, if "Tokyo metropolitan" appears, we should use "Futyu city, Tokyo metropolitan" to get the proper location data.

3.1.2 Ubiquitousness of Topic

Some events occur everywhere and some events occur only in some particular regions. Information about the latter is a scoop and may interest all users (of all regions). Conversely, a ubiquitous occurrence may have a high localness degree. For example, a summer festival, an athletic meet, and a weekend sale are ubiquitous events that occur regularly. A ubiquitous occurrence may be a normal part of our daily life.
Therefore, if a page describes a scoop event, it may not be local information. On the other hand, if a page describes a ubiquitous event, it may be local information. We estimate the topic ubiquitousness of a Web page with two factors: the similarity between pages and the locations where the events are held. If pages are similar in content but differ in location or time, their topics are usual.

Fig. 2. Architecture of Localness Filter (labels in the figure: Localness Filter, Filter Manager, Searcher, Localness Calculator with Region Dependency ((a) Geographical Words, (b) Content Coverage) and Ubiquitousness of Topic, Usual Words Dictionary, Web Page Corpus, Search Engine, Web, Google Web API, Q, Result)

The similarity between pages A and B is calculated as follows:

$$\mathrm{sim}(A, B) = \frac{v(A) \cdot v(B)}{|v(A)|\,|v(B)|} \tag{4}$$

Here, $v(A)$ and $v(B)$ are the keyword vectors (excluding geographical words and proper nouns) of pages A and B, respectively. We also compute the frequency of daily words within a Web page to estimate its ubiquitousness by referring to a pre-specified usual (daily) words dictionary. If the frequency is high, we say that the Web page describes usual information and its ubiquitousness is high. The usual words dictionary is constructed from a corpus of user-selected Web pages that describe usual information. In short, the localness degree $\mathrm{local}_u$ based on the topic ubiquitousness of Web page $p$ is computed as follows:

$$\mathrm{local}_u(p) = \frac{\mathrm{dailywords}(p)}{\mathrm{words}(p)} \cdot \frac{m}{n} \tag{5}$$
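Formulas (4) and (5) can be combined in a short Python sketch. It assumes, for illustration only, that keyword vectors are raw term-frequency vectors, that m counts the compared pages whose similarity to p exceeds a threshold Δ (`delta` below) out of the n pages compared, and that the set of words to ignore (geographical words and proper nouns) is supplied by the caller. All function names are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(tokens, excluded):
    """Term-frequency vector of a page without the excluded words
    (geographical words and proper nouns)."""
    return Counter(t for t in tokens if t not in excluded)


def cosine(a, b):
    """Formula (4): cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def local_u(tokens, other_pages, daily_words, excluded, delta):
    """Formula (5): (dailywords(p) / words(p)) * (m / n)."""
    if not tokens or not other_pages:
        return 0.0
    daily_ratio = sum(1 for t in tokens if t in daily_words) / len(tokens)
    v = keyword_vector(tokens, excluded)
    # m: number of compared pages whose similarity to p exceeds delta.
    m = sum(1 for other in other_pages
            if cosine(v, keyword_vector(other, excluded)) > delta)
    return daily_ratio * m / len(other_pages)


page = ["kyoto", "summer", "festival", "fireworks"]
others = [["kobe", "summer", "festival", "fireworks"],
          ["stock", "market", "report"]]
score = local_u(page, others,
                daily_words={"summer", "festival"},
                excluded={"kyoto", "kobe"},
                delta=0.5)
```

Once the place names are removed, the two festival pages become identical, so one of the two compared pages counts as similar; with half of the page's words in the daily-words set, the score is 0.5 × 1/2.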

Here, $\mathrm{dailywords}(p)$ is the frequency of daily words in page $p$, and $\mathrm{words}(p)$ is the total number of words in $p$. $m$ is the number of pages whose similarity to page $p$ (excluding geographical words and proper nouns) is greater than a threshold $\Delta$, and $n$ is the number of pages used for the similarity comparison.

3.2 Localness Filter

Although more and more information about our daily life and residential region is becoming available on the Internet, it is not yet easy to acquire or exclude this kind of information with conventional information retrieval systems and services.

Fig. 3. Screen Shot of Localness Filtering System (labeled parts: Keyword Box, Level, Browser, Filter mode, Filtering Function parameters, Result Viewer)

Fig. 4. Running Example: Web Page with High Localness

In this section, we introduce a localness-filter for searched Web pages based on the localness notion. The goal of our localness-filter is to help users find or exclude local information in the search results. In our localness filter, the localness of page $p$ is computed as follows:

$$\mathrm{local}(p) = \alpha \cdot \mathrm{local}_g(p) + \beta \cdot \mathrm{local}_{de}(p) + \gamma \cdot \mathrm{local}_u(p) \tag{6}$$

where $0 \le \alpha \le 1$, $0 \le \beta \le 1$, and $0 \le \gamma \le 1$ are weight values that can be specified by the user. For example, a user can set $\alpha = 1$, $\beta = 0$, $\gamma = 0$ to select the geographical-word-based localness $\mathrm{local}_g(p)$ as the filtering function.

The localness filter has two modes: exclusion mode and inclusion mode. In exclusion mode, the Web pages whose localness degrees (computed by Formula (6)) are greater than a threshold $\theta$ are excluded from the searched Web pages. In inclusion mode, on the other hand, only the Web pages whose localness degrees are greater than the threshold are kept.
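The combination in Formula (6) and the two filter modes can be sketched as follows. The function names and the `(url, localness)` pair representation are hypothetical; the sort order per mode follows the description of the result viewer elsewhere in the paper (descending for inclusion, ascending for exclusion).

```python
def localness(local_g, local_de, local_u, alpha=1.0, beta=0.0, gamma=0.0):
    """Formula (6): weighted sum of the three localness degrees."""
    for weight in (alpha, beta, gamma):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    return alpha * local_g + beta * local_de + gamma * local_u


def localness_filter(scored_pages, theta, mode="inclusion"):
    """Filter a list of (url, localness) pairs against threshold theta.

    inclusion -- keep pages with localness > theta, most local first
    exclusion -- drop pages with localness > theta, least local first
    """
    if mode == "inclusion":
        kept = [(url, s) for url, s in scored_pages if s > theta]
        return sorted(kept, key=lambda pair: pair[1], reverse=True)
    if mode == "exclusion":
        kept = [(url, s) for url, s in scored_pages if s <= theta]
        return sorted(kept, key=lambda pair: pair[1])
    raise ValueError("mode must be 'inclusion' or 'exclusion'")


pages = [("a", 0.9), ("b", 0.2), ("c", 0.6)]
included = localness_filter(pages, theta=0.5, mode="inclusion")
excluded = localness_filter(pages, theta=0.5, mode="exclusion")
combined = localness(0.4, 0.2, 0.1, alpha=1.0, beta=0.5, gamma=0.0)
```

With the default weights (α = 1, β = 0, γ = 0), `localness` reduces to the geographical-word-based degree alone, which is the setting the paper uses in its comparison evaluation.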



Fig. 5. Running Example: Web Page with Low Localness

A Web page may be local with respect to a wide region while it is considered global with respect to a very narrow region. That is to say, localness is a relative concept. Accordingly, the level of localness is defined as a three-grade value ranging from 1 to 3³, where 1 means the normal level (e.g., the prefecture level), 2 means the high level (e.g., the city level), and 3 means the very high level (e.g., the county, ward, or village level).

Fig. 2 illustrates the architecture of the localness-filter. The filter manager accepts the request from a user and returns the results to the user. The searcher receives the search request from the filter manager and accesses the Google Web service[1] via the Google Web API[16] to get search results from the Web. The searched Web pages are passed to the localness calculator to estimate their localness degrees. The parameters for localness estimation are set up by the filter manager based on the user's specification. The filter manager also generates the filtering results according to the filter mode and level specified by the user. The Usual Words Dictionary is used to compute the frequency of usual (daily) words within a Web page. The Web Page Corpus includes collections of Web pages used to compute the document frequencies of geographical words and the topic ubiquitousness of each page.

We are developing the localness filter on the .NET platform[17]. We use ChaSen[18] for Japanese morphological analysis to extract geographical words from a Web page. Fig. 3 is a snapshot of our current system in operation. A user can input a query into the keyword box and customize the localness filter: the filter mode (by radio buttons), and the parameters of the filtering function and the localness level (by slider bars). In the result viewer, the results are here sorted in descending order of localness, in accordance with the inclusion mode. The results can also be sorted in ascending order, in accordance with the exclusion mode of the localness filter. The number of icons shown for a result signals its degree of localness: a higher degree of localness brings more icons. By clicking a URL in the result viewer, the user starts a Web browser that displays the Web page.

³ In our current work, we can adjust the detailedness of geographical words or adjust the threshold θ to realize the level feature of localness.



Fig. 4 shows the Web page that was accepted as the most local one; it describes some famous restaurants serving Chinese noodles in soup. This page was ranked 9th by the Google search service for the keyword "lamen"⁴. On the other hand, the least local Web page, which publishes information about fortune-telling, is shown in Fig. 5. Although it was ranked 2nd by the Google search service, the localness filter succeeded in excluding it because it is not local.

4 Preliminary Evaluations

4.1 Preliminary Evaluations for Localness Computing

In this subsection, we describe our preliminary evaluations of the localness notion. We used 500 Web pages (written in Japanese) from ASAHI.COM (http://www.asahi.com/), a well-known news Web site in Japan. Initially, we removed all of the structural information (e.g., HTML tags) and the advertisement content from the HTML sources of these Web pages.

In the preliminary evaluations, we compute three kinds of localness degrees ($\mathrm{local}_g(p)$, $\mathrm{local}_{de}(p)$, and $\mathrm{local}_u(p)$) from the aspects of region dependence (geographical words and content coverage) and topic ubiquitousness, respectively. One limitation of our preliminary evaluation is that we only compare the similarity of a page with pages about different regions to compute its topic ubiquitousness. That is to say, we set the factor $\mathrm{dailywords}(p)/\mathrm{words}(p)$ of Formula (5) to 1 when we compute the localness of each page from the topic ubiquitousness aspect.

By the localness estimation from the region dependence aspect (frequency of geographical words), 327 pages were selected as local pages. The number of local pages selected by a user is 362. The recall and precision ratios are 0.619 and 0.685, respectively. On the other hand, 283 pages were selected as local pages by the localness estimation from the region dependence aspect (content coverage), and 362 pages were considered to be local by a user. The recall and precision ratios are 0.555 and 0.710, respectively. Moreover, 80 pages were selected as local pages by the localness estimation from the topic ubiquitousness aspect. The number of Web pages selected as local by a user is 324. This time, the recall and precision ratios are 0.200 and 0.813, respectively.

We noticed that the recall ratio is very poor. One possible reason is that we used news articles as the evaluation targets for the preliminary experiments. Because news articles are more formal and unique than other Web pages, we might fail to estimate their topic ubiquitousness. We are planning to evaluate other kinds of Web pages in the future. The limited number of evaluation pages and the limited time interval they cover are other possible reasons. Further research is needed to obtain better results.

4.2 Preliminary Evaluation for Localness Filter

We compare the filtering results of the localness filter with the search results of Google to evaluate the localness filter. In the comparison evaluation, we set the parameters of filtering function (6) as follows: α = 1.0, β = 0, γ = 0. The level of localness is set to the normal level, and the filter mode is set to inclusion mode. The evaluation targets are limited to the top 50 pages (ranked by Google) of the search results.

⁴ "lamen" is a Japanese word meaning Chinese noodles in soup.

The first query for the comparison evaluation is "lamen"⁵. The localness filter accepted 23 of the top 50 searched pages as local ones. All of the accepted pages were also judged to be local by a user. On the other hand, 27 pages were excluded from the searched Web pages, while 15 of them were judged to be local by a user. The recall and precision ratios are 0.605 and 1.0, respectively. For the top 10 searched pages, the original ranking order by Google and the localness-based ranking order are (p1, p2, p3, p4, p5, p6, p7, p8, p9, p10) and (p4, p9, p10, p5, p2, p6, p1, p7, p8, p3), respectively. This shows that the localness filter can help users find local information quickly.

Another query for the comparison evaluation is "Chocolate Kobe". At the normal level of localness, 33 pages were accepted as local pages by the localness filter, while 11 of them were not local in the judgment of a user. On the other hand, 17 pages were excluded, while 3 of them were judged to be local by a user. The recall and precision ratios are 0.880 and 0.667, respectively. At the high level of localness, 14 pages were accepted, including 5 incorrect ones (as judged by a user). At the very high level of localness, 7 pages were accepted, including 1 incorrect one (as judged by a user). The ranking order of the top 10 searched pages by the localness filter is (p8, p6, p5, p7, p4, p3, p10, p9, p1, p2).

Since our evaluations are limited, more work is needed to improve them. Nevertheless, these results support the view that the notion of localness is useful for discovering local information from a massive number of Web pages. From the comparison evaluation, we can at least assume that localness is useful for filtering search results. We can also assume that the level feature of localness is helpful for filtering the results of queries that contain geographical words. These assumptions will be tested in our future evaluations.
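The recall and precision ratios reported for the two queries can be reproduced from the stated page counts. The short Python check below assumes the usual definitions (precision = correctly accepted / accepted, recall = correctly accepted / all pages judged local by the user); the function and argument names are hypothetical.

```python
def precision_recall(accepted, accepted_wrong, rejected_local):
    """Precision and recall of the filter in inclusion mode.

    accepted       -- pages accepted as local by the filter
    accepted_wrong -- accepted pages that the user judged not local
    rejected_local -- excluded pages that the user judged local
    """
    true_positive = accepted - accepted_wrong
    precision = true_positive / accepted
    recall = true_positive / (true_positive + rejected_local)
    return precision, recall


# Query "lamen": 23 accepted (all local), 15 local pages among the 27 excluded.
lamen = precision_recall(accepted=23, accepted_wrong=0, rejected_local=15)

# Query "Chocolate Kobe", normal level: 33 accepted (11 not local),
# 3 local pages among the 17 excluded.
kobe = precision_recall(accepted=33, accepted_wrong=11, rejected_local=3)
```

For "lamen" this gives a precision of 23/23 and a recall of 23/38; for "Chocolate Kobe" it gives 22/33 and 22/25, which round to the ratios stated in the text.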

5 Conclusion

There are many Web pages that provide information about our daily life and residential region. These pages are very important for residents and may interest only them. In this paper, we proposed a filter for searched Web pages based on the notion of localness. With this filter, users can acquire or exclude local information from the Web according to their intentions.

In this paper, we estimate a Web page's localness from two aspects: region dependence and topic ubiquitousness. We also noticed that users' location information is useful for estimating the localness of a Web page. If a Web page is often accessed by users from the same region and there is little access from other regions, we could say that the page interests only the users of that particular region and its localness is high. On the other hand, if some information is published only by the residents of one region, and few users in other regions report the same information, we could say that the information is local because it interests only the users of a particular region. We will discuss these issues in our future work.

⁵ "lamen" is a Japanese word meaning Chinese noodles in soup. In our comparison evaluation, the keywords of the queries are all in Japanese.



Further experiments to evaluate the localness notion and the localness filter need to be carried out, such as comparison evaluations with directory-type search in regional portal Web sites (Yahoo! Regional[6], MACHIgoo[7], etc.) and keyword search in agent-type search services (Google, etc.). Further refinements of the localness notion and the localness filter are also planned as future work, such as taking into account the hierarchical relationships among geographical words when estimating the region dependence.

References

[1] Google. http://www.google.com/, 2002.
[2] Yahoo! Japan. http://www.yahoo.co.jp/, 2002.
[3] Qiang Ma, Kazutoshi Sumiya, and Katsumi Tanaka. Information filtering based on time-series features for data dissemination systems (in Japanese). IPSJ TOD7, 41(SIG6(TOD7)):46–57, 2000.
[4] Qiang Ma, Shinya Miyazaki, and Katsumi Tanaka. Webscan: Discovering and notifying important changes of web sites. Proc. of DEXA 2001, LNCS 2113, pages 587–598, 2001.
[5] Shinya Miyazaki, Qiang Ma, and Katsumi Tanaka. Webscan: Content-based change discovery and broadcast-notification of web sites (in Japanese). IPSJ TOD10, 42(SIG8(TOD10)):96–107, 2001.
[6] Yahoo! Regional. http://local.yahoo.co.jp/, 2002.
[7] MACHIgoo. http://machi.goo.ne.jp/, 2002.
[8] Chiyako Matsumoto, Qiang Ma, and Katsumi Tanaka. Web information retrieval based on the localness degree. Proc. of DEXA 2002, LNCS 2453, pages 172–181, 2002.
[9] Antonin Guttman. R-trees: A dynamic index structure for spatial searching. Proc. ACM SIGMOD Conference on Management of Data, 14(2):47–57, 1984.
[10] Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, and Roberto Zicari. Advanced Database Systems. Morgan Kaufmann, 1997.
[11] Nobuyuki Miura, Katsumi Takahashi, Seiji Yokoji, and Kenichi Shima. Location oriented information integration - mobile info search 2 experiment - (in Japanese). The 57th National Convention of IPSJ, 3:637–638, 1998.
[12] KOKONONET. http://www.kokono.net/, 2002.
[13] Orkut Buyukkokten, Junghoo Cho, Hector Garcia-Molina, Luis Gravano, and Narayanan Shivakumar. Exploiting geographical location information of web pages. Proc. of WebDB (Informal Proceedings), pages 91–96, 1999.
[14] Daniel Egnor. Google programming contest, 2002.
[15] T. Takeda. The latitude / longitude position database of all-prefectures cities, towns and villages in Japan, 2000.
[16] Google Web API. http://www.google.com/apis/, 2002.
[17] Microsoft .NET. http://www.microsoft.com/net/, 2002.
[18] ChaSen. http://chasen.aist-nara.ac.jp/chasen/whatis.html.en, 2002.
