Drug Information Journal, Vol. 32, pp. 921–932, 1998 Printed in the USA. All rights reserved.
0092-8615/98 Copyright 1998 Drug Information Association Inc.
A COMPARATIVE STUDY OF INTERNET SEARCH ENGINES BY APPLYING “COST EFFECTIVE TREATMENT FOR MYOCARDIAL INFARCTION” AS A SEARCH TOPIC
EIICHI AKAHO, RPH, PHD
Associate Professor of Pharmacy, Faculty of Pharmaceutical Sciences, Kobe Gakuin University, Kobe, Japan, and Visiting Scientist, College of Pharmacy, University of Kentucky, Lexington, Kentucky, and Division of Information Development, United States Pharmacopeia, Rockville, Maryland
SYED RIZWANUDDIN AHMAD, MD, MPH
Drug Information Specialist, Division of Information Development, United States Pharmacopeia, Rockville, Maryland
The Internet has become a necessity in the medical sciences and has created a global information village. It is essential to know the various features of the Internet and to use it efficiently in each field of science. A comparative study of Internet search engines was conducted for a specific search topic, “cost effective treatment for myocardial infarction.” The data sources were published conference reports, journal articles, personal communications, and Internet resources. Data on the features of the Internet search engines were obtained by browsing each of the 20 search engines selected and, when needed, by communicating with the individuals in charge of each search engine. Actual searches on the topic were conducted using the 20 search engines. Search terms for cost effectiveness ranged from pharmacoeconomics to medical economics, more than 10 terms in all. Search terms for myocardial infarction ranged from heart failure to coronary heart disease, with at least five major terms. Basic search features differ from one search engine to another, and a user should be familiar with those differences. The number of Uniform Resource Locators (URLs) and the contents of each search engine also differ. The number of records retrieved ranged from 0 to 22,000 among the search engines. Lycos and Magellan gave a fairly good hit percentage, or relevance factor. The system design of Internet search engines is based either on open search architecture (key word indexes) or on allowing the creator of the program to index his/her own page (subject directories). These different approaches have resulted in varying degrees of function among the Internet search engines. The aims of this paper were not only to retrieve scientific papers and articles but also to locate information on scientific meetings, scientific presentations, and other scientific resources. Some commercial search engines enable users to access those information sites.
Key Words: Internet; Search engine; Myocardial infarction; Cost effectiveness; Uniform resource locators
Reprint address: Eiichi Akaho, PhD, Associate Professor of Pharmacy, Faculty of Pharmaceutical Sciences, Kobe Gakuin University, 518 Ikawadani-cho, Nishi-ku, Kobe 651–2180 Japan. E-mail
Eiichi Akaho and Syed Rizwanuddin Ahmad
INTRODUCTION
The Internet is a worldwide network of computers storing immense quantities of various types of information (1). The number of host computers connected to the Internet was only 213 in 1981 but increased to 16,146,000 in January 1997, according to the survey conducted by Network Wizards (2). It is growing very rapidly, having approximately doubled its number since 1989 (2). It is one of the greatest innovations in the history of communication technology. The impact of the Internet may be as profound as the invention of the printing press (2), for it has the potential to make scientific work easier and more productive if used efficiently and properly. The Internet is playing a major role in medicine (3,4). Searching the Internet is becoming critically important in medicine and pharmacy (3–6), and its usefulness and application have been reported in various fields of the health sciences (7–20).
In this project, cost effectiveness for the treatment of myocardial infarction was selected as a search topic to investigate fundamental features of the Internet. Possible search terms for “cost effectiveness” and “myocardial infarction” were identified, and potential search equations using the various search terms were then constructed. Actual searches were performed on 20 selected search engines by using one of the search equations. At the same time, characteristics of the 20 search engines were also studied. The overall results could be used for searches on other related medical topics.
It should be noted that the majority of Internet search engines do not have a structured thesaurus, of which MeSH is a typical example. The system design of Internet search engines is based either on open search architecture (key word indexes) or on allowing the creator of the program to index his/her own page (subject directories). These different approaches have resulted in varying degrees of function among the Internet search engines (21).
The aims of this paper were not only to retrieve scientific papers and articles but also to locate information on scientific meetings, scientific presentations, and other scientific resources. Some commercial search engines enable researchers to access those information sites.
The evolution of the Internet can be traced to the United States Department of Defense-funded Advanced Research Projects Agency Network (ARPANET), built in 1969, which linked, during the 1970s and 1980s, several large regional networks, creating a nationwide system of computers (22,23). In 1986, the National Science Foundation Net (NSFNET) replaced ARPANET, and as connections to NSFNET were gradually established with other regional, statewide, and academic networks, the concept of the Internet arose as a network of networks. It should be noted here that no administrative headquarters or management office exists. Therefore, information exchanged through the Internet is not screened or evaluated. Anybody may put his/her home page on the Internet, and it is up to users to determine how to use it (24,25).
SEARCH METHODS
1. Twenty popular search engines were selected because they were general search engines and were cited and documented in the literature listed in the references (Table 1),
2. Characteristics of those search engines were compared and discussed. Search equations were constructed and compared for each search engine. Relevance factors of those search engines were calculated and compared, and
3. A search was conducted using the 20 search engines for the retrieval of information relevant to “cost effectiveness for the treatment of myocardial infarction.”
The World Wide Web (WWW) browser used for the search of the 20 search engines was Netscape version 3.0. The relevance of each record was determined by examining the content of the output description (normal style if several output options existed) appearing after the initial search. No further evaluation of the quality of the record was performed.
Comparing Internet Search Engines
TABLE 1 A Comparison of Search Functions of Internet Search Engines

Engine/Tool     AND        OR         NOT        Phrase     Nesting    Truncation Field
Alta Vista      yes(1)     yes(2)     yes(3)     yes(4)     yes        yes(5)     yes
Aqui            yes        yes        NA         NA         no         NA         yes
DejaNews        yes(6)     yes(7)     no         no         no         yes(5)     yes(8)
Excite          yes        yes        yes        no         yes        yes(9)     no
Galaxy          yes        yes        yes        yes        yes        yes(5)     yes
Harvest         yes        yes        no         yes        NA         yes        yes
Hot Bot         yes(10)    yes(11)    yes(3)     yes        yes(12)    no         yes
InfoMine        yes        yes        no         NA         yes        yes(13)    no
Infoseek        y&n(14)    yes(2)     yes(3)     yes(4)     no         no         yes
Inktomi         yes(10)    yes        yes(3)     no         no         yes(15)    no
Lycos           yes(16)    yes(17)    no(3)      no(18)     no         yes(9)     no
Magellan        yes        yes        no         yes        no         yes        yes
MetaCrawler     yes(10)    yes        yes(3)     yes(4)     yes        yes(19)    yes
NlightN         yes(20)    yes(21)    yes(22)    yes(2)     no         no         yes(23)
OpenText Index  yes(24)    yes(24)    yes(24)    yes(2)     no         no         yes(24)
SavvySearch     yes(9)     yes(9)     no         yes        no         no         no
Veronica        yes        yes        yes        yes(4)     yes        yes(5)     yes(25)
WebCrawler      yes        yes        no         no         no         y&n        no
WWW Worm        yes        yes        no         yes        no         yes        no
Yahoo           yes        yes        no         yes        no         yes(26)    yes(27)

Notes: NA = information not available. Phrase = phrase search; Field = field search. Numbers in parentheses refer to the following notes.
1 = A plus (+) or ‘and’ before a word requires that the word be included.
2 = Default.
3 = A minus (−) or ‘not’ before a word means to exclude records with the word.
4 = Phrase must be included in double quotes (“ ”).
5 = End truncation with an asterisk (*).
6 = Supports both AND and the symbol ‘&.’
7 = Supports both OR and the symbol ‘*’.
8 = Newsgroups, date, author.
9 = Automatic.
10 = A plus (+) before a word requires that the word be included.
11 = Can search for “any of the words” or add words that “should” be included.
12 = Yes but subtle.
13 = To truncate a phrase, simply insert a “#” sign.
14 = Brackets ([ ]) for terms appearing within 100 words of each other.
15 = Common endings removed and searched for.
16 = Match all terms.
17 = Match any term.
18 = N/A but can use loose-strong match menu.
19 = If the other search engines that it accesses support “truncation” so does MetaCrawler.
20 = Supports the symbol ‘&.’
21 = Supports the symbol ‘*’.
22 = Supports the symbol ‘^’.
23 = Select from limit menu.
24 = Select from drop-down menu.
25 = Titles, directions and gopher type. Also permits restriction of search to any arbitrarily-identified Internet domain.
26 = All words or none.
27 = Title, URL, comments.
The search concept could be divided into two sections: “cost effectiveness” and “myocardial infarction.” The concept of cost effectiveness could be expressed in the following ways:
• Cost benefit,
• Cost analysis,
• Pharmacoeconomics,
• Cost evaluation,
• Cost minimization,
• Economic factor,
• Cost effective,
• Cost of medicine and benefit,
• Cost efficiency,
• Cost-benefit analysis,
• Cost reduction, and
• Medical economics.
The concept of myocardial infarction could be expressed in the following ways:
• Coronary heart disease,
• Cardiovascular disease,
• Myocardial infarction,
• Heart failure,
• Heart disease, and
• Cardiac disorder.
SEARCH EQUATIONS
The search equation can be constructed as follows, where concept A is cost effectiveness and concept B is myocardial infarction:
1. Cost AND “myocardial infarction”: concept A is nonspecific and B is very specific and emphasized,
2. Cost AND (“coronary heart” OR cardiovascular): concept A is nonspecific and B is broad,
3. Pharmacoeconomic OR economic AND myocardial: concepts A and B can be nonspecific, and
4. Cost AND myocardial: both concepts A and B are nonspecific, but good enough for search engines which are not equipped with sophisticated search techniques such as nesting and phrase searches.
A rule of thumb of computer literature searching is that the more specific the search term, the smaller the recall factor and the greater the relevance factor obtained, and that the less specific or broader the search term, the greater the recall factor and the smaller the relevance factor obtained. Often, a term in between the two extremes is chosen in order to satisfy both recall and relevance factors moderately well. Therefore, “cost” was chosen as a search term candidate from concept A and “myocardial infarction” was chosen as a phrase term candidate from concept B as a compromise between the two extremes. The two terms were then combined by the Boolean operator AND. For search engines which do not have a phrase search function, “myocardial” and “infarction” are combined by the Boolean operator AND. Although some of the search engines are built on controlled terms, this free term search technique was adopted in order to keep the search strategy consistent.
In addition to the above search equations, the following searches can be considered:
1. “Drug therapy” OR treatment OR management AND “myocardial infarction,”
2. “Drug therapy” OR treatment OR management AND coronary OR cardiovascular OR heart, or
3. A two-step search: perform a primary search in Netscape for “myocardial infarction” first by using the search engine in question, and when some records are retrieved, use the [FIND] function under [EDIT] in Netscape (other major browsers have a similar function) to find the word “cost” as a secondary search.
ANALYSIS OF THE OUTCOMES
Internet Search Engines and Their High Noise Ratio
The current issues related to Internet information retrieval are well summarized by Edelberg (26): “At this point there is no one unified tool to search all of the Web. There are, however, many different search tools with different advantages and disadvantages, and more are becoming available all the time. The commercial search tools are becoming very good, and may at some point be necessary for quality searching. The problems inherent in key word searching of vast numbers of documents are obvious and will continue to be problematic as the Web grows. In fact, the rapid growth of the Web, and its high noise factor due to the total lack of quality control, along with the difficulty that scale poses for the search tools, will present a real challenge to the user trying to efficiently locate information.”
As the complexity of the Internet increased, it was soon recognized that special provisions were required to search out information storage sites by topic (27). In any case, the concept that the world has become a global village can be demonstrated dramatically through the electronic links provided by the Internet in various fields (23,28).
Comparison of Search Features of Search Engines
Table 1 presents query function features of the 20 popular search engines investigated. As is clear in Table 1, each search engine has its own search function. For example, in the case of the Boolean operator AND function, Inktomi uses a ‘+’ operator, while NlightN uses an ‘&’ operator. Instead of using Boolean operators directly, the Boolean operator functions can be exercised in some search engines, such as Lycos and Open Text Index, by selecting, for example, “all of the terms” (equivalent to the AND function) or “any of the terms” (equivalent to the OR function) from the menu appearing in the search screen. As is evident from Table 1, to conduct a proper search a user has to be familiar with the different search functions associated with each search engine. Otherwise, the user is likely to become confused and miss information. By referring to summary search tables such as Table 1, one can perform information retrieval more efficiently and easily (29).
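A summary table such as Table 1 can drive the choice of query form. The following minimal Python sketch keeps the phrase intact where phrase search is available and splits it into single words otherwise, following the rule stated under SEARCH EQUATIONS. The dictionary holds only an illustrative subset of the Phrase Search column of Table 1, and the names SUPPORTS_PHRASE and query_terms are assumptions of this sketch, not part of any search engine.

```python
from __future__ import annotations

# Illustrative subset of the "Phrase Search" column of Table 1.
# The dictionary layout and the function name are assumptions of this sketch.
SUPPORTS_PHRASE = {
    "Alta Vista": True,   # phrase enclosed in double quotes (note 4)
    "Harvest": True,
    "Excite": False,
    "Inktomi": False,
}


def query_terms(engine: str, phrase: str, extra_term: str) -> list[str]:
    """Return the terms that are to be combined with AND for one engine.

    If the engine supports phrase search, the phrase stays one quoted term;
    otherwise (or if the engine is unknown) it is split into single words.
    """
    if SUPPORTS_PHRASE.get(engine, False):
        return [f'"{phrase}"', extra_term]
    return phrase.split() + [extra_term]


for name in SUPPORTS_PHRASE:
    print(name, query_terms(name, "myocardial infarction", "cost"))
```

How the returned terms are joined (with AND, and, +, &, or a menu option) still depends on the engine, which is why the lookup is kept separate from the operator syntax.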
Characteristics of Search Engines
Each search engine has its own unique attributes. Those attributes, the total number of URLs, the contents included in the search engine in question, and comments on each of the 20 search engines examined are listed in Table 2. Search engines with 60 million or more URLs include DejaNews, Lycos, and NlightN. The next largest group, whose URLs number approximately 50 million, includes Excite, Hot Bot, and Infoseek. The contents of search engines can be full-text, hypertext, abstract, title, and article. Unique and/or preferred attributes of search engines claimed by Internet providers or reported by third parties were found to be user-friendliness (Inktomi, Lycos, Open Text Index, and WWW Worm), search speed (Infoseek, Inktomi, and Lycos), frequent updates (Hot Bot and Yahoo), summary display (Harvest), and subject search (Yahoo). These attributes are changing rapidly with the phenomenal advance in the features of search engines.

TABLE 2 Internet Search Engines/Tools Characteristics

Engine/Tool     URLs (M)  Content               Comments
Alta Vista      30        Full-text             Offers compact detailed searches through the world’s “largest” web index. Since no editorial decisions have been made regarding content, it also has the largest “noise to signal” ratio
Aqui            NA        NA                    Keeps track of links between web pages
DejaNews        60*       Usenet articles       Searches Usenet newsgroups, finding the topic of interest by a weighted search criteria that sorts information based on the number of occurrences of the target word
Excite          50        Full-text, titles     It searches the WWW, Usenet, Usenet classifieds, and its own database of Web site reviews. It can do a proper word search which works on two word combinations. The first letter in each word has to be capitalized then the words are searched in order
Galaxy          NA        Full-text             Includes a search engine that supports limiting by type of site and full-text searching
Harvest         NA        NA                    Applies different search engines on the same data. Displays exhaustive content summaries of the documents
Hot Bot         54        Full-text             Capable of indexing the entire WWW every week. Allows limiting searches by date, domain, or continent
InfoMine        NA        NA                    It is a unique web resource featuring well organized access to important university level research and educational tools on the Internet. In the life sciences, it provides interactive access to approximately 300 databases
Infoseek        50        Title                 Its virtues are speed, accuracy, and ease of use. With each search it gives the most relevant matches, related topics to explore, and timely news. Its biggest strength is that it is a terrific way to search Usenet
Inktomi         30        Full-text             It is a quick, easy, and fairly accurate search engine. Its limitation is a tendency to list multiple related documents
Lycos           66        Titles                Its strong points are its speed, ease of use, huge database, and abstract of each hit that is displayed
Magellan        10        Titles, review        It is an online guide to the Internet that contains a directory of rated and reviewed sites, along with an index of lots of unreviewed sites
MetaCrawler     200+      Titles, abstracts     Extraordinarily powerful, multithreaded site that takes search terms to multiple sites including Alta Vista, Excite, Galaxy, InfoSeek, Inktomi, Lycos, OpenText, and Yahoo. Selection by region (country, domain), continent, or domain (.com, .edu, etc.)
NlightN         66+       Full-text, abstracts  It has the world’s largest table of contents. It indexes not only the web, but reference works, news wires, books, dissertations, and many public and private databases
OpenText Index  1.5       Full-text             It has an easy-to-use interface and offers plenty of powerful tools. It produces clear and helpful results
SavvySearch     30        Full-text             It is a useful metasearch site that compiles results from 23 different Web search engines but its sophistication cannot match that of MetaCrawler
Veronica        24        Titles, abstracts     It is an index and retrieval system which can locate items on most of the gopher servers in the Internet
WebCrawler      1.6       Full-text             It offers a random-links feature to find new and unusual sites. An indexing program filters out identical (or very close to identical) documents
WWW Worm        4         Full-text, hypertext  Good for simple, one or two-word topic searching as well as generating lists of URLs in a certain area, eg, lists of different organizations. It is possible to search for references and multimedia products
Yahoo           NA        Titles, abstracts     Pioneer Internet guide which allows users to both browse and search subject categories. Features include up-to-the-minute sports scores, weather, headlines, and stock quotes

Notes: URLs = Uniform Resource Locators. M = million. NA = Information Not Available.

Preliminary Searches Conducted Using Lycos and Alta Vista
A preliminary search on Lycos was performed by using the equation {myocardial AND cost}. The relevance factor of the first 10 retrieved articles, defined as follows, was 0.7.

\[ \text{relevance factor} = \frac{\text{number of relevant articles retrieved}}{\text{total number of retrieved articles}} \]

The search on Alta Vista resulted in a relevance factor of 0.2 with the same search equation {myocardial AND cost}. Both searches were performed in July 1996. It was discovered that there is some useful information on “cost effectiveness of myocardial infarction,” and the previously proposed search equations were found to work to some extent. Therefore, further investigation was conducted for all 20 search engines.

Selection of Search Terms and the Logic Behind the Selection
In order to compare and evaluate the various search engines, an actual search was performed on the topic of “cost-effectiveness for the treatment of myocardial infarction” in August and September 1996. Since the concept of cost-effectiveness can be expressed by a wide range of key words and many of the Internet search engines do not support sophisticated and complex search equations, only the major key word “cost” was used. Another reason for this is that since this one-word (cost) search is rather nonspecific, a better recall factor, defined below, is usually obtained, although it is likely that irrelevant records such as “cost of myocardial infarction on the health care system” are retrieved. Of course, irrelevant records will be eliminated later by checking each retrieved record in order to calculate the relevance factor, as seen in Table 3.

\[ \text{recall factor} = \frac{\text{number of relevant articles retrieved}}{\text{total number of relevant articles in the source}} \]

As for the representation of the latter concept, that is, myocardial infarction, these two words should be treated as a phrase, because it is very likely that “myocardial infarction” appears as such in the literature. Therefore, the typical search equations (Dialog style and SilverPlatter style) to fulfill the need of the next search are as follows:
• Dialog style: 1. Select myocardial(w)infarction, 2. Select cost, and 3. Combine 1 and 2.
• SilverPlatter style: 1. Find ‘myocardial infarction,’ 2. Find ‘cost,’ and 3. Find #1 AND #2.

Selection of Search Engines
In order to achieve this search strategy by using the 20 search engines, different types of search equations should be used. Table 3 summarizes the possible search equations as well as the search results of each search engine. Although many search engines allow simple and advanced queries, a simple query function was used and tested as a general rule. When a satisfactory function could not be found in a simple query, an advanced query was considered. It was found that as many as 10 different search equations should be constructed for the search of this topic in the 20 search engines examined (see Table 4).
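The ten search equations of Table 4 can be pictured as a lookup from engine to query pattern. The Python sketch below is only an illustration transcribed from Table 4: the pattern labels (for example "menu" and "plus_prefix") and the function build_query are hypothetical names introduced here, spacing around '+' is normalized, and for the "menu" pattern the AND behaviour comes from an on-screen option rather than from the typed string.

```python
# Query patterns transcribed from Table 4. The keys are hypothetical labels
# introduced for this sketch; spacing around '+' is normalized.
PATTERNS = {
    "menu": "{a} {b} {c}",              # AND is chosen from an on-screen menu
    "and_lower": "{a} and {b} and {c}",
    "plus_infix": "{a} + {b} + {c}",
    "phrase_AND": '"{a} {b}" AND {c}',
    "plus_prefix": '+"{a} {b}" +{c}',
    "brackets": '["{a} {b}" {c}]',
    "AND_upper": "{a} AND {b} AND {c}",
    "paren_plus": "({a} {b}) + {c}",
    "amp": "{a} & {b} & {c}",
    "phrase_amp": '"{a} {b}" & {c}',
}

# Engine-to-pattern assignment as listed in Table 4.
ENGINE_PATTERN = {
    **dict.fromkeys(
        ["Aqui", "Hot Bot", "Lycos", "Open Text Index",
         "SavvySearch", "WWW Worm", "Yahoo"], "menu"),
    **dict.fromkeys(["InfoMine", "Galaxy", "Veronica"], "and_lower"),
    **dict.fromkeys(["Inktomi", "Magellan"], "plus_infix"),
    **dict.fromkeys(["Harvest", "WebCrawler"], "phrase_AND"),
    "Alta Vista": "plus_prefix",
    "Infoseek": "brackets",
    "Excite": "AND_upper",
    "MetaCrawler": "paren_plus",
    "NlightN": "amp",
    "DejaNews": "phrase_amp",
}


def build_query(engine, a="myocardial", b="infarction", c="cost"):
    """Format the three key words in the pattern Table 4 lists for the engine."""
    return PATTERNS[ENGINE_PATTERN[engine]].format(a=a, b=b, c=c)


for engine in ("Excite", "Galaxy", "Alta Vista", "Infoseek"):
    print(f"{engine}: {build_query(engine)}")
```

Keeping the patterns in one table makes the cost of switching engines visible: 20 engines map onto 10 distinct strings.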
The most common pattern was to use the three key words (myocardial, infarction, and cost) in any order and to select from the menu one of the suggested search options that is equivalent to the Boolean operator AND. The next most common pattern was to use the Boolean operator ‘and’ between each word, as was seen in InfoMine, Galaxy, and Veronica. It is important to note that in the case of Excite this was done by ‘AND,’ not by ‘and.’ Therefore, if a researcher intends to perform a Boolean ‘AND’ search by typing “myocardial and infarction and cost” in Excite, he/she ends up performing only an ‘OR’ search, not an ‘AND’ search. As shown in Table 3, the ‘AND’ search in Excite retrieved 1454 records, but a search by “myocardial and infarction and cost,” which is an ‘OR’ search, retrieved an amazingly large number of records, 805,840. Thus, special attention should be paid to the ‘AND’ search in Excite.
Another striking but subtle difference in ‘AND’ and ‘OR’ searches was found in Alta Vista. This was the difference between two similar searches: ‘+’ “myocardial infarction” ‘+’ cost and “myocardial infarction” ‘+’ cost. The former was an ‘AND’ search and the latter was a modified form of ‘OR’ search. In this modified ‘OR’ search, the word after ‘+’ must always be included in the document, but the word without ‘+’ (in this case “myocardial infarction”) may or may not be included in the document. In fact, the former search retrieved only 800 records, while the latter retrieved about 30,000. In MetaCrawler, Inktomi, and Magellan, ‘+’ acts as the Boolean operator ‘AND.’ Therefore, it should be noted that on the Internet ‘+’ can act as a Boolean ‘AND’ or a Boolean ‘OR,’ depending upon which search engine is used.
Checking Records to Calculate the Relevance Factor
Some search engines rank the results in order of relevance, and the other search engines list articles in order of posting date. Since it is impossible to go through all of the postings, the first 20 postings were selected for an evaluation of relevance.
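The evaluation rule can be written down as a short calculation. The following Python sketch assumes the cutoff given in the notes to Table 3 (when more than 50 records are retrieved, only the first 20 are evaluated; otherwise all retrieved records are evaluated). The function names and default parameters are assumptions of this sketch.

```python
from __future__ import annotations


def evaluated_count(retrieved: int, threshold: int = 50, sample: int = 20) -> int:
    """Number of records examined: the first `sample` if more than
    `threshold` records were retrieved, otherwise all of them."""
    return sample if retrieved > threshold else retrieved


def percent_hits(relevant: int, retrieved: int) -> float | None:
    """Percent hit (relevance factor x 100) over the examined records.

    Returns None when nothing was retrieved, which corresponds to a dash
    in the results table.
    """
    n = evaluated_count(retrieved)
    if n == 0:
        return None
    if not 0 <= relevant <= n:
        raise ValueError("relevant must lie between 0 and the number evaluated")
    return 100 * relevant / n


print(percent_hits(2, 800))  # 2 relevant among the first 20 of 800 -> 10.0
print(percent_hits(4, 16))   # 4 relevant among all 16 retrieved   -> 25.0
```

Because only the head of the ranked list is examined, the figure describes the first screen of results rather than the whole retrieved set.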
TABLE 3 Searches Conducted on Internet Search Engines on “Cost Effectiveness of Myocardial Infarction”

Search Engine    Retrieved   Hits    % Hits  Search Strategy Applied
Alta Vista       800         2/20    10      + “myocardial infarction” + cost
Aqui             0           –       –       myocardial infarction cost (selected “match all words”) by Eureka
DejaNews         8           NA(1)   –       “myocardial infarction” & cost
Excite           1454        1/20    5       myocardial AND infarction AND cost
Galaxy           0           –       –       myocardial and infarction and cost
Harvest          0           –       –       “myocardial infarction” AND cost
Hot Bot          2208        1/20    5       myocardial infarction cost (selected “all words”)
InfoMine         0           –       –       myocardial and infarction and cost
Infoseek         23          2/23    9       [“myocardial infarction” cost]
Inktomi          121         2/20    10      myocardial + infarction + cost
Lycos(a)         1           1/1     100     myocardial infarction cost (used “all terms and strong match”)
Lycos(b)         2           2/2     50      myocardial infarction cost (used “all terms and close match”)
Lycos(c)         16          4/16    25      myocardial infarction cost (used “all terms and fair match”)
Magellan         948         4/20    20      myocardial + infarction + cost
MetaCrawler      66          NA(2)   –       (myocardial infarction) + cost
NlightN          2           2/2     100     myocardial&infarction&cost
Open Text Index  99          0/20    0       performed “powered search by selecting AND”
SavvySearch      65          NA(3)   –       myocardial infarction cost (selected “all query terms”)
Veronica         0           –       –       myocardial and infarction and cost
WebCrawler       32          2/32    9       “myocardial infarction” AND cost
WWW Worm         0           –       –       myocardial infarction cost (selected “match all key words”)
Yahoo            0 [22,020]  [1/20]  [5]     myocardial infarction cost (selected “all key words”), [powered by Alta Vista]

Notes:
1. (1) The output description of the search result was too short to evaluate.
2. (2,3) The first 20 retrieved results were from some other search engines listed in this table.
3. When the number of retrieved records exceeded 50, only the first 20 records were evaluated.

The evaluation for relevance of the retrieved record is mainly based on the examination of the general output of the search result.
Therefore, percent hit (or relevance factor) is not an absolute evaluation. Based on this condition, the results on the percent hit or relevance factor are presented. Lycos and Magellan turned out to be fairly good. It is surprising that Galaxy, Veronica, Harvest, WWW Worm, InfoMine, and Aqui did not
retrieve any records. Yahoo also showed no matched records but showed 22,020 records when an option in which Yahoo is powered by Alta Vista was selected and searched; in other words, Yahoo’s search capacity is powered up by Alta Vista. Since the direct search by Alta Vista using the search equation “+myocardial +infarction +cost” retrieved about 800 records, there is a big discrepancy between the Yahoo search powered by Alta Vista and the Alta Vista search by itself. It seems that the “all key words” function in Yahoo, which usually means an AND search, is not a real ‘AND’ search but an ‘OR’ search or a mixture of ‘AND’ and ‘OR’ searches. This type of discrepancy between the literal search menu and the real search function is often seen on the Internet.

Multi-Search Sites/Engines
SavvySearch and MetaCrawler are termed multisearch sites/engines, since they allow simultaneous searches through several different search engines. The former showed 65 records and the latter 66. The examination of the first 20 records indicated that all of them are from one of the other search engines investigated in this study. This type of search tool is useful, especially when it retrieves a collection of good records of each constituent search engine, that is, “pick only good ones from other good engines” or “get the most by the least work.” If this criterion is met, then it is not necessary to spend time searching many other search engines for a comprehensive Internet search, which eventually saves quite a bit of time. In each of the two multisearch engines studied, several new relevant records were found which had not been found by a search conducted on an individual search engine. Since the search was restricted to one particular search topic, that is, “cost effectiveness of myocardial infarction,” however, it is too soon to judge whether SavvySearch and MetaCrawler meet this criterion, that is, whether multisearch engines retrieve a collection of good records of each constituent search engine. Therefore, further study using various search topics is needed to provide evidence for the above hypothesis.
There is a convenient function in SavvySearch that displays a group of three other search engines after the initial search: the most relevant group first, the moderately relevant group next, and the least relevant group last. Those are the recommended search engines to use for the search of a particular topic. One can simply move from one group to another to continue the search if the current search does not produce a satisfactory result.

TABLE 4 Types of Search Equations Needed to Search 20 Search Engines/Tools for a Combination of “Phrase” Search (Myocardial Infarction) and Boolean AND Search (AND Cost)

Search Equation                                  Search Engines/Tools
Myocardial infarction cost (menu option*)        Aqui, Hot Bot, Lycos, Open Text Index, SavvySearch, WWW Worm, Yahoo
Myocardial and infarction and cost               InfoMine, Galaxy, Veronica
Myocardial + infarction + cost                   Inktomi, Magellan
“Myocardial infarction” AND cost                 Harvest, WebCrawler
+ “Myocardial infarction” + cost (simple search) Alta Vista
[“Myocardial infarction” cost]                   InfoSeek
Myocardial AND infarction AND cost               Excite
(Myocardial infarction) + cost                   MetaCrawler
Myocardial & infarction & cost                   NlightN
“Myocardial infarction” & cost                   DejaNews

* Select one of the options such as “all key words,” “all query terms,” or “match all words.”
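Table 4 lists ‘+’ both as a prefix (Alta Vista) and as an infix operator (Inktomi, Magellan). The prefix behaviour reported for Alta Vista in the text, where a term marked with ‘+’ is required and an unmarked term may or may not appear, can be illustrated with a toy matcher. This Python sketch is an assumption-laden simplification, not the engine's actual algorithm: it uses plain substring matching, takes phrases as already unquoted strings, and falls back to an OR match when no term is marked (OR is the default according to Table 1). The names split_terms and matches are hypothetical.

```python
def split_terms(terms):
    """Separate '+'-prefixed (required) terms from unmarked (optional) ones."""
    required = [t[1:].lower() for t in terms if t.startswith("+")]
    optional = [t.lower() for t in terms if not t.startswith("+")]
    return required, optional


def matches(document, terms):
    """Toy model of the '+' prefix: every required term must occur; optional
    terms do not restrict the result when any required term is present.
    With no required terms, any optional term suffices (OR)."""
    text = document.lower()
    required, optional = split_terms(terms)
    if required:
        return all(t in text for t in required)
    return any(t in text for t in optional)


doc = "Hospital cost containment in primary care"
print(matches(doc, ["+myocardial infarction", "+cost"]))  # False: phrase is required
print(matches(doc, ["myocardial infarction", "+cost"]))   # True: only cost is required
```

The second query accepts any document mentioning cost, which is consistent with the much larger record count reported for the unprefixed form.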
SUMMARY AND SUGGESTIONS

1. Cost effectiveness can be expressed in many different ways. It is therefore suggested that one start with just “cost.” The scope can always be narrowed by adding a secondary term such as “effectiveness,” “benefit,” or “efficiency.”
2. Taking into account both authors’ search experience and the search output of the current study, it is evident that the concept of “cost effectiveness” can be expressed without the word “cost,” for example by the terms pharmacoeconomics, economic factor, or medical economics. It is therefore suggested that one also consider search terms such as “pharmacoeconomic*” and “economic*,” where the symbol * represents truncation.
3. Although the MeSH thesaurus term used in Medline is “cost benefit,” “cost effectiveness” is used more frequently in the scientific literature, and a search by “cost effectiveness” retrieves more relevant articles than a search by “cost benefit.” It would be more realistic for “cost effectiveness” to be listed as the MeSH thesaurus term.
4. “Myocardial infarction” should be searched as a phrase, provided that the search engine has a phrase search capability. If one uses just “myocardial,” one may retrieve articles on “myocardial ischemia” and “myocardial revascularization,” which are not directly of interest. In other words, a search by “myocardial infarction” reduces the ‘noise’ factor, or increases the relevance factor. In general, a phrase search also reduces the recall factor because the search concept is narrowed further. In the case of “myocardial infarction,” however, the reduction in recall may not be significant, because myocardial infarction can be treated as a single-word term rather than a two-word term. The same holds for other search terms such as angina pectoris, Hodgkin’s disease, and multiple sclerosis.
5. A number of articles report comparative studies on the search functions of Internet search engines. There are no studies, however, on a specific topic such as “cost effectiveness for the treatment of a certain disease.” Once a preferred methodology is established for this type of search, the technique can be applied to other diseases as well. Examples of such searches are “cost effectiveness of hypertension,” “cost effectiveness of asthma,” and “cost effectiveness of angina pectoris.”
6. The search features of the 20 selected search engines differed considerably. For the Boolean AND function alone, a user must choose among at least six different patterns, depending upon the search engine used: “&,” “and,” or “+” between the terms, “+” in front of each term, “AND,” or selection of an appropriate item from a menu. This is rather surprising, and it indicates that Internet searching is not as easy as some media report.
7. When one constructs a search equation from three search terms A, B, and C in the form (A B) AND C, where AND is a Boolean AND, about 10 different search equations are needed to cover the 20 search engines. Roughly speaking, this means that every other time one changes the Internet search engine, one must construct a new search equation.

Acknowledgments: The authors thank Mr. Keith W. Johnson, Dr. David C. Pang, and Mr. Kirill A. Burimski, all from the USP, for their suggestions and cooperation in this study. The authors are grateful to Dr. Anwar A. Hussain of the University of Kentucky for his cooperation on this project.
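As an illustration of suggestions 6 and 7 in the summary, the following minimal Python sketch shows how one pair of search terms might be rewritten for each of the textual Boolean AND patterns listed there. The style names, the helper functions, and the use of double quotes for phrase searching are assumptions introduced for this example. They do not describe the syntax of any particular search engine, and the menu-based pattern is omitted because it cannot be written as a query string.

```python
# Hypothetical style names: they label the Boolean AND patterns listed in
# suggestion 6, not the syntax of any specific search engine.
AND_STYLES = {
    "ampersand": lambda terms: " & ".join(terms),
    "lower_and": lambda terms: " and ".join(terms),
    "upper_and": lambda terms: " AND ".join(terms),
    "plus_between": lambda terms: " + ".join(terms),
    "plus_prefix": lambda terms: " ".join("+" + t for t in terms),
}


def quote_phrase(term):
    """Wrap a multi-word term in double quotes (an assumed phrase-search syntax)."""
    return f'"{term}"' if " " in term else term


def build_and_query(terms, style):
    """Join the terms using the Boolean AND pattern named by `style`."""
    if style not in AND_STYLES:
        raise ValueError(f"unknown AND style: {style}")
    return AND_STYLES[style]([quote_phrase(t) for t in terms])


# A truncated term (suggestion 2) combined with a phrase (suggestion 4).
terms = ["pharmacoeconomic*", "myocardial infarction"]
for style in AND_STYLES:
    print(f"{style:13s} {build_and_query(terms, style)}")
```

Keeping the patterns in a single table makes the point of suggestion 7 concrete: the search terms stay the same, and only the connecting syntax has to be rebuilt when the search engine changes.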