
Warning: This article has not yet been accepted for publication by a peer reviewed journal. It is presented here mainly for the benefit of fellow researchers. Casual readers should not act on its findings, and journalists should be wary of reporting them. clinmed/1999120015v1 (December 22, 1999) Contact author(s) for copyright information


Are custom built trauma and orthopaedic internet search engines any use?


Adriano M. Sala Tenna, Patrick D.R. Addison, and Christopher W. Oliver


INTRODUCTION


The Internet has become a popular tool for acquiring medical information. There are an estimated 800 million pages of data available to the health professional and layperson, and it is no longer possible to identify the total number of websites providing specialist information on orthopaedics and trauma surgery. The quality of the information presented varies enormously, which makes its accuracy and credibility difficult to determine. The Internet is an expanding medium, and its use by groups such as general practitioners is increasing rapidly; the perceived benefits are ease of use and the provision of up-to-date, relevant clinical information. A recent National Opinion Poll (NOP) study¹ showed that 15% of general practitioners use the Internet, and a separate NOP study showed that 81% of general practitioners had accessed the Internet in the previous month, spending on average 30 minutes on their most recently visited website. Forty-six percent had tried to obtain diagnostic information by e-mail and 41% had participated in a discussion group.

One strategy for finding medical information on the Internet is to use a search engine. Search engines may be either general or specialist: some index vetted, academic and peer-reviewed articles, while others index online opinion or support groups. This makes it more difficult for the general public to evaluate these sites, and poor-quality information becomes a genuine problem because there is at present no control over the credibility of the information accessed or supplied on the World Wide Web.² The types of information available range from text, in the form of letters, articles and textbooks, through to pictures, videos and online discussion groups. Because of the sheer volume of data, with approximately 20 million pages added to the Internet each month,³ there are problems of disorganisation, misinformation and incorrect information being presented as fact.⁴ Even specialist sites contain unevaluated information, either because of the volume of material available or because a user can click through from a peer-reviewed website to sites that have not been reviewed. The Health on the Net (HON)⁵ code of conduct was developed by a Swiss non-profit organisation to give guidance on the quality of information supplied on the Internet. It is a set of rules that aims to identify relevant websites. Under the code, information must be given only by medically or health trained and qualified professionals, unless a clear statement is made that a piece of advice comes from an individual or organisation without medical or health qualifications. Sites complying with the code may display the HON logo. Being awarded this privilege does not mean that a site is worth using, however, as the code contains a disclaimer clause, and each site must still be assessed on its own merit. The system has nevertheless helped in the evaluation of website information.

In this study we aimed to determine whether custom-built orthopaedic search engines are more useful than commercial search engines for identifying orthopaedic information on the Internet for various clinical problems relevant to the general public.

METHODS

Three orthopaedic scenarios were used to test the search engines. We chose conditions for which we thought members of the general public would commonly consult their general practitioner or be admitted to hospital. Each scenario used a keyword to identify specific information on the Internet. The scenarios chosen were as follows:

1. A relative has sustained a femoral fracture, and the enquirer wants to find information on the proposed operation. Keyword: "femoral fracture".

2. A relative wants to know the outcomes of a total hip replacement. Keyword: "total hip replacement".

3. A patient wants to know the natural history of osteoporosis and its treatment. Keyword: "osteoporosis".

Five Internet search engines were used to access the information for the three scenarios: two commercially available search engines, AltaVista⁶ and Yahoo⁷, and three specialist orthopaedic search engines, Orthoguide⁸, Orthotraining⁹ and OMNI (Organised Medical Network Information Gateway)¹⁰. The study was undertaken over the three weeks from 7th April to 28th April 1999. All sites were identified on 7th April 1999 to allow comparison at a fixed point in time, and were then browsed over the three-week period. Although an originally identified site may later have been replaced or deleted from an engine's list of results, its recorded Internet address was still used to access it. Three hundred websites were examined in total. The first twenty websites found by each of the five search engines for each scenario were recorded and inspected in detail, and the total number of sites found by each search engine for each scenario was recorded. Duplicate websites were included in the total count, which allowed duplication to be compared across the search engines; the duplication rate was calculated within each search engine. Sites unavailable at the time of initial searching were recorded and re-accessed at the end of the three-week period, and were included if they had become available; sites that remained inaccessible were recorded as such. Sites were then assessed for relevance to the orthopaedic scenario, which could only be judged subjectively. Sites giving no information relating to the scenario were discounted as irrelevant; these included advertising, biomechanical data, foreign-language material, online opinion, material from other medical or surgical specialities, personal profiles and veterinary material. The remaining websites were then assessed against the following criteria:

Authorship and source: Is the author identifiable, and what are their professional qualifications?

Objectives and disclosure: What is the objective of the website, and who owns it? Is website “ownership” prominently and fully disclosed, and are any sponsorship, advertising, underwriting, commercial funding arrangements or support, or potential conflicts of interest indicated?

Currency: Is the material on the site up to date? The dates on which content was posted and updated should be indicated.

Attribution: Are references and sources for all content clearly listed, and is all relevant copyright information noted?

Contacts: Is there a clearly identifiable person on the website who can be contacted?

Links: Are the links to other sites relevant?

Disclaimers: Does the website carry a disclaimer on the use of the information it provides?

HON code: Is the HON code displayed?
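The methods record a duplication rate for each engine and scenario but do not spell out the formula. The sketch below assumes one plausible definition: the share of entries in a single result list that repeat a URL already seen earlier in that list. The light URL normalisation, the function name `duplication_rate` and the example URLs are all hypothetical, and the sketch is not guaranteed to reproduce the figures in Table 3.

```python
from collections import Counter


def duplication_rate(urls: list[str]) -> float:
    """Percentage of entries in one engine's result list that repeat an earlier URL.

    Assumed definition (not stated in the study): occurrences beyond the first
    are counted as duplicates and divided by the number of entries inspected.
    """
    if not urls:
        return 0.0  # nothing inspected, e.g. an engine that returned no sites
    # Simplifying assumption: URLs differing only by surrounding whitespace
    # or a trailing slash are treated as the same site.
    normalised = [u.strip().rstrip("/") for u in urls]
    counts = Counter(normalised)
    repeats = sum(n - 1 for n in counts.values())
    return 100.0 * repeats / len(normalised)


# Hypothetical first results for one engine and one scenario.
results = [
    "http://a.example/hip",
    "http://b.example/",
    "http://a.example/hip/",
    "http://c.example",
]
print(duplication_rate(results))  # 25.0
```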

RESULTS

Table 1 shows the total number of websites found by each search engine for each clinical scenario. Of the two commercial search engines, AltaVista found more sites than Yahoo. AltaVista also found more scenario websites overall than any of the specific orthopaedic search engines, and Orthotraining found more sites than OMNI and Orthoguide.

Table 1. Total number of websites found by each search engine for each scenario.

Search engine    Femoral fracture    Total hip replacement    Osteoporosis
AltaVista        274                 2454                     110810
Yahoo            89                  1                        68
Orthoguide       8                   10                       36
Orthotraining    851                 612                      298
OMNI             0                   3                        20

Table 2 shows the number of websites, among the first twenty inspected for each scenario, that were unavailable or irrelevant. Overall, the specific orthopaedic search engines returned unavailable or irrelevant sites more frequently than the commercial search engines, although OMNI (35%) fared better than AltaVista (47%).

Table 2. Number of websites unavailable or irrelevant.

Search engine    Femoral fracture    Total hip replacement    Osteoporosis    Total irrelevant/unavailable (%)
AltaVista        13                  12                       3               28 (47)
Yahoo            9                   0                        5               14 (34)
Orthoguide       7                   6                        8               21 (55)
Orthotraining    16                  15                       18              49 (82)
OMNI             0                   3                        5               8 (35)
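Table 2 does not state the denominator behind its percentages. They can be reproduced if each percentage is taken over the sites actually inspected: the first twenty results for a scenario, or all of them when an engine found fewer than twenty (Table 1). The sketch below checks that reading against the published counts; it is our inference from the figures, not a procedure described by the authors, and the names `FOUND`, `UNAVAILABLE_OR_IRRELEVANT` and `percent_unusable` are hypothetical.

```python
INSPECT_LIMIT = 20  # at most the first twenty results per scenario were inspected

# Per-scenario counts: (femoral fracture, total hip replacement, osteoporosis)
FOUND = {  # Table 1
    "AltaVista": (274, 2454, 110810),
    "Yahoo": (89, 1, 68),
    "Orthoguide": (8, 10, 36),
    "Orthotraining": (851, 612, 298),
    "OMNI": (0, 3, 20),
}
UNAVAILABLE_OR_IRRELEVANT = {  # Table 2
    "AltaVista": (13, 12, 3),
    "Yahoo": (9, 0, 5),
    "Orthoguide": (7, 6, 8),
    "Orthotraining": (16, 15, 18),
    "OMNI": (0, 3, 5),
}


def percent_unusable(engine: str) -> int:
    """Unavailable/irrelevant sites as a rounded percentage of sites inspected."""
    inspected = sum(min(n, INSPECT_LIMIT) for n in FOUND[engine])
    unusable = sum(UNAVAILABLE_OR_IRRELEVANT[engine])
    return round(100 * unusable / inspected)


for engine in FOUND:
    print(f"{engine} {percent_unusable(engine)}%")
# prints, one per line: AltaVista 47%, Yahoo 34%, Orthoguide 55%,
# Orthotraining 82%, OMNI 35%
```

For example, Yahoo found only one site for total hip replacement, so 20 + 1 + 20 = 41 sites were inspected and 14/41 rounds to 34%.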

Table 3 shows the website duplication rate for each search engine. Duplication rates were generally higher in the commercial search engines, reflecting the larger number of sites they found. The orthopaedic search engines showed zero duplication in seven of their nine engine/scenario combinations; the remaining two, Orthoguide and OMNI for osteoporosis, had duplication rates of 50% and 20% respectively, which may reflect the immaturity of these search engines.

Table 3. Duplication rate of the first twenty sites accessed (%).

Search engine    Femoral fracture    Total hip replacement    Osteoporosis
AltaVista        0                   13                       24
Yahoo            18                  0                        13
Orthoguide       0                   0                        50
Orthotraining    0                   0                        0
OMNI             0                   0                        20

Table 4 shows the number of relevant sites among the first twenty websites found by each search engine for each scenario, together with the totals. In identifying relevant sites overall, AltaVista > Yahoo > OMNI > Orthoguide = Orthotraining: the commercial search engines were more successful than the specific orthopaedic search engines at identifying relevant sites in the first twenty "finds".

Table 4. Number of relevant websites found by each search engine among the first twenty websites for each scenario. Figures in brackets are percentages.

Search engine    Femoral fracture    Total hip replacement    Osteoporosis    Total relevant websites
AltaVista        7 (35)              7 (35)                   13 (65)         27 (45)
Yahoo            9 (45)              1 (5)                    13 (65)         23 (38)
Orthoguide       1 (5)               4 (20)                   6 (30)          11 (18)
Orthotraining    4 (20)              5 (25)                   2 (10)          11 (18)
OMNI             0 (0)               0 (0)                    12 (60)         12 (20)
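The figures in Table 4 correspond to what information-retrieval texts call precision at k, with k = 20: the share of the first twenty results judged relevant. Unlike Table 2, the percentages here appear to use twenty as the denominator even when fewer sites were found (Yahoo's single total hip replacement site counts as 5%). Recall cannot be computed from these data, because the total number of relevant pages on the Web is unknown. A minimal sketch, with hypothetical names, that reproduces the totals and the ranking:

```python
K = 20  # only the first twenty results per scenario were judged

# Relevant sites per scenario from Table 4:
# (femoral fracture, total hip replacement, osteoporosis)
RELEVANT = {
    "AltaVista": (7, 7, 13),
    "Yahoo": (9, 1, 13),
    "Orthoguide": (1, 4, 6),
    "Orthotraining": (4, 5, 2),
    "OMNI": (0, 0, 12),
}


def precision_at_k(relevant: int, k: int = K) -> float:
    """Fraction of the first k results judged relevant (k fixed, not min(k, found))."""
    return relevant / k


def overall(engine: str) -> tuple[int, int]:
    """Total relevant sites over the three scenarios and the rounded percentage of 3 * K."""
    total = sum(RELEVANT[engine])
    return total, round(100 * precision_at_k(total, 3 * K))


# Sort by total relevant sites; Python's sort is stable, so ties keep table order.
ranking = sorted(RELEVANT, key=lambda e: overall(e)[0], reverse=True)
for engine in ranking:
    total, pct = overall(engine)
    print(f"{engine}: {total} ({pct}%)")
# AltaVista: 27 (45%), Yahoo: 23 (38%), OMNI: 12 (20%),
# Orthoguide: 11 (18%), Orthotraining: 11 (18%)
```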

Table 5 shows the breakdown of the assessment criteria for each of the relevant sites found by each search engine for each scenario.

Table 6. Success rate of search engines in accessing appropriate websites from relevant websites.

Search engine    Success rate (%)
Yahoo            85
AltaVista        68
Orthoguide       67
Orthotraining    53
OMNI             25

DISCUSSION

Both the commercial and the custom-built Internet search engines found a large number of orthopaedic websites, although the academic referencing site OMNI found fewer resources. The sheer number of pages found makes browsing difficult: AltaVista, for example, found 110810 sites for osteoporosis, far too many to browse effectively. Options for refining a search are available but may not be used by the average user.

Internet search engines have several problems. The indexing processes of the commercial engines vary between engines and over time, so it can be very useful to combine the results of several engines when searching for recent information. A common defect is that search engines return too many pages, many of which have low relevance to the query. An engine could be more comprehensive while still returning the same set of pages first; one of the main problems is that search engines do not rank the relevance of their results very well.¹¹

The specific orthopaedic search engines appear to be poor and often access only a very small amount of relevant information, some returning fewer than twenty sites for review. For the low-scoring search engines we tried refining or altering the keywords to increase the number of sites found; for example, "femoral fracture" was altered to "fracture of femur", "fracture + femur" and "fracture of femur + humans". This did not improve the search results with Orthoguide. The commercial search engines used a cruder sub-categorisation that nonetheless found the same sites, and so were as effective as the specific search engines. Comparison between search engines is difficult and can be inaccurate because some sites contain no relevant information.
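The discussion suggests combining the results of several engines. One simple way of doing this is round-robin interleaving: take each engine's first result, then each engine's second, and so on, discarding URLs already seen. This is a hypothetical sketch, not a method used in the study; the function name `merge_results` and the example URLs are illustrative, no real search-engine API is called, and treating URLs that differ only by a trailing slash as identical is an assumption.

```python
from itertools import zip_longest


def merge_results(*ranked_lists: list[str]) -> list[str]:
    """Round-robin merge of ranked result lists, keeping the first occurrence of each URL."""
    seen: set[str] = set()
    merged: list[str] = []
    # zip_longest pads shorter lists with None once an engine runs out of results.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is None:
                continue
            key = url.strip().rstrip("/")  # assumed normalisation: ignore a trailing slash
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Hypothetical results from two engines for the same query.
engine_a = ["http://x.example/thr", "http://y.example/"]
engine_b = ["http://y.example", "http://z.example/hip"]
print(merge_results(engine_a, engine_b))
# ['http://x.example/thr', 'http://y.example', 'http://z.example/hip']
```

Round-robin merging gives equal weight to each engine's top results; other merging rules are equally possible.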

We also attempted to use the search engine Orthosearch (www.orthosearch.com), but it could not be used in this study because it divided the sites found into subcategories that could not be searched across the whole of Orthosearch at once. For osteoporosis, for example, it found 15 subcategories, each of which would then have to be searched separately for relevant sites. From the results in Table 4, OMNI appears to be the most successful of the specific orthopaedic search engines when the relevant sites found for all the scenarios are added together, albeit by only one site. This is probably misleading, as OMNI found more relevant sites among the first twenty than its counterparts in only one scenario (osteoporosis). Duplication rates were low throughout, but the specific orthopaedic search engines found fewer sites overall, which makes comparison imprecise; nevertheless, they appear to have a lower overall duplication rate. This is likely to be because their websites are specifically selected for their content, and the smaller number available limits the duplicates produced. Assessing the quality of the listed websites is difficult, as there is sometimes minimal information on the page indexed by the search engine, and links may then have to be followed across the site to find ownership, update and authorship information. This makes the results difficult to interpret, as it is not the indexed page itself that is being assessed but its home page or subsequent links; in practice, however, this is what would happen when browsing the Web. Assessment of the relevant and appropriate sites found showed that authorship and objectives were the criteria most commonly well represented. Currency and update information, references, disclaimers and the HON code were consistently the criteria most frequently omitted. The HON code was poorly used.
Our study identified only two sites displaying the HON code, and these were in fact the same site. This low figure may be because sites are being produced faster than the HON organisation can record them. OMNI also reviews sites against its own quality criteria but may not have the resources to index all the quality material submitted to it, which may be why it identified very few sites for femoral fracture and total hip replacement.

There are an estimated 800 million pages on the World Wide Web at present.¹¹ Indexing does not cover the entire Web, so no search will be entirely complete. A study by the NEC Research Institute in December 1997 used a similar method to estimate the coverage, recency, invalid links and document age of commercial search engines; its measurement of coverage showed that search engines were more likely to have indexed the more popular information. As the website duplication rate is relatively low, the most reliable way of finding relevant information at present would be to use multiple search engines.

One of the main reasons that search engines return too many results is that Internet users tend to make queries that result in poor precision. In information retrieval there is typically a trade-off between precision and recall. Internet users tend to make simple queries that can return thousands or millions of documents. Ranking the relevance of these documents is a difficult problem, and the desired documents may not appear near the top of the list, but better ranking may be a way forward. Formulating a more precise query can help a great deal. Ways to improve the precision of results include using more query terms, telling the search engine that relevant documents must contain certain terms, using phrases or proximity, and using the constraints provided by some engines.¹¹

The Dublin Core Metadata, a system for improving the internal tagging of HTML, may in future make information easier to locate, as Internet search engines may then be able to search on MeSH headings within web pages.¹² The simple HTML "keywords" and "description" metatags are used on the home pages of only 34% of sites, and only 0.3% of sites use the Dublin Core metadata standard.¹³

Search engine indexing and ranking may have economic, social, political and scientific effects. For example, the indexing and ranking of orthopaedic resources could affect the economic viability of a research institution, delayed indexing of scientific research can lead to duplication of work, and delayed or biased indexing may affect surgical or political orthopaedic decisions.¹¹

REFERENCES

1. http://www.nop.co.uk/internet/surveys/pr19.htm
2. Oliver CW. Trauma and Orthopaedic Surgery on the Internet. Journal of Bone and Joint Surgery 1999;81-B:3-6.
3. http://www.research.digital.com/src/whatsnew/sem.html
4. http://omni.ac.uk
5. http://www.hon.ch/HONcode/Conduct.html
6. http://www.altavista.com
7. http://www.yahoo.com
8. http://www.orthoguide.com
9. http://www.orthotraining.com
10. http://www.omni.ac.uk
11. http://www.neci.nj.nec.com/homepages/lawrence/websize.html
12. The Dublin Core: A Simple Content Description Model for Electronic Resources. http://purl.org/DC/
13. http://www.metrics.com/