Multidisciplinary Perspectives on Web Search Engines


Web Search Studies: Multidisciplinary Perspectives on Web Search Engines


Michael Zimmer


Introduction


Perhaps the most significant tool of our Internet age is the Web search engine, providing a powerful interface for accessing the vast amounts of information available on the World Wide Web and beyond.1 While the first Web search engines focused on providing navigation to the various pages across the World Wide Web, search providers have steadily expanded their searchable indexes to include a wide array of online information, such as images, news feeds, Usenet archives, and video files. Additionally, search engines have begun digitizing the “material world,” adding the contents of popular books, university libraries, maps, and satellite images to their growing indexes. Further, users can now search the files on their hard drives, send e-mail and instant messages, shop online, engage in social networking, organize photos, share videos, collaborate on projects, and publish blogs through various Web search engine offerings. Consequently, users increasingly rely on a growing infrastructure of search-related services and tools to locate and access information online, as well as to communicate, collaborate, navigate, and organize their lives.

While Web search engines are still in their infancy compared to the knowledge tools that preceded them,2 their impact on society and culture has already received considerable attention from a variety of perspectives. Consistent with most other areas of Internet research, interest in Web search engines as a research topic crosses disciplines, ranging from the quite technical areas of computer and information sciences into the social sciences, law, and the humanities, providing a kaleidoscope of perspectives on the significance of these contemporary knowledge tools. This chapter aims to organize a meta-discipline of “Web search studies,” centered around a nucleus of major research on Web search engines from five key perspectives: technical foundations and evaluations; transaction log analyses; user studies; political, ethical, and cultural critiques; and legal and policy analyses.3

M. Zimmer (B) School of Information Studies, University of Wisconsin-Milwaukee, Wisconsin, USA e-mail: [email protected]

1 According to the Pew Internet & American Life Project, 84% of American adult Internet users have used a search engine to seek information online (Fallows, 2005: 1), making searching the Web the second most popular online activity (behind using e-mail) (Rainie, 2005). In August 2007, over 750 million people worldwide over the age of 15 conducted a search, totaling more than 61 billion searches (Burns, 2007).

2 The first full-text Web search engine was WebCrawler, launched in 1994 (InfoSpace, 2007), making search engines relative “teenagers” compared to other tools and technologies for organizing and retrieving information. Encyclopedias, for example, date back to the first century AD.

J. Hunsinger et al. (eds.), International Handbook of Internet Research, DOI 10.1007/978-1-4020-9789-8_31, © Springer Science+Business Media B.V. 2010


Technical Foundations and Evaluations


Not surprisingly, some of the earliest research on Web search engines was technical in nature. Numerous computer and information scientists have contributed not only valuable research on improving and enhancing the underlying Web search engine technology but also technical analyses of the extent of coverage achieved by early search engine offerings and how well they met users’ information-seeking needs. The former set includes early research that formed the technical foundation of various Web search engines, some still in use today. For example, several papers focusing on crawling the Web – an essential function for any Web search engine – were presented at the first two World Wide Web conferences (Eichmann, 1994; McBryan, 1994; Pinkerton, 1994). These were followed by research describing more robust crawlers, such as Mercator (Heydon & Najork, 1999), one of the first scalable Web crawlers supporting the AltaVista search engine, and the distributed crawler that fuels the search engine Google (Brin & Page, 1998). 
Brin and Page’s paper, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” introduced Google to the world, along with its unique “PageRank” method of using the link structure of the Web to calculate a quality ranking for each page in its index (described in further detail in Page et al., 1998).4 Research into similar ranking schemes was performed by Kleinberg (1999), who created the “Hypertext-Induced Topic Selection” (HITS) link analysis algorithm, and by Lempel and Moran (2000), who developed the similar “Stochastic Approach for Link-Structure Analysis” (SALSA).5 A large body of technical research continues to be produced related to all aspects of Web search engine architecture and design,6 including clustering (see Steinbach et al., 2000), collaborative filtering (see Herlocker et al., 2004), personalization of results (see Khopkar et al., 2003; Keenoy & Levene, 2005; Teevan et al., 2005), understanding the structure of the Web (see Kleinberg et al., 1999; Broder et al., 2000; Kleinberg & Lawrence, 2001), and methods of crawling and indexing the hidden Web (see Bergman, 2001; Raghavan & Garcia-Molina, 2001). Efforts by Bergman and by Raghavan and Garcia-Molina to help search engines crawl and index the hidden Web – Web content that might be dynamically created only after a user query, password-protected pages, unlinked pages, and so on – reveal a key challenge faced by Web search engines: keeping up with the rapid growth and increased complexity of the World Wide Web.

Early on, computer and information scientists understood the importance of evaluating the effectiveness of search engines in terms of their coverage: how much of the Web a particular search engine has indexed. Lawrence and Giles (1998, 2000) were among the first to address this key evaluative dimension of Web search engines, finding that any single search engine (at that time) had indexed only about one-third of the indexable Web and only 16% of the entire World Wide Web. A recent attempt to update this research estimated that Google has indexed around 68% of the indexable Web, with Yahoo!, MSN, and Ask following with 59%, 49%, and 43%, respectively (Gulli & Signorini, 2005). Moving beyond statistical analyses of coverage attained by Web search engines, other evaluative research has focused on measuring search engine stability and precision (Vaughan, 2004), comparisons of results between different search engines (Bar-Ilan et al., 2006; Spink et al., 2006), and the frequency with which search engines update their indexes (Lewandowski et al., 2006). These studies are a subset within the larger area of information retrieval system evaluation,7 which typically focuses on measuring the precision, recall, and reliability of a system to estimate its performance (see, for example, Sanderson, 2005).

By their nature, such studies remain focused on the technical properties of an information retrieval system, measuring, for example, how precise a particular search engine’s results are by comparing performance on a fixed set of queries and documents. The characteristics and considerations of the users of search engines remain absent from much of this research design, thereby overlooking the interactions between users and their search tools. To compensate, many Web search engine researchers have turned to transaction log analysis to bring the user into focus.

3 These categories are not necessarily mutually exclusive and are not put forth as airtight ontological divisions. They are meant simply to help organize this interdisciplinary collection of studies to aid discussion.

4 Critical responses to PageRank, and Google overall, will be discussed below.

5 For a comparison of Web search engine ranking algorithms, see Borodin et al. (2001).

6 For a summary of literature on search engine design, see Arasu et al. (2001).
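
To make the link-analysis idea behind PageRank concrete, the following is a minimal sketch of a PageRank-style calculation by power iteration. It is an illustration under stated assumptions, not Google's implementation: the damping factor of 0.85, the even redistribution of rank from pages without outgoing links, the fixed iteration count, and the four-page `toy_web` graph are all choices made here for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate a PageRank-style score for every page in a small link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform score

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: A links to B and C, B and D link to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from A, B, and D and ends up with the highest score, while D, which no page links to, receives only the baseline share.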


Transaction Log Analyses


Transaction log analysis takes advantage of the detailed server logs maintained by Web search engines to learn about users’ interactions with the search engine, such as the structure of search queries, the length of time spent on search tasks, and the selection of search results. Large-scale transaction log analyses typically rely on large data sets provided by a particular Web search engine; three of the largest studies involved data sets from AltaVista, Excite, and the German search engine Fireball.

In his study of Fireball, Höelscher (1998) analyzed approximately 16 million queries processed in 1998, providing one of the first detailed examinations of search query structure. Höelscher discovered that the average search query length was 1.66 terms, with over 50% consisting of just one keyword and less than 2% containing five or more keywords. The study also revealed that over 59% of users never went beyond the first page of search results, and 8 out of 10 users examined only 30 or fewer search results (Fireball presented 10 search results per page).

A similar transaction log analysis was performed by Silverstein et al. (1999), this time taking advantage of a transaction log from AltaVista with almost 1 billion queries from a 42-day period of search activity. Along with a query structure analysis similar to the Fireball study (the average query length for the AltaVista data was 2.35 terms), Silverstein et al. were also able to provide insights into user sessions with search engines, that is, the series of queries made by a single user over a short period of time (in this study, the time cutoff was 5 minutes). The vast majority of sessions, over 77%, contained only one query, and for 85% of all queries, only the first screen of results was viewed. On average, users submitted just over two queries per session, seldom modified the original query, and only examined 1.39 screens of search results.

A third major transaction log study by Jansen, Spink, and Saracevic (2000) analyzed one day’s worth of query data (over 51,000 queries from about 18,000 searchers) from the then popular Excite search engine. They reported on the number of queries per user, number of search terms per query, number of results clicked, query structure and modification, and the relative occurrence and distribution of particular search terms and subject areas. This research was expanded into a 6-year study, benefiting from additional search engine transaction logs, with updated results published a few years later (see Spink & Jansen, 2004). This longitudinal transaction log analysis provided new insights into various factors involving users’ search behaviors, including a growing diversity of search topics, increasing search query complexity, and an increase in the percentage of searchers viewing only the first page of results.

Complementing these large-scale transaction log analyses of users’ search behavior are various secondary studies that rely on smaller sets of user data or focus on narrower aspects of user search activity. A summary and comparison of these smaller-scale studies is provided by Jansen and Pooch (2001), while Jansen and Spink (2005) have consolidated results of nine separate transaction log analyses to provide a valuable insight into Web searching trends.

Yet, while transaction log analysis provides greater insight into users’ actual experience with Web search engines than formal technical analysis of search engine design, its ability to explain user behavior remains limited. Transaction logs can measure how many pages of results are viewed or how often search queries are modified or fail, but they cannot reveal a user’s actual motivations or reasoning behind such actions. User studies, whether in a laboratory setting or ethnographic in nature, help reveal these missing insights.

7 A library of research on information retrieval systems can be found at the Web site for the ACM Special Interest Group on Information Retrieval: http://www.sigir.org/proceedings/Proc-Browse.html.
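
The kinds of measures reported in these transaction log studies can be sketched in a few lines. The example below assumes a hypothetical log layout of (user, timestamp, query) records and uses invented sample data. It treats a gap of more than 5 minutes between a user's consecutive queries as the start of a new session, which is one plausible reading of the 5-minute cutoff mentioned above rather than the exact procedure of any cited study.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a pause longer than this between two queries starts a new session.
SESSION_GAP = timedelta(minutes=5)


def query_stats(log):
    """Return (mean terms per query, share of one-term queries)."""
    lengths = [len(query.split()) for _, _, query in log]
    mean_terms = sum(lengths) / len(lengths)
    single_term_share = sum(1 for n in lengths if n == 1) / len(lengths)
    return mean_terms, single_term_share


def sessionize(log, gap=SESSION_GAP):
    """Group each user's queries into sessions separated by long pauses."""
    by_user = defaultdict(list)
    for user, timestamp, query in log:
        by_user[user].append((timestamp, query))

    sessions = []
    for user, events in by_user.items():
        events.sort(key=lambda event: event[0])  # order each user's queries by time
        current = [events[0][1]]
        for (previous_time, _), (timestamp, query) in zip(events, events[1:]):
            if timestamp - previous_time > gap:
                sessions.append((user, current))  # close the session at a long pause
                current = []
            current.append(query)
        sessions.append((user, current))
    return sessions


# Invented sample records in the hypothetical (user, timestamp, query) layout.
start = datetime(2000, 1, 1, 12, 0, 0)
sample_log = [
    ("u1", start, "weather"),
    ("u1", start + timedelta(minutes=2), "weather milwaukee"),
    ("u1", start + timedelta(minutes=30), "pagerank"),
    ("u2", start + timedelta(minutes=1), "web search engine history"),
]

mean_terms, single_share = query_stats(sample_log)
sessions = sessionize(sample_log)
queries_per_session = len(sample_log) / len(sessions)
single_query_sessions = sum(1 for _, queries in sessions if len(queries) == 1) / len(sessions)
print(mean_terms, single_share, len(sessions), queries_per_session, single_query_sessions)
```

As the section notes, figures like these describe what users did but not why, which is the gap that user studies try to fill.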


User Studies


Notwithstanding the value of transaction log data analysis, these types of studies offer limited insights into the behavior of Web searchers beyond the decontextualized extraction of search queries and other session-based data that reside in the logs. Many researchers have turned to user studies, in the form of controlled experiments, installation of tracking software, personal surveys, or qualitative ethnographies, in order to better understand usage of Web search engines and whether the needs of users are being met.

Choo et al. (1998, 2000) combined three methods of data collection to better understand user search behavior: a questionnaire survey, a software application that recorded Web browser actions, and personal interviews with participants. They collected Web search data from employees of seven different companies over the course of 2 weeks and provided insights into how a user’s motivations (the strategies and reasons for viewing and searching) and moves (the tactics used to find and use information) are important factors when studying and evaluating Web search activities. The work of Machill et al. (2004) and Höelscher and Strube (2000) similarly combined surveys, interviews, and transaction log analysis to characterize a number of information-seeking behaviors of Web search engine users. Hargittai’s (2002, 2004b) extensive use of surveys and in-person observation of search engine usage has provided insights into how people find information online in the context of their other media use, their general Internet use patterns, and their social support networks. Broadening the analysis of user behavior beyond transaction logs also allowed Hargittai (2004a) to contextualize user search behavior, revealing the ways that factors such as age, gender, education level, and time spent online are relevant predictors of a user’s Web searching skills and chance of successfully finding the information they seek.

Other recent user studies include Wirth et al. (2007), whose experiments combined a client-oriented Web content analysis, a think-aloud protocol, and an online questionnaire to determine how users make decisions concerning the results with which they are presented after performing a search; various experiments to measure how users view and interact with the results page of a search engine using eye-tracking technology (see, for example, Pan et al., 2004); Dennis et al.’s (2002) controlled experiment to compare search effectiveness between query-based Internet search (via the Google search engine), directory-based search (via Yahoo), and phrase-based query reformulation-assisted search (via the Hyperindex browser); and Martey’s (2008) and Roy and Chi’s (2003) examinations of gendered differences in Web search engine use, which suggest that males and females demonstrate different Web navigation patterns.

Thus far, the studies discussed in this section (as well as the previous one) sought to measure the particular use and effectiveness of Web search engines. Another important area of user-based research has focused instead on measuring user awareness of how Web search engines operate (such as whether users understand how sites are indexed and ranked by search engines) and how this awareness might impact their successful use of search tools.8 For example, recognizing that users’ ability to benefit fully from Web search engines (and mitigate any possible negative effects) is dependent on their understanding of how they function, Hendry and Efthimiadis (2008) examine the conceptual understandings that people have of search engines by performing a content analysis on the sketches that 200 undergraduate and graduate students produced when asked to draw how a search engine works. Their user study reveals a diverse range of conceptual approaches, metaphors, representations, and misconceptions, calling attention to the importance of improving students’ technical knowledge of how search engines work so they can be better equipped to fully utilize these important tools.


Political, Ethical, and Cultural Critiques


In their major research publication of longitudinal search engine transaction log analysis, Spink and Jansen (2004) note that the “overwhelming research focus in the scientific literature is on the technological aspects of Web search” and that when studies do venture beyond the technology itself, they are “generally focused on the individual level of analysis” (p. 181). This is evidenced above, where most user studies aim to construct the “typical user” in terms of their usage of Web search engines or awareness of typical business practices within the search industry. While understanding these issues from the user perspective is crucial, implicit within the questions these studies ask (“are searchers finding the information they’re looking for?”; “do searchers understand that some results might appear due to a business relationship with the search engine?”; “do searchers know their search activity might be tracked by the search provider?”) are broader concerns, such as access to knowledge, bias, and privacy. In response, a wide array of political, ethical, and cultural critiques of Web search engines have emerged.

Introna and Nissenbaum’s (2000) seminal study, “Shaping the Web: Why the Politics of Search Engines Matter,” was among the first to analyze search engines from such a critical perspective. They noted that while search engines have been heralded as a powerful source of access and accessibility, they “systematically exclude certain sites, and certain types of sites, in favor of others, systematically giving prominence to some at the expense of others” (p. 169), thereby undermining the potential of the World Wide Web – and search engines as the portal to the Web – to be an inclusive democratic space. Expanding upon Introna and Nissenbaum’s foundation, similar critiques have been applied to Web search engines from a variety of standpoints. For example, Hargittai (2004b) has extended her user studies to include investigations of how financial and organizational considerations within the Web search engine industry impact the way in which content is organized, presented, and distributed to users.

8 These concerns relate to some of the cultural and social issues that will be discussed in more detail below.


Van Couvering (2004, 2008) has engaged in extensive research on the political economy of the search engine industry in terms of its ownership, its revenues, the products it sells, its geographic spread, and the politics and regulations that govern it. Drawing comparisons to concerns over market consolidations in the mass media industry, Van Couvering fears that the market concentration and business practices of the search engine industry might limit its ability to serve “the public interest in the information society” (Van Couvering, 2004, p. 25). Diaz (2008) arrives at a similar conclusion, noting that the inherent limitations of a commercialized, advertising-based search engine – Google, in this case – inevitably lead to a failure to support deliberative democratic ideals. Lev-On (2008), however, offers a different view, arguing that Web search engines indirectly contribute to political organization and deliberative discourse by generating unintentional exposures to diverse and opposing views. In doing so, Web search engines “cater to the concerns of both deliberative democrats aiming at enriching the deliberative qualities of democratic discussion, and pluralist democrats who are concerned about making the political marketplace more open, inclusive and competitive” (Lev-On, 2008). Additional contributions to this debate include Hindman et al. (2003), Fortunato et al. (2005), and Menczer et al. (2006). Extending from these political critiques about their role in supporting deliberative discourse, Web search engines have also been scrutinized from a broader moral and ethical perspective. 
A recent panel discussion at the Santa Clara University Markkula Center for Applied Ethics was one of the first to bring together ethicists, computer scientists, and social scientists for the express purpose of confronting some of the “unavoidable ethical questions about search engines,” including concerns of search engine bias, transparency, censorship, trust, and privacy (Norvig et al., 2006). A special issue of the International Review of Information Ethics on “The Ethics of Search Engines” (Nagenborg, 2005) brought into focus many of the ethical issues highlighted in the Santa Clara panel. For example, Rieder (2005) and Welp and Machill (2005) argue for more openness and transparency in Web search engine practices, while Tavani (2005) expresses concern about the power of search engines to acquire information about persons online, narrowing the distinction between public and private information and threatening the privacy and liberty of individuals. Finally, Hinman (2005) provides a succinct summary of the persistent ethical issues in search engines, including the lack of transparency in ranking algorithms; concerns over the censorship of information by local authorities; and the problem of privacy with regard to the ability to track user search activity. Separately, Zimmer (2006) provides a brief introduction to how the practice of paid insertion and paid placement of search results presents challenges to a host of moral values, including freedom from bias, trust, and privacy.

Tavani, Hinman, and Zimmer all touch on a key privacy issue surrounding Web search: the routine practice of monitoring and archiving users’ search activities. This threat to user privacy has drawn considerable attention, especially in the wake of recent news events in which Google resisted a government request to turn over records of search activity (see Kopytoff, 2006) and AOL released to the research community over 20 million search queries from 658,000 of its users, which were insufficiently anonymized and exposed user queries (see Hansell, 2006). For example, Chopra and White (2007) have criticized Google’s defense of Gmail’s “reading” of incoming e-mail messages in order to place relevant advertising – the claim that no human reads the message – by arguing that as natural language processing and semantic extraction used in artificial agents become increasingly sophisticated, the corporations that deploy those agents, such as Google, will be more likely to be attributed with knowledge of their users’ personal information, thus triggering legitimate privacy concerns. Röhle (2007) expresses concern over how the push for the personalization of search results and related advertising results in the commercial exploitation of users’ personal information. Zimmer (2008a,b) shares Röhle’s anxiety over the privacy implications of the drive for personalization of results and advertising, and elsewhere he utilizes the theory of privacy as “contextual integrity” to help clarify these privacy concerns, arguing that Google’s widespread collection of user activity across all their products and services reflects a significant shift in the existing norms of personal information flows. Finally, Albrechtslund (2006) outlines the ethical problems and dilemmas that emerge as searching the Web increasingly becomes a form of surveillance, both for purposes of control and for social play.

Along with these political and ethical critiques, a number of scholars have studied and scrutinized Web search engines from a broader cultural studies perspective, revealing, for example, how they impact notions of time, cognition, and the construction of knowledge. Iina Hellsten and her colleagues (Wouters et al., 2004; Hellsten et al., 2006) have explored the ways in which search engines “re-write the past” due to the frequent updating of their indices, leading to both a loss of a historical record of content on the Web and a disruption in the “temporal structure” of the Web itself. Heintz (2006) explores the epistemological impact of Web search engines, arguing that the distributed nature of the link structure of the Web – and the related distributed assessment of Web pages based on incoming links – leads to new cognitive and epistemic processes that are, themselves, distributive. Various cultural and critical theorists build upon Deleuze and Guattari’s (1987) notion of the rhizome when describing the potential of search engines to reflect the random interconnectedness of the Internet and to foster non-hierarchical information spaces.


Legal and Policy Analyses

As the number of political, ethical, and cultural critiques of Web search engines increases, so do calls for solutions to the dilemmas they present. A key avenue for such remedies is through law and policy. Scholarly contributions to this pursuit include two key articles establishing the area of “search engine law”: Gasser’s (2006) detailed outline of the legislation and litigation that has emerged in the US legal system alongside the rapid rise of Web search engines, and Grimmelmann’s (2007) systematic taxonomy of search engine law, broken down into the interests of the users, providers, third parties, and the search engines themselves. Together, these two articles provide the necessary legal backdrop to inform any attempt to regulate or legislate search engine practices.

Other legal scholars have focused their attention on particular issues related to Web search engines, such as access control, bias, and manipulation of results. For example, Elkin-Koren (2001) focuses on the potential role of search engines as gatekeepers of information online, fearing that creating a right to exclude indexers opens up the door for increased control of access to content. Chandler (2008) argues that long-established free speech rights also extend to the right to reach an audience free from the discriminatory influence of intermediaries, such as a search engine’s decision as to how to rank a site (if at all). Like Elkin-Koren, Chandler proposes a mix of regulatory and legislative approaches to address her concerns and protect the flow of information mediated by Web search engines. Goldman (2006), however, disagrees. In his view, search engine bias is a beneficial consequence of search engines optimizing content for their users, and any attempt to regulate perceived biases would diminish search providers’ ability to provide relevant results to their users. Adding to this debate, Pasquale and Bracha (2007) make perhaps the boldest call for regulating what they describe as the “manipulation” of search engine results. Comparing search engines to common carriers, such as phone companies or public utilities, the authors reject market- or technology-based solutions, and instead call for the creation of a regulatory framework that would balance secrecy with transparency and, in the end, prevent “improper behavior by search engines.”

Additional legal and policy scholarship addressing Web search engines continues to be generated, much of it focusing on copyright issues that emerge with the widespread indexing of online content, along with the expansion of Web search services to include video and books.
For example, O’Brien and Fitzgerald (2006; Fitzgerald et al., 2008) examine search engine liability for possible copyright infringement and argue for changes in copyright law to better accommodate the unique value search engines offer society. Travis (2006) focuses on the particular copyright and fair use implications of Google’s plan to scan and make searchable and viewable online the contents of up to 15 million library books, suggesting that the courts and policymakers should recognize that Google is making fair and permissible uses of copyrighted works when it enhances both their distribution and use. Vaidhyanathan (2007) provides a contrary view, criticizing Google’s book project on three grounds: privacy (the fact that Google’s interface can track what books users search and access), privatization (criticizing the shift from public libraries to a single, for-profit corporation), and property (criticizing Google’s fair use claim for duplicating copyright-protected work). Vaidhyanathan concludes that Google’s book scanning project threatens both the stability and utility currently provided by public libraries and the fair use exception to copyright.

Along with this array of legal scholarship, recent workshops and symposia have been convened to focus and harness this attention in order to help arrive at workable legal or policy solutions to many of these search engine-related concerns. For example, leading legal scholars, policy makers, and industry lawyers gathered in December 2005 at Yale Law School to discuss the possibility of regulating search engines and to map out the relevant legal and policy domains,9 while a similar group convened a year later at the Haifa Center of Law and Technology to identify the role of the law and of regulators in governing the performance of search engines.10 Outside of academia, the Federal Trade Commission recently held public meetings to examine the privacy issues that emerge with the growing practice of behaviorally targeted Web advertising, often performed by search engines based on keywords or other user profiling (Federal Trade Commission, 2007). Additionally, various legal journals are paying close attention to the ongoing debates, such as a special issue of the Journal of Business and Technology Law dedicated to the numerous legal issues surrounding Google.11

Directions for Future Research

While far from exhaustive, the preceding sections reveal the growing interest in, and importance of, studying Web search engines from a variety of disciplinary approaches. Along with the necessary technical design and evaluative research, significant contributions have come from transaction log analysis and user studies, from political, ethical, and cultural perspectives, and from legal and policy analysis that helps identify where remedies to many search-related concerns might exist. Future research must ensure continued progress in the multidisciplinary understanding of the design, use, and implications of Web search engines. Four research areas deserve particular attention: search engine bias, search engines as gatekeepers of information, the values and ethics of search engines, and the framing of legal constraints and obligations.

(1) Search Engine Bias: Technical and evaluative studies must be undertaken to identify possible instances of bias in search engines, and additional user studies must attempt to measure its effects on users’ experiences searching the Web. Only when armed with such additional data can we begin to address the normative dimensions of the bias itself.

(2) Search Engines as Gatekeepers of Information: Future research must focus on reducing the opacity regarding how Web search engines work, identifying whether any intentional gatekeeping functions exist. While we are aware of some gatekeeping functions of search engines, such as Google’s complicity with China’s desire to censor certain search results, the extent to which gatekeeping might occur in versions of Web search engines that exist in more open societies must be explored in more detail.

9 See “Regulating Search?: A Symposium on Search Engines, Law, and Public Policy” (http://isp.law.yale.edu/regulatingsearch/overview/).
10 See “The Law of Search Engines” (http://law.haifa.ac.il/events/event_sites/se/).
11 See http://www.law.umaryland.edu/journal/jbtl/index.asp.

(3) Values and Ethics of Search Engines: Concerns over bias and gatekeeping point to the ways in which Web search engines have particular value and ethical implications for society. Additional work is needed to understand not only, conceptually, what values are at play in Web searching, such as privacy, autonomy, and liberty, but also how users’ search activities actually affect the values they experience in the real world.

(4) Framing the Legal Constraints and Obligations: It is clear from the sections above that law and policy can play a large role in mitigating many of the concerns expressed. For example, legal and regulatory frameworks could be constructed to ensure that Web search engines do not contain bias, that the rights of copyright holders are protected, or that user privacy is protected. Yet many argue against any attempt to regulate the search industry and instead insist that the marketplace will ensure users’ needs are adequately fulfilled and rights are properly respected. Determining which approach is best requires further study and debate.

While the field of Web search studies appears to be out of its infancy, vast amounts of research remain to be undertaken for it to reach maturity as a discipline. I have provided only a few waypoints to guide that journey, and others will certainly take the reins and bring new perspectives, theories, and methodologies to understanding the design, uses, and wide-ranging impacts of Web search engines on society, culture, politics, and law. Just as new scholarship on the implications of knowledge tools from our past continues to emerge, the field of Web search studies has much room to grow.

References

Albrechtslund, A. (2006). Surveillance in searching. Paper presented at the EASST 2006.
Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., & Raghavan, S. (2001). Searching the Web. ACM Transactions on Internet Technology, 1(1), 2–43.
Bar-Ilan, J., Mat-Hassan, M., & Levene, M. (2006). Methods for comparing rankings of search engine results. Computer Networks, 50(10), 1448–1463.
Bergman, M. (2001). The deep Web: Surfacing hidden value. Journal of Electronic Publishing, 7(1). http://dx.doi.org/10.3998/3336451.0007.104.
Borodin, A., Roberts, G. O., Rosenthal, J. S., & Tsaparas, P. (2001). Finding authorities and hubs from link structures on the world wide Web. Proceedings of the 10th International Conference on World Wide Web, pp. 415–429.
Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. WWW7/Computer Networks, 30(1–7), 107–117.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., et al. (2000). Graph structure in the Web. Computer Networks, 33(1–6), 309–320.
Burns, E. (2007). Worldwide internet: Now serving 61 billion searches per month. SearchEngineWatch. Retrieved November 2, 2007, from http://searchenginewatch.com/showPage.html?page=3627304
Chandler, J. (2008). A right to reach an audience: An approach to intermediary bias on the internet. Hofstra Law Review, 35(3), 1095–1138.
Choo, C. W., Detlor, B., & Turnbull, D. (1998). A behavioral model of information seeking on the Web: Preliminary results of a study of how managers and IT specialists use the Web. Proceedings of the 61st Annual Meeting of the American Society for Information Science, 35, 290–302.
Choo, C. W., Detlor, B., & Turnbull, D. (2000). Information seeking on the Web: An integrated model of browsing and searching. First Monday, 5(2), 2000.
Chopra, S., & White, L. (2007). Privacy and artificial agents, or, is Google Reading my email? Paper presented at the IJCAI 2007.
Deleuze, G., & Guattari, F. (1987). A thousand plateaus: Capitalism and schizophrenia (B. Massumi, Trans.). Minneapolis: University of Minnesota Press.
Dennis, S., Bruza, P., & McArthur, R. (2002). Web searching: A process-oriented experimental study of three interactive search paradigms. Journal of the American Society for Information Science and Technology, 53(2), 120–133.
Diaz, A. (2008). Through the Google goggles: Sociopolitical bias in search engine design. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 11–34). Dordrecht, The Netherlands: Springer.
Eichmann, D. (1994). The rbse spider – balancing effective search against Web load. Proceedings of the First International World Wide Web Conference, pp. 113–120.
Elkin-Koren, N. (2001). Let the crawlers crawl: On virtual gatekeepers and the right to exclude indexing. University of Dayton Law Review, 26, 180–209.
Fallows, D. (2005). Search engine users: Internet searchers are confident, satisfied and trusting – but they are also unaware and naïve. Pew Internet & American Life Project. Retrieved October 15, 2005, from http://www.pewinternet.org/pdfs/PIP_Searchengine_users.pdf
Federal Trade Commission (2007). FTC to host town hall to examine privacy issues and online behavioral advertising. Retrieved November 8, 2007, from http://ftc.gov/opa/2007/08/ehavioral.shtm
Fitzgerald, B., O’Brien, D., & Fitzgerald, A. (2008). Search engine liability for copyright infringement. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 103–120). Dordrecht, The Netherlands: Springer.
Fortunato, S., Flammini, A., Menczer, F., & Vespignani, A. (2005). The egalitarian effect of search engines. Arxiv preprint cs.CY/0511005. http://arxiv.org/pdf/cs.CY/0511005
Gasser, U. (2006). Regulating search engines: Taking stock and looking ahead. Yale Journal of Law & Technology, 9, 124–157.
Goldman, E. (2006). Search engine bias and the demise of search engine utopianism. Yale Journal of Law & Technology, 8, 188–200.
Grimmelmann, J. (2007). The structure of search engine law. Iowa Law Review, 93(1), 1–63.
Gulli, A. & Signorini, A. (2005). The indexable Web is more than 11.5 billion pages. International World Wide Web Conference, pp. 902–903.
Hansell, S. (2006). AOL removes search data on vast group of Web users. The New York Times, p. C4.
Hargittai, E. (2002). Beyond logs and surveys: In-depth measures of people’s Web use skills. Journal of the American Society for Information Science and Technology, 53(14), 1239–1244.
Hargittai, E. (2004a). Informed Web surfing: The social context of user sophistication. Society Online: the Internet in Context, Thousand Oaks: Sage Publications, Inc, pp. 257–274.
Hargittai, E. (2004b). The changing online landscape: From free-for-all to commercial gatekeeping. Retrieved October 14, 2006, from http://www.eszter.com/research/c03-onlinelandscape.html
Heintz, C. (2006). Web search engines and distributed assessment systems. Pragmatics & Cognition, 14(2), 387–409.
Hellsten, I., Leydesdorff, L., & Wouters, P. (2006). Multiple presents: How search engines re-write the past. New Media & Society, 8(6), 901–924.
Hendry, D. & Efthimiadis, E. (2008). Conceptual models for search engines. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 277–307). Dordrecht, The Netherlands: Springer.

Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1), 5–53.
Heydon, A. & Najork, M. (1999). Mercator: A scalable, extensible Web crawler. World Wide Web, 2(4), 219–229.
Hindman, M., Tsioutsiouliklis, K., & Johnson, J. A. (2003). Googlearchy: How a few heavily-linked sites dominate politics on the Web. Annual Meeting of the Midwest Political Science Association.
Hinman, L. (2005). Esse est indicato in Google: Ethical and political issues in search engines. International Review of Information Ethics, 3, 19–25.
Höelscher, C. (1998). How internet experts search for information on the Web. World Conference of the World Wide Web, Internet, and Intranet, Orlando, FL.
Höelscher, C. & Strube, G. (2000). Web search behavior of internet experts and newbies. Computer Networks, 33(1–6), 337–346.
InfoSpace (2007). About webcrawler. Retrieved November 3, 2007, from http://www.webcrawler.com/webcrawler/ws/about/_iceUrlFlag=11?_IceUrl=true
Introna, L. & Nissenbaum, H. (2000). Shaping the Web: Why the politics of search engines matters. The Information Society, 16(3), 169–185.
Jansen, B. J. & Pooch, U. (2001). A review of Web searching studies and a framework for future research. Journal of the American Society for Information Science and Technology, 52(3), 235–246.
Jansen, B. J. & Spink, A. (2005). How are we searching the world wide Web? A comparison of nine search engine transaction logs. Information Processing & Management, 42(1), 248–263.
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: A study and analysis of user queries on the Web. Information Processing and Management, 36(2), 207–227.
Keenoy, K. & Levene, M. (2005). Personalisation of Web search. Intelligent Techniques for Web Personalization, 201–228.
Khopkar, Y., Spink, A., Giles, C. L., Shah, P., & Debnath, S. (2003). Search engine personalization: An exploratory study. First Monday. Retrieved October 23, 2007, from http://www.firstmonday.org/issues/issue8_7/khopkar/
Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5), 604–632.
Kleinberg, J. & Lawrence, S. (2001). The structure of the Web. Science, 294, 1849–1850.
Kleinberg, J. M., Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1999). The Web as a graph: Measurements, models and methods. Proceedings of the International Conference on Combinatorics and Computing, 6(1), 1–18.
Kopytoff, V. (2006). Google says no to data demand. San Francisco Chronicle, p. A1.
Lawrence, S. & Giles, C. L. (1998). Searching the world wide Web. Science, 280(5360), 98–100.
Lawrence, S. & Giles, L. (2000). Accessibility of information on the Web. Intelligence, 11(1), 32–39.
Lempel, R. & Moran, S. (2000). The stochastic approach for link-structure analysis (salsa) and the tkc effect. Computer Networks, 33(1–6), 387–401.
Lev-On, A. (2008). The democratizing effects of search engine use: On chance exposures and organizational hubs. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 135–149). Dordrecht, The Netherlands: Springer.
Lewandowski, D., Wahlig, H., & Meyer-Bautor, G. (2006). The freshness of Web search engine databases. Journal of Information Science, 32(2), 131.
Machill, M., Neuberger, C., Schweiger, W., & Wirth, W. (2004). Navigating the internet. European Journal of Communication, 19(3), 321–347.
Martey, R. M. (2008). Exploring gendered notions: Gender, job hunting and Web search engines. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 51–65). Dordrecht, The Netherlands: Springer.

McBryan, O. A. (1994). Genvl and wwww: Tools for taming the Web. Proceedings of the First International World Wide Web Conference, pp. 79–90.
Menczer, F., Fortunato, S., Flammini, A., & Vespignani, A. (2006). Googlearchy or googlocracy? IEEE Spectrum, 43(2). http://www.spectrum.ieee.org/feb06/2787.
Nagenborg, M. (2005). The ethics of search engines (special issue). International Review of Information Ethics, 3.
Norvig, P., Winograd, T., & Bowker, G. (2006). The ethics and politics of search engines. Panel at Santa Clara University Markkula Center for Applied Ethics. Retrieved March 1, 2006, from http://www.scu.edu/sts/Search-Engine-Event.cfm
O’Brien, D. & Fitzgerald, B. (2006). Digital copyright law in a YouTube world. Internet Law Bulletin, 9(6/7), 71, 73–74.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web. Retrieved January 12, 2007, from http://dbpubs.stanford.edu/pub/1999-66
Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2004). In Google We trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823.
Pasquale, F. & Bracha, O. (2007). Federal search commission? Access, fairness and accountability in the law of search. U of Texas Law, Public Law Research Paper No. 123. Retrieved August 15, 2007, from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1002453
Pinkerton, B. (1994). Finding what people want: Experiences with the webcrawler. Proceedings of the Second International World Wide Web Conference.
Raghavan, S. & Garcia-Molina, H. (2001). Crawling the hidden Web. Proceedings of the 27th International Conference on Very Large Data Bases, pp. 129–138.
Rainie, L. (2005). Search engine use shoots up in the past year and edges towards e-mail as the primary internet application. Pew Internet and American Life Project. Retrieved September 15, 2006, from http://www.pewinternet.org/pdfs/PIP_SearchData_1105.pdf
Rieder, B. (2005). Networked control: Search engines and the symmetry of confidence. International Review of Information Ethics, 3, 26–32.
Röhle, T. (2007). Desperately seeking the consumer: Personalized search engines and the commercial exploitation of user data. First Monday. Retrieved October 23, 2007, from http://www.firstmonday.org/issues/issue12_9/rohle/index.html
Roy, M. & Chi, M. T. C. (2003). Gender differences in patterns of searching the Web. Journal of Educational Computing Research, 29(3), 335–348.
Sanderson, M. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 162–169.
Silverstein, C., Henzinger, M. R., Marais, H., & Moricz, M. (1999). Analysis of a very large Web search engine query log. SIGIR Forum, 33(1), 6–12.
Spink, A. & Jansen, B. J. (2004). Web Search: Public Searching of the Web. New York: Kluwer Academic Publishers.
Spink, A., Jansen, B. J., Blakely, C., & Koshman, S. (2006). A study of results overlap and uniqueness among major Web search engines. Information Processing & Management, 42(5), 1379–1391.
Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. KDD Workshop on Text Mining, 34, 35.
Tavani, H. T. (2005). Search engines, personal information and the problem of privacy in public. International Review of Information Ethics, 3, 39–45.
Teevan, J., Dumais, S. T., & Horvitz, E. (2005). Personalizing search via automated analysis of interests and activities. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449–456.
Travis, H. (2006). Google Book search and fair use. University of Miami Law Review, 61, 601–681.
Vaidhyanathan, S. (2007). The googlization of everything and the future of copyright. University of California Davis Law Review, 40(3), 1207–1231.

Van Couvering, E. (2004). New media? The political economy of internet search engines. Annual Conference of the International Association of Media & Communications Researchers, Porto Alegre, Brazil, pp. 7–14.
Van Couvering, E. (2008). The history of the internet search engine: Navigational media and the traffic commodity. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 77–206). Dordrecht, The Netherlands: Springer.
Vaughan, L. (2004). New measurements for search engine evaluation proposed and tested. Information Processing and Management: An International Journal, 40(4), 677–691.
Welp, C. & Machill, M. (2005). Code of conduct: Transparency in the net: Search engines. International Review of Information Ethics, 3, 18.
Wirth, W., Böcking, T., Karnowski, V., & von Pape, T. (2007). Heuristic and systematic use of search engines. Journal of Computer-Mediated Communication, 12(3), 778–800.
Wouters, P., Hellsten, I., & Leydesdorff, L. (2004). Internet time and the reliability of search engines. First Monday. Retrieved December 24, 2006, from http://www.firstmonday.org/issues/issue9_10/wouters/index.html
Zimmer, M. (2006). The value implications of the practice of paid search. Bulletin of the American Society for Information Science and Technology. Retrieved April 3, 2006, from http://www.asis.org/Bulletin/Dec-05/zimmer.html
Zimmer, M. (2008a). Privacy on planet Google: Using the theory of “contextual integrity” to clarify the privacy threats of Google’s quest for the perfect search engine. Journal of Business & Technology Law, 3(1), 109–126.
Zimmer, M. (2008b). The gaze of the perfect search engine: Google as an infrastructure of dataveillance. In A. Spink & M. Zimmer (Eds.), Web Searching: Multidisciplinary Perspectives (pp. 77–99). Dordrecht, The Netherlands: Springer.
