EVA 2006 London Conference ~ 26-28 July Teresa Numerico and Jonathan P. Bowen _____________________________________________________________________________________

COPYRIGHT PROTECTION AND CULTURAL HERITAGE: STEERING DIGITIZATION BETWEEN PUBLIC ARCHIVES AND SEARCH ENGINES

Teresa Numerico
Dipartimento di Scienze della Comunicazione
Università degli Studi di Salerno
Via Ponte Don Melillo, 84084 Fisciano (SA), Italy
[email protected]
http://www.scienzecom.unisa.it

Jonathan P. Bowen
Museophile Limited
Oak Barn, Sonning Eye, Reading RG4 6TN, United Kingdom
[email protected]
http://www.jpbowen.com

Abstract – This paper explores the tensions between universal access to cultural heritage online and the copyright and other issues that may impinge on this apparent utopia. In particular, we argue for multiple routes to accessing information, to avoid monopolistic or commercial dominance of the field. As largely altruistic institutions, cultural organizations should ensure that the issues are discussed openly before it is too late to have any effect. As a specific example, Google Book Search is an excellent project that involves both libraries and publishers, but institutions should not stop their own digitization programs and rely passively on the initiative of a commercial enterprise.

INTRODUCTION

Typical rhetoric about the digital era often concentrates on the accessibility of content because it consists of bits rather than atoms [9]. However, international regulations about copyright protection [10] are increasingly restrictive with respect to the availability of protected content (and the quantity of such content is growing dramatically). We are facing a contradictory period in which a library may lend a book to anyone who visits without infringing any copyright rule, yet may be forbidden to create a digital edition of the same content for the same library user. Moreover, library users can wander around the physical book shelves and look through any book without the risk of the librarian monitoring their behaviour or taking note of their preferences. In contrast, if the same person accesses a digital library, all their activities can be monitored and archived for years.

The very role of a library as a repository for the diffusion of culture is in question for two important reasons. Firstly, there are all the restrictions imposed by copyright protection. Secondly, there are other ventures that are trying to offer the same services as libraries via the web. We consider a new and potentially very significant initiative: one of the most important search engines, Google [8], has launched a huge project (originally Google Print [1], now Google Book Search, http://books.google.com) to digitize, archive and make available a large amount of cultural content included in books, in partnership with some mainly American libraries and with publishers that are willing to cooperate. We will consider the initiative with a view to understanding the issues at stake in the project.

If a single service provider is to make the majority of available content accessible, it will be a unique challenge, but whatever is omitted from that archive could be lost permanently to the vast majority of people. According to a recent survey [6], it appears that many students use the web as their only source of information. They rely mainly on one search engine and rarely check the information they find against anything beyond the first source they access. We are not raising the controversial issue of censorship here, but even without an explicit will to control information, a single source for all available literature seems an undesirable and potentially dangerous situation. Is not a core part of the web revolution the possibility of a multiplicity of information sources? What can be done to ensure information pluralism? How can we encourage this?

According to Derrida [4], we have to face the fact that an archive is both revolutionary and conservative, and that those who control the archive exercise a “violent” effect on history and consequently on society in its entirety. The cultural heritage community, including archivists, curators, historians and librarians, should become aware of its power and use it to preserve the multifaceted and heterogeneous information products of humanity.

THE GOOGLE BOOK SEARCH PROJECT

The idea behind the new Google project of scanning books and making their content available to all web users is very good. It will clearly be useful for users to search through the content of books for a given expression and to compare its use by various authors. The scope of the project is very interesting and promising. There have been many rumours about the exact nature of the project, so we will try to illustrate its characteristics and to consider the opportunities and risks of such a huge enterprise.

The project is very simple in concept. The idea is to scan as many books as possible, providing the largest repository of book content available online and allowing the entire resource to be searched for any expression. The texts will not all be displayed under the same conditions; instead, there are various ways of showing books, according to their copyright status or to the agreement in place with the relevant publishing house. For example, a book can be shown in “snippet” view, which, “like a card catalogue, shows information about the book plus a few snippets – a few sentences of your search term in context” (http://books.google.com/intl/en/googlebooks/about.html). Another possibility is the Sample Pages View; this requires the agreement of the publishing company and allows users to access some pages of the book. In order to guarantee that each user reads only some of the pages, users are obliged to identify themselves using one of their Google accounts, such as their Gmail address or some other Google subscription.
This procedure, though necessary to protect copyrighted material, raises questions about the privacy of users, who are obliged to declare their identity if they wish to browse the pages of a book, since this means that all the books they consult can be associated with their profiles.

A third viewing possibility is to access the entire contents of the book, but this is only available for public domain content. Google is very conservative about what it treats as public domain content, applying where possible the copyright protection rules of the user’s country. This Google project could not be accomplished without the help of the most important internationally well-known libraries, some of which have already started their partnership with Google. Among the most important are those at Harvard University, the University of Michigan, Oxford University and Stanford University.

This project is definitely a good opportunity for the preservation of and access to cultural heritage, but it raises many unanswered ethical, social and political questions. We cannot blame Google for its commercial and cultural strategy; however, things that are apparently free are sometimes not actually without cost. Society itself, together with its cultural heritage institutions, needs to consider the implicit cost of this project, if only to start the debate. The problems that need to be solved include, among others, copyright protection policies, privacy protection, web democracy and fairness. In summary, how should the balance between the public good and the commercial exploitation of that good be struck?

Google Book Search is not just a cultural, non-commercial project; Google openly declares its commercially legitimate intent to use the content of books for contextually selected advertising. The advertising revenues will be shared with publishers and eventually authors; however, there is an ethical question here that demands careful consideration. Are we sure that it is fair to consult a book and obtain the pages with the relevant citation together with advertisements for a related product? Google at the moment declares that there will be no advertisements alongside public domain content, but can we be sure that Google will never use such content as advertising space?
Moreover, there is the very problematic question of user privacy: in order to protect copyright, Google will need to know the account of each user who consults pages, and that account corresponds to an identity. Can we be sure that collecting this information, which links users’ profiles to their reading preferences, is fair and permissible, and that the private choices of users remain protected? These are significant questions that cannot be answered quickly, but they will not go away, and the web user community, especially professionals in the cultural field, should start thinking about possible answers to them.

THE WEB AS A PUBLIC GOOD

There is much discussion among observers concerning the perception of the web (and more generally the Internet) as a public space and a public good. It was exactly this approach to the web that stimulated much investment in web-oriented enterprises over the past few years. The high expectations around the possibilities opened up by the web were based on the work and the beliefs of the pioneers of the Internet and the web, who struggled to keep the resource as open, free and universal as possible.

If we accept that the web should be a public space, we have to acknowledge the risk that it can be restricted by inadequate applications. Search engines are a major driving force in this regard. The Google Book Search project is a very clear case where we have to face, and if possible control, such a “shrinking” of the public space, represented here by the collection of all printed books. If Google can allow access, it can also control access, potentially preventing free access to information. A single powerful company could control a publicly available resource of information, which means that its behaviour could have “political” and social consequences even if the company declines to act according to an explicit policy. This risk is clearly described in a paper by Lucas Introna and Helen Nissenbaum [7] (page 180):

If search engines systematically highlight Web sites with popular appeal and mainstream commercial purpose, as well as Web sites backed by entrenched economic powers, they amplify these presences on the Web at the expenses of others. Many of the neglected venues and sources of information, suffering from lack of traffic, perhaps actually disappear, further narrowing the options to Web participants.

The issue is complicated because search engines are acting legitimately according to their commercial interests in emphasising some mainstream results and are also fulfilling the requests of their users, who are mostly interested in major websites. As a direct consequence of the legitimate behaviour of all private web service providers, there is a reduction in public space that results in a potential loss for everybody, including the search engines themselves, whose earnings depend on the collectively shared belief in the validity and richness of the largest repository.

This state of affairs is especially problematic when attention is concentrated on the preservation of and access to culture, as in Google Book Search. We cannot, in fact, accept the assumption that search engines are managing a product like any other. Information is not like toothpaste or casual clothes. It is a very special and important resource for democracy and freedom. That is why its distribution should be guaranteed by mechanisms other than the mere functioning of the marketplace, which is, to some extent, an imperfect tool.

According to John Battelle, a well-known journalist, co-founder of Wired and web expert who has interviewed most of the main people involved in web search enterprises during his successful career, we are searching for more than an answer. We are also searching to find what we do not know. In his view, there are two reasons for searching online: “to recover that which we know exists on the Web, and to discover that which we assume must be there” [2] (page 32). When users do not know exactly what they are looking for, there is only a small chance that they will be able to evaluate the returned results successfully. The procedure for evaluating results is similar to guesswork. Nevertheless, a user who is not an expert in the subject matter of the request is likely to accept the results obtained.
On what basis can we judge, and if necessary reject, the quality of the proposed citations? Google Book Search is a wonderful resource; however, we have to be cautious, because if we accept the quasi-monopolistic situation in which search engines like Google find themselves, we risk losing some of the cultural heritage that Google and its partner libraries are trying to protect. What if a proportion of books is not included in the biggest repository of books available online? Will people consult other sources or will they rely on a single accredited source? What about minority languages, such as Maori or Italian, which are not spoken in the countries of the libraries that are prepared to cooperate with the project?

THE RISK OF USERS’ LACK OF AWARENESS

The lack of awareness of users when evaluating web sources is well exemplified by research conducted on college students in the course on “computers and internet” at Wellesley College [6]. The students were asked to find some information concerning various subjects. The survey showed some remarkable results regarding the reliance of students on the web and in particular their full confidence in search engines as the privileged tool for accessing information. According to the survey’s authors, students were “overwhelmingly susceptible to three types of misinformation – advertising claims, government misinformation and propaganda” [6] (page 73). Specifically, they showed a worrying inability to distinguish between facts and advertising by famous brands. There was no great difference between first-year and final-year students; older students performed no better than younger ones.

The situation can only worsen, considering that increasing amounts of information, including books, will be found only on the web, unless the training of students changes to stress the importance of source evaluation. A search engine query should be only the start of a complex process of research and evaluation. The web will remain a public good only if students and more mature adults receive such training.

THE VIOLENCE OF THE ARCHIVE

One of the most remarkable consequences of the web has been the creation of an enormous quantity of private archives full of useful information that could be of more general interest to the public as well. The network has become a repository of collective private and social memories that can be recorded, catalogued and retrieved by everybody. It was the first distributed contribution to the creation of a world archive. At the beginning, free accessibility to all its resources was part of the very nature of the web; it was the biggest self-organized enterprise for the sharing of information. However, all this material was accumulated chaotically and needed to be organized in order to be retrieved and interpreted and to produce new collective knowledge. According to Michel Foucault [5], who wrote a book on the organization of knowledge and its related power:

The archive is first the law of what can be said [...] But the archive is also that which determines that all these things said do not accumulate endlessly in an amorphous mass, [...] but they are grouped together in accordance with multiple relations, maintained or blurred in accordance with specific regularities.

If we follow this definition, the web is an archive in both senses: it is the rule that allows information to be published and accessible online, and, through all the available tools, it controls the stream of information and influences what is believed to be worthwhile and what is not. In this Foucauldian sense, Google Book Search has an enormous value in terms of knowledge organization and consequently in terms of control. It may be true that, for the first time in the history of mankind, the general “archive” is no longer in the hands of the authorities and institutions but under the control of all the people who can access it; even so, the presence of crucial intermediaries for accessing such content creates new issues that need to be addressed. This promising system could be truly innovative in its consequences for society and the distribution of power, though there are new risks and vulnerabilities to face.

All the tools used to organize and retrieve content in order to produce new knowledge are extremely powerful within such an archive because they determine how things that are “published” on the web can be accessed and retrieved in one form or another. In this sense, they play the traditional role of institutional authorities, without being publicly controlled and without being officially in charge of this duty. We note that political power is always involved in the control of the collective archive and often with the preservation of the memory of the people. The control of the archive affects the production of history, and the list of all digitized books is a central part of this creation of memory and identity. Whenever there is a new power, it brings a classification of what is public and what is private, what is secret and what is commonly discussed. After the establishment of the new rule, however, there is the need for a conservation policy. The archive in this sense is at the same time revolutionary and conservative.

Following Derrida’s lecture given in London in 1994 on “archive fever” [4], we can assert that the archive has a contradictory essence. It preserves and saves documents and, consequently, the possibility of knowledge creation, but at the same time it does so in an unnatural way. Interpreting the role of the network from this perspective, we can draw ambiguous conclusions about the link between the web and its archival role. On the one hand, we can argue that because nobody is completely in control of the content available online, there is no central power that can establish full control over the present archive of society. On the other hand, if a huge archival project takes place, this lack of control is threatened, and the risks of archival control become even more relevant than with state-controlled archives.

Derrida also suggested that for an archive to be possible there must be something that is not included in it. There must be something that is “outside” the archive. However, if the web stores all the available documents, there is no exteriority to guarantee the preservation of the archive and the possibility of evolving it into something different. In fact, the archive “always works, and a priori, against itself.” For the archive to be changed, we need the external world and a consignation space.
The web has characteristics that make this condition difficult to fulfil: it is virtually everywhere, it keeps potentially everything that is produced, and it is in a constant process of self-modification. The Google Book Search project could create the “no exteriority” risk.

CONCLUSION

In order to preserve the promise of the web as a revolutionary way of preserving and distributing information, it is crucial to maintain an area “outside” it. This means keeping open the possibility of devising solutions for accessing online information that differ from the present ones. The spirit of the web revolution seems to be maintained by a continuous evolution of tools for storing and retrieving online content, and by the choice, on the part of users and developers, of a multiplicity of devices that guarantee that the online archive is not univocally interrogated and does not give everybody the same answers. The web was originally based on the freedom of all users to share interesting information. We need to pursue the same objective when inventing new tools for accessing it, now and in the future, in order to continue the original spirit of the web and the social power of collectively creating an online archive.

We think that Google Book Search is an interesting project, but for it not to become a danger to the diversity of cultures, libraries and archives in different countries should not abandon their digitization projects. Instead, they should increase their efforts, working cooperatively and collectively to elaborate a sustainable but active process of digitizing cultural resources, in order to preserve the multiplicity and collectivity of culture as a social good. In addition, open initiatives such as Wikipedia (http://www.wikipedia.org), and to a lesser extent so far Wikibooks, are allowing a certain amount of increased democratization of knowledge by individuals in the cultural and other fields [3].

ACKNOWLEDGMENTS

Thank you to Museophile Limited (http://www.museophile.com) for support of the second author.

REFERENCES

[1] Band, J., The Google Print Library Project: A Copyright Analysis. Policybandwidth.com, Washington DC, USA, August 2005. URL: http://www.policybandwidth.com/doc/googleprint.pdf
[2] Battelle, J., The Search. Boston/London: Nicholas Brealey Publishing, 2005.
[3] Bowen, J. P. and Angus, J., Museums and Wikipedia. In D. Bearman and J. Trant (eds.), MW2006: Museums and the Web 2006, Albuquerque, New Mexico, USA, 22–25 March 2006. Archives & Museum Informatics, 2006. URL: http://www.archimuse.com/mw2006/papers/bowen/bowen.html
[4] Derrida, J., Archive Fever. Chicago: University of Chicago Press, 1996.
[5] Foucault, M., The Archaeology of Knowledge. Routledge, London, 1989. (Last reprinted 2005.)
[6] Graham, L. and Metaxas, P. T., “Of course it’s true; I saw it on the Internet!” Communications of the ACM, vol. 46, no. 5, pp. 71–75, 2003.
[7] Introna, L. D. and Nissenbaum, H., “Shaping the Web: Why the politics of search engines matters”. The Information Society, vol. 16, no. 3, pp. 161–185, 2000.
[8] Langville, A. N. and Meyer, C. D., Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press, 2006.
[9] Negroponte, N., Being Digital. London: Hodder & Stoughton, 1995.
[10] Numerico, T. and Bowen, J. P., Copyright and promotion: Oxymoron or opportunity? In J. Hemsley, V. Cappellini and G. Stanke (eds.), EVA 2005 London Conference Proceedings, University College London, UK, 25–29 July, pages 25.1–25.10. URL: http://arxiv.org/abs/cs.CY/0508067
