Almost four hundred years ago the Dutchman Dirk Hartog discovered the Western coast of Australia by accident. And after him many more ships of the famous Dutch East India Company touched at the Australian coast. The Dutch named Australia after their homeland, Nieuw Holland or Hollandia Nova.

For the Dutch it has always been a very risky undertaking to travel to this part of the world. The rocky coastal waters of Western Australia formed a severe danger to early Dutch adventurers, looking for trade and fortune. This map is one of the beautiful maps in the collection of the Australian national library. Let me turn it 90 degrees to get a more familiar representation. On leaving the Cape of Good Hope, the vessels of the Dutch East India Company had to sail an Easterly course for about a thousand miles and then turn to the north in order to reach their final destination, Batavia. Due to the fact that the determination of longitude was only very approximate until well into the eighteenth century, ships often sailed too far to the East and landed or even wrecked on the Australian coast.

Ladies and gentlemen, I come from the Netherlands, but I did not have to undertake a risky adventure to get here, nor did I get here by accident. And the purpose of my visit is not adventure and fortune, but much more down to earth: to give a presentation on digital preservation issues at the Koninklijke Bibliotheek.

First of all I am afraid I will have to disappoint you. This presentation will not be about web archiving. One of the reasons for my being here is to see if we can stand on the shoulders of others to take the first serious steps in this field.

My presentation consists of three components:

• a short overview of the history of our electronic deposit system and our agreements with publishers;

• strategies for permanent access;

• requirements for permanent archives.

I will conclude with some observations on future developments.


But let me start by giving you a brief description of the Koninklijke Bibliotheek. It is one of the medium-sized national libraries in Europe and was founded in 1798. Our annual budget for 2004, which we receive from the Minister of Education, Culture and Science, is approximately € 37 million, and our staff currently comprises around 260 full-time equivalents. Of course we have a mission statement, and it states that we should ‘give everyone access to the knowledge and culture of the past and the present by providing high-quality services for research, study and cultural experiences’.

The core business of every national library is the deposit task for publications. We have to acquire, register, preserve and give access to every printed publication that is published in the Netherlands. In most countries it is a legal obligation for publishers to deliver one or more copies to the national library. In our country we have voluntary agreements with publishers and publishers’ organisations. In addition to our deposit collections (from 1974 onwards) we have of course valuable special collections of old and rare books, manuscripts, letters, etc. All of these can be used for research in our reading rooms, which provide space for 500 customers, but increasingly materials can be found virtually through the numerous services we offer on our website www.kb.nl.

History e-Depot

About ten years ago we realised that the publication market started moving towards electronic publications. We knew that we had to adapt our policies and our processes, and in 1994 we decided to include electronic publications in our deposit collection. We considered this as a logical extension of the deposit for printed publications already in place. At the time we were quite alone in doing so.

With this extension of our tasks we were confronted with the dilemma of electronic media: their short life expectancy. Printed information can be accessed directly and easily, provided you have obtained a physical copy. However, to access digital information you always need an intermediary instrument. You always need a computer, consisting of hardware and software that change rapidly over time. This basic fact, the rapid change of formats, software and hardware, is the principal enemy of permanent accessibility.

Since 1994 research and development on long-term digital archiving has been a top priority for the KB. On the basis of the results of experimental pilot systems we worked in close collaboration with IBM to develop an operational electronic deposit system. In December 2002 IBM delivered the system, which is also available on the market under the commercial name DIAS, the Digital Information Archiving System. And in March 2003 our so-called e-Depot, or electronic Depot, became fully operational.

Along the way we did a lot of research. A crucial step forward was the NEDLIB project, a European project co-funded by the European Commission, initiated and managed by the KB. In the project eight national libraries and three international publishers participated. NEDLIB defined the general architecture for an electronic deposit system, on the basis of OAIS, the Open Archival Information System Reference Model. This reference model served as a blueprint for our e-Depot.

The primary goal of our e-Depot is future access. The system is dedicated specifically to long term preservation and safeguarding long term accessibility. This makes the system quite different from usual storage facilities. We could not purchase it "off the shelf". Furthermore the system is dedicated to protecting the authenticity and integrity of its content. In order to reach these goals, we have committed ourselves to the permanent development of an ever-changing preservation and accessibility toolbox. Because one thing is certain: the rapid change of formats, software and hardware is no temporary phenomenon.

Agreements with publishers

What about the content of this storage system? Electronic publications are deposited at the KB on the basis of two types of agreements. In the first place there is a general agreement with the Dutch Publishers Association. It is similar to the agreement we already had for printed publications. Apart from this we make individual, so-called "archiving agreements" with international publishers of scholarly journals. Currently there are six archiving agreements in place.

The first two publishers were Elsevier and Kluwer. Both Elsevier and Kluwer, although evidently international publishers, are of Dutch origin. Their headquarters are in The Netherlands, at least until now. They represent a long and impressive history of publishing in my country. The third publisher we entered into an agreement with was BioMed Central. This contract signified a major step in two ways. Firstly it underlined the international role of our national deposit system: BioMed Central has no Dutch origin. Furthermore it was established as an "open access" publisher right from the start. This also was new to us. For these two reasons, the BioMed Central agreement represented a major strategic step. I will elaborate on this further on. In recent months these contracts were followed by agreements with Blackwell Publishing, Oxford University Press and most recently the Taylor & Francis Group.

There is a minimum set of conditions to be fulfilled if we are to enter into an archiving agreement. Publishers must deposit their publications free of charge. On the other hand, we have to accept restrictions on access. There is however a bottom line. The minimum we require is on-site access for any registered user and availability for interlibrary document supply within The Netherlands. These conditions are similar to those applied to deposited printed material. Of course on-site access is a stone-age way of doing things in the digital world. We are optimistic about the possibilities of introducing remote access as well, albeit on a limited basis. Last but not least, our archive serves as a guarantee to all licensees all over the globe. In case of calamities, or in case the publisher does not meet its obligations, we safeguard the access that licensees have paid for.

Finally some figures. The system is able to load up to 50,000 articles a day. And we expect it to hold 4 million articles at the end of this year.

This in short is the story of our e-Depot.


It takes a lot of energy, creativity and resources. Why are we doing this? As a national library, it is our task to collect and preserve the printed publications in our country. As I said, the e-Depot is a logical extension of this task into the digital world. Yet, there is more to it.

In the printed paper world deposit collections are built on a national and geographical basis. It's a world-wide system, providing a clear assignment of responsibilities. It is based on global arrangements supported by IFLA and UNESCO. But can this model be maintained in the digital era?

Especially in relation to international scholarly publications we think an alternative model might emerge, all the more so because today's international journals no longer have a fatherland that can be identified easily. This brings me to the issue of permanent access strategies in the digital world.

Strategies for permanent access

What makes the digital world different? To identify the differences we must look into the basic characteristics of digital objects.

Digital objects are omnipresent. A single copy is sufficient to serve all readers world-wide. Omnipresence apparently is a mixed blessing. On the one hand, it makes life easier, both for publishers and libraries. On the other hand, it makes them feel uneasy about potential consequences. It makes publishers anxious about unauthorised use. It makes libraries worry about future access, and therefore they demand guarantees for so-called 'perpetual access'. Omnipresence paradoxically leads to anxiety about future presence.

Digital objects are volatile. They can be changed and adapted easily. That is a serious weakness in the case of scholarly publications, where adaptation comes down to manipulation. Safeguarding authenticity and integrity is a prominent component of the archiving challenge.


Digital objects are extremely perishable. If there are floppy discs still lingering in your office, you should hope they don't contain essential information. The floppy will not fit into the machine on your desk. This is the preservation problem in a nutshell.

Digital objects are also very fertile. The volume of digital publications is growing very rapidly. This, however, does not necessarily pose a major problem. Experience indicates that there are considerable economies of scale. We should exploit them to counter rapid growth of the volume.

I will dwell a bit longer on the issue of this extremely short lifespan. The problem is twofold: deterioration of the storage medium and obsolescence of hardware and software. What can be done about this?

The key concepts are: refreshing, migration and emulation. Beware however, because there is confusion about terminology. What I call refreshing is sometimes called migration. To make things worse, what I call migration is sometimes known as conversion. So I will try to define my terminology clearly.

Refreshing means transferring the bits and bytes to a fresh physical storage medium. This is the least part of the problem. As to the format and the rendering tool there are several strategies you might follow. The most widespread method is migration to a new format, but you then have to accept as a drawback that some information might get lost on the way. The alternative is emulation: instructing a new rendering tool to behave like an obsolete one. Experiments done jointly by the Koninklijke Bibliotheek and the RAND Corporation indicated that this is a viable technique. The method however is labour intensive and therefore costly. Currently a new method is being developed together with IBM: the Universal Virtual Computer. This method is based on a combination of migration and emulation.
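To make the first of these concepts concrete, here is a minimal sketch of refreshing in the sense used in this talk: the bits are copied to a fresh storage location and a checksum confirms that they arrived unchanged. The function names `sha256_of` and `refresh`, the file names and the choice of SHA-256 are assumptions made for illustration only; the sketch does not describe how the e-Depot or DIAS actually works.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy the bits to a fresh location and verify that nothing changed.

    Hypothetical helper: raises ValueError if the copy differs from the source.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise ValueError(f"refresh of {source} failed: digests differ")
    return after


if __name__ == "__main__":
    # Illustrative run with made-up directories standing in for two media.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_medium" / "article.pdf"
        old.parent.mkdir()
        old.write_bytes(b"%PDF-1.4 example bytes")
        new = Path(tmp) / "new_medium" / "article.pdf"
        print(refresh(old, new))
```

Note what the sketch does not do: it keeps the bits intact but says nothing about whether a future machine can still render them. That part of the problem is what migration and emulation address.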

Whatever strategy you choose to follow, it will always imply repeated actions. That's a certainty. There is uncertainty however as to what your actions exactly will have to be, because we don't know what future technology will be like. Therefore a permanent R&D effort is needed.


From these characteristics and fundamental facts, we can derive the requirements for a permanent archive.

Requirements for permanent archives

The first one looks a bit self-evident: permanent archives presuppose permanent commitment. Self-evident or not, it is a fundamental requirement. A permanent archive should provide reasonable guarantee for continuity.

Permanent archiving takes substantial resources, both organisational and technical ones.

Moreover, sustained R&D efforts are required. Technology will keep changing. Whenever new platforms or new formats emerge, you will have to prepare for counter attack. Again and again you will have to devise the means for maintaining accessibility. It's a never-ending story to keep your ever-changing toolbox up-to-date.

The good news is that there are also considerable economies of scale. The fixed costs of a permanent archiving system are substantial. But once your system is working well, you can expand the storage capacity relatively easily, and the costs per unit will go down.
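The argument can be made explicit with a simple cost model. Assume, purely for illustration, a fixed cost $F$ for the archiving system and a constant additional cost $v$ for each stored article; neither the symbols nor the linear form are taken from our own cost figures. The cost per article for $n$ articles is then:

```latex
c(n) = \frac{F + v\,n}{n} = \frac{F}{n} + v,
\qquad
\frac{\mathrm{d}c}{\mathrm{d}n} = -\frac{F}{n^{2}} < 0 .
```

Under these assumptions the cost per article falls as the archive grows and approaches $v$, which is why a large fixed cost favours a small number of large archives.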

These requirements are our beacons in a permanently evolving and uncertain environment. The future may prove that we are wrong, but this is the best we can offer now. These are the insights on which the strategy of the KB is based.

From the requirements one can draw two basic conclusions. First, they tend to narrow down the number of possible candidates for permanent archiving. Candidates should have the resources and the will to engage in a major long-term commitment. Long term preservation should feature prominently among their strategic goals and be part of their mission. Permanent archiving cannot be a sideline activity or by-product.


The second conclusion is related to the economies of scale. It's an economic law that economies of scale inevitably result in a degree of concentration. Exploiting economies of scale therefore calls for co-operative efforts. R&D efforts should also be shared. It wouldn't make sense for each research or university library to try to establish its own permanent archiving system. In the case of international scholarly journals a handful of permanent archives, wisely spread around the globe, might suffice.

It appears to me that three strategies for permanent access are currently emerging. Yet, each of these strategies is still, one could say, in its pilot phase. I will describe them as:

• the Safe Place Strategy

• the LOCKSS Strategy

• the Institutional Repositories Strategy.

In all three of them storage of digital publications is a core issue, but they differ in how they emphasise long-term preservation. I will argue that only the first one, the Safe Place Strategy, makes long-term preservation its primary goal. This being said, you will not be surprised to hear that we adhere to the first model, the Safe Place Strategy.

The Safe Place Strategy is directly derived from the requirements I stated earlier. From these requirements it follows that permanent archiving should be taken care of by a limited number of institutions, dedicated to this task. Permanent archiving should be prominent in their mission. The model clearly draws its inspiration from the deposit system in the printed paper world. In this view national libraries are natural candidates for permanent archiving. This has been their mission all along. Other institutions may also qualify, however, provided they meet the requirements and provided they are willing to take part in the global arrangements that are needed. Large library co-operatives could be an example of such institutions.

The next model is quite the opposite of the Safe Place Strategy. Instead of relying on a number of dedicated institutions, it seeks safety in large numbers. I named this strategy after the LOCKSS initiative, a co-operative venture supported by the Mellon Foundation. The least you can say of it is that the acronym is ingenious: Lots of Copies Keep Stuff Safe. In order to safeguard future access, libraries should request their own copy of digital publications to be stored in their own electronic stack room. The more libraries do so, the better the chances are that future availability can be guaranteed. There is some guarantee in large numbers. Those libraries will not all burn down on the same day. (If they do, we will probably have lots of other problems to worry about.) It's an elegant strategy. It has the attractiveness of any decentralised model, guaranteeing the common good as the outcome of free decisions made by autonomous agents. There is however a serious drawback. Long-term preservation implies permanent development and application of a preservation toolbox. As far as I can see, this is missing in the model. LOCKSS primarily responds to the anxiety of librarians about dependency on publishers for future access. It neglects the intricacies of long-term preservation technology.

The Institutional Repositories Strategy is closely related to the Open Archives Initiative. Its primary goals are not in the realm of permanent archiving. To begin with, academic institutions want to display the intellectual output of their faculty. That's only a natural and legitimate ambition. Yet, for the advocates of this strategy there is more at stake. They claim a role in the dissemination process of scholarly information. However, the Institutional Repositories Strategy tends to underestimate or to neglect the requirements for permanent archiving. Safeguarding future accessibility is no by-product that automatically derives from establishing repositories. Universities indeed should take their responsibility for future access. Perhaps in doing so, they might rely a bit more on co-operation with Safe Places as hosts and guardians of their intellectual output.

Concluding remarks

Finally, when we look into the near future, actions and decisions are needed in at least three areas.

• The development of global arrangements

As I stated earlier, in the printed paper world there is a clear allotment of responsibilities. Similarly, we will have to develop global arrangements for the deposit, permanent archiving and permanent accessibility of digital publications. At an early stage the Koninklijke Bibliotheek chose to establish a permanent archive for electronic publications. I think I may say that ever since we have been at the forefront of developments. We see it as a logical extension of the deposit of printed publications. It’s our ambition to be part of this global network of trusted repositories or safe places. We are prepared to expand our international task, for two reasons: because the national geographical model no longer holds for international scholarly publications, and because we want to make full use of economies of scale.

Within this context, only last week we organised an international conference on permanent access to the records of science within the framework of the Dutch presidency of the European Union. The conference was attended by some 150 representatives from all sectors involved: science, publishers, the library world and the IT sector. At the conference a resolution was adopted that stressed the urgency of the challenge of long-term preservation and permanent access. The meeting urged the Koninklijke Bibliotheek as host of the conference to take the initiative to form a task force of representatives of the different sectors involved. This task force will be asked to define a research agenda and to develop scenarios for a European networked infrastructure for long-term preservation and permanent access. The results of the task force will be presented to the European Commission with a view to the coming seventh Framework Programme for Research. So, also within the European arena, we take initiatives and try to stay at the forefront of developments.

• The need for ongoing R&D

As I argued, a sustained commitment to the development of preservation and permanent access techniques is fundamental. What is called for now, is a shared R&D effort in long-term preservation.

• The need for a business model to recover the costs of archiving

Finally, we need to explore business models that help to recover the costs of archiving. Indirectly my government is now funding an international task. There's nothing wrong with that. In the long run however, we will have to find a balanced solution for cost recovery.
