Oct 9, 2011 - http://www.informationr.net/tdw/publ/papers/ASIST2011.html
01/02/2017
Preservation: the final frontier?
Let me begin with a fairy tale from the future, told to children in the year 3,511. It begins in the traditional style. Once upon a time, many, many years ago, it was normal to keep records of what we had done on paper (laughter from the children) and people became worried that the paper might decay or be burnt or suffer from floods and so they decided to transfer their records to machines called 'computers' (puzzled noises from the group). Ssh, children: I'll explain. These 'computers' were not like the intelligent robots we have today, although they were a kind of ancestor. To begin with, they had rather limited memories, were unable to speak and could only communicate by sending electronic messages to one another, which, of course, we humans could not directly understand. Also, most of the time, the messages had to be sent along 'wires' - thin strands of metal - or glass (more laughter from the children). As you can imagine, much of what was recorded has now been lost, although, because scholars use earlier work to build on, much of what was useful has actually survived, even if in many cases we don't know who created the original ideas. There are various, perhaps mythical, figures in the mists of time with names like Copernicus and Newton and Einstein and Darwin, which you may have seen in children's books, but we can't be sure that such persons ever existed - these may be the names of research projects or machines. It must have been difficult then, before our quantum memory packs were developed, which are constantly updated and which we access directly from our brains, and I think that now, what we know will be preserved for ever... because everyone knows everything!
I shall leave it to your imagination to decide what kind of fairy tale will be told in the year 6,511!
Introduction
What we know of the past is largely a matter of accident; no previous society has managed deliberately to preserve the record of its culture. What has survived is a small proportion of what existed and we have only clues to the remainder. For example, we have complete plays (but not all plays) by the three tragedians, Aeschylus, Sophocles and Euripides, and the comedies of Aristophanes - but there are references to more than 100 other playwrights, of whom we know next to nothing.
Many of you will know Shelley's poem Ozymandias, which tells of the discovery of a statue in the desert bearing the name of the king - otherwise known as Rameses the Great. The poem concludes:
And on the pedestal these words appear:
'My name is Ozymandias, King of Kings:
Look on my works, ye mighty, and despair!'
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare,
The lone and level sands stretch far away. (Heaney and Hughes, 1982: 333)
Those 'lone and level sands' have given up other memories of the pharaohs: things that were designed to be hidden, to be viewed only by the gods and the spirit of the departed. The tomb paintings of Tutankhamun survive not because they were designed for future generations to see, but because they were hidden.
The interesting thing about the survival of the records of ancient cultures is that cave paintings and stone carvings appear to be excellent preservation media! The paintings of Lascaux are estimated to be more than 17,000 years old, while paintings more than 30,000 years old have been discovered elsewhere.
The Rosetta stone, which was erected 2,207 years ago and was discovered in the course of Napoleon's expedition to Egypt, is of interest for more than the decree it reports, which seeks to establish the divinity of the new Pharaoh, known to us as Ptolemy V. The inscription ends by stating:
and the decree should be written on a stela of hard stone, in sacred writing, document writing, and Greek writing, and it should be set up in the first-class temples, the second-class temples and the third-class temples, next to the statue of the King, living forever (British Museum, n.d.)
The three writings were hieroglyphics, the demotic script of the language, and ancient Greek. And it was the last of these that enabled the Stone to act as a translation key - ancient Greek was known to scholars and it was possible to use this knowledge to unravel the mystery of hieroglyphics. Papyrus records have also survived from at least 2,600 BC and clay tablets found in Romania date to 5,300 BC, although most of those that survive are from 1,000 years later in Sumeria.
Wood is also good, under the right circumstances.
This is a stretch of Hadrian's Wall in Northumberland and when the legions were patrolling the Wall about 2,000 years ago they wrote letters, lists, supply orders and other things on wooden tablets, more than 700 of which have so far been found, preserved in the mud of the Roman fort of Vindolanda. The Roman cursive script has been recovered by infra-red photography and the tablet shown here is part of an invitation from Claudia Severa to Sulpicia Lepidina to attend her birthday celebration. The text is written by a scribe - identified by his writing on other tablets - but the ending is written in Claudia's own hand and is believed to be the only surviving example of the handwriting of a Roman woman from anywhere in the Roman Empire, indicating, of course, that she was an educated woman (British Museum, n.d.).
And then there's paper. Invented in China in about the 2nd century AD, paper spread through Samarkand to Baghdad and thence into Europe by the 13th century. Artistic paper cut-outs from China have survived for 1500 years (Chinese..., 2011), so we know that paper can survive for at least this long, if it is made from the right materials and maintained in conditions of appropriate temperature and humidity.
There are several points to summarise here: first, physical materials can have very long lifetimes, if the conditions are right - up to tens of thousands of years for cave paintings and, we can assume, for clay tablets and stone inscriptions; secondly, preservation is of little value if what is preserved cannot be interpreted - the Rosetta stone was a stroke of luck in this respect, since it provided the Greek key to the hieroglyphic content (and we have not been so lucky with other lost languages); finally, we can see that the biggest threats to preservation come from natural calamities such as the eruption of Vesuvius, wars and invasions, and the collapse of societies - we would all be speaking some variety of Latin if the Roman Empire had lasted for another thousand years!
Digital preservation
When we turn to the present, the notion of digital preservation clearly arises in the context of seeking to preserve not simply for our own use in the future, although that is part of it, but for an indeterminate future. At this point, we need to be clear that preservation is not the same as archiving: however well protected, physical archives are potentially subject to decay, destruction or corruption and preservation suggests that steps are taken to try to prevent these happening. The example of hieroglyphics and the Rosetta stone also raises the issue of the intelligibility of the preserved material.
Clearly, the digital record is also subject to the possibility of decay of the physical material upon which it is recorded and our understanding of how durable different digital media may be is fairly limited. We know that magnetic tape has a relatively short life-span; perhaps as short as ten years for some kinds, although archive quality might last for a hundred years. It is estimated that CDs and DVDs will also last for a hundred years if of appropriate quality and stored properly. But these media have not been in existence for long enough for us to be certain, whereas we can be sure that acid-free paper will last for 500 years; and, given those 1500-year-old Chinese cut-outs, probably longer. Apart from the natural decline of the medium, digital storage is potentially subject to the same problems as physical media: destruction as a result of natural calamities. When we look at the intelligibility issue the problem may be even more difficult than discovering the equivalent of the Rosetta stone for lost languages.
There are two aspects to the problem: first, obsolete file formats may prove unreadable with the latest version of the relevant software. My word-processor, NeoOffice, offers me nineteen different formats in which to save files, and Photoshop also offers me nineteen different formats in which to save my pictures. How many more formats there are, I cannot imagine - the Wikipedia article lists more than eighty image formats; how many more are obsolete, I can only guess. And let's not go into video and film formats.
Intelligibility has another dimension, however: what if we can read the files technically, but the language itself is obsolete? After all, how many English speakers can read Anglo-Saxon? And that was only a thousand years ago. If we manage to preserve our records for that long, what are the chances that what we call modern English - or French, or German, or Russian - will be intelligible to those who then try to decipher those texts? Language changes, forms of expression become obsolete, the referents disappear from use and the words lose any meaning at all. The slide shows names of occupations that were common a hundred years ago: who knows them now? [Farmer, Baker, Prepared clay balls for potter, Licensed pauper, Water diviner, Inn keeper, Corn merchant, Weaver, Hairdresser, Carpenter, Sheep stealer, Hedge layer, Thief, Shoe maker] (Hall, n.d.).
To take a more recent example, 3D television was known as stereoscopic television when the first demonstrations were shown in August 1928 (Tiltman, 1928) in the Baird television laboratories in London, and when Philips was experimenting about 30 years ago, it was still known by that name. Even modest changes in the language can result in a failure to find what we want, because we do not know the words that were used to describe the phenomenon in the past.
Imagine a couple of thousand years from now: Mother Nature has finally taken her revenge on Man and wiped out the human race.
A spaceship from a distant civilization lands near one of Google's server farms and is able by some means or other to fire up the systems. How on earth do they make sense of what they find? Perhaps, 3000 years from now, it won't be aliens but humans discovering a lost civilization: would they be able to 'read' the data any more easily than the aliens? What kind of Rosetta stone would they need to decode the files? Genuine preservation of the digital record means taking note of: the software employed in the creation of the object; the hardware employed; the record format or formats (e.g., html files with embedded video files); the hardware designed for displaying the file; the software for displaying the file; the standards observed by all of these; and, of course, the physical security of the storage medium and hardware.
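The items listed above could, as one possible illustration, be captured in a small structured record kept alongside each preserved object. The sketch below is only that: the class name, field names and sample values are hypothetical and do not follow any particular preservation metadata standard.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PreservationRecord:
    """Hypothetical description of one preserved digital object."""
    object_id: str
    creating_software: str
    creating_hardware: str
    record_formats: List[str]
    display_hardware: str
    display_software: str
    standards: List[str] = field(default_factory=list)
    storage_location: str = "unknown"

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty or marked 'unknown'."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], "unknown")]


# Invented sample values, used only to show the shape of the record.
record = PreservationRecord(
    object_id="example-0001",
    creating_software="hypothetical word processor 3.2",
    creating_hardware="hypothetical desktop computer",
    record_formats=["html", "embedded video"],
    display_hardware="",
    display_software="hypothetical web browser 5",
)

# Fields that still need to be documented before the object is stored.
print(record.missing_fields())
```

A record of this kind does nothing to keep the object readable by itself; it only makes explicit which of the dependencies have been documented and which have not.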
The aim of preservation is then to ensure indefinite, long-term access and use of the digital object regardless of changes in hardware and software technologies or file formats. We know what happens when we get this wrong. You may have heard of the BBC's Domesday project, the idea of which was to replicate, after a fashion, the Domesday Book created after the Norman conquest of 1066. Rather than recording all taxable
properties - down to the merest chicken - the idea was that schoolchildren and community groups should record the current state of their towns and villages. Ultimately, almost 148,000 pages of text and more than 23,000 photographs were recorded on what was the latest technological innovation - video discs. This was in 1986, at the beginning of the PC revolution, and this was just about the only technology that could cope with the material. However, video discs were not a commercial success and, as a result of their withdrawal from the market, the material was essentially lost. It took various projects, from 1999 to this year, the 25th anniversary of the project, to extract the material from the original video discs and make part of it available on the Domesday Reloaded Website. Those interested in the techniques can find an interesting account on another site devoted to the project, where the site author, Andy Finney, argues that the only way to avoid digital data loss is for data to be regularly copied on to whatever is the current storage medium of choice:
This isn't only a digital problem. Vast archives of valuable television programmes stored on obsolete videotape formats… have been and are being copied on to current tape formats and even on to servers... It's not only Domesday's videodiscs that are obsolete: the videotape machines the masters were recorded on have also gone the way of the Betamax.
Migration is not new - papyrus rolls decayed and had to be re-copied and, when a new medium, parchment, came along, they were copied to parchment; in ancient China, when bamboo strips decayed, the texts had to be copied on to new strips and then, when it was invented, on to paper. Finney suggests that migration is the only way to prevent data loss, but serial migration of a text or image from one format to another could, in fact, result in data loss through corruption of the file in the process or through inaccurate copying. A further problem lies in the interactivity built into many Web pages today: how do we ensure that the interactivity is still possible in the migrated copy? The alternative to migration is emulation, the process whereby computer systems emulate the original hardware and software combinations to access the preserved material. It is generally agreed that, for this to happen, it will be necessary to preserve not only the original bit-stream but sufficient metadata to define the nature of the file, the software and hardware used to produce the file, and the file format, or, in the case of multimedia files, formats. Again, ensuring 100% accuracy in emulation, that is, presenting the file exactly as if it were being read by the original software, is not going to be easy.
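The risk of inaccurate copying mentioned above can at least be detected for the simplest case, a bit-for-bit copy from one medium to another, by comparing checksums before and after the copy. The following is a minimal sketch of that idea, not a method described in this paper; the function names and the sample file names are hypothetical. It does not address loss caused by converting from one format to another, where the bytes are expected to change.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity_check(source: Path, destination: Path) -> str:
    """Copy a file and raise an error if the copy differs from the source."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Copy of {source} is corrupt: checksums differ")
    return after


# Demonstration with a throwaway file standing in for an at-risk medium.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "old_medium.txt"
    original.write_bytes(b"Domesday, 1986")
    migrated = Path(workdir) / "new_medium.txt"
    checksum = copy_with_fixity_check(original, migrated)
    print(checksum)
```

Storing the checksum with the object would allow the same comparison to be repeated at each later migration.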
What should be preserved?
The period of time for which material needs to be preserved also brings to mind the problem of what should be preserved. It seems that the unspoken assumption underlying many projects is that everything should be preserved, but is this the case? We get by quite happily knowing that vast swathes of the records of previous cultures have been lost; presumably future generations will get on with their lives in an equally satisfactory manner. Take Victorian literature: we've all heard of Dickens, Trollope, Thackeray, and George Eliot, but who recalls Harriet Smythies, Rhoda Broughton, or George Mogridge and dozens more who never made it into the critical consciousness?
As far as digitally-recorded documents and records are concerned, three years ago IDC produced a report on the 'exploding digital universe' (that is, 'information that is either created, captured, or replicated in digital form') which forecast that by this year, 2011, the digital universe would be 10 times the size it was in 2006 - 1,800 exabytes - one thousand eight hundred billion gigabytes. The number of digital bits involved is now bigger than the number of stars in the universe and the report estimated that we shall soon (i.e., by 2026) reach Avogadro's number - the number of carbon atoms in 12 grams of carbon-12 - or 602,200,000,000,000,000,000,000, or 6.022 x 10^23. It may be puzzling as to how this is
happening, but when you consider that every RAW image shot by my 18 megapixel Canon camera measures between 40 and 45 MB, we can have some idea of how digital photography alone is contributing to that number (IDC, 2008). In 2008 we had already reached the point at which more digital material was being produced than it was possible to find permanent storage for. Much of what is produced is transient, of course, but that still leaves decisions about what we can actually afford to preserve.
What criteria might we evolve to determine what should be preserved? The historians and cultural anthropologists and sociologists would say, 'Preserve everything' - but much is transient and is lost almost as soon as it is produced, and so is not available to be preserved in any event. Much of recorded business activity is kept for varying periods of time depending upon the legal requirements, and the same is increasingly true of government business. So we can't preserve everything, because not everything is captured. Generally we think of future value and cultural or historical interest as the primary criteria for preservation, but these suffer from the same problem: the impossibility of knowing what future generations may find of value or interest. We can only use our ideas of what is of value and interest to us in making these decisions and the difficulty here lies in a) getting agreement on what is presently valuable and interesting; and b) finding the time to implement the criteria we evolve. My guess is that what digital material gets preserved in the very distant future will continue to be as much a matter of accident as what has been preserved from the past.
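The figures quoted in this section from the IDC report can be checked with a few lines of arithmetic. The sketch below assumes decimal units (one exabyte taken as 10^18 bytes, one gigabyte as 10^9 bytes) and reads the camera figure as 45 megabytes per image; both are assumptions made for illustration, and the variable names are invented.

```python
# Assumed decimal (SI) unit sizes, in bytes.
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18

# Avogadro's number as quoted in the text.
AVOGADRO = 6.022e23

# Forecast size of the digital universe for 2011, from the text.
universe_exabytes = 1_800
universe_bytes = universe_exabytes * BYTES_PER_EXABYTE
universe_gigabytes = universe_bytes // BYTES_PER_GIGABYTE
universe_bits = universe_bytes * 8

# How much further the bit count must grow to reach Avogadro's number.
growth_needed = AVOGADRO / universe_bits

# Number of 45 MB RAW images that would occupy the same space.
raw_image_bytes = 45 * BYTES_PER_MEGABYTE
images_equivalent = universe_bytes // raw_image_bytes

print(f"{universe_gigabytes:,} gigabytes")
print(f"{universe_bits:.3e} bits")
print(f"growth factor to Avogadro's number: {growth_needed:.1f}")
print(f"{images_equivalent:,} RAW images")
```

On these assumptions, 1,800 exabytes is indeed one thousand eight hundred billion gigabytes, and the bit count would have to grow by a factor of roughly forty to reach Avogadro's number.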
Conclusion
We live with a paradox: we wish to preserve the record of our culture for the future, but we do not know the nature of that future and we do not know what will be valued in the future. Previous societies have generally stored cultural objects for their own future use; the assumption has been that things will continue as they are. But they never do continue as they are. Empires decline and fall, societies implode, languages and their associated cultures die - the only constant is change, a phenomenon known at least since Heraclitus pronounced upon it two and a half thousand years ago. Should we, then, give up on the idea of preserving for the future? I think not: the problem we face is that the relevant technologies now change with amazing speed. The whole of computer development from the first programmable, electronic computer, the Colossus machine used to break the Lorenz cipher, to the iPad is contained within the life-time of some of us here today. I have worked with the mainframe, the minicomputer, the microcomputer, several models of PDA, the laptop, the tablet computer, the mobile phone, different forms of computer input devices, storage devices and output modes. I've even worked in an organization that had an analogue computer as well as one of the first commercially produced digital computers. Is it likely, then, that the technology will remain as it is now over the next fifty, sixty or eighty years? In other words, we need preservation technologies to ensure that what is produced now is useable in the immediate future, rather than 2,000 years from now.
Given the complexity of the problem, it seems unlikely that one single strategy for preservation will be sufficient and I think it is likely that multiple strategies will be employed in the future: simple transfer of files from a medium that is at risk to a new copy of the same medium (DVD to new DVD, for example); migration from one medium and/or format to another - hoping, in the case of formats, that all characteristics of the file will be retained; and emulation. There is, perhaps, an emerging realisation that this is inevitable: you may have seen accounts of the action taken by the founder of the Internet Archive, Brewster Kahle, to back up the digital versions of books by creating a physical archive of books (Internet... 2011) after becoming concerned about the number of books being pulped after digitization. It's ironic, isn't it, that a digitizing agency should start collecting books in case the digitized record is destroyed!
Perhaps not so ironic: one of the assumptions of the very idea of digital preservation is that our technology-based civilization will endure. But the technology requires energy and the dominant fuel for energy production is oil. The US Geological Survey estimated in 2004 that oil production will peak in 2026 to 2047, depending upon various circumstances, and decline thereafter, running out completely about the end of the century (Wood et al., 2004). This is an optimistic estimate and others are more pessimistic: ASPO (the Association for the Study of Peak Oil and Gas) believed that the peak would be reached last year; Deutsche Bank, in three years from now; and the International Energy Agency, that it will happen in 2030 (Endoil.org, n.d.). Whatever the accuracy of estimates, oil production will peak at some point this century - and then how will our energy-greedy technological civilization carry on? We always assume that human ingenuity will overcome all of the problems we face, but the "what if" scenarios are not comfortable.
No matter what we do, we can never guarantee the persistence of our culture over time; we can only make our best efforts to see to it that the record endures - intelligibly. And, perhaps, if there are texts and representations of things that we wish to see endure, believing them to be either of potential value in the future or a necessary record of the present state of our culture, we might follow Brewster Kahle and ensure that we have enough paper, or perhaps start employing stone masons.
References
British Museum. (n.d.). The Rosetta stone: translation of the demotic text. Retrieved 6 October, 2011 from http://www.britishmuseum.org/explore/highlights/article_index/r/the_rosetta_stone_translation.aspx (Archived by WebCite® at http://www.webcitation.org/62hX5EhIA)
British Museum. (n.d.). Writing-tablet with a letter inviting Sulpicia Lepidina, the commander's wife, to a birthday party. Retrieved 6 October, 2011 from http://www.britishmuseum.org/explore/highlights/highlight_objects/pe_prb/w/tablet_with_a_party_invitation.aspx (Archived by WebCite® at http://www.webcitation.org/62hWxIFcu)
Chinese paper cutting. (2011). Wikipedia. Retrieved 6 October, 2011 from http://en.wikipedia.org/wiki/Chinese_paper_cutting (Archived by WebCite® at http://www.webcitation.org/62hXBqGp9)
Endoil.org. (n.d.). Peak oil. Retrieved 6 October, 2011 from http://www.endoil.org/site/c.ddJGKNNnFmG/b.4090057/k.D193/Peak_Oil.htm (Archived by WebCite® at http://www.webcitation.org/62hXHJ5cb)
Hall, R. (n.d.). Old occupation names. Retrieved 11 September, 2011 from http://rmhh.co.uk/occup/a.html (Archived by WebCite® at http://www.webcitation.org/62hXOrtRb)
Heaney, S. and Hughes, T. (eds.) (1982). The rattle bag. London: Faber and Faber.
IDC. (2008). The diverse and exploding digital universe: an updated forecast of worldwide information growth through 2011. Framingham, MA: IDC. Retrieved 6 October, 2011 from http://www.emc.com/collateral/analystreports/expanding-digital-idc-white-paper.pdf (Archived by WebCite® at http://www.webcitation.org/62hXfv8b4)
Internet Archive founder turns to new information storage device - the book. (2011, August 1). Guardian. Retrieved 11 September, 2011 from http://www.guardian.co.uk/books/2011/aug/01/internet-archive-booksbrewster-kahle (Archived by WebCite® at http://www.webcitation.org/62hXksbwG)
Tiltman, R.F. (1928, November). How "stereoscopic" television is shown. Radio News. Retrieved 6 October, 2011 from http://www.bairdtelevision.com/stereo.html (Archived by WebCite® at http://www.webcitation.org/62hXrly5J)
Wood, J.H., Long, G.R. and Morehouse, D.F. (2004). Long-term world oil supply scenarios: the future is neither as bleak or rosy as some assert. Retrieved 6 October, 2011 from http://www.eia.gov/pub/oil_gas/petroleum/feature_articles/2004/worldoilsupply/oilsupply04.html (Archived by WebCite® at http://www.webcitation.org/62hY4UFt1)
How to cite this paper
Wilson, T.D. (2011). Preservation: the final frontier? Keynote paper presented at the ASIST Annual Meeting, New Orleans, 9th October, 2011.