Report on Automated re-Appraisal

DELOS Deliverable 6.10.1

Project no. 507618
DELOS
A Network of Excellence on Digital Libraries
Instrument: Network of Excellence
Thematic Priority: IST-2002-2.3.1.12 Technology-enhanced Learning and Access to Cultural Heritage

Deliverable 6.10.1: Report on Automated re-Appraisal: Managing Archives in Digital Libraries

Due date of deliverable:
Actual submission date: 30 January 2008
Start Date of Project: 01 January 2004
Duration: 48 Months
Organisation Name of Lead Contractor for this Deliverable: University of Glasgow
Final
Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006)
Dissemination Level: PU (Public)


Catalogue Entry

Title: Report on Automated re-Appraisal: Managing Archives in Digital Libraries
Creator: Gillian Oliver
Creator: Seamus Ross
Creator: Maria Guercio
Creator: Cristina Pala
Subject: Appraisal; Digital Records
Description: Report on methodologies for determining the significance of digital information objects, with recommendations for automation of the appraisal function.
Publisher: DELOS NoE for HATII at the University of Glasgow
Contributor: Milena Dobreva
Date:
ISBN: 2-912335-40-X
Type: Text
Format:
Language: English
Rights: © HATII at the University of Glasgow

Citation Guideline:

Oliver, G., Ross, S., Guercio, M., Pala, C.: Report on Automated re-Appraisal: Managing Archives in Digital Libraries (Glasgow: DELOS NoE, January 2008).


Contents

Executive Summary
1.0 Introduction
2.0 Key Concepts and Definition of Terms
  2.1 Determining Significance
  2.2 Information Management
  2.3 Item-Level Appraisal
3.0 International Standards
  3.1 ISO 14721 OAIS Reference Model
  3.2 ISO 15489 Records Management
  3.3 ISO 23081 Metadata for Records
4.0 Current Practices
  4.1 Information for Awareness/Entertainment
  4.2 Information for Accountability
    4.2.1 Current Appraisal Strategies
    4.2.2 The Process
5.0 Current Research
  5.1 InterPARES
  5.2 Paradigm
  5.3 Metadata
  5.3 Appraisal in a Digital World
6.0 Issues
  6.1 Is it Necessary to Determine the Significance of Digital Information?
  6.2 Is the Process of Determining Significance Fundamentally Flawed?
7.0 Principles, Requirements and Criteria for the Appraisal of Digital Objects
8.0 Automation
  8.1 Using Metadata and Genres to Determine Significance
    8.1.1 Genres
  8.1 Models of Automation
9.0 Summary of Recommendations
10.0 Conclusions
11.0 References
Appendix 1: Summary of Findings from InterPares
Appendix 2: Source Documents Used to Provide Initial List of Criteria


Tables

Table 1: Specific factors considered in the appraisal of digital information objects
Table 2: Crosswalk showing relationship between appraisal factors and questions that could be answered using metadata values
Table 3: Metadata elements that could be used to assist in appraisal decision making

Figures

Figure 1: Factors influencing the determination of value of information
Figure 2: Differentiation of purpose for the two main communities involved in information management
Figure 3: Potential appraisal points in relation to OAIS Model
Figure 4: Categories used for the grouping of appraisal criteria
Figure 5: Appraisal criteria that comprise each category
Figure 6: Application areas for appraisal criteria documented in Table 1


Executive Summary

This Task investigates how the automatic re-appraisal of digital holdings (resulting in either the disposal or the retention of an object) may be handled effectively in the context of digital libraries. At different times after material has been ingested into a repository (which is what a digital library is) it may be necessary to re-assess whether the ingested material should be retained or disposed of. Given that this activity will often concern a substantial quantity of digital objects, it would be sensible to automate such a re-appraisal process, identifying which objects need to be removed on the basis of pre-defined criteria. This Task started in JPA3, and work on it capitalized on the results of other tasks in the cluster (T6.4, T6.5, T6.6 and T6.7). Our first activity was to examine the appraisal function in digital archives and libraries, in order to identify the basis on which the rules for retention (or disposal) have so far been defined and applied. A significant event contributing to this was the Appraisal in the Digital World Conference held in Rome, November 15-17, 2007. Discussions at this conference, which involved a range of international speakers, informed the findings of this report.

Given the proliferation of digital information of all types and the challenges of preservation, identifying what subset of that information is actually worth keeping is critical. This report investigates the relevance of processes to determine what is ‘worth keeping’ for digital libraries and suggests ways in which technology can be used to automate those processes. The findings are particularly applicable to documents created in uncontrolled environments and to libraries.

Appraisal, the determination of the worth of preserving information, continues to be significant in the digital environment. Furthermore, the concept is applicable beyond the recordkeeping domain in which it originated. A number of strategies have been identified to undertake appraisal, any one of which, or a combination of which, may be appropriate to a specific information community or domain. In considering the automation of the appraisal function in the context of a digital library or archives, it becomes clear that this will include the assessment of individual items. Automation enables a level of granularity that is rarely, if ever, possible with manual appraisal methods, without losing sight of the place of items within the aggregation to which they belong. The results of item-level assessment can inform the overall appraisal determination.


Research underway on metadata extraction, together with the structurational view of genres, shows a great deal of promise for the digital library developer and user communities. In addition, the technological possibilities now present to facilitate input of other voices into the selection of information that has value for communities open up a way forward to a new information age, one that need no longer be exclusively defined by dominant societal forces.

As a result of our analysis of the approaches to appraisal we identified a series of appraisal criteria and structured these so that we can represent them as appraisal rules. Rules are susceptible to representation as active knowledge components. In considering the next steps, this representation suggests three models of automation:

Hybrid: A combination of manual and automated decision making. For instance, application of functional appraisal methodology supplemented by subsequent automated triage to determine the feasibility of preservation at the item level.

Appraisal engine: Where a document is submitted to an appraisal engine for analysis using a combination of text mining and rule-based reasoning.

Profiler: The development of a prototype to review a variety of information object types (image, document, dataset for example) and apply appraisal rules, probably again using rule-based reasoning methodologies.

It is critical, however, that when digital objects such as documents are selected for destruction or retention, the reasons for the decision are recorded. One of the strengths of automation is that it provides this chain of evidence.
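To make the idea of appraisal rules that leave a chain of evidence more concrete, the following is a minimal sketch in Python. It is not the implementation proposed in this report: the class names, the two example rules and the metadata keys (`legal_hold`, `duplicate_of`) are all hypothetical and chosen purely for illustration. It shows rules applied in order to a single item, with every decision, including a referral to manual review, written to a log together with the rule that produced it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Item:
    identifier: str
    metadata: dict


@dataclass
class Rule:
    name: str
    # Returns "retain", "dispose", or None when the rule does not apply.
    evaluate: Callable[[Item], Optional[str]]


@dataclass
class Decision:
    identifier: str
    outcome: str
    reason: str
    decided_at: str


def appraise(item: Item, rules: list, log: list) -> Decision:
    """Apply rules in order; the first rule that applies decides the outcome."""
    outcome, reason = "refer", "no rule applied; referred for manual review"
    for rule in rules:
        result = rule.evaluate(item)
        if result is not None:
            outcome, reason = result, rule.name
            break
    decision = Decision(item.identifier, outcome, reason,
                        datetime.now(timezone.utc).isoformat())
    log.append(decision)  # every decision is recorded, including referrals
    return decision


# Hypothetical rules and metadata keys, for illustration only.
RULES = [
    Rule("legal hold in force",
         lambda i: "retain" if i.metadata.get("legal_hold") else None),
    Rule("exact duplicate of a retained item",
         lambda i: "dispose" if i.metadata.get("duplicate_of") else None),
]
```

Falling back to "refer" when no rule applies corresponds loosely to the hybrid model above, in which automated triage supplements rather than replaces manual decision making.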


1.0 Introduction

Given the proliferation of digital information of all types and the challenges of preservation, identifying what subset of that information is actually worth keeping is critical. The aim of this report is twofold. Firstly, it is to establish the relevance of processes to determine what is ‘worth keeping’ for digital libraries. Secondly, it is to make recommendations for the use of technology to improve the efficiency and effectiveness of decision-making. The report begins with an overview of key concepts and clarification of terminology. This is followed by an account of current practices in the main information communities and findings from major research projects, including InterPARES and Paradigm. Key principles as distilled from the literature are identified. The final section considers possibilities for automation and what kind of experimentation is needed if we are to really develop automated appraisal in digital libraries and archives.


2.0 Key Concepts and Definition of Terms

Key concepts underlying this report relate to determining the significance of information, the different purposes for which information is managed in digital repositories, and item-level appraisal. One of the particular challenges of this report has been associated with terminology. Attempting to apply a methodology (which is not standardized even within its “home” domain of archival science) to information managed by other communities provides rich potential for misunderstandings and confusion, not to mention conflict. The theoretical framework of the information continuum is one device used to address this (see 2.2 below). In addition, definitions of the most contentious terms are provided here and reiterated in the body of the text where necessary. However, to achieve real progress in determining the significance of digital objects, consideration should be given to achieving consensus between all disciplines as to appropriate terminology.

R2.0.1  A glossary should be developed of terminology relating to the entities  and  processes  associated  with  determining  the  significance  of  information.   Definitions  should  be  acceptable  from  the  perspective  of  all  information  management occupations. 

2.1 Determining Significance

Determining the significance, value or worth of information has always been a fundamental concept for memory institutions, including libraries, archives and museums, and continues to be problematic for digital collections (Pymm, 2006). ‘Appraisal’ is the methodology used in recordkeeping to determine the significance of records, resulting in the designation of some as worthy of long-term preservation. The term ‘appraisal’ is used in this report to apply to the process of determining the significance of any information object.

Adding to a collection always has resource implications, including, for instance, initial purchase, cost of processing, storage and so on. At various points, decisions are made that should be underpinned by an assessment of the value of an item or an aggregation relative to the costs involved, although this may not be made explicit. Decisions relate to whether or not to acquire, and then at different stages whether or not to retain as part of the collection.

The act of determining what has significance, or what is worth keeping, has been recognized as a complex act influenced by ideological, political, economic, cultural and social factors (Lloyd, 2007).


Figure 1: Factors influencing the determination of value of information (ideological, social, political, cultural and economic factors)

The  perspective  taken  in  this  report  recognises  this  complexity  and  suggests  that  it is now  possible to formulate ways of addressing these issues with the  use of technology.     

R2.1.1  Technological solutions to determining the significance of information  must  take  into  account  ideological,  political,  economic,  cultural  and  social  factors. 


2.2 Information Management – Different Purposes, Different Perspectives

The purpose for which information is managed is critical in determining the approach to its management. The Information Continuum model (IC) (Schauder, Stillman, & Johanson, 2005) provides a useful framework for analysis of activities undertaken by the different professional communities of librarians and archivists.

The primary purpose of activities in the library community is the management of information for awareness or entertainment. This has been re-stated in the community networking context as information to maximise opportunity, and information to enhance living (Schauder et al., 2005). The primary purpose of the recordkeeping community is the management of information for accountability – or information to minimise risk (Schauder et al., 2005).

Figure 2 Differentiation of purpose for the two main communities involved in information management

The identification of a primary purpose for a community (or information type) does not imply that other purposes are excluded or are not present, but reflects a greater emphasis accorded to the purpose designated as likely to be primary. Digital repositories cannot always be easily assigned to either the library or the recordkeeping domain, and the situation is further confused because the terms ‘digital library’ and ‘digital archive’ often appear to be used interchangeably. Reference will therefore be made in this report to ‘information for awareness’ and ‘information for accountability’. An example of the practical application of this theory to a university repository environment is provided by Andrew Treloar and colleagues at Monash University (Treloar, Groenewegen, & Harboe-Ree, 2007).

R2.2.1 Appraisal methodologies must be “fit for purpose” – i.e., take into account the purpose(s) for which information is being managed: accountability, awareness and/or entertainment.

2.3 Item-Level Appraisal

‘Item’ is used in this report as a term that includes simple and complex information objects – e.g. document, record, website, image. The common characteristic of these information objects is that intellectually they are regarded as a single entity. This concept is taken from the information object that is defined in the DELOS Digital Library Reference Model (Candela et al., 2007, p.75). Here, an information object is described as potentially being a multimedia and multi-type object with parts, such as a sound recording with slides, political and economic data with interactive simulations, or a data stream representing the pool of data continuously measured by a sensor. Information objects belong to collections, or sets of resources (Candela et al., 2007, p.80).

The concept of item-level appraisal is a contentious one for many records managers and archivists, as a key motivator for many of the advances in appraisal theory (see section 4.2 Information for Accountability) has been the increase in the sheer quantity of records generated, and the impossibility of reviewing those records individually. However, working from the premise that processes can be automated means that review at item level is feasible. The opportunity now exists to build pragmatic tools to be used in conjunction with advances in archival appraisal theory; the challenge lies in achieving the appropriate balance.

Appraisal at item level does not imply that relationships between items will be disregarded or destroyed. On the contrary, the assumption underlying all references to item-level appraisal is that the metadata of digital objects will reflect those contextual relationships in a way that could not be envisaged in the paper world. As a consequence of this assumption, although not strictly in scope for this particular report, section 5.3 Metadata and section 8.1.1 Genres contain recommendations relating to description.

R2.3.1 Item-level appraisal should be considered as a tool to be used in the context of an appropriate theoretical framework, and does not imply the destruction of contextual relationships.
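The point that item-level results need not sever contextual relationships can be illustrated with a small Python sketch. The assumptions are ours and not part of the DELOS Reference Model: each item's metadata is taken to name the aggregation (collection or series) it belongs to, and the names `ItemRecord` and `summarise_by_aggregation` are hypothetical. Item-level outcomes are then rolled up per aggregation so that they can inform, rather than replace, an appraisal determination at the higher level.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRecord:
    identifier: str
    aggregation: str  # the collection or series the item belongs to
    outcome: str      # item-level appraisal result, e.g. "retain" or "dispose"


def summarise_by_aggregation(records):
    """Count item-level outcomes within each aggregation."""
    summary = defaultdict(Counter)
    for record in records:
        summary[record.aggregation][record.outcome] += 1
    return summary
```

A reviewer could, for example, look at an aggregation in which most items were marked for disposal and decide whether the few retained items still make sense without their neighbours.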


3.0 International Standards

The standards that are relevant to the discussion in this report are those relating to the Open Archival Information System (OAIS) reference model (International Organization for Standardization, 2003), records management (International Organization for Standardization, 2001) and metadata for records (International Organization for Standardization, 2006).

3.1 ISO 14721 OAIS Reference Model

In terms of the OAIS reference model (International Organization for Standardization, 2003), the process of determining significance should commence as one of the pre-ingest activities. It can be envisaged as taking place as part of the preliminary phase of the producer-archive interface methodology, although it is not made explicit in this standard (Consultative Committee for Space Data Systems, 2004). In addition, further checks on information objects which will contribute to final appraisal decisions could be carried out as part of ingest functionality. Re-appraisal, however, could be considered as an activity associated with Preservation Planning.

  Figure 3: The red stars indicate potential appraisal points in relation to OAIS Model

 

R3.1.1  Appraisal  may  take  place  prior  to  ingest,  on  ingest  and/or  as  part  of  Preservation Planning functionality. 


3.2 ISO 15489 Records Management

This standard (International Organization for Standardization, 2001) provides a high-level framework for recordkeeping and establishes benchmarks for good records management practice. If digital records are created and maintained in accordance with this standard (and ISO 23081), appraisal strategies are likely to be top-down (see section 4.2.1 below) and item-level review may not be required. In this report, the standard has been used to define a record, and the characteristics of records (usability, authenticity, reliability, integrity).

3.3 ISO 23081 Metadata for Records

In conjunction with ISO 15489, ISO 23081 is critical for best practice in records management. The standard is unequivocal about the importance of metadata in recordkeeping, that is, the records management and archival areas of activity:

“…metadata are structured or semi-structured information that enables the creation, registration, classification, access, preservation and disposition of records through time and within and across domains …[and] can be used to identify, authenticate, and contextualize records and the people, processes and systems that create, manage, maintain and use them and the policies that govern them.” (International Organization for Standardization, 2006).

The standard  describes  how metadata must  be  initially assigned  at the  point  of creation of a record, and then layers should continue to be assigned, either  automatically  or  manually,  reflecting  different  contexts,  usages,  systems,  as  necessary.  Without metadata, authenticity cannot be assessed.     

R3.3.1  Records  created  and  maintained  in  accordance  with  ISO  15489  and  ISO 23081 may not require appraisal at item level.  
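The layering of metadata described above can be pictured with the following Python sketch. It is a simplification under stated assumptions, not a rendering of ISO 23081 itself: the class names, the `context` labels and the check `has_creation_metadata` are hypothetical. The sketch appends a new layer each time a record enters a new context, never overwrites earlier layers, and treats the presence of a creation layer as a minimal precondition for any later assessment of authenticity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataLayer:
    context: str      # e.g. "creation", "transfer", "migration"
    assigned_by: str  # "automatic" or "manual"
    values: dict


@dataclass
class Record:
    identifier: str
    layers: list = field(default_factory=list)

    def add_layer(self, context: str, assigned_by: str, values: dict) -> None:
        # Layers are appended, never overwritten, so earlier context survives.
        self.layers.append(MetadataLayer(context, assigned_by, dict(values)))

    def has_creation_metadata(self) -> bool:
        # Simplifying assumption: a "creation" layer must exist before
        # authenticity can even begin to be assessed.
        return any(layer.context == "creation" for layer in self.layers)

    def current_view(self) -> dict:
        # Later layers take precedence when the same element appears twice.
        merged = {}
        for layer in self.layers:
            merged.update(layer.values)
        return merged
```

Keeping the layers separate, rather than storing only the merged view, is what allows an appraiser to see how a record's description has changed across systems and uses.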

 


4.0 Current Practices

Whether information is retained and managed in order to provide awareness/entertainment or accountability has influenced the different approaches that have been taken to determining significance by librarians and archivists.

4.1 Information for Awareness/Entertainment

Consideration of the significance of items in library collections has been the subject of very little debate (Lloyd, 2007; Pymm, 2006). One suggested reason for this is that very few libraries build collections for permanent retention (Pymm, 2006). This points to a key factor that characterises library selection: there is a likelihood that it will be focused on a current, known, user group with identified needs. The extent to which this is the case will vary according to library type and purpose. A consequence of the focus on current needs is that determining significance for libraries may be an ongoing matter consisting of two activities: selection, and de-selection (weeding) when information is no longer required.

Library concerns relating to the selection of materials have resulted in the development of solutions that have focused on the analysis of collections. Conspectus was one such solution, a methodology developed by the Research Libraries Group (RLG) in the 1980s for the assessment of collections in research libraries and subsequently adopted by many countries and other library types. Libraries acquire information products. Unlike records, information products are not unique, so a key concern has to be the consideration of library resources at a level higher than the individual institution. Library collections in aggregation should encompass the universe of knowledge, but unless there is systematic and careful collaboration some subject areas will be characterised by a duplication of resources while others will be poorly represented. Use of the Conspectus methodology enabled libraries both to assess the extent and level of subject coverage and to contribute to national assessments.
The methodology was complex and labour-intensive, and the approach was seen as becoming less and less relevant in the context of the increasing ubiquity of digital information (Burke, 1998; OCLC, 2007).

Consequently, at the end of the 1990s Conspectus was discontinued and RLG focused its attention on improving electronic access to collections and getting more resources online (OCLC, 2007). The primacy of concerns about access is reflected in the literature relating to the changing role of collection development in the digital age, where the establishment of purchasing consortia is a key objective (Dorner, 2004). Collaborative approaches to collection development continue to be encouraged as a means of avoiding duplication of effort while maintaining sufficient technical and/or geographical redundancy (Day, Pennock, & Allinson, 2007). These authors suggest that collection development policies need to specify object types (file formats) as well as content types (for example, peer reviewed articles, dissertations). We need to understand the nature of the entities that we are dealing with, but it is more useful to think about this in terms of representation and encoding, rather than file formats.

The need to capture information from the World Wide Web has led to some new developments for libraries. A collection development approach has been advocated, and a template for development plans has been developed and trialled (Murray & Hsieh, 2006; Murray & Phillips, 2007). Cobb and colleagues distinguish two models for selection of web content for digital libraries: those centred on the item (biblio-centric) and those centred on technology (techno-centric) [1]. In the biblio-centric model each item is assessed in accordance with rigorous criteria relating to its relevance to the collection. This labour-intensive approach results in high quality, low volume collections. The techno-centric approach emphasises comprehensive collection building using software such as a web crawler. The end result of this, it is suggested, places the burden of selection on the end-user rather than the curating institution (Cobb, Pearce-Moses, & Surface, 2005). Applying the archival principles of provenance and original order, they suggest, offers a middle ground worthy of exploration.
This ensures that aggregations, rather than individual documents, are the focus of effort. (See also discussion relating to national libraries and legal deposit, below.)

[1] A third model identified by Pearce-Moses and Kaczmarek focused on the development of standards and metadata schema and collaboration with webmasters. This was found to be unsuccessful, due in part to lack of understanding on the part of the webmaster and also to high turnover (Pearce-Moses & Kaczmarek, 2005).

In the analogue world, the information collected and organised by libraries was likely to be clearly structured and identifiable by bibliographic data. This bibliographic data has appeared in the publications themselves since the 1960s (‘cataloguing in publication’). In the digital environment this may still be the case but there will also be increasing volumes of much less formally structured information such as websites that will be of value to library collections. It is this increasingly ubiquitous nature of “library” information and the consequent exponential increase in acquisition of digital information by libraries that suggest the methodologies of archival science are applicable, or at least worthy of scrutiny.

Activities undertaken by national libraries are of particular interest, as the scope and scale of operations imply that manual selection procedures are unlikely to be sustainable. The parameters for collection development for many national libraries are established by legal deposit regimes – for example, legal requirements on publishers in that country to deposit one or more copies of publications. However, in the digital world not only is the definition of ‘published’ problematic, the locus of publication becomes less and less clear.

“… the concept of ‘national’ publications is becoming increasingly ambiguous in a world in which management and service delivery of publications may occur in a number of locations” (JISC, 2007).

Nonetheless, a state of the art report into current practice by national libraries  in the digital preservation sphere found that most of the 15 countries surveyed  were  at  least  exploring  the  extension  of  existing  legal  deposit  legislation  to  encompass  digital  objects  (Verheul,  2006).    This  survey  also  found  that  although  currently  all  15  libraries  accept  all  formats,  most  libraries  showed  awareness  of  the  need  to  limit  or  regulate  file  formats  accepted  into  their  collections.    An  overall  finding  was  that  in  this  specific  national  library  domain,  there  will  be  an  increasing  emphasis  on  developing  selection  methodologies.  The two main reasons for this are that there will be a need to  establish  limits  from  a  storage  perspective  and  also  because  of  the  costs  involved in long‐term preservation (Verheul, 2006).     New  Zealand’s  legal  deposit  legislation  does  extend  to  digital  objects.    A  strategy  with  the  potential  for  automation  recommended  to  assist  with  collection development in this setting is ‘nominated automated deposit’.  Four  categories of nominated deposit can be identified:  solicited/requested by the  library,  provided  on  a  contractual  basis,  initiated  by  creator  and  initiated  by  the general public (Ross, 2003, p. 21).   The  selection  guidelines  developed  by  the  National  Library  of  Australia  for  online  publications  provide  insight  into  the  particular  challenges  faced  by  libraries.    These  challenges  include,  for  instance,  the  definition  of  a  ‘publication’,  definition  of  Australian  content,  the  problem  of  multiple  versions  and  particularly  the  need  to  define  the  parameters  of  a  publication.  


In this latter case there is a need to establish parameters to take internal and external links into account (National Library of Australia, 2005).

David Bearman has recently suggested that universal capture is the only way in which the costs associated with selection at an individual institutional level can be minimised and the problem of many copies of some things and none of others can be addressed (Bearman, 2005). (For further information relating to the costs associated with selection see Ross, 2003, p.45 and the discussion of the SEEDS cost estimation model.) The onus for selection would then rest with the user, and the librarians would focus on access concerns. That model does not appear to be particularly far-fetched given initiatives such as Google Books² and UNESCO’s World Digital Library³.

Even given a universal capture model, a key role can be seen for continued evaluation or re-appraisal of the objects in that global store to ensure that ongoing preservation is feasible.

4.2 Information for Accountability

Records, the information objects that are the concern of archivists, can be defined as the evidence of business transactions (International Organization for Standardization, 2001). Records are, therefore, ubiquitous – a record is (or should be) created each time an interaction takes place. Records can range from the most mundane – bus or train tickets or till receipts, for instance – to the specialised and influential, such as high level policy documents. The context of a record, including (but not limited to) documentation of who created it, why, and with what purpose, is as critical as the informational content of the record.

A very small percentage of these records are preserved for the long term. The need to decide which those records are, which will be of interest and value to

2 “a project to digitize the world's books in order to make them easier for people to find and buy” (http://books.google.com/googlebooks/newsviews/)
3 “The World Digital Library initiative will digitize unique and rare materials from libraries and other cultural institutions around the world and make them available free of charge on the Internet. These materials include manuscripts, maps, books, musical scores, sound recordings, films, prints and photographs.” http://portal.unesco.org/en/ev.phpURL_ID=40277&URL_DO=DO_TOPIC&URL_SECTION=201.html


future generations, has led to the development of appraisal methodologies. ‘Appraisal’ has been defined as ‘making a judgement or estimation of the worthiness of continued preservation of records’ (InterPARES, 2000b, p.69). A key difference between archival appraisal and library selection is the requirement for archivists to predict future usage:

“The essential problem in appraisal is to learn how archivists can move from what we can know to some valid projection of what we apparently cannot know, that is, from what we can know about the documents to suppositions about their continuing value.” (Eastwood, 1993, p.112).

The need to undertake appraisal activities became increasingly acute as quantities of records being created grew exponentially in the first half of the 20th century. It simply was neither possible nor desirable to keep everything – the resource implications for management and storage would be unsustainable.

In the United States during the early 1950s Theodore Schellenberg devised a system of values as the basis for appraisal of government records (Schellenberg, 2003). This appraisal system was enormously influential not only in the United States, but also in other English speaking countries, and it is still used today, despite vigorous criticism⁴. Schellenberg’s system identified two types of value that could be accorded to records – primary and secondary. The primary value is the value of the record to the organisation that created it. The nature of this value could be administrative (to support the long-term business of the organisation), legal (to establish obligations and protect legal rights) or fiscal (to provide evidence of the receipt and use of funds).

The secondary value is the value of the record to other users. The nature of this secondary value could be either evidential or informational. Evidential value exists if the record provides documentation of the ways in which the organisation functioned, its history or structure. Informational value means that the content would be significant to researchers because of the information provided about persons, places or subjects. This multi-faceted approach to defining value endeavours to take into account the requirements of a variety of future users.
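Because this report is concerned with automation, it may help to see how Schellenberg's categories could be represented as structured data. The following is a minimal sketch only: the class names, the `ValueAssessment` container and the example series are hypothetical illustrations and are not drawn from Schellenberg or from any existing system.

```python
from dataclasses import dataclass, field
from enum import Enum


class PrimaryValue(Enum):
    """Value of a record to the organisation that created it."""
    ADMINISTRATIVE = "administrative"  # supports long-term business
    LEGAL = "legal"                    # establishes obligations, protects rights
    FISCAL = "fiscal"                  # evidences receipt and use of funds


class SecondaryValue(Enum):
    """Value of a record to other users."""
    EVIDENTIAL = "evidential"          # documents how the organisation functioned
    INFORMATIONAL = "informational"    # content about persons, places or subjects


@dataclass
class ValueAssessment:
    """Hypothetical container for the values assigned to one record series."""
    series: str
    primary: set = field(default_factory=set)
    secondary: set = field(default_factory=set)

    def has_secondary_value(self) -> bool:
        # Reports only whether any secondary value was recorded;
        # it does not itself make a retention decision.
        return bool(self.secondary)


# Illustrative (invented) assessments.
minutes = ValueAssessment(
    series="Board minutes",
    primary={PrimaryValue.ADMINISTRATIVE, PrimaryValue.LEGAL},
    secondary={SecondaryValue.EVIDENTIAL},
)
receipts = ValueAssessment(series="Till receipts", primary={PrimaryValue.FISCAL})
```

Recording the assessment in this form does not remove the need for human judgement; it only makes the outcome of that judgement available to later automated steps.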

4 For example: “I feel it is essential that Canadian archivists realize that the traditional approach to appraisal no longer works …” (Cook, 1992, p. 182).


Schellenberg’s theory has been the subject of much debate in the archival literature, resulting in other attempts to devise appropriate methodologies. Of particular interest to our investigation of the automation of appraisal is work undertaken to model the elements that need to be considered when undertaking appraisal (Boles & Young, 1985). These authors identified three interrelated categories of elements, each of which should be applied in turn. The first of these is value of information, and it includes components encompassing circumstances of creation, analysis of content and use of the records. The next category introduces consideration of cost implications into appraisal decision making. (Boles and Young attribute the origins of this idea to a government archivist, G. Philip Bauer (Bauer, 1946).) The final category considers the implications of the appraisal recommendations, i.e., whether the impact will be positive or negative on the repository.

In 1989, David Bearman challenged existing appraisal methodologies, arguing that these approaches to appraisal are doomed to failure because of three factors (Bearman, 1989). Firstly, records must have been created and maintained as records until the archivist appears to conduct appraisal, possibly at a much later stage. Secondly, the process is ‘people intensive’ – too much human expertise is required. The third and most significant reason for failure is that

“we cannot know from examining records what societal requirements would be satisfied by their retention or destruction” (Bearman, 1995, p.383).

Bearman’s proposal to address this was that selection should be based on business function (see also 4.2.1.2 below) and guided by the principles of risk management (Bearman, 1995). Other strategies he identified were that others should do the selecting (replacing the review of records by archivists with high level negotiated agreements of required outcomes), that selection should be carried out automatically based on metadata, and that public interests should inform appraisal decisions (Bearman, 1995, pp. 399-400).

4.2.1 Current Appraisal Strategies

The International Council on Archives (ICA), the professional organization representing the global archival community, has identified five appraisal strategies or approaches, which can be used in combination with each other if required. These strategies are inventory, functional, theme or territory, risk assessment and business systems design (Committee on Appraisal, 2003).


4.2.1.1 Inventory

This is a bottom-up, records-centric approach. It involves identifying and listing all records created by an organization. The listing will include information relating to the creation of the records (who and why), date ranges, volumes, uses and content. Retention periods can then be assigned, and those records worthy of long-term or permanent retention identified.

Problems with this approach are that it is extremely labour intensive and that the resulting schedule or inventory has to be kept up-to-date to reflect changes in recordkeeping practices. It is still very widely applied, however, particularly in the United States, and it has been adapted for use in the digital environment. For instance, the United States Geological Survey (USGS) has developed an online survey form to collect information about individual record series or data sets⁵. Similarly, the United Kingdom’s National Archives provides guidelines for inventory of digital records (EROS, 1999), despite the fact that their new appraisal policy (The National Archives, 2004) takes a macro approach (see below).

4.2.1.2 Functional/Macro Approach

As discussed above, a functional approach to appraisal was first advocated by David Bearman (Bearman, 1989). It is a top-down approach and involves analysis of the functions of an organisation or society to determine which functions are likely to create and maintain records of long-term value. The terms functional and macro appraisal are sometimes used interchangeably, but there are key distinctions. Functional appraisal is commonly used to specify analysis that takes place within the organisation. Macro appraisal, however, as the name suggests, involves a step back and considers functions within a broader context. It has been defined as

“…a planned, strategic, holistic, systematic and comparative approach to researching and identifying society’s need for records.” (Cunningham & Oswald, 2005)

5 The USGS Records Appraisal Tool can be seen at http://eros.usgs.gov/government/RAT/tool.php


This approach has been prompted by the need to develop appraisal methodology that could be applied to the increasing volumes of records created in society, and different variations have been developed in different national jurisdictions (see, for example, the detailed accounts of macroappraisal practices in Australia (Cunningham & Oswald, 2005), the Netherlands (Jonker, 2005) and New Zealand (Roberts, 2005)).

A criticism of the functional/macro approach is that records having secondary informational value beyond the creating and maintaining organisation may not be identified as being appropriate for long-term preservation (Committee on Appraisal, 2003) – see 4.2.1.3 Documentation Strategy below for a hybrid solution to this problem.

4.2.1.3 Documentation of a theme or a territory

The third approach identified in the ICA manual focuses on a subject or geographic area. The strategy involves the identification of all owners of relevant recordkeeping systems (for example, public and private archives) and potential users of the records. The manual's assessment of this approach is that it is ‘slow and resource intensive’ (Committee on Appraisal, 2003). Documentation strategy has been explored in depth in North America as an alternative to the Schellenbergian value system. Terry Cook provides a concise overview of the dimensions of this discussion, and of the pros and cons of the approach, by way of a critique of Helen Samuels’ keynote address on this topic to the Association of Canadian Archivists conference (Cook, 1992). Documentation strategy is currently practiced in Germany. There, a cross-archive (for example, public and private) approach, also referred to as ‘vertical and horizontal’, takes into account functional perspectives as well as ‘textual and content-oriented aspects’ (Kretschmar, 2005).

4.2.1.4 Risk Assessment

This approach uses the identification of risks to set priorities and make decisions. The ICA manual highlights relevant risks at organisational level, i.e. the risks to an organisation if records of a particular function are not appraised (Committee on Appraisal, 2003).

Although not specifically archival, the work of the preservation community in assessing the degree of preservation risk associated with digital objects is of particular interest when considering the automation of appraisal. The Digital Asset Assessment Tool (DAAT) Project set out to produce a tool that could be used by collecting institutions such as libraries and archives to assess which


digital  assets  were  at  greatest  risk  and  to  take  action  accordingly  (Pinsent  &  Ashley,  2006).      The  resulting  tool  is  dependent  on  data  collected  manually.   However,  an  automated  workflow  that  assesses  and  reports  on  preservation  risk  has  been  developed  and  tested  on  a  digital  archive  (Anderson,  Frost,  Hoebelheinrich, & Johnson, 2005). 

4.2.1.5 Business Systems Design

The final strategy identified by the ICA is a holistic approach to records management in accordance with the international standard for records management, ISO 15489. This approach involves incorporating appraisal decisions in the design of business systems. If this is done, it becomes possible to envisage disposition being carried out automatically. This is the theoretical basis underlying the development of DIRKS (State Records Authority of New South Wales, 2007). (See also 4.2.2.1 Appraisal as Part of the Business Process.)
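To illustrate how disposition might follow automatically once appraisal decisions are built into a business system, the sketch below attaches a retention rule to each business function and derives a disposal date when a record is created. It is an assumption-laden illustration rather than a description of DIRKS or of ISO 15489: the `RetentionRule` class, the `RULES` table, the function names and the retention periods are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical appraisal decision recorded at system design time."""
    function: str                # business function the rule applies to
    retain_years: Optional[int]  # None means retain permanently


# Invented rules for illustration; real periods come from appraisal.
RULES = {
    "payroll": RetentionRule("payroll", 7),
    "policy": RetentionRule("policy", None),
}


def disposal_date(function: str, created: date) -> Optional[date]:
    """Return the date on which a record becomes eligible for disposal."""
    rule = RULES[function]
    if rule.retain_years is None:
        return None  # permanent retention: never eligible
    target_year = created.year + rule.retain_years
    try:
        return created.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year falls back to 28 February.
        return created.replace(year=target_year, day=28)


def due_for_disposal(function: str, created: date, today: date) -> bool:
    """True when the retention period has elapsed."""
    due = disposal_date(function, created)
    return due is not None and today >= due
```

Under these assumptions the rule is chosen by the function that produced the record, not by inspecting the record itself, which is the point of the business systems design strategy.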

4.2.2 The Process

Definitions of appraisal in the archival context emphasise the use of clearly specified criteria and requirements in determining the value of records in order to ensure that the process is as objective as possible:

“Records appraisal is the process of determining the archival value and ultimate disposition of records. Appraisal decisions are based on a number of criteria including the historical, legal, administrative, and financial value of the records”⁶

and according to the Paradigm project⁷ (see also section 5.2.1 below)

“archival theory extends this definition to include the policies and procedures used by an archivist to identify, evaluate, and authenticate records, in all formats, which have enduring value to records creators, institutions, researchers, and society. Appraisal in a paper-based archive traditionally takes place once a record is no longer current, but determination of how long a record should be retained can take place before creation for some kinds of records”

6 http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation.aspx
7 http://www.paradigm.ac.uk/workbook/appraisal/index.html


The process of determining significance consists of these distinct stages:  

• appraisal, which includes identifying the selection (or retention) rules and subsequently the evaluation of the value of information or digital objects. Evaluation can be based on different methods, such as analyzing the context in which the digital objects will be created (e.g. the business activities) or analyzing the content.
• selection, which includes the attribution of values to the digital objects; could be part of the first stage.
• disposal, which is the actual application of the appraisal and selection decisions, that is, keeping or destroying the information.
• re-appraisal, which may occur on ingest, as part of preservation planning, or on access.

As previously indicated, in the archival sector appraisal is part of a selection process made up of specific activities (selection, appraisal, and disposition as destruction or preservation). The appraisal should be conducted on the basis of well defined principles and criteria as further developed with reference to the non-electronic environment. Specifically, the appraisal should be carried out when the digital resources are still in their active phase, as near to the time of creation as possible.

The management of the appraisal function implies the use and maintenance of a huge amount of information, which includes the decisions taken in the past (with reference to the various responsibilities involved and the strategies and procedures developed), the contextual information related to the records (the juridical, documentary and technological contexts), and the values established for the records and for their preservation feasibility (in terms of cost and in terms of preserving the authenticity of the records).

The feasibility of preserving the records depends strictly on the capacity to preserve their essential digital components, those able, now and in the future, to confer identity on the records and to ensure their integrity. This information (which includes content and the data/metadata necessary to organise, structure or render the content of the records) has to be structured and articulated in a way that supports decisions about the present and future capacity to preserve the digital components that constitute the record's identity and ensure its integrity. This effort includes at least three phases:




• determine which elements can validate authenticity
• identify where these elements are manifested (in which digital components) and what technical information is relevant for their preservation
• reconcile these preservation requirements with the financial and technical capacities of the repository

As the flexibility required in the preservation process makes clear, appraisal is an active component of this process and carries a higher level of responsibility than in the past. The quality of preservation is strictly connected with the quality of early appraisal. The more complex and rich the digital data to be preserved (as in the scientific world), the more relevant is the active appraisal described here, which includes crucial tools whose automation⁸ will ensure the success of the preservation itself:

• criteria and policies able to orient a neutral approach
• auditing and validating procedures
• contextual information automatically extracted and preserved.

Appraisal  processes  and  the  resulting  decisions  need  to  be  transparent  and  accountable and could be made at the following points in time:  

• before recordkeeping systems are designed: whenever systems designers know the requirements for creating, maintaining and disposing of records over time, appraisal strategies can be built into recordkeeping systems.
• before records creation: early appraisal informs managers of the risks they face if records are not created for long term preservation. It also prevents the accumulation of un-appraised records.
• before disposal: it is standard practice to require appraisals to be conducted before classes of records are authorised for disposal.
• when required: some records may be subject to numerous appraisal processes over time.

8 See http://eros.usgs.gov/government/RAT/tool.asp where the USGS Scientific Records Appraisal Tool is described (see CODATA-ERPANET workshop, The selection, appraisal and retention of digital scientific data…cit., p.13).

4.2.2.1 Appraisal as Part of the Business Process

When appraisal is carried out as part of a business process it involves the following decisions:

• what records should be created to document a business activity
• how long those records should be retained.

Appraisal should be a planned, transparent and accountable process of research, analysis, evaluation and consultation. As part of business process analysis, appraisal is carried out to achieve benefits such as:

• reduction of information overload through identification, segregation and elimination of non-critical records;
• protection and awareness of legal, financial and community interests, rights, entitlements and obligations of organisations and individuals;
• preservation of corporate and cultural memory;
• reduced storage and maintenance costs through timely disposal of records that are no longer required.


5.0 Current Research

Research exploring the issues relating to the determination of significance of digital objects has been largely carried out by the information for accountability community, so attention has focused on appraisal methodology. In the paper world appraisal and selection processes are conducted manually and require an enormous amount of effort. In a digital environment new approaches are necessary. This section summarises findings from InterPARES and Paradigm, as well as developments involving metadata. InterPARES and Paradigm have been singled out for particular attention because findings from both these projects are particularly important. InterPARES is significant because this was truly a global endeavour, and therefore included a multiplicity of theoretical views. Paradigm deserves close scrutiny as its focus is on the difficulties encountered with respect to personal papers – i.e., documents created and stored in uncontrolled environments. The section concludes by considering the Appraisal in a Digital World conference held in Rome in 2007.

5.1 InterPARES

The concept of appraisal in the digital environment was a major focus of research for InterPARES (The International Research on Permanent Authentic Records in Electronic Systems). One output was a review of the English-language literature relating to the appraisal of electronic records (InterPARES, 2000a). This literature review formed the basis for subsequent work, in particular the development of a model to identify the activities involved in selection and acquisition and the identification of issues specific to the appraisal of electronic records. Of particular interest was the conclusion that only two specific criteria could be established to cover all possible appraisal situations:

“…first, the requirements for assessing authenticity as part of assessing the value of electronic records; and, second, the concepts that have been developed for determining the record elements to be preserved and for identifying the digital components to be preserved as part of determining the feasibility of preservation.” (InterPARES, 2000b, p.97)

The project also concluded that an appraisal function valuable and necessary for the digital environment should be seen as a new activity strictly related to the early control of the creation of digital resources and as an integrated part of the complex preservation processes necessary in the new information systems. This activity, as part of the preservation function, requires a clear and well


developed definition of records/resources characteristics (such as authenticity and usability) and it also includes:

• analysis of the feasibility of preservation,
• regular monitoring (as an activity which includes continuing internal changes in the decision process related to the appraisal and transfer),
• a continuing self-documented approach with specific reference also to the technological context (i.e. definition of the formats for transfer and its requirements).

In this perspective, appraisal is not only unavoidable, but is an inextricable part of the effort dedicated to preserving digital memories. (See Appendix 1 for a summary of findings from InterPARES.) It is now imperative to validate the findings of InterPARES by experimentation.

5.2 Paradigm

The Paradigm Project (dedicated to the preservation of digital archives of individuals and small organisations) explores whether appraisal of digital records is a worthwhile exercise⁹:

“Trends in the digital world seem to reject the practice of actively organising our digital collections by choosing what to keep and what to discard. Declining storage costs and improved discovery seem to have rendered appraisal and disposal needless”

Specifically, as the authors of this useful report stress, this applies when the resources preserved have a poor structure and lack a functional organization, and the detailed analysis and description at the time of transfer/acquisition would

9 See http://www.paradigm.ac.uk/workbook/appraisal/index.html. The question has been at the centre of the discussion within the DELOS workgroup on digital libraries preservation. An active discussion took place and different views were expressed in preparing this study, which tries to take into account the different perspectives expressed on this occasion. For a historical perspective on digital appraisal in the US, see Linda J. Henry, An historical perspective on appraisal of electronic records, 1968-1998. SAA Annual meeting, Session 47, September 2000 (unpublished paper) and Ead., Schellenberg in Cyberspace, in “The American Archivist”, 61 (Fall 1998), p. 317.


require  an  enormous  amount  of  effort  and  time.    The  conclusion  is  an  acceptance of the possibility that  

“in future we might only appraise and catalogue very important collections, similar to the way in which only very high value manuscripts are catalogued to piece level”,

and that the appraisal would affect only the recordkeeping systems well organised through a functional classification based on business processes.

According to this project (and reiterated in the summing up of the ERPANET and CODATA workshop in Lisbon¹⁰) the basic reason for appraisal is still a pragmatic question of quantity and cost. The future need for appraisal, selection and destruction is foreseen as a consequence of unresolved financial issues. The growth of digital content (per byte or per object) will not be in step with the declining costs of devices and of the backup routines and other system administration tasks needed to create the huge amount of “preservation metadata” required to sustain the long-term life of the resources, specifically in the case of complex and compound objects (websites, email systems) and of formats that are undocumented, obscure or no longer supported at the time of acquisition¹¹.
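The cost factors that Paradigm lists (see footnote 11) lend themselves to a simple automated check at the point of acquisition. The sketch below is a hypothetical illustration, not part of the Paradigm workbook: the `ObjectProfile` fields and the `cost_flags` function are assumed names, and the sketch presumes that a technical profile of each object has already been gathered by some other means.

```python
from dataclasses import dataclass


@dataclass
class ObjectProfile:
    """Hypothetical technical profile gathered at acquisition."""
    identifier: str
    compound: bool = False               # e.g. website or email archive
    format_documented: bool = True
    format_in_registry: bool = True      # known to preservation registries
    migration_tool_exists: bool = True
    encrypted_or_drm: bool = False
    obsolete_media: bool = False
    has_metadata: bool = True
    needs_licensed_software: bool = False


def cost_flags(p: ObjectProfile) -> list:
    """List the characteristics that may make an object costly to retain."""
    checks = [
        (p.compound, "complex or compound object"),
        (not p.format_documented, "undocumented format"),
        (not p.format_in_registry, "format unknown to preservation registries"),
        (not p.migration_tool_exists, "no migration/emulation tool"),
        (p.encrypted_or_drm, "encrypted or rights-protected"),
        (p.obsolete_media, "old or obsolete media"),
        (not p.has_metadata, "no metadata"),
        (p.needs_licensed_software, "requires software licence"),
    ]
    return [label for hit, label in checks if hit]


# Invented examples.
website = ObjectProfile("site-001", compound=True, has_metadata=False)
report = ObjectProfile("doc-002")
```

In this sketch a non-empty list would simply mark the object as needing an explicit appraisal decision; how the flags are weighed against the value of the object is left to the repository.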

10 CODATA-ERPANET workshop, The selection, appraisal and retention of digital scientific data, Lisbon 15-17 December 2003. Final report, cit.

11 In the Paradigm project report this aspect is well defined, with a list of objects whose specific technical characteristics could seriously affect the cost of preservation and require an appraisal decision. Objects that might be more expensive to retain include:
“• Complex or compound objects, such as websites or email archives
• Objects in undocumented formats
• Objects in obscure formats
• Objects in formats unsupported by a community or vendor at the time of acquisition
• Objects in formats for which no migration/emulation tools exist
• Objects in formats unknown/unsupported by preservation registries and tools
• Objects for which the repository has no preservation strategy
• Objects which are encrypted, password protected or subject to digital rights mechanisms
• Objects on old or obsolete media
• Objects without metadata
• Objects which require software licences for access or manipulation”.

5.3 Metadata

The existence of metadata, data about data, is a key concern in any consideration of determining significance or value. The more information we have about an object, the greater the potential is for improved decision-making. Much research into metadata has originated in the archival or ‘information for accountability’ community, for instance Monash University’s Clever Recordkeeping Metadata project, which explored the automatic creation of metadata, and its reuse across systems (Evans, McKemmish, & Bhoday, 2005). However, the value of metadata remains unproven (PREMIS Working Group, 2004).

Appraisal, determining the value of a particular digital object or class of objects, requires metadata that describes

“the context, content and structure of records and their management through time” (International Organization for Standardization, 2001).

Other information communities also recognize the importance of having adequate metadata, and significant steps are being taken by researchers at the Digital Curation Centre (DCC). In recognition of the fact that the creation of digital objects is increasing at an exponential rate and the manual collection of metadata is unsustainable, the DCC research is focusing on the automatic extraction of metadata (see, for example, Kim & Ross, 2006). (See also section 8.1 Using Metadata and Genres.)

As discussed above in section 2.3, the existence of metadata reflecting relationships and levels of aggregation is critical in any consideration of item-level appraisal for records. David Bearman recognized the potential for the use of metadata to show relationships in 1985:

"…our current description practices focus on capturing content of records, and on describing existing arrangement and highly general context, when what we need is highly specific metadata about transaction contexts which would provide us with what we need to know about content and structure (including, but not limited to, arrangement). …An archival strategy for documentation is to automatically capture metadata required to ensure evidence, to manage programs and to support use after analysis of functional requirements for recordkeeping, business process, and user needs." (pp 393-394)

R5.3.1  Metadata  showing  relationships  and  levels  of  aggregation  of  records  should be used to automatically generate description for archival repositories  
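As a rough indication of what recommendation R5.3.1 could look like in practice, the sketch below builds an indented description from metadata that records each unit's level of aggregation and its parent. It is a minimal illustration under stated assumptions: the tuple layout, the level names (fonds, series, item) and the sample records are invented for the example and do not follow any particular descriptive standard or repository system.

```python
from collections import defaultdict

# Hypothetical metadata: (identifier, level, title, parent identifier).
RECORDS = [
    ("F1", "fonds", "Department of Works", None),
    ("S1", "series", "Bridge contracts", "F1"),
    ("I1", "item", "Contract 1998/14", "S1"),
    ("I2", "item", "Contract 1999/02", "S1"),
]


def describe(records):
    """Generate an indented description from aggregation metadata."""
    children = defaultdict(list)  # parent identifier -> child identifiers
    by_id = {}                    # identifier -> (level, title)
    for ident, level, title, parent in records:
        by_id[ident] = (level, title)
        children[parent].append(ident)

    lines = []

    def walk(ident, depth):
        level, title = by_id[ident]
        lines.append("  " * depth + f"{level}: {title} [{ident}]")
        for child in children[ident]:
            walk(child, depth + 1)

    # Units with no parent are the top of the hierarchy.
    for root in children[None]:
        walk(root, 0)
    return lines


description = describe(RECORDS)
```

The description is derived entirely from the relationship metadata, so it is only as complete as the metadata captured when the records were created.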


5.4 Appraisal in a Digital World

This conference, held in Rome in 2007, was a ground-breaking endeavour. It brought together academics and practitioners and, most importantly, was inter-disciplinary, thus allowing the exchange of ideas and practice across information domains. International speakers included Luciana Duranti, Terry Eastwood, Ken Thibodeau and Jason Baron. Papers are yet to be published, but discussions at the conference and subsequently have informed the development of recommendations in this report.

6.0 Issues

Given increasing capabilities for search and retrieval, improving quality and decreasing costs of storage devices, and the greater complexity and consequent higher costs involved in evaluating a huge amount of digital resources, a fundamental issue debated is whether or not it is necessary to expend effort in determining the value or significance of information. A further concern is that the very act of nominating some information as worthy of preservation and not others means that the resulting bodies of knowledge will not be truly representative of all voices within a community.

6.1 Is it necessary to determine the significance of digital information?

Organisations repeatedly take decisions on what information (or digital objects) should be preserved and for how long. Criteria governing what to keep and what to discard are usually based upon factors such as organisational needs and objectives, juridical requirements, and information value, as relevant to the business context of the organisation (whether a library, a public sector institution or a commercial company). This happens in government organisations, businesses and memory organisations including, recently, digital libraries and digital archives. The reason is that preserving too much digital material is not cost-effective.

It could be argued, as a consequence of this approach, that the main reasons in favour of appraisal stem from the lack of detailed control at the creation phase of digital objects, specifically over their technological contexts. The paradox is that the records in a well-organised recordkeeping system, with easy-to-use and detailed retrieval tools and built at the creation stage with an integrated preservation plan, might not need to be appraised and disposed of for preservation purposes.


A similar conclusion can be found in the ICA Guidelines on Appraisal. Emphasis is placed on the difficulties of conducting appraisal on the digital heritage of small organisations, often characterised by the absence of systematic filing and naming rules (p. 22):

“Where electronic documents exist with little organisation or structure linking them together in meaningful collections or groupings, appraisal will be difficult. This will be the case where, for instance:
• electronic documents are held on a shared local network drive with no systematic organisation or structure in the filing or folder hierarchy
• files and folders are created directly by end users with no established naming conventions, resulting in names that are ambiguous, mysterious or misleading
• electronic documents are held in a document management system that relies upon search technology alone to bring together sets of related records.”

The lack of these attributes is the result of weak recordkeeping systems, with consequent:

lack  of  consistency  in  the  allocation  of  individual  records  and  in  the  development of  series,  



loss of information related to the original context, and 



loss of the original organization and its meaning. 

The only possibility for appraisal (and also for preservation) in this case is a record-by-record analysis, which is “time-consuming and resource-intensive” and of course “unlikely to be cost effective” (p. 22).

Linda Henry, drawing on the long experience of the US National Archives, determined that the characteristics to evaluate for appraisal (in the absence of best practice in the creation of resources) include the resources’ manipulability, volume, linkage duplication, micro-level, readability, hardware and software documentation, and format independence. In any case, content analysis is the most relevant aspect in reaching the final decision, and the lack of organisation and indexing is, in the end, the main component to evaluate [12].

A different answer to the basic question can be found if appraisal is regarded not as a pragmatic solution to problems of space and redundancy but as part of a digital preservation process.

To develop a more comprehensive answer to the question, it is important to ensure a consistent theoretical approach based on the idea that, as part of a business process, appraisal is done, as already stressed in this report, to achieve benefits such as:

reduction  of  information  overload  through  identification,  segregation  and elimination of non‐critical resources;  



protection  and  awareness  of  legal,  financial  and  community  interests,  rights, entitlements and obligations of organisations and individuals; 



preservation of corporate and cultural memory;  



reduction of storage and maintenance costs through timely disposal of  materials that are no longer required by the creator.  

6.2 Is the Process of Determining Significance Fundamentally Flawed?

A recent discussion paper which has provoked debate claims that appraisal as currently practised is faulty, as it is based on the assumption that it is possible to select the information that will have greatest value to a community in the future (Neumayer & Rauber, 2007). The authors argue that, by its very nature, appraisal is an exercise in censorship, and has been used by repressive regimes to control access to knowledge. Although a seasoned archivist or records manager can identify (and some have identified) inherent flaws in their arguments and proposed solution (random sampling of all digital information, regardless of origins or purpose), the underlying concern needs

12 See Linda Henry, An historical perspective on appraisal of electronic records, 1968-1998. SAA Annual meeting, Session 47, September 2000, cit.


to be explored further, even though it concerns not the nature of appraisal itself but the social organisation of memory preservation at national level and the specific mandate of the dedicated institutions.

Of course, minority or disadvantaged groups have traditionally not been well represented in the historical record, as a consequence of the nature of the records created by public institutions and of the weak protection afforded to private records. Facilitating the involvement of all societal stakeholders in determining the significance of information is a course of action which would assist in addressing this issue. In the archival domain, there has been a call for participatory appraisal to ensure that the needs of marginalised communities are met (Shilton & Srinivasan, 2007). In the case study described in their paper, the methodology involves the establishment and ongoing development of community ontologies. For digital records and other content, the fact that access can be made widely available now means that views from outside traditional libraries or archives can be incorporated into the decision-making process. (See recommendation R8.0.1.)


7.0 Principles, Requirements and Criteria for the Appraisal of Digital Objects

In the recordkeeping context, if a robust system is in place, findings from a functional appraisal strategy can be used to provide the framework (a disposal schedule) for technological controls to distinguish those records which need to be retained for long periods of time. The implementation of a disposal schedule does, however, require manual intervention, especially when a functional organisation of the records system is not provided at the creation phase. Consequently the emphasis in this part of the report is on determining significance at item level, as it is here that real possibilities for automation can be seen.

The final report of the Paradigm Project suggests that top-down appraisal (macro or functional) will need to be carried out initially, before a bottom-up approach focusing on individual items can proceed. It is also possible to envisage an initial triage stage if the data to be appraised is sufficiently messy. For instance, a poorly organised shared drive of legacy records could initially be scanned at item level to identify any digital material that has the potential for long-term preservation, before any other analysis work takes place. It has also been suggested that genre could be used as a basis for appraisal where metadata is absent or limited (Underwood, Isbell, & Underwood, 2007).

Findings from the InterPARES Appraisal Task Force indicate very clearly that only two criteria can be established to cover all situations: firstly the requirements for assessing authenticity, and secondly determination of the feasibility of preservation (InterPARES, 2000b). Both of these criteria can be focused on item-level assessment and can therefore usefully be considered to indicate significant potential areas for automation. To supplement this, a representative collection of published policies, guidelines and case studies was consulted in order to identify a list of specific criteria that have been used in a range of different settings. Subsequent analysis showed that the criteria identified could be clustered in the following categories:

Content: comprises appraisal criteria which involve assessment of the  informational content of the item, series or collection.   



Contextual: comprises criteria relating to an assessment of the context  in which the item, series or collection was created. 




Evidence:  comprises  criteria  which  provide  evidence  of  activities  and/or functions 

  Figure 4: Categories used for the grouping of appraisal criteria



Operational:  comprises  appraisal  criteria  which  contribute  to  assessment of implications of long term preservation for the collecting  agency  



Societal:  comprises  appraisal  criteria  which  relate  to  the  external  societal/national  information  management  infrastructure,  including  legislative and ethical concerns 



Technical:  comprises  appraisal  criteria  which  relate  to  technical  characteristics or features of the record or data 
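The initial triage stage described earlier in this section, in which a poorly organised shared drive is scanned at item level before any other analysis, could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `triage_scan`, the `CANDIDATE_EXTENSIONS` set and the rule that empty files and exact duplicates are set aside are all hypothetical choices made for the example, not criteria drawn from the sources consulted for this report.

```python
import hashlib
from pathlib import Path

# Hypothetical list of formats treated as candidates for long-term preservation.
CANDIDATE_EXTENSIONS = {".pdf", ".txt", ".xml", ".csv", ".tif", ".tiff"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage_scan(root: Path) -> dict:
    """Sort every file under root into 'candidates', 'duplicates' or 'set_aside'."""
    result = {"candidates": [], "duplicates": [], "set_aside": []}
    seen_checksums = set()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Assumed rule: empty files and unlisted formats are not candidates.
        if path.stat().st_size == 0 or path.suffix.lower() not in CANDIDATE_EXTENSIONS:
            result["set_aside"].append(path)
            continue
        checksum = sha256_of(path)
        if checksum in seen_checksums:
            # Byte-identical to a file already seen earlier in the scan.
            result["duplicates"].append(path)
        else:
            seen_checksums.add(checksum)
            result["candidates"].append(path)
    return result
```

A scan of this kind only narrows the field; items placed in any of the three groups would still need the contextual analysis discussed in this report before a disposal decision is taken.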

 


    Figure 5: Appraisal criteria that comprise each category

 


The results of this analysis are shown in Table 1. All criteria either can be or have been applied at item level, but it must be emphasised that most are interrelated and will be part of a contextual analysis. In this sense items are rarely isolated, even in a completely uncontrolled environment. The source documents used to compile this table are a selection of publications in English comprising policy, guidelines and reports of practice in different information environments (see Appendix 2 for the scope of each document used). This combination of literature types was used to try to assemble a representative, rather than comprehensive, list of criteria from a range of different settings and to provide a basis for starting to identify categories. In fact it proved quite challenging to find sources that provided sufficient detail (as opposed to an overview of principles) about the selection process undertaken, but there are doubtless others and this may be an area that receives increasing attention in the future [13].

13 For instance, subsequent correspondence with John Faundeen of USGS identified the following appraisal factors: The data is raw or minimally processed; Do the files contain nonarchival records? Data has successfully undergone the peer review process; Compression used? Is the data classified (by governments such as 'secret')? Reputable author (creator)? Usefulness of [scientific] parameters outside of the project that created the data? Are the records in a discernible order? Language used; If analog and digital exist, which is better or do both have to be kept? What are the accession or disposition costs? What was the data collection method? Space available to accommodate the collection? Further analysis is required to determine which should be incorporated into existing factors, and which merit articulation as additional criteria.


Figure 6 Application areas for appraisal criteria documented in Table 1

R7.0.1 Further analysis of policies, guidelines and reports of practice should take place to compile a comprehensive database of criteria used in the appraisal of digital objects, linked not only to domain of practice but also to sector of activity and country.

Table 1: Specific criteria considered in the appraisal of digital information objects

Category

Appraisal Criteria

Content

Comprehensiveness – e.g. whether information covers a complete population or not Coverage - e.g. the spatial area covered Growth – will information continue to grow or is object complete? Relationships – Are there existing relationships to information already held in the repository. This will include consideration of any dependencies/interdependencies. For example, does one item constitute a finding aid to another item? Reliability – whether the information is likely to be accurate or authoritative. ‘Contents can be trusted as a full and accurate representation of the transactions, activities or facts to which they attest’ (ISO15489)

Content Content Content

Content

Content Content Content

Source document domain (policies, guidelines or reports of application) Geospatial data13 Geospatial data13 Geospatial data 9, 13 Social science data3; records4, geospatial data13, websites6 Geospatial data13 publications10

Social science data3; records4, publications10, websites6, geospatial data13 Time - Period of time covered, e.g. creation date and end Social science data3; date. records4; geospatial data13 Uniqueness – whether the resource represents unique Social science data3; information. This includes consideration of whether records4, 5; publications10, Significance – importance of information content for current and future research needs


Content Contextual Contextual Contextual Contextual Evidence Evidence Evidence

Evidence Operational

duplicates exist or information is also available in other media Usability – Accessibility of content, e.g. are appropriate manuals available to decipher information Documentation - Accompanying technical documentation explains how data collected etc Provenance Appropriateness of provenance to collection (e.g. ‘within state’, or existence of relationship with donor) Significance of source/context of data/records Usage - Frequency of use Accountability - Provide defence of agency against charges of fraud/misrepresentation Artefact Provide evidence of way in which organization functioned (e.g. how technology incorporated into business, how web used as communication tool) Authenticity – Whether the object is what it purports to be, to have been created or sent by the person purported to have created or sent it; to have been created or sent at the time purported Precedence - Documentation of decisions that set precedent Costs involved in long-term maintenance

geospatial data13 , websites6 Geospatial data13; social science data3, records4; websites12 Social science data3; geospatial data13 Records; geospatial data9; websites6, 12, publications10 Social science data3; records4 Geospatial data13 Geospatial data13 Websites6 Records1,2, geospatial data13

Records4 Social science data3; records4

Operational Operational Operational Operational Societal Societal Societal Societal Technical Technical Technical Technical

Collection Fit with existing collection policy Mission Fit with organizational mission Potential - e.g. Repurposing - possibilities for use in ‘Value added products’ Replaceability – e.g. can information be replicated, cost of replicating information; value of information vs costs of preservation Ethics – are there ethical implications that will influence decision making, for example any reasons why records or information should not be retained? Intrinsic value – eg aesthetic or artistic quality, experimental use of new technology Legal considerations – eg privacy, data protection legislation prohibiting retention;

Geospatial data13 Geospatial data13 Geospatial data9

Representativeness – either of sectors of community, or statistically Functionality – Retention of behaviour, e.g. has look and feel been retained? Integrity of records – should be complete and unaltered, and have been protected against unauthorized modification Rights issues – eg copyright Risk - Degree of risk to content

Records4; geospatial data13

Geospatial data13

Records4; geospatial data13 publications10 Geospatial data13

Websites12 Records4,8, geospatial data13 Geospatial data13 Geospatial data 9, 13; records Social science data3

Technical Technical

Size of object/volume of records Usability of records - should be able to be located, retrieved, presented and interpreted

Social science data3; records4 Geospatial data13; social science data3 records4; websites12

1 (InterPARES, 2000b)
2 (Eastwood, 2003)
3 (Data Preservation Alliance for the Social Sciences (DataPASS))
4 (National Archives and Records Administration, 2007)
5 (Thomas, 2007)
6 (Grotke & Ruth, 2007)
8 (InterPARES, 2002)
9 (Morris, 2006)
10 (National Library of Australia, 2005)
11 (Murray & Phillips, 2007)
12 (Lala & Joe, 2006)
13 (United States Geological Survey, 2007)


8.0 Automation

Identification of criteria used to make appraisal decisions about digital objects, and linking those criteria with specific metadata, implies that it may be possible to build an appraisal engine to automate this process. The Boles and Young attempt to codify appraisal (see section 4.2) has been criticised on the grounds that implementation would be too difficult (Bearman, 1995, p. 392), rather than because there are inherent problems in the definition of the appraisal elements. Anne Gilliland-Swetland has commented that there is insufficient agreement in the archiving community to allow complete codification of appraisal (Gilliland-Swetland, 1995). This is echoed in the InterPARES recommendations, which distinguished only two ‘universal’ factors: authenticity and feasibility of preservation (InterPARES, 2000b). Terry Cook has provided trenchant criticism of the taxonomic approach to appraisal on the grounds that there are simply too many records to appraise (Cook, 1992).

However, if a much more flexible approach to codification is envisaged, based on genre, configurable to different institutional and cultural contexts, and applied within an overarching, top-down appraisal framework, a way forward can begin to be seen with respect to the information for accountability community. Automating implementation of this process will facilitate its application, and a solution starts to emerge to the impossibility of reviewing excessive and ever-increasing numbers of records on a record-by-record basis.

In order to enable appraisal/selection and disposal the following activities are necessary:

extract information about the digital objects (about content, context, or both). Extraction tools provide an excellent mechanism for obtaining information about the content, the context of origin and the technical nature of a digital object or of any aggregation of objects; by using knowledge representation and processing techniques it would then be possible to automate the appraisal decision-making processes;



analyse this information based upon appraisal criteria [14];



attribute appropriate appraisal/selection/disposal metadata to the digital object [15].
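The three activities listed above (extract, analyse, attribute) can be pictured as a small pipeline. The sketch below is illustrative only: the `DigitalObject` class, the metadata keys (`creator`, `created`, `format`), the two rules in `RULES` and the "retain"/"review" outcomes are assumptions invented for the example. The only element taken from this report is the idea of testing against the two InterPARES ‘universal’ criteria, authenticity and feasibility of preservation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DigitalObject:
    identifier: str
    metadata: Dict[str, str]
    appraisal: Dict[str, object] = field(default_factory=dict)


# Hypothetical rules, one per InterPARES 'universal' criterion.
RULES: Dict[str, Callable[[Dict[str, str]], bool]] = {
    # Assumed proxy for authenticity: creator and creation date are recorded.
    "authenticity": lambda m: bool(m.get("creator")) and bool(m.get("created")),
    # Assumed proxy for feasibility: the format is on a supported list.
    "feasibility": lambda m: m.get("format") in {"pdf", "xml", "txt"},
}


def extract(obj: DigitalObject) -> Dict[str, str]:
    """Step 1: gather information about the object (here, its stored metadata)."""
    return dict(obj.metadata)


def analyse(extracted: Dict[str, str], rules=RULES) -> Dict[str, bool]:
    """Step 2: evaluate the extracted information against each appraisal rule."""
    return {name: rule(extracted) for name, rule in rules.items()}


def attribute(obj: DigitalObject, findings: Dict[str, bool]) -> DigitalObject:
    """Step 3: record the findings and a provisional decision on the object."""
    decision = "retain" if all(findings.values()) else "review"
    obj.appraisal = {"findings": findings, "decision": decision}
    return obj


def appraise(obj: DigitalObject) -> DigitalObject:
    """Run the three steps in sequence for a single object."""
    return attribute(obj, analyse(extract(obj)))
```

In this sketch an object that fails any rule is routed to human review rather than discarded, on the assumption that automation supports the appraiser's decision instead of replacing it.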


According to the previous analysis, some indicators that will affect appraisal in the digital environment can be identified, namely the quality of recordkeeping practice and the timing of appraisal. Specific factors to be noted are:

in a digital environment, the need to create and maintain a large amount of metadata for the description and preservation of digital objects at the point of creation of the resources, so as to ensure correct acquisition by the digital repository, implies new procedures and a change in the chain of responsibilities

it may be necessary to develop a re-appraisal strategy within the repository (both for ensuring a first rudimentary process, and for refining the process when archival analysis provides the information required for a detailed evaluation)

impact of appraisal methods: for instance, where a snapshot approach is used (for websites), the appraisal policy should be in place at the same time as the preservation strategy

financial and staffing constraints: the budget constraints governing digital repositories should be taken into account with more attention than in the paper world. The lack of funding for cataloguing and descriptive activity implies that appraisal and disposal should be in place as soon as possible and before the creation of a submission package to the repository (i.e. before the preparation of Archival Information Packages (AIPs) as expressed by the OAIS model)

the requirements for a higher degree of documentation and information at an early stage and in the course of the management of the resources.

All the metadata collected for conducting the electronic records appraisal are crucial  documentation  and  information  for  ensuring  the  adequacy  of  the  preservation 

15 This activity could be considered part of the definition of the functionalities of a digital library system as defined in the DELOS report prepared by Volker Herrmann and Manfred Thaller, Integrating preservation aspects into the design process of Digital Libraries, available at http://www.dpc.delos.info/private/output/DELOS_WP6_d651_finalv3_5__cologne.pdf.


function.  They  include  all  the  main  information  required  for  carrying  out  the  appraisal as previously described, such as:  

information  on  the  creation  context  of  the  records  (juridical‐administrative,  procedural, provenancial, documentary) 



information  on  the  technological  context  of  the  records,  relevant  both  for  assessing  the  authenticity of the records  and for  evaluating the feasibility of  the preservation processes, 



information  related  to  the  appraisal  decision  itself:  this  documentation  necessary to justify the decision is relevant not only for accountability of the  creator but also to support at any time the preservation process. 



information  related  to  the  continuous  monitoring  of  the  appraised  records,  with specific reference to the technological context and its evolution. 

This information has to be maintained in association with the records themselves. In large part (with the exception of the documentation related to the technological context required in the course of the preservation process) it exists while the records are active and disappears when the records are removed from the active recordkeeping system.

For this reason this information should be packaged with the records/documents themselves as soon as possible, collected automatically in the creation process (for instance in connection with the classification schemes) and transferred to the preserver.

For the same reason it is increasingly necessary, as all the projects clearly stress, to implement tools, or better still meta-tools, able to:

translate the information and the documentation created in the course of the  digital resources production and management in the form of metadata, 



transfer these metadata into a representation and technical environment that can be shared across various domains and activities by using and mapping metadata registry systems



ensure  the  correct  mapping  of  metadata  registries  for  supporting  the  interoperability,  specifically  among  various  levels  of  business  systems  and  different management phases of the digital resources. 
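The translation and mapping tasks listed above amount to applying a crosswalk between the field names of a creating system and those of a shared schema. The following sketch shows the idea in miniature. Everything in it is hypothetical: the source field names (`Author`, `DateCreated`, `ClassCode`), the target names and the `translate` function are invented for illustration and do not describe any particular metadata registry.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: creating-system field name -> shared-schema element name.
CROSSWALK: Dict[str, str] = {
    "Author": "creator",
    "DateCreated": "date",
    "ClassCode": "business_function",
}


def translate(record: Dict[str, str],
              crosswalk: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map a record onto the shared schema and report any unmapped fields."""
    mapped = {crosswalk[name]: value
              for name, value in record.items() if name in crosswalk}
    # Fields with no mapping are reported so that the gap is documented
    # rather than silently dropped at transfer.
    unmapped = sorted(name for name in record if name not in crosswalk)
    return mapped, unmapped
```

Reporting the unmapped fields reflects the concern raised above that contextual information tends to disappear when records leave the active recordkeeping system.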

The  other  key  consideration  to  be  explored  in  any  investigation  of  the  benefits  of  automating  appraisal  is  the  establishment  of  channels  to  facilitate  input  into  decision‐making  by  other  relevant  communities.    Whereas  in  the  paper  world  this  would  have  been  extremely  difficult  to  accomplish,  in  the  digital  environment  the  reverse is true.  Not only can access to information under review be enabled beyond 


the memory institution, but also current technologies allow for efficient data collection and analysis. Methodologies used will have to be tailored to suit the needs of specific communities and stakeholders.

R8.0.1 Strategies making use of web-enabled communication channels should be investigated to enable input from all community stakeholders in determining what information is significant and worth preserving.

8.1 Using Metadata and Genres to Determine Significance

Appraisal criteria identified in Table 1 were analysed and where possible broken down into specific questions about an item that could be answered at least partially using metadata [16]. To seasoned archivists these questions will no doubt appear trivial or simplistic. Our aim, however, is to investigate to what extent they could be automated. The answers to these questions provide essential information for the decision-making tools supporting appraisal in the digital repository workflow and thus contribute to minimising the human intervention required. The questions shown in the following table should therefore be considered an initial trial, for use in a future mapping study of existing technologies for digital object processing which could be used for automated decision making.

16 The input from students of HATII’s 2007-8 Management, Creation, and Preservation of Digital Materials course into the identification of questions and metadata elements is gratefully acknowledged.


Table 2: Crosswalk showing relationship between appraisal criteria and questions that could be answered using metadata values

Category | Appraisal Criteria | Specific questions arising from this, that could be answered using metadata
Content | Comprehensiveness |
Content | Coverage | What is the spatial area covered?
Content | Growth | Is there an ongoing record of modifications?
Content | Relationships | Are there any relationships to existing items? Have similar objects/records been ingested previously?
Content | Reliability | Was it created by an authoritative person/unit?
Content | Significance | What is it about? What genre is it?
Content | Time | What timeframe does it cover?
Content | Uniqueness | Is the item a duplicate? Is the information available in different media?
Content | Usability | Is the language comprehensible?
Contextual | Documentation |
Contextual | Provenance | Is provenance appropriate?
Contextual | Significance | What business function/organization was the item created for?
Contextual | Usage | How often has it been accessed?
Evidence | Accountability |
Evidence | Artefact |
Evidence | Authenticity | Is it identifiable?
Evidence | Precedence |
Operational | Collection |
Operational | Costs | Are there special hardware requirements? Are there special software requirements? Are additional metadata elements required?
Operational | Mission |
Operational | Potential |
Operational | Replaceability |
Societal | Ethics |
Societal | Intrinsic | Is the creator significant? Was it created at a significant time?
Societal | Legal | Does content include personal information? Is creating agency subject to legislative requirements?
Societal | Representativeness |
Technical | Functionality | Is it static or dynamic? Is it simple or complex? How sticky is the metadata?
Technical | Integrity | Has item been tampered with? Is it complete and uncorrupted? Are there security controls?
Technical | Rights | Can it be accessed? Can it continue to be accessed? Are there restrictions on who can access?
Technical | Risk | What format is it in?
Technical | Size | How big is it?
Technical | Usability | Can it be accessed?
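One way to see where metadata-based questions are missing is to treat Table 2 as a data structure and list the criteria with no question attached. The sketch below does this. The counts in `QUESTION_COUNTS` are a transcription of Table 2 as read here (number of questions per criterion), and the function name `criteria_without_questions` is hypothetical; the example is intended only to make the gaps easier to inspect, not to add to the analysis.

```python
from typing import Dict, List

# Number of metadata-answerable questions per criterion, transcribed from Table 2.
QUESTION_COUNTS: Dict[str, Dict[str, int]] = {
    "Content": {"Comprehensiveness": 0, "Coverage": 1, "Growth": 1,
                "Relationships": 2, "Reliability": 1, "Significance": 2,
                "Time": 1, "Uniqueness": 2, "Usability": 1},
    "Contextual": {"Documentation": 0, "Provenance": 1,
                   "Significance": 1, "Usage": 1},
    "Evidence": {"Accountability": 0, "Artefact": 0,
                 "Authenticity": 1, "Precedence": 0},
    "Operational": {"Collection": 0, "Costs": 3, "Mission": 0,
                    "Potential": 0, "Replaceability": 0},
    "Societal": {"Ethics": 0, "Intrinsic": 2, "Legal": 2,
                 "Representativeness": 0},
    "Technical": {"Functionality": 3, "Integrity": 3, "Rights": 3,
                  "Risk": 1, "Size": 1, "Usability": 1},
}


def criteria_without_questions(table: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """For each category, list the criteria that have no associated question."""
    return {category: [name for name, count in criteria.items() if count == 0]
            for category, criteria in table.items()}


if __name__ == "__main__":
    for category, missing in criteria_without_questions(QUESTION_COUNTS).items():
        print(f"{category}: {', '.join(missing) if missing else '(all covered)'}")
```

On this reading the gaps fall mainly in the Evidence, Operational and Societal categories, while every Technical criterion has at least one question.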


It can be seen from Table 2 that it does not appear possible at this stage to fulfill all requirements for appraisal at item level by using metadata. More analysis is required in order to investigate this fully, and to determine whether there are alternative approaches using automation that can be taken to address the gaps. At this stage a way forward would seem to be to envisage multiple models for the automation of appraisal (see 8.2 below).

R8.1.1 Further analysis of appraisal criteria is required in order to formulate appraisal rules. A wide range of input from practitioners and academics in different domains is required, so consideration should be given to a Delphi study and/or a series of focus groups.

Returning to those areas where it appears that questions to be answered with metadata can be formulated (see Table 2), consideration was given to how significant the answers to those questions would be, given that superficially they appear to be very simplistic. One approach was to determine whether or not answers would contribute to one or other of the two InterPARES ‘universal’ criteria: the authenticity of the object, and the feasibility of digital preservation for that object. The results of this last stage of analysis are shown in Table 3.

This table introduces a further categorisation, this time of metadata type. The categories applied are those defined in the DELOS Digital Library Reference Model (Candela et al., 2007, p.78):

Syntactic:  Metadata  that  provides  information  about  the  syntax  or  structure  of the information object.  For example, creation date, size, file format. 



Semantic:  Metadata  that  provides  information  about  the  content  of  the  information object.  For example, name of creating agency, business function,  keywords. 



Contextual:    Metadata  that  provides  information  not  related  to  either  the  semantics or the syntax of the information object.  For example, background  information  relating  to  the  establishment  of  the  creating  agency  of  an  information object, or digital rights restrictions. 
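The three metadata types just defined can be expressed as a simple classification. In the sketch below, the element names in `EXAMPLE_ELEMENTS` are hypothetical identifiers coined for the examples given in the definitions above (creation date, size, file format, and so on), and the `group_by_type` function and the "unclassified" label are assumptions added for illustration.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict


class MetadataType(Enum):
    SYNTACTIC = "syntactic"    # syntax or structure of the information object
    SEMANTIC = "semantic"      # content of the information object
    CONTEXTUAL = "contextual"  # neither semantics nor syntax


# Hypothetical element names standing for the examples in the definitions above.
EXAMPLE_ELEMENTS: Dict[str, MetadataType] = {
    "creation_date": MetadataType.SYNTACTIC,
    "size": MetadataType.SYNTACTIC,
    "file_format": MetadataType.SYNTACTIC,
    "creating_agency": MetadataType.SEMANTIC,
    "business_function": MetadataType.SEMANTIC,
    "keywords": MetadataType.SEMANTIC,
    "agency_background": MetadataType.CONTEXTUAL,
    "rights_restrictions": MetadataType.CONTEXTUAL,
}


def group_by_type(record: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group a record's elements by metadata type; unknown names are 'unclassified'."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name, value in record.items():
        kind = EXAMPLE_ELEMENTS.get(name)
        grouped[kind.value if kind else "unclassified"][name] = value
    return dict(grouped)
```

Which elements belong to which type in practice would depend on the schema in use; the assignments here simply follow the examples given in the definitions.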

 

Table 3: Metadata elements that could be used to assist in appraisal decision making
Category Appraisal Question Possible Metadata Category metadata element(s) Semantic Content What spatial area is covered? Descriptive – place names Content

Is there an ongoing record of modifications? Are there any relationships to existing items?

Audit log

Contextual

Matter, agency, etc.

Semantic

Content

Have similar objects/records been ingested before?

Agency, actors

Semantic

Content

Is the language comprehensible?

Character set

Semantic

Content

What is it about?

Semantic

Content

What genre is it?

Content

What timeframe does it cover?

Title, descriptors Document genre Dates

Content

Is the item a duplicate?

Checksum

Syntactic

Content

Syntactic Syntactic

Why is this important? Fit with acquisitions/collections policy May indicate item is incomplete Coherency of collection; finding aids to other records May influence determination of feasibility, build coherent body of information May be an indication of usability, and also fit with collection May indicate possibility of significant content May indicate possibility of significant content Extent of time may be an indication of value Impact on feasibility assessment

Comments

This will be a question of matching content of metadata fields. Matching metadata elements may answer this question. Specification of which elements will vary according to domain and jurisdiction.

Value will vary according to domain, but this may be a key consideration File synchronisation may also be required – see http://www.cis.upenn.edu/~bcpierce /unison/

51

DELOS Deliverable 6.10.1 Content Is information available in different media?

Identifier elements, e.g. author, ISBN Publisher/ creator or other indicator of place such as url Actors, agency

Semantic

Contextual

Is provenance appropriate?

Contextual

What business function/organization was the item created for?

Contextual

How often has it been accessed?

Audit log

Contextual

Is it identifiable?

Actors, dates, matter etc. etc

Syntactic & semantic

Are there special hardware requirements? Are there special software requirements? Are additional metadata elements required?

File format

Syntactic

File format

Syntactic

All

Semantic, syntactic, contextual

Societal

Is the creator significant?

Creator

Contextual

Societal

Was it created at a significant time

Dates

Contextual

Evidence

Operational Operational Operational

Semantic

Semantic

May indicate duplication, so will be influential in feasibility assessment Fit with acquisitions/ collections policy May provide an indicator of likely value especially if functional approach is used May be a weighting factor in cost benefit analysis Contributes to assessment of authenticity Influential in assessing feasibility of preservation Influential in assessing feasibility of preservation Influential in assessing feasibility of preservation May be an indicator of intrinsic value May be an indicator of intrinsic value

Appropriate elements will vary according to domain and type of information.

To be used with care – may not be significant at all in certain domains, e.g. recordkeeping The existence of certain metadata elements can be used to answer this question. The specification of which elements will vary according to domain and jurisdiction.

There are likely to be substantial cost implications if metadata has to be manually assigned. Particularly applicable to born-digital artworks For instance – an organisation’s first website; early examples of born-digital art

52

DELOS Deliverable 6.10.1 Societal Does content include personal information? Societal Is creating agency subject to legislative requirements?

Descriptive – personal names Agency

Semantic Semantic

May be legal barriers to retention or accessibility May be legal requirements to retain and make accessible May influence determination of feasibility of preservation

Technical

Is it static or dynamic?

File format

Syntactic

Technical

Is item simple or complex?

File format

Syntactic

Impact on feasibility assessment

Technical

How sticky is the metadata?

All

Syntactic

Technical

Has it been tampered with?

Syntactic

Technical

Is it complete and uncorrupted?

Audit log Electronic signature/ seal Fixity check

Impact on feasibility assessment Contribute to assessment of authenticity

Technical

Can it be accessed?

Contextual & syntactic

Technical

Can it continue to be accessed?

Rights, permissions, etc. Format type Rights

Technical

Are there restrictions on who

Rights

Contextual

Syntactic

Contextual

Contribute to assessment of authenticity and determine feasibility of preservation Will influence determination of feasibility of preservation

Format will provide an indicator that may partially answer this question, but further analysis may also be required in order to discover embedded formats, e.g. spreadsheet in word document. Format will provide an indicator that may partially answer this question, but further analysis may also be required in order to discover embedded formats

If restrictions apply, now or in the future, it may not be worth preserving

Impact on feasibility assessment Impact on feasibility

53

DELOS Deliverable 6.10.1 can access? Technical What format is it in?

Technical

How big is it?

Format type

Syntactic

File size

Syntactic

assessment May influence determination of feasibility, and prioritisation May influence determination of feasibility of preservation

54

DELOS Deliverable 6.10.1
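Some of the technical questions in Table 3 lend themselves to direct automation. The following is a minimal sketch only: it assumes SHA-256 as the checksum algorithm (the report does not name one) and a hypothetical in-memory set of checksums for objects already ingested. The function names are illustrative, not taken from any existing repository system.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of an object's bytes (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data: bytes, ingested: set[str]) -> bool:
    """'Is the item a duplicate?': true if an identical bitstream was already ingested."""
    return checksum(data) in ingested


def is_uncorrupted(data: bytes, recorded_checksum: str) -> bool:
    """'Is it complete and uncorrupted?': fixity check against the recorded checksum."""
    return checksum(data) == recorded_checksum


# Hypothetical repository state: checksums of the objects ingested so far.
ingested_checksums = {checksum(b"annual report 2006")}
```

A checksum comparison of this kind only detects bit-identical copies; the same information held in a different format or medium would need the identifier-based matching listed elsewhere in the table.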

The  applicability  and  relative  significance  of  the  factors  listed  in  Table  3  will  vary  according to domain and to the policy of the specific collecting body.  It is suggested  therefore  that  weightings  should  be  assigned  to  each  factor,  appropriate  to  the  organisation  concerned.        Genres  should  also  be  considered  as  another  factor  in  determining weightings.     

R8.1.1 A ranking system for appraisal factors must be developed. The system should be flexible enough to allow customization for different organizational settings and to take into account the purpose for which the information is being managed.
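One possible reading of such a ranking system is a weighted score over the answers to the Table 3 questions. The sketch below is illustrative only: the factor names, the weights, and the convention that each answer is normalised to a value between 0 and 1 are all assumptions introduced here, not part of the report.

```python
def appraisal_score(answers: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of factor answers (each in [0, 1]); factors without a weight are ignored."""
    total_weight = sum(weights[f] for f in answers if f in weights)
    if total_weight == 0:
        return 0.0
    weighted = sum(answers[f] * weights[f] for f in answers if f in weights)
    return weighted / total_weight


# Hypothetical weightings for two organisational settings.
ARCHIVE_WEIGHTS = {"authenticity": 5.0, "business_function": 4.0, "access_frequency": 0.0}
LIBRARY_WEIGHTS = {"authenticity": 1.0, "business_function": 1.0, "access_frequency": 4.0}

# Hypothetical answers for a single item, normalised to [0, 1].
item = {"authenticity": 1.0, "business_function": 1.0, "access_frequency": 0.0}

archive_score = appraisal_score(item, ARCHIVE_WEIGHTS)
library_score = appraisal_score(item, LIBRARY_WEIGHTS)
```

Under these assumed weights the same item ranks highly for a recordkeeping body and poorly for a collection managed for awareness, which is the kind of per-organisation customisation the recommendation asks for.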

8.1.1 Genres

The concept of genre is an important one for information communities, and is very significant for digital libraries in particular. A genre can be broadly defined as a socially recognised communication norm, and examples of genres will encompass the whole gamut of communication from text messages to scholarly publications. Possible benefits to be derived from application of the genre concept to appraisal include a preliminary categorisation by document type to facilitate metadata extraction and/or assign weightings to various metadata elements. Other benefits for digital libraries/archives include enriched description and understanding of the digital object itself and the context of its creation and use, and ultimately improved access to information.

There is a lack of consensus in the literature relating to the definition of genre (Kim & Ross, 2007). Attempts to provide a universal classification, however, are likely to result in an overly simplistic approach to genre definition. Awareness of the different purposes for which information is being managed will enable a much richer and more useful definition of genre. It is suggested, therefore, that there should be at least two analytical frameworks used: one for the purpose of managing information for accountability, and the other for managing information for awareness or entertainment.

Attempts to classify genres range from a simple categorisation to a much more complex multi-faceted approach. Proponents of a multi-faceted approach point to the difficulty in assigning a single genre to some digital documents (Santini, 2007) or the need to incorporate contextual information (Crowston & Kwasnik, 2003; Yoshioka, Herman, Yates, & Orlikowski, 2001). Appropriate classification is a critical first step in any experimental attempts to automate identification: if the appropriate framework is not used, much valuable research runs the risk of being dismissed as overly simplistic or reductionist.

8.1.1.1 Information for Accountability

Wanda Orlikowski, Joanne Yates and colleagues at MIT have undertaken a number of studies of genres from a structurational perspective; in other words, they view genres as socially recognised communicative transactions that, as they are enacted over time, also become organising structures and templates for behaviour. See, for example, their analysis of business presentations and the use of PowerPoint (Yates & Orlikowski, 2007). Their analysis is potentially very useful for the recordkeeping community for two main reasons: firstly, because of the acknowledgement of the importance of context; secondly, because of their development of the notion of a genre system.

For the recordkeeping community, as the primary purpose of the information stored in a digital archive will be information as evidence for accountability purposes, the context of the genre is of critical concern. A proposal of a multi-dimensional taxonomy for organisational genres (Yoshioka et al., 2001) may go some way to addressing this issue. This taxonomy is based on the analysis of the following six genre dimensions:

The purpose (why) 



The content (what) 



The timing (when) 



The location (where) 



The participants (who) 



The structure and media (how) 

In reference to the interrogative pronouns, this taxonomy is referred to as "5W1H" (Yoshioka et al., 2001). It has been suggested that these dimensions can be used as a framework for gathering genre-based metadata (Honkaranta, 2003b).

The 5W1H taxonomy encompasses both genres and genre systems. The concept of genre system is very important, as it ensures consideration of documents as components of a communicative action, and not solely as discrete objects. For instance, a meeting genre system might comprise an invitation to attend, an agenda and minutes (Osterlund, 2007; Yates & Orlikowski, 2007). This genre system is akin to the notion of aggregation of records, so is extremely useful to retain in any consideration of the application of genre theory to digital archives.
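To make the idea of genre-based metadata more concrete, the six dimensions and the notion of a genre system could be represented as simple records. This is a hedged sketch, not the data model of Yoshioka et al.: the class names, field names and the example values for the meeting genre system are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GenreDescription:
    """Genre-based metadata organised by the six 5W1H dimensions."""
    purpose: str       # why
    content: str       # what
    timing: str        # when
    location: str      # where
    participants: str  # who
    form: str          # how (structure and media)


@dataclass
class GenreSystem:
    """A set of interrelated genres that together make up one communicative action."""
    name: str
    genres: dict[str, GenreDescription] = field(default_factory=dict)

    def missing(self, genres_present: set[str]) -> set[str]:
        """Genres the system expects but which are absent from an aggregation of records."""
        return set(self.genres) - genres_present


# Hypothetical description of the meeting genre system mentioned in the text.
meeting = GenreSystem(
    name="meeting",
    genres={
        "invitation": GenreDescription(
            "convene the participants", "date, venue, invitees",
            "before the meeting", "email", "organiser to invitees", "short message"),
        "agenda": GenreDescription(
            "structure the discussion", "ordered list of topics",
            "before the meeting", "email or shared drive", "chair to invitees", "numbered list"),
        "minutes": GenreDescription(
            "record decisions", "decisions and actions",
            "after the meeting", "records system", "secretary to attendees", "structured document"),
    },
)
```

Because a genre system resembles an aggregation of records, a check such as `meeting.missing({"invitation", "agenda"})` could flag an aggregation that lacks its minutes before an appraisal decision is taken.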

8.1.1.2 Information for Awareness/Entertainment

Research being undertaken at the School of Information Studies, Syracuse University explores the utility of genre in assisting access to information in digital collections. These researchers argue that information about the genre, as well as the subject of a document, assists in improving the precision of searches (Crowston & Kwasnik, 2003; Kwasnik, Crowston, Nilan, & Roussinov, 2001). This research is very much grounded in a library perspective. This is clear from the examples used, and the historical overview of genre usage in tools such as the Dewey Decimal Classification and Library of Congress Subject Headings. There is clear recognition of the need for a multi-faceted classification to recognise both form and function of a genre as well as "the numerous clues and components that allow us to discriminate one genre from another" (Crowston & Kwasnik, 2003, p.356). A further publication on the taxonomy is currently being finalised.[17]

It is worth noting that both communities emphasise the importance of user involvement in genre identification (Crowston & Kwasnik, 2003; Honkaranta, 2003a, 2003b).

R8.1.2 Further investigation of the genre and genre system concepts should be undertaken, with a view to determining appropriate taxonomies for the different information domains.

R8.1.3 A ranking system for genres should be developed in conjunction with taxonomies. The system should be flexible enough to allow customization for different organizational settings and to take into account the purpose for which the information is being managed.

R8.1.4 The potential for using genre to enhance archival description and cataloguing of publications should be investigated.

[17] Email from Kevin Crowston, 23 Nov 07

8.2 Models of Automation

As a result of the analysis of the approaches to appraisal discussed in section 7.0 we identified a series of appraisal criteria and structured these so that we can represent them as appraisal rules. Rules are susceptible to representation as active knowledge components. This representation then suggests three models of automation:

Hybrid: This model would use technology to carry out specific tasks, but within an overarching top-down appraisal strategy requiring human decision-making, or automated application of a retention and disposal schedule. For instance, application of functional appraisal methodology supplemented by subsequent automated triage to determine the feasibility of preservation at the item level.

Appraisal engine:  Where a document is submitted to an appraisal engine for  analysis using a combination of text mining and rule‐based reasoning. 



Profiler:  The development of a prototype  to review a variety of information  object  types  (image,  document,  dataset  for  example)  and  apply  appraisal  rules, probably again using rule‐based reasoning methodologies. 

A wholly automated approach to appraisal can at this stage only be envisaged where a top-down appraisal strategy is not required, i.e. when managing information for awareness and/or entertainment, rather than information for evidential purposes. However, Maria Esteva's concept of a natural electronic archive, and appraisal using social networking analysis and text mining, is an interesting initiative that will be worth monitoring for further development (Esteva, 2007).

R8.2.1 Three models of automation are identified for further investigation:

Hybrid:    A  combination  of  manual  and  automated  decision  making.    For  instance,  application  of  functional  appraisal  methodology  supplemented  by  subsequent  automated triage to determine the feasibility of preservation at the item level. 



Appraisal  Engine:    Where  a  text  document  is  submitted  to  an  appraisal  engine  for  analysis using a combination of text mining and rule‐based reasoning. 



Profiler:    The  development  of  a  prototype  to  review  a  variety  of  information  object  types (image, document, dataset for example) and apply appraisal rules. 
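As an illustration of how appraisal rules might be represented as active components in the hybrid model, the following minimal sketch applies an ordered list of rules to an item's metadata and either decides automatically or refers the item to a human appraiser. The rule contents, metadata field names and the list of preservable formats are hypothetical; the report does not specify any of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Metadata = dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Metadata], bool]  # returns True when the rule fires
    outcome: str                           # "reject", "refer" (to a human) or "accept"


# Hypothetical list of formats the repository is able to preserve.
PRESERVABLE_FORMATS = {"application/pdf", "text/plain", "image/tiff"}

# Hypothetical rules, evaluated in order; the first rule that fires decides.
RULES = [
    Rule("duplicate", lambda m: bool(m.get("is_duplicate")), "reject"),
    Rule("fixity failure", lambda m: m.get("fixity_ok") is False, "refer"),
    Rule("unsupported format", lambda m: m.get("format") not in PRESERVABLE_FORMATS, "refer"),
]


def triage(metadata: Metadata) -> tuple[str, Optional[str]]:
    """Return (outcome, name of the deciding rule); items no rule objects to are accepted."""
    for rule in RULES:
        if rule.condition(metadata):
            return rule.outcome, rule.name
    return "accept", None
```

Keeping the rules as data rather than hard-coded logic is one way to let each organisation supply its own rule set, in line with the top-down strategy that the hybrid model assumes.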

9.0 Summary of Recommendations

Specific recommendations to further develop automated appraisal (or re-appraisal) are listed below. Discussion of these recommendations is provided in the body of the text; the recommendation number is a guide to location within the report.

R2.0.1 A glossary should be developed of terminology relating to the entities and processes associated with determining the significance of information. Definitions should be acceptable from the perspective of all information management occupations.

R2.1.1 Technological solutions to determining the significance of information must take into account ideological, political, economic, cultural and social factors.

R2.2.1 Appraisal methodologies must be "fit for purpose", i.e. take into account the purpose(s) for which information is being managed: accountability, awareness and/or entertainment.

R2.3.1 Item-level appraisal should be considered as a tool to be used in the context of an appropriate theoretical framework, and does not imply the destruction of contextual relationships.

R3.1.1 Appraisal may take place prior to ingest, on ingest and/or as part of Preservation Planning functionality.

R3.3.1 Records created and maintained in accordance with ISO 15489 and ISO 23081 may not require appraisal at item level.

R5.3.1 Metadata showing relationships and levels of aggregation of records should be used to automatically generate description for archival repositories.

R7.0.1 Further analysis should take place of policies, guidelines and reports of practice to determine a comprehensive database of criteria used in appraisal of digital objects, linked not only to domain of practice but also to sector of activity and country.

R8.0.1 Strategies making use of web-enabled communication channels should be investigated to enable input from all community stakeholders in determining what information is significant and worth preserving.

R8.1.1 Further analysis of appraisal criteria is required in order to formulate appraisal rules. A wide range of input from practitioners and academics in different domains is required, so consideration should be given to a Delphi study and/or a series of focus groups.

R8.1.2 Further investigation of the genre and genre system concepts should be undertaken, with a view to determining appropriate taxonomies for the different information domains.

R8.1.3 A ranking system for genres should be developed in conjunction with taxonomies. The system should be flexible enough to allow customization for different organizational settings and to take into account the purpose for which the information is being managed.

R8.1.4 The potential for using genre to enhance archival description and cataloguing of publications should be investigated.

R8.2.1 Three models of automation are identified for further investigation:

Hybrid: A combination of manual and automated decision making. For instance, application of functional appraisal methodology supplemented by subsequent automated triage to determine the feasibility of preservation at the item level.

Appraisal Engine: Where a text document is submitted to an appraisal engine for analysis using a combination of text mining and rule-based reasoning.

Profiler: The development of a prototype to review a variety of information object types (image, document, dataset for example) and apply appraisal rules.

10.0 Conclusions

Appraisal is, in the digital environment, an activity at risk. In reality, the lack of active appraisal puts preservation itself at risk. The basic requirements identified here include:

an  early  initiative  within  the  design  of  the  resources  creation  (in  the  recordkeeping system in case of records) 



the neutrality of its principles and procedures as the guarantee for its role as  support to the research right 



the  capacity  to  ensure  the  corporate  memory  as  a  significant  memory  (a  contextual memory)  for the creator and for the social community. 

We conclude that appraisal, the determination of the worth of preserving information, continues to be significant in the digital environment. Furthermore, the concept is applicable beyond the recordkeeping domain that initiated it. A number of strategies have been identified to undertake appraisal, any one of which, or combination of, may be appropriate to a specific information community or domain. In considering the automation of the appraisal function in the context of a digital library or archives, the focus is likely to be on the assessment of individual items. The results of this assessment will contribute to the overall appraisal determination.

Our analysis of the approaches to appraisal resulted in the identification of a series of appraisal criteria which have been structured so that we can represent them as appraisal rules. Rules are susceptible to representation as active knowledge components. In considering the next steps, this representation suggests three models of automation: hybrid, appraisal engine and profiler.

Research underway on the automation of metadata extraction in conjunction with genre identification, together with the structurational view of genres, shows a great deal of promise for the digital archives community. In addition, the technological possibilities now present to facilitate input of other voices into the selection of information that has value for communities open up a way forward to a new information age, one that need no longer be exclusively defined by dominant societal forces.

61

DELOS Deliverable 6.10.1

11.0 References Anderson, R., Frost, H., Hoebelheinrich, N., & Johnson, K. (2005). The AIHT at Stanford University: Automated Preservation Assessment of Heterogeneous Digital Collections. D-Lib Magazine, 11(12), http://www.dlib.org/dlib/december05/johnson/12johnson.html. Bauer, G. P. (1946). The appraisal of current and recent records. Staff Information Circulars, 13, 2. Bearman, D. (1989). Archival methods Archives and Museum Informatics Technical Report. Bearman, D. (1995). Archival strategies. American Archivist, 58(Fall), 380-413. Bearman, D. (2005). Addressing selection and digital preservation as systemic problems. Paper presented at the Preserving the digital heritage: principles and policies, The Hague http://www.unesco.nl/images/preserving_the_digital_heritage.pdf. Boles, F., & Young, J. M. (1985). Exploring the black box: The appraisal of university administrative records. American Archivist, 48(2), 121-140. Burke, J. (1998). Renovating Conspectus for the digital era: applied at Queensland University of Technology. Paper presented at the 9th Biennial VALA, Melbourne http://www.nla.gov.au/libraries/hosted/embracin.html. Candela, L., Castelli, D., Ferro, N., Yoannidis, Y., Koutrika, G., Meghini, C., et al. (2007). The Digital Library Reference Model: Foundations for Digital Libraries. Pisa: DELOS.http://www.delos.info/files/pdf/ReferenceModel Cobb, J., Pearce-Moses, R., & Surface, T. (2005, April 26). ECHO Depository Project. Paper presented at the IS&T Archiving Conference, Washington, DC http://www.ndiipp.uiuc.edu/pdfs/IST2005paper_final.pdf. Committee on Appraisal. (2003). Manual on appraisal [draft]: International Council on Archives. http://www.ica.org/en/node/30417 Consultative Committee for Space Data Systems. (2004). Producer-Archive Interface Methodology Abstract Standard. Washington, DC: NASA. http://public.ccsds.org/publications/archive/651x0b1.pdf Cook, T. (1992). Documentation strategy. Archivaria(34), 181-191. Crowston, K., & Kwasnik, B. H. 
(2003). Can document-genre metadata improve information access to large digital collections? Library Trends, 52(2), 345-361. Cunningham, A., & Oswald, R. (2005). Some functions are more equal than others: the development of a macroappraisal strategy for the National Archives of Australia. Archival Science, 5, 163-184. Data Preservation Alliance for the Social Sciences (DataPASS). Appraisal guidelines. http://www.icpsr.umich.edu/DATAPASS/pdf/appraisal.pdf Day, M., Pennock, M., & Allinson, J. (2007). Co-operation for digital preservation and curation: collaboration for collection development in institutional repository networks. Paper presented at the DigCCurr2007: An International Symposium in Digital Curation, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/dayPennock_paper_93.pdf. 62

DELOS Deliverable 6.10.1

Dorner, D. G. (2004). The impact of digital information resources on the roles of collection managers in research libraries. Library Collections, Acquisitions, & Technical Services, 28, 249-274. Duranti, L. (1994). The concept of appraisal in archival science. American Archivist, 57, 328-344 Eastwood, T. (1993). How goes it with appraisal? Archivaria(36), 111-121. Eastwood, T. (2003). What archivists have learned about appraisal of digital records. Paper presented at the International Workshop on the selection, appraisal and retention of digital scientific data, Lisbon, Portugal http://www.erpanet.org/events/2003/lisbon/presentations/Terry%20Eastwood %20paper.pdf. EROS. (1999). Inventory, appraisal and disposal. In Guidelines for management, appraisal and preservation of electronic records (2nd ed., Vol. 2: Procedures): The National Archives http://www.nationalarchives.gov.uk/electronicrecords/advice/guidelines.htm Esteva, M. (2007). Bits and pieces of text: appraisal of a natural electronic archive. Paper presented at the Digital Humanities 2007. from http://www.digitalhumanities.org/dh2007/abstracts/xhtml.xq?id=136. Evans, J., McKemmish, S., & Bhoday, K. (2005). Create once, use many times: The clever use of recordkeeping metadata for multiple archival purposes. Archival Science, 5, 17-42. Gilliland-Swetland, A. (1995). Development of an expert assistant for archival appraisal of electronic communications: An exploratory study. Unpublished Ph.D., University of Michigan. Grotke, A., & Ruth, J. E. (2007). Selecting and managing content captured from the web: Expanding curatorial expertise and skills in building Library of Congress web archives. Paper presented at the DigCCurr2007: An International Symposium in Digital Curation, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/grotkeRuth_paper_9-3.pdf. Honkaranta, A. (2003a, April 23-26). Developing Document and Content Management in Enterprises Using a "Genre Lens". 
Paper presented at the Proceedings of the 5th International Conference on Enterprise Information Systems, Angers, France http://www.cc.jyu.fi/~ankarjal/ICEIS2003_GenreDM.pdf. Honkaranta, A. (2003b, June 16-17). Evaluating the 'genre lens' for analyzing requirements for content assembly. Paper presented at the Eighth CAiSE/IFIP8.1 International Workshop on Evaluation of Modeling Methods in Systems Analysis and Design (EMMSAD '03), Velden, Austria http://www.ad.jyu.fi/users/a/ankarjal/EMMSAD2003.pdf. International Organization for Standardization. (2001). Information and documentation Records Management - Part 1: General(No. ISO15489-1: 2001). Geneva: ISO. International Organization for Standardization. (2003). Space data and information transfer systems -- Open archival information system -- Reference model(No. ISO14721:2003). Geneva: ISO. International Organization for Standardization. (2006). Information and documentation Records management processes - Metadata for records. Part 1: Principles(No. ISO230811:2006). Geneva: ISO. 63

DELOS Deliverable 6.10.1

InterPARES. (2000a). Appendix 3 Appraisal of electronic records: A review of the literature in English. In The long-term preservation of authentic electronic records: Findings of the InterPares project http://www.interpares.org/book/interpares_book_l_app03.pdf InterPARES. (2000b). Appraisal Task Force Report. In The long-term preservation of authentic electronic records: Findings of the InterPares project: University of British Columbia http://www.interpares.org/book/interpares_book_e_part2.pdf InterPARES. (2002). Appendix 2 Requirements for assessing and maintaining the authenticity of electronic records. In The long-term preservation of authentic electronic records: Findings of the InterPares project http://www.interpares.org/book/interpares_book_k_app02.pdf JISC. (2007). e-Journals: Archiving and Preservation Briefing paper. from http://www.jisc.ac.uk/publications/publications/pub_ejournalspreservationbp.a spx Jonker, A. E. M. (2005). Macroappraisal in the Netherlands. The first ten years, 19912001, and beyond. Archival Science, 5, 203-218. Kim, Y., & Ross, S. (2006). Genre classification in automated ingest and appraisal metadata. Paper presented at the European Conference on Research and Advanced Technology for Digital Libraries (ECDL), Alicante, Spain http://eprints.erpanet.org/110/. Kim, Y., & Ross, S. (2007). "The naming of cats": Automated genre classification. International Journal of Digital Curation, 2(1), 49-62. http://www.ijdc.net/ijdc/article/view/24/27. Kretschmar, R. (2005). Archival appraisal in Germany: A decade of theory, strategies, and practices. Archival Science, 5, 219-238. Kwasnik, B. H., Crowston, K., Nilan, M., & Roussinov, D. (2001). Identifying document genre to improve web search effectiveness. Bulletin of the American Society for Information Science and Technology, 23-26. Lala, V., & Joe, S. (2006). Web archiving at the National Library of New Zealand. 
Paper presented at the LIANZA, Wellington http://www.lianza.org.nz/library/files/store_013/WebArchives_VLala.pdf. Lloyd, A. (2007). Guarding against collective amnesia? Making significance problematic: An exploration of issues. Library Trends, 56(1), 53-65. Morris, S. (2006, March 27). Identification, selection and appraisal within the North Carolina Geospatial Data Archiving Project (NCGDAP). Paper presented at the Digital Preservation in the State Government: Best Practices Exchange http://www.lib.ncsu.edu/ncgdap/presentations/StateArchIDSelectionfinal.ppt. Murray, K., & Hsieh, I. K. (2006). Collection Planning Guidelines. from http://web3.unt.edu/webatrisk/reports/cpg_final_31may2006.pdf Murray, K., & Phillips, M. (2007). Collaborations, best practices, and collection development for born-digital and digitized materials. Paper presented at the DigCCurr2007: An International Symposium in Digital Curation, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/murrayPhillips_paper_9-3.pdf. National Archives and Records Administration. (2007). Strategic directions: appraisal policy. from http://www.archives.gov/records-mgmt/initiatives/appraisal.html 64

DELOS Deliverable 6.10.1

National Library of Australia. (2005). Online Australian publications: selection guidelines for archiving and preservation by the National Library of Australia. from http://pandora.nla.gov.au/selectionguidelines.html Neumayer, R., & Rauber, A. (2007). Why appraisal is not 'utterly' useless and why it's not the way to go either: A provocative position paper: Digital Preservation Europe. http://www.digitalpreservationeurope.eu/publications/position/appraisal_final. pdf OCLC. (2007). Creating the conspectus. from http://www.oclc.org/programs/ourwork/past/conspectus.htm Osterlund, C. (2007). Genre combinations: A window into dynamic communication practices. Journal of Management Information Systems, 23(4), 81-108. Pearce-Moses, R., & Kaczmarek, J. (2005). An Arizona model for preservation and access of web documents. DttP: Documents to the People, 33(1), 17-24, www.ndiipp.uiuc.edu/pdfs/azmodel.pdf. Pinsent, E., & Ashley, K. (2006). Digital Asset Assessment Tool (DAAT) project. London: University of London Computer Centre. http://www.jisc.ac.uk/publications/publications/pub_ejournalspreservationbp.a spx PREMIS Working Group. (2004). Implementing preservation repositories for digital materials. Mountain View, CA. www.oclc.org/research/projects/pmwg/surveyreport.pdf Pymm, B. (2006). Building collections for all time: the issue of significance. Australian Academic and Research Libraries (AARL), 37(1), 61-73. Roberts, J. (2005). Macroappraisal Kiwi style: Reflections on the impact and future of macroappraisal in New Zealand. Archival Science, 5, 185-201. Ross, S. (2003). Digital Library Development Review. Wellington: National Library of New Zealand. http://www.natlib.govt.nz/catalogues/library-documents/digitallibrary-development-review/?searchterm=ross&body_language= Santini, M. (2007). Characterizing genres of web pages: Genre hybridism and individualization. 
Paper presented at the Proceedings of the 40th Hawaii International Conference on System Sciences http://csdl2.computer.org/comp/proceedings/hicss/2007/2755/00/27550071. pdf. Schauder, D., Stillman, L., & Johanson, G. (2005). Sustaining a community network: the information continuum, e-democracy and the case of VICNET. Journal of Community Informatics., 1(2), http://www.cijournal.net/index.php/ciej/article/view/239/203. Schellenberg, T. R. (2003). Modern Archives: Principles and Techniques. Chicago: Society of American Archivists Shilton, K., & Srinivasan, R. (2007). Participatory appraisal and arrangement for multicultural archival collections. Archivaria(63), 87-101. State Records Authority of New South Wales. (2007). The DIRKS manual - strategies for documenting government business. rev., from http://www.records.nsw.gov.au/recordkeeping/dirks-manual_4226.asp 65

DELOS Deliverable 6.10.1

The National Archives. (2004). Appraisal policy. from http://www.nationalarchives.gov.uk/recordsmanagement/selection/appraisal.ht m Thomas, S. (2007). Paradigm: A practical approach to the preservation of personal digital archives. Oxford. http://www.paradigm.ac.uk/projectdocs/jiscreports/ParadigmFinalReportv1.pd f Treloar, A., Groenewegen, D., & Harboe-Ree, C. (2007). The data curation continuum: Managing data objects in institutional repositories. D-Lib Magazine, 13(9/10), http://www.dlib.org/dlib/september07/treloar/09treloar.html. Underwood, W., Isbell, S., & Underwood, M. (2007). Grammatical induction and recognition of the documentary form of records. Paper presented at the DigCCurr2007, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/underwood_paper_4-5.pdf. United States Geological Survey. (2007). Records appraisal tool. from http://eros.usgs.gov/government/ratool/view_questions.php Verheul, I. (2006). Networking for digital preservation: Current practice in 15 national libraries. Muenchen: Saur.http://www.ifla.org/VI/7/pub/IFLAPublication-No119.pdf Yates, J., & Orlikowski, W. (2007). The PowerPoint Presentation and its corollaries: How genres shape communicative action in organizations. In M. Zacrhy & C. Thralls (Eds.), Communicative Practices in Workplaces and the Professions: Cultural Perspectives on the Regulation of Discourse and Organizations. Amityville, NY: Baywood Publishing Yoshioka, T., Herman, G., Yates, J., & Orlikowski, W. (2001). Genre taxonomy: A knowledge repository of communicative actions. ACM Transactions on Information Systems, 19(4), 431-456.


Appendix 1: Summary of Findings from InterPARES

Archival appraisal can be considered a type of preservation function for digital records. General principles are as follows:18

In the archival sector, appraisal is part of a selection process made up of specific activities (selection, appraisal, and disposition as destruction or preservation). Appraisal should be conducted on the basis of well-defined principles and criteria, as further developed with reference to the non-electronic environment. Specifically, appraisal should be carried out while the digital resources are still in their active phase, as near to the time of creation as possible.

The management of the appraisal function implies the use and maintenance of a huge amount of information, which includes the decisions taken in the past (with reference to the various responsibilities involved and the strategies and procedures developed), the contextual information related to the records (the juridical, documentary and technological contexts), and the values established for the records and for their preservation feasibility (in terms of cost and in terms of preserving the authenticity of the records).

The feasibility of preserving the records depends strictly on the capacity to preserve their essential digital components, those able, now and in the future, to confer their identity and ensure their integrity. This information (which includes the content and the data/metadata necessary to organise, structure or render the content of the records) has to be structured and articulated in a way that enables decisions about the present and future capacity to preserve the digital components which constitute the record's identity and ensure its integrity. This effort includes at least three phases:

determine which elements are able to make the authenticity presumable;

identify where these crucial elements are manifested (in which digital components) and what technical information is relevant for their preservation;

reconcile these preservation requirements with the financial and technical capacities of the repository.

18 The assumptions included in the following paragraphs are basically a synthesis of the main results of the Appraisal Task Force of the InterPARES project. They could be considered a common conceptual framework in the international archival community. See National and multinational team report. Italian research team report, in The long-term preservation of authentic electronic records: findings of the InterPARES project, Luciana Duranti editor, San Miniato (PI), Archilab, 2005. This paragraph also considers and integrates the evolution of the Monash Clever Recordkeeping Metadata (CRKM) project as presented by S. McKemmish, J. Evans, K. Bhoday, Create Once, Use Many Times: the Clever Use of Recordkeeping Metadata for Multiple Archival Purposes, at the International Conference of ICA in Vienna, 23-29 August 2004, in the session Smart Metadata and the Archives of the Future.
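The three phases listed above can be read as a small pipeline. The following Python sketch is an illustration only and is not part of the InterPARES findings: the class names, the fields (`supports_authenticity`, `cost`, `file_format`), the sample data, and the simple budget-and-format feasibility rule are all assumptions introduced here to show how the phases might fit together.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Element:
    """A record element such as a signature or date (hypothetical model)."""
    name: str
    supports_authenticity: bool  # assumed flag, set by an appraiser


@dataclass
class DigitalComponent:
    """A stored digital object that manifests one or more elements."""
    name: str
    file_format: str  # technical information relevant to preservation
    cost: float       # assumed preservation cost estimate
    elements: List[str] = field(default_factory=list)


def authenticity_elements(elements: List[Element]) -> List[str]:
    """Phase 1: select the elements that make authenticity presumable."""
    return [e.name for e in elements if e.supports_authenticity]


def components_to_preserve(components: List[DigitalComponent],
                           crucial: List[str]) -> List[DigitalComponent]:
    """Phase 2: find the components in which crucial elements are manifested."""
    wanted = set(crucial)
    return [c for c in components if wanted & set(c.elements)]


def is_feasible(required: List[DigitalComponent], budget: float,
                supported_formats: Set[str]) -> bool:
    """Phase 3: reconcile requirements with financial and technical capacity."""
    affordable = sum(c.cost for c in required) <= budget
    renderable = all(c.file_format in supported_formats for c in required)
    return affordable and renderable


# Invented sample data, for illustration only.
elements = [
    Element("author signature", True),
    Element("date of creation", True),
    Element("page background colour", False),
]
components = [
    DigitalComponent("letter.pdf", "pdf", 10.0, ["date of creation"]),
    DigitalComponent("signature.p7s", "p7s", 5.0, ["author signature"]),
    DigitalComponent("theme.css", "css", 1.0, ["page background colour"]),
]

crucial = authenticity_elements(elements)
required = components_to_preserve(components, crucial)
feasible = is_feasible(required, budget=20.0, supported_formats={"pdf", "p7s"})
```

In this sketch the repository's capacity is reduced to a single budget figure and a set of supported formats; a real repository would presumably weigh many more factors, and the judgement in phase 1 would remain with the appraiser.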

As the flexibility required in the preservation process clearly shows, appraisal is a relevant, active component of this process, and it carries a higher level of responsibility than in the past. The quality of preservation is strictly connected with the quality of an early appraisal. The more complex and rich the digital data to be preserved (as in the scientific world), the more relevant is the active appraisal described here, which includes crucial tools whose automation19 will ensure the success of the preservation itself:

criteria and policies able to orient a neutral approach,

auditing and validating procedures,

contextual information automatically extracted and preserved.

19 See http://eros.usgs.gov/government/RAT/tool.asp, where the USGS Scientific Records Appraisal Tool is described (see Codata-ERPANET workshop, The selection, appraisal and retention of digital scientific data…cit., p. 13).


Appendix 2: Source Documents Used to Provide Initial List of Criteria

1. InterPARES. (2000). Appraisal Task Force Report. In The long-term preservation of authentic electronic records: Findings of the InterPARES project: University of British Columbia http://www.interpares.org/book/interpares_book_e_part2.pdf
This summarises the overall findings on appraisal from InterPARES 1, detailing requirements to assess authenticity. InterPARES findings are applicable to records of all types. This source was used because it is so significant and influential, and is referred to throughout our report.

2. Eastwood, T. (2003). What archivists have learned about appraisal of digital records. Paper presented at the International Workshop on the selection, appraisal and retention of digital scientific data, Lisbon, Portugal http://www.erpanet.org/events/2003/lisbon/presentations/Terry%20Eastwood%20paper.pdf
Terry Eastwood was chair of the InterPARES Appraisal Task Force. This paper discusses the InterPARES findings, and is particularly relevant to our report in that consideration is given to applying those findings outside the archival domain, to scientific data.

3. Data Preservation Alliance for the Social Sciences (DataPASS). Appraisal guidelines. http://www.icpsr.umich.edu/DATAPASS/pdf/appraisal.pdf
DataPASS is a major US collaborative project with partners from the academic sector and the National Archives and Records Administration, supported by the Library of Congress. Activities involve surveying important research in the social sciences, as well as other sources of information about potential acquisitions, and identifying content that should be preserved, including public and private sources of data. Appraisal standards have been developed to guide this process. This source was utilised because it focuses specifically on a significant and specialised type of data.

4. National Archives and Records Administration. (2007). Strategic directions: appraisal policy. http://www.archives.gov/records-mgmt/initiatives/appraisal.html
The NARA appraisal policy is very clearly written and provides useful explanations for appraisal criteria. One of the key factors influencing its use as a source for our report is its currency: the policy was published in September 2007.

5. Thomas, S. (2007). Paradigm: A practical approach to the preservation of personal digital archives. Oxford. http://www.paradigm.ac.uk/projectdocs/jiscreports/ParadigmFinalReportv1.pdf


The appraisal criteria used in the Paradigm Project are referred to in our report. This document provides a very detailed view of specific issues with personal papers, i.e. records created in uncontrolled environments. The appraisal criteria used were devised within that context, which provides a unique perspective. The fact that this was a British project was also significant.

6. Grotke, A., & Ruth, J. E. (2007). Selecting and managing content captured from the web: Expanding curatorial expertise and skills in building Library of Congress web archives. Paper presented at the DigCCurr2007: An International Symposium in Digital Curation, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/grotkeRuth_paper_9-3.pdf
A case study of work undertaken in web archiving at the Library of Congress that provides some brief, but useful, discussion about the development of specific appraisal criteria.

7. InterPARES. (2002). Appendix 2 Requirements for assessing and maintaining the authenticity of electronic records. In The long-term preservation of authentic electronic records: Findings of the InterPARES project http://www.interpares.org/book/interpares_book_k_app02.pdf
As with the first source listed above, the significance and influence of InterPARES findings determined the inclusion of this resource.

8. Morris, S. (2006, March 27). Identification, selection and appraisal within the North Carolina Geospatial Data Archiving Project (NCGDAP). Paper presented at the Digital Preservation in the State Government: Best Practices Exchange http://www.lib.ncsu.edu/ncgdap/presentations/StateArchIDSelectionfinal.ppt
This PowerPoint presentation provides useful detail of the appraisal criteria applied to geospatial data in a specific archiving project.

9. National Library of Australia. (2005). Online Australian publications: selection guidelines for archiving and preservation by the National Library of Australia. From http://pandora.nla.gov.au/selectionguidelines.html
These relatively recent guidelines were included for analysis because of their specific focus on publications. The fact that they originated from Australia was also important, as this introduced another perspective into the mix.

10. Murray, K., & Phillips, M. (2007). Collaborations, best practices, and collection development for born-digital and digitized materials. Paper presented at the DigCCurr2007: An International Symposium in Digital Curation, Chapel Hill, NC http://www.ils.unc.edu/digccurr2007/papers/murrayPhillips_paper_9-3.pdf


A report that describes a survey undertaken of curators and librarians and the subsequent development of web collection plans. This highlights and discusses the specific concerns identified, and was included as a source in our report because of the input from a number of organisations.

11. Lala, V., & Joe, S. (2006). Web archiving at the National Library of New Zealand. Paper presented at the LIANZA Conference, Wellington http://www.lianza.org.nz/library/files/store_013/WebArchives_VLala.pdf.
A case study of web archiving in New Zealand, which includes some discussion of the specific criteria used in appraisal.

12. United States Geological Survey. (2007). Records appraisal tool. from http://eros.usgs.gov/government/ratool/view_questions.php
An extensive list of questions used to gather the data necessary to determine the value of geospatial data. This was particularly useful, as the criteria are very specific.

   
