Web Archiving Service - Confluence

Web Archiving Service (WAS)
Rosalie Lack
[email protected]
Data Curation for Practitioners 2012 Workshop

Imagine a world …

“Imagine a world in which libraries and archives had never existed. No institutions had ever systematically collected or preserved our collective cultural past: every book, letter, or document was created, read and then immediately thrown away. What would we know about our past?”
A Vision Of The Role And Future Of Web Archives. Kalev H. Leetaru, Graduate School of Library and Information Science, University of Illinois. Presented as the keynote address at the 2012 IIPC General Assembly in Washington, DC. http://netpreserve.org/sites/default/files/resources/VisionRoles.pdf

This is our world …

“Yet, that is precisely what is happening with the web: more and more of our daily lives occur within the digital world, yet more than two decades after the birth of the modern web, the ‘libraries’ and ‘archives’ of this world are still just being formed.”
A Vision Of The Role And Future Of Web Archives. Kalev H. Leetaru, Graduate School of Library and Information Science, University of Illinois. Presented as the keynote address at the 2012 IIPC General Assembly in Washington, DC. http://netpreserve.org/sites/default/files/resources/VisionRoles.pdf

WAS … is

A service of the UC Curation Center to collect, manage, preserve, and publish websites and documents.

WAS Snapshot
• 53 public archives
• 120+ archives total
• 7,500+ sites
• 50+ TB
• 23 institutions

WAS Institutions
• Institute of Governmental Studies Library, UCB
• UC Berkeley Office of Public Affairs
• UC Berkeley Libraries
• UC Davis Libraries
• UC Irvine Libraries
• UC Los Angeles Libraries
• UC Riverside Libraries
• UC San Diego Libraries
• UC San Francisco Libraries
• UC Santa Barbara
• UC Santa Cruz McHenry Library
• Emory University Library
• Institute for Research on Labor and Employment
• New York University
• Northwestern University Library
• Purdue University
• Stanford University Libraries
• Temple University
• University of Arkansas Libraries
• University of Illinois at Urbana Champaign Libraries
• University of Michigan, Bentley Historical Library
• USDA Economic Research Service
• Water Resources Collections and Archives

WAS Overview A) Curator Tools

Curator Workflow

1. Create Site
• Enter site name, URL and description
• Scope
• Capture frequency
• Robots.txt

2. Capture Sites

3. View Captures
• View captures
• QA
• Compare

4. Public Access
• Customize the archive
• Write description
• Create custom banner and icon
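The "Create Site" settings in step 1 can be pictured as a small record that a crawler consults before each capture. The sketch below is illustrative only: `SiteEntry`, `may_capture`, the field names, the scope and frequency labels, and the `example-crawler` user agent are all hypothetical and are not taken from WAS. It uses Python's standard `urllib.robotparser` to show what a robots.txt setting could mean in practice.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class SiteEntry:
    """Hypothetical record mirroring the 'Create Site' fields (not the WAS schema)."""
    name: str
    url: str
    description: str
    scope: str               # assumed labels, e.g. "host" or "page"
    capture_frequency: str   # assumed labels, e.g. "once", "weekly", "monthly"
    honor_robots_txt: bool   # whether captures should respect robots.txt


def may_capture(site: SiteEntry, robots_txt: str, page_url: str,
                user_agent: str = "example-crawler") -> bool:
    """Return True if page_url may be captured under the site's robots.txt setting."""
    if not site.honor_robots_txt:
        # The curator chose to ignore robots.txt for this site.
        return True
    parser = RobotFileParser()
    # Parse the robots.txt text directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

With a robots.txt that contains `Disallow: /private/` for all user agents, `may_capture` would allow a page under `/news/` and refuse one under `/private/` unless `honor_robots_txt` is switched off.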

WAS Overview B) Public Archives

Web Archive ‘home page’

Browse: Site List + Tags

Search: All Sites in an Archive

Integration with your Systems

How are people using WAS?

Institution’s website
• Preserve institutional history
• Capture university news and events

Geographically focused

Topical
• Support special research collections

Event
• Sudden action required
• May need many selectors
• Start date / end date

Researcher’s Perspective
• Building collections for research
– Study the topic / event
– Study site change or web-based communication
– Websites are datasets for analysis and data mining
• Preservation of research
– Archive grant-funded websites
– Selected sites
• Create stable citations for publications

Get started!
• Each library has WAS administrator(s)
• Unlimited number of curators per account
• What’s the cost?
– UC does not pay a service fee
– Storage only: $1,040 per TB (the average site costs $1.46 annually); storage costs are expected to go down
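As a rough worked example of the storage pricing, the sketch below turns the two quoted figures into a small calculator. It assumes the $1,040 per TB rate is charged annually and that 1 TB = 1,000 GB; the slide states neither, and the function names are illustrative.

```python
STORAGE_RATE_PER_TB = 1040.00   # USD per TB, from the slide (assumed to be per year)
AVERAGE_SITE_COST = 1.46        # USD per site per year, from the slide


def storage_cost(size_tb: float, rate_per_tb: float = STORAGE_RATE_PER_TB) -> float:
    """Storage cost in USD for size_tb terabytes at the given rate."""
    return size_tb * rate_per_tb


def implied_size_gb(cost: float, rate_per_tb: float = STORAGE_RATE_PER_TB) -> float:
    """Storage in GB (assuming 1 TB = 1,000 GB) that a given cost pays for."""
    return cost / rate_per_tb * 1000


if __name__ == "__main__":
    print(f"0.5 TB costs ${storage_cost(0.5):,.2f}")
    print(f"$1.46 covers about {implied_size_gb(AVERAGE_SITE_COST):.2f} GB")
```

Under these assumptions, the $1.46 average corresponds to roughly 1.4 GB of stored captures per site.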

Challenges
• Shared collection development
• Metadata issues
• Workflow and cost models for faculty projects
• Time!
• Limitations of web crawlers
• Websites are messy

Contact me!

Rosalie Lack WAS Service Manager [email protected]

www.votearriana.com (2003)*

*WAS 2003 California Recall Election Web Archive http://webarchives.cdlib.org/a/carecall2003

www.votearriana.com (2012)