Archiving the web

The UK Web Archiving Consortium is a group of national bodies collaborating to selectively archive websites of interest, rather than the complete web. They have established a test-bed activity and have just announced a website.

Using PANDAS software, developed by the National Library of Australia, consortium members will archive sites (once appropriate permissions have been obtained from website owners) relevant to their interests. For example, the Wellcome Library will focus on collecting medical sites, whilst the National Library of Wales will collect sites that reflect life in contemporary Wales. The British Library will focus on sites of cultural, historical and political importance. [UK Web Archiving Consortium: Project Overview]

I wonder what take-up there would be for a service that does this type of thing for libraries or other users.

This touches on my local history comment of the other day. If I am a public library, I might like to secure the archiving of a selection of websites which represent various local community activities. If I am an academic library, I may wish to secure the archiving of some websites which are of specific value to particular faculty interests. At the moment, it may be difficult for an individual library to do this type of thing on its own.

Comments: 1

May 19, 2005
Nick Baker

I was involved with a project at the University of Michigan School of Information to do a very similar thing, namely to archive the Umich.edu domain. Rather than attempt to gather bits of the entire internet, we wanted a comprehensive collection of just one site.

We used Heritrix, the Internet Archive's open-source crawler, to perform the crawls, and wrote our own open-source software to process and mirror the content. Check it out here: http://www.si.umich.edu/mirror/

I agree with you that there is a lot of room for local initiatives to archive websites. As more and more of the cultural record is generated digitally and online, it becomes ever more important to capture this material in a sustainable manner.