One of the biggest questions for records managers is whether information is “valuable.” The value of information determines whether a document is a record and how long to keep it. Archivists ask the same question when deciding whether or not to add something to their collections.
The interesting thing about information on the web is that most assessments of it depend not upon value but upon cultural relevance. Since cultural relevance is a subjective concept, most web capture initiatives seek to capture everything, or at least everything pertaining to a subject or from a particular medium. One example of this, which I mentioned before, is the Library of Congress's goal to store and index every tweet ever posted.
This past weekend the British Library upped the ante with an initiative to capture all websites ending in .uk.
For centuries the library has kept a copy of every book, pamphlet, magazine and newspaper published in Britain. Starting Saturday, it will also be bound to record every British website, e-book, online newsletter and blog in a bid to preserve the nation’s “digital memory.”
Their rationale is that web content is ephemeral and that much of the web's cultural history is already vanishing down a “digital black hole.” “The library says firsthand accounts of everything from the 2005 London transit bombings to Britain’s 2010 election campaign have already vanished.” The argument could be made that this information vanished because of diminished value, since it, or similar accounts, are captured elsewhere.
One of the things the article mentions the British Library keeping is the blog of a nine-year-old girl about school lunches. Think about the ridiculousness of this. It would be as if, 50 years ago, a library had announced it wanted to collect the diaries of middle school students to capture cultural history.
Like reference collections around the world, the British Library has been attempting to archive the Web for years in a piecemeal way and has collected about 10,000 sites. Until now, though, it has had to get permission from website owners before taking a snapshot of their pages.
That began to change with a law passed in 2003, but it has taken a decade of legislative and technological preparation for the library to be ready to begin a vast trawl of all sites ending with the suffix .uk.
I am not extremely familiar with the Digital Millennium Copyright Act, but it seems that legislation similar to Britain's web archiving law would be difficult to pass in the US due to copyright restrictions. If you are familiar with the DMCA or other large-scale web archive projects, I would love to hear your thoughts on this.