
The Library of Congress and online archival – Part 2

(Part 1)

Online archival is important to me. I am particularly interested in blogs that still have great value but are no longer maintained – some of these are by people I followed in the 2000s. Some are by friends who have long since stopped writing anywhere other than on social media.

If their writings are on third-party services like Blogspot, the service itself can be shut down or the blogs can be taken down because of inactivity. If they are on their own domain, the owner may allow the domain to expire.

In some cases, the owner may deliberately erase posts, even asking the Internet Archive to delete its records. The current head of the Microsoft-owned GitHub, Nat Friedman, used to write a fun, eclectic, useful, and – to me – inspirational blog that blended his personal and professional lives. Some years ago it was wiped clean of the content I used to follow. More recently it was wiped again. Now it’s just a Medium-hosted blog with a half-dozen posts. I respect Nat’s decision not to have his old life displayed online. I just wish I had my own archive of it, one that I would of course keep private.

For now I have a short list of sites that I have downloaded using wget, with flags to download images and other linked content and to rewrite URLs as local ones so I can browse each site offline. I’m interested in whether the US Library of Congress’ online archival format, Web ARChive (WARC), and its toolset are an improvement.
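
For anyone who wants to try the same kind of download, here is a minimal sketch of how such a wget command might be assembled. The specific flags are an assumption on my part about what "download images and linked content, and change URLs to local ones" maps to in GNU wget, not a record of the exact command used. The function name `build_wget_command`, the `archive` directory and the example URL are all hypothetical. The optional `--warc-file` flag, available in recent versions of GNU wget, additionally writes a WARC file alongside the browsable copy.

```python
import shlex


def build_wget_command(site_url, dest_dir="archive", warc_name=None):
    """Return a wget argument list for mirroring a site for offline browsing.

    The flag selection is illustrative only; adjust it to the site in question.
    """
    cmd = [
        "wget",
        "--mirror",                          # recursive download with timestamping
        "--page-requisites",                 # also fetch images, CSS and other page assets
        "--convert-links",                   # rewrite links to point at the local copies
        "--adjust-extension",                # give saved HTML pages an .html suffix
        "--no-parent",                       # stay below the starting directory
        f"--directory-prefix={dest_dir}",    # where the local copy is written
    ]
    if warc_name:
        # Assumed optional extra: also record the crawl in a WARC file.
        cmd.append(f"--warc-file={warc_name}")
    cmd.append(site_url)
    return cmd


if __name__ == "__main__":
    # Print the command for inspection rather than running it.
    print(shlex.join(build_wget_command("https://example.com/blog/", warc_name="blog")))
```

Printing the command instead of running it makes it easy to review the flags before pointing wget at a real site; the list can then be handed to `subprocess.run` as-is.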

Endnote: Archiving entire blogs or websites is different from archiving individual articles, of course. We’ve seen my iOS shortcut that saves a Markdown-formatted, cruft-free version of an online article locally and optionally also saves it to one of Instapaper, Pocket or Evernote.

(ends)