The Internet Archive is trying to save Google+ content before it is deleted
It is now just a couple of weeks until Google+ closes down forever. While few will mourn its passing, there is still a lot of content on Google's social site that is worth preserving. Maybe.
Google+ users have the option of downloading their own data for posterity, but the Archive Team, in conjunction with the Internet Archive, has grander plans: it is looking to archive as much of the site as possible before it is deleted forever.
As noted by The Verge, the Archive Team announced in a post on Reddit that it recently started its mission to archive as much of Google+'s public content as possible. The group describes itself as a "loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage"; it is not affiliated with the Internet Archive but works closely with it.
Having previously worked on large-scale projects such as archiving Mozilla Addons and Tindeck, the Archive Team uses a tool called "Warrior", based on "grabber" scripts, which run in a virtual machine on a desktop or server system. While the aim is to grab as much public content as possible, the team notes that there are limitations to its work:
- Only public content that is presently available on Google+ is being included. Private posts and any previously deleted content will not be saved. (Previously saved content that's since been deleted will be available.)
- Full post comments may not be archived. Google+ allows up to 500 comments per post, but only presents a subset of these as static HTML. It's not clear that long discussion threads will be preserved. Historically they have not been.
- Image and video content may not be preserved at full resolution. This will apply mostly to high-def image and video content, though photographers may want to be aware.
- Content archival is subject to the rate at which the project can proceed and any limitations imposed outside its control. From past experience, the Archive Team can suck in amazing amounts of data quickly, and general success is likely.