Adopted ArchiveBox

Digital Preservation Coalition (DPC)

After platforms restricted their APIs and made social-media content harder to preserve, the DPC's Preservation Registries Technical Architect, Andrew Jackson, turned to the open-source, self-hosted ArchiveBox to archive Facebook, Instagram and LinkedIn data. Native platform exports supply only CSV files of textual data with fragile links to the media, so Jackson used ArchiveBox from the command line to iterate through the exported links, fetch the media payloads, and store them in standards-based WARC files that preserve the original URLs. The one real difficulty he documented was extraction: ArchiveBox records each item in its own WARC, which makes it awkward to pull the captures into a single collection to keep alongside the source CSV.
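The workflow described above can be sketched roughly as follows. This is an illustrative outline, not Jackson's actual script: the function names, the URL pattern and the file paths are hypothetical. It assumes that `archivebox add` accepts URLs on standard input when run inside an initialised ArchiveBox data directory, and that per-snapshot captures sit under `archive/<timestamp>/warc/*.warc.gz`. The merge step relies on the fact that gzip members can be concatenated back to back, which is one possible way around the one-WARC-per-item difficulty.

```python
import csv
import re
import shutil
import subprocess
from pathlib import Path

# Hypothetical, deliberately simple pattern; real exports may need stricter parsing.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(csv_path):
    """Return unique http(s) URLs found in any cell of the CSV, in first-seen order."""
    seen = {}
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            for cell in row:
                for url in URL_RE.findall(cell):
                    seen.setdefault(url, None)
    return list(seen)


def archive_urls(urls, data_dir):
    """Pass the URLs to `archivebox add` on stdin (assumed behaviour, see lead-in)."""
    subprocess.run(
        ["archivebox", "add"],
        input="\n".join(urls) + "\n",
        text=True,
        cwd=data_dir,
        check=True,
    )


def merge_warcs(data_dir, out_path):
    """Concatenate every per-snapshot .warc.gz into one file; return how many were merged.

    The directory layout is an assumption; out_path should live outside `archive/`.
    """
    warcs = sorted(Path(data_dir).glob("archive/*/warc/*.warc.gz"))
    with open(out_path, "wb") as out:
        for warc in warcs:
            with open(warc, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(warcs)
```

A run might call `extract_urls("export.csv")`, hand the result to `archive_urls(urls, "archivebox-data")`, and then call `merge_warcs("archivebox-data", "collection.warc.gz")` so that a single WARC can be stored next to the source CSV. Whether the merged file suits a given replay tool would need checking against that tool.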

Original source
Archiving Facebook, Instagram & LinkedIn
Digital Preservation Coalition
The Archive Migration Review summarises this story in its own words and links to the original source for verification. We are editorially independent and not affiliated with the institution or software project named above. Summaries are compiled in good faith from publicly available accounts; corrections are welcome.