Digital Preservation Coalition (DPC)
After platforms restricted their APIs and social-media content became harder to preserve, Andrew Jackson, the DPC's Preservation Registries Technical Architect, turned to ArchiveBox, an open-source, self-hosted web archiving tool, to archive Facebook, Instagram and LinkedIn data. The platforms' native exports provide only CSV files of textual data with fragile links to the associated media. Jackson therefore ran ArchiveBox from the command line to iterate through the exported links, fetch the media they point to, and store the captures in standards-based WARC files that record each payload under its original URL. The main difficulty he documented was extraction: ArchiveBox writes each captured item to its own separate WARC file, which made it awkward to gather the captures into a single collection to keep alongside the source CSV.
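The workflow can be sketched in Python under several assumptions. This is not Jackson's actual script. The sketch scans every CSV cell for http(s) URLs because export column names vary. It pipes the URLs to `archivebox add` on standard input inside an already-initialised ArchiveBox data directory. To address the extraction problem, it concatenates every `*.warc.gz` file found under the data directory's `archive/` folder into one file. The helper names `extract_urls`, `archive_urls` and `merge_warcs` are hypothetical. The per-snapshot location of the WARC files is also assumed, so the code searches recursively rather than relying on a fixed path.

```python
import csv
import re
import shutil
import subprocess
from pathlib import Path

# Matches http(s) URLs inside a CSV cell; stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(csv_path):
    """Return unique http(s) URLs from every cell of an exported CSV, in first-seen order.

    All columns are scanned because export column names differ by platform (assumption).
    """
    seen = {}  # dict preserves insertion order and de-duplicates
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            for cell in row:
                for url in URL_PATTERN.findall(cell):
                    seen.setdefault(url, None)
    return list(seen)


def archive_urls(urls, data_dir):
    """Pipe URLs, one per line, to `archivebox add` run inside an initialised data directory."""
    subprocess.run(
        ["archivebox", "add"],
        input="\n".join(urls),
        text=True,
        cwd=data_dir,
        check=True,  # raise if ArchiveBox exits with an error
    )


def merge_warcs(archive_dir, out_path):
    """Concatenate all *.warc.gz files under archive_dir into a single out_path.

    A gzip file may contain several members, so byte-level concatenation of
    gzipped WARCs still decompresses to one sequence of WARC records.
    Returns the list of source files, in the sorted order they were written.
    """
    out_resolved = Path(out_path).resolve()
    sources = [
        p for p in sorted(Path(archive_dir).rglob("*.warc.gz"))
        if p.resolve() != out_resolved  # never copy the output into itself
    ]
    with open(out_path, "wb") as out:
        for src in sources:
            with open(src, "rb") as part:
                shutil.copyfileobj(part, out)
    return sources
```

A hypothetical run would call `archive_urls(extract_urls("posts.csv"), "archivebox-data")`, then `merge_warcs("archivebox-data/archive", "social-media.warc.gz")`. Writing the merged file outside `archive/` keeps it separate from ArchiveBox's own snapshot folders. Concatenation was chosen because it needs no WARC library. A dedicated WARC tool would be needed to deduplicate or re-index records.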