120 results for “topic:web-archiving”
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Core Python Web Archiving Toolkit for replay and recording of web archives
Collect and revisit web pages.
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Run a high-fidelity browser-based web archiving crawler in a single Docker container
Free web archiving and sharing service based on Cloudflare.
Serverless replay of web archives directly in the browser
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Wayback Machine API interface & a command-line tool
Indelible links
Archiveror will help you preserve the webpages you love. 💾
Streaming WARC/ARC library for fast web archive IO
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
A Tool To Push Web Resources Into Web Archives
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
WarcDB: Web crawl data as SQLite databases.
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Chrome extension to "Create WARC files from any webpage"
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
Social Feed Manager user interface application.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The repository and website hosting the peer review process for new Programming Historian lessons
Perpetual Access To The Scholarly Record
🗄️ A simple CLI for converting WARC to Parquet.
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.