49 results for “topic:webarchiving”
An Awesome List for getting started with web archiving
Wayback Machine API interface & a command-line tool
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium in headed or headless mode
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
A list of things related to software, literature, and other content for 🕣 Memento
Parse and create Web ARChive (WARC) files with Node.js
Various Jupyter notebooks about Common Crawl data
A Dockerized, queued, high-fidelity web archiver based on Squidwarc
Quick Cache and Archive search buttons
metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives
Awesome list dedicated to digital and data preservation tools, sources, services and so on.
A web archiving tool for open social media posts
Digital Preservation of HTTP in documentary heritage.
Decentralized web archiving
A tool for detecting viruses and NSFW material in WARC files
🗄 File-Based Reference Filing System.
Seeder - Czech webarchive curating tool and public site
JavaScript for fighting link rot and content drift using link decoration and web archives.
Parser for WARC (aka WebArchive) files
Tika-based link (URL) extractor for httpreserve
An archival thumbnail visualization server
pywb recorder over Tor; anonymously records the web (Docker image).
A wiki of the broader Web Archiving Community: important organizations, alternative projects, blog posts, and more.
News Archiver, Data Aggregation for CNN and Fox News
Record the currently active tab on webrecorder.io
A helper package to tokenize textual content and retrieve hyperlinks
Client app for httpreserve pkg that generates CSV, JSON, HTTP, and BoltDB
Here lies the code for 'pagebinder' - more details in README.
Given four bytes, download a random file from web archives implementing the UKWA Shine interface