"topic:web-archiving" — Search | GitHunt

120 results for “topic:web-archiving”

ArchiveBox/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Python · 27.0k · 1.5k · Updated 2 hours ago

archivebox, backups, bookmark-archiver, browser-bookmarks, chromium, digipres, firefox, headless-browser, internet-archiving, pinboard, pocket, python, rss, self-hosted, singlefile, warc, wayback-machine, web-archiving, wget, youtube-dl

webrecorder/pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

JavaScript · 1.6k · 239 · Updated 1 day ago

python, pywb, wayback, web-archives, web-archiving

Rhizome-Conifer/conifer

Collect and revisit web pages.

Python · 1.5k · 127 · Updated 1 day ago

archives, docker, python, pywb, warc, wayback, web-archiving, webrecorder

webrecorder/archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!

TypeScript · 1.4k · 99 · Updated 2 hours ago

archiving, browser-extension, chromium, extension, wacz, warc, web-archiving, webrecorder

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

JavaScript · 1.2k · 115 · Updated 1 day ago

archiving, cli, crawler, deno, dockerfile, nodejs, scraping-websites, single-file, web-archiving, web-crawler, web-scraper, web-scraping

bellingcat/auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

Python · 1.0k · 97 · Updated 1 day ago

archive, docker, open-source-research, python, scraping, service, web-archiving

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

TypeScript · 993 · 131 · Updated 16 hours ago

crawler, crawling, wacz, warc, web-archiving, web-crawler, webrecorder

Ray-D-Song/web-archive

Free web archiving and sharing service based on Cloudflare.

TypeScript · 921 · 297 · Updated 12 hours ago

cloudflare, cloudflare-pages, d1, free, hono, self-hosted, serverless, web-archive, web-archiving

webrecorder/replayweb.page

Serverless replay of web archives directly in the browser

TypeScript · 918 · 89 · Updated 13 hours ago

replay-web-page, service-worker, wacz, warc, wayback-machine, web-archive, web-archiving, web-replay

eclaire-labs/eclaire

Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.

TypeScript · 820 · 82 · Updated 2 days ago

ai, ai-assistant, automation, bookmark-manager, bookmarks, data-extraction, document-processing, llm, local-first, note-taking, ocr, on-device-ai, open-source, personal-knowledge-management, privacy, rest-api, self-hosted, task-management, web-archiving

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

Python · 650 · 41 · Updated 1 week ago

docker, ipfs, memento, memento-rfc, python, service-worker, warc, wayback, web-archiving

akamhy/waybackpy

Wayback Machine API interface & a command-line tool

Python · 565 · 40 · Updated 5 days ago

archive-webpage, archive-webpages, cdx-api, internet-archive, internet-archiving, osint, savepagenow, wayback-machine, wayback-machine-api, wayback-machine-python, web-archiving, webarchiving

harvard-lil/perma

Indelible links

JavaScript · 501 · 85 · Updated 4 days ago

libraries, web-archiving

rahiel/archiveror

Archiveror will help you preserve the webpages you love. 💾

JavaScript · 456 · 42 · Updated 21 hours ago

archiving, bookmark, browser-extension, chrome-extension, firefox-extension, javascript, linkrot, mhtml, web-archiving, webextension

webrecorder/warcio

Streaming WARC/ARC library for fast web archive IO

Python · 451 · 67 · Updated 1 week ago

python, pywb, warc, web-archives, web-archiving

webrecorder/webrecorder-player (Archived)

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

JavaScript · 448 · 41 · Updated 3 days ago

electron, pywb, warc, web-archiving, webrecorder

oduwsdl/archivenow

A Tool To Push Web Resources Into Web Archives

Python · 432 · 40 · Updated 1 week ago

internet-archive, web-archiving

ArchiveBox/archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

JavaScript · 420 · 45 · Updated 1 day ago

archivebox, archiving, browser-extension, chrome-extension, digipres, digital-preservation, firefox-extension, internet-archiving, svelte, web-archiving

Florents-Tselai/WarcDB

WarcDB: Web crawl data as SQLite databases.

Python · 405 · 10 · Updated 2 days ago

cli, crawling, database, sqlite, warc, web-archiving, web-data

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

Roff · 392 · 38 · Updated 3 days ago

gui, heritrix, openwayback, pyinstaller, python, warc, wayback, web-archiving

webrecorder/browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

TypeScript · 390 · 64 · Updated 4 days ago

archiving, cloud, kubernetes, wacz, warc, web-archive, web-archiving, webrecorder

machawk1/warcreate

Chrome extension to "Create WARC files from any webpage"

JavaScript · 228 · 15 · Updated 1 month ago

chrome-extension, warc, web-archiving

commoncrawl/cdx_toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

Python · 201 · 34 · Updated 12 hours ago

cdx, cdx-api, commoncrawl, python, warc, web-archives, web-archiving

ArchiveBox/electron-archivebox

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

JavaScript · 180 · 15 · Updated 1 month ago

archivebox, desktop, desktop-electron, digipres, docker, electron, gui, internet-archiving, linux, macos, web-archiving, windows

gwu-libraries/sfm-ui

Social Feed Manager user interface application.

Python · 157 · 26 · Updated 1 month ago

code4lib, social-feed-manager, social-media, web-archiving

helgeho/ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

Scala · 157 · 19 · Updated 1 week ago

archivespark, internet-archive, spark, spark-framework, warc, web-archiving, webarchive

programminghistorian/ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons

HTML · 150 · 115 · Updated just now

api, data-management, dh, digital-history, digital-humanities, distant-reading, linked-open-data, mapping, multi-lingual, network-analysis, open-educational-resources, open-source, pedagogy, programming-historian, python, r-studio, web-archiving, web-scraping

internetarchive/fatcat

Perpetual Access To The Scholarly Record

Python · 121 · 18 · Updated 3 weeks ago

digital-library, open-access, postgresql, python, rust, scholarly-communication, web-archiving

maxcountryman/warc-parquet

🗄️ A simple CLI for converting WARC to Parquet.

Rust · 113 · 1 · Updated 3 months ago

crawling, duckdb, parquet, warc, web-archiving

Own-Data-Privateer/hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

Python · 112 · 10 · Updated 13 hours ago

archive, archiver, archiving, auto-save, backups, browser-extension, cli, internet, internet-archiving, offline-reading, self-hosted, snapshot, wayback-machine, web-archive, web-archiving, web-browsing, website-archive
