grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic igno

Python · 1233 · other

3 months ago

archiving, crawl, crawler

jwarc

Java library for reading and writing WARC files with a typed API

Java · 43 · apache-2.0

2 months ago

warcprox

WARC writing MITM HTTP/S proxy

Python · 357

6 months ago

solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer fram

Java · 91 · apache-2.0

2 months ago

html2warc

Simple script to convert web resources to a single WARC file

Python · 14 · mit

11 months ago

warcio

Streaming WARC/ARC library for fast web archive IO

Python · 335 · apache-2.0

5 months ago

python, pywb, warc

webarchive-discovery

WARC and ARC indexing and discovery tools.

Java · 114

4 months ago

wasapi-downloader

Java application to download WARCs from WASAPI

Java · 6 · other

3 months ago

application, infrastructure, java

node-warc

Parse And Create Web ARChive (WARC) files with node.js

JavaScript · 90 · mit

last year

chrome-remote-interface, pupeteer, warc

warcat

Tool and library for handling Web ARChive (WARC) files.

Python · 134 · gpl-3.0

last year

python

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala · 24 · mit

7 years ago

warc2html

Converts WARC files to static HTML

Java · 36 · apache-2.0

last year

warctools

Command line tools and libraries for handling and manipulating WARC files (and H

Python · 140 · mit

4 years ago

httrack2warc

Converts HTTrack crawls to WARC files

Java · 27 · apache-2.0

2 years ago

web-archiving

har2warc

Convert HTTP Archive (HAR) -> Web Archive (WARC) format

Python · 41 · apache-2.0

6 years ago

webarchive-indexing

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

Python · 40 · mit

6 years ago

webarchive

Go readers for the ARC and WARC web archive formats

Go · 17 · apache-2.0

last year