GitHunt

Welcome to WebGraph!

Maven Central
javadoc

Introduction

WebGraph is a framework for graph compression aimed at studying web
graphs. It provides simple ways to manage very large graphs, exploiting
modern compression techniques. More precisely, it is currently made of:

  1. A set of flat codes, called ζ codes, which are particularly
    suitable for storing web graphs (or, in general, integers with power-law
    distribution in a certain exponent range). The fact that these codes work
    well can be easily tested empirically, but we also try to provide a
    detailed mathematical
    analysis
    .

  2. Algorithms for compressing web graphs that exploit gap compression and
    referentiation (à la LINK),
    intervalisation and ζ codes to provide a high compression ratio (see our
    datasets
    ). The algorithms are
    controlled by several parameters, which provide different tradeoffs
    between access speed and compression ratio.

  3. Algorithms for accessing a compressed graph without actually
    decompressing it, using lazy techniques that delay the decompression until
    it is actually necessary.

  4. Algorithms for analysing very large graphs, such as
    HyperBall
    which has been used to show that Facebook has just four degrees of
    separation
    .

  5. An implementation of the algorithms above in Java and
    Rust distributed under either the GNU
    Lesser General Public License
    2.1+
    or the
    Apache Software License
    2.0
    . Besides a clearly
    defined API, we also provide several classes tha modify (e.g., transpose)
    or recompress a graph, so to experiment with various settings.

  6. Datasets for large graph. These are either
    gathered from public sources (such as
    WebBase), or
    produced by UbiCrawler and BUbiNG.

In the end, with WebGraph you can access and analyse very large web
graphs. Using WebGraph is as easy as installing a few jar files and
downloading a dataset. This makes studying phenomena such as PageRank,
distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! If you find our software
useful for your research, please quote this
paper
.

This version of WebGraph is limited to graphs with at most 2³¹ nodes. For
larger graphs, have a look at the big
version
.

JGraphT

JGraphT has a
few WebGraph adapters.

Hadoop

Helge Holzmann has developed an input
format for Hadoop
for graphs
in BVGraph format.

WebGraph++

Jacob Ratkievicz has developed a C++ version of
WebGraph
that you might
want to try. The library exposes a
BVGraph as an object of the
Boost Graph Library, so
it is easily integrable with other code.

pyWebgraph

Massimo Santini has developed a front-end
that interfaces Jython with
WebGraph
. It makes exploring
small portions of very large graphs very easy and interactive.

Papers

  • A detailed description of
    the compression algorithms used in WebGraph, published in the proceedings
    of the Thirteenth International World–Wide Web
    Conference
    .

  • A mathematical analysis
    of the performance of γ, δ and ζ codes against power-law distributions.

  • Some quite surprising
    experiments
    showing that the
    transpose graph reacts very peculiarly to compression after
    lexicographical or Gray-code sorting.

  • A paper about
    HyperBall
    (then named HyperANF), our tool for computing an approximation of the
    neighbourhood function, reachable nodes and geometric centralities of
    massive graphs. More information can be found in this
    preprint.

  • HyperBall was used to
    find out that on average there are just four degrees of
    separation
    on
    Facebook, and the experiment was reported by the
    New York
    Times
    .
    Alas, the degrees were actually 3.74 (one less than the average
    distance
    ), but the off-by-one
    between graph theory (“distance”) and sociology (“degrees of separation”)
    generated a lot of confusion.

  • A paper were we
    describe our efforts to compress one of the largest social graphs
    available—the graph of commits of the Software Heritage archive.

  • A paper about our effort
    to bring WebGraph to Rust.

Languages

Java97.2%Ruby1.1%HTML0.8%C0.7%Shell0.2%Makefile0.1%

Contributors

GNU Lesser General Public License v2.1
Created January 14, 2021
Updated February 25, 2026