DigitalPebble Ltd

DigitalPebble

Bristol, UK

http://www.digitalpebble.com

Languages

Java 68%, Shell 12%, Rust 8%, SCSS 4%, Dockerfile 4%, FLUX 4%

Repos

29

Stars

428

Forks

108

Top Language

Java

Top Repositories

behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

TextClassification

A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and can be used as a front end to various ML algorithms. libSVM and liblinear are currently embedded.

spruce

SPRUCE is an open-source enrichment platform for GreenOps which helps measure and reduce the environmental impact of cloud computing.

stormcrawler-docker

Resources for running StormCrawler with Docker services

stormcrawlerfight

Crawl configurations for benchmarking / testing StormCrawler

textclassification-examples

Use cases for DigitalPebble's TextClassification API

Repositories

29

DigitalPebble/spruce

SPRUCE is an open-source enrichment platform for GreenOps which helps measure and reduce the environmental impact of cloud computing.

Java, 20 stars, 4 forks, updated just now

apache-spark, aws, carbon-emissions, climate, cloud, greenops, greensoftware, open-source, sustainability

DigitalPebble/ginkgo

Estimate the environmental impact of GitHub Actions for your entire organization.

Rust, 1 star, 0 forks, updated 3 weeks ago

carbon-emissions, github-actions, greenops, open-source, sustainability

DigitalPebble/digitalpebble.github.io

Resources for the DigitalPebble website

SCSS, 0 stars, 0 forks, updated 2 weeks ago

DigitalPebble/behemoth (Archived)

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Java, 284 stars, 59 forks, updated 7 years ago

hadoop, java, mapreduce, nlp

DigitalPebble/stormcrawler-docker

Resources for running StormCrawler with Docker services

Dockerfile, 10 stars, 4 forks, updated 1 year ago

apache-storm, docker, stormcrawler

DigitalPebble/ngrams-api (Archived)

Java API for querying an N-Grams corpus. Uses Lucene for searching and indexing from the Google Web-1T format.

Java, 5 stars, 2 forks, updated 13 years ago

DigitalPebble/carbonara (Archived)

Enrichment pipeline for CUR / FOCUS reports which adds energy and carbon data, allowing you to report and reduce the impact of your cloud usage.

Java, 5 stars, 0 forks, updated 8 months ago

apachespark, aws, carbon-emissions, climate, cloud, focus, greenops, greensoftware, sustainability

DigitalPebble/crawlurlfrontier (Archived)

Crawl config used to test URL Frontier on a large scale and produce WARCs for CommonCrawl.

FLUX, 1 star, 0 forks, updated 1 year ago

DigitalPebble/stormcrawlerfight

Crawl configurations for benchmarking / testing StormCrawler

Shell, 10 stars, 5 forks, updated 6 years ago

DigitalPebble/tika-detector-stormcrawler

Wraps the charset detection logic from StormCrawler as a Tika module

Java, 0 stars, 1 fork, updated 2 years ago

DigitalPebble/tika (Fork)

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

0 stars, 0 forks, updated 2 years ago

DigitalPebble/benchmark

StormCrawler topology to evaluate the performance of different backends and configurations

Shell, 0 stars, 0 forks, updated 8 months ago

benchmark, elasticsearch, opensearch, stormcrawler

DigitalPebble/docs (Fork)

Documentation for Docker Official Images in docker-library

0 stars, 0 forks, updated 2 years ago

DigitalPebble/urlfrontier-client

URLFrontier client written in Rust (mostly as a way of learning Rust)

Rust, 1 star, 0 forks, updated 3 years ago

grpc, rust, url-frontier, urlfrontier, webcrawler

DigitalPebble/nutch (Fork)

Apache Nutch is an extensible and scalable web crawler

Java, 1 star, 0 forks, updated 2 years ago

DigitalPebble/ansible-storm

Ansible playbook for deploying a Storm cluster

7 stars, 1 fork, updated 2 years ago

ansible, playbook, storm, stormcrawler

DigitalPebble/TextClassification

A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and can be used as a front end to various ML algorithms. libSVM and liblinear are currently embedded.

Java, 48 stars, 22 forks, updated 4 years ago

DigitalPebble/storm (Fork)

Mirror of Apache Storm

Java, 0 stars, 0 forks, updated 1 year ago

DigitalPebble/behemoth-elasticsearch (Archived)

ElasticSearch module for Behemoth

Java, 1 star, 0 forks, updated 12 years ago

DigitalPebble/behemoth-textclassification (Archived)

Module for classifying Behemoth documents with a model from our Text Classification API

Java, 1 star, 0 forks, updated 13 years ago

DigitalPebble/behemoth-commoncrawl (Archived)

Support for old (pre 2013) CommonCrawl dataset in Behemoth

Java, 4 stars, 0 forks, updated 10 years ago

DigitalPebble/TextClassificationPlugin (Archived)

GATE Processing Resource wrapping DigitalPebble's TextClassification API

Java, 5 stars, 3 forks, updated 13 years ago

DigitalPebble/tescobank (Archived)

Setup for crawling tescobank with SC

Java, 4 stars, 2 forks, updated 10 years ago

DigitalPebble/crawler4j-frontier-battle (Fork)

No description provided.

Java, 0 stars, 0 forks, updated 3 years ago

DigitalPebble/textclassification-examples

Use cases for DigitalPebble's TextClassification API

Java, 10 stars, 3 forks, updated 10 years ago

DigitalPebble/crawler-commons (Fork)

A set of reusable Java components that implement functionality common to any web crawler

Java, 4 stars, 1 fork, updated 8 years ago

DigitalPebble/sc-warc

WARC resources for StormCrawler

2 stars, 1 fork, updated 9 years ago

DigitalPebble/tika-cc

Resources for generating a corpus of docs from CC for Tika

Shell, 0 stars, 0 forks, updated 11 years ago

DigitalPebble/NutchFight

Resources for comparison between 1.8 and 2.x of Apache Nutch

Java, 4 stars, 0 forks, updated 11 years ago
