GitHunt

DigitalPebble Ltd

DigitalPebble

Languages

Java 68%, Shell 12%, Rust 8%, SCSS 4%, Dockerfile 4%, FLUX 4%

Repos: 29
Stars: 428
Forks: 108
Top Language: Java


Top Repositories

DigitalPebble/spruce

SPRUCE is an open-source enrichment platform for GreenOps which helps measure and reduce the environmental impact of cloud computing.

Java, 20 stars, 4 forks, updated just now
Topics: apache-spark, aws, carbon-emissions, climate, cloud, greenops, greensoftware, open-source, sustainability

DigitalPebble/ginkgo

Estimate the environmental impact of GitHub Actions for your entire organization.

Rust, 1 star, 0 forks, updated 3 weeks ago
Topics: carbon-emissions, github-actions, greenops, open-source, sustainability

DigitalPebble/digitalpebble.github.io

Resources for the DigitalPebble website

SCSS, 0 stars, 0 forks, updated 2 weeks ago

DigitalPebble/behemoth (Archived)

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Java, 284 stars, 59 forks, updated 7 years ago
Topics: hadoop, java, mapreduce, nlp

DigitalPebble/stormcrawler-docker

Resources for running StormCrawler with Docker services

Dockerfile, 10 stars, 4 forks, updated 1 year ago
Topics: apache-storm, docker, stormcrawler

DigitalPebble/ngrams-api (Archived)

Java API for querying an N-Grams corpus. Uses Lucene for searching and indexing from the Google Web-1T format.

Java, 5 stars, 2 forks, updated 13 years ago

DigitalPebble/carbonara (Archived)

Enrichment pipeline for CUR / FOCUS reports which adds energy and carbon data, allowing you to report and reduce the impact of your cloud usage.

Java, 5 stars, 0 forks, updated 8 months ago
Topics: apachespark, aws, carbon-emissions, climate, cloud, focus, greenops, greensoftware, sustainability

DigitalPebble/crawlurlfrontier (Archived)

Crawl config used to test URL Frontier on a large scale and produce WARCs for CommonCrawl.

FLUX, 1 star, 0 forks, updated 1 year ago

DigitalPebble/stormcrawlerfight

Crawl configurations for benchmarking / testing StormCrawler

Shell, 10 stars, 5 forks, updated 6 years ago

DigitalPebble/tika-detector-stormcrawler

Wraps the charset detection logic from StormCrawler as a Tika module

Java, 0 stars, 1 fork, updated 2 years ago

DigitalPebble/tika (Fork)

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

0 stars, 0 forks, updated 2 years ago

DigitalPebble/benchmark

StormCrawler topology to evaluate the performance of different backends and configurations

Shell, 0 stars, 0 forks, updated 8 months ago
Topics: benchmark, elasticsearch, opensearch, stormcrawler

DigitalPebble/docs (Fork)

Documentation for Docker Official Images in docker-library

0 stars, 0 forks, updated 2 years ago

DigitalPebble/urlfrontier-client

URLFrontier client written in Rust (mostly as a way of learning Rust)

Rust, 1 star, 0 forks, updated 3 years ago
Topics: grpc, rust, url-frontier, urlfrontier, webcrawler

DigitalPebble/nutch (Fork)

Apache Nutch is an extensible and scalable web crawler

Java, 1 star, 0 forks, updated 2 years ago

DigitalPebble/ansible-storm

Ansible playbook for deploying a Storm cluster

7 stars, 1 fork, updated 2 years ago
Topics: ansible, playbook, storm, stormcrawler

DigitalPebble/TextClassification

A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and can be used as a front end to various ML algorithms. libSVM and liblinear are currently embedded.

Java, 48 stars, 22 forks, updated 4 years ago

DigitalPebble/storm (Fork)

Mirror of Apache Storm

Java, 0 stars, 0 forks, updated 1 year ago

DigitalPebble/behemoth-elasticsearch (Archived)

ElasticSearch module for Behemoth

Java, 1 star, 0 forks, updated 12 years ago

DigitalPebble/behemoth-textclassification (Archived)

Module for classifying Behemoth documents with a model from our Text Classification API

Java, 1 star, 0 forks, updated 13 years ago

DigitalPebble/behemoth-commoncrawl (Archived)

Support for old (pre 2013) CommonCrawl dataset in Behemoth

Java, 4 stars, 0 forks, updated 10 years ago

DigitalPebble/TextClassificationPlugin (Archived)

GATE Processing Resource wrapping DigitalPebble's TextClassification API

Java, 5 stars, 3 forks, updated 13 years ago

DigitalPebble/tescobank (Archived)

Setup for crawling tescobank with SC

Java, 4 stars, 2 forks, updated 10 years ago

DigitalPebble/crawler4j-frontier-battle (Fork)

No description provided.

Java, 0 stars, 0 forks, updated 3 years ago

DigitalPebble/textclassification-examples

Use cases for DigitalPebble's TextClassification API

Java, 10 stars, 3 forks, updated 10 years ago

DigitalPebble/crawler-commons (Fork)

A set of reusable Java components that implement functionality common to any web crawler

Java, 4 stars, 1 fork, updated 8 years ago

DigitalPebble/sc-warc

WARC resources for StormCrawler

2 stars, 1 fork, updated 9 years ago

DigitalPebble/tika-cc

Resources for generating a corpus of docs from CC for Tika

Shell, 0 stars, 0 forks, updated 11 years ago

DigitalPebble/NutchFight

Resources for comparing versions 1.8 and 2.x of Apache Nutch

Java, 4 stars, 0 forks, updated 11 years ago
