GitHunt

Sebastian Nagel

sebastian-nagel

Languages

Python 39%, Java 28%, Jupyter Notebook 11%, Shell 6%, HTML 6%, FLUX 6%, JavaScript 6%

Repos

63

Stars

25

Forks

7

Top Language

Python


Top Repositories

sebastian-nagel/nutch-test-single-node-cluster

No description provided.

Shell, 5 stars, 2 forks, updated 3 weeks ago
sebastian-nagel/nutch (Fork)

Mirror of Apache Nutch

Java, 2 stars, 0 forks, updated 1 month ago
sebastian-nagel/PyAthena (Fork)

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.

Python, 0 stars, 0 forks, updated 4 years ago
sebastian-nagel/webarchive-commons (Fork)

Common web archive utility code.

Java, 0 stars, 0 forks, updated 1 month ago
sebastian-nagel/jwarc (Fork)

Java library for reading and writing WARC files with a typed API

Java, 0 stars, 0 forks, updated 1 month ago
sebastian-nagel/cld2 (Fork)

Compact Language Detector 2

0 stars, 0 forks, updated 4 years ago
sebastian-nagel/compact_enc_det (Fork)

compact_enc_det - Compact Encoding Detection

0 stars, 0 forks, updated 2 years ago
sebastian-nagel/konstanz-in-zahlen (Fork)

Konstanz in Zahlen: annual facts and figures about the city of Konstanz

0 stars, 0 forks, updated 4 months ago
sebastian-nagel/sitemap-performance-test

No description provided.

Java, 0 stars, 0 forks, updated 3 months ago
sebastian-nagel/crawler-commons (Fork)

A set of reusable Java components that implement functionality common to any web crawler

Java, 0 stars, 0 forks, updated 3 weeks ago
sebastian-nagel/docker-hadoop (Fork)

Apache Hadoop docker image

3 stars, 3 forks, updated 4 years ago
sebastian-nagel/ossym2022-robotstxt-experiments (Archived)

Experiments and metrics about robots.txt captures, presentation at #ossym2022

Jupyter Notebook, 0 stars, 0 forks, updated 3 years ago
Topics: robots-txt, robotstxt, user-agents
sebastian-nagel/surt (Fork)

Sort-friendly URI Reordering Transform (SURT) python module

Python, 0 stars, 0 forks, updated 1 year ago
sebastian-nagel/zip2gz (Fork)

Create a file tree with the raw data from a zip file in usable format

Python, 0 stars, 0 forks, updated 1 year ago
sebastian-nagel/nutch-1 (Fork)

No description provided.

1 star, 0 forks, updated 4 years ago
sebastian-nagel/pga-declarations (Fork)

Declarations of terms of major social media platforms. Maintained by the Platform Governance Archive team, University of Bremen.

0 stars, 0 forks, updated 1 year ago
sebastian-nagel/sfm-facebook-harvester (Fork, Archived)

No description provided.

Python, 0 stars, 0 forks, updated 3 years ago
sebastian-nagel/selenium_test_demo_tu2txt (Fork)

No description provided.

0 stars, 0 forks, updated 1 year ago
sebastian-nagel/storm-crawler (Fork)

Web crawler SDK based on Apache Storm

HTML, 1 star, 0 forks, updated 3 months ago
sebastian-nagel/warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

FLUX, 8 stars, 1 fork, updated 2 years ago
Topics: apache-storm, elasticsearch, solr, stormcrawler, warc, warc-files, web-archives
sebastian-nagel/cc2dataset (Fork)

Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...

0 stars, 0 forks, updated 3 years ago
sebastian-nagel/news-please (Fork)

news-please - an integrated web crawler and information extractor for news that just works.

Python, 1 star, 1 fork, updated 3 years ago
sebastian-nagel/duckdb-web (Fork)

DuckDB-Web - Source code of duckdb.org

0 stars, 0 forks, updated 3 years ago
sebastian-nagel/simhash-py (Fork)

Simhash and near-duplicate detection

1 star, 0 forks, updated 6 years ago
sebastian-nagel/twarc-csv (Fork)

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.

0 stars, 0 forks, updated 3 years ago
sebastian-nagel/introduction-to-python

No description provided.

Jupyter Notebook, 2 stars, 0 forks, updated 4 years ago
sebastian-nagel/sfm-instagram-harvester (Fork)

No description provided.

Python, 0 stars, 0 forks, updated 3 years ago
sebastian-nagel/sfm-web-harvester-browsertrix

No description provided.

Python, 0 stars, 0 forks, updated 2 years ago
sebastian-nagel/browsertrix-crawler (Fork)

Run a high-fidelity browser-based crawler in a single Docker container

JavaScript, 1 star, 0 forks, updated 3 years ago
sebastian-nagel/wdc-page (Fork)

This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl

0 stars, 0 forks, updated 3 years ago
