Repos
39
Stars
371
Forks
277
Top Language
Python
Top Repositories
Python implementation of the Parquet columnar file format.
Examples of running Spark and Scalding jobs over Avro data.
A Scala productivity framework for Hadoop.
Hue is a browser-based desktop interface for interacting with Hadoop. It supports a file browser, job tracker interface, cluster health monitor, and more.
John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm
Job scheduler
Repositories
39
Python implementation of the Parquet columnar file format.
No description provided.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The Pants Build System
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
A Pure Python Protobuf Parser
Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
Hue is a browser-based desktop interface for interacting with Hadoop. It supports a file browser, job tracker interface, cluster health monitor, and more.
Examples of running Spark and Scalding jobs over Avro data.
Chef Cookbook for Docker
Apache Druid: a high performance real-time analytics database.
Keep your code spotless
Twitter common libraries for python and the JVM
Datadog Agent
John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm
The Parquet site.
No description provided.
A Scala productivity framework for Hadoop.
Kafka protocol support in Python
NFS IOSTAT Module for NewRelic
Schema registry for Kafka
Datadog library for Scala
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Python interface to Amazon Web Services
A release plugin for sbt (>= 0.11.0)
An AWS SDK-backed FileSystem driver for Hadoop
Job scheduler
Metrics module for Play2
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Mirror of Apache Avro