Repos
20
Stars
3
Forks
4
Top Language
Java
Top Repositories
Run on all nodes of your cluster before the cluster starts - lets you customize your cluster
Provides a Spark backend for executing Dataflow pipelines.
The interoperable, open source catalog for Apache Iceberg
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
Apache Parquet
Repositories
20
The interoperable, open source catalog for Apache Iceberg
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
Apache Parquet
Run on all nodes of your cluster before the cluster starts - lets you customize your cluster
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
No description provided.
Provides a Spark backend for executing Dataflow pipelines.
Mirror of Apache Hadoop
A massively spiffy yet delicately unobtrusive compression library.
Examples of how to use Cloud Bigtable with both GCE map/reduce and standalone applications.
A skeleton for creating Python applications using the Flask framework on App Engine
Kaggle 2nd Annual Data Science Bowl
Repository with examples and smoke tests for the GCP Airflow operators and hooks
Mirror of Apache Hive
Mirror of Apache Bigtop
Mirror of Apache Zeppelin
Mirror of Apache HBase
Mirror of Apache Spark
CSV data source for Spark SQL and DataFrames
Codelabs in various languages demonstrating the use of several tools and systems on genomics data.