Repos: 83
Stars: 1
Forks: 1
Top Language: Scala
Top Repositories
Originally forked from Apache Spark, integrated with a simplified version of parameter server, supporting large-scale model training.
Testbench for experimenting with Apache Hive at any data scale.
hdp-configuration-utils
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
A demo of a Three Kingdoms SLG game server implemented in Go
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Repositories
83
Originally forked from Apache Spark, integrated with a simplified version of parameter server, supporting large-scale model training.
Testbench for experimenting with Apache Hive at any data scale.
hdp-configuration-utils
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
A demo of a Three Kingdoms SLG game server implemented in Go
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Apache Hive
StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
Upserts, Deletes And Incremental Processing on Big Data.
Apache Iceberg
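A visual web UI integrated with DataX: select a data source to generate a data synchronization task in one click. Supports batch creation of RDBMS synchronization tasks, integrates an open-source scheduling system, and supports distributed execution, incremental data synchronization, real-time viewing of run logs, monitoring of executor resources, killing running processes, encryption of data source information, and more.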
Unidirectional Data Flow in Swift - Inspired by Redux
Performance Analysis Tool
Factorization Machines on Spark and Glint
Brings SQL and AI together.
A sink to save Spark Structured Streaming DataFrame into Hive table
Spark Structured Streaming JDBC Sink
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
The OpenAPI SDK for PHP with Composer support
Web tool for Avro Schema Registry
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax.
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
A collection of system log datasets for massive log analysis
A toolkit for automated log parsing
A log analysis toolkit for automated anomaly detection
Qubole Sparklens tool for performance tuning Apache Spark
A tool for monitoring and tuning Spark jobs for efficiency.
🐌 useful scripts for making developer's everyday life easier and happier
The Apache Kafka C/C++ library
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google etc.