GitHunt — Discover GitHub Repositories

5,292 results for “topic:big-data”

binhnguyennus/awesome-scalability

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

69.2k stars · 6.9k forks · Updated just now

architecture, awesome, awesome-list, backend, big-data (+15 more)

ClickHouse/ClickHouse

ClickHouse® is a real-time analytics database management system

C++ · 46.2k stars · 8.2k forks · Updated 2 hours ago

ai, analytics, big-data, clickhouse, cloud-native (+12 more)

apache/spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 42.9k stars · 29.1k forks · Updated 3 hours ago

big-data, java, jdbc, python, r (+3 more)

donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python · 28.9k stars · 8.0k forks · Updated 1 hour ago

aws, big-data, caffe, data-science, deep-learning (+14 more)

apache/flink

Apache Flink

Java · 25.8k stars · 13.9k forks · Updated 1 hour ago

big-data, flink, java, python, scala (+1 more)

thingsboard/thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Java · 21.3k stars · 6.2k forks · Updated 4 hours ago

big-data, cloud, coap-server, dashboards, http (+13 more)

amark/gun

An open source cybersecurity protocol for syncing decentralized graph data.

JavaScript · 19.0k stars · 1.2k forks · Updated 13 hours ago

artificial-intelligence, big-data, blockchain, crdt, crypto (+15 more)

heibaiying/BigData-Notes

Big data getting-started guide ⭐

Java · 16.9k stars · 4.3k forks · Updated 18 hours ago

azkaban, big-data, bigdata, flume, hadoop (+12 more)

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

Java · 16.7k stars · 5.5k forks · Updated 6 hours ago

big-data, data, hadoop, hive, java (+4 more)

andkret/Cookbook

The Data Engineering Cookbook

Python · 15.0k stars · 2.7k forks · Updated 10 hours ago

best-practices, big-data, cookbook, data-engineer, data-engineering

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java · 12.6k stars · 3.5k forks · Updated just now

analytics, big-data, data-science, database, databases (+14 more)

apache/predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Scala · 12.5k stars · 1.9k forks · Updated 23 hours ago

big-data, predictionio, scala

vesoft-inc/nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

C++ · 12.1k stars · 1.3k forks · Updated 1 hour ago

big-data, cpp, database, distributed, distributed-systems (+9 more)

yahoo/CMAK

CMAK is a tool for managing Apache Kafka clusters

Scala · 11.9k stars · 2.5k forks · Updated 2 days ago

big-data, cluster-management, kafka, scala

provectus/kafka-ui

Open-Source Web UI for Apache Kafka Management

Java · 11.9k stars · 1.4k forks · Updated 22 hours ago

apache-kafka, big-data, cluster-management, event-streaming, hacktoberfest (+13 more)

StarRocks/starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java · 11.4k stars · 2.3k forks · Updated 11 hours ago

analytics, big-data, cloudnative, database, datalake (+15 more)

quickwit-oss/quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust · 10.9k stars · 521 forks · Updated 7 hours ago

big-data, cloud-native, cloud-storage, distributed-tracing, log-management (+5 more)

cython/cython

The most widely used Python to C compiler

Cython · 10.6k stars · 1.6k forks · Updated 9 hours ago

big-data, c, cpp, cpython, cpython-extensions (+3 more)

catboost/catboost

A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.

C++ · 8.8k stars · 1.3k forks · Updated 19 hours ago

big-data, catboost, categorical-features, coreml, cuda (+13 more)

delta-io/delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs

Scala · 8.6k stars · 2.0k forks · Updated 1 hour ago

acid, analytics, big-data, delta-lake, spark

apache/beam

Apache Beam is a unified programming model for batch and streaming data processing.

Java · 8.5k stars · 4.5k forks · Updated 1 hour ago

batch, beam, big-data, golang, java (+3 more)

apache/datafusion

Apache DataFusion SQL Query Engine

Rust · 8.5k stars · 2.0k forks · Updated 13 hours ago

arrow, big-data, dataframe, datafusion, olap (+4 more)

h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Jupyter Notebook · 7.5k stars · 2.0k forks · Updated 1 day ago

automl, big-data, data-science, deep-learning, distributed (+15 more)

arkime/arkime

Arkime is an open source, large scale, full packet capturing, indexing, and database system.

C · 7.3k stars · 1.1k forks · Updated 6 hours ago

big-data, c, javascript, network-monitoring, nsm (+3 more)

apache/couchdb

Seamless multi-primary syncing database with an intuitive HTTP/JSON API, designed for reliability

Erlang · 6.8k stars · 1.1k forks · Updated 6 hours ago

big-data, cloud, content, couchdb, database (+5 more)

vespa-engine/vespa

AI + Data, online. https://vespa.ai

Java · 6.8k stars · 700 forks · Updated 9 hours ago

ai, big-data, java, machine-learning, rag (+9 more)

feast-dev/feast

The Open Source Feature Store for AI/ML

Python · 6.8k stars · 1.2k forks · Updated 8 hours ago

big-data, data-engineering, data-quality, data-science, feature-store (+5 more)

apache/zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Java · 6.6k stars · 2.8k forks · Updated 1 day ago

big-data, database, flink, java, javascript (+4 more)

hazelcast/hazelcast

Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

Java · 6.6k stars · 1.9k forks · Updated 16 hours ago

big-data, caching, data-in-motion, data-insights, distributed (+10 more)

apache/iotdb

Apache IoTDB

Java · 6.3k stars · 1.1k forks · Updated 20 hours ago

big-data, database, iot, java, nosql (+2 more)