5,292 results for “topic:big-data”
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
ClickHouse® is a real-time analytics database management system
Apache Spark - A unified analytics engine for large-scale data processing
Data science Python notebooks: deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.
Apache Flink
Open-source IoT platform: device management, data collection, processing, and visualization.
An open source cybersecurity protocol for syncing decentralized graph data.
A getting-started guide to big data (大数据入门指南)
The official home of the Presto distributed SQL query engine for big data
The Data Engineering Cookbook
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
PredictionIO, a machine learning server for developers and ML engineers.
A distributed, fast open-source graph database featuring horizontal scalability and high availability
CMAK is a tool for managing Apache Kafka clusters
Open-Source Web UI for Apache Kafka Management
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
The most widely used Python-to-C compiler
A fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache Beam is a unified programming model for batch and streaming data processing.
Apache DataFusion SQL Query Engine
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Arkime is an open source, large scale, full packet capturing, indexing, and database system.
Seamless multi-primary syncing database with an intuitive HTTP/JSON API, designed for reliability
AI + Data, online. https://vespa.ai
The Open Source Feature Store for AI/ML
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Apache IoTDB