96 results for “topic:big-data-processing”
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
A lightweight helper utility that lets developers build pipelines interactively, using a single codebase for both DLT runs and non-DLT interactive notebook runs.
This repository contains an Apache Flink application for real-time sales analytics, built with Docker Compose to orchestrate the required infrastructure components: Apache Flink, Elasticsearch, and Postgres.
This code creates a Kinesis Firehose delivery stream in AWS that sends CloudWatch log data to S3.
Eskimo is a state-of-the-art Big Data infrastructure and management web console for building, managing, and operating Big Data 2.0 analytics clusters on Kubernetes. This is the Git repository of Eskimo Community Edition.
Flink SQL in Practice: a Chinese-language blog column
Yet Another SPark Framework
A big data processing and machine learning platform that works just like using SQL.
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
R for Big Data (Chinese Version)
GCP_Data_Enginner
Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow.
Hybrid time-series and block-column storage database engine written in Java
A curated selection of tools, libraries, and services that help tame your dataflow so you can productively build ambitious, data-driven, reactive applications on a streaming lakehouse.
End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.
Reservoir sampling for group-by queries on the Flink platform, answering single-aggregate queries effectively.
GitHub repository for A Versatile, Usable Big Data Infrastructure (AVUBDI).
Characterizing an index of the uneven global distribution of telecommunications resources
Implementations of big data algorithms using Python, NumPy, and pandas.
Introduction to Spark Batch processing.
Here I demonstrate the performance difference between the Poisson bootstrap and the classic bootstrap by estimating the confidence interval for the difference in CTR between the two user groups.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, using the Twitter API, Kafka, MongoDB, and Tableau.
From traffic sensors to smarter cities: real-time congestion prediction with Kafka, Spark, LSTM, XGBoost, and dynamic routing powered by graph algorithms.
Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.
Big Data and AI Engineering bootcamp, second capstone project: using big data tools to predict the probability of university enrollment for Egypt's high school students. 🏫 📚 🔬
This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working in **Ubuntu**.
Crack detection model using YOLOv7
Rock-solid pillars for enterprise-grade solutions
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources