"topic:big-data-processing" — Search

96 results for “topic:big-data-processing”

Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.

Jupyter Notebook 5851 Updated 5 months ago

big-data big-data-analytics big-data-architecture big-data-processing

souvik-databricks/dlt-with-debug

A lightweight helper utility that lets developers build pipelines interactively by keeping a single source file for both DLT runs and non-DLT interactive notebook runs.

Python 509 Updated 3 years ago

big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark

airscholar/FlinkCommerce

This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.

Java 4930 Updated 2 years ago

apache-flink big-data big-data-processing python realtime-streaming

felipefrizzo/terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

HCL 2622 Updated 4 years ago

analytics big-data big-data-processing cloudwatch-logs etl-job kinesis-firehose parquet terraform terraform-aws terraform-provider

eskimo-sh/eskimo

Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console to build, manage, and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the Git repository of Eskimo Community Edition.

Java 257 Updated 2 years ago

big-data big-data-analytics big-data-platform big-data-processing big-data-projects bigdata cerebro cluster-management elasticsearch flink gluster glusterfs kafka kibana kubernetes kubernetes-cluster kubernetes-setup spark webconsole zeppelin

StarPlatinumStudio/Flink-SQL-Practice

Flink SQL in Practice: a Chinese-language blog series

Java 166 Updated 3 years ago

apache-flink big-data-processing sql stream-processing

giucris/yasp

Yet Another SPark Framework

Scala 101 Updated 3 years ago

big-data big-data-processing elt etl etl-framework etl-pipeline framework scala spark sparksql

pyajs/veronica

Big data processing and machine learning platform, as easy as using SQL

Python 100 Updated 1 year ago

big-data-processing machine-learning-platform pyspark python3 sql xql

impresso/impresso-text-acquisition

🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

Jupyter Notebook 93 Updated 2 days ago

big-data-processing historical-newspapers impresso-project

hope-data-science/R4BD

R for Big Data (Chinese Version)

R 82 Updated 8 months ago

big-data big-data-analytics-techniques big-data-processing r

anjijava16/GCP_Data_Enginner_Utils

GCP_Data_Enginner

Shell 81 Updated 4 years ago

big-data-processing bigquery dataflow dataproc dataproc-cluster gcp gcp-storage notebook pubsub pyspark python scala shell-script

bdnf/BigData-Engineering-Projects

Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow

Jupyter Notebook 71 Updated 6 years ago

airflow big-data-analytics big-data-processing cassandra data-lake data-warehouse redshift spark

akardapolov/dimension-db

Hybrid time-series and block-column storage database engine written in Java

Java 71 Updated 1 week ago

big-data-processing column-store dbms java sql time-series

tabletop-labs/tabletop

A curated selection of tools, libraries, and services that help tame your dataflow so you can productively build ambitious, data-driven, reactive applications on a streaming lakehouse

Go 60 Updated 2 years ago

big-data big-data-analytics big-data-processing elasticscaling kafka microservices modern-data-stack real-time semi-structured-cloud-warehouse stream-processing timetravel

DHANA5982/Azure-Powered-Data-Lakehouse-and-ETL-Pipeline

End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.

Jupyter Notebook 50 Updated 4 months ago

apache-spark azure-data-factoty azure-data-lake-storage-gen2 azure-synapse-analytics big-data-processing data-engineering data-pipeline-automation databricks distributed-computing distributed-systems ecommerce etl-pipeline kpi-dashboard medallion-architecture parellel-processing pysaprk

theGuyWithBlackTie/electricChargingStations

No description provided.

Jupyter Notebook 40 Updated 4 years ago

big-data big-data-processing charging-stations electric-vehicles spark-ml

vvittis/FlinkSampling

Reservoir sampling for group-by queries on the Flink platform, aimed at answering single-aggregate queries effectively.

Java 41 Updated 2 years ago

apache-flink big-data-analytics big-data-processing group-by java reservoir-sampling sampling stratum streaming-data streaming-tuples topic

software-competence-center-hagenberg/AVUBDI

GitHub repository for A Versatile Usable Big Data Infrastructure (AVUBDI)

Shell 31 Updated 5 years ago

big-data-platform big-data-processing docker docker-compose docker-swarm kafka spark template-project

chuanting/imbalance_index

Characterizing an imbalance index for the global distribution of telecommunication resources

HTML 30 Updated 4 years ago

6g big-data-processing connect-the-unconnected digital-divide

kochlisGit/Big-Data-Algorithms

Implementations of big data algorithms using Python, NumPy, and pandas.

Python 31 Updated 5 years ago

a-priori big-data-processing bloom-filter frequent-itemset-mining frequent-itemsets lsh lsh-algorithm min-hasing multihash-pcy multistage-pcy pcy python shingling similar-items stream-mining streams

mtumilowicz/big-data-scala-spark-batch-workshop

Introduction to Spark Batch processing.

Scala 31 Updated 1 year ago

batch-processing big-data big-data-processing spark spark-sql workshop workshop-materials

VladOnMyOwn/ctr-poisson-bootstrap

Here I demonstrate the performance difference between the Poisson bootstrap and the classic bootstrap by estimating the confidence interval for the difference in CTR between two user groups.

Jupyter Notebook 32 Updated 3 years ago

ab-test ab-testing ab-tests big-data big-data-processing bootstrap click-through-rate poisson-bootstrap python statistical-tests statistics

chandnii7/Big-Data-Processing-Pipeline

A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, using the Twitter API, Kafka, MongoDB, and Tableau.

Python 34 Updated 4 years ago

big-data big-data-processing data-analytics data-processing-pipelines data-visualization kafka kafka-consumer kafka-producer kafka-streaming mongodb nosql-database tableau twitter-api zookeeper

devarshpatel1506/smart_traffic_routing

From traffic sensors to smarter cities: real-time congestion prediction with Kafka, Spark, LSTM, XGBoost, and dynamic routing powered by graph algorithms.

Python 31 Updated 5 months ago

algorithms big-data big-data-processing data-engineering kafka machine-learning optimization-algorithms spark streaming-data streamlit visualization

Anirban166/Big-Data-ft.-Genomics

Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.

C++ 20 Updated 3 years ago

big-data-processing bioinformatics data-structures-and-algorithms genomic-sequences

RghdE/CapstoneTwo_EducationalLandscape

Second capstone project for a Big Data and AI Engineering bootcamp: using big data tools to predict the probability of university enrollment for Egypt's high school students. 🏫 📚 🔬

Jupyter Notebook 20 Updated 2 years ago

apache-pig big-data big-data-analytics big-data-processing big-data-projects big-data-visualization data-science machine-learning pyspark

JKA098/Pokemon-Feistiness-Apache-Spark-Job

This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working in **Ubuntu**.

Python 20 Updated 10 months ago

apache-sparksql batch-processing big-data-processing cluster-computing data-analytics-project data-pipeline distributed-computing hadoop-mapreduce java linux-environment open-data ubuntu

zaid-24/Crack-Detection-using-CNN

Crack detection model using YOLOv7

Jupyter Notebook 21 Updated 2 years ago

big-data-processing cnn python pytorch yolov7

IncredibleProgress/sweetheart.py (Archived)

rock-solid pillars for enterprise-grade solutions

Python 20 Updated 2 years ago

big-data-processing jupyter nginx-unit py-script python rethinkdb rhel rust-lang tailwindcss ubuntu vue

JamesHanZhang/table-data-format-transform-app

Batch or single-file format conversion among Excel, Markdown, CSV, and SQL data sources

Python 20 Updated 2 years ago

big-data-processing csv-to-excel csv-to-sql data-cleaning-pipeline data-preprocessing easy-to-use etl-framework excel-to-md multifileupload
