112 results for “topic:dataproc”
Drop-in replacement for Apache Spark UI
An end-to-end demo of Google's Cloud data and analytics stack.
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Ephemeral Hadoop clusters using Google Compute Platform
A Python framework for data processing on GCP.
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
The AI/ML Recipes for Vertex AI, Serverless Spark and BigQuery open-source project is an effort to jumpstart your development of data processing and machine learning notebooks using Vertex AI, BigQuery and Dataproc's distributed processing capabilities.
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems and more.
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Data Pipeline from the Global Historical Climatology Network Dataset
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
This repository is deprecated. All of its content and history have been moved to googleapis/google-cloud-node.
Creating an Inverted Index of words occurring in a large set of documents extracted from web pages using Hadoop MapReduce and Google Dataproc
Dataproc Scala Examples is an effort to assist in the creation of Spark jobs written in Scala to run on Dataproc.
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau
A search engine to query social media insights with a political theme
An educational project to build an end-to-end pipeline for near-real-time and batch processing of data, which is then used for visualisation and a machine learning model.
GCP_Data_Enginner
Trino Autoscaler on Dataproc automates the scaling of a Dataproc cluster based on real-time resource utilization by Trino workloads
Opens a Chrome browser to a Dataproc cluster
Demonstration of Google Cloud Dataproc Workflow Templates
Demonstration of Google Cloud Dataproc for running PySpark jobs
Project for the Data Engineering Zoomcamp by DataTalks.Club
New York TLC (Taxi & Limousine Commission) Trip Record Data Processing Project with Google Cloud Platform
✈ A Spark-based ETL Pipeline for the OpenSky and OpenFlights Datasets
Building an ETL process with an Amazon dataset
Demonstration of Google Cloud Dataproc for running Spark jobs with Java
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
The company GreenMiles NYC Taxis is interested in investing in the passenger car transportation sector, with a vision of a less polluted future and of adapting to current market trends.