291 results for “topic:apache-beam”
TFX is an end-to-end platform for deploying production ML pipelines
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
Yet Another UserAgent Analyzer
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Tools to make weather data accessible and useful.
Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
A collection of tools for extracting FHIR resources and analytics services on top of that data.
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.
Clojure API for a more dynamic Google Dataflow
Collection of transforms for the Apache Beam Python SDK.
Asgarde simplifies error handling with Apache Beam Java, using less code that is more concise and expressive.
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Tool to define Apache Beam pipelines in YAML or JSON
Repository to quickly get you started with new Machine Learning projects on Google Cloud Platform. More info (slides):
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Some class materials for a data processing course using PySpark
Microservices in Post-Kubernetes Era. A polyglot monorepo
Blockchain ETL Architecture
Opinionated serverless event analytics pipeline
A Python toolkit for extracting and transforming Bitcoin blockchain data into structured formats.
Asgarde simplifies error handling with Apache Beam Python, using less code that is more concise and expressive.
This project leverages GCS, Composer, Dataflow, BigQuery, and Looker on Google Cloud Platform (GCP) to build a robust data engineering solution for processing, storing, and reporting daily transaction data in the online food delivery industry.
Apache Beam examples for running on Google Cloud Dataflow.
Integrates LLMs as PTransforms in Apache Beam pipelines using LangChain
Efficient streaming data ingestion, transformation & activation
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
Code to statistically up-weight conversion values of consenting customers to feed up to 100% of the factual conversion values back into Google Ads.
Log analysis pipeline utilizing Apache Beam
Libraries for efficient and scalable group-structured dataset pipelines.