
bradmiro/cloud-dataproc

Samples for Cloud Dataproc

Google Cloud Dataproc

This repository contains code and documentation for use with
Google Cloud Dataproc.

Samples in this Repository

  • codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
  • codelabs/spark-bigquery provides the source code for the PySpark for Preprocessing BigQuery Data Codelab, which demonstrates using PySpark on Cloud Dataproc to process data from BigQuery.
  • codelabs/spark-nlp provides the source code for the PySpark for Natural Language Processing Codelab, which demonstrates using the spark-nlp library for Natural Language Processing.
  • spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. It also optionally
    demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
  • spark-translate provides a simple demo Spark application, running on Cloud Dataproc, that translates words using Google's Translation API.

See each directory's README for more information.

Additional Dataproc Repositories

You can find more Dataproc resources in these GitHub repositories:

Dataproc projects

Connectors

Kubernetes Operators

Examples

For more information

For more information, review the Dataproc documentation. You can also pose questions
to the Stack Overflow community with the tag google-cloud-dataproc.
See our other Google Cloud Platform GitHub repos for sample applications and
scaffolding for other frameworks and use cases.

Contributing changes

Licensing