# bigquery-autoload

Automatically load data from Google Cloud Storage files into BigQuery tables.
> **Note:** The documentation is not yet in sync with the recent rewrite and will be updated soon.
A Google Cloud Function providing a simple, configurable way to automatically load data from GCS files into BigQuery tables.

It favours a convention-over-configuration approach and ships with sensible defaults for common file formats (CSV, JSON, Avro, ORC, Parquet):
- The table name is automatically derived from the file name, minus the extension and the date/timestamp suffix, if any
- Schema autodetection is enabled
- Avro logical types are honoured
- New data is appended to the existing table
If the default behaviour does not suit your needs, it can be overridden for all files, or only certain ones, through mapping files or custom metadata.
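The table-naming convention above can be sketched as follows. This is a hypothetical illustration, not the actual source: the function name `tableNameFor` and the exact suffix pattern are assumptions for the example.

```javascript
// Hypothetical sketch of the naming convention: strip the file extension,
// then strip a trailing date/timestamp suffix if one is present.
// The real implementation may recognise different suffix formats.
function tableNameFor(fileName) {
  const base = fileName.replace(/\.[^.]+$/, ""); // drop extension
  // Matches suffixes such as "_20190131", "-2019-01-31" or "_2019-01-31T235959"
  return base.replace(
    /[-_]\d{4}[-_]?\d{2}[-_]?\d{2}([-_T]\d{2}[:_-]?\d{2}[:_-]?\d{2})?$/,
    ""
  );
}

console.log(tableNameFor("mytable_20190131.csv"));   // "mytable"
console.log(tableNameFor("events-2019-01-31.json")); // "events"
```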
## Quickstart
1. Create a new `bq-autoload` Google Cloud Storage bucket:

   ```sh
   $> gsutil mb -c regional -l europe-west1 "gs://bq-autoload"
   ```
2. Create a new `Staging` BigQuery dataset:

   ```sh
   $> bq mk --dataset "Staging"
   ```
3. Clone and deploy this repository as a Cloud Function triggered by changes on this GCS bucket (do not forget to replace the project id):

   ```sh
   $> git clone "https://github.com/tfabien/bigquery-autoload/" \
      && cd "bigquery-autoload" \
      && npm install -g typescript \
      && npm install \
      && npm run build \
      && gcloud functions deploy "bq-autoload" \
         --entry-point autoload \
         --trigger-bucket "bq-autoload" \
         --set-env-vars "PROJECT_ID={{YOUR_GCP_PROJECT_ID}}" \
         --runtime "nodejs10" \
         --memory "128MB" \
         --region europe-west1
   ```
That's it! 🎉
Any file you upload to the `bq-autoload` GCS bucket will now automatically be loaded into a BigQuery table within seconds.
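The load job the function triggers for an uploaded file can be pictured as below. This is a hedged sketch, not the actual code: the field names follow the public BigQuery Jobs API, the helper `defaultLoadConfig` is invented for illustration, and the bucket (`bq-autoload`) and dataset (`Staging`) names come from the quickstart above.

```javascript
// Hypothetical sketch of the default load-job configuration applied to an
// uploaded file, using BigQuery Jobs API field names. The actual function
// may build this object differently.
function defaultLoadConfig(projectId, fileName) {
  return {
    configuration: {
      load: {
        sourceUris: [`gs://bq-autoload/${fileName}`],
        destinationTable: {
          projectId,
          datasetId: "Staging",
          // Simplified naming: extension stripped, suffix handling omitted
          tableId: fileName.replace(/\.[^.]+$/, ""),
        },
        autodetect: true,                 // schema autodetection enabled
        useAvroLogicalTypes: true,        // honour Avro logical types
        writeDisposition: "WRITE_APPEND", // append new data to the table
      },
    },
  };
}

console.log(JSON.stringify(defaultLoadConfig("my-project", "sales.csv"), null, 2));
```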
## Usage
See the wiki for usage samples and advanced configuration.