Datacatalog Fileset Exporter
A Python package to manage Google Cloud Data Catalog Fileset export scripts.
Disclaimer: This is not an officially supported Google product.
Table of Contents
- Executing in Cloud Shell
- 1. Environment setup
- 2. Export Filesets to CSV file
Executing in Cloud Shell
# Set your service account credentials; for instructions, see 1.3. Auth credentials
# This file name is just a suggestion; feel free to follow your own naming conventions
export GOOGLE_APPLICATION_CREDENTIALS=~/datacatalog-fileset-exporter-sa.json
# Install datacatalog-fileset-exporter
pip3 install datacatalog-fileset-exporter --user
# Add to your PATH
export PATH=~/.local/bin:$PATH
# Look for available commands
datacatalog-fileset-exporter --help
1. Environment setup
1.1. Python + virtualenv
Using virtualenv is optional, but strongly recommended unless you use Docker.
1.1.1. Install Python 3.6+
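A quick way to confirm that the active interpreter meets the 3.6+ requirement is sketched below. The helper name `meets_minimum` is hypothetical and is not part of this package.

```python
import sys

MINIMUM = (3, 6)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report the version of the interpreter running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "too old"
print("Python {}: {}".format(current, status))
```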
1.1.2. Get the source code
git clone https://github.com/mesmacosta/datacatalog-fileset-exporter
cd ./datacatalog-fileset-exporter
All paths starting with ./ in the next steps are relative to the datacatalog-fileset-exporter
folder.
1.1.3. Create and activate an isolated Python environment
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
1.1.4. Install the package
pip install --upgrade .
1.2. Docker
Docker may be used as an alternative way to run the script. In that case, disregard the
virtualenv setup instructions.
1.3. Auth credentials
1.3.1. Create a service account and grant it the roles below
- Data Catalog Admin
1.3.2. Download a JSON key and save it as
This name is just a suggestion; feel free to follow your own naming conventions.
./credentials/datacatalog-fileset-exporter-sa.json
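Before pointing the tool at the downloaded key, it can help to confirm that the file really is a service account key. The sketch below is not part of this package; `validate_key` and `load_key` are hypothetical helper names. It relies only on the `type` and `client_email` fields that Google Cloud service account JSON keys contain.

```python
import json
import os


def validate_key(key):
    """Return the service account e-mail from a parsed JSON key.

    Raises ValueError when the content does not look like a service account key.
    """
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError("not a service account key")
    email = key.get("client_email")
    if not email:
        raise ValueError("key has no client_email field")
    return email


def load_key(path):
    """Read a JSON key file (expanding ~) and validate its content."""
    with open(os.path.expanduser(path), encoding="utf-8") as key_file:
        return validate_key(json.load(key_file))


# Made-up key content, used only to demonstrate the check.
sample = {"type": "service_account",
          "client_email": "exporter@my-project.iam.gserviceaccount.com"}
print(validate_key(sample))
```

With a real key in place, `load_key("./credentials/datacatalog-fileset-exporter-sa.json")` would return the account e-mail or raise an error.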
1.3.3. Set the environment variables
This step may be skipped if you're using Docker.
export GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-fileset-exporter-sa.json
2. Export Filesets to CSV file
2.1. A CSV file representing the Filesets will be created
Each Fileset takes up as many lines as needed to represent all of its fields. The columns are
described as follows:
| Column | Description | Mandatory |
|---|---|---|
| entry_group_name | Entry Group Name. | Y |
| entry_group_display_name | Entry Group Display Name. | Y |
| entry_group_description | Entry Group Description. | Y |
| entry_id | Entry ID. | Y |
| entry_display_name | Entry Display Name. | Y |
| entry_description | Entry Description. | Y |
| entry_file_patterns | Entry File Patterns. | Y |
| schema_column_name | Schema column name. | N |
| schema_column_type | Schema column type. | N |
| schema_column_description | Schema column description. | N |
| schema_column_mode | Schema column mode. | N |
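Since one Fileset can span several lines, a consumer of the exported file usually has to group lines back into entries. The sketch below shows one way to do that with the standard `csv` module. It assumes the file starts with a header row using the column names from the table above, and that lines with empty entry columns continue the previous entry; neither detail is confirmed here, and the sample data and the `group_filesets` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample shaped after the columns documented above.
SAMPLE_CSV = """\
entry_group_name,entry_group_display_name,entry_group_description,entry_id,entry_display_name,entry_description,entry_file_patterns,schema_column_name,schema_column_type,schema_column_description,schema_column_mode
projects/my-project/locations/us-central1/entryGroups/sales,Sales,Sales files,orders,Orders,Order files,gs://my-bucket/orders/*.csv,order_id,STRING,Order identifier,REQUIRED
,,,,,,,amount,DOUBLE,Order amount,NULLABLE
projects/my-project/locations/us-central1/entryGroups/sales,Sales,Sales files,refunds,Refunds,Refund files,gs://my-bucket/refunds/*.csv,,,,
"""


def group_filesets(csv_file):
    """Group CSV lines into one dict per (entry_group_name, entry_id)."""
    filesets = {}
    current_key = None
    for row in csv.DictReader(csv_file):
        group = (row.get("entry_group_name") or "").strip()
        entry_id = (row.get("entry_id") or "").strip()
        if group and entry_id:
            current_key = (group, entry_id)
        if current_key is None:
            continue  # schema line with no entry to attach it to
        fileset = filesets.setdefault(
            current_key, {"file_patterns": "", "columns": []})
        patterns = (row.get("entry_file_patterns") or "").strip()
        if patterns:
            fileset["file_patterns"] = patterns
        column_name = (row.get("schema_column_name") or "").strip()
        if column_name:  # schema columns are optional
            fileset["columns"].append({
                "name": column_name,
                "type": row.get("schema_column_type") or "",
                "description": row.get("schema_column_description") or "",
                "mode": row.get("schema_column_mode") or "",
            })
    return filesets


filesets = group_filesets(io.StringIO(SAMPLE_CSV))
for (group, entry_id), fileset in filesets.items():
    print(entry_id, fileset["file_patterns"], len(fileset["columns"]))
```

For a real export, `open(CSV_FILE_PATH, newline="")` would replace the `io.StringIO` wrapper.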
2.2. Run the datacatalog-fileset-exporter script
- Python + virtualenv
datacatalog-fileset-exporter filesets export --project-ids my-project --file-path CSV_FILE_PATH
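To call the same command from a Python script, for example in a scheduled job, a thin `subprocess` wrapper may be enough. This is a minimal sketch: it uses only the arguments shown in the command above, assumes the executable is on the PATH and signals failure with a non-zero exit status, and the helper names and the `filesets.csv` output path are hypothetical.

```python
import shlex
import subprocess


def build_export_command(project_id, file_path):
    """Build the argument list for the export command shown above."""
    return [
        "datacatalog-fileset-exporter", "filesets", "export",
        "--project-ids", project_id,
        "--file-path", file_path,
    ]


def run_export(project_id, file_path):
    """Run the export; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_export_command(project_id, file_path), check=True)


# Only print the command here; call run_export(...) to actually execute it.
command = build_export_command("my-project", "filesets.csv")
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing the arguments as a list avoids shell quoting problems when the file path contains spaces.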