duckdb-spanner
A DuckDB extension for querying Google Cloud Spanner databases directly from DuckDB.
Experimental: This extension is under active development. The API (function signatures, named parameters, configuration) may change without notice between versions. Do not depend on backward compatibility.
Both GoogleSQL and PostgreSQL dialect databases are supported transparently. Dialect detection is automatic — no configuration needed. spanner_query and spanner_scan work the same way regardless of the database dialect.
Prerequisites
- Rust (stable)
- DuckDB CLI (for running the extension)
- Python 3 (for
make extensionmetadata appending) - Make
Known Issues
-
This project depends on a patched version of gcloud-spanner via
[patch.crates-io].
The upstreamRowIterator::try_recv()discards metadata and stats when aPartialResultSethas empty values, which breakscolumns_metadata()andstats()forQueryMode::Plan, empty result sets, and other scenarios.
See yoshidan/google-cloud-rust#428 for details. -
database_role(fine-grained access control) is not yet supported as a named parameter. The upstream gcloud-spannerSessionConfigdoes not exposecreator_rolefor session creation (the underlyingBatchCreateSessionsRequest.session_templatesupports it, but the Rust client hardcodes it toNone). -
Results are streamed via an internal channel. Memory usage is bounded regardless of result set size.
-
use_parallelismenables the partitioned API (required for Data Boost), but partitions are currently executed sequentially. True concurrent partition execution is planned for a future release.
Installation
Build from Source
Clone the repository with submodules:
git clone --recurse-submodules https://github.com/apstndb/duckdb-spanner.git
cd duckdb-spannerBuild the extension with metadata (recommended):
make extensionOr build with cargo directly (without extension metadata):
cargo build --features loadable-extension --releaseThe built extension is spanner.duckdb_extension (via make extension) or target/release/libduckdb_spanner.dylib (macOS) / libduckdb_spanner.so (Linux).
Loading the Extension
This is an unsigned extension, so DuckDB must be started with the -unsigned flag:
duckdb -unsignedThen load the extension:
LOAD 'path/to/spanner.duckdb_extension';Or set the option programmatically before loading:
SET allow_unsigned_extensions = true;
LOAD 'path/to/spanner.duckdb_extension';For a quick build-and-launch workflow:
make duckdbAuthentication
This extension uses Application Default Credentials (ADC).
For local development with end-user credentials, install the Google Cloud SDK and run:
gcloud auth application-default loginWhen using the Spanner emulator, no authentication is required. Either set the SPANNER_EMULATOR_HOST environment variable or pass endpoint directly:
export SPANNER_EMULATOR_HOST=localhost:9010Getting Started
Run a SQL Query
SELECT * FROM spanner_query(
'projects/myproj/instances/myinst/databases/mydb',
'SELECT Id, Name FROM Users WHERE Age > 20'
);Read a Table
SELECT * FROM spanner_scan(
'projects/myproj/instances/myinst/databases/mydb',
'Users'
);With the Spanner Emulator
export SPANNER_EMULATOR_HOST=localhost:9010-- endpoint is inferred from SPANNER_EMULATOR_HOST
SELECT * FROM spanner_query(
'projects/test/instances/test/databases/test',
'SELECT * FROM Users'
);
-- Or pass endpoint explicitly
SELECT * FROM spanner_query(
'projects/test/instances/test/databases/test',
'SELECT * FROM Users',
endpoint := 'localhost:9010'
);Table Functions
spanner_query
Runs arbitrary Spanner SQL. Implemented as a table macro that wraps spanner_query_raw, accepting a STRUCT for params and automatically converting it to JSON.
SELECT * FROM spanner_query(
'projects/myproj/instances/myinst/databases/mydb',
'SELECT * FROM Users WHERE Age > @min_age',
params := {'min_age': spanner_value(21::BIGINT)},
exact_staleness_secs := 10,
use_data_boost := true
);spanner_scan
Reads all rows from a table using the Spanner Read API.
SELECT * FROM spanner_scan(
'projects/myproj/instances/myinst/databases/mydb',
'Users',
index := 'UsersByName'
);Named Parameters
Named parameters are inspired by BigQuery's EXTERNAL_QUERY and CloudSpannerProperties.
Both functions accept the following named parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
endpoint |
VARCHAR | (production) | Custom gRPC endpoint (e.g., localhost:9010 for the emulator) |
use_parallelism |
BOOLEAN | false |
Use partitioned query/read for parallel execution |
use_data_boost |
BOOLEAN | false |
Enable Data Boost |
max_parallelism |
INTEGER | (default) | Maximum number of partitions |
exact_staleness_secs |
BIGINT | Read at an exact staleness (seconds ago) | |
max_staleness_secs |
BIGINT | Read with bounded staleness (at most N seconds ago) | |
read_timestamp |
VARCHAR | Read at a specific timestamp (RFC 3339, e.g., '2024-01-15T10:30:00Z') |
|
min_read_timestamp |
VARCHAR | Read at a timestamp no earlier than this (RFC 3339) | |
priority |
VARCHAR | Request priority: 'low', 'medium', or 'high' |
At most one timestamp bound parameter can be specified. If none is set, Spanner uses a strong read (the default).
spanner_query additionally accepts:
| Parameter | Type | Description |
|---|---|---|
params |
STRUCT | Query parameters (see Query Parameters) |
spanner_scan additionally accepts:
| Parameter | Type | Description |
|---|---|---|
index |
VARCHAR | Secondary index name to use for the read |
dialect |
VARCHAR | Database dialect: 'googlesql' or 'postgresql' (auto-detected if omitted) |
Query Parameters
The params parameter accepts a STRUCT mapping parameter names to values. Two helper macros format values for Spanner type compatibility.
spanner_value(val) -- Auto-Detect Type
Infers the Spanner type from DuckDB's typeof() and converts the value to a Spanner-compatible format.
spanner_value(42::BIGINT) -- {"value":42,"type":"INT64"}
spanner_value('2024-01-15'::DATE) -- {"value":"2024-01-15","type":"DATE"}
spanner_value('\xDEAD'::BLOB) -- {"value":"3kFE","type":"BYTES"}
spanner_value('2024-06-15T10:30:00Z'::TIMESTAMPTZ) -- {"value":"2024-06-15T10:30:00.000000Z","type":"TIMESTAMP"}
spanner_value(NULL::BIGINT) -- {"value":null,"type":"INT64"}See DuckDB to Spanner type mapping for the full conversion table.
spanner_typed(val, type_name) -- Explicit Type
For cases where auto-detection is insufficient, specify the Spanner type name directly.
spanner_typed(NULL, 'INT64') -- {"value":null,"type":"INT64"}
spanner_typed('abc', 'STRING') -- {"value":"abc","type":"STRING"}Plain Values
Values without spanner_value() or spanner_typed() wrappers are passed as raw JSON. Types are inferred from the JSON representation (number, string, boolean, null).
-- Mix plain and typed values
params := {'id': 1, 'name': spanner_value('Alice')}Type Mapping
DuckDB to Spanner (Query Parameters)
spanner_value() maps DuckDB types to Spanner types. Values are serialized as JSON internally:
| DuckDB Type | Spanner Type | JSON Value |
|---|---|---|
| BOOLEAN | BOOL | boolean |
| TINYINT, SMALLINT, INTEGER, BIGINT | INT64 | number |
| FLOAT | FLOAT32 | number |
| DOUBLE | FLOAT64 | number |
| VARCHAR | STRING | string |
| DATE | DATE | string |
| TIMESTAMP | TIMESTAMP | string (RFC 3339: %Y-%m-%dT%H:%M:%S.%fZ) |
| TIMESTAMP WITH TIME ZONE | TIMESTAMP | string (normalized to UTC, RFC 3339) |
| BLOB | BYTES | string (base64 encoded) |
| HUGEINT, UBIGINT, DECIMAL | NUMERIC | string (preserves precision) |
| UUID | UUID | string |
| INTERVAL | INTERVAL | string (ISO 8601 duration) |
| JSON | JSON | any JSON value |
| TIME | STRING | string |
| BIT | BYTES | string (base64 encoded) |
| T[] | ARRAY<T> | JSON array (elements follow scalar conversion rules) |
spanner_value() also supports array types:
spanner_value([1, 2, 3]) -- {"value":[1,2,3],"type":"ARRAY<INT64>"}
spanner_value(['a', 'b']) -- {"value":["a","b"],"type":"ARRAY<STRING>"}
spanner_value([true, false]) -- {"value":[true,false],"type":"ARRAY<BOOL>"}STRUCT parameters are not supported.
Spanner to DuckDB (Query Results)
| Spanner Type | DuckDB Type |
|---|---|
| BOOL | BOOLEAN |
| INT64 | BIGINT |
| FLOAT32 | FLOAT |
| FLOAT64 | DOUBLE |
| NUMERIC | DECIMAL(38,9) |
| STRING | VARCHAR |
| JSON | VARCHAR (aliased as JSON) |
| BYTES | BLOB |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP WITH TIME ZONE |
| UUID | UUID |
| INTERVAL | INTERVAL |
| ARRAY<T> | LIST(T) |
| STRUCT<...> | STRUCT(...) |
| PROTO | BLOB |
| ENUM | BIGINT |
Testing
Shell Tests
Shell tests run SQL queries against a DuckDB CLI with the extension loaded:
make testIntegration Tests
Integration tests require a Spanner emulator:
# Start the emulator
make emulator-start
# Run integration tests (uses testcontainers, but the emulator must be available)
cargo testAdvanced
spanner_query_raw -- Low-Level Table Function
The underlying table function that spanner_query wraps. Use this when:
- You have a pre-built JSON string for
params(e.g., from application code) - Table macro expansion causes issues in your client library or tooling
SELECT * FROM spanner_query_raw(
'projects/myproj/instances/myinst/databases/mydb',
'SELECT * FROM Users WHERE Age > @min_age',
params := '{"min_age": {"value": 21, "type": "INT64"}}'
);The params parameter is a VARCHAR containing a JSON object. Use spanner_params() to build it from a STRUCT:
SELECT * FROM spanner_query_raw(
'projects/myproj/instances/myinst/databases/mydb',
'SELECT * FROM Users WHERE Age > @min_age',
params := spanner_params({'min_age': spanner_value(21::BIGINT)})
);Registered Names
This extension registers the following names into the global DuckDB namespace.
| Name | Kind | Description |
|---|---|---|
spanner_query |
table macro | Wraps spanner_query_raw with ergonomic params (see Table Functions) |
spanner_query_raw |
table function | Execute Spanner SQL (see Low-Level Table Function) |
spanner_scan |
table function | Read a Spanner table (see Table Functions) |
spanner_value(val) |
scalar macro | Auto-detect Spanner type from DuckDB type (see Query Parameters) |
spanner_typed(val, type_name) |
scalar macro | Explicit Spanner type wrapper (see Query Parameters) |
spanner_params(s) |
scalar macro | Convert a STRUCT to a JSON params string for spanner_query_raw |
interval_to_iso8601(i) |
scalar macro | Convert a DuckDB INTERVAL to an ISO 8601 duration string (e.g., 'P1Y3M', 'PT2H30M') |
_spanner_type_name(t) |
scalar macro | Internal: map a DuckDB type name (scalar or array) to a Spanner type name |
_spanner_scalar_type_name(t) |
scalar macro | Internal: scalar-only type name mapping used by _spanner_type_name |
Names prefixed with _ are internal implementation details and may change without notice.
Convenience Macro Pattern
You can define a convenience macro that hardcodes your database and endpoint:
CREATE MACRO my_query(
sql, params := NULL, use_parallelism := NULL,
use_data_boost := NULL, max_parallelism := NULL,
exact_staleness_secs := NULL, max_staleness_secs := NULL,
read_timestamp := NULL, min_read_timestamp := NULL,
priority := NULL
) AS TABLE
SELECT * FROM spanner_query(
'projects/myproj/instances/myinst/databases/mydb',
sql,
endpoint := 'localhost:9010',
params := params,
use_parallelism := use_parallelism,
use_data_boost := use_data_boost,
max_parallelism := max_parallelism,
exact_staleness_secs := exact_staleness_secs,
max_staleness_secs := max_staleness_secs,
read_timestamp := read_timestamp,
min_read_timestamp := min_read_timestamp,
priority := priority
);
-- Usage
SELECT * FROM my_query('SELECT * FROM Users');
SELECT * FROM my_query(
'SELECT * FROM Users WHERE Age > @min',
params := {'min': spanner_value(21::BIGINT)}
);