palantir-compute-module-pipeline-search

Pipeline-mode Foundry Compute Module (Go) that:

- Reads a dataset of email addresses
- Enriches each email via Gemini (grounding + URL context + structured output)
- Writes enriched rows to either:
  - a snapshot dataset (transactions), or
  - a streaming dataset (stream-proxy)
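
The read, enrich, write flow above can be sketched roughly as follows. This is a minimal in-memory illustration, not the repo's actual API: the `Sink` interface, the `Enriched` row type, its `Company` field, and both sink types are hypothetical stand-ins. It assumes that a snapshot write replaces the committed view, that a stream write appends records, and that a failed enrichment becomes a status=error row.

```go
package main

import "fmt"

// Enriched is a hypothetical output row; the real schema lives in the repo.
type Enriched struct {
	Email   string
	Status  string // "ok" or "error"
	Company string // illustrative enrichment field (assumption)
}

// Sink is an assumed abstraction over the two output targets.
type Sink interface {
	Write(rows []Enriched) error
}

// snapshotSink mimics a transactional write: each call replaces the
// committed view with the new rows.
type snapshotSink struct{ committed []Enriched }

func (s *snapshotSink) Write(rows []Enriched) error {
	s.committed = append([]Enriched(nil), rows...)
	return nil
}

// streamSink mimics a stream: each call appends records, so writing the
// same rows twice produces duplicates.
type streamSink struct{ published []Enriched }

func (s *streamSink) Write(rows []Enriched) error {
	s.published = append(s.published, rows...)
	return nil
}

// run enriches every email and hands the rows to the sink. Failures become
// status=error rows instead of aborting the run (assumed behavior).
func run(emails []string, enrich func(string) (string, error), out Sink) error {
	rows := make([]Enriched, 0, len(emails))
	for _, e := range emails {
		company, err := enrich(e)
		if err != nil {
			rows = append(rows, Enriched{Email: e, Status: "error"})
			continue
		}
		rows = append(rows, Enriched{Email: e, Status: "ok", Company: company})
	}
	return out.Write(rows)
}

func main() {
	fake := func(email string) (string, error) { return "Example Corp", nil }
	snap, stream := &snapshotSink{}, &streamSink{}
	for i := 0; i < 2; i++ { // simulate the pipeline running twice
		_ = run([]string{"a@example.com"}, fake, snap)
		_ = run([]string{"a@example.com"}, fake, stream)
	}
	fmt.Println(len(snap.committed), len(stream.published)) // prints: 1 2
}
```

Running the pipeline twice leaves one row in the snapshot stand-in but two in the stream stand-in, which is why repeated runs matter more for streaming outputs.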

Local-first workflow: iterate locally against mock Foundry APIs with the real container, then deploy the same image to Foundry.

Note: compute modules run as long-lived containers. This module runs the pipeline once per container start, then keeps the process alive so that the platform does not restart it. A restart would re-run the pipeline and could duplicate stream outputs.

Repo Layout

This repo is split into reusable kit packages and an example module:

- pkg/pipeline/...: reusable pipeline primitives (worker, local/foundry IO adapters, schema contract)
- pkg/foundry/...: Foundry env parsing and HTTP client
- pkg/mockfoundry/...: emulated Foundry server used by local harnesses and tests
- examples/email_enricher/...: example email enrichment domain logic and output mapping
- cmd/enricher: example binary wiring the kit + example

External-consumer contracts are validated in:

- test/consumer: imports reusable packages directly
- test/template: minimal new-module skeleton using pipeline kit APIs

Development

Canonical entrypoint:

./dev help

Verify (CI parity + external consumer checks):

./dev verify

Real e2e test run (Gemini + Foundry-emulated docker-compose):

./dev test

./dev test performs real Gemini calls and fails if the committed output contains any status=error rows.

Preflight diagnostics:

./dev doctor
./dev doctor --json

Run locally (no Foundry required, Gemini required):

export GEMINI_API_KEY=...
./dev run local -- --input /path/to/emails.csv --output /path/to/enriched.csv

GEMINI_MODEL is optional; default is gemini-2.5-flash.

Run Foundry-like flow locally (mock dataset API + real Gemini + real container):

./dev run foundry-emulated

Run a long-lived local dev loop (watches input CSV and reruns automatically):

./dev run foundry-emulated --watch

./dev run foundry-emulated --watch runs a tight local loop that:

- starts mock-foundry and a real container
- runs once immediately, then reruns whenever the input CSV is edited
- reuses prior status=ok rows by email (best-effort incremental cache)
- stops cleanly on Ctrl+C
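
The incremental cache can be thought of as a lookup keyed by email. The sketch below is one possible shape under that assumption; `Row`, its `Data` field, and `plan` are hypothetical names, and the real module may key or invalidate the cache differently.

```go
package main

import "fmt"

// Row is a hypothetical enriched output row from a previous run.
type Row struct {
	Email  string
	Status string // "ok" or "error"
	Data   string // illustrative enrichment payload (assumption)
}

// plan splits the input emails into rows that can be reused from the prior
// output (status=ok only) and emails that still need an enrichment call.
func plan(emails []string, prior []Row) (reused []Row, todo []string) {
	cache := make(map[string]Row, len(prior))
	for _, r := range prior {
		if r.Status == "ok" {
			cache[r.Email] = r
		}
	}
	for _, e := range emails {
		if r, ok := cache[e]; ok {
			reused = append(reused, r)
		} else {
			todo = append(todo, e) // new email, or prior attempt failed
		}
	}
	return reused, todo
}

func main() {
	prior := []Row{
		{Email: "a@example.com", Status: "ok", Data: "Acme"},
		{Email: "b@example.com", Status: "error"},
	}
	reused, todo := plan([]string{"a@example.com", "b@example.com", "c@example.com"}, prior)
	fmt.Println(len(reused), todo) // prints: 1 [b@example.com c@example.com]
}
```

Rows that previously failed are retried rather than reused, which matches the status=ok restriction in the list above.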

Local Watch Loop Quickstart

Set a valid Gemini key in .env:

GEMINI_API_KEY=...
# GEMINI_MODEL is optional (default: gemini-2.5-flash)

Edit input rows in:

.local/mock-foundry/inputs/ri.foundry.main.dataset.11111111-1111-1111-1111-111111111111.csv

Start the local loop:

./dev run foundry-emulated --watch

Read latest committed output at:

.local/mock-foundry/uploads/ri.foundry.main.dataset.22222222-2222-2222-2222-222222222222/_committed/readTable.csv

Change and save the input CSV again to trigger another run.

Reset local compose state and clear mock-foundry uploads (inputs are preserved):

./dev clean

See docker-compose.local.yml for fixture mounts and output paths.

Run CI-style docker-compose E2E (fixed fixtures + output validation):

export GEMINI_API_KEY=...
./dev test -v

Note: CI jobs that require Gemini secrets are skipped automatically if GEMINI_API_KEY / GEMINI_MODEL GitHub secrets are not configured.

Docs

- docs/DESIGN.md: architecture, interfaces, local testing approach
- docs/RELEASE.md: Foundry configuration steps (Sources, egress, probes) and publishing guidance
- docs/TROUBLESHOOTING.md: common deployment failures and diagnosis
- docs/DIAGRAMS.md: Mermaid sequence diagrams + flowcharts for API usage scenarios

Defaults (high-signal)

Defaults differ between:

- binary internal fallbacks (used when env vars are unset in Foundry)
- local docker-compose harness defaults in docker-compose.local.yml

Key ones:

- REQUEST_TIMEOUT: 30s binary fallback; local compose sets 2m
- WORKERS: 10
- MAX_RETRIES: 3
- FAIL_FAST: false

For the full set of options and Foundry configuration, see docs/RELEASE.md.

Screenshots

Put Foundry UI screenshots in docs/screenshots/ and reference them from this README.

Convention: docs/screenshots/<short-topic>-<yyyy-mm-dd>.png

Current screenshots:

- Compute module configuration (pipelines mode, sources + env vars)
- Lineage overview (inputs, sources, egress, output)
- Streaming dataset current transaction view
- Streaming dataset metrics
