JetBrains/databao-context-engine
Databao Context Engine is an open-source engine that automatically generates a governed semantic context from your databases, BI tools, documents, and spreadsheets. It runs locally in your environment and integrates with any LLM to deliver accurate, context-aware answers.
Databao Context Engine
Semantic context for your LLMs — generated automatically.
No more copying schemas. No manual documentation. Just accurate answers.
What is Databao Context Engine?
Databao Context Engine is a Python library that automatically generates governed semantic context from your databases, BI tools, documents, and spreadsheets.
Use it with any LLM to deliver accurate, context-aware answers — without copying schemas or writing documentation by hand.
You can add Databao Context Engine to your project as a standard Python dependency, or use it through the Databao CLI (coming soon).
Your data sources → Context Engine → Unified semantic graph → Any LLM
Why choose Databao Context Engine?
| Feature | What it means for you |
|---|---|
| Auto-generated context | Extracts schemas, relationships, and semantics automatically |
| Runs locally | Your data never leaves your environment |
| MCP integration | Works with Claude Desktop, Cursor, and any MCP-compatible tool |
| Multiple sources | Databases, dbt projects, spreadsheets, documents |
| Built-in benchmarks | Measure and improve context quality over time |
| LLM agnostic | OpenAI, Anthropic, Ollama, Gemini — use any model |
| Governed & versioned | Track, version, and share context across your team |
| Dynamic or static | Serve context via MCP server or export as artifact |
Installation
Databao Context Engine is available on PyPI and can be installed with uv, pip, or another package manager.
Using uv
uv add databao-context-engine
Using pip
pip install databao-context-engine
Supported data sources
- Athena
- BigQuery
- ClickHouse
- DuckDB
- MSSQL
- MySQL
- PostgreSQL
- Snowflake
- SQLite
- dbt projects
- PDF files
- Markdown and text files
Supported LLMs
| Provider | Configuration |
|---|---|
| Ollama | `languageModel: OLLAMA`: runs locally, free |
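Because the Ollama option runs locally, it can be useful to confirm that an Ollama server is reachable before building context. The sketch below is not part of Databao Context Engine. It assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, and `is_ollama_reachable` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def is_ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url + /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error status, or malformed URL.
        return False
```

Calling `is_ollama_reachable()` with no arguments probes the assumed default address. Pass a different `base_url` if your server listens elsewhere.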
Quickstart
1. Create a domain
# Standard-library imports used by the snippets in this quickstart
import tempfile
from pathlib import Path
# Initialize the domain in an existing directory (here, a fresh temporary one)
from databao_context_engine import init_dce_domain
domain_manager = init_dce_domain(Path(tempfile.mkdtemp()))
# Or use an existing project
from databao_context_engine import DatabaoContextDomainManager
domain_manager = DatabaoContextDomainManager(domain_dir=Path("path/to/project"))
2. Configure data sources
from databao_context_engine import (
    DatasourceConnectionStatus,
    DatasourceType,
)
# Create a new datasource
postgres_datasource_id = domain_manager.create_datasource_config(
    DatasourceType(full_type="postgres"),
    datasource_name="my_postgres_datasource",
    config_content={
        "connection": {"host": "localhost", "user": "dev", "password": "pass"}
    },
).datasource.id
# Check the connection to the datasource is valid
check_result = domain_manager.check_datasource_connection()
assert len(check_result) == 1
assert check_result[0].datasource_id == postgres_datasource_id
assert check_result[0].connection_status == DatasourceConnectionStatus.VALID
3. Build context
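Building context is only useful for datasources whose connection check passed. The sketch below shows one way to fail fast with a readable message instead of relying on bare asserts. It assumes only the `datasource_id` and `connection_status` attributes shown in step 2. `require_valid_connections` is a hypothetical helper, and the demo uses stand-in objects and a stand-in enum rather than the library's own types.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any, Iterable


def require_valid_connections(check_results: Iterable[Any], valid_status: Any) -> None:
    """Raise RuntimeError naming every datasource whose status is not valid_status."""
    failed = [
        str(result.datasource_id)
        for result in check_results
        if result.connection_status != valid_status
    ]
    if failed:
        raise RuntimeError(f"Datasource connection check failed for: {', '.join(failed)}")


# Stand-in for DatasourceConnectionStatus, used only to keep this demo self-contained.
class DemoStatus(Enum):
    VALID = "valid"
    INVALID = "invalid"


# Stand-in check results: one passing datasource and one failing datasource.
demo_results = [
    SimpleNamespace(datasource_id="my_postgres_datasource", connection_status=DemoStatus.VALID),
    SimpleNamespace(datasource_id="my_sqlite_datasource", connection_status=DemoStatus.INVALID),
]

try:
    require_valid_connections(demo_results, DemoStatus.VALID)
except RuntimeError as error:
    print(error)
```

With real results, the call would presumably look like `require_valid_connections(check_result, DatasourceConnectionStatus.VALID)`.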
build_result = domain_manager.build_context()
assert len(build_result) == 1
assert build_result[0].datasource_id == postgres_datasource_id
assert build_result[0].datasource_type == DatasourceType(full_type="postgres")
assert build_result[0].context_file_path.is_file()
4. Use the built contexts
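Before using the built contexts, it may help to confirm that every build produced a file, reporting failures per datasource rather than stopping at the first failed assert. This sketch relies only on the `datasource_id` and `context_file_path` attributes shown in step 3. `datasources_missing_context` is a hypothetical helper, and the demo objects are stand-ins for real build results.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Iterable


def datasources_missing_context(build_results: Iterable[Any]) -> list[str]:
    """Return the IDs of datasources whose context file does not exist on disk."""
    return [
        str(result.datasource_id)
        for result in build_results
        if not Path(result.context_file_path).is_file()
    ]


# Demo with stand-in build results: one real file and one missing path.
with tempfile.TemporaryDirectory() as tmp_dir:
    built_file = Path(tmp_dir) / "built_context.txt"
    built_file.write_text("example context", encoding="utf-8")
    demo_results = [
        SimpleNamespace(datasource_id="built_source", context_file_path=built_file),
        SimpleNamespace(datasource_id="missing_source", context_file_path=Path(tmp_dir) / "absent.txt"),
    ]
    missing = datasources_missing_context(demo_results)

print(missing)
```

An empty list means every datasource has a context file in place.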
Create a context engine
# Switch to the engine if you're already using a domain_manager
context_engine = domain_manager.get_engine_for_domain()
# Or directly create a context engine from the path to your DCE domain
from databao_context_engine import DatabaoContextEngine
context_engine = DatabaoContextEngine(domain_dir=Path("path/to/project"))
Get all built contexts
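The contexts retrieved in this section expose `datasource_id` and `context`, which is enough to assemble a prompt for whichever LLM you use. The sketch below is one possible approach, not a library feature. `format_contexts_for_prompt` is a hypothetical helper, the prompt layout is arbitrary, the demo objects are stand-ins, and it assumes the `context` attribute renders usefully as text.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def format_contexts_for_prompt(contexts: Iterable[Any], question: str) -> str:
    """Put each context under a per-datasource header, then append the question."""
    sections = [
        f"### Datasource: {item.datasource_id}\n{item.context}"
        for item in contexts
    ]
    return "\n\n".join(sections) + f"\n\nQuestion: {question}"


# Stand-in for one built context; real objects come from the context engine.
demo_contexts = [
    SimpleNamespace(datasource_id="my_postgres_datasource", context="table orders(id, total)"),
]
prompt = format_contexts_for_prompt(demo_contexts, "Which table stores order totals?")
print(prompt)
```

The resulting string can be sent as the user or system message to any chat-style model.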
# Use the engine to retrieve the built contexts
all_built_contexts = context_engine.get_all_contexts()
assert len(all_built_contexts) == 1
assert all_built_contexts[0].datasource_id == postgres_datasource_id
print(all_built_contexts[0].context)
Search in built contexts
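The search in this section is described as a vector similarity search. As a rough illustration of that idea only, and not of how Databao Context Engine implements it, the sketch below ranks toy embedding vectors by cosine similarity to a query vector. The vectors, labels, and the `rank_by_cosine_similarity` helper are all made up for the example.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine_similarity(
    query: Sequence[float],
    documents: Mapping[str, Sequence[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (label, score) pairs, most similar first."""
    scored = [(label, cosine_similarity(query, vector)) for label, vector in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy three-dimensional "embeddings"; real embeddings have many more dimensions.
toy_embeddings = {
    "orders table": [0.9, 0.1, 0.0],
    "customers table": [0.1, 0.9, 0.0],
    "audit log": [0.0, 0.1, 0.9],
}
ranking = rank_by_cosine_similarity([1.0, 0.0, 0.0], toy_embeddings, top_k=2)
print(ranking)
```

Here the query vector points mostly along the first axis, so "orders table" ranks first and "customers table" second.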
# Run a vector similarity search
results = context_engine.search_context("my search query")
print(f"Found {len(results)} results for query")
print(
    "\n\n".join(
        [f"{str(result.datasource_id)}\n{result.context_result}" for result in results]
    )
)
Contributing
We’d love your help! Here’s how to get involved:
- ⭐ Star this repo — it helps others find us!
- 🐛 Found a bug? Open an issue
- 💡 Have an idea? We’re all ears — create a feature request
- 👍 Upvote issues you care about — helps us prioritize
- 🔧 Submit a PR
- 📝 Improve docs — typos, examples, tutorials — everything helps!
New to open source? No worries! We're friendly and happy to help you get started. 🌱
For more details, see CONTRIBUTING.
📄 License
Apache 2.0 — use it however you want. See the LICENSE file for details.
Like Databao Context Engine? Give us a ⭐ — it means a lot!