PyCQEngine
High-performance in-memory NoSQL indexing engine for Python object collections, powered by Rust.
The project is still under development and is provided as-is.
Performance: Sub-microsecond point lookups. 100x+ faster than list comprehensions for selective queries on 1,000,000+ objects.
Features
- Blazing Fast: Rust-backed hash & BTree indexing with sub-1μs point lookups
- Thread-Safe: Concurrent indexing using DashMap (sharded locks) + parking_lot
- Simple API: Intuitive query DSL (eq, and_, or_, in_, gt, lt, between)
- Fused Materialization: Query + object retrieval in a single Rust→Python call
- Range Queries: BTree indexes for gt/gte/lt/lte/between
- Parallel Execution: Rayon-powered parallel index operations with GIL release
- Batch Ingestion: add_many() for efficient bulk loading (~330K obj/s)
- Memory Lifecycle: remove(), remove_many(), clear(), __del__ support
- Zero-Cost Counting: count() and first(n) without materializing objects
- LRU Query Cache: Automatic caching of repeated queries (1,000 entries)
- Weak References: Opt-in use_weakrefs=True mode; objects are cleaned up automatically once Python garbage-collects them
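To make the LRU Query Cache feature concrete, here is a minimal pure-Python sketch of least-recently-used eviction. It is an illustration only: the real cache lives in the Rust core, and the class name `LRUQueryCache`, the tuple-shaped query keys, and the `get`/`put` methods are hypothetical names chosen for this example. Only the default capacity of 1,000 comes from the feature list.

```python
from collections import OrderedDict


class LRUQueryCache:
    """Toy LRU cache mapping a hashable query key to a tuple of object IDs.

    Hypothetical illustration; not PyCQEngine's actual cache.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, ids):
        self._entries[key] = tuple(ids)
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def clear(self):
        # A real engine would also need to invalidate on add/remove (assumption).
        self._entries.clear()


cache = LRUQueryCache(capacity=2)
cache.put(("eq", "brand", "Tesla"), [1, 3])
cache.put(("eq", "brand", "Ford"), [2])
cache.get(("eq", "brand", "Tesla"))     # touch Tesla, so Ford becomes the oldest entry
cache.put(("eq", "brand", "BMW"), [4])  # capacity exceeded: the Ford entry is evicted
```

How the engine invalidates cached results when the collection changes is not described here, so the `clear()` method above is only a placeholder for that concern.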
Architecture
┌─────────────────────────────────────────┐
│           Python Application            │
│         (User Code + Query DSL)         │
└──────────────┬──────────────────────────┘
               │ PyO3 FFI Boundary
┌──────────────▼──────────────────────────┐
│            Rust Core Engine             │
│  • CollectionManager (Object Registry)  │
│  • HashIndex (DashMap → O(1) eq)        │
│  • BTreeIndex (BTreeMap → range scans)  │
│  • Fused query_*_objects() methods      │
│  • Rayon parallel intersection/union    │
│  • LRU query cache (parking_lot Mutex)  │
│  • GIL Release (True parallelism)       │
└─────────────────────────────────────────┘
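The BTreeIndex entry in the diagram is what makes range scans cheap: keys are kept in sorted order, so a range query becomes two boundary searches plus a contiguous slice. The sketch below shows the same idea in pure Python using a sorted list and the standard `bisect` module in place of Rust's BTreeMap. The class `SortedRangeIndex` and its methods are hypothetical and are not part of the PyCQEngine API.

```python
import bisect


class SortedRangeIndex:
    """Toy ordered index: attribute values kept sorted, with parallel object IDs.

    Hypothetical stand-in for a BTreeMap-backed index.
    """

    def __init__(self):
        self._keys = []  # sorted attribute values
        self._ids = []   # object IDs, parallel to _keys

    def insert(self, key, obj_id):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, obj_id)

    def gt(self, bound):
        return self._ids[bisect.bisect_right(self._keys, bound):]

    def gte(self, bound):
        return self._ids[bisect.bisect_left(self._keys, bound):]

    def lt(self, bound):
        return self._ids[:bisect.bisect_left(self._keys, bound)]

    def lte(self, bound):
        return self._ids[:bisect.bisect_right(self._keys, bound)]

    def between(self, low, high):
        # Inclusive on both ends.
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._ids[start:stop]

    def count_gt(self, bound):
        # Counting needs only the boundary position, not the matching IDs.
        return len(self._keys) - bisect.bisect_right(self._keys, bound)


prices = SortedRangeIndex()
for obj_id, price in [(1, 50000), (2, 30000), (3, 60000), (4, 45000)]:
    prices.insert(price, obj_id)

above_40k = prices.gt(40000)                 # IDs ordered by price: [4, 1, 3]
mid_range = prices.between(30000, 50000)     # [2, 4, 1]
```

Inserting into a Python list is linear in its length, whereas a B-tree keeps inserts logarithmic; the sketch is meant to show the query side only.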
Key Design Principles:
- Attribute Extraction: Lambda extractors run once during add(), bypassing Python's tp_getattro overhead during queries
- Fused Materialization: Queries execute + materialize objects in a single FFI call, eliminating the IDs→Python→Rust round trip
- GIL Release: Index operations release the GIL for true multi-core parallelism
- Static Dispatch: IndexKind enum avoids vtable overhead for hot-path lookups
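The first two principles can be illustrated with a small pure-Python model: extractors are called only when an object is added, and a query resolves to a set of object IDs that is turned into objects in one step. This is a sketch under stated assumptions, not the Rust implementation; `ToyHashCollection`, `ids_eq`, `retrieve_and`, and the `extractor_calls` counter are hypothetical names introduced for the example.

```python
from collections import defaultdict


class ToyHashCollection:
    """Toy model of extract-once hash indexing (hypothetical, pure Python)."""

    def __init__(self):
        self._objects = {}         # object ID -> object
        self._extractors = {}      # attribute name -> extractor callable
        self._indexes = {}         # attribute name -> {value: set of object IDs}
        self.extractor_calls = 0   # instrumentation for this example only

    def add_index(self, name, extractor):
        # Indexes must be registered before add(); no backfill in this sketch.
        self._extractors[name] = extractor
        self._indexes[name] = defaultdict(set)

    def add(self, obj):
        obj_id = id(obj)
        self._objects[obj_id] = obj
        for name, extractor in self._extractors.items():
            self.extractor_calls += 1  # the only place extractors run
            self._indexes[name][extractor(obj)].add(obj_id)

    def ids_eq(self, name, value):
        return self._indexes[name].get(value, set())

    def retrieve_and(self, *conditions):
        # Intersect ID sets smallest-first, then materialize once.
        id_sets = sorted((self.ids_eq(n, v) for n, v in conditions), key=len)
        matched = set.intersection(*id_sets) if id_sets else set()
        return [self._objects[i] for i in matched]


class Car:
    def __init__(self, vin, brand, price):
        self.vin = vin
        self.brand = brand
        self.price = price


cars = ToyHashCollection()
cars.add_index("brand", lambda c: c.brand)
cars.add_index("price", lambda c: c.price)
for car in [Car(1, "Tesla", 50000), Car(2, "Ford", 30000), Car(3, "Tesla", 60000)]:
    cars.add(car)

calls_after_load = cars.extractor_calls  # 3 objects x 2 indexes
teslas_at_60k = cars.retrieve_and(("brand", "Tesla"), ("price", 60000))
```

After loading, queries never touch the lambdas again, which is why `extractor_calls` stays at its post-load value no matter how many queries run.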
Installation
Prerequisites
- Python 3.11+
- Rust 1.70+ (install via rustup)
From Source
# Clone the repository
git clone https://github.com/yourusername/py-cqengine.git
cd py-cqengine
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install maturin
pip install maturin
# Build and install
maturin develop --release
Quick Start
from pycqengine import IndexedCollection, Attribute, eq, and_, gt, between
class Car:
    def __init__(self, vin, brand, price):
        self.vin = vin
        self.brand = brand
        self.price = price
# Step 1: Define Attributes (lambda extractors)
VIN = Attribute("vin", lambda c: c.vin)
BRAND = Attribute("brand", lambda c: c.brand)
PRICE = Attribute("price", lambda c: c.price)
# Step 2: Setup Collection
cars = IndexedCollection()
cars.add_index(VIN) # Hash index (default)
cars.add_index(BRAND) # Hash index
cars.add_index(PRICE, index_type="btree") # BTree index for range queries
# Step 3: Load Data (use add_many for batch efficiency)
cars.add_many([
    Car(1, "Tesla", 50000),
    Car(2, "Ford", 30000),
    Car(3, "Tesla", 60000),
    Car(4, "BMW", 45000),
])
# Step 4: Query
results = cars.retrieve(eq(BRAND, "Tesla"))
for car in results:
    print(f"VIN: {car.vin}, Brand: {car.brand}, Price: ${car.price}")
# Count without materializing objects
count = cars.retrieve(eq(BRAND, "Tesla")).count() # ~0.9μs
# First N results
top3 = cars.retrieve(eq(BRAND, "Tesla")).first(3) # ~1.2μs
Query DSL
Equality Query
from pycqengine import eq
# Find all Teslas
results = cars.retrieve(eq(BRAND, "Tesla"))
AND Query (Intersection)
from pycqengine import and_, eq, gt
# Find Teslas priced above $55,000
results = cars.retrieve(and_(
    eq(BRAND, "Tesla"),
    gt(PRICE, 55000)
))
OR Query (Union)
from pycqengine import or_, eq
# Find Tesla or Ford vehicles
results = cars.retrieve(or_(
    eq(BRAND, "Tesla"),
    eq(BRAND, "Ford")
))
IN Query (Membership)
from pycqengine import in_
# Find vehicles from specific brands
results = cars.retrieve(in_(BRAND, ["Tesla", "Ford", "BMW"]))
Range Queries (requires BTree index)
from pycqengine import gt, gte, lt, lte, between
# Price > 40,000
results = cars.retrieve(gt(PRICE, 40000))
# Price >= 30,000
results = cars.retrieve(gte(PRICE, 30000))
# Price < 50,000
results = cars.retrieve(lt(PRICE, 50000))
# 30,000 <= Price <= 50,000 (inclusive)
results = cars.retrieve(between(PRICE, 30000, 50000))
Memory Management
# Remove a single object
cars.remove(car_obj)
# Remove multiple objects
cars.remove_many([car1, car2, car3])
# Clear entire collection
cars.clear()
Weak References
By default, IndexedCollection holds strong references to objects, keeping them alive as long as the collection exists. Enable weak reference mode to let Python's GC reclaim objects when no other references exist:
# Opt-in weak reference mode
cars = IndexedCollection(use_weakrefs=True)
cars.add_index(BRAND)
cars.add_index(PRICE, index_type="btree")
car = Car(1, "Tesla", 50000)
cars.add(car)
# Object is retrievable while reference exists
assert list(cars.retrieve(eq(BRAND, "Tesla"))) == [car]
# Drop the reference so Python's GC can reclaim the object
del car
# Explicit garbage collection
cleaned = cars.gc() # Returns number of dead refs cleaned
print(cars.alive_count) # Number of still-alive objects
# Dead refs are also cleaned lazily during queries
results = list(cars.retrieve(eq(BRAND, "Tesla"))) # Returns []: dead ref auto-cleaned
Notes:
- Objects that don't support weakrefs (tuples, ints, etc.) automatically fall back to strong refs
- Query performance has zero overhead in weakref mode
- Build throughput is ~13% slower (weakref creation + reverse index population)
- gc() and alive_count scan all objects, so they are suitable for periodic maintenance, not hot loops
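The notes above describe three behaviors: fallback to strong references for objects that cannot be weakly referenced, lazy cleanup on access, and a full scan for gc() and alive_count. A minimal pure-Python sketch of those behaviors, using only the standard `weakref` module, might look like this. `ToyWeakRegistry` is a hypothetical class written for illustration and says nothing about how the Rust core stores references.

```python
import gc
import weakref


class ToyWeakRegistry:
    """Toy registry holding weak refs where possible (hypothetical sketch)."""

    def __init__(self):
        self._refs = {}    # object ID -> zero-argument callable returning obj or None
        self._next_id = 0

    def add(self, obj):
        obj_id = self._next_id
        self._next_id += 1
        try:
            self._refs[obj_id] = weakref.ref(obj)
        except TypeError:
            # ints, tuples, etc. cannot be weakly referenced: fall back to a
            # strong reference wrapped in a callable with the same interface.
            self._refs[obj_id] = lambda obj=obj: obj
        return obj_id

    def get(self, obj_id):
        ref = self._refs.get(obj_id)
        if ref is None:
            return None
        obj = ref()
        if obj is None:
            del self._refs[obj_id]  # lazy cleanup of a dead reference
        return obj

    def gc(self):
        # Full scan: fine for periodic maintenance, too slow for hot loops.
        dead = [i for i, ref in self._refs.items() if ref() is None]
        for i in dead:
            del self._refs[i]
        return len(dead)

    @property
    def alive_count(self):
        return sum(1 for ref in self._refs.values() if ref() is not None)


class Car:
    def __init__(self, vin):
        self.vin = vin


registry = ToyWeakRegistry()
car = Car(1)
car_id = registry.add(car)
tuple_id = registry.add((1, "Tesla"))  # tuples fall back to a strong reference

del car
gc.collect()             # CPython frees the object on del; other runtimes may need this
cleaned = registry.gc()  # number of dead references removed
```

In this toy version a stored `None` would be indistinguishable from a dead reference, which a real implementation would have to handle separately.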
Performance
Benchmarked on macOS ARM64 (Apple Silicon), Python 3.14, Rust 1.93.
100K Objects
| Scenario | Median | Results | vs Python |
|---|---|---|---|
| Point lookup (eq VIN) | 0.8 μs | 1 | 3,290x |
| count() eq(BRAND) | 0.9 μs | 12,500 | 2,377x |
| first(10) eq(BRAND) | 1.2 μs | 10 | n/a |
| AND 2-way list() | 94 μs | 4,167 | 22x |
| AND 3-way list() | 19 μs | 833 | 110x |
| AND 4-way (empty result) | 2.4 μs | 0 | 923x |
| OR 2-way list() | 535 μs | 25,000 | 6.0x |
| IN 3-val list() | 773 μs | 37,500 | 4.6x |
| gt(PRICE, 40000) list() | 1,173 μs | 59,000 | 1.7x |
| between(30k-40k) list() | 425 μs | 21,000 | 7.2x |
| count() gt(PRICE) | 0.6 μs | 59,000 | 4,087x |
| between(narrow) list() | 102 μs | 5,000 | 26.7x |
| AND(eq+gt) mixed list() | 173 μs | 8,500 | 12.3x |
| Build time | 0.30s | n/a | 334K obj/s |
Scaling to 1M Objects
| Scenario | 100K | 500K | 1M |
|---|---|---|---|
| Point lookup | 0.8μs (3,290x) | 0.8μs (16,824x) | 0.8μs (33,654x) |
| count() eq | 0.9μs (2,377x) | 1.0μs (11,441x) | 0.9μs (23,313x) |
| AND 3-way | 19μs (110x) | 97μs (114x) | 215μs (104x) |
| AND 4-way empty | 2.4μs (923x) | 2.3μs (4,794x) | 2.3μs (9,663x) |
| count() gt | 0.6μs (4,087x) | 0.6μs (22,686x) | 0.6μs (43,840x) |
| between(narrow) | 102μs (26.7x) | 537μs (26.6x) | 1,527μs (18.9x) |
| Build throughput | 334K obj/s | 337K obj/s | 331K obj/s |
Point lookups, counts, and empty-result queries are O(1), so their speedup over a linear Python scan grows linearly with collection size.
Selective queries (AND, narrow range) remain roughly 10–100x+ faster at all scales.
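The linear growth in speedup follows from comparing a constant-time index lookup against a scan whose cost is proportional to the collection size. The sketch below reproduces that comparison in pure Python, with a dict standing in for the hash index. It is not the project's benchmark suite; the function names and the dict-based records are assumptions made for illustration, and absolute timings will differ from the tables above.

```python
import timeit
from collections import defaultdict

BRANDS = ("Tesla", "Ford", "BMW", "Audi")


def build_records(n):
    """Create n simple records with a unique vin and a repeating brand."""
    return [{"vin": i, "brand": BRANDS[i % len(BRANDS)]} for i in range(n)]


def scan_lookup(records, vin):
    """Baseline: list comprehension over every record, O(n)."""
    return [r for r in records if r["vin"] == vin]


def build_vin_index(records):
    """Dict-based stand-in for a hash index on vin."""
    index = defaultdict(list)
    for r in records:
        index[r["vin"]].append(r)
    return index


def indexed_lookup(index, vin):
    """Hash lookup, O(1) on average regardless of collection size."""
    return index.get(vin, [])


def compare(n, repeats=20):
    """Return (seconds per scan lookup, seconds per indexed lookup)."""
    records = build_records(n)
    index = build_vin_index(records)
    target = n - 1  # worst case for the scan
    assert scan_lookup(records, target) == indexed_lookup(index, target)
    scan_s = timeit.timeit(lambda: scan_lookup(records, target), number=repeats) / repeats
    index_s = timeit.timeit(lambda: indexed_lookup(index, target), number=repeats) / repeats
    return scan_s, index_s


if __name__ == "__main__":
    for n in (10_000, 100_000):
        scan_s, index_s = compare(n)
        print(f"n={n:>7}: scan {scan_s * 1e6:.1f} us, index {index_s * 1e6:.2f} us")
```

Because the scan time grows with n while the indexed time stays flat, the ratio between them grows with n as well, which is the pattern the scaling table reports for point lookups and counts.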
Development
Project Structure
py-cqengine/
├── src/                    # Rust source code
│   ├── lib.rs              # PyO3 module initialization
│   ├── types.rs            # TypedValue enum (str/int/float/bool)
│   ├── collection.rs       # CollectionManager + query methods
│   ├── index.rs            # Index trait (lookup, insert, remove)
│   ├── hash_index.rs       # DashMap-based O(1) equality index
│   └── btree_index.rs      # BTreeMap-based range index
├── python/pycqengine/      # Python package
│   ├── __init__.py         # Public API exports
│   ├── core.py             # IndexedCollection + ResultSet
│   ├── attribute.py        # Attribute extractor
│   └── query.py            # Query DSL (eq, and_, or_, in_, gt, between...)
├── tests/                  # Python tests (119 tests)
├── benchmarks/             # Performance benchmarks
├── Cargo.toml              # Rust dependencies
└── pyproject.toml          # Python package config
Build Commands
# Development build (with debug symbols)
maturin develop
# Release build (optimized)
maturin develop --release
# Run Python tests
python -m pytest tests/ -v
# Run benchmarks
python benchmarks/run_all.py # Standard (100K)
python benchmarks/run_all.py --sizes 100000,500000 # Multi-scale
python benchmarks/run_all.py --quick # Fast iteration
python benchmarks/run_all.py --json # Save JSON for diffing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details.