# pinexai/tokenspy

cProfile for LLMs — find which function burns your AI budget. Flame graphs, tracing, evals, prompt versioning. `pip install tokenspy`
## 🔥 The Problem
You get an OpenAI invoice for $800 this month. You have no idea which function caused it.
Langfuse and Braintrust require you to sign up, configure API keys, and reroute traffic through their cloud proxy just to see what's happening. tokenspy intercepts in-process — one decorator, no proxy, no account, no monthly fee.
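tokenspy's actual internals aren't shown here, but the general technique behind in-process interception is wrapping a client method in place so every call records its token usage. A minimal sketch, with all names illustrative (`FakeCompletions` stands in for a real SDK client such as OpenAI's):

```python
import functools

# Illustrative stand-in for an SDK client; a real interceptor would target
# e.g. openai's chat.completions.create instead.
class FakeCompletions:
    def create(self, model, messages):
        return {"model": model, "usage": {"total_tokens": 42}}

recorded = []

def instrument(obj, method_name):
    """Replace obj.method_name with a wrapper that records token usage."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        response = original(*args, **kwargs)
        recorded.append(response["usage"]["total_tokens"])  # side channel
        return response  # caller sees the untouched response

    setattr(obj, method_name, wrapper)

client = FakeCompletions()
instrument(client, "create")
client.create(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
print(recorded)  # → [42]
```

Because the wrapper lives inside your process, nothing is proxied and no data leaves the machine; the caller's code path is unchanged.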
## ⚡ Fix It in One Line
```python
import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)   # ← costs $0.038?
    entities = extract_entities(docs)   # ← or this?
    return generate_report(entities)    # ← or this?

run_pipeline("Analyze Q3 earnings")
tokenspy.report()
```

```text
╔════════════════════════════════════════════════════════════╗
║                    tokenspy cost report                    ║
║         total: $0.0523 · 18,734 tokens · 3 calls           ║
╠════════════════════════════════════════════════════════════╣
║ fetch_and_summarize           $0.038 ████████████░░░░  73% ║
║   └─ gpt-4o                   $0.038 ████████████░░░░  73% ║
║ generate_report               $0.011 ████░░░░░░░░░░░░  21% ║
║ extract_entities              $0.003 █░░░░░░░░░░░░░░░   6% ║
╠════════════════════════════════════════════════════════════╣
║ 🔴 fetch_and_summarize → switch to gpt-4o-mini: 94% cheaper ║
╚════════════════════════════════════════════════════════════╝
```
Now you know: fetch_and_summarize is burning 73% of your budget.
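For intuition, a per-call figure like the $0.038 above is just token counts multiplied by a per-model price table. The prices below are illustrative placeholders, not tokenspy's built-in sheet; always check your provider's current pricing page:

```python
# Illustrative per-million-token prices in USD (NOT tokenspy's real table;
# provider prices change over time).
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def call_cost(model, input_tokens, output_tokens):
    """USD cost of one call = tokens in each direction times the rate."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A summarization call: 12,000 prompt tokens, 800 completion tokens.
print(round(call_cost("gpt-4o", 12_000, 800), 4))       # → 0.038
print(round(call_cost("gpt-4o-mini", 12_000, 800), 4))  # → 0.0023
```

At these illustrative rates, swapping gpt-4o for gpt-4o-mini on that call cuts its cost by about 94%, which is the kind of arithmetic behind a recommendation like the one in the report footer.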
## ✨ Full Observability Stack — v0.2.0
Everything Langfuse and Braintrust do, without sending a single byte to the cloud.
| Feature | v0.1 | v0.2.0 |
|---|---|---|
| Cost flame graph | ✅ | ✅ |
| Budget alerts | ✅ | ✅ |
| SQLite persistence | ✅ | ✅ |
| Structured tracing (Trace + Span) | ❌ | ✅ |
| OpenTelemetry export | ❌ | ✅ |
| Evaluations + datasets | ❌ | ✅ |
| Prompt versioning | ❌ | ✅ |
| Live web dashboard | ❌ | ✅ |
## 🖥️ Live Dashboard
```shell
pip install tokenspy[server]
tokenspy serve   # → http://localhost:7234
```

5 tabs: Overview · Traces · Evaluations · Prompts · Settings — all your LLM data, local, real-time.
## 🚀 Quick Start

### Tracing
```python
tokenspy.init(persist=True)

with tokenspy.trace("pipeline", input={"query": q}) as t:
    with tokenspy.span("retrieve") as s:
        docs = fetch(q); s.update(output=docs)
    with tokenspy.span("generate", span_type="llm") as s:
        answer = llm_call(docs)  # ← auto-linked to span
    t.update(output=answer)
    t.score("quality", 0.9)
```

### Evaluations
```python
from tokenspy.eval import scorers

ds = tokenspy.dataset("qa-golden")
ds.add(input={"q": "Capital of France?"}, expected_output="Paris")

exp = tokenspy.experiment("gpt4o-mini-v1", dataset="qa-golden",
                          fn=my_fn, scorers=[scorers.exact_match])
exp.run().summary()
```

### Prompt versioning
```python
p = tokenspy.prompts.push("summarizer", "Summarize in {{style}}: {{text}}")
p.compile(style="concise", text="...")
tokenspy.prompts.set_production("summarizer", version=2)
```

### Budget alerts
```python
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
# raises BudgetExceededError if cost > $0.10
```

## 🆚 tokenspy vs Langfuse vs Braintrust
| | Langfuse | Braintrust | tokenspy |
|---|---|---|---|
| Requires cloud proxy | ✅ yes | ✅ yes | ❌ no |
| Requires signup | ✅ yes | ✅ yes | ❌ no |
| Data leaves your machine | ✅ yes | ✅ yes | ❌ never |
| Works offline | ❌ no | ❌ no | ✅ yes |
| Zero core dependencies | ❌ no | ❌ no | ✅ yes |
| Structured tracing | ✅ yes | ✅ yes | ✅ yes |
| Evaluations + datasets | ✅ yes | ✅ yes | ✅ yes |
| LLM-as-judge scoring | ✅ yes | ✅ yes | ✅ yes |
| Prompt versioning | ✅ yes | ✅ yes | ✅ yes |
| OpenTelemetry export | ⚡ partial | ❌ no | ✅ yes |
| Flame graph by function | ❌ no | ❌ no | ✅ yes |
| `@decorator` API | ❌ no | ❌ no | ✅ yes |
| Budget alerts | ⚡ partial | ⚡ partial | ✅ yes |
| Git commit cost tracking | ❌ no | ❌ no | ✅ yes |
| GitHub Actions cost diff | ❌ no | ❌ no | ✅ yes |
| Monthly cost | $0–$250 | $0–$300 | free forever |
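Budget alerts, like everything else in the table, can run entirely in-process. The following is not tokenspy's implementation, just a minimal sketch of the pattern, with a hypothetical `cost_fn` standing in for real per-call cost accounting:

```python
import functools

class BudgetExceededError(RuntimeError):
    pass

def profile(budget_usd, cost_fn):
    """Decorator sketch: cost_fn(result) returns the USD cost of one call;
    the accumulated total is checked against budget_usd after each call."""
    def decorator(fn):
        spent = {"usd": 0.0}  # closure state shared across calls

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            spent["usd"] += cost_fn(result)
            if spent["usd"] > budget_usd:
                raise BudgetExceededError(
                    f"spent ${spent['usd']:.2f} > ${budget_usd:.2f}")
            return result
        return wrapper
    return decorator

@profile(budget_usd=0.10, cost_fn=lambda r: 0.06)  # pretend each call costs $0.06
def agent(query):
    return f"answer to {query!r}"

agent("first")    # fine: $0.06 spent so far
# agent("second")  # would raise BudgetExceededError: $0.12 > $0.10
```

Because the check happens in the same process as the call, no proxy or external service is needed to enforce the limit.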
## 🔌 Integrations
| Provider | Auto-instrumented |
|---|---|
| OpenAI | `chat.completions.create` (sync + async + streaming) |
| Anthropic | `messages.create` (sync + async + streaming) |
| Google Gemini | `generate_content` |
| LangChain / LangGraph | Callback handler |
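The LangChain/LangGraph row works through the framework's callback hooks: the framework invokes a handler method after each LLM call, and the handler only has to read the usage it is handed. A toy sketch of that pattern (the class name and payload shape below are illustrative, not tokenspy's actual handler):

```python
# Sketch of the callback-handler pattern (illustrative names, not real API).
class CostCallback:
    def __init__(self):
        self.total_tokens = 0

    def on_llm_end(self, response):
        # A framework calls this hook after every LLM call; the handler
        # just accumulates the usage it is given.
        self.total_tokens += response["usage"]["total_tokens"]

cb = CostCallback()
# Simulate two LLM calls a framework would report to the handler:
cb.on_llm_end({"usage": {"total_tokens": 1200}})
cb.on_llm_end({"usage": {"total_tokens": 300}})
print(cb.total_tokens)  # → 1500
```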
Exports: OpenTelemetry → Grafana, Jaeger, Datadog, Honeycomb (docs)
CI: GitHub Actions cost diff per PR (docs)
## 📦 Install

```shell
pip install tokenspy           # core (zero dependencies)
pip install tokenspy[otel]     # + OpenTelemetry export
pip install tokenspy[server]   # + web dashboard (fastapi + uvicorn)
pip install tokenspy[all]      # openai + anthropic + langchain
```

## 📚 Documentation
→ Full documentation at pinakimishra95.github.io/tokenspy
| Guide | Description |
|---|---|
| Tracing | Trace + Span context managers, auto LLM linking, scores |
| Evaluations & Datasets | Datasets, scorers, llm_judge, experiment comparison |
| Prompt Versioning | push / pull / compile / set_production |
| Web Dashboard | Local dashboard, REST API |
| OpenTelemetry | OTEL export to Grafana, Jaeger, Datadog |
| GitHub Actions | Cost diff annotations per PR |
## 🤝 Contributing

```shell
git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy && pip install -e ".[dev]"
pytest tests/   # 139 tests, ~0.3s
```

Issues and PRs welcome — especially new provider support and pricing updates.
## License
MIT © Pinaki Mishra. See LICENSE.


