
tokenspy — cProfile for LLMs

Docs · Tracing · Evals · Dashboard · Issues

PyPI version
PyPI downloads
Tests
Python 3.10+
License: MIT
Zero dependencies

```bash
pip install tokenspy
```

🔥 The Problem

You get an OpenAI invoice for $800 this month. You have no idea which function caused it.

Langfuse and Braintrust require you to sign up, configure API keys, and reroute traffic through their cloud proxy just to see what's happening. tokenspy intercepts in-process — one decorator, no proxy, no account, no monthly fee.
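In-process interception is ordinary Python: wrap the SDK's call site and record usage from each response, no proxy in the network path. A minimal sketch of the idea — `FakeCompletions`, `CALLS`, and `instrument` are stand-ins, not tokenspy internals:

```python
import functools

class FakeCompletions:
    """Stand-in for an SDK client method we want to observe."""
    def create(self, model, messages):
        return {"model": model, "usage": {"total_tokens": 120}}

CALLS = []

def instrument(client):
    """Wrap client.create so every call records its token usage."""
    original = client.create

    @functools.wraps(original)
    def wrapped(*args, **kwargs):
        response = original(*args, **kwargs)
        CALLS.append(response["usage"]["total_tokens"])  # record usage
        return response

    client.create = wrapped
    return client

client = instrument(FakeCompletions())
client.create(model="gpt-4o", messages=[])
print(sum(CALLS))  # → 120
```

Because the wrapper runs inside your process, nothing has to leave the machine for the usage numbers to be captured.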


⚡ Fix It in One Line

```python
import tokenspy

@tokenspy.profile
def run_pipeline(query):
    docs = fetch_and_summarize(query)   # ← costs $0.038?
    entities = extract_entities(docs)   # ← or this?
    return generate_report(entities)    # ← or this?

run_pipeline("Analyze Q3 earnings")
tokenspy.report()
```

```
╔══════════════════════════════════════════════════════════════════════╗
║  tokenspy cost report                                                ║
║  total: $0.0523  ·  18,734 tokens  ·  3 calls                        ║
╠══════════════════════════════════════════════════════════════════════╣
║  fetch_and_summarize      $0.038  ████████████░░░░  73%              ║
║    └─ gpt-4o              $0.038  ████████████░░░░  73%              ║
║  generate_report          $0.011  ████░░░░░░░░░░░░  21%              ║
║  extract_entities         $0.003  █░░░░░░░░░░░░░░░   6%              ║
╠══════════════════════════════════════════════════════════════════════╣
║  🔴 fetch_and_summarize → switch to gpt-4o-mini: 94% cheaper         ║
╚══════════════════════════════════════════════════════════════════════╝
```

Now you know: fetch_and_summarize is burning 73% of your budget.
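The percentages in the report are plain arithmetic over recorded calls: each call carries a function name and a dollar cost, and a function's share is its cost over the total. A sketch with hypothetical records (tokenspy's real storage format may differ):

```python
# Hypothetical per-call records: (function name, cost in USD).
calls = [
    ("fetch_and_summarize", 0.038),
    ("generate_report", 0.011),
    ("extract_entities", 0.003),
]

total = sum(cost for _, cost in calls)
# Each function's share of the total spend, as a rounded percentage.
shares = {fn: round(cost / total * 100) for fn, cost in calls}
print(shares)  # → {'fetch_and_summarize': 73, 'generate_report': 21, 'extract_entities': 6}
```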


✨ Full Observability Stack — v0.2.0

Everything Langfuse and Braintrust do, without sending a single byte to the cloud.

| Feature | v0.1 | v0.2.0 |
|---|:---:|:---:|
| Cost flame graph | ✅ | ✅ |
| Budget alerts | ✅ | ✅ |
| SQLite persistence | ✅ | ✅ |
| Structured tracing (Trace + Span) | ❌ | ✅ |
| OpenTelemetry export | ❌ | ✅ |
| Evaluations + datasets | ❌ | ✅ |
| Prompt versioning | ❌ | ✅ |
| Live web dashboard | ❌ | ✅ |

🖥️ Live Dashboard

```bash
pip install tokenspy[server]
tokenspy serve   # → http://localhost:7234
```

*(screenshots: dashboard overview · traces view)*

5 tabs: Overview · Traces · Evaluations · Prompts · Settings — all your LLM data, local, real-time.


🚀 Quick Start

Tracing

```python
import tokenspy

tokenspy.init(persist=True)

with tokenspy.trace("pipeline", input={"query": q}) as t:
    with tokenspy.span("retrieve") as s:
        docs = fetch(q)
        s.update(output=docs)
    with tokenspy.span("generate", span_type="llm") as s:
        answer = llm_call(docs)   # ← auto-linked to span
    t.update(output=answer)

t.score("quality", 0.9)
```
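Nested spans like the ones above are typically tracked with a stack, so each span knows its parent when it closes. A self-contained sketch of that mechanic (illustrative only, not tokenspy's implementation):

```python
import contextlib
import time

_stack = []   # names of currently open spans
spans = []    # finished span records

@contextlib.contextmanager
def span(name):
    # The parent is whatever span is open when this one starts.
    parent = _stack[-1] if _stack else None
    record = {"name": name, "parent": parent, "start": time.perf_counter()}
    _stack.append(name)
    try:
        yield record
    finally:
        _stack.pop()
        record["duration"] = time.perf_counter() - record["start"]
        spans.append(record)  # inner spans finish (and append) first

with span("pipeline"):
    with span("retrieve"):
        pass

print([(s["name"], s["parent"]) for s in spans])
# → [('retrieve', 'pipeline'), ('pipeline', None)]
```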

Evaluations

```python
import tokenspy
from tokenspy.eval import scorers

ds = tokenspy.dataset("qa-golden")
ds.add(input={"q": "Capital of France?"}, expected_output="Paris")

exp = tokenspy.experiment("gpt4o-mini-v1", dataset="qa-golden",
                          fn=my_fn, scorers=[scorers.exact_match])
exp.run().summary()
```
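A scorer here is just a function from (output, expected) to a score in [0, 1], applied across a dataset. A minimal self-contained sketch of exact-match scoring (names and the dataset shape are illustrative, not tokenspy internals):

```python
def exact_match(output, expected):
    """Return 1.0 on an exact (whitespace-insensitive) match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

# Hypothetical golden dataset row.
dataset = [{"q": "Capital of France?", "expected": "Paris"}]

def my_fn(q):
    return "Paris"  # stand-in for an LLM call

# An experiment run is essentially: score fn's output for every row.
scores = [exact_match(my_fn(row["q"]), row["expected"]) for row in dataset]
print(sum(scores) / len(scores))  # → 1.0
```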

Prompt versioning

```python
p = tokenspy.prompts.push("summarizer", "Summarize in {{style}}: {{text}}")
p.compile(style="concise", text="...")
tokenspy.prompts.set_production("summarizer", version=2)
```
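Compiling a `{{variable}}` template like the one above is simple substitution. A sketch of that step (tokenspy's actual `compile` may handle escaping and missing keys differently):

```python
import re

def compile_prompt(template, **vars):
    """Replace every {{name}} placeholder with the matching keyword value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(vars[m.group(1)]), template)

print(compile_prompt("Summarize in {{style}}: {{text}}",
                     style="concise", text="Q3 earnings"))
# → Summarize in concise: Q3 earnings
```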

Budget alerts

```python
@tokenspy.profile(budget_usd=0.10, on_exceeded="raise")
def strict_agent(query): ...
# raises BudgetExceededError if cost > $0.10
```
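The guard mechanic behind `budget_usd` can be pictured as a decorator that checks accumulated cost against the cap and raises past it. A self-contained sketch — the cost here is faked, whereas tokenspy meters real API usage:

```python
import functools

class BudgetExceededError(RuntimeError):
    pass

def profile(budget_usd):
    """Decorator sketch: raise if the wrapped call's cost exceeds the budget."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            result, cost = fn(*args, **kwargs)  # fn reports its own cost here
            if cost > budget_usd:
                raise BudgetExceededError(f"${cost:.2f} > ${budget_usd:.2f}")
            return result
        return wrapped
    return decorator

@profile(budget_usd=0.10)
def strict_agent(query):
    return "answer", 0.25   # pretend this call cost $0.25

try:
    strict_agent("q")
except BudgetExceededError as e:
    print("blocked:", e)
```

Catching `BudgetExceededError` at the call site lets a pipeline degrade gracefully (e.g. fall back to a cheaper model) instead of crashing.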

🆚 tokenspy vs Langfuse vs Braintrust

| | Langfuse | Braintrust | tokenspy |
|---|:---:|:---:|:---:|
| Requires cloud proxy | ✅ yes | ✅ yes | ❌ no |
| Requires signup | ✅ yes | ✅ yes | ❌ no |
| Data leaves your machine | ✅ yes | ✅ yes | ❌ never |
| Works offline | ❌ no | ❌ no | ✅ yes |
| Zero core dependencies | ❌ no | ❌ no | ✅ yes |
| Structured tracing | ✅ yes | ✅ yes | ✅ yes |
| Evaluations + datasets | ✅ yes | ✅ yes | ✅ yes |
| LLM-as-judge scoring | ✅ yes | ✅ yes | ✅ yes |
| Prompt versioning | ✅ yes | ✅ yes | ✅ yes |
| OpenTelemetry export | ⚡ partial | ❌ no | ✅ yes |
| Flame graph by function | ❌ no | ❌ no | ✅ yes |
| `@decorator` API | ❌ no | ❌ no | ✅ yes |
| Budget alerts | ⚡ partial | ⚡ partial | ✅ yes |
| Git commit cost tracking | ❌ no | ❌ no | ✅ yes |
| GitHub Actions cost diff | ❌ no | ❌ no | ✅ yes |
| Monthly cost | $0–$250 | $0–$300 | free forever |

🔌 Integrations

| Provider | Auto-instrumented |
|---|---|
| OpenAI | `chat.completions.create` (sync + async + streaming) |
| Anthropic | `messages.create` (sync + async + streaming) |
| Google Gemini | `generate_content` |
| LangChain / LangGraph | Callback handler |

Exports: OpenTelemetry → Grafana, Jaeger, Datadog, Honeycomb (docs)

CI: GitHub Actions cost diff per PR (docs)
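A per-PR cost diff boils down to comparing per-function totals between the base and head runs. A sketch with hypothetical numbers (the real CI integration reads stored run data):

```python
# Hypothetical per-function cost totals from two profiled runs.
base = {"fetch_and_summarize": 0.038, "generate_report": 0.011}
head = {"fetch_and_summarize": 0.012, "generate_report": 0.011}

# Union of function names, head minus base per function.
diff = {fn: head.get(fn, 0.0) - base.get(fn, 0.0) for fn in base | head}
print(diff["fetch_and_summarize"])  # negative → this PR made it cheaper
```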


📦 Install

```bash
pip install tokenspy              # core (zero dependencies)
pip install tokenspy[otel]        # + OpenTelemetry export
pip install tokenspy[server]      # + web dashboard (fastapi + uvicorn)
pip install tokenspy[all]         # openai + anthropic + langchain
```

📚 Documentation

→ Full documentation at pinakimishra95.github.io/tokenspy

| Guide | Description |
|---|---|
| Tracing | Trace + Span context managers, auto LLM linking, scores |
| Evaluations & Datasets | Datasets, scorers, llm_judge, experiment comparison |
| Prompt Versioning | push / pull / compile / set_production |
| Web Dashboard | Local dashboard, REST API |
| OpenTelemetry | OTEL export to Grafana, Jaeger, Datadog |
| GitHub Actions | Cost diff annotations per PR |

🤝 Contributing

```bash
git clone https://github.com/pinakimishra95/tokenspy
cd tokenspy && pip install -e ".[dev]"
pytest tests/   # 139 tests, ~0.3s
```

Issues and PRs welcome — especially new provider support and pricing updates.


License

MIT © Pinaki Mishra. See LICENSE.


Everything Langfuse and Braintrust do. Zero cloud. Zero signup. Zero cost.

GitHub · PyPI · Docs · Issues