Get started with Vantra
Add full observability to any Python AI agent in under 5 minutes. No changes to your agent logic required.
Quickstart
Install the SDK
pip install vantra
Get your API key
Go to Settings → API Keys and create a key.
Add 3 lines to your agent
import vantra
vantra.init(api_key="van_live_...", project="my-agent")
@vantra.trace
def run_agent(message: str):
    return agent.run(message)  # your existing code
Run your agent and open the dashboard
Every trace appears in your Traces dashboard within seconds. OpenAI and Anthropic calls are captured automatically.
Installation
Requires Python 3.8+ or Node.js 14+. Both SDKs are open source on GitHub.
# Python
pip install vantra
# Node.js / TypeScript
npm install vantra-sdk
vantra.init()
Call once at the start of your application, before any traced functions run.
vantra.init(
    api_key="van_live_...",  # required
    project="my-agent",      # optional: groups traces in the dashboard
    capture_io=False,        # optional: suppresses prompt/response capture
)
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | str | Yes | Your Vantra API key from Settings |
| project | str | No | Project name shown in the dashboard |
| capture_io | bool | No | Set to False to suppress prompt and response capture. Useful for sensitive workloads. Default: True |
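Hard-coding the key as in the snippet above is fine for a first run, but keys are usually better read from the environment. The following is a minimal sketch of that idea. The variable names `VANTRA_API_KEY`, `VANTRA_PROJECT`, and `VANTRA_CAPTURE_IO` and the helper `vantra_init_kwargs` are assumptions made for illustration; nothing here implies the SDK reads them on its own. The helper only builds the keyword arguments listed in the table.

```python
import os
from typing import Any, Dict, Mapping, Optional


def vantra_init_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for vantra.init() from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("VANTRA_API_KEY", "")
    if not api_key:
        raise RuntimeError("VANTRA_API_KEY is not set")

    kwargs: Dict[str, Any] = {"api_key": api_key}
    project = env.get("VANTRA_PROJECT")
    if project:
        kwargs["project"] = project
    # Treat "0", "false", "no" and "off" (any case) as a request to disable capture.
    raw = env.get("VANTRA_CAPTURE_IO", "true").strip().lower()
    kwargs["capture_io"] = raw not in {"0", "false", "no", "off"}
    return kwargs


# Usage, assuming the SDK is installed:
#   import vantra
#   vantra.init(**vantra_init_kwargs())
```

Failing fast on a missing key keeps a misconfigured deployment from running without traces.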
@vantra.trace
Decorator that wraps a function as a root trace. Every call creates a new trace in the dashboard with timing, status, and any nested spans.
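To make the behaviour described above concrete, here is a rough sketch of how a decorator can wrap both sync and async functions and record timing and status for each call. It is not Vantra's implementation: `trace_sketch` and `RECORDED_TRACES` are stand-in names invented for this example, and the "dashboard" is just an in-memory list.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Dict, List

# Stand-in for the dashboard: finished traces are appended here.
RECORDED_TRACES: List[Dict[str, Any]] = []


def _record(name: str, started: float, status: str) -> None:
    RECORDED_TRACES.append(
        {"name": name, "status": status, "duration_s": time.perf_counter() - started}
    )


def trace_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a sync or async function so every call records one trace."""
    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            try:
                result = await func(*args, **kwargs)
            except Exception:
                _record(func.__name__, started, "error")
                raise
            _record(func.__name__, started, "ok")
            return result

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        started = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            _record(func.__name__, started, "error")
            raise
        _record(func.__name__, started, "ok")
        return result

    return sync_wrapper


@trace_sketch
def run_agent(message: str) -> str:
    return message.upper()


@trace_sketch
async def run_agent_async(message: str) -> str:
    await asyncio.sleep(0)
    return message.lower()
```

Exceptions are re-raised after being recorded, so the wrapped function behaves the same for its callers.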
@vantra.trace
def run_agent(message: str) -> str:
    result = call_llm(message)
    return result

# Also works on async functions
@vantra.trace
async def run_agent_async(message: str) -> str:
    result = await call_llm_async(message)
    return result
Prompt versioning
Pass a prompt_version string to tag traces with the version of your prompt. In the dashboard you can filter by version or use the Compare versions page to see error rate, latency, and cost side by side across versions.
# Python
@vantra.trace(name="my-agent", prompt_version="v2")
def run_agent(message: str) -> str:
    ...

# Also works on async functions
@vantra.trace(name="my-agent", prompt_version="v2")
async def run_agent_async(message: str) -> str:
    ...
// TypeScript
const runAgent = trace(async function runAgent(message: string) {
  return agent.run(message)
}, { promptVersion: 'v2' })
| Parameter | Type | Description |
|---|---|---|
| prompt_version | str | Any string up to 100 chars — "v1", "v2", a git SHA, etc. |
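The decorator is shown both bare (`@vantra.trace`) and with keyword options. The sketch below illustrates one common way a single decorator can support both forms while tagging each call with a version string. `trace_with_options` and `TAGGED_CALLS` are hypothetical names. Rejecting strings over 100 characters with a `ValueError` is an assumption of this sketch; the SDK may handle over-long values differently.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

MAX_PROMPT_VERSION_CHARS = 100  # limit taken from the parameter table

# Stand-in for the dashboard: one entry per traced call.
TAGGED_CALLS: List[Dict[str, Optional[str]]] = []


def trace_with_options(
    func: Optional[Callable[..., Any]] = None,
    *,
    name: Optional[str] = None,
    prompt_version: Optional[str] = None,
) -> Any:
    """Usable as @trace_with_options or @trace_with_options(name=..., prompt_version=...)."""
    if prompt_version is not None and len(prompt_version) > MAX_PROMPT_VERSION_CHARS:
        # Assumption: this sketch rejects over-long versions outright.
        raise ValueError("prompt_version must be at most 100 characters")

    def decorate(target: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(target)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            TAGGED_CALLS.append(
                {"name": name or target.__name__, "prompt_version": prompt_version}
            )
            return target(*args, **kwargs)

        return wrapper

    # Bare form: Python passes the function itself as the first argument.
    if func is not None:
        return decorate(func)
    # Called form: return the real decorator.
    return decorate


@trace_with_options
def untagged(message: str) -> str:
    return message


@trace_with_options(name="my-agent", prompt_version="v2")
def tagged(message: str) -> str:
    return message
```

Making the options keyword-only keeps the bare form unambiguous, because the only positional argument the decorator can receive is the function being wrapped.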
vantra.span()
Context manager for creating child spans inside a trace. Use this to instrument individual steps — tool calls, retrievals, chains.
@vantra.trace
def run_agent(message: str) -> str:
    with vantra.span("search_knowledge", kind="tool") as span:
        results = search(message)
        span.set_input({"query": message})
        span.set_output({"results": results})
    with vantra.span("generate_response", kind="llm") as span:
        response = llm.chat(message, context=results)
        span.set_output({"response": response})
        span.set_tokens(input_tokens=320, output_tokens=180)
    return response
| Parameter | Type | Description |
|---|---|---|
| name | str | Span name shown in the waterfall |
| kind | str | One of "llm", "tool", "retrieval", "chain", or "agent" |
Methods available on the span context object:
| Method | Description |
|---|---|
| span.set_input(data) | Log the input to this span (any JSON-serializable value, truncated at 2KB) |
| span.set_output(data) | Log the output from this span |
| span.set_tokens(input_tokens, output_tokens) | Manually record token counts — useful when using a model not auto-patched by Vantra |
OpenAI auto-patch
Vantra automatically patches the OpenAI client after vantra.init() is called. Every chat.completions.create call is captured as an LLM span — including tokens, cost, model, and latency.
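For readers curious about what this kind of patching involves, the following sketch wraps a method on a stand-in class so that each call is recorded with its model, usage, and latency. It does not touch the OpenAI SDK and is not Vantra's code: `FakeCompletions`, `patch_create`, and `CAPTURED_CALLS` are invented for the example.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for captured LLM spans.
CAPTURED_CALLS: List[Dict[str, Any]] = []


class FakeCompletions:
    """Stand-in for a client class; not the OpenAI SDK."""

    def create(self, model: str, messages: List[Dict[str, str]]) -> Dict[str, Any]:
        return {"model": model, "usage": {"prompt_tokens": 5, "completion_tokens": 7}}


def patch_create(cls: type) -> None:
    """Replace cls.create with a wrapper that records each call, once per class."""
    original: Callable[..., Any] = cls.create
    if getattr(original, "_is_patched", False):
        return  # already patched; avoid double-recording

    @functools.wraps(original)
    def patched(self: Any, *args: Any, **kwargs: Any) -> Any:
        started = time.perf_counter()
        response = original(self, *args, **kwargs)
        CAPTURED_CALLS.append(
            {
                # Only keyword usage is inspected here; a positional model is missed.
                "model": kwargs.get("model"),
                "usage": response.get("usage"),
                "latency_s": time.perf_counter() - started,
            }
        )
        return response

    patched._is_patched = True
    cls.create = patched


patch_create(FakeCompletions)
patch_create(FakeCompletions)  # second call is a no-op
```

The `_is_patched` marker guards against wrapping the method twice, which would otherwise record every call twice if initialization ran more than once.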
import vantra
import openai

vantra.init(api_key="van_live_...", project="my-agent")
client = openai.OpenAI()

@vantra.trace
def ask(question: str) -> str:
    # This call is captured automatically; no extra code is needed
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content
Anthropic auto-patch
Anthropic works the same way as OpenAI: every messages.create call is captured automatically.
import vantra
import anthropic

vantra.init(api_key="van_live_...", project="my-agent")
client = anthropic.Anthropic()

@vantra.trace
def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text
Streaming
Streaming calls are captured automatically — no changes to your code needed. Tokens, cost, and assembled output are recorded when the stream finishes.
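One way to picture "captured when the stream finishes" is a generator that passes chunks through untouched and records the assembled output once iteration ends. The sketch below shows that pattern with plain strings. `capture_stream`, `fake_stream`, and `STREAM_RECORDS` are hypothetical names, and real provider chunks are objects rather than strings.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Stand-in for recorded stream spans.
STREAM_RECORDS: List[Dict[str, Any]] = []


def capture_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield chunks unchanged and record the assembled text when the stream ends."""
    parts: List[str] = []
    completed = False
    try:
        for chunk in chunks:
            parts.append(chunk)
            yield chunk
        completed = True
    finally:
        # Runs on normal exhaustion, and when the generator is closed
        # after the consumer stops early.
        STREAM_RECORDS.append({"output": "".join(parts), "completed": completed})


def fake_stream() -> Iterator[str]:
    # Stand-in for a provider stream; real chunks are objects, not plain strings.
    yield from ["Hel", "lo", "!"]
```

Because the wrapper is itself an iterator, the consuming `for` loop does not need to change, which matches the "no changes to your code" behaviour described above.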
# Python: OpenAI streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
    stream=True,
)
for chunk in stream:  # captured transparently
    print(chunk.choices[0].delta.content or "", end="")

# Python: Anthropic streaming
stream = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": question}],
    stream=True,
)
for event in stream:  # captured transparently
    if event.type == "content_block_delta":
        print(event.delta.text, end="")
// TypeScript: OpenAI streaming
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: question }],
  stream: true,
})
for await (const chunk of stream) { // captured transparently
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}

// TypeScript: Anthropic streaming
const stream = await client.messages.create({
  model: 'claude-haiku-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: question }],
  stream: true,
})
for await (const event of stream) { // captured transparently
  if (event.type === 'content_block_delta') {
    process.stdout.write(event.delta.text)
  }
}
For OpenAI streams, Vantra sets stream_options: {include_usage: true} so token counts are always available. You don't need to set this yourself.
Full example
A complete support agent with nested spans, tool calls, and automatic LLM capture:
from typing import List

import vantra
import openai

vantra.init(api_key="van_live_...", project="support-agent")
client = openai.OpenAI()

def search_knowledge_base(query: str) -> List[str]:
    # your retrieval logic
    return ["relevant doc 1", "relevant doc 2"]

def send_email(to: str, body: str) -> bool:
    # your email logic
    return True

@vantra.trace
def handle_support_ticket(ticket: dict) -> str:
    # Step 1: classify
    with vantra.span("classify", kind="llm"):
        classification = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Classify this support ticket."},
                {"role": "user", "content": ticket["message"]},
            ]
        )
        category = classification.choices[0].message.content
    # Step 2: search
    with vantra.span("search_kb", kind="tool") as span:
        docs = search_knowledge_base(ticket["message"])
        span.set_output({"docs": docs, "count": len(docs)})
    # Step 3: respond
    with vantra.span("generate_response", kind="llm"):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"You are a support agent. Context: {docs}"},
                {"role": "user", "content": ticket["message"]},
            ]
        )
        reply = response.choices[0].message.content
    # Step 4: send
    with vantra.span("send_email", kind="tool"):
        send_email(ticket["email"], reply)
    return reply

if __name__ == "__main__":
    handle_support_ticket({
        "message": "My payment isn't going through",
        "email": "user@example.com",
    })
FAQ
Vantra does not charge per seat. All plans are per organization, not per user, so you can add your whole team without paying more.
Vantra does not slow down your agent. All trace data is sent from a background thread through a queue, so your agent function returns immediately and the main thread is never blocked.
Streaming is supported. OpenAI and Anthropic streaming responses are fully supported in both Python and TypeScript. Token counts and cost are captured as the stream completes.
LangChain and LlamaIndex are supported. LLM calls inside LangChain and LlamaIndex chains are auto-patched, so token usage and cost are captured without any extra instrumentation. Use @vantra.trace on your top-level function and vantra.span() for individual steps.
Every individual span you send counts toward your usage: LLM calls, tool invocations, vantra.span() blocks, and top-level @vantra.trace calls. You control what gets instrumented, so you only pay for what you track.
Prompts and responses are captured by default. Inputs and outputs are recorded so you can inspect them in the trace waterfall, and payloads over 2KB are truncated. Set capture_io=False in vantra.init() to disable input/output capture entirely.
Trace data is stored securely and never used to train models. You can disable input/output capture with capture_io=False if you are handling sensitive data.
If spans cannot be delivered, they are queued in memory and retried. If delivery still fails after the retries, they are silently dropped. Your agent keeps running either way.
To create an API key, go to Settings, then API Keys, then Create key. Keys start with van_live_.
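The background-thread delivery and retry-then-drop behaviour described in the answers above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SDK's exporter: `BackgroundExporter`, its `send` callback, and the default of three attempts are all invented for the example.

```python
import queue
import threading
from typing import Any, Callable, Dict


class BackgroundExporter:
    """Queue spans in memory and send them from a daemon thread."""

    def __init__(self, send: Callable[[Dict[str, Any]], None], max_retries: int = 3) -> None:
        self._send = send
        self._max_retries = max_retries  # hypothetical default
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def submit(self, span: Dict[str, Any]) -> None:
        """Called from the agent's thread; returns without waiting for delivery."""
        self._queue.put(span)

    def _worker(self) -> None:
        while True:
            span = self._queue.get()
            try:
                for _attempt in range(self._max_retries):
                    try:
                        self._send(span)
                        break
                    except Exception:
                        continue  # a real exporter would back off before retrying
                # If every attempt failed, the span is dropped without raising.
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until everything queued so far has been sent or dropped."""
        self._queue.join()
```

Because the queue is unbounded, `submit` never blocks the caller. The trade-off is that memory grows if delivery stalls, so a production exporter would likely cap the queue size.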