Get started with Vantra

Add full observability to any Python AI agent in under 5 minutes. No changes to your agent logic required.

Quickstart

Install the SDK

bash

pip install vantra

Get your API key

Go to Settings → API Keys and create a key.

Add 3 lines to your agent

python

import vantra

vantra.init(api_key="van_live_...", project="my-agent")

@vantra.trace
def run_agent(message: str):
    return agent.run(message)  # your existing code

Run your agent and open the dashboard

Every trace appears in your Traces dashboard within seconds. OpenAI and Anthropic calls are captured automatically.

Installation

Requires Python 3.8+ or Node.js 14+. Both SDKs are open source on GitHub.

bash

# Python
pip install vantra

# Node.js / TypeScript
npm install vantra-sdk

vantra.init()

Call once at the start of your application, before any traced functions run.

python

vantra.init(
    api_key="van_live_...",   # required
    project="my-agent",       # optional — groups traces in dashboard
    capture_io=False,         # optional — suppress prompt/response capture
)

Parameter	Type	Required	Description
api_key	str	Yes	Your Vantra API key from Settings
project	str	No	Project name shown in the dashboard
capture_io	bool	No	Set to False to suppress prompt and response capture. Useful for sensitive workloads. Default: True
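
A common way to keep the API key out of source control is to read it from the environment. The sketch below builds the keyword arguments for vantra.init() from environment variables. The variable names VANTRA_API_KEY, VANTRA_PROJECT, and VANTRA_CAPTURE_IO are assumptions made for illustration, not names defined by the SDK.

```python
import os
from typing import Any, Dict, Mapping, Optional


def vantra_init_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for vantra.init() from environment variables.

    The variable names used here are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("VANTRA_API_KEY")
    if not api_key:
        raise RuntimeError("VANTRA_API_KEY is not set")

    kwargs: Dict[str, Any] = {"api_key": api_key}
    project = env.get("VANTRA_PROJECT")
    if project:
        kwargs["project"] = project
    # capture_io defaults to True, so only an explicit opt-out is passed on.
    if env.get("VANTRA_CAPTURE_IO", "true").strip().lower() in ("0", "false", "no"):
        kwargs["capture_io"] = False
    return kwargs


# Usage, once at application start-up:
#   import vantra
#   vantra.init(**vantra_init_kwargs())
```

Failing fast on a missing key makes a misconfigured deployment visible at start-up rather than as silently missing traces.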

@vantra.trace

Decorator that wraps a function as a root trace. Every call creates a new trace in the dashboard with timing, status, and any nested spans.

python

@vantra.trace
def run_agent(message: str) -> str:
    result = call_llm(message)
    return result

# Also works on async functions
@vantra.trace
async def run_agent_async(message: str) -> str:
    result = await call_llm_async(message)
    return result

Note: If the decorated function raises an exception, the trace is automatically marked as error and the error message is captured.
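
To make the behavior described above concrete, here is a minimal sketch of how a tracing decorator of this kind can work in general. It is not Vantra's implementation: trace_sketch, RECORDS, and _record are hypothetical names, and records are kept in a local list instead of being sent anywhere.

```python
import functools
import inspect
import time
from typing import Any, Callable, Dict, List, Optional

# In-memory stand-in for the dashboard (hypothetical; nothing is sent anywhere).
RECORDS: List[Dict[str, Any]] = []


def _record(name: str, started: float, error: Optional[BaseException] = None) -> None:
    RECORDS.append({
        "name": name,
        "duration_s": time.perf_counter() - started,
        "status": "ok" if error is None else "error",
        "error": None if error is None else str(error),
    })


def trace_sketch(fn: Callable) -> Callable:
    """Record timing and status for each call, then re-raise any exception."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                result = await fn(*args, **kwargs)
            except Exception as exc:
                _record(fn.__name__, started, exc)
                raise
            _record(fn.__name__, started)
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            _record(fn.__name__, started, exc)
            raise
        _record(fn.__name__, started)
        return result
    return sync_wrapper
```

The exception is re-raised after it is recorded, so the caller sees the same error it would have seen without the decorator.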

Prompt versioning

Pass a prompt_version string to tag traces with the version of your prompt. In the dashboard you can filter by version or use the Compare versions page to see error rate, latency, and cost side by side across versions.

python

# Python
@vantra.trace(name="my-agent", prompt_version="v2")
def run_agent(message: str) -> str:
    ...

# Also works on async functions
@vantra.trace(name="my-agent", prompt_version="v2")
async def run_agent_async(message: str) -> str:
    ...

typescript

// TypeScript
const runAgent = trace(async function runAgent(message: string) {
  return agent.run(message)
}, { promptVersion: 'v2' })

Parameter	Type	Description
prompt_version	str	Any string up to 100 chars — "v1", "v2", a git SHA, etc.

Note: Versions are stored on the trace and never change after the run. If you need to compare versions, go to Traces → Compare versions.
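
If you do not want to bump version strings by hand, one option is to derive the tag from the prompt text itself, so any edit to the prompt produces a new version. The sketch below is one way to do that under the 100-character limit; prompt_version_for, SYSTEM_PROMPT, and the label format are illustrative choices, not part of the SDK.

```python
import hashlib

MAX_VERSION_LENGTH = 100  # limit stated in the parameter table above
DIGEST_LENGTH = 12        # arbitrary choice for this sketch


def prompt_version_for(template: str, label: str = "") -> str:
    """Return a short, stable version tag derived from the prompt text."""
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:DIGEST_LENGTH]
    if not label:
        return digest
    # Trim the label, not the digest, so the tag stays unique per prompt.
    label = label[: MAX_VERSION_LENGTH - DIGEST_LENGTH - 1]
    return f"{label}-{digest}"


SYSTEM_PROMPT = "You are a support agent. Answer briefly."
PROMPT_VERSION = prompt_version_for(SYSTEM_PROMPT, label="support")

# Usage with the decorator shown above:
#   @vantra.trace(name="my-agent", prompt_version=PROMPT_VERSION)
#   def run_agent(message: str) -> str: ...
```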

vantra.span()

Context manager for creating child spans inside a trace. Use this to instrument individual steps — tool calls, retrievals, chains.

python

@vantra.trace
def run_agent(message: str) -> str:
    with vantra.span("search_knowledge", kind="tool") as span:
        span.set_input({"query": message})  # log the input before running the step
        results = search(message)
        span.set_output({"results": results})

    with vantra.span("generate_response", kind="llm") as span:
        response = llm.chat(message, context=results)
        span.set_output({"response": response})
        span.set_tokens(input_tokens=320, output_tokens=180)

    return response

Parameter	Type	Description
name	str	Span name shown in the waterfall
kind	str	"llm" | "tool" | "retrieval" | "chain" | "agent"

Methods available on the span context object:

Method	Description
span.set_input(data)	Log the input to this span (any JSON-serializable value, truncated at 2KB)
span.set_output(data)	Log the output from this span
span.set_tokens(input_tokens, output_tokens)	Manually record token counts — useful when using a model not auto-patched by Vantra
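
Because logged inputs are truncated at 2KB, it can help to trim large payloads yourself so the stored value stays valid and meaningful. The sketch below keeps as many leading results as fit and notes how many were left out. It assumes the limit applies to the UTF-8 JSON encoding and that 2KB means 2048 bytes; neither detail is confirmed above, and serialized_size and trim_results are hypothetical helpers.

```python
import json
from typing import Any, Dict, List

# Assumption: "2KB" means 2048 bytes of serialized JSON.
SPAN_PAYLOAD_LIMIT_BYTES = 2048


def serialized_size(data: Any) -> int:
    """Size in bytes of the UTF-8 JSON encoding of data."""
    return len(json.dumps(data, ensure_ascii=False).encode("utf-8"))


def trim_results(results: List[Any], limit: int = SPAN_PAYLOAD_LIMIT_BYTES) -> Dict[str, Any]:
    """Keep as many leading results as fit within limit and count the rest."""
    kept = list(results)

    def payload() -> Dict[str, Any]:
        return {"results": kept, "omitted": len(results) - len(kept)}

    # Drop results from the end until the whole payload fits.
    while kept and serialized_size(payload()) > limit:
        kept.pop()
    return payload()


# Usage inside a span:
#   span.set_output(trim_results(results))
```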

OpenAI auto-patch

Vantra automatically patches the OpenAI client after vantra.init() is called. Every chat.completions.create call is captured as an LLM span — including tokens, cost, model, and latency.

python

import vantra
import openai

vantra.init(api_key="van_live_...", project="my-agent")

client = openai.OpenAI()

@vantra.trace
def ask(question: str) -> str:
    # This call is automatically captured — no extra code needed
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content
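
For readers curious how auto-patching works in general, the sketch below replaces a method on a class with a wrapper that records each call. It illustrates the general monkey-patching technique only and is not Vantra's code: FakeCompletions stands in for a real client class, and patch_method and CALLS are hypothetical names.

```python
import functools
import time
from typing import Any, Dict, List

CALLS: List[Dict[str, Any]] = []  # stand-in for spans sent to a backend


class FakeCompletions:
    """Hypothetical stand-in for an LLM client class; not the OpenAI SDK."""

    def create(self, model: str, messages: list) -> dict:
        return {"model": model, "content": "hi", "usage": {"input": 3, "output": 1}}


def patch_method(cls: type, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)
    if getattr(original, "_is_patched", False):
        return  # already patched, so do not wrap a second time

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        started = time.perf_counter()
        response = original(self, *args, **kwargs)
        CALLS.append({
            "model": kwargs.get("model"),
            "latency_s": time.perf_counter() - started,
            "usage": response.get("usage"),
        })
        return response

    wrapper._is_patched = True
    setattr(cls, method_name, wrapper)


patch_method(FakeCompletions, "create")
reply = FakeCompletions().create(model="demo-model", messages=[{"role": "user", "content": "hi"}])
```

The caller's code is unchanged; only the method behind the attribute is swapped, which is why no extra instrumentation is needed at the call site.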

Anthropic auto-patch

As with OpenAI, Anthropic's messages.create calls are captured automatically once vantra.init() has been called.

python

import vantra
import anthropic

vantra.init(api_key="van_live_...", project="my-agent")

client = anthropic.Anthropic()

@vantra.trace
def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text

Streaming

Streaming calls are captured automatically — no changes to your code needed. Tokens, cost, and assembled output are recorded when the stream finishes.

python

# Python — OpenAI streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
    stream=True,
)
for chunk in stream:   # captured transparently
    if chunk.choices:  # with include_usage, the final usage-only chunk has no choices
        print(chunk.choices[0].delta.content or "", end="")

# Python — Anthropic streaming
stream = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": question}],
    stream=True,
)
for event in stream:   # captured transparently
    if event.type == "content_block_delta":
        print(event.delta.text, end="")

typescript

// TypeScript — OpenAI streaming
const openaiStream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: question }],
  stream: true,
})
for await (const chunk of openaiStream) {   // captured transparently
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}

// TypeScript — Anthropic streaming
const anthropicStream = await client.messages.create({
  model: 'claude-haiku-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: question }],
  stream: true,
})
for await (const event of anthropicStream) {   // captured transparently
  // Narrow to text deltas so event.delta.text type-checks
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text)
  }
}

Note: For OpenAI streaming calls, Vantra automatically injects stream_options: {include_usage: true} so token counts are always available. You don't need to set this yourself.
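
As a rough illustration of how a stream can be captured without changing the consuming code, the sketch below wraps an iterator, passes every chunk through untouched, and reports the assembled text once the stream is exhausted. This shows the general technique only, not Vantra's implementation; capture_stream and on_finish are hypothetical names, and plain strings stand in for SDK chunk objects.

```python
from typing import Callable, Iterable, Iterator, List


def capture_stream(chunks: Iterable[str], on_finish: Callable[[str], None]) -> Iterator[str]:
    """Yield each chunk unchanged, then report the assembled text when the stream ends."""
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk  # the caller sees the stream exactly as before
    on_finish("".join(parts))  # runs only after every chunk has been consumed


captured: List[str] = []
seen = list(capture_stream(iter(["Hel", "lo", "!"]), captured.append))
```

In this sketch on_finish does not run if the caller stops iterating early, which is one reason a real implementation needs extra handling for abandoned streams.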

Full example

A complete support agent with nested spans, tool calls, and automatic LLM capture:

python

from typing import List

import vantra
import openai

vantra.init(api_key="van_live_...", project="support-agent")

client = openai.OpenAI()

def search_knowledge_base(query: str) -> List[str]:  # typing.List keeps this valid on Python 3.8
    # your retrieval logic
    return ["relevant doc 1", "relevant doc 2"]

def send_email(to: str, body: str) -> bool:
    # your email logic
    return True

@vantra.trace
def handle_support_ticket(ticket: dict) -> str:
    # Step 1: classify
    with vantra.span("classify", kind="llm"):
        classification = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Classify this support ticket."},
                {"role": "user", "content": ticket["message"]},
            ]
        )
        category = classification.choices[0].message.content

    # Step 2: search
    with vantra.span("search_kb", kind="tool") as span:
        docs = search_knowledge_base(ticket["message"])
        span.set_output({"docs": docs, "count": len(docs)})

    # Step 3: respond
    with vantra.span("generate_response", kind="llm"):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"You are a support agent. Context: {docs}"},
                {"role": "user", "content": ticket["message"]},
            ]
        )
        reply = response.choices[0].message.content

    # Step 4: send
    with vantra.span("send_email", kind="tool"):
        send_email(ticket["email"], reply)

    return reply


if __name__ == "__main__":
    handle_support_ticket({
        "message": "My payment isn't going through",
        "email": "user@example.com",
    })

FAQ

Plans are priced per organization, not per user, so you can add your whole team without paying more.

Tracing does not block your agent. All trace data is sent from a background thread with a queue, so your agent function returns immediately and the main thread is never blocked.

OpenAI and Anthropic streaming responses are fully supported in both Python and TypeScript. Token counts and cost are captured as the stream completes.

Vantra works with LangChain and LlamaIndex. LLM calls inside their chains are auto-patched, so token usage and cost are captured without any extra instrumentation. Use @vantra.trace on your top-level function and vantra.span() for individual steps.

Every individual span you send counts toward your usage: LLM calls, tool invocations, vantra.span() blocks, and top-level @vantra.trace calls. You control what gets instrumented, so you only pay for what you track.

By default, inputs and outputs are captured so you can inspect them in the trace waterfall. Payloads over 2 KB are truncated. Set capture_io=False in vantra.init() to disable input/output capture entirely.

Trace data is stored securely and never used to train models. You can disable input/output capture with capture_io=False if you are handling sensitive data.

Spans that cannot be delivered are queued in memory and retried. If they still fail after retries, they are silently dropped. Your agent keeps running either way.

To create an API key, go to Settings → API Keys and select Create key. Keys start with van_live_.
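
The FAQ above describes spans being sent from a background thread with a queue, retried, and dropped if retries fail. A minimal sketch of that general pattern follows. It is an illustration under assumptions, not the SDK's code: BackgroundSender, the retry count, and the backoff values are all hypothetical.

```python
import queue
import threading
import time
from typing import Callable, Dict


class BackgroundSender:
    """Send items from a worker thread; retry a few times, then drop."""

    def __init__(self, send: Callable[[Dict], None], max_retries: int = 3, backoff_s: float = 0.01):
        self._send = send
        self._max_retries = max_retries
        self._backoff_s = backoff_s
        self._queue = queue.Queue()
        self.dropped = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def enqueue(self, item: Dict) -> None:
        """Return immediately; the worker thread performs the send."""
        self._queue.put(item)

    def flush(self) -> None:
        """Block until every queued item has been sent or dropped."""
        self._queue.join()

    def _worker(self) -> None:
        while True:
            item = self._queue.get()
            try:
                for attempt in range(self._max_retries):
                    try:
                        self._send(item)
                        break  # sent successfully
                    except Exception:
                        time.sleep(self._backoff_s * (attempt + 1))
                else:
                    self.dropped += 1  # every attempt failed, so drop silently
            finally:
                self._queue.task_done()
```

The calling code only ever touches enqueue(), which is what keeps a slow or failing network call off the agent's own thread.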

Ready to get started?

Free for up to 50K spans/month. No credit card required.
