Introduction
Promptic helps you build better AI applications. It provides three core capabilities:
- Tracing — Automatic OpenTelemetry-based observability for your LLM calls. See every request, response, token count, cost, and latency across OpenAI, Anthropic, Google, LangChain, and Cohere.
- Prompt Optimization — Automated experiments that find the best prompt for your task. Define your inputs, expected outputs, and evaluation criteria, then let Promptic iterate toward the highest-scoring prompt.
- Agent Evaluation — Structured evaluation of AI agents against datasets. Run your agent on test inputs, collect traces, and get AI-generated insights on quality, errors, and regressions.
How it works
- Instrument your application with the Python SDK. One line of code auto-captures all LLM calls (see the sketch after this list).
- Organize traces into AI Components — logical groupings like "support-agent" or "classifier".
- Optimize prompts by running experiments with your training data and evaluation criteria.
- Deploy the best-performing prompt and fetch it at runtime from your code (also covered in the sketch below).
- Evaluate agents by running them against datasets and reviewing AI-generated insights.
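To make the instrumentation and deployment steps concrete, here is a minimal Python sketch. It assumes the SDK exposes an `init()` entry point for instrumentation and a `get_deployment()` helper for fetching the published prompt; those names and parameters are illustrative, so check the SDK reference for the actual surface.

```python
# Illustrative sketch only: `init`, `get_deployment`, and their parameters
# are assumed names, not the documented promptic-sdk API.
import promptic_sdk

# Step 1: one-line instrumentation. Auto-captures all LLM calls
# (requests, responses, tokens, cost, latency) via OpenTelemetry.
promptic_sdk.init(api_key="YOUR_API_KEY", component="support-agent")

# Step 4: fetch the currently deployed prompt for a component at runtime.
prompt = promptic_sdk.get_deployment(component="support-agent")
print(prompt.text)  # `.text` is likewise an assumed attribute
```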
Core concepts
| Concept | Description |
|---|---|
| AI Component | A logical grouping for an LLM-powered feature (e.g., "email-classifier"). Traces, experiments, datasets, and deployments are scoped to a component. |
| Experiment | An automated prompt optimization run. You provide observations (input variables and expected outputs), evaluators (scoring criteria), and a target model. Promptic finds the best prompt. |
| Iteration | A single optimization step within an experiment. Each iteration tests a candidate prompt and produces scores. |
| Evaluator | A scoring method for experiments: f1 (F1 score for classification), referenceJudge / comparisonJudge / generalJudge (LLM-as-judge variants), similarity (text similarity), or structuredOutput (schema validation). |
| Observation | A set of input variables and the expected output, used as training data for experiments. |
| Deployment | A published prompt from a completed experiment, fetchable at runtime via the SDK or API. |
| Dataset | A collection of traces grouped for evaluation. Created from the dashboard or auto-created via SDK tagging. |
| Run | A batch of traces within a dataset, typically representing a single evaluation pass of your agent. |
| Trace | A full record of an LLM interaction, including all spans (individual API calls), tokens, costs, and timing. |
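To ground the experiment-related concepts in the table, the sketch below shows how observations and evaluators might be wired together. Every name used here (`Client`, `create_experiment`, the field names, the model string) is an assumption for illustration, not the documented API.

```python
# Hedged sketch: `Client`, `create_experiment`, and all fields below are
# assumed, illustrative names rather than the documented promptic-sdk surface.
from promptic_sdk import Client

client = Client(api_key="YOUR_API_KEY")

# Observations: input variables plus the expected output
# (the experiment's training data).
observations = [
    {"inputs": {"email": "My package never arrived."}, "expected": "shipping"},
    {"inputs": {"email": "How do I reset my password?"}, "expected": "account"},
]

# Experiment: Promptic runs iterations, each testing a candidate prompt,
# scored here by the f1 evaluator against the expected labels.
experiment = client.create_experiment(
    component="email-classifier",
    observations=observations,
    evaluators=["f1"],
    model="gpt-4o-mini",  # target model; placeholder value
)
```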
Choose your interface
Promptic offers three ways to interact with the platform:
- Python SDK — The recommended way. Install the `promptic-sdk` package for tracing, API access, and a CLI.
- REST API — Direct HTTP access to all endpoints. Use from any language.
- Dashboard — Web UI for viewing traces, managing experiments, and reviewing evaluations.
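As a hedged illustration of the REST route, here is a direct HTTP call; the base URL and endpoint path are placeholders rather than the real routes, so consult the API reference before using them.

```python
# Hedged example: the base URL and endpoint path below are placeholders,
# not Promptic's actual routes or response shape.
import requests

resp = requests.get(
    "https://api.promptic.example/v1/components/support-agent/deployment",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```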