Introduction
Promptic helps you build better AI applications. It provides three core capabilities:
- Tracing — Automatic OpenTelemetry-based observability for your LLM calls. See every request, response, token count, cost, and latency across OpenAI, Anthropic, Google, LangChain, and Cohere.
- Prompt Optimization — Automated experiments that find the best prompt for your task. Define your inputs, expected outputs, and evaluation criteria, then let Promptic iterate toward the highest-scoring prompt.
- Agent Evaluation — Structured evaluation of AI agents against datasets. Run your agent on test inputs, collect traces, and get AI-generated insights on quality, errors, and regressions.
How it works
- Instrument your application with the Python SDK. One line of code auto-captures all LLM calls (see the sketch after this list).
- Organize traces into AI Components — logical groupings like "support-agent" or "classifier".
- Optimize prompts by running experiments with your training data and evaluation criteria.
- Deploy the best-performing prompt and fetch it at runtime from your code (also covered in the sketch below).
- Evaluate agents by running them against datasets and reviewing AI-generated insights.
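To make the instrumentation and deployment steps concrete, here is a minimal Python sketch. It assumes the SDK exposes an `init()` entry point for instrumentation and a `get_deployment()` helper for fetching the published prompt; those names and parameters are illustrative, so check the SDK reference for the actual surface.

```python
# Illustrative sketch only: `init`, `get_deployment`, and their parameters
# are assumed names, not the documented promptic-sdk API.
import promptic_sdk

# Step 1: one-line instrumentation. Auto-captures all LLM calls
# (requests, responses, tokens, cost, latency) via OpenTelemetry.
promptic_sdk.init(api_key="YOUR_API_KEY", component="support-agent")

# Step 4: fetch the currently deployed prompt for a component at runtime.
prompt = promptic_sdk.get_deployment(component="support-agent")
print(prompt.text)  # `.text` is likewise an assumed attribute
```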
Core concepts
| Concept | Description |
|---|---|
| AI Component | A logical grouping for an LLM-powered feature (e.g., "email-classifier"). Traces, experiments, datasets, and deployments are scoped to a component. |
| Experiment | An automated prompt optimization run. You provide observations (input variables and expected outputs), evaluators (scoring criteria), and a target model. Promptic finds the best prompt. |
| Iteration | A single optimization step within an experiment. Each iteration tests a candidate prompt and produces scores. |
| Evaluator | A scoring method for experiments: f1 (F1 score for classification), referenceJudge / comparisonJudge / generalJudge (LLM-as-judge variants), similarity (text similarity), or structuredOutput (schema validation). |
| Observation | A set of input variables and the expected output, used as training data for experiments. |
| Deployment | A published prompt from a completed experiment, fetchable at runtime via the SDK or API. |
| Dataset | A collection of traces grouped for evaluation. Created from the dashboard or auto-created via SDK tagging. |
| Run | A batch of traces within a dataset, typically representing a single evaluation pass of your agent. |
| Trace | A full record of an LLM interaction, including all spans (individual API calls), tokens, costs, and timing. |
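To ground the experiment-related concepts in the table, the sketch below shows how observations and evaluators might be wired together. Every name used here (`Client`, `create_experiment`, the field names, the model string) is an assumption for illustration, not the documented API.

```python
# Hedged sketch: `Client`, `create_experiment`, and all fields below are
# assumed, illustrative names rather than the documented promptic-sdk surface.
from promptic_sdk import Client

client = Client(api_key="YOUR_API_KEY")

# Observations: input variables plus the expected output
# (the experiment's training data).
observations = [
    {"inputs": {"email": "My package never arrived."}, "expected": "shipping"},
    {"inputs": {"email": "How do I reset my password?"}, "expected": "account"},
]

# Experiment: Promptic runs iterations, each testing a candidate prompt,
# scored here by the f1 evaluator against the expected labels.
experiment = client.create_experiment(
    component="email-classifier",
    observations=observations,
    evaluators=["f1"],
    model="gpt-4o-mini",  # target model; placeholder value
)
```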
Choose your interface
Promptic offers three ways to interact with the platform:
- Python SDK — The recommended way. Install the `promptic-sdk` package for tracing, API access, and a CLI.
- REST API — Direct HTTP access to all endpoints. Use from any language.
- Dashboard — Web UI for viewing traces, managing experiments, and reviewing evaluations.
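As a hedged illustration of the REST route, here is a direct HTTP call; the base URL and endpoint path are placeholders rather than the real routes, so consult the API reference before using them.

```python
# Hedged example: the base URL and endpoint path below are placeholders,
# not Promptic's actual routes or response shape.
import requests

resp = requests.get(
    "https://api.promptic.example/v1/components/support-agent/deployment",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```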