Promptic

Introduction

Promptic helps you build better AI applications. It provides three core capabilities:

Tracing — Automatic OpenTelemetry-based observability for your LLM calls. See every request, response, token count, cost, and latency across OpenAI, Anthropic, Google, LangChain, and Cohere (a sketch follows the three capabilities).

Prompt Optimization — Automated experiments that find the best prompt for your task. Define your inputs, expected outputs, and evaluation criteria, then let Promptic iterate toward the highest-scoring prompt.

Agent Evaluation — Structured evaluation of AI agents against datasets. Run your agent on test inputs, collect traces, and get AI-generated insights on quality, errors, and regressions.
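
To make the tracing capability concrete, here is a minimal sketch of what instrumentation could look like, assuming a promptic import name and an init() entry point. Those names are illustrative rather than confirmed SDK API; only the OpenAI client usage is standard.

```python
# Minimal tracing sketch. promptic.init() and its arguments are assumed
# names for illustration, not the documented promptic-sdk API.
import promptic  # assumed import name for the promptic-sdk package
from openai import OpenAI

# One initialization call enables OpenTelemetry-based auto-capture for
# supported providers (OpenAI, Anthropic, Google, LangChain, Cohere).
promptic.init(api_key="pk-...")

client = OpenAI()

# This call is captured automatically: request, response, token counts,
# cost, and latency land on a trace without any per-call code.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify: 'My invoice is wrong.'"}],
)
print(response.choices[0].message.content)
```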

How it works

  1. Instrument your application with the Python SDK. One line of code auto-captures all LLM calls.
  2. Organize traces into AI Components — logical groupings like "support-agent" or "classifier" (steps 2 and 4 are sketched after this list).
  3. Optimize prompts by running experiments with your training data and evaluation criteria.
  4. Deploy the best-performing prompt and fetch it at runtime from your code.
  5. Evaluate agents by running them against datasets and reviewing AI-generated insights.
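
As a rough sketch of steps 2 and 4, the snippet below shows one plausible shape for component tagging and runtime prompt fetching. The component() context manager and get_deployment() call are hypothetical names used for illustration, not confirmed SDK API.

```python
# Hypothetical sketch of steps 2 and 4. promptic.init(),
# promptic.component(), and promptic.get_deployment() are assumed names,
# not confirmed promptic-sdk API.
import promptic  # assumed import name for the promptic-sdk package

promptic.init(api_key="pk-...")  # step 1: one-line auto-instrumentation

# Step 2: scope the traces emitted inside this block to one AI Component.
with promptic.component("support-agent"):
    ...  # LLM calls made here would be grouped under "support-agent"

# Step 4: fetch the currently deployed prompt at runtime, so an improved
# prompt can ship from the dashboard without a code deploy.
deployment = promptic.get_deployment(component="support-agent")
print(deployment.prompt)
```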

Core concepts

AI Component — A logical grouping for an LLM-powered feature (e.g., "email-classifier"). Traces, experiments, datasets, and deployments are scoped to a component.

Experiment — An automated prompt optimization run. You provide observations (input variables and expected outputs), evaluators (scoring criteria), and a target model. Promptic finds the best prompt (a sketch follows this list).

Iteration — A single optimization step within an experiment. Each iteration tests a candidate prompt and produces scores.

Evaluator — A scoring method for experiments: f1 (classification accuracy), referenceJudge / comparisonJudge / generalJudge (LLM-as-judge variants), similarity (text similarity), or structuredOutput (schema validation).

Observation — A set of input variables and the expected output, used as training data for experiments.

Deployment — A published prompt from a completed experiment, fetchable at runtime via the SDK or API.

Dataset — A collection of traces grouped for evaluation. Created from the dashboard or auto-created via SDK tagging.

Run — A batch of traces within a dataset, typically representing a single evaluation pass of your agent.

Trace — A full record of an LLM interaction, including all spans (individual API calls), tokens, costs, and timing.
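
To tie the experiment-related concepts together, here is a sketch of how observations and evaluators might feed an experiment. Every class and method name below (Experiment, Observation, run()) is an assumption made for illustration; only the evaluator type f1 comes from the list above.

```python
# Hypothetical sketch linking the concepts above. Experiment, Observation,
# and run() are assumed names, not the documented promptic-sdk API.
from promptic import Experiment, Observation  # assumed imports

# Observations: input variables plus the expected output (training data).
observations = [
    Observation(inputs={"email": "Where is my refund?"}, expected="billing"),
    Observation(inputs={"email": "The app crashes on launch."}, expected="bug"),
]

# An experiment pairs observations with evaluators and a target model;
# "f1" is one of the documented evaluator types (classification accuracy).
experiment = Experiment(
    component="email-classifier",
    observations=observations,
    evaluators=["f1"],
    model="gpt-4o-mini",
)

# Each iteration tests a candidate prompt and produces scores; the best
# scorer can then be published as a deployment.
result = experiment.run()
print(result.best_prompt, result.best_score)
```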

Choose your interface

Promptic offers three ways to interact with the platform:

  • Python SDK — The recommended way. Install the promptic-sdk package for tracing, API access, and a CLI.
  • REST API — Direct HTTP access to all endpoints. Use from any language (a request sketch follows this list).
  • Dashboard — Web UI for viewing traces, managing experiments, and reviewing evaluations.
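
For the REST route, the request below sketches the general shape of fetching a deployment over HTTP from Python. The base URL, path, and auth scheme are placeholders rather than documented endpoints; consult the API reference for the real ones.

```python
# Hypothetical REST sketch: the base URL, path, and auth scheme below are
# placeholders, not documented Promptic endpoints.
import requests

resp = requests.get(
    "https://api.promptic.example/v1/components/email-classifier/deployment",
    headers={"Authorization": "Bearer pk-..."},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```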

Next steps