Concepts

PromptOps organizes prompt management into a four-stage operations pipeline. Understanding these stages is key to getting the most out of the platform.

The Operations Pipeline

Every prompt flows through four stages: Register, Optimize, Benchmark, and Deploy. The stages can be used independently, but they work best together.

Register

Version-control your prompts with full history, branching, and rollback.

Optimize

Automatically rewrite and test prompts to maximize an objective metric.

Benchmark

Compare prompt performance across multiple LLMs with your own data.

Deploy

Ship the best prompt+model combination via canary rollouts with monitoring.
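To make the flow concrete, here is a rough end-to-end sketch of the four stages chained together. The `PromptOps` class and every method on it are illustrative stand-ins, not the real SDK; only the stage ordering reflects the pipeline described above.

```python
class PromptOps:
    """Minimal in-memory stand-in for a PromptOps client (hypothetical API)."""

    def __init__(self):
        self.registry = {}

    def register(self, name, text):
        # Register: store the prompt as a new version in the registry.
        versions = self.registry.setdefault(name, [])
        versions.append(text)
        return f"v1.{len(versions) - 1}"

    def optimize(self, name, objective):
        # Optimize: pretend a rewriter produced an improved variant
        # and register it as the new baseline.
        improved = self.registry[name][-1] + "\nBe concise."
        self.registry[name].append(improved)
        return improved

    def benchmark(self, name, models):
        # Benchmark: return a per-model score dict (stubbed values).
        return {m: round(0.85 + 0.03 * i, 2) for i, m in enumerate(models)}

    def deploy(self, name, model):
        # Deploy: record which prompt + model pair is live.
        self.live = (name, model)
        return self.live


ops = PromptOps()
ops.register("support-triage", "Classify the ticket severity.")
ops.optimize("support-triage", objective="accuracy")
scores = ops.benchmark("support-triage", ["model-a", "model-b"])
best_model = max(scores, key=scores.get)
deployment = ops.deploy("support-triage", best_model)
```

The point of the sketch is the hand-off between stages: each stage consumes what the previous one produced, which is why they compose well even though each works on its own.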

Prompts as Code

PromptOps treats prompts like source code. Every change is versioned (v1.0, v1.1, v2.0), branches allow safe experimentation, and you can diff any two versions to see exactly what changed.
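Diffing two versions works the same way it does for source code. The sketch below uses only the Python standard library's `difflib`; the in-memory `history` dict is an assumed storage model, not how PromptOps actually persists versions.

```python
import difflib

# Assumed storage model: two versions of the same prompt, keyed by version.
history = {
    "v1.0": "You are a support agent. Answer the user's question.",
    "v1.1": "You are a support agent. Answer the user's question briefly.",
}

def diff_versions(a, b):
    """Return a unified diff between two stored prompt versions."""
    return list(difflib.unified_diff(
        history[a].splitlines(),
        history[b].splitlines(),
        fromfile=a,
        tofile=b,
        lineterm="",
    ))

changes = diff_versions("v1.0", "v1.1")
```

A unified diff makes the "exactly what changed" claim literal: added lines are prefixed with `+`, removed lines with `-`.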

Optimization Objectives

When you optimize a prompt, you define an objective like "accuracy", "relevance", or "cost". PromptOps iteratively rewrites your prompt, runs it against your dataset, and scores each iteration. The best-performing version becomes your new baseline.
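The optimize loop described above is essentially hill climbing: mutate, score, keep improvements. The sketch below shows that shape; `mutate` and `score` are toy stand-ins for PromptOps's rewriter and evaluator, and the scoring rule is invented purely for illustration.

```python
import random

def mutate(prompt, rng):
    # Toy rewriter: append one of a few instruction fragments.
    tweaks = ["Be concise.", "Think step by step.", "Cite evidence."]
    return prompt + " " + rng.choice(tweaks)

def score(prompt, dataset):
    # Toy objective: a real scorer would run the prompt over every
    # example in the dataset; here we just reward one phrasing.
    return 1.0 if "step by step" in prompt else 0.5

def optimize(prompt, dataset, iterations=10, seed=0):
    rng = random.Random(seed)
    best, best_score = prompt, score(prompt, dataset)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        s = score(candidate, dataset)
        if s > best_score:  # keep only strict improvements
            best, best_score = candidate, s
    return best, best_score

dataset = ["example ticket 1", "example ticket 2"]
best_prompt, best = optimize("Classify the ticket.", dataset)
```

Because only improving candidates replace the baseline, the final score is never worse than the starting prompt's, which matches "the best-performing version becomes your new baseline."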

Benchmarking Methodology

Benchmarks run your prompt against every model you specify, using your own dataset rather than generic public test sets. For each model you get accuracy, latency, and cost metrics, letting you make a data-driven decision about which model to deploy.
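A benchmark run amounts to looping each model over the dataset and recording the three metrics. In the sketch below, `call_model` and the per-call prices are stand-ins (no real model API is being called), but the accuracy/latency/cost bookkeeping mirrors the methodology described above.

```python
import time

# Assumed per-call prices, for illustration only.
PRICE_PER_CALL = {"model-a": 0.002, "model-b": 0.010}

def call_model(model, prompt, example):
    # Stub: the cheap model misses hard examples, the pricier one does not.
    if model == "model-a" and example["hard"]:
        return "wrong"
    return example["label"]

def benchmark(models, prompt, dataset):
    results = {}
    for model in models:
        correct, start = 0, time.perf_counter()
        for ex in dataset:
            if call_model(model, prompt, ex) == ex["label"]:
                correct += 1
        elapsed = time.perf_counter() - start
        results[model] = {
            "accuracy": correct / len(dataset),
            "latency_s": elapsed / len(dataset),
            "cost_usd": PRICE_PER_CALL[model] * len(dataset),
        }
    return results

dataset = [
    {"label": "bug", "hard": False},
    {"label": "feature", "hard": True},
]
report = benchmark(["model-a", "model-b"], "Classify this ticket:", dataset)
```

Running against your own examples is what surfaces trade-offs like the one stubbed here: the cheaper model loses accuracy exactly on the cases that matter to you.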

Canary Deployments

When deploying, PromptOps uses a canary strategy: first 0% traffic (staging), then 10% (canary), then 100% (production). At each stage, monitoring thresholds (latency, accuracy, cost) must pass before promoting. If a threshold fails, PromptOps automatically rolls back.
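The promotion logic above can be sketched as a loop over traffic stages with a threshold gate at each step. The `check_metrics` callable stands in for live monitoring, and the specific threshold values are assumptions for the example.

```python
STAGES = [0, 10, 100]  # percent of traffic: staging -> canary -> production
THRESHOLDS = {"latency_ms": 500, "accuracy": 0.9}  # assumed example values

def within_thresholds(metrics):
    # All monitored metrics must pass before promotion.
    return (metrics["latency_ms"] <= THRESHOLDS["latency_ms"]
            and metrics["accuracy"] >= THRESHOLDS["accuracy"])

def rollout(check_metrics):
    """Promote through each stage; return ('deployed'|'rolled_back', pct)."""
    current = 0
    for pct in STAGES:
        current = pct
        if not within_thresholds(check_metrics(pct)):
            return "rolled_back", 0  # automatic rollback to 0% traffic
    return "deployed", current

# Healthy rollout: metrics stay within thresholds at every stage.
status_ok, traffic_ok = rollout(lambda pct: {"latency_ms": 320, "accuracy": 0.94})

# Failing rollout: latency breaches the threshold, triggering rollback.
status_bad, traffic_bad = rollout(lambda pct: {"latency_ms": 800, "accuracy": 0.94})
```

The key property is that a failed check at any stage short-circuits the loop and returns traffic to 0%, so a bad prompt never reaches full production.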