AI Agent Spec Template

A spec tells an AI agent what to build. A guardrail tells it what not to do. Most teams write neither well, which is why AI-assisted development produces so much rework. The difference between a vague feature request and an AI-executable spec comes down to five things: a locked problem statement, acceptance criteria written before solutions, explicit constraints, defined edge cases, and a clear definition of done. Get those right, and agents build what you meant. Skip them, and you spend the next sprint fixing what they assumed.


Quick summary: This guide covers what makes a spec AI-executable, how to write guardrails that hold up in practice, a stage-by-stage spec template you can use today, and the most common mistakes teams make when they skip structure.


What Makes a Spec "AI-Executable"?


An AI-executable spec is one where an agent can complete the task without needing to make a judgment call you didn't intend. That means every ambiguity your spec leaves open is a decision the agent makes on its own, using assumptions you never reviewed.


Most specs fail because they describe the solution, not the problem. An agent given "build a dashboard with filters" will build something. It just won't be what you meant. An agent given a clear problem, acceptance criteria, defined inputs and outputs, and explicit constraints will build something close to what you'd have built yourself.


This is why intent is the actual bottleneck in AI development — not the model, not the tooling. If you want to go deeper on that, Why Intent Is the New Bottleneck in AI Development is worth reading first.


The three signals that a spec is agent-ready:


  1. You can read the acceptance criteria and know immediately whether the output passes or fails

  2. Every system dependency is named, not implied

  3. Edge cases are listed, not assumed away


When Do You Need a Full Spec vs. a Guardrail?


Not every task needs a full spec. Knowing which format to use saves time and keeps agents focused. For lighter-weight planning that lives inside your existing workflow, the Linear Product Brief Template is a good starting point before you graduate to a full spec.


A Spec
  • What it is: A complete description of what to build, why, and how success is measured

  • Use when: Building a new feature, integrating a system, or making changes that touch multiple components

  • What it contains: Problem statement, acceptance criteria, constraints, edge cases, definition of done

  • Length: 300 to 2,000 words depending on complexity

  • Who writes it: PM, tech lead, or founding engineer


A Guardrail
  • What it is: A constraint that limits agent behavior within a defined scope

  • Use when: Scoping an existing task, preventing drift, or setting non-negotiable limits

  • What it contains: What the agent must not do, what counts as out of scope, fallback behavior

  • Length: Usually a short list, 5 to 15 rules

  • Who writes it: Usually the person delegating the task


The short version: write a spec when you're starting something. Write guardrails when you're scoping something already in motion.


Spec Template by Development Stage


The most common spec mistake is writing a pre-build spec for an idea you haven't validated yet. Stage-appropriate depth is everything. Write too little too early and you're guessing. Write too much too early and you're locking in decisions before you have the information to make them. This is one of the core ideas behind spec-driven development — the spec is a living document, not a one-time artifact.


For tasks that are fully pre-build and require detailed architecture planning, the Technical Design Document Template covers the structural layer beneath a spec.


Early Exploration — Focus: Why are we building this?
  • Problem statement: Required, detailed

  • Solution description: Optional, directional

  • Acceptance criteria: Not yet

  • Constraints: Known blockers only

  • Edge cases: Flagged as open questions

  • Dependencies: Named but not detailed

  • Definition of done: Not yet


Mid-Design — Focus: What are we building?
  • Problem statement: Required, refined

  • Solution description: Required

  • Acceptance criteria: High-level

  • Constraints: Technical and business

  • Edge cases: Partially resolved

  • Dependencies: Mapped

  • Definition of done: Draft


Pre-Build — Focus: How exactly does it work?
  • Problem statement: Required, locked

  • Solution description: Required, specific

  • Acceptance criteria: Detailed and testable

  • Constraints: All constraints explicit

  • Edge cases: Fully documented

  • Dependencies: Fully specified

  • Definition of done: Required


If you're writing a spec and you don't know what stage you're in, ask: has the team agreed on why we're building this? If not, you're in early exploration, regardless of how much detail you have.


How to Write Acceptance Criteria That Agents Can Execute


Acceptance criteria are the most important part of any AI-executable spec. They are the test. If an agent can read your acceptance criteria and know with certainty whether its output passes or fails, the rest of the spec fills in around that.


The format that works: Given / When / Then.

  • Given — the starting state or context

  • When — the action or trigger

  • Then — the expected outcome, specific and verifiable


Bad vs. Good Acceptance Criteria


Bad: "The filter should work correctly"
Good: "Given a list of 500 items, when a user applies a date filter for the last 30 days, then only items with a created_at date within that range are returned, sorted by newest first"
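A criterion this specific maps almost mechanically onto a test. Here's a hypothetical Python sketch of the date-filter criterion — `apply_date_filter` and the item shape are illustrative assumptions, not a real API:

```python
# Hypothetical sketch: the date-filter criterion as an executable test.
# apply_date_filter and the item dicts are assumptions for illustration.
from datetime import datetime, timedelta, timezone

def apply_date_filter(items, days):
    """Return items created within the last `days` days, newest first."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    recent = [i for i in items if i["created_at"] >= cutoff]
    return sorted(recent, key=lambda i: i["created_at"], reverse=True)

# Given a list of items with mixed ages...
now = datetime.now(timezone.utc)
items = [
    {"id": 1, "created_at": now - timedelta(days=5)},
    {"id": 2, "created_at": now - timedelta(days=45)},  # outside the window
    {"id": 3, "created_at": now - timedelta(days=1)},
]

# When the user applies the 30-day filter...
result = apply_date_filter(items, days=30)

# Then only in-range items are returned, sorted newest first.
assert [i["id"] for i in result] == [3, 1]
```

Notice that the "Good" criterion gave us the fixture size, the filter window, the field name, and the sort order — the test wrote itself.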


Bad: "Handle errors gracefully"
Good: "When an API call fails, the UI displays a non-blocking error message within 300ms and retries once automatically after 2 seconds"


Bad: "Make it fast"
Good: "P95 response time is under 400ms for all search queries with up to 10,000 indexed items"
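A latency criterion like this is also directly checkable. A minimal sketch, assuming a nearest-rank percentile and illustrative sample latencies:

```python
# Hypothetical sketch: verifying a P95 latency criterion.
# Sample data is illustrative, not a real measurement set.
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1  # nearest-rank index
    return ordered[rank]

samples = [120, 180, 95, 240, 310, 150, 390, 205, 175, 260]

# "P95 response time is under 400ms" becomes a single assertion.
assert p95(samples) < 400
```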


Bad: "The form should validate"
Good: "Given an email field left blank, when the user clicks Submit, then the field is outlined in red and an inline error reads 'Email is required' before any network request is made"


Step-by-Step: Writing Acceptance Criteria That Hold Up
  1. Start with the user action or system trigger, not the feature name

  2. State the expected output in terms a test could verify

  3. Define what "done" looks like for edge cases, not just the happy path

  4. Avoid words like "correctly," "properly," "appropriately" — replace them with measurable conditions

  5. Write one criterion per behavior — never bundle two conditions into one statement

  6. Read each criterion and ask: could an agent complete this without guessing? If not, rewrite it
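Step 4 can even be automated. Here's a hypothetical lint pass that flags the vague words — the word list is an illustrative starting point, not exhaustive:

```python
# Hypothetical lint check for step 4: flag vague words that hide
# unmeasurable conditions. The word list is an illustrative assumption.
VAGUE_WORDS = {"correctly", "properly", "appropriately", "gracefully", "fast"}

def flag_vague_terms(criterion):
    """Return the vague words found in an acceptance criterion."""
    words = {w.strip(".,").lower() for w in criterion.split()}
    return sorted(words & VAGUE_WORDS)

assert flag_vague_terms("The filter should work correctly") == ["correctly"]
assert flag_vague_terms(
    "Given 500 items, when a 30-day filter is applied, "
    "then only in-range items are returned"
) == []
```

Run it over every criterion before handing the spec to an agent; each hit is a place the agent would have guessed.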


What Guardrails Should Actually Contain


Guardrails are constraints, not instructions. The goal is to define the walls of the task so the agent operates inside a space you've reviewed, not one it constructed on its own.


A complete guardrail set has six components:

  1. Scope boundary — what the agent is and is not allowed to modify or touch

  2. Prohibited actions — specific things the agent must never do, with no exceptions

  3. Fallback behavior — what the agent should do when it hits an edge case outside its scope (ask, stop, log, or default to a defined state)

  4. Dependency constraints — which systems, APIs, or files the agent may or may not interact with

  5. Output format requirements — what the output must look like for the task to be considered complete

  6. Escalation trigger — the condition under which the agent should stop and surface a decision to a human rather than proceeding
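To make the six components concrete, here's a hypothetical sketch of a guardrail set as structured data an agent harness could enforce — every field name and path is an illustrative assumption:

```python
# Hypothetical sketch: the six guardrail components as structured data.
# All field names, values, and paths are illustrative assumptions.
GUARDRAILS = {
    "scope": ["src/auth/"],                      # 1. scope boundary
    "prohibited": ["remove_error_handling",      # 2. prohibited actions
                   "change_api_response_shape"],
    "fallback": "stop_and_log",                  # 3. fallback behavior
    "allowed_dependencies": ["session-store"],   # 4. dependency constraints
    "output_format": "diff_with_summary",        # 5. output format
    "max_files_outside_scope": 3,                # 6. escalation trigger
}

def within_scope(path, guardrails):
    """True if a file path falls inside the declared scope boundary."""
    return any(path.startswith(prefix) for prefix in guardrails["scope"])

assert within_scope("src/auth/middleware.py", GUARDRAILS)
assert not within_scope("src/routes/users.py", GUARDRAILS)
```

The point isn't the exact schema — it's that every one of the six components is a named field an agent can check against, not a sentence it has to interpret.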


Guardrail Example: Refactoring Task


Scope: Refactor authentication middleware only. Do not modify route handlers, database schemas, or session storage logic.


Prohibited: Do not remove existing error handling. Do not change public-facing API response shapes.


Fallback: If a dependency outside this scope needs to change for the refactor to work, stop and log the blocker. Do not proceed.


Output format: Refactored files only, with a summary comment at the top of each changed file describing what changed and why.


Escalation trigger: If the refactor requires touching more than 3 files outside the authentication directory, surface the decision before continuing.


Common Spec and Guardrail Mistakes (And How to Fix Them)


Writing the solution before locking the problem


What happens: agents build what you described, not what you needed. Fix: write a one-paragraph problem statement and get explicit sign-off before writing any solution details.


Acceptance criteria that describe behavior, not outcomes


What happens: agents pass their own tests. Fix: rewrite each criterion as a verifiable condition with no interpretation required.


No edge cases documented


What happens: agents make judgment calls you didn't intend. Fix: list at least 3 edge cases per feature, even if the answer is "out of scope for now."


Guardrails that say "don't break anything"


What happens: the agent has no idea what "anything" means. Fix: name specific files, systems, or behaviors that must not change.


Specs written post-hoc to document what was built


What happens: no alignment benefit, only documentation overhead. Fix: write specs before the build, even if rough. A bad spec written before is better than a perfect spec written after.


Stage mismatch


What happens: pre-build detail on a half-validated idea locks in wrong decisions. Fix: match spec depth to the current stage. Explore the why before you spec the how.


How Engineers, Product Managers, and AI Agents Should Each Read a Spec


Specs are read by different people for different reasons. The content stays consistent, but the emphasis shifts depending on who's in the room. For a broader look at what separates teams that get this right from those that don't, What Separates Good AI Dev Teams From Great Ones covers the team habits underneath it.


For engineers: Lead with the technical constraint section. Architects care about what they can't touch as much as what they're building. Put dependencies, scope limits, and non-functional requirements (performance, security, compatibility) near the top.


For product managers: Lead with the problem statement and expected outcomes. PMs need to know the why is locked before they trust the what. Include success metrics in the spec itself, not in a separate doc.


For AI agents: Structure matters more than prose. Agents parse specs more reliably when information is broken into labeled sections with explicit headers. Avoid paragraphs that require inference. Prefer lists and if/then conditions over narrative explanation.
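For example, a labeled-section layout like this sketch (section names and details are illustrative) gives an agent unambiguous fields to parse instead of prose to interpret:

```markdown
## Problem
Users cannot filter the activity list by date.

## Acceptance Criteria
- Given 500 items, when a 30-day date filter is applied,
  then only items with `created_at` in range are returned, newest first.

## Constraints
- Do not modify the existing sort API.

## Edge Cases
- If the list is empty, show the empty state, not an error.
```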


Try It in Devplan


Devplan auto-generates specs and guardrails from structured prompts, syncs them with your Linear or Jira tickets, and keeps them live through the build. You're not managing a document; you're managing the source of truth the agent actually reads. If you want to see how it works in a real codebase, Using Devplan in Practice walks through a day-to-day example.


Get started with Devplan


Or copy this structure into your next Notion or Linear doc and use it manually. Either way, the spec does the work.


Frequently Asked Questions


What is the difference between a spec and a PRD?


A PRD (Product Requirements Document) describes what a product should do and why, usually written for cross-functional alignment before engineering starts. A spec is more granular and technical, describing how a specific feature or task should be implemented. For AI-assisted development, specs need to go further than traditional PRDs: they need acceptance criteria and constraints specific enough that an agent can complete the task without making judgment calls on your behalf. If you're looking for a starting point, the PRD Template and the Standard Project Template are both good references.


How long should a spec be?


It depends on the stage and complexity. A one-pager brief should be 300 to 600 words. A feature PRD runs 600 to 1,200 words. A technical RFC or architecture spec can reach 1,000 to 2,500 words. Length is not a proxy for quality. A 400-word spec with sharp acceptance criteria will produce better agent output than a 2,000-word spec full of vague prose. Write until you've answered why, what, and how at the right level of detail for your current stage, then stop.


What is the difference between a guardrail and an acceptance criterion?


Acceptance criteria define what a successful output looks like. Guardrails define what the agent must not do to produce it. They work together: acceptance criteria set the target, guardrails set the walls. A spec without guardrails leaves the agent free to reach the target any way it chooses, including ways that break something else. A guardrail without acceptance criteria gives the agent constraints but no definition of done.


Do you need a spec for every feature?


No. Bug fixes, minor UI changes, and clearly scoped one-liners don't need a full spec. The trigger for writing a spec is complexity, not size. If a task touches more than one system, involves a decision that a second engineer might make differently, or will be executed by an AI agent without review at every step, write a spec. The cost of writing it is almost always lower than the cost of the rework it prevents.


How do you write specs for AI agents specifically?


The core difference from a traditional spec is precision. Human engineers ask clarifying questions. Agents fill gaps with assumptions. That means every ambiguity in your spec is a place where the agent decides something you never reviewed. To write for agents: lock the problem statement before describing a solution, write acceptance criteria as verifiable conditions not descriptions, document edge cases explicitly, use labeled sections rather than narrative prose, and include a guardrail set that defines scope and fallback behavior. A spec good enough for a senior engineer to execute without questions is a spec good enough for an agent.


What should a definition of done include?


A definition of done should cover: all acceptance criteria pass, edge cases are handled and documented, dependencies are not broken (include a list of what to verify), output is in the required format, and the change is reviewable without additional context from the author. For agent-executed tasks, add: the agent's summary of what it changed matches the spec's intended scope. If the summary contains surprises, the spec wasn't tight enough.
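That checklist can be made explicit. A hypothetical sketch, with illustrative keys, of a definition of done that a reviewer or a CI step could evaluate:

```python
# Hypothetical sketch: a definition of done as an explicit checklist.
# Key names are illustrative assumptions, not a standard schema.
definition_of_done = {
    "all_acceptance_criteria_pass": True,
    "edge_cases_handled_and_documented": True,
    "dependencies_verified_unbroken": True,
    "output_in_required_format": True,
    "reviewable_without_author_context": True,
    "agent_summary_matches_spec_scope": True,  # agent-executed tasks only
}

def is_done(checklist):
    """Done only when every condition holds; any False blocks sign-off."""
    return all(checklist.values())

assert is_done(definition_of_done)
assert not is_done({**definition_of_done, "output_in_required_format": False})
```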

Build better products faster.

We’re on a mission to transform how ambitious teams turn vision into software faster than ever before.
