AI Agent Spec Template

A spec tells an AI agent what to build. A guardrail tells it what not to do. Most teams write neither well, which is why AI-assisted development produces so much rework. The difference between a vague feature request and an AI-executable spec comes down to five things: a locked problem statement, acceptance criteria written before solutions, explicit constraints, defined edge cases, and a clear definition of done. Get those right, and agents build what you meant. Skip them, and you spend the next sprint fixing what they assumed.


Quick summary: This guide covers what makes a spec AI-executable, how to write guardrails that hold up in practice, a stage-by-stage spec template you can use today, and the most common mistakes teams make when they skip structure.


What Makes a Spec "AI-Executable"?


An AI-executable spec is one where an agent can complete the task without needing to make a judgment call you didn't intend. That means every ambiguity your spec leaves open is a decision the agent makes on its own, using assumptions you never reviewed.


Most specs fail because they describe the solution, not the problem. An agent given "build a dashboard with filters" will build something. It just won't be what you meant. An agent given a clear problem, acceptance criteria, defined inputs and outputs, and explicit constraints will build something close to what you'd have built yourself.


This is why intent is the actual bottleneck in AI development — not the model, not the tooling. If you want to go deeper on that, Why Intent Is the New Bottleneck in AI Development is worth reading first.


The three signals that a spec is agent-ready:


  1. You can read the acceptance criteria and know immediately whether the output passes or fails

  2. Every system dependency is named, not implied

  3. Edge cases are listed, not assumed away


When Do You Need a Full Spec vs. a Guardrail?


Not every task needs a full spec. Knowing which format to use saves time and keeps agents focused. For lighter-weight planning that lives inside your existing workflow, the Linear Product Brief Template is a good starting point before you graduate to a full spec.


A Spec
  • What it is: A complete description of what to build, why, and how success is measured

  • Use when: Building a new feature, integrating a system, or making changes that touch multiple components

  • What it contains: Problem statement, acceptance criteria, constraints, edge cases, definition of done

  • Length: 300 to 2,000 words depending on complexity

  • Who writes it: PM, tech lead, or founding engineer


A Guardrail
  • What it is: A constraint that limits agent behavior within a defined scope

  • Use when: Scoping an existing task, preventing drift, or setting non-negotiable limits

  • What it contains: What the agent must not do, what counts as out of scope, fallback behavior

  • Length: Usually a short list, 5 to 15 rules

  • Who writes it: Usually the person delegating the task


The short version: write a spec when you're starting something. Write guardrails when you're scoping something already in motion.


Spec Template by Development Stage


The most common spec mistake is writing a pre-build spec for an idea you haven't validated yet. Stage-appropriate depth is everything. Write too little too early and you're guessing. Write too much too early and you're locking in decisions before you have the information to make them. This is one of the core ideas behind spec-driven development — the spec is a living document, not a one-time artifact.


For tasks that are fully pre-build and require detailed architecture planning, the Technical Design Document Template covers the structural layer beneath a spec.


Early Exploration — Focus: Why are we building this?
  • Problem statement: Required, detailed

  • Solution description: Optional, directional

  • Acceptance criteria: Not yet

  • Constraints: Known blockers only

  • Edge cases: Flagged as open questions

  • Dependencies: Named but not detailed

  • Definition of done: Not yet


Mid-Design — Focus: What are we building?
  • Problem statement: Required, refined

  • Solution description: Required

  • Acceptance criteria: High-level

  • Constraints: Technical and business

  • Edge cases: Partially resolved

  • Dependencies: Mapped

  • Definition of done: Draft


Pre-Build — Focus: How exactly does it work?
  • Problem statement: Required, locked

  • Solution description: Required, specific

  • Acceptance criteria: Detailed and testable

  • Constraints: All constraints explicit

  • Edge cases: Fully documented

  • Dependencies: Fully specified

  • Definition of done: Required


If you're writing a spec and you don't know what stage you're in, ask: has the team agreed on why we're building this? If not, you're in early exploration, regardless of how much detail you have.


How to Write Acceptance Criteria That Agents Can Execute


Acceptance criteria are the most important part of any AI-executable spec. They are the test. If an agent can read your acceptance criteria and know with certainty whether its output passes or fails, the rest of the spec fills in around that.


The format that works: Given / When / Then.

  • Given — the starting state or context

  • When — the action or trigger

  • Then — the expected outcome, specific and verifiable


Bad vs. Good Acceptance Criteria


Bad: "The filter should work correctly"
Good: "Given a list of 500 items, when a user applies a date filter for the last 30 days, then only items with a created_at date within that range are returned, sorted by newest first"
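A criterion this specific maps almost mechanically onto a test. Here's a hypothetical Python sketch of the date-filter criterion — `apply_date_filter` and the item shape are illustrative assumptions, not a real API:

```python
# Hypothetical sketch: the date-filter criterion as an executable test.
# apply_date_filter and the item dicts are assumptions for illustration.
from datetime import datetime, timedelta, timezone

def apply_date_filter(items, days):
    """Return items created within the last `days` days, newest first."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    recent = [i for i in items if i["created_at"] >= cutoff]
    return sorted(recent, key=lambda i: i["created_at"], reverse=True)

# Given a list of items with mixed ages...
now = datetime.now(timezone.utc)
items = [
    {"id": 1, "created_at": now - timedelta(days=5)},
    {"id": 2, "created_at": now - timedelta(days=45)},  # outside the window
    {"id": 3, "created_at": now - timedelta(days=1)},
]

# When the user applies the 30-day filter...
result = apply_date_filter(items, days=30)

# Then only in-range items are returned, sorted newest first.
assert [i["id"] for i in result] == [3, 1]
```

Notice that the "Good" criterion gave us the fixture size, the filter window, the field name, and the sort order — the test wrote itself.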


Bad: "Handle errors gracefully"
Good: "When an API call fails, the UI displays a non-blocking error message within 300ms and retries once automatically after 2 seconds"


Bad: "Make it fast"
Good: "P95 response time is under 400ms for all search queries with up to 10,000 indexed items"
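A latency criterion like this is also directly checkable. A minimal sketch, assuming a nearest-rank percentile and illustrative sample latencies:

```python
# Hypothetical sketch: verifying a P95 latency criterion.
# Sample data is illustrative, not a real measurement set.
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1  # nearest-rank index
    return ordered[rank]

samples = [120, 180, 95, 240, 310, 150, 390, 205, 175, 260]

# "P95 response time is under 400ms" becomes a single assertion.
assert p95(samples) < 400
```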


Bad: "The form should validate"
Good: "Given an email field left blank, when the user clicks Submit, then the field is outlined in red and an inline error reads 'Email is required' before any network request is made"


Step-by-Step: Writing Acceptance Criteria That Hold Up
  1. Start with the user action or system trigger, not the feature name

  2. State the expected output in terms a test could verify

  3. Define what "done" looks like for edge cases, not just the happy path

  4. Avoid words like "correctly," "properly," "appropriately" — replace them with measurable conditions

  5. Write one criterion per behavior — never bundle two conditions into one statement

  6. Read each criterion and ask: could an agent complete this without guessing? If not, rewrite it
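Step 4 can even be automated. Here's a hypothetical lint pass that flags the vague words — the word list is an illustrative starting point, not exhaustive:

```python
# Hypothetical lint check for step 4: flag vague words that hide
# unmeasurable conditions. The word list is an illustrative assumption.
VAGUE_WORDS = {"correctly", "properly", "appropriately", "gracefully", "fast"}

def flag_vague_terms(criterion):
    """Return the vague words found in an acceptance criterion."""
    words = {w.strip(".,").lower() for w in criterion.split()}
    return sorted(words & VAGUE_WORDS)

assert flag_vague_terms("The filter should work correctly") == ["correctly"]
assert flag_vague_terms(
    "Given 500 items, when a 30-day filter is applied, "
    "then only in-range items are returned"
) == []
```

Run it over every criterion before handing the spec to an agent; each hit is a place the agent would have guessed.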


What Guardrails Should Actually Contain


Guardrails are constraints, not instructions. The goal is to define the walls of the task so the agent operates inside a space you've reviewed, not one it constructed on its own.


A complete guardrail set has six components:

  1. Scope boundary — what the agent is and is not allowed to modify or touch

  2. Prohibited actions — specific things the agent must never do, with no exceptions

  3. Fallback behavior — what the agent should do when it hits an edge case outside its scope (ask, stop, log, or default to a defined state)

  4. Dependency constraints — which systems, APIs, or files the agent may or may not interact with

  5. Output format requirements — what the output must look like for the task to be considered complete

  6. Escalation trigger — the condition under which the agent should stop and surface a decision to a human rather than proceeding
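To make the six components concrete, here's a hypothetical sketch of a guardrail set as structured data an agent harness could enforce — every field name and path is an illustrative assumption:

```python
# Hypothetical sketch: the six guardrail components as structured data.
# All field names, values, and paths are illustrative assumptions.
GUARDRAILS = {
    "scope": ["src/auth/"],                      # 1. scope boundary
    "prohibited": ["remove_error_handling",      # 2. prohibited actions
                   "change_api_response_shape"],
    "fallback": "stop_and_log",                  # 3. fallback behavior
    "allowed_dependencies": ["session-store"],   # 4. dependency constraints
    "output_format": "diff_with_summary",        # 5. output format
    "max_files_outside_scope": 3,                # 6. escalation trigger
}

def within_scope(path, guardrails):
    """True if a file path falls inside the declared scope boundary."""
    return any(path.startswith(prefix) for prefix in guardrails["scope"])

assert within_scope("src/auth/middleware.py", GUARDRAILS)
assert not within_scope("src/routes/users.py", GUARDRAILS)
```

The point isn't the exact schema — it's that every one of the six components is a named field an agent can check against, not a sentence it has to interpret.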


Guardrail Example: Refactoring Task


Scope: Refactor authentication middleware only. Do not modify route handlers, database schemas, or session storage logic.


Prohibited: Do not remove existing error handling. Do not change public-facing API response shapes.


Fallback: If a dependency outside this scope needs to change for the refactor to work, stop and log the blocker. Do not proceed.


Output format: Refactored files only, with a summary comment at the top of each changed file describing what changed and why.


Escalation trigger: If the refactor requires touching more than 3 files outside the authentication directory, surface the decision before continuing.


Common Spec and Guardrail Mistakes (And How to Fix Them)


Writing the solution before locking the problem


What happens: agents build what you described, not what you needed. Fix: write a one-paragraph problem statement and get explicit sign-off before writing any solution details.


Acceptance criteria that describe behavior, not outcomes


What happens: agents pass their own tests. Fix: rewrite each criterion as a verifiable condition with no interpretation required.


No edge cases documented


What happens: agents make judgment calls you didn't intend. Fix: list at least 3 edge cases per feature, even if the answer is "out of scope for now."


Guardrails that say "don't break anything"


What happens: the agent has no idea what "anything" means. Fix: name specific files, systems, or behaviors that must not change.


Specs written post-hoc to document what was built


What happens: no alignment benefit, only documentation overhead. Fix: write specs before the build, even if rough. A bad spec written before is better than a perfect spec written after.


Stage mismatch


What happens: pre-build detail on a half-validated idea locks in wrong decisions. Fix: match spec depth to the current stage. Explore the why before you spec the how.


How Engineers, Product Managers, and AI Agents Should Each Read a Spec


Specs are read by different people for different reasons. The content stays consistent, but the emphasis shifts depending on who's in the room. For a broader look at what separates teams that get this right from those that don't, What Separates Good AI Dev Teams From Great Ones covers the team habits underneath it.


For engineers: Lead with the technical constraint section. Architects care about what they can't touch as much as what they're building. Put dependencies, scope limits, and non-functional requirements (performance, security, compatibility) near the top.


For product managers: Lead with the problem statement and expected outcomes. PMs need to know the why is locked before they trust the what. Include success metrics in the spec itself, not in a separate doc.


For AI agents: Structure matters more than prose. Agents parse specs more reliably when information is broken into labeled sections with explicit headers. Avoid paragraphs that require inference. Prefer lists and if/then conditions over narrative explanation.
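For example, a labeled-section layout like this sketch (section names and details are illustrative) gives an agent unambiguous fields to parse instead of prose to interpret:

```markdown
## Problem
Users cannot filter the activity list by date.

## Acceptance Criteria
- Given 500 items, when a 30-day date filter is applied,
  then only items with `created_at` in range are returned, newest first.

## Constraints
- Do not modify the existing sort API.

## Edge Cases
- If the list is empty, show the empty state, not an error.
```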


Try It in Devplan


Devplan auto-generates specs and guardrails from structured prompts, syncs them with your Linear or Jira tickets, and keeps them live through the build. You're not managing a document; you're managing the source of truth the agent actually reads. If you want to see how it works in a real codebase, Using Devplan in Practice walks through a day-to-day example.


Get started with Devplan


Or copy this structure into your next Notion or Linear doc and use it manually. Either way, the spec does the work.


Frequently Asked Questions


What is the difference between a spec and a PRD?


A PRD (Product Requirements Document) describes what a product should do and why, usually written for cross-functional alignment before engineering starts. A spec is more granular and technical, describing how a specific feature or task should be implemented. For AI-assisted development, specs need to go further than traditional PRDs: they need acceptance criteria and constraints specific enough that an agent can complete the task without making judgment calls on your behalf. If you're looking for a starting point, the PRD Template and the Standard Project Template are both good references.


How long should a spec be?


It depends on the stage and complexity. A one-pager brief should be 300 to 600 words. A feature PRD runs 600 to 1,200 words. A technical RFC or architecture spec can reach 1,000 to 2,500 words. Length is not a proxy for quality. A 400-word spec with sharp acceptance criteria will produce better agent output than a 2,000-word spec full of vague prose. Write until you've answered why, what, and how at the right level of detail for your current stage, then stop.


What is the difference between a guardrail and an acceptance criterion?


Acceptance criteria define what a successful output looks like. Guardrails define what the agent must not do to produce it. They work together: acceptance criteria set the target, guardrails set the walls. A spec without guardrails leaves the agent free to reach the target any way it chooses, including ways that break something else. A guardrail without acceptance criteria gives the agent constraints but no definition of done.


Do you need a spec for every feature?


No. Bug fixes, minor UI changes, and clearly scoped one-liners don't need a full spec. The trigger for writing a spec is complexity, not size. If a task touches more than one system, involves a decision that a second engineer might make differently, or will be executed by an AI agent without review at every step, write a spec. The cost of writing it is almost always lower than the cost of the rework it prevents.


How do you write specs for AI agents specifically?


The core difference from a traditional spec is precision. Human engineers ask clarifying questions. Agents fill gaps with assumptions. That means every ambiguity in your spec is a place where the agent decides something you never reviewed. To write for agents: lock the problem statement before describing a solution, write acceptance criteria as verifiable conditions not descriptions, document edge cases explicitly, use labeled sections rather than narrative prose, and include a guardrail set that defines scope and fallback behavior. A spec good enough for a senior engineer to execute without questions is a spec good enough for an agent.


What should a definition of done include?


A definition of done should cover: all acceptance criteria pass, edge cases are handled and documented, dependencies are not broken (include a list of what to verify), output is in the required format, and the change is reviewable without additional context from the author. For agent-executed tasks, add: the agent's summary of what it changed matches the spec's intended scope. If the summary contains surprises, the spec wasn't tight enough.
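That checklist can be made explicit. A hypothetical sketch, with illustrative keys, of a definition of done that a reviewer or a CI step could evaluate:

```python
# Hypothetical sketch: a definition of done as an explicit checklist.
# Key names are illustrative assumptions, not a standard schema.
definition_of_done = {
    "all_acceptance_criteria_pass": True,
    "edge_cases_handled_and_documented": True,
    "dependencies_verified_unbroken": True,
    "output_in_required_format": True,
    "reviewable_without_author_context": True,
    "agent_summary_matches_spec_scope": True,  # agent-executed tasks only
}

def is_done(checklist):
    """Done only when every condition holds; any False blocks sign-off."""
    return all(checklist.values())

assert is_done(definition_of_done)
assert not is_done({**definition_of_done, "output_in_required_format": False})
```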

Build better products faster.

We’re on a mission to transform how ambitious teams turn vision into software faster than ever before.
