Agent Analytics
Guide

🦞 A/B Testing Your AI Agent Can Actually Use

Run browser-side experiments with declarative HTML variants, no heavy SDK, and an API your agent can drive end-to-end. Create tests, QA variants, read results, and ship winners.

You want to test whether “Start Free Trial” converts better than “Sign Up.” With most A/B tools, that means a dashboard, an SDK, a snippet, and another place to remember to check later.

What if your AI agent could handle the whole loop?

A/B Testing, Agent-First

Agent Analytics has built-in browser-side experiments that your agent can create, wire up, QA, and review through the same API/CLI workflow it already uses for analytics.

That matters because the hard part usually isn’t launching a test. It’s remembering to:

  • define a real goal
  • wire the variant correctly
  • QA both branches
  • come back and decide what to ship

Agent Analytics is designed so your agent can do that work with you.

What makes it different

  • Declarative variants — for most tests, add data-aa-experiment and data-aa-variant-* attributes directly in HTML.
  • Goal-first workflow — create the experiment around a real business event like signup or checkout, not just page views.
  • QA-friendly forcing — load any variant directly with a URL param like ?aa_variant_signup_cta=new_cta.
  • Agent-driven lifecycle — create via CLI, MCP, or API; read results later and decide whether to keep running, pause, or complete the test.
  • Programmatic fallback — when text swaps are too limiting, use window.aa?.experiment() for more dynamic UI changes.

Experiments are available on paid plans. If you still need the base tracker install, start with the Getting Started guide.

The current docs recommend a simple pattern:

  1. Pick one UI element to test
  2. Pick one goal event that reflects a real outcome
  3. Create the experiment
  4. Wire the variant declaratively where possible
  5. QA both variants with forced URLs
  6. Read results and decide whether to keep running, pause, or complete it

That sounds obvious, but it’s what keeps tests interpretable.

Step 1: Choose one narrow change and one real goal

Good first experiments are usually a single CTA, headline, or pricing message tied to a real event.

Examples of good goals:

  • signup
  • checkout
  • trial_started
  • another core activation event

Avoid using page_view as the goal unless the page view itself is the outcome you care about.

A good prompt for your agent:

“I want to run an A/B test on my-site. Help me choose one page element to test and one goal event that reflects a real business outcome. Keep the scope to a single CTA or headline.”

Step 2: Make sure the goal event is already tracked

Before you create the experiment, make sure the goal event exists and is named consistently.

For simple interactions, the tracker docs now recommend declarative event tracking in markup instead of custom click handlers.

Example:

<button data-aa-event="signup" data-aa-event-plan="pro">
  Sign up for Pro
</button>

That sends a signup event with { plan: "pro" }.

If you prefer, your agent can also use window.aa?.track() when the event depends on runtime state.
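
For instance, when the plan is chosen at runtime rather than fixed in markup, a minimal sketch (the trackSignup wrapper is hypothetical; window.aa?.track() is the tracker call mentioned above):

```javascript
// Sketch: the programmatic equivalent of the declarative signup button.
// The event name and { plan } payload mirror the data-aa-* attributes;
// passing the tracker in as an argument keeps the sketch easy to test.
function trackSignup(aa, plan) {
  // aa is the tracker object (window.aa in the browser); it may be
  // undefined before the tracker loads, hence the optional call.
  aa?.track('signup', { plan });
}

// In the browser: trackSignup(window.aa, selectedPlan);
```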

Step 3: Create the experiment

Tell your agent:

“Create an experiment for the signup CTA on my-site with control and new_cta variants, using signup as the goal event.”

Or do it directly with the CLI:

npx @agent-analytics/cli experiments create my-site \
  --name signup_cta --variants control,new_cta --goal signup

A couple of small but important details from the docs:

  • Keep experiment names in snake_case
  • Keep the scope easy to explain later
  • signup_cta is much better than something vague like homepage_test

Step 4: Wire the variant declaratively in HTML

For most sites, this is the cleanest path. The existing text stays as control, and the variant lives in a data-aa-variant-* attribute.

<h1 data-aa-experiment="signup_cta"
    data-aa-variant-new_cta="Start free today">
  Start your free trial
</h1>

How it works:

  • the original element content is the control
  • data-aa-variant-{key} provides the replacement text for a non-control variant
  • multiple elements can share the same experiment name and stay in the same assigned variant

Example with multiple elements tied to one experiment:

<h1 data-aa-experiment="hero_test"
    data-aa-variant-b="Ship faster with AI">
  Build better products
</h1>

<p data-aa-experiment="hero_test"
   data-aa-variant-b="Your agent handles analytics while you code">
  Track what matters across all your projects
</p>

Want more than two variants? Add more data-aa-variant-* attributes.

<h1 data-aa-experiment="cta_test"
    data-aa-variant-b="Try it free"
    data-aa-variant-c="Get started now">
  Sign up today
</h1>
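
To make the mechanics concrete, here is an illustrative sketch of what applying a declarative variant amounts to. This is not the tracker's actual code, just the idea behind the attributes above:

```javascript
// Illustrative only: apply an assigned variant to every element tagged
// with data-aa-experiment. The real tracker's logic may differ.
function applyVariant(root, experimentName, assignedVariant) {
  if (assignedVariant === 'control') return; // original content is control
  const selector = '[data-aa-experiment="' + experimentName + '"]';
  root.querySelectorAll(selector).forEach(function (el) {
    const text = el.getAttribute('data-aa-variant-' + assignedVariant);
    if (text !== null) el.textContent = text; // swap only if a value exists
  });
}

// In the browser: applyVariant(document, 'cta_test', 'b');
```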

Step 5: Add the anti-flicker snippet

If you’re using declarative experiments, add this in <head> before tracker.js so visitors don’t briefly see the control before the assigned variant is applied:

<style>
  .aa-loading [data-aa-experiment] {
    visibility: hidden !important;
  }
</style>
<script>
  document.documentElement.classList.add('aa-loading');
  setTimeout(function () {
    document.documentElement.classList.remove('aa-loading');
  }, 3000);
</script>

A few details worth calling out:

  • it uses visibility: hidden, not display: none, so layout stays stable
  • it only hides experiment elements, not the whole page
  • the 3-second timeout is a safety fallback if config loading fails
  • tracker.js removes aa-loading after variants are applied

Step 6: QA both variants before sending real traffic

This part was missing from a lot of older experiment workflows, and it should not be skipped.

You can force variants with URL params:

  • ?aa_variant_signup_cta=control
  • ?aa_variant_signup_cta=new_cta

That makes it easy for your agent or your team to:

  • check that both versions actually render
  • verify the goal event still fires
  • confirm there are no broken styles or copy issues

A useful prompt:

“Show me how to force each variant locally so I can QA both versions, then verify that the signup event still fires correctly for each one.”
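
The forced URLs can also be generated mechanically. This small helper is hypothetical, but the aa_variant_<experiment>=<variant> parameter format is the one from the docs:

```javascript
// Sketch: build one forced-variant QA URL per variant of an experiment.
function forcedVariantUrls(baseUrl, experimentName, variants) {
  return variants.map(function (variant) {
    const url = new URL(baseUrl);
    url.searchParams.set('aa_variant_' + experimentName, variant);
    return url.toString();
  });
}

forcedVariantUrls('https://my-site.example/', 'signup_cta', ['control', 'new_cta']);
// → ['https://my-site.example/?aa_variant_signup_cta=control',
//    'https://my-site.example/?aa_variant_signup_cta=new_cta']
```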

Step 7: Read results and make a decision

Once the experiment has real traffic, ask your agent:

“Check results for signup_cta and tell me whether we have enough data to pick a winner. If we do, recommend whether to keep running it, pause it, or complete it with a winner.”

Or check directly:

npx @agent-analytics/cli experiments get exp_abc123

The results include per-variant exposures, unique users, conversions, conversion rate, probability_best, lift, whether there is sufficient_data, and a recommendation.

Example shape:

signup_cta experiment results:

  control   — 2.1% conversion  | baseline
  new_cta   — 3.8% conversion  | +81% lift | 96% probability best

  Recommendation: ship new_cta

The key point: make the decision on the goal event, not on raw traffic.

A variant with more exposures is not automatically better if it does not improve the thing you actually care about.
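
As a sanity check on the numbers above, conversion rate and lift come from the goal event alone. The exposure counts here are made up to reproduce the 2.1% and 3.8% rates:

```javascript
// Conversion rate and relative lift, computed from the goal event only.
function conversionRate(conversions, uniqueUsers) {
  return conversions / uniqueUsers;
}

function lift(variantRate, controlRate) {
  return (variantRate - controlRate) / controlRate;
}

const control = conversionRate(21, 1000); // 2.1%
const newCta = conversionRate(38, 1000); // 3.8%
console.log(Math.round(lift(newCta, control) * 100) + '% lift'); // 81% lift
```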

The agent loop is the real advantage

The nice part is not just that experiments exist. It’s that your agent can operate the whole loop:

  1. Create — define the test and goal
  2. Implement — wire the markup or programmatic variant logic
  3. QA — force both variants and verify tracking
  4. Monitor — check results later through API, CLI, or MCP
  5. Decide — recommend keep running, pause, or complete with a winner
  6. Repeat — suggest the next focused test

That is the actual difference between “we launched a test” and “we built an experimentation habit.”

For more complex UI changes

If you need more than text replacement — different components, layout changes, conditional rendering, image swaps — use the programmatic API:

var variant = window.aa?.experiment('signup_cta', ['control', 'new_cta']);

if (variant === 'new_cta') {
  document.querySelector('.cta').textContent = 'Start Free Trial';
  document.querySelector('.hero-img').src = '/trial-hero.png';
}

window.aa?.experiment() gives deterministic client-side assignment, so the same user keeps seeing the same variant.
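
One common way to get deterministic assignment is hashing a stable user ID together with the experiment name. This sketch illustrates the idea only; it is not Agent Analytics' actual algorithm:

```javascript
// Illustrative only: deterministic variant assignment via a simple string
// hash of userId + experiment name. Same inputs always yield the same variant.
function assignVariant(userId, experimentName, variants) {
  const key = userId + ':' + experimentName;
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep it unsigned 32-bit
  }
  return variants[hash % variants.length];
}

// The same user keeps landing in the same variant across page loads:
assignVariant('user_42', 'signup_cta', ['control', 'new_cta']);
```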

Use this when declarative HTML isn’t enough. For simple copy tests, declarative attributes are still the best default.

Common mistakes to avoid

Based on the current docs, these are the big ones:

  • testing too many elements at once
  • using page_view as the goal instead of a conversion event
  • creating the experiment before the goal event is actually tracked
  • forgetting to QA forced variants before sending traffic through the test
  • bundling copy, layout, and offer changes into one experiment
  • leaving an experiment active after you already know the winner

Why this fits agents better than dashboard-first tools

Traditional A/B platforms assume a human will keep opening the dashboard.

Agent Analytics assumes your agent can:

  • create the experiment
  • update the code
  • verify both branches
  • query results later
  • tell you what to do next

That makes experiments much easier to keep running in practice.

Get started

Experiments are available on paid plans. Create your first experiment with the CLI, or start with the Getting Started guide if you still need the base tracker install.