CPA (cost per acquisition) spikes are the single most-asked-about alert in paid search operations, and the one where most AI agents fail hardest. The failure is not that the model is weak. The failure is that almost every implementation — in marketing SaaS, in one-off ChatGPT prompts, in auto-generated dashboards — treats the problem as “CPA went up, fire an alert.” That heuristic is wrong at exactly the moments a marketer most needs it to be right.
This post walks through why, and how mureo’s anomaly detector is built to refuse false positives instead of emitting them.
The naive approach
Ask any LLM to “watch my Google Ads CPA and alert me if something’s off” and you get some version of:
```python
# The naive heuristic, spelled out
if todays_cpa > target_cpa * 1.5:
    alert("CPA spike detected")
```

It is the simplest thing that could possibly work, and it has been shipping in ad-ops tooling for fifteen years. On a single campaign in steady state, it can even be useful. Applied naively across an account by an AI agent, it becomes a noise generator.
Three failure modes that make the naive approach wrong
1. Sample size is almost never enough
A campaign that converts 40 times on Monday and 5 times on Tuesday has not “spiked” — Tuesday is a different sample size. The apparent CPA on a 5-conversion day is so volatile that a single atypical lead can inflate it by 30%. On a 2-conversion day, a single outlier can double it.
The naive alert fires. The marketer looks. There is nothing to fix. Five false positives later, the marketer stops reading the alerts, and the system that was supposed to be the safety net is now background noise.
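The arithmetic behind that volatility is easy to check with fixed spend and a varying conversion count (illustrative figures, not from any real account):

```python
# Same daily spend, different conversion counts: the fewer the
# conversions, the more one missing or atypical lead moves apparent CPA.
def cpa(spend: float, conversions: int) -> float:
    return spend / conversions

# A $4,000 day at 40 conversions vs. one conversion fewer: ~2.6% shift.
big = cpa(4000, 40)            # $100.00
big_minus_one = cpa(4000, 39)  # ~$102.56

# A $500 day at 5 conversions vs. one conversion fewer: 25% shift.
small = cpa(500, 5)            # $100.00
small_minus_one = cpa(500, 4)  # $125.00

print(round(big_minus_one / big - 1, 3))      # 0.026
print(round(small_minus_one / small - 1, 2))  # 0.25
```

The same relative move that is invisible at 40 conversions is an apparent “spike” at 5.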
2. Baseline is usually the wrong comparison
“Compare today’s CPA to target CPA” is intuitive but statistically naive. Target CPA is an aspiration, not a baseline. The relevant comparison is:
What is this campaign’s typical CPA, on days similar to today, given what we actually know about it?
Which means you want a median-based baseline over a recent window of same-shape days, not a threshold someone typed into a briefing deck eighteen months ago. Median (not mean) because one bad day should not distort the reference.
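A quick illustration of why the median survives one blow-out day (invented numbers):

```python
import statistics

# Seven days of CPA with one blow-out day mixed in.
daily_cpa = [42, 45, 41, 44, 43, 180, 44]

print(round(statistics.mean(daily_cpa), 1))  # 62.7 — dragged up by one day
print(statistics.median(daily_cpa))          # 44 — unaffected
```

A mean-based baseline would make the six normal days look cheap by comparison; the median keeps the reference where the campaign actually lives.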
3. Severity is not binary
The naive alert either fires or does not. In reality, CPA at 1.4× baseline and CPA at 2.5× baseline require very different human responses. Collapsing them into one boolean wastes the agent’s most useful channel — priority — on a decision the system could have made for the operator.
mureo’s design
The anomaly detector in mureo/analysis/anomaly_detector.py is deliberately small. It does three things, and refuses to do a fourth:
A. Median baseline from the action log
Every mureo workflow records a CampaignSnapshot to the append-only action_log. The detector builds a baseline by taking the median CPA (or CTR, or spend) over a configurable window of recent snapshots for the same campaign. Median is chosen over mean because it is robust to the single-day outliers that a marketer should not need to hand-filter.
B. Sample-size gates
Below a statistical threshold, the detector does not alert at all. The numbers come from the mureo-learning skill’s sample-size rules, which codify what is and is not a trustworthy signal:
| Metric | Minimum sample per day | Rationale |
|---|---|---|
| CPA spike | 30 conversions | Below this, a single atypical lead moves CPA too much to call it a “spike” |
| CTR drop | 1000 impressions | Below this, the delivery mix is too noisy to trust the CTR number |
These gates are not arbitrary: they mark, per metric, the point below which a “bad day” is day-to-day noise rather than a shift worth acting on. Below the gate, the detector returns nothing; the metric is still surfaced for monitoring in the /daily-check report, just not for action.
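The gate reduces to a comparison that runs before any baseline math (a sketch; the constant and function names are illustrative):

```python
MIN_CONVERSIONS = 30    # CPA gate: below this, a "spike" is noise
MIN_IMPRESSIONS = 1000  # CTR gate: below this, delivery mix dominates

def cpa_alert_allowed(conversions_today: int,
                      min_conversions: int = MIN_CONVERSIONS) -> bool:
    # Below the gate the detector returns nothing; the metric is
    # surfaced for monitoring only, never for an alert.
    return conversions_today >= min_conversions

def ctr_alert_allowed(impressions_today: int,
                      min_impressions: int = MIN_IMPRESSIONS) -> bool:
    return impressions_today >= min_impressions
```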
C. Severity tiers tied to effect size
When the gate is cleared, the detector emits one of two tiers:
| Tier | CPA condition | CTR condition |
|---|---|---|
| HIGH | ≥ 1.5× baseline | ≤ 0.5× baseline |
| CRITICAL | ≥ 2.0× baseline | ≤ 0.3× baseline |
Two tiers, not five. Five tiers would imply a precision the detector does not have. The two tiers map to different operator actions:
- HIGH — investigate before the next daily check; likely a structural cause (bid change, new competitor, landing page break).
- CRITICAL — pause-worthy without explanation; budget is actively burning against something that stopped working.
D. What the detector refuses to do
Three things the naive version ships that mureo’s does not:
- It does not alert on zero-conversion days unless the campaign previously had non-zero conversions and spent money today. Zero conversions on a paused campaign is the correct state.
- It does not alert on brand campaigns unless the baseline includes brand. A non-brand baseline applied to a brand campaign will always look like a “CPA spike.” The detector inherits the brand flag from the campaign snapshot rather than guessing.
- It does not infer root cause. The anomaly is a shape, not an explanation. Root cause belongs in the /rescue workflow, which consults the diagnostic knowledge base with the anomaly as input. Mixing detection and explanation is how SaaS dashboards end up telling you CPA rose “due to increased competition” on days when the actual cause was a bidding strategy that flipped into learning mode.
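The first two refusals can be sketched as guard clauses that run before any detection (field and function names are illustrative, not mureo’s actual schema):

```python
def should_evaluate(today_conversions: int, today_spend: float,
                    had_conversions_before: bool,
                    is_brand: bool, baseline_includes_brand: bool) -> bool:
    # Zero conversions with no spend (e.g. a paused campaign) is the
    # correct state, not an anomaly; only a previously converting
    # campaign that spent money today is worth evaluating.
    if today_conversions == 0 and not (had_conversions_before and today_spend > 0):
        return False
    # Never compare a brand campaign against a non-brand baseline.
    if is_brand and not baseline_includes_brand:
        return False
    return True
```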
When you should still override the agent
The detector is tuned for the median account. It is wrong — and you should override it — in at least these cases:
- Known promotional pulse. If you are running a 48-hour flash sale and CPA doubles on hour 2, that is the promotion working (high-CPC auction, high volume), not a spike. Tell mureo with /learn; future runs will factor the pulse in.
- Attribution lag. Some ad types — view-through, app-install, offline conversion imports — report conversions 1-7 days late. Same-day CPA will show as “spiked” because the numerator is real but the denominator is partial. The detector does not currently correct for this; a wrapper that suppresses alerts within the lookback window is on the roadmap.
- Sample-gate boundary. If you have a CPA metric that genuinely matters at 20 conversions/day (niche B2B, high LTV), the 30-conversion threshold is too conservative. Operator override: pass a smaller min_conversions to the tool invocation. The default is the default, not the ceiling.
Bottom line
The job of an anomaly detector on a money-touching account is not to notice that a number went up. It is to emit an alert rarely enough that, when it does fire, it is worth acting on.
mureo’s detector is not clever. It refuses to fire below sample-size gates; it uses a median rather than a mean; it picks two severity tiers instead of five; it lets humans override when local context demands it. Every one of those choices trades “ability to look impressive on a slide” for “being trusted at 3 AM.”
If that trade is wrong for your account, mureo is wrong for your account. If it is right, the code is at mureo/analysis/anomaly_detector.py.
This article is part of the mureo methodology series. The source numbers cited (1.5×/2.0× CPA, 30 conversions, 1000 impressions) are the current defaults in anomaly_detector.py as of mureo 0.5.0; they are versioned with the OSS release and may be retuned as the diagnostic knowledge base grows.