Have you ever implemented the best-performing variation of an A/B test of PPC ad copy, but didn’t actually see any improvement?
This happens more often than you think.
A/B testing works – just avoid some common pitfalls.
This article addresses the top mistakes that cause PPC A/B tests to fail, along with practical tips to ensure your tests deliver meaningful results. We will cover issues such as:
- Pursuing statistical significance at the expense of business impact.
- Not running tests long enough to gather sufficient data.
- Failing to segment traffic sources and other critical factors.
Aiming for 95% statistical significance is often overkill
When you run A/B tests, best practice says to start with a strong hypothesis, something along the lines of:
“By adding urgency to my e-commerce ad, we predict that CTR will increase by four percentage points.”
It’s a good way to start. Properly describing the test scope, its control and experiment cells, the primary KPI (and potentially secondary KPIs, too), and the expected results helps structure the test and the subsequent analysis.
However, once marketers adopt this methodology, they often start misleading themselves the moment they hear about the “Holy Grail” of valid results: reaching statistical significance (or stat sig). That’s when things get confusing quickly.
(I’m going to assume you know what statistical significance is, but if you don’t, you’ll want to start here and play with this tool to better understand the rest of this article.)
If you’ve been in the PPC business for a while, you’ve noticed common patterns like:
- What usually works: urgency messaging, limited stock and exclusive offers.
- What doesn’t necessarily work: environmental and social messaging (sorry, Earth!).
- What usually works: placing your contact form above the fold on your landing page.
- What doesn’t necessarily work: long, complex lead forms.
So if you’re 99% sure you can get those quick wins right now, go for it. You don’t need to prove everything using A/B tests and statistical results.
You might be thinking, “Okay, but how can I convince my client that we can simply implement this change without even testing it first?”
To address this, I would recommend:
- Documenting your evidence in a structured way so you can present relevant case studies later.
- Comparing competitors (and players outside your target industry). If they all do pretty much the same thing, there may be a valid reason.
- Sharing relevant results from roundup articles like “The Top 50 Tests Every Marketer Should Know” (e.g., AB Tasty, Chameleon).
Your goal here is to skip the line and save time. And we all know time is money, so your clients (or your CMO and CFO) will thank you.
Statistical significance doesn’t mean your test should stop
We’ve heard some marketers say, “You should only end a test once you have enough data for it to be statistically significant.” A word of caution: this is only partially true!
Don’t get me wrong, a test reaching 95% statistical significance is a good thing. Unfortunately, this does not mean that you can completely trust the test results.
In fact, when your A/B testing tool tells you that you’ve reached statistical significance, it means your control and experiment cells genuinely behave differently. That’s it.
How is that useful when you already knew it? After all, you designed your test as an A/B test, not an A/A test (unless you’re a statistics researcher).
In other words, reaching statistical significance does not mean your experiment cell performed better (or worse) than the control cell.
So how do you know whether your test results correctly identify the best-performing asset? You might think your results show cell B outperforming cell A by five percentage points. What else could you need?
As mentioned above, reaching 95% only confirms that the control and experiment cells behave differently. The best performer can still flip from cell A to cell B and back again, even after you’ve reached 95% significance.
Now that’s a problem: A/B test results can be unreliable as soon as they hit 95% significance. How unreliable, you ask? 26.1%. Wow.
If you want to dig into the details, here is a longer analysis by Evan Miller (and a broader perspective from Harvard Business Review).
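To see why this happens, here is a minimal Python sketch (my own illustration, not from Evan Miller’s analysis) of the “peeking” problem: an A/A test with no real difference between cells, checked for 95% significance once a day, still declares a winner far more often than the nominal 5% error rate suggests. The daily volumes and conversion rate are made-up assumptions.

```python
# Hypothetical simulation of daily "peeking" at an A/A test (no true difference).
# Stopping at the first 95%-significant check inflates the false-positive rate.
import math
import random

def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic for conversion counts conv_* over n_* visitors."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se

def peeking_aa_test(daily_visitors=500, days=28, true_rate=0.05):
    """Run one A/A test, peeking daily; return True if it ever 'wins' at 95%."""
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(days):
        n_a += daily_visitors
        n_b += daily_visitors
        conv_a += sum(random.random() < true_rate for _ in range(daily_visitors))
        conv_b += sum(random.random() < true_rate for _ in range(daily_visitors))
        if abs(z_stat(conv_a, n_a, conv_b, n_b)) >= 1.96:  # two-sided 95%
            return True
    return False

random.seed(42)
trials = 2_000
false_wins = sum(peeking_aa_test() for _ in range(trials))
print(f"'Significant winner' declared in {false_wins / trials:.1%} of A/A tests")
```

Run it a few times: the share of identical variants that get crowned a “significant winner” lands well above 5%, which is the same mechanism that makes early-stopped A/B results flip.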
So how do you know your results are truly reliable? First, you want to avoid stopping your tests the moment they reach 95%. You also want to design your A/B tests differently. Here’s how.
Assess your target audience
If you’re not a math person, you’ll want to read Bradd Libby’s article first.
TL;DR: Tossing a coin 10 times will hardly prove the coin is perfectly balanced. 100 tosses is better, and 1 million is great. An infinite number of tosses would be perfect. Really, try tossing coins and see for yourself.
In PPC terms, this means A/B test design should start with getting to know your audience. Is it 10 people or 1 million? Based on this, you know where you stand: in A/B testing, more data means more accuracy.
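If you’d rather not flip an actual coin, here’s a tiny Python sketch of the same idea (the toss counts are just the ones mentioned above):

```python
# Observed heads rate of a fair coin: it only settles near the true 50%
# as the number of tosses grows.
import random

random.seed(7)
for n_tosses in (10, 100, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    print(f"{n_tosses:>9,} tosses -> observed heads rate: {heads / n_tosses:.3f}")
```

A single run typically shows the 10-toss estimate wandering anywhere from 0.2 to 0.8, while the million-toss estimate sits at roughly 0.500: small samples look unbalanced even when nothing is wrong.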
Size matters in A/B testing
Not all projects or clients have high-volume accounts (be it sessions, clicks, conversions, etc.).
But you only need a large audience if you expect small incremental changes, hence my first point in this article: don’t run tests to prove the obvious.
So what is the ideal audience size for an estimated increase of just a few percentage points?
Good news: AB Tasty has developed a sample size calculator. I’m not affiliated with AB Tasty in any way; I just find their tool the easiest to understand. Here are other tools if you want to compare: Optimize, Adobe and Evan Miller.
Plug your historical data into these tools to see whether your test can ever reach a point where its results are reliable.
But wait, you’re not done yet!
The customer journey is also critical
For example, let’s say you see a 5% conversion rate for a group of 7,000 visitors (your average weekly visitor volume).
The sample size calculators above will tell you that you need less than 8 days if you predict that your conversion rate will increase by 1.5 percentage points (so from 5% to 6.5%).
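If you’re curious where that figure comes from, here’s a rough sketch of the math in Python. It uses the textbook two-proportion sample-size approximation with assumed defaults of 95% confidence and 80% power, so it approximates rather than reproduces any specific calculator above.

```python
# Approximate visitors needed per variant to detect a lift from p1 to p2,
# using the standard two-proportion formula (95% confidence, 80% power assumed).
import math

def sample_size_per_variant(p1, p2, z_alpha=1.96, z_beta=0.84):
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

n_per_variant = sample_size_per_variant(0.05, 0.065)   # 5% -> 6.5% conversion rate
weekly_per_variant = 7_000 / 2                          # 7,000 visitors split 50/50
days_needed = n_per_variant / weekly_per_variant * 7
print(f"~{n_per_variant:,} visitors per variant, about {days_needed:.1f} days")
```

With these assumptions, it lands at roughly 3,800 visitors per variant, or a little under 8 days at 3,500 weekly visitors per cell.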
Eight days to increase your conversion rate by 1.5 percentage points?! Now that’s a bargain if you ask me. Too bad you fell into the other trap!
The first thing you want to review is those 8 days: do they cover at least one (if not two) full cycles of your customer journey?
Otherwise, you’ll have two cohorts entering your A/B test results (e.g., your clicks), but only one cohort making it all the way through the customer journey (and therefore having a chance to convert).
And that skews your results dramatically.
Again, this highlights that the longer the test is run, the more accurate its results will be, which can be especially difficult in B2B, where buying cycles can span years.
In that case, you’ll probably want to review your pre-purchase milestones and make sure their conversion rate fluctuations have more or less flattened out. That stability indicates your results are accurate.
As you can see, reaching statistical significance is far from enough to know whether your test results are accurate. You first need to plan your audience size and let your test run long enough.
Other common PPC A/B testing mistakes
While the issues above are the critical ones in my mind, I can’t help but point out a few other mistakes, just for “fun”.
Not segmenting your traffic sources
PPC pros know this by heart: branded search traffic is worth far more than cold, unretargeted Facebook Ads audiences.
Imagine a test where, for some reason, your share of brand search traffic is inflated relative to your share of Facebook ad traffic (thanks to a PR stunt, let’s say).
Your results would look much better! But would these results be accurate? Probably not.
Bottom line: segment your tests by traffic source as much as possible (a short sketch of what that segmented view can look like follows the list below).
Sources I would recommend checking before starting the test:
- SEO (often 90% brand traffic).
- Email and SMS (existing customers over-perform most of the time).
- Retargeting (these people already know you; they’re not your average Joe).
- Brand paid search.
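As a purely hypothetical illustration (all names and numbers below are made up), here’s what that per-source breakdown might look like with pandas. The blended total can show a lift that only one source is actually driving.

```python
# Hypothetical A/B results broken down by traffic source (all data made up).
import pandas as pd

results = pd.DataFrame({
    "source":      ["brand_search", "brand_search", "facebook_cold", "facebook_cold"],
    "variant":     ["A", "B", "A", "B"],
    "clicks":      [4_000, 4_100, 9_000, 8_900],
    "conversions": [320, 328, 180, 205],
})
results["cvr"] = results["conversions"] / results["clicks"]

# Blended view: one number that mixes very different audiences together.
blended = results.groupby("variant")[["clicks", "conversions"]].sum()
blended["cvr"] = blended["conversions"] / blended["clicks"]
print(blended)

# Segmented view: conversion rate per source and variant.
print(results.pivot(index="source", columns="variant", values="cvr"))
```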
Also, make sure you’re comparing like for like in your tests.
For example, Google suggests running a Performance Max versus Shopping experiment because it “helps you determine which type of campaign drives the best results for your business.” But this is not an apples-to-apples comparison.
Google fails to mention that Performance Max covers a much wider range of ad placements than Shopping campaigns, which makes the A/B test flawed from the start.
For accurate results, compare Performance Max against all your other Google Ads campaigns combined, unless you use brand exclusions. In that case, you’ll want to compare Performance Max against all your Google Ads campaigns except brand search and brand Shopping.
Ignoring critical segments
Again, most marketers know that mobile devices behave very differently from their desktop counterparts. So why would you lump desktop and mobile data together in one A/B test?
Same with geography: you shouldn’t compare data from the US with data from France or India. Why?
- The competition is not the same.
- CPMs vary widely.
- Product-market fit is not identical.
Be sure to “localize” your tests as much as possible.
Final segment: seasonality.
Unless you work in an always-on-promotion business, your average customer isn’t the same as your Black Friday, summer sale or Mother’s Day customer. Don’t lump all of these periods into a single A/B test.
Avoid A/B testing pitfalls for better PPC results
Understanding these key issues helps you design rigorous A/B tests that really move the needle on your most important metrics.
With a few tweaks to your process, your testing will start paying dividends.