Cheating Statistics - What's Fair Game & What Isn't?

Alejandro

May 21, 2009
Ok-

So I'm testing 14 images, with 4 different ad texts = 56 ad copies

At 56K impressions in my campaign, each ad has only gotten 1K impressions, which is not enough to actually eliminate any individual ads.

However, if I aggregate my numbers by ad text I find that:
Text 1 = 9 clicks / 14K impressions
Text 2 = 6 clicks / 14K impressions
Text 3 = 4 clicks / 14K impressions
Text 4 = 1 click / 14K impressions

So given those aggregates, Ad Text 4 loses to Text 1 at a 96-98% confidence level.
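
For reference, here's a rough sketch of how that Text 1 vs Text 4 comparison can be checked in Python (assumes SciPy is installed; the exact confidence depends on which test you run, and with only 10 clicks total an exact test is safer than a normal approximation):

```python
# Sanity check on the Text 1 vs Text 4 aggregates above.
from math import sqrt
from scipy.stats import norm, fisher_exact

clicks_1, imps_1 = 9, 14_000   # Ad Text 1
clicks_4, imps_4 = 1, 14_000   # Ad Text 4

# (a) Two-proportion z-test (normal approximation, shaky at counts this low)
p1, p4 = clicks_1 / imps_1, clicks_4 / imps_4
p_pool = (clicks_1 + clicks_4) / (imps_1 + imps_4)   # pooled CTR under the null
se = sqrt(p_pool * (1 - p_pool) * (1 / imps_1 + 1 / imps_4))
z = (p1 - p4) / se
p_z = 2 * (1 - norm.cdf(abs(z)))                     # two-sided p-value

# (b) Fisher's exact test (safer with only 10 clicks in total)
_, p_exact = fisher_exact([[clicks_1, imps_1 - clicks_1],
                           [clicks_4, imps_4 - clicks_4]])

print(f"z-test p = {p_z:.3f}, exact p = {p_exact:.3f}")
# The exact p lands around 0.02, i.e. in the 96-98% confidence ballpark.
```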

Q1: Is it fair game to slash all 14 ads with Ad Text 4 then?

I'm pretty sure this kind of aggregation wouldn't fly in ultra-sensitive studies (e.g., AIDS medication testing), but for more pragmatic and time-sensitive issues like marketing, is it fair game?

Q2: By the same logic, can I weed out certain images by aggregating their performance across all the ad texts?

I'm curious to hear what the prevailing wisdom is regarding this.
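
One way to do that aggregation for Q2: sum each image's clicks and impressions across every ad text it ran with, then run a chi-square test of independence on the resulting table. A rough sketch below (the per-image counts are made up purely for illustration, since the post doesn't list them; again assumes Python with SciPy):

```python
# Do click rates differ across images once each image's numbers are
# aggregated over all the ad texts it ran with?
# NOTE: these per-image counts are hypothetical; plug in your own.
from scipy.stats import chi2_contingency

# rows = images, columns = (clicks, non-clicks)
per_image = {
    "img_01": (7, 4_000 - 7),
    "img_02": (5, 4_000 - 5),
    "img_03": (1, 4_000 - 1),
    # ...one row per image, 14 in total
}

table = list(per_image.values())
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A small p says at least one image's CTR stands out; it doesn't say which,
# so you'd still follow up with pairwise comparisons before cutting anything.
```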

Finally, in a campaign, I've heard of people starting with a HUGE number of ad copies in the first round (100-5000), then working to narrow those down for further testing.

Q3: Do you work to narrow it down to 1 best ad copy or ~10?

My problem has been that with so many ad copies, I get it down to about 10. Even with 10K impressions or so, some copies will easily have 2x the number of clicks as others, but that's not statistically significant. I'm really tempted to just say "fuck it" and optimize around the winning ad, but my OCD-self says "Don't do that. You know that a decision based on shit data is a shit decision."
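
To put numbers on that last point, here's why a 2x click gap at those volumes usually isn't significant. The counts (6 vs 3 clicks on 10K impressions each) are hypothetical, just in that ballpark, and it assumes Python with SciPy:

```python
# Why "2x the clicks" can still be noise: Fisher's exact test on a
# hypothetical pair of ads from a final round of ~10 survivors.
from scipy.stats import fisher_exact

clicks_a, imps_a = 6, 10_000   # ad A: 6 clicks on 10K impressions
clicks_b, imps_b = 3, 10_000   # ad B: half the CTR of ad A

_, p_value = fisher_exact([[clicks_a, imps_a - clicks_a],
                           [clicks_b, imps_b - clicks_b]])
print(f"p = {p_value:.2f}")
# p comes out around 0.5, nowhere near significance, even though ad A has
# double the clicks. You'd need far more impressions (or a much bigger CTR
# gap) before cutting ad B is a defensible call.
```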

OK - I shut up & listen now. thx
 


You have too many variations.

The concept of high-volume multivariate testing is sexy, but as you're finding out, it doesn't work unless your volume is really high.

Go old school and do A/B testing, find a winner, then test C against the winner of the first contest. Make sure you only change one significant thing each time.
 
As far as I've learned so far, the real answer to this is in developing your own testing methodology. I guess this depends a lot on your ad channel, and it'd generally be a lot more time-consuming to create 500 different ad copies for an LP than to create 500 ad copies on Facebook.

When split testing, I try to do it in rounds to isolate the different variables: headline, body, image, demographics. Yeah, it's possible that a certain headline will work better when combined with a certain image, but to try to test every combination of all those is usually impractical if only because it would get stupidly expensive. Initially, the best approach is to create a smaller set of ads, maybe like 10 at a time, which are radically different from each other. The winners will give you a hint as to which direction to continue in.

If anybody knows of specific testing methodologies you can use for all kinds of statistical tests, let me know, because I haven't been able to find anything, and that makes me think I'm overlooking something.

pz dudz
 
I find A/B testing works best. Starting with 4 isn't a terrible idea, but narrow it down to the best two, then keep tweaking the weaker of the two until it beats the first. Then you tweak the former best one until it regains its throne...ad nauseam, ad infinitum.