How To: See if Your Split Test Data is Statistically Significant

aeisn

May 6, 2009
Let me preface this by saying I'm far from a math genius; this is just stuff I picked up reading around on the internet. So if you're a math whiz and any of what I say is incorrect, please let me know so I can fix it. I also apologize if everyone already learned all this in AM kindergarten while I was eating sand.

There have been quite a few threads on WF asking how many clicks/views/conversions are needed before you know whether an offer/ad/landing page sucks. Well guess what: with a little bit of math you can take out the guesswork and know exactly how much data you need before you can call a campaign absolute shit or the stone cold nuts.

Let's say we have two landing pages. LP1 gets 10,000 views and 70 conversions for a 0.7% conversion rate. LP2 gets 5,000 views and 120 conversions for a 2.4% conversion rate. But do we have enough data to say LP2 is better?

Let's find out: go to the Calculator for confidence intervals for odds ratio unmatched case control study and fill out the form like this:

[Screenshot: the calculator form filled in with LP1's 70 conversions / 9,930 non-conversions and LP2's 120 conversions / 4,880 non-conversions]


Click Calculate Results and we get...

[Screenshot: the calculator's results]


So what exactly do these results tell us?

Odds ratio OR = 0.2867

Basically this tells us LP1 does 28.67% of the conversions LP2 does. (Strictly speaking it's the ratio of the odds, 70:9,930 vs 120:4,880, but at conversion rates this low the odds and the rates are nearly the same thing.) Interesting, but we kind of already knew that.

95% confidence interval = from to 0.2131 to 0.3858

This is the important number. It tells us that if you ran this test forever, LP1 would do between 21.31% and 38.58% of the conversions LP2 does. So even if you left your campaign running until the sun burnt out, LP1 would still be a shitty landing page. We can say with 95% confidence that LP2 is better.

If the confidence interval includes 100% (i.e., the range crosses 1), we don't have enough data and can't be sure which landing page is better. For example, if the range were instead 0.92 to 1.03, that would mean LP1 does between 92% and 103% of the conversions LP2 does; in other words, with that amount of data we can't tell whether LP1 will end up doing worse or better than LP2.
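
If you'd rather not use the website, the math is simple enough to run yourself. Here's a short Python sketch of the standard log-odds-ratio ("Woolf") confidence interval; I can't swear that's the exact method the calculator uses, but it reproduces the numbers above to rounding:

```python
import math

def odds_ratio_ci(conv1, views1, conv2, views2, z=1.96):
    """Odds ratio of LP1 vs LP2 with a confidence interval (z=1.96 -> 95%)."""
    a, b = conv1, views1 - conv1   # A = conversions, B = non-conversions
    c, d = conv2, views2 - conv2
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(70, 10_000, 120, 5_000)
print(f"OR = {or_:.4f}, 95% CI = {lo:.4f} to {hi:.4f}")
# -> OR = 0.2867, 95% CI = 0.2131 to 0.3857 (matches the calculator to rounding)

if lo > 1 or hi < 1:
    print("Interval excludes 1: one page is genuinely better.")
else:
    print("Interval includes 1: not enough data yet, keep testing.")
```

The check at the end is just the "does the interval cross 100%" test from the previous paragraph, done in code.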

Sometimes people use a 90% confidence interval instead of 95%. Scientists IRL go off a 95% confidence interval, though, so I tend to stick with that.

Keep in mind all these calculations assume all other variables are the same and the only difference is the two landing pages.

I think that covers everything, feel free to post questions if you have them.
 


I forgot to mention this in the original post: A is the number of conversions on the LP and B is the number of non-conversions. So A = conversions and B = impressions - conversions. (For LP1 that's A = 70 and B = 10,000 - 70 = 9,930.)
 
Good post. Another (simpler) tool is SplitTester.com.

It's also worth mentioning that while 95% confidence is exciting, you'll still be wrong 5 out of 100 times - something to keep in mind. I've had many disappointments with split/multivariate testing throughout the years once I've swapped my control for the winner. On the other hand, all it takes is one really good test and you can knock it out of the park.
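
To put a number on that 5-in-100 point, here's a rough simulation sketch (my own, reusing the same log-odds CI from the first post; random.binomialvariate needs Python 3.12+): give two identical pages the exact same true conversion rate and count how often the test still declares a winner.

```python
import math
import random

def ci_excludes_one(a, b, c, d, z=1.96):
    """True if the 95% CI for the odds ratio (a*d)/(b*c) excludes 1."""
    if 0 in (a, b, c, d):          # zero cell: CI undefined, call it inconclusive
        return False
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    log_or = math.log((a * d) / (b * c))
    return math.exp(log_or - z * se) > 1 or math.exp(log_or + z * se) < 1

random.seed(1)
views, rate, trials = 10_000, 0.01, 10_000
false_wins = 0
for _ in range(trials):
    # both "pages" share the exact same true 1% conversion rate
    conv1 = random.binomialvariate(views, rate)   # requires Python 3.12+
    conv2 = random.binomialvariate(views, rate)
    if ci_excludes_one(conv1, views - conv1, conv2, views - conv2):
        false_wins += 1

print(f"{false_wins / trials:.1%} of tests declared a winner that wasn't there")
# -> right around 5%, i.e. the 5-in-100 wrong calls mentioned above
```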
 
I wouldn't put too much confidence in these tools. There are too many variables that aren't accounted for. What would be a 'shitty' landing page in one scenario could be a great one in another. It all depends on your ads, the users you're targeting, the frame of mind they're in at the time, the offer you're promoting, etc.

In the end you should retest any time another variable changes (e.g., traffic source). No formula can accurately tell you LP A is 'better' than LP B.
 
Good post. Another (simpler) tool is SplitTester.com.

It's also worth mentioning that while 95% confidence is exciting, you'll still be wrong 5 out of 100 times - something to keep in mind. I've had many disappointments with split/multivariate testing throughout the years once I've swapped my control for the winner. On the other hand, all it takes is one really good test and you can knock it out of the park.

Thanks for the link, it's a lot simpler/more suited for AM than the tool I posted.
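
For anyone who wants the same kind of quick read without either website, a standard two-proportion z-test is a reasonable stand-in (to be clear, I'm not claiming that's exactly what SplitTester runs):

```python
import math

def two_proportion_z(conv1, views1, conv2, views2):
    """z-score for the difference between two conversion rates."""
    p1, p2 = conv1 / views1, conv2 / views2
    pooled = (conv1 + conv2) / (views1 + views2)   # combined conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / views1 + 1 / views2))
    return (p1 - p2) / se

z = two_proportion_z(70, 10_000, 120, 5_000)   # the LP1/LP2 example above
print(f"z = {z:.2f}")
# -> z = -8.78; |z| > 1.96 means significant at the 95% level
```

Same verdict as the odds-ratio interval: LP2 wins, and it isn't close.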
 
I wouldn't put too much confidence in these tools. There are too many variables that aren't accounted for. What would be a 'shitty' landing page in one scenario could be a great one in another. It all depends on your ads, the users you're targeting, the frame of mind they're in at the time, the offer you're promoting, etc.

In the end you should retest any time another variable changes (e.g., traffic source). No formula can accurately tell you LP A is 'better' than LP B.

This post seems to seriously misunderstand the point of split testing. You can test well or you can test poorly; all the tool does is calculate the confidence interval using accepted scholarly statistical methods. If you disagree with the confidence calculation, you must disagree with the ability to measure things at all.

You certainly need to make sure you understand your test, e.g., that you don't attribute a good conversion rate to a good LP when it was really that you ran it in December instead of February and got all the Christmas sales.

This tool is cool for sure. +rep.
 
You certainly need to make sure you understand your test, e.g., that you don't attribute a good conversion rate to a good LP when it was really that you ran it in December instead of February and got all the Christmas sales.

Absolutely... for any given traffic source, landing pages must be tested at the same time.
 
Doesn't Google Website Optimizer have the same calculation (degree of confidence in results)?

I'm running a split test right now: dynamic keyword insertion on the LP vs. not.