Any practical tips on statistical significance?

Here is another way to look at this data:

Code:
K34.B,CC.A    .13289 61/459    .10675 49/459
K34.B,CC.B    .20487 84/410    .16829 69/410

Code:
CC.A
4
6
36
7
38
25
6
3
4
2
11
7
1
10
4
10
7
27
13
3
8
4
11
1
6
1
7
1
18
6
2
5
9
12
24
4
6
8
15
1
8
3
10
5
8
1
20
8
1

Code:
CC.B
3
3
4
9
7
4
1
7
12
10
20
5
3
3
2
6
5
7
12
1
4
6
14
4
1
16
3
15
7
4
14
15
9
6
4
2
1
11
7
2
4
25
4
1
10
5
5
3
1
3
6
2
4
2
4
8
1
3
1
2
3
3
3
2
2
6
5
3
13

It shows how many visitors it took to make the "next" conversion.

Basically, the counter starts at zero and counts each visitor. When a conversion occurs, it records how many visitors the page received since the last conversion, stores that number, and resets the counter.

It's not based on actual time; it uses the visitor identifier in cookies to establish the "timeline".

That gives you a distribution (standard deviation and a few other properties) to play with.
But I couldn't figure out a good way to use it.
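
For what it's worth, here's a minimal Python sketch of the kind of properties you can pull out of those gap lists. The lists below are truncated to the first ten values of each column above (paste in the full data to get real numbers), and this is just one way to read the gap data, not necessarily what the tracking script itself does:

Code:
import statistics

cc_a = [4, 6, 36, 7, 38, 25, 6, 3, 4, 2]      # first ten values of CC.A above
cc_b = [3, 3, 4, 9, 7, 4, 1, 7, 12, 10]       # first ten values of CC.B above

for name, gaps in (("CC.A", cc_a), ("CC.B", cc_b)):
    rate = len(gaps) / sum(gaps)              # conversions per visitor implied by the gaps
    print(name,
          "mean gap:", round(statistics.mean(gaps), 2),
          "stdev:", round(statistics.stdev(gaps), 2),
          "implied conv rate:", round(rate, 4))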
 


Bottom line - if you are wrong 1 out of 20 times you'll still be making a lot of progress. 19 steps forward, 1 step back, 19 steps forward, 1 step back. Get it?
 
Test to 95% and move on with your life, fuck.

LOL. What makes you think I'm not moving on with my life?

Do you think I actually stopped all my marketing over this?
 
Do you have any tips on dealing with the fundamental flaw in all estimates of statistical significance when split testing?

Let's say you aim for 95% confidence level. It means that by definition 1 out of 20 tests will yield pretty much a random result.

I'm not sure "random result" is the best way to look at it.

If you're running an A/B test of a control vs. a test page, you start with the null hypothesis that the conversion rates of A and B are the same. At a 95% confidence level, roughly one test in 20 where there is no real difference will still come back "significant" purely by chance, so you think the test page beats the control when in fact it doesn't. The flip side, missing a genuine winner, is governed by your sample size (power) rather than by the confidence level.

All that's happening in those cases is that the visitors' behavior lands in the tails of the distribution, and just by luck one of the pages looks better or worse than it really is.

So sometimes you crown a "winner" that isn't really better, and sometimes you can't tell the difference between your current page and your test page, throw the test page out, and forgo the marginal income.

As for what to do, you don't get much choice but to accept it, raise your confidence level, or run a bigger sample. It would seem that if you've got a real big winner on your hands the odds are better that you'll find it than if the two pages are close.
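
To see the 1-in-20 figure concretely, here's a rough Python simulation (the 10% rate and 1,000 visitors per branch are arbitrary assumptions): it runs a pile of A/A tests where both pages truly convert at the same rate and counts how often a two-proportion z-test at 95% still declares a "winner".

Code:
import random, math

def z_stat(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return abs(p_a - p_b) / se if se > 0 else 0.0

random.seed(1)
false_positives = 0
trials = 2000
for _ in range(trials):
    n = 1000                                            # visitors per branch
    a = sum(random.random() < 0.10 for _ in range(n))   # both branches truly convert at 10%
    b = sum(random.random() < 0.10 for _ in range(n))
    if z_stat(a, n, b, n) > 1.96:                       # two-sided 95% threshold
        false_positives += 1

print("false positive rate:", false_positives / trials)  # should hover around 0.05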

That's my take on it.

Sean
 
It sounds like what you want to do is factor in the confidence interval.

For an explanation of the confidence interval and how to use it together with the confidence level, you can check out this surveysystem page. Ignore/skip the calculator at the beginning of the page (that one isn't tailored to a split test and is very limited in the values it accepts, but the explanation itself is useful).

You can use this split test sample size calculator, which is more focused on split tests/conversions, to see what I mean and see the difference that it makes.

By adjusting the confidence interval in addition to the confidence level, you can estimate how much traffic to allocate to each segment to make sure that the +/- margin is factored in.
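
For the curious, the arithmetic behind those calculators is roughly the standard n = z^2 * p * (1 - p) / E^2. Here's a back-of-the-envelope Python version; the 10% baseline rate and the margins are assumptions, and real calculators for comparing two rates are a bit more involved:

Code:
import math

def sample_size(p, margin, z=1.96):          # z = 1.96 for a 95% confidence level
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

baseline = 0.10                              # assumed baseline conversion rate
for margin in (0.02, 0.01, 0.005):           # the +/- interval you want around the estimate
    print(f"+/-{margin:.3f} -> {sample_size(baseline, margin)} visitors per branch")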
 
The point is we don't know shit.

Think about it: were you ever able to continuously increase the conversion rate of any sales page? Or did you get stuck at some point?

And when you did get stuck, what did it look like?
Were you constantly finding better versions and constantly updating your control, yet for some reason the overall (historic) conversion rate didn't continue to increase?

Sound familiar?

It's as if you are constantly improving (according to each test you run), yet that overall improvement doesn't materialize on the greater scale. Why is that?

You have to distinguish between the historical conversion rate and the last-test conversion rate. The historical conversion rate is influenced by all the previous tests.
Let's say you started the campaign with a conversion rate of 8.9%. After x number of tests you managed to arrive at 10% (after 3,000 conversions).
On test x+1 you get a better conversion rate of 10.4% (it took 100 conversions to reach statistical significance). From then on your conversion rate will be 10.4%, but the historical conversion rate will stay closer to 10%. If you stop testing, you'll see that after another 2,000 leads the overall conversion rate comes closer to 10.4%.
I never check the overall conversion rate - the last-test conversion rate is the real one.
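
To put numbers on that, here's a quick Python sketch using the figures from the example above (the visitor counts are back-calculated assumptions): it shows how the overall historical rate lags behind the 10.4% the new control is actually doing.

Code:
hist_conversions = 3000                         # conversions banked before test x+1
hist_rate = 0.10                                # blended rate at that point
hist_visitors = hist_conversions / hist_rate    # roughly 30,000 visitors, back-calculated

new_rate = 0.104                                # rate of the new control going forward

for extra_conversions in (100, 500, 2000, 10000):
    extra_visitors = extra_conversions / new_rate
    overall = (hist_conversions + extra_conversions) / (hist_visitors + extra_visitors)
    print(f"after {extra_conversions:>5} more conversions: overall rate = {overall:.4f}")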
 
Trust your gut. If your tests aren't giving you good data, increase the size of your tests until you feel you have data you can be confident in. You might want to run your statistical tests for at least a full day, as I find variation in CTR and CR depending on the time of day (it differs from product to product). There are other factors you might want to control for as well, such as traffic source and demographic. Statistical significance should be fairly accurate if you control for all but one variable, but you really can't control that much (though you can sure as hell try).
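
If you want to check that time-of-day variation before trusting a test, one simple approach is to bucket your raw visit log by hour and eyeball the rate per bucket. A hypothetical Python sketch (the log format and sample rows are made up, not anything from this thread):

Code:
from collections import defaultdict
from datetime import datetime

visits = [                                   # (timestamp, converted?) - hypothetical rows
    ("2011-03-01 09:14", False),
    ("2011-03-01 09:40", True),
    ("2011-03-01 21:05", False),
    ("2011-03-01 21:22", False),
]

buckets = defaultdict(lambda: [0, 0])        # hour -> [visitors, conversions]
for ts, converted in visits:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
    buckets[hour][0] += 1
    buckets[hour][1] += int(converted)

for hour, (n, c) in sorted(buckets.items()):
    print(f"{hour:02d}:00  visitors={n}  conv rate={c / n:.2%}")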

Remember - there are lies, damned lies, and statistics.
 
thebigspender, that's an interesting point.

But I was thinking about a moving average, or simply taking some recent time interval. Even then it won't keep improving indefinitely.

So if you throw away your historic data and just look at recent (enough for statistical significance) numbers, you'll still notice that at some point you won't be making any more progress.

To continue your example, your conversion rate is 10.4%.

Then you run your next test and get 10.4% vs. 10.7%.

Then the next test, 10.7% vs. 11.2%.

and so on.

Of course, I'm assuming an ideal world where your conversion rate stays the same and doesn't fluctuate because of sampling variance.

But even then, you wouldn't be able to go on indefinitely.

At some point, you'll have your control 11.2% and new test 11.6%.

Once you make it your new control, it won't maintain 11.6%, but will fall back to 11.2% (possibly lower).

In other words, any further "improvements" will just be Type I and Type II errors, not true results.
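
That fall-back effect is easy to reproduce in a quick simulation. Here's a Python sketch (true rate, visitors per branch and number of trials are all arbitrary assumptions): when two pages actually convert at the same rate, the one that happens to "win" has, on average, measured a rate above the truth, and it drifts back once you keep running it as the control.

Code:
import random

random.seed(2)
TRUE_RATE = 0.112                 # assumed true rate for both control and challenger
N = 5000                          # visitors per branch per test

winner_measured = []
for _ in range(500):
    a = sum(random.random() < TRUE_RATE for _ in range(N)) / N
    b = sum(random.random() < TRUE_RATE for _ in range(N)) / N
    winner_measured.append(max(a, b))         # measured rate of whichever page "won"

print("true rate:               ", TRUE_RATE)
print("avg measured winner rate:", round(sum(winner_measured) / len(winner_measured), 4))
# The winner's measured rate averages above 0.112, but a fresh run of the same page
# lands back around 0.112 - the "new control falls back" effect.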
 
Trust your gut. If your tests aren't giving you good data, increase the size of your tests until you feel you have data you can be confident in.

That's what I normally do. I'm just interested in theory behind it and what people do in practice.

Remember - there are lies, damned lies, and statistics.

Yep. And among the top business models in the world are banking, insurance, and gambling -- all rely on statistics :)