What To Do When Back to Back Split Tests Conflict?

Alejandro

The HOTH - Google Me.
May 21, 2009
So we run this split test on our ecomm site. 2 sidebars.

Sidebar #1 (control) = All graphics, explains 4 reasons why customers should buy from us, takes up more room, more "flashy." Headline "Let Us Hook You Up!"

Sidebar #2 = Typographic, has 5 reasons why customers should buy from us, including a Flash widget that shows all positive reviews of our site from ResellerRatings.com. Very clean & professional. More compact. Direct Headline "5 Reasons to Choose Us"

I'm testing using Optimizely.

Split test #1 shows sidebar 2 increasing sales by a whopping 100%+, with 96.4% confidence.

I'm jumping for joy, running around holding my ass cheeks with both hands.

I think to myself "this is too good to be true."

I talk to my sales rep, who tells me there were a handful of transactions that she placed on behalf of customers who called in. I think to myself "fuck, so much for clean data." Only a couple of transactions were tainted, but in the interest of making a clean, data-based decision, I decide to run the test again, this time tracking more variables, such as the number of people who clicked to chat with us (very important in our space because we can give discounts over chat that we can't give elsewhere).

So I run the same test again, this time filtering out our sales rep's IP. For the first 2 days, sidebar 2 is killing it just like the first time. Then sidebar 1 unexpectedly catches up and takes the lead. The test has been running for ~2 weeks now, and it's showing that sidebar 2 is producing 25% fewer sales than the control, at 75% confidence. Also, I'm seeing sidebar 2 perform significantly worse for certain key chat buttons. Overall, sidebar 2 is getting 14.9% fewer chats; however, this is only 85% significant. The only metric that is >99% confident is "engagement" (aka clicks on the page), for which sidebar 2 is winning by 7%.
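
For anyone who wants to sanity-check numbers like these, here's a rough two-proportion z-test sketch (Python; the visitor and sale counts in it are made-up placeholders, not my real data). It's basically the same math that sits behind the "confidence" figure in the testing tools, and it shows how much that figure swings when total sales are low.

```python
# Rough two-proportion z-test, the same kind of math behind the "confidence"
# number in split-testing tools. The visitor/sale counts below are
# placeholders, not real data -- plug in your own.
from math import sqrt, erf

def confidence(visitors_a, sales_a, visitors_b, sales_b):
    """Two-sided confidence (0-1) that the two conversion rates differ."""
    p_a = sales_a / visitors_a
    p_b = sales_b / visitors_b
    p_pool = (sales_a + sales_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = abs(p_a - p_b) / se
    return erf(z / sqrt(2))

# Hypothetical: 1,000 visitors per sidebar, 20 sales vs. 15 sales.
print(confidence(1000, 20, 1000, 15))  # ~0.61 -- nowhere near significant
```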

Where I'm at now

I'm honestly not sure what to do right now.

Should I keep letting this test run? If so, for how long? It's already been 3 weeks across the first 2 tests.

Should I restart this test using a new tracking platform (e.g. Visual Website Optimizer)? I suppose this could act as a tiebreaker.

I'm sure I'm not the first person to experience something like this.
Any wisdom is welcome.

Thanks in advance,

-AP

P.S. Here are some tits for your troubles

 


Length of time you've been running the test is relevant, but more importantly, how many page views and sales are you talking about? Also, is one getting more people to the order page than the other, so maybe you can improve abandonment there? If you think you have enough data, run your winner but keep testing other options with 10-20% of your traffic. You can pretty much always improve your stats and might be surprised by what works.

How about traffic source? Is it relatively the same traffic across your test and consistent with what you typically see?

Not sure if that's helpful at all, but just some thoughts off the top of my head...
 
I would split test more, and not just test 2 pages. Do 1 variant of the graphic with different colors. From my experience, people click based on color. You see this a lot with the 110x80 ads out there right now with colored borders; you see different people testing different colors.

Also, as stated above, length of time is relevant. I would make sure that you have tested both for a period of 1 month. Some products do great only on the first 3 days of the month (when disability and welfare checks come out) and the other 27 days of the month suck. Also check times of day; perhaps one of the sidebars does better at a certain time of day.
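
If you want a quick way to eyeball the time-of-day thing, something like the Python below works. It assumes you can export your orders to a CSV with a timestamp and which sidebar the visitor saw; those column names are made up, so adjust them to whatever your cart or Optimizely export actually gives you.

```python
# Quick-and-dirty breakdown of orders by sidebar variation and hour of day.
# Assumes a CSV export with hypothetical columns "timestamp" and "variation".
import csv
from collections import Counter
from datetime import datetime

by_hour = Counter()
with open("orders.csv", newline="") as f:
    for row in csv.DictReader(f):
        hour = datetime.fromisoformat(row["timestamp"]).hour
        by_hour[(row["variation"], hour)] += 1

for (variation, hour), count in sorted(by_hour.items()):
    print(f"{variation}  {hour:02d}:00  {count} orders")
```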

I've gone mad split testing ads. It gets even worse when it's actually working. There comes a point in time with a website (not ads) when you should just keep something. With ads, I've found there is definitely truth to banner blindness.

Just my .02. Good job man!
 
Split test #1 shows sidebar 2 increasing sales by a whopping 100%+, with 96.4% confidence.

How many actions for each variation and what were the conversion rates?
It sounds like this test didn't run long enough despite the 96.4% confidence.
Take that # with a grain of salt, especially with low total sales volume.

I've had things like this happen to me ALL the time, only to let the test run longer and have it even out or perform worse.
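
If it helps, here's a back-of-the-envelope way to see how many visitors per variation you'd need before a lift of a given size is even detectable. Python sketch using the standard two-proportion sample size formula; the baseline conversion rate and the lift are assumed values picked purely for illustration, so swap in your own numbers.

```python
# Sample size per variation for 95% confidence / 80% power, using the
# standard two-proportion formula. Baseline rate and relative lift are
# assumed values for illustration only.
from math import ceil, sqrt

def sample_size(baseline, lift, z_alpha=1.96, z_beta=0.84):
    p1 = baseline
    p2 = baseline * (1 + lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# e.g. a 2% baseline conversion rate and a 25% relative lift to detect
print(sample_size(0.02, 0.25))  # ~13,800 visitors per variation
```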
 
Thank you guys so much for responding. Mega & Dombo, I PM'd both of you guys with the data you asked about. Looking forward to hearing back from you both.

Shawn - your ideas are very good.

I like the idea of testing different colors. It's kind of weird for us because we use the sidebar space as a "selling" space, kind of the way a CPA landing page would have bullet points listing the benefits. With that said, we don't necessarily want more people to click on the contents; we want them to be convinced to patronize the site as opposed to other choices out there.

Regarding running tests for a month, in theory this is great, but it seems way longer than the time spans I hear most people running tests for. It just seems inefficient, no? We sell consumer electronics, so we do see slowdowns when rent is due & upticks on paydays, but it seems like 2 weeks should be solid if the data says so.

Open to hearing your thoughts on this.