Here is how to protect your forum from XRumer, at least for now



You do know XRumer can skip the divs with style="display:none;", right?
I don't know anything about that software from the user's perspective. That's what makes it interesting.

But if it can deal with display:none, can it deal with content hidden by other means? Like content placed behind an image positioned on top of it, or a 1px font in the same color as the background?

Or css width:0 height:0 for input elements?
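
Roughly the kind of trap being talked about, as a sketch (the "email_confirm" name is made up): a zero-sized input a real browser never shows, plus a server-side check that it comes back empty.

Code:
<?php
// Sketch only: "email" is the real field, "email_confirm" an invented trap.
// A human never sees or fills the trap, so any value in it means a parser
// blindly filled in every input it found.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    if (!empty($_POST['email_confirm'])) {
        die('Bot detected.');
    }
    // ...handle the real "email" value here...
}
?>
<form method="post">
    <div>Email: <input type="text" name="email"></div>
    <!-- trap: collapsed with width/height 0 instead of display:none -->
    <div style="width:0; height:0; overflow:hidden;">
        Email (confirm): <input type="text" name="email_confirm" style="width:0; height:0; border:0;">
    </div>
    <input type="submit" value="Register">
</form>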
 
You planning on leaving the form up and working so I can finish this code, my friend?

Sure. It's on my laptop. So as long as I'm looking at boobs, the form is working :)
 
Always code against lameness first. That will solve 99% of your problems.
You're right. But we're not talking about a production system here. It's just a form that goes nowhere. Why spend time coding for cases that aren't "interesting"?

But overall, I agree. Such "use cases" should be considered and dealt with right away.
 
Add style="position:absolute; left:-2000px; top:-2000px;" to the mix while you are at it.
 
One more case, using an image for submit button and checking that x and y are not 0.
Can XRumer handle that too?
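
The check itself is only a couple of lines, roughly like this (the button name "go" is made up). Clicking an image-type submit makes the browser send the click coordinates, which PHP exposes with underscores instead of dots; a blindly replayed POST tends to omit them or send 0,0.

Code:
<?php
// Sketch: assumes the form's submit button is <input type="image" name="go" src="go.png">.
// A real click sends go.x / go.y; PHP turns the dots into go_x / go_y.
$x = isset($_POST['go_x']) ? (int) $_POST['go_x'] : 0;
$y = isset($_POST['go_y']) ? (int) $_POST['go_y'] : 0;

if ($x === 0 && $y === 0) {
    die('No real click on the submit button.');
}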
 
One more case, using an image for submit button and checking that x and y are not 0.
Can XRumer handle that too?

That I've never tried.

Also, you should see some failed attempts in the queue. If I weren't so tired and could regex better than a 12-year-old I'd have it done, but the filtering for the ID# was simple in PHP, just FYI.

Code:
// $hello holds the raw HTML of the fetched registration page. Keep only the
// tags we care about; everything else, including HTML comments, gets stripped.
$hello = strip_tags($hello, '<form><div><html><body><input>');

// Split on the markup right before the real field's name attribute...
$asdf = explode("<div>Password (confirm): <input type=\"password\" name=\"", $hello);

// ...then cut at the closing quote to isolate the generated name.
$fdsa = explode("\"></div>", $asdf[1]);

echo "Good ID: {$fdsa[0]}";
I just didn't do the filtering for the other ID#s in the POST to finish the script up, but you get the idea. If you shift it over to CSS visibility, it would be harder. Not impossible, but harder.

But yea, at this point it can still be done fairly easily for someone that does this a bit.
 
But yea, at this point it can still be done fairly easily for someone that does this a bit.
You see, that's the problem with software.
I wasn't sure how it works. If it actually renders the page using IE's engine (which I'd assume, since it's Windows-based), then it could be coded to ignore anything that's not visible.

But if the software parses html and then makes a post request, then it's much easier to protect against it.

I can easily randomize the names of css classes and insert comments randomly and do a lot of other stuff that makes parsing and regexing out almost impossible.
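
For example, a rough sketch of doing the same to the field names themselves (the naming scheme and session keys here are made up): the real name changes on every visit, so a regex written against a saved copy of the template has nothing stable to latch onto, and only the server knows which submitted value counts.

Code:
<?php
// Sketch only: field naming scheme and session keys are invented.
session_start();

$_SESSION['real_field'] = 'f_' . md5(uniqid('', true));   // remember the real name
$_SESSION['trap_field'] = 'f_' . md5(uniqid('', true));   // and a decoy

echo '<!-- ' . md5(uniqid('', true)) . " -->\n";   // random comment noise
echo '<div>Password (confirm): <input type="password" name="'
   . $_SESSION['trap_field'] . '" style="display:none;"></div>' . "\n";
echo '<div>Password (confirm): <input type="password" name="'
   . $_SESSION['real_field'] . '"></div>' . "\n";

// On submit: accept only if $_POST[$_SESSION['real_field']] is filled
// and $_POST[$_SESSION['trap_field']] stays empty.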
 
I can easily randomize the names of classes and insert comments randomly and do a lot of other stuff that makes parsing and regexing out almost impossible.

Keep in mind, this one little line killed -all- of your comments, no matter how fucked up they were. The only way you're gonna get around that is including the shit via a 3rd file that's included through javascript, which once again, can be bypassed:

Code:
// strip_tags with an allow-list: everything not listed, including HTML comments, is removed.
$hello = strip_tags($hello, '<form><div><html><body><input>');
 
Oh yeah, I remember in the demo, the guy had like 40 threads running.
So it definitely doesn't render pages. 40 concurrent renderings would pretty much slow a win box down to a crawl. You wouldn't even be able to move your mouse.

It must be simply fetching html and parsing it.
 
Oh yeah, I remember in the demo, the guy had like 40 threads running.
So it definitely doesn't render pages. 40 concurrent renderings would pretty much slow a win box down to a crawl. You wouldn't even be able to move your mouse.

It must be simply fetching html and parsing it.

Don't forget, there are command-line rendering engines for languages like Ruby that can execute JS ;)

That's all I'm going to say.

Edit: and now I'm going to bed, bai
 
including the shit via a 3rd file that's included through javascript

Actually, now that I think about it, that would be a bad idea.
It would probably cause "flicker" for the user if the browser starts rendering before all external files are loaded.
 
Oh yeah, I remember in the demo, the guy had like 40 threads running.
So it definitely doesn't render pages. 40 concurrent renderings would pretty much slow a win box down to a crawl. You wouldn't even be able to move your mouse.

It must be simply fetching html and parsing it.

40 is nothing compared to what it can really handle.
 
I removed the comments because it's really easy to strip them out. I agree.

When you are back, look at the source again.

Looking at it, you can easily figure out which one is the real field and which ones are traps.

So programmatically it wouldn't be hard to "pick out" the correct one, provided you can work with a DOM tree. With just a parser it would be pretty hard to do.
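
For illustration, the DOM-tree route looks roughly like this (assuming the traps are hidden with inline styles only; anything hidden through an external stylesheet slips right past this check):

Code:
<?php
// Sketch of the DOM-tree approach: walk the parsed inputs and skip the ones
// whose inline style hides them. $html is assumed to hold the fetched page.
$doc = new DOMDocument();
@$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('input') as $input) {
    $style = str_replace(' ', '', strtolower($input->getAttribute('style')));
    if (strpos($style, 'display:none') !== false ||
        strpos($style, 'visibility:hidden') !== false ||
        strpos($style, 'left:-') !== false) {
        continue;   // looks like a trap
    }
    if ($input->getAttribute('type') === 'password') {
        echo 'Candidate real field: ' . $input->getAttribute('name') . "\n";
    }
}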
 
I've had it up to 500, baby purrrr
I don't think you understood what I said.

If there were a real rendering engine running underneath, then it wouldn't scale.
That made me conclude that it's just a parser. Specifically because it scales.
 
Can your desktop (or an average rented server for that matter) handle RENDERING 500 html pages at the same time? They don't need to be displayed like in a browser, but they still need to be rendered in an off-screen buffer.

Anything else is just parsing.
 
Can XRumer access cookies? (I wasn't able to determine that from a quick Google search, and I have only a passing curiosity in this experiment.)

Edit: doesn't matter really, what I was batting around in my mind would force the user/bot to be funneled through a page before getting to sign up.
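
Roughly the kind of funnel that would do it, as a sketch (file names and the session key are assumptions): the page in front of the form drops a token into the session, and the registration script refuses anyone who arrives without it. Of course it only filters out clients that don't keep cookies between requests.

Code:
<?php
// funnel.php (sketch): hand out a one-shot token before linking to the signup form.
session_start();
$_SESSION['signup_token'] = md5(uniqid('', true));
echo '<a href="register.php">Continue to registration</a>';

Code:
<?php
// register.php (sketch): no token in the session means the funnel was skipped.
session_start();
if (empty($_SESSION['signup_token'])) {
    die('Please start from the front page.');
}
unset($_SESSION['signup_token']);   // token is one-shot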
 
Edit: doesn't matter really, what I was batting around in my mind would force the user/bot to be funneled through a page before getting to sign up.

It already does that. At least based on what I see in my logs, it hits some page, then the registration page with the same cookies.