Is this CAPTCHA breakable?

Ahh shit. Sorry, I completely misunderstood the last post; I thought you were criticizing me for being lazy. I didn't know what ` meant in PHP since I'm not a PHP coder. Sorry I came across as a dick. I've been drinking; it's been a Sunday and all.
 


Sorry if I'm late to the party. This PHP code converts the HTML captcha to a regular image.

Since there’s consistent, equal spacing and no noise or distortion in the image, I’m sure that if you sat down and mapped out each character, you could convert the captcha to plain text without even converting it to an image and running it through OCR software.



Code:
<?php
//Scrape the registration page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
 
preg_match("/<div class='lynkx-captcha'>(.+?)<\/div>/", $data, $captcha_data);
 
$horizontal_lines = explode('<br/>', $captcha_data[1]);
 
 
//Convert HTML captcha to simple 0/1 
foreach($horizontal_lines as $i => $line){
	$line = preg_replace('/<b style=\"background\:(.+?)\"><\/b>/', '1', $line);
	$line = str_replace('<b></b>', '0', $line);
	$horizontal_lines[$i] = $line;
}
 
 
//Create image
$img = imagecreatetruecolor(70, 23);
 
//Fill image with white background
$white = imagecolorallocate($img, 255, 255, 255);
imagefill($img, 0, 0, $white);
 
$black = imagecolorallocate($img, 0, 0, 0);
 
//Go through line by line and paint any '1' as a black pixel
foreach($horizontal_lines as $x => $line){
$x++;
 
	$line_length = strlen($line); //Find out the horizontal length
 
	for($y=1;$y<70;$y++){
	if($line[$y] == "1"){
	imagesetpixel($img,$y,$x,$black);
	}
	}
 
}
 
 
//Output image
header('Content-Type: image/png');
imagepng($img);
?>
 
Very nice, esrun. Probably the best first post anyone has ever made here.

Cheers :) I've been reading through the forums now and then. Looking for inspiration lol. I just hadn't really come across something where I felt I had anything worth saying!

Are you still intending to take a crack at it? I'd like to see someone convert the HTML to regular text without using OCR software. If I had a use for it, or if the captcha were more widely used, I'd try. But as it is, it's just one page using the captcha and I don't even know how badly the guy wants to get around it :p
 
That's pretty much my thinking too. If this were a popular captcha I'd do it, but as it stands, it would just be an exercise in coding. What I'd do is create a translation array for each character, plus a function to convert the grid of points into a multi-dimensional array. Then break apart the letters, run them through the translation array, and you should have your captcha.

Quite frankly, your solution is probably just as good or better than what I was thinking of doing.
 
Impressive, esrun! My host couldn't install wkhtmltopdf on my FreeBSD system, so I've been toying around with your solution, but I can't figure it out.

I changed it to this, but get no output. Am I doing something wrong?

Code:
<?php
//Scrape the registration page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
 
preg_match("/<div class='lynkx-captcha'>(.+?)<\/div>/", $data, $captcha_data);
 
$horizontal_lines = explode('<br/>', $captcha_data[1]);
 
 
//Convert HTML captcha to simple 0/1 
foreach($horizontal_lines as $i => $line){
    $line = preg_replace('/<b style=\"background\:(.+?)\"><\/b>/', '1', $line);
    $line = str_replace('<b></b>', '0', $line);
    $horizontal_lines[$i] = $line;
}
 
 
//Create image
$img = imagecreatetruecolor(70, 23);
 
//Fill image with white background
$white = imagecolorallocate($img, 255, 255, 255);
imagefill($img, 0, 0, $white);
 
$black = imagecolorallocate($img, 0, 0, 0);
 
//Go through line by line and paint any '1' as a black pixel
foreach($horizontal_lines as $x => $line){
$x++;
 
    $line_length = strlen($line); //Find out the horizontal length
 
    for($y=1;$y<70;$y++){
    if($line[$y] == "1"){
    imagesetpixel($img,$y,$x,$black);
    }
    }
 
}

$img = imagepng($img,'captcha.png');

$solved_captcha = shell_exec('gocr '. $_SERVER['DOCUMENT_ROOT'] .'/captcha.png');

echo $solved_captcha;

?>

By the way, the website the CAPTCHA is from is a social media site like Digg, but for my country. It takes about 40 votes to get to the front page; that's why I'd like to break this CAPTCHA ;-)
 
I think those functions use the GD library, so before you dig too deep, it's worth running phpinfo() to make sure it's included.
 
Yes, I know PHP & GD pretty well, but I'm a complete noob when it comes to shell commands and Linux programs... It probably has something to do with settings, but Google doesn't help me much either. Bummer!
 
Just to make sure we're all on the same page...

Did you try running the script exactly as I posted it? Was that successful?

If yes: GD, cURL, etc. are enabled and working fine on your server.
If no: make sure GD and cURL are enabled.
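If you'd rather not scroll through the phpinfo() output, a quick check like this reports whether the extensions are loaded. It's a minimal sketch: the extension names ('curl', 'gd') are assumed from the curl_* and image* functions the script calls, and the missing_extensions() helper is a hypothetical name used just for illustration.

```php
<?php
// Return the names from $required that are NOT loaded in this PHP build.
// extension_loaded() is case-insensitive, so 'gd' and 'GD' both work.
function missing_extensions(array $required): array
{
    $missing = [];
    foreach ($required as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

// Assumption: cURL and GD are the two extensions the captcha script depends on.
$missing = missing_extensions(['curl', 'gd']);
if ($missing) {
    echo 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
} else {
    echo 'GD and cURL are both loaded.' . PHP_EOL;
}
```

Run it from the same environment as the real script (browser vs. command line can use different php.ini files).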


Next: the change you made simply saves the image locally and tries to feed it through gocr.

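1) Did you check the path of the created image? Your script saves 'captcha.png' relative to the current working directory but passes $_SERVER['DOCUMENT_ROOT'] .'/captcha.png' to gocr, and those aren't necessarily the same place. When passing filenames to third-party apps, it's always worth at least trying the full path: /home/scito/public_html/captcha/captcha.png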

2) Is gocr installed on the server? It's fairly specialist software that isn't installed on shared hosting by default. Can they give you SSH access so you can test it from the terminal?


Since it's a social bookmarking site, you should also consider just filling out the captchas manually, like I did for my Reddit cheating post (http://www.esrun.co.uk/blog/cheating-reddit-auto-votes/).

You could spend days fussing around with this captcha stuff, or just 10 minutes filling out 100 of them manually.
 
On an interesting note, it looks like this method is becoming more popular. I just found this one: Whois Domain Name Lookup - Dynadot.com

It's interesting how they chose 4x4 blocks and added a little randomness to each block so the colour varies slightly.

It's pretty easy to overcome the color shit: just grayscale the image and then use a threshold to separate the pixels into light and dark.
 
Yes, every captcha is breakable: either with OCR, or by hiring a bunch of people in the Philippines or India to type them in.
 
I was reading up on HTML/text/ASCII-art style captchas, trying to get an idea of how popular they are, what methods they use to disguise the characters and so on. Although a lot of them are a long way from being as good as a skewed image captcha, they do seem to be getting there.

Here's the code to crack that Dynadot.com captcha:

Code:
<?php
/*
http://www.esrun.co.uk
*/

//Scrape the WHOIS page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.dynadot.com/whois.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7');
$captcha_data = curl_exec($ch);


$character_map_array = array(0=>'0110100110011001100110010110', 1=>'0010011000100010001000100111',2=>'0110100100010010010010001111',3=>'0110100100010110000110010110',4=>'0001001101011001111100010001',5=>'1111100011100001000110010110',6=>'0110100110001110100110010110',7=>'1111000100100100010010001000',8=>'0110100110010110100110010110',9=>'0110100110010111000110010110');


//Break the captcha into 4 individual characters
preg_match_all("/<table cellspacing=\"0\" cellpadding=\"0\">(.+?)<\/table>/s", $captcha_data, $characters);

//Go through each character
foreach($characters[0] as $i => $character){
	
	//Grab the background colors
	preg_match_all("/bgcolor=\"#(.+?)\" width=\"4\"/", $character, $output);
	
	
	foreach($output[1] as $pos => $hex_color){
		//Convert hex color to decimal
		$color_dec = hexdec($hex_color);
		
		//If the decimal value is at or below 8000000 (a dark cell), it's part of the character
		if($color_dec <= 8000000){
		$character_maps[$i].= '1';
		} else {
		$character_maps[$i].= '0';
		}
	
	
	}
	
	echo $character.'<br><br>';
}


//Compare the characters we mapped out above against previously stored character maps
foreach($character_maps as $character_map){
//	echo $character_map.'<br>'; //Uncomment this to see the character map and store it in $character_map_array
	$character_answer = array_search($character_map, $character_map_array);
	$captcha_answer.=$character_answer;
}

echo 'ANSWER: '.$captcha_answer;


?>

 

You're right, mediasup: if a human can read a captcha (which is the whole point), then it can be broken. However, there are many strong captchas out there that can't be (reliably) read with OCR software.

There are some methods people use to fight against outsourced/human based captcha breaking, such as:

  • Short expiry on the lifetime of the captcha
  • Flash-based captcha - harder to capture, usually has to be screen-scraped, which is a little slow and resource-heavy
  • Tie the captcha to the IP that fetched it. This means you'll need a reliable set of IPs
  • Limit the number of captchas that can be fetched from one IP address. This means you need multiple IPs for fetching the captcha as well as for the posting/signup/whatever
  • Require JavaScript support to render the captcha - again forcing people to screen-scrape
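As a rough illustration of the first and third points (short expiry and tying the captcha to the IP that fetched it), here's a minimal server-side sketch. It assumes PHP 7.4+; CaptchaStore and its methods are hypothetical names, and entries are kept in a plain array purely for illustration (a real site would keep them in $_SESSION, a database or a cache). Each entry is also single-use, so one solved captcha can't be replayed.

```php
<?php
// Hypothetical store for issued captchas: expected answer, fetching IP, issue time.
// Assumption: a plain in-memory array; a real site would persist this server-side.
class CaptchaStore
{
    private array $entries = [];
    private int $ttlSeconds;

    public function __construct(int $ttlSeconds = 120)
    {
        $this->ttlSeconds = $ttlSeconds;
    }

    // Record the expected answer for a freshly rendered captcha; returns its random ID.
    public function issue(string $answer, string $ip, int $now): string
    {
        $id = bin2hex(random_bytes(16));
        $this->entries[$id] = ['answer' => strtolower($answer), 'ip' => $ip, 'issued' => $now];
        return $id;
    }

    // Check a submission. The entry is deleted before any checks, so each
    // captcha gets exactly one attempt whether the guess is right or wrong.
    public function verify(string $id, string $guess, string $ip, int $now): bool
    {
        if (!isset($this->entries[$id])) {
            return false; // unknown ID, or already used
        }
        $entry = $this->entries[$id];
        unset($this->entries[$id]);

        if ($entry['ip'] !== $ip) {
            return false; // submitted from a different IP than the one that fetched it
        }
        if ($now - $entry['issued'] > $this->ttlSeconds) {
            return false; // expired
        }
        // Case-insensitive, constant-time comparison of the answer.
        return hash_equals($entry['answer'], strtolower($guess));
    }
}

// Usage: issue when rendering the page, verify when the form is posted.
$store = new CaptchaStore(120);
$id = $store->issue('4821', '203.0.113.5', time());
var_dump($store->verify($id, '4821', '203.0.113.5', time())); // bool(true)
```

The 120-second TTL is an arbitrary placeholder: too short and it annoys slow users, too long and it does little against a solving service. The IP check only bites when a bot switches proxies between fetching and submitting.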

There are probably more that I'm not thinking of right now. But my point is that there are also ways to fight against the outsourcing of captcha breaking.
 
True, but like breaking captchas, defending them becomes an arms race.

Short expiry times are problematic since they annoy users who take too long, and considering antigate/captchabot typically take less than 30 seconds to respond, timeouts aren't that feasible.

Tying the captcha to an IP wouldn't matter, since the IP that requested it would be the same IP that submits it. Unless the bot was crap and randomly selected a proxy/IP for each request instead of per session.

Limiting the number of captchas that can be fetched from one IP could be bypassed using proxies.

Flash-based captchas and requiring JavaScript are basically the same in terms of difficulty, and would only be a problem for a bot randomly crawling the internet that has never seen the captcha before. Using JavaScript or Flash on a site that someone is writing a dedicated breaker for wouldn't hinder them.

Don't get me wrong, I'm not trying to shit on your posts; your code is impressive. I'm just pointing out that no matter what is done to make it harder to break, there is always a way around it.

And it becomes an arms race, and personally I wouldn't have it any other way :D
 
+rep, nice. I'm surprised you used array_search rather than just using the pattern as a key. I really don't understand what they hope to gain from this HTML method. I could see something cool done with JavaScript to make this more difficult, but overall it seems very futile.

I had a fun one today: they have a limit of ~75 requests per IP and a second limit of around 200 per C block. I found a workaround, but it was quite unusual to see 'fresh' IPs tarnished.
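A per-IP cap plus a looser per-/24 ("C block") cap like the one described can be sketched as a fixed-window counter. The default limits are the rough figures from this post; the one-hour window, the in-memory array and the TwoTierLimiter class are assumptions for illustration only.

```php
<?php
// Hypothetical two-tier limiter: a cap per IPv4 address and a looser cap per /24.
// Assumption: fixed windows counted in a plain array that never evicts old windows.
class TwoTierLimiter
{
    private array $counts = [];
    private int $perIp;
    private int $perBlock;
    private int $windowSeconds;

    public function __construct(int $perIp = 75, int $perBlock = 200, int $windowSeconds = 3600)
    {
        $this->perIp = $perIp;
        $this->perBlock = $perBlock;
        $this->windowSeconds = $windowSeconds;
    }

    // "203.0.113.57" -> "203.0.113.0/24"; null for anything that isn't valid IPv4.
    public static function blockOf(string $ip): ?string
    {
        if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
            return null;
        }
        $octets = explode('.', $ip);
        return "{$octets[0]}.{$octets[1]}.{$octets[2]}.0/24";
    }

    // Returns true (and counts the request) only if both caps still allow it.
    public function allow(string $ip, int $now): bool
    {
        // Non-IPv4 addresses fall back to per-address counting only (assumption).
        $block = self::blockOf($ip) ?? "other:$ip";
        $window = intdiv($now, $this->windowSeconds);
        $ipKey = "ip:$ip:$window";
        $blockKey = "block:$block:$window";

        $ipCount = $this->counts[$ipKey] ?? 0;
        $blockCount = $this->counts[$blockKey] ?? 0;
        if ($ipCount >= $this->perIp || $blockCount >= $this->perBlock) {
            return false; // rejected requests are not counted
        }
        $this->counts[$ipKey] = $ipCount + 1;
        $this->counts[$blockKey] = $blockCount + 1;
        return true;
    }
}
```

The /24 tier is what catches a run of proxies from one hosting range even when each individual IP is 'fresh'. A real deployment would keep these counters in a shared store with expiry, since this array grows with every new window.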
 
I'm surprised you used array_search rather than just using the pattern as a key.

LOL, you're right. I dunno why I did that! And thanks for the rep :)

True, but like breaking captchas, defending them becomes an arms race.
...
...
Don't get me wrong, I'm not trying to shit on your posts, your code is impressive, I'm just pointing out that no matter what is done to make it harder to break, there is always a way to get around it.

And it becomes an arms race, and personally I wouldn't have it any other way

Thanks for the feedback. I won't take it the wrong way. One of the best ways to improve something is to have people throw ideas at each other and point out problems or ways to improve them... then tweak, tweak, tweak.

You're right that none of those ideas alone can stop an automator; they're more about slowing someone down. I see captchas kinda like burglar alarms. A burglar alarm won't stop someone from breaking into your house, but it might serve as a deterrent and will probably slow them down a bit.