Ahh shit. Sorry, I completely misunderstood the last post. I thought you were criticizing me for being lazy. I didn't know what ` meant in PHP; I'm not a PHP coder. Sorry I came across as a dick. I have been drinking, it being a Sunday and all.
<?php
//Scrape the registration page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/<div class='lynkx-captcha'>(.+?)<\/div>/", $data, $captcha_data);
$horizontal_lines = explode('<br/>', $captcha_data[1]);
//Convert HTML captcha to simple 0/1
foreach($horizontal_lines as $i => $line){
$line = preg_replace('/<b style=\"background\:(.+?)\"><\/b>/', '1', $line);
$line = str_replace('<b></b>', '0', $line);
$horizontal_lines[$i] = $line;
}
//Create image
$img = imagecreatetruecolor(70, 23);
//Fill image with white background
$white = imagecolorallocate($img, 255, 255, 255);
imagefill($img, 0, 0, $white);
$black = imagecolorallocate($img, 0, 0, 0);
//Go through line by line and paint any '1' as a black pixel
foreach($horizontal_lines as $x => $line){
$x++;
$line_length = strlen($line); //Find out the horizontal length
//Stop at the end of the line or the image edge, whichever comes first
for($y=1;$y<$line_length && $y<70;$y++){
if($line[$y] == "1"){
imagesetpixel($img,$y,$x,$black);
}
}
}
//Output image
header('Content-Type: image/png');
imagepng($img);
?>
very nice esrun, probably the best first post anyone has ever had here
Cheers. I've been reading through the forums now and then. Looking for inspiration lol. I just hadn't really come across something where I felt I had anything worth saying!
Are you still intending to take a crack at it? I'd like to see someone convert the HTML to regular text without using OCR software. If I had a use for it, or the captcha was more widely used, then I'd try. But as it is, it's just one page using the captcha, and I don't even know how badly the guy wants to get around it!
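For what it's worth, here is a rough sketch of how the HTML-to-text step could work without OCR, assuming the captcha has already been reduced to rows of 0/1 as in the script above. It splits the bitmap into glyphs at blank columns and looks each glyph up in a template table. The function names and the two tiny templates are made up for illustration; real templates would have to be collected from actual captchas, and this only works if the characters never touch.

```php
<?php
// Sketch only: the glyph templates below are invented for illustration and
// would have to be collected from real captchas first.

/**
 * Split a 0/1 bitmap (array of equal-length row strings) into glyphs,
 * using fully blank columns as separators. Each glyph is returned as a
 * signature string: its columns joined with '|'.
 */
function split_glyphs(array $rows): array {
    $width = strlen($rows[0]);
    $glyphs = [];
    $current = [];
    for ($x = 0; $x < $width; $x++) {
        // Read column $x from top to bottom
        $column = '';
        foreach ($rows as $row) {
            $column .= $row[$x];
        }
        if (strpos($column, '1') === false) {
            // Blank column: close the glyph in progress, if any
            if ($current) {
                $glyphs[] = implode('|', $current);
                $current = [];
            }
        } else {
            $current[] = $column;
        }
    }
    if ($current) {
        $glyphs[] = implode('|', $current);
    }
    return $glyphs;
}

/** Look each glyph signature up in a template table; '?' marks unknowns. */
function read_bitmap(array $rows, array $templates): string {
    $text = '';
    foreach (split_glyphs($rows) as $signature) {
        $text .= $templates[$signature] ?? '?';
    }
    return $text;
}

// Tiny 3-row demo with two invented glyphs
$templates = [
    '111|101' => 'A',
    '111'     => 'I',
];
$rows = [
    '011010',
    '010010',
    '011010',
];
echo read_bitmap($rows, $templates), "\n";
```

Any glyph that isn't in the table comes back as '?', which makes it easy to spot which templates still need collecting.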
<?php
//Scrape the registration page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/<div class='lynkx-captcha'>(.+?)<\/div>/", $data, $captcha_data);
$horizontal_lines = explode('<br/>', $captcha_data[1]);
//Convert HTML captcha to simple 0/1
foreach($horizontal_lines as $i => $line){
$line = preg_replace('/<b style=\"background\:(.+?)\"><\/b>/', '1', $line);
$line = str_replace('<b></b>', '0', $line);
$horizontal_lines[$i] = $line;
}
//Create image
$img = imagecreatetruecolor(70, 23);
//Fill image with white background
$white = imagecolorallocate($img, 255, 255, 255);
imagefill($img, 0, 0, $white);
$black = imagecolorallocate($img, 0, 0, 0);
//Go through line by line and paint any '1' as a black pixel
foreach($horizontal_lines as $x => $line){
$x++;
$line_length = strlen($line); //Find out the horizontal length
//Stop at the end of the line or the image edge, whichever comes first
for($y=1;$y<$line_length && $y<70;$y++){
if($line[$y] == "1"){
imagesetpixel($img,$y,$x,$black);
}
}
}
//Save the image and run gocr on that same path
$captcha_file = $_SERVER['DOCUMENT_ROOT'] .'/captcha.png';
imagepng($img, $captcha_file);
$solved_captcha = shell_exec('gocr '. escapeshellarg($captcha_file));
echo $solved_captcha;
?>
Impressive esrun! My hoster couldn't install wkhtmltopdf on my FreeBSD system so I've been toying around with your solution, but I can't figure it out.
I changed it to this, but get no output. Am I doing something wrong?
I think those functions use GD library, so before you dig too deep, it's worth doing phpinfo() to make sure it is included.
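A quicker check than scrolling through phpinfo() might look something like the sketch below. It only reports whether the GD and cURL extensions are loaded and whether shell_exec() looks usable (the gocr version of the script depends on it). The function name check_requirements is just something I made up, and it doesn't tell you whether gocr itself is installed.

```php
<?php
// Report whether the pieces the captcha scripts rely on are available.
// check_requirements() is a hypothetical helper name, not part of PHP.
function check_requirements(): array {
    // disable_functions is a comma-separated list set in php.ini
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return [
        'gd extension'   => extension_loaded('gd'),
        'curl extension' => extension_loaded('curl'),
        'imagepng()'     => function_exists('imagepng'),
        'shell_exec()'   => function_exists('shell_exec') && !in_array('shell_exec', $disabled, true),
    ];
}

foreach (check_requirements() as $name => $ok) {
    echo str_pad($name, 16), $ok ? 'available' : 'MISSING', "\n";
}
```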
Yes, I do know PHP & GD pretty well, but I'm a complete noob when it comes to shell commands and Linux programs... It probably has something to do with settings, but Google doesn't help me much either. Bummer!
You could spend days fussing around with this captcha stuff or just 10 minutes filling out 100 of them manually..
On an interesting note, it looks like this method is becoming more popular. I just found this one: Whois Domain Name Lookup - Dynadot.com
It's interesting how they chose 4x4 blocks and put a little randomness into each block to make the colour just a little different.
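The small colour variation shouldn't matter much once each cell is reduced to dark or light. As a minimal sketch (the function name, the threshold of 128 and the sample colours are all my own assumptions, not values taken from Dynadot), averaging the three colour channels and comparing against a threshold would look like this:

```php
<?php
/**
 * Return '1' for a dark cell and '0' for a light one, given a 6-digit
 * hex colour such as "1a2b3c". The default threshold is an assumption.
 */
function cell_bit(string $hex_color, int $threshold = 128): string {
    $rgb = hexdec($hex_color);
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;
    $brightness = intdiv($r + $g + $b, 3);
    return $brightness < $threshold ? '1' : '0';
}

// Invented sample cells: two slightly different darks, two slightly different lights
$bits = '';
foreach (['101418', '0d1a12', 'f4f0ee', 'e9f2f5'] as $hex) {
    $bits .= cell_bit($hex);
}
echo $bits, "\n";
```

The jitter only moves each colour a few steps, so every cell still lands clearly on one side of the threshold.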
<?php
/*
http://www.esrun.co.uk
*/
//Scrape the WHOIS page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.dynadot.com/whois.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7');
$captcha_data = curl_exec($ch);
$character_map_array = array(0=>'0110100110011001100110010110', 1=>'0010011000100010001000100111',2=>'0110100100010010010010001111',3=>'0110100100010110000110010110',4=>'0001001101011001111100010001',5=>'1111100011100001000110010110',6=>'0110100110001110100110010110',7=>'1111000100100100010010001000',8=>'0110100110010110100110010110',9=>'0110100110010111000110010110');
//Break the captcha into 4 individual characters
preg_match_all("/<table cellspacing=\"0\" cellpadding=\"0\">(.+?)<\/table>/s", $captcha_data, $characters);
//Go through each character
foreach($characters[0] as $i => $character){
//Grab the background colors
preg_match_all("/bgcolor=\"#(.+?)\" width=\"4\"/", $character, $output);
$character_maps[$i] = ''; //Start each character with an empty map
foreach($output[1] as $pos => $hex_color){
//Convert hex color to decimal
$color_dec = hexdec($hex_color);
//If the decimal value is 8000000 or below, the cell is dark, so it is part of the character
if($color_dec <= 8000000){
$character_maps[$i].= '1';
} else {
$character_maps[$i].= '0';
}
}
echo $character.'<br><br>';
}
//Compare the characters we mapped out above against previously stored character maps
$captcha_answer = '';
foreach($character_maps as $character_map){
// echo $character_map.'<br>'; //Uncomment this to see the character map and store it in $character_map_array
$character_answer = array_search($character_map, $character_map_array);
$captcha_answer.=$character_answer;
}
echo 'ANSWER: '.$captcha_answer;
?>
Yes, every captcha is breakable: use OCR, or hire a bunch of people in the Philippines or India to type them in.
I was reading up on html/text/ascii art style captchas. Trying to get an idea on how popular they are, what methods they use to disguise the characters and so on. Although a lot of them seem a long way off being as good as a skewy image captcha, they do seem to be getting there.
Here's the code to crack that Dynadot.com captcha:
You're right, mediasup: if a human can read a captcha (which is the whole point) then it can be broken. However, there are many strong captchas out there that can't be (reliably) read with OCR software.
There are some methods people use to fight against outsourced/human-based captcha breaking, such as:
- Short expiry on the lifetime of the captcha
- Flash-based captcha - Harder to capture, usually has to be screenscraped, which is a little slow and resource heavy
- Tie the captcha to the same IP that fetched it. This means you'll need a reliable set of IPs
- Limit the number of captchas that can be fetched from one IP address. This means that you need to use multiple IPs for the fetching of the captcha as well as the posting/signup/whatever
- Require JavaScript support to render the captcha - again forcing people to screenscrape
There are probably more that I'm not even thinking of right now. But my point is that there are also ways to fight against the outsourcing of captcha breaking.
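To make three of those concrete (short expiry, tying the captcha to the fetching IP, and limiting fetches per IP), here is a rough server-side sketch. Everything in it is an assumption for illustration: the function names, the 120-second lifetime, the limit of 5 and the in-memory arrays. A real site would keep this state in a session, database or cache.

```php
<?php
// Sketch only: names, limits and the in-memory arrays are assumptions.

const CAPTCHA_TTL = 120;        // seconds a captcha stays valid (assumed)
const MAX_FETCHES_PER_IP = 5;   // captchas one IP may fetch (assumed)

/** Issue a captcha record tied to the fetching IP, or null if over the limit. */
function issue_captcha(array &$fetch_counts, string $ip, string $answer, int $now): ?array {
    $fetch_counts[$ip] = ($fetch_counts[$ip] ?? 0) + 1;
    if ($fetch_counts[$ip] > MAX_FETCHES_PER_IP) {
        return null;
    }
    return ['answer' => $answer, 'ip' => $ip, 'expires' => $now + CAPTCHA_TTL];
}

/** Accept the answer only from the same IP and before the expiry time. */
function verify_captcha(array $captcha, string $ip, string $answer, int $now): bool {
    return $now <= $captcha['expires']
        && $ip === $captcha['ip']
        && hash_equals($captcha['answer'], $answer);
}

$counts = [];
$captcha = issue_captcha($counts, '203.0.113.7', '4821', 1000);
var_dump(verify_captcha($captcha, '203.0.113.7', '4821', 1060));  // same IP, in time
var_dump(verify_captcha($captcha, '198.51.100.9', '4821', 1060)); // different IP
var_dump(verify_captcha($captcha, '203.0.113.7', '4821', 1500));  // after expiry
```

With checks like these, a solver who fetches the captcha from one address and submits the answer from another, or who takes too long, gets rejected even with the right answer.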
I'm surprised you used array_search rather than just using the pattern as a key.
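Something along these lines, as a sketch: flip the map once so the pattern is the key, then look each character up directly. Only three of the ten patterns from the script above are copied here, and lookup_digit is a made-up helper name. One side benefit is that an unknown pattern is reported explicitly, whereas array_search returns false for a miss and 0 for the digit zero, which are easy to mix up.

```php
<?php
// Digit => pattern, as in the script above (only three entries copied here)
$character_map_array = array(
    0 => '0110100110011001100110010110',
    1 => '0010011000100010001000100111',
    7 => '1111000100100100010010001000',
);

// Flip once so each pattern becomes the key and the digit the value
$digit_by_pattern = array_flip($character_map_array);

/** Return the digit for a pattern, or '?' when the pattern is unknown. */
function lookup_digit(array $digit_by_pattern, string $pattern): string {
    return isset($digit_by_pattern[$pattern]) ? (string) $digit_by_pattern[$pattern] : '?';
}

echo lookup_digit($digit_by_pattern, '0110100110011001100110010110'), "\n";
echo lookup_digit($digit_by_pattern, '1111000100100100010010001000'), "\n";
echo lookup_digit($digit_by_pattern, '0000'), "\n";
```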
True, but like breaking captchas, defending them becomes an arms race.
...
...
Don't get me wrong, I'm not trying to shit on your posts. Your code is impressive; I'm just pointing out that no matter what is done to make it harder to break, there is always a way to get around it.
And it becomes an arms race, and personally I wouldn't have it any other way.