Howdy folks, here I am with yet another question that is really busting my balls. Worst thing is, I think I tackled the same thing about two months ago and just can't remember wtf I did. I know this might not be the best forum to post this on, but the other one is down and I know some of you guys are great at PHP.
A quick overview of the situation: I am using PHP/cURL to try to automate a web form.
The captcha URL is always static, i.e. http://site.com/act/captcha.jpg
There is a token on the register page that also needs to be scraped.
However, here is where my problem lies. I scrape the register page and parse out the token. I then scrape the captcha URL, but alas, all my efforts are for naught, because this image is not the same as the one originally displayed on the register page. I think I verified this: when I right-click the original captcha in the browser and choose "View Image", the image that opens is different as well.
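For the token step, the regex in get_token only matches when the attributes appear in exactly that order and the tag ends with " />". A minimal sketch of the same step using PHP's DOM extension instead, which does not care about attribute order or spacing. It assumes the token is an <input> element whose name attribute is "token"; extract_token is a hypothetical helper name, not something from the original script.

```php
<?php
// Hypothetical helper: return the value of <input name="token">, or null if absent.
function extract_token(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Find the first <input> whose name attribute is "token", wherever it sits.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="token"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no token field on this page
    }
    return $nodes->item(0)->getAttribute('value');
}

// Attribute order differs from the regex in get_token, but this still matches.
$sample = '<form><input type="hidden" value="abc123" name="token"></form>';
echo extract_token($sample), "\n";
```

In get_token this would replace the preg_match call, e.g. $token = extract_token($register_page); followed by a null check.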
What I think is happening is that requesting the captcha after I have already scraped the register page alters the cookie and gives me a new image. Here is a quick sample of the function I am using; the scrape_page function it calls is shown below it:
Major kudos to anyone who can help me solve this.
Code:
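```php
function get_token($site)
{
    $cookie_file = "cookies/cookies.tmp";
    // start from a fresh session: delete any cookies left over from a previous run
    if (file_exists($cookie_file)) {
        unlink($cookie_file);
    }

    // fetch the register page and pull out the hidden token field
    $register_page = scrape_page($site, "https://site.com/act/register");
    if ($register_page === false
        || !preg_match('/name="token" value="(.*?)" \/>/', $register_page, $matches)) {
        return false; // request failed or no token field on the page
    }
    $token = $matches[1];

    // fetch the captcha through the same cookie jar; this URL has to match the <img>
    // tag on the register page exactly (paths are case-sensitive on many servers)
    $captcha_url = "https://site.com/act/Captcha.jpg";
    $image = scrape_page($captcha_url, "https://site.com/act/register");

    // file_put_contents writes the image bytes unchanged
    if ($image === false || file_put_contents("captcha/captcha.jpg", $image) === false) {
        return false;
    }
    return $token;
}
```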
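If the register page and the captcha really are ending up in different sessions, one way to take the cookie file out of the picture entirely is to make both requests on a single curl handle with libcurl's in-memory cookie engine switched on. A minimal sketch under that assumption: fetch_register_and_captcha is a made-up name, the token regex is the one from get_token minus its trailing " />", and unlike scrape_page it leaves SSL certificate checks on.

```php
<?php
// Hypothetical helper: fetch the register page and the captcha over ONE curl handle
// so both requests carry the same session cookies. Returns the token, or null on failure.
function fetch_register_and_captcha(string $registerUrl, string $captchaUrl, string $imagePath): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        // An empty string enables the cookie engine without reading any file:
        // cookies received on this handle are kept in memory and sent on later requests.
        CURLOPT_COOKIEFILE     => '',
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    ]);

    // Step 1: the register page (assumed to set the session cookie).
    curl_setopt($ch, CURLOPT_URL, $registerUrl);
    $html = curl_exec($ch);
    if ($html === false || !preg_match('/name="token" value="(.*?)"/', $html, $m)) {
        error_log('register page failed: ' . curl_error($ch));
        curl_close($ch);
        return null;
    }

    // Step 2: the captcha on the same handle, so the same cookies go along.
    curl_setopt($ch, CURLOPT_URL, $captchaUrl);
    curl_setopt($ch, CURLOPT_REFERER, $registerUrl);
    $image = curl_exec($ch);
    $error = curl_error($ch); // read the error before the handle is closed
    curl_close($ch);

    if ($image === false || file_put_contents($imagePath, $image) === false) {
        error_log('captcha failed: ' . $error);
        return null;
    }
    return $m[1];
}

// Example call with the URLs from the post (not executed here):
// $token = fetch_register_and_captcha('https://site.com/act/register',
//                                     'https://site.com/act/Captcha.jpg',
//                                     'captcha/captcha.jpg');
```

Because in-memory cookies only live as long as the handle, the request that later submits the form would have to reuse that same handle (or also set CURLOPT_COOKIEJAR) to stay in the same session.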
Code:
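```php
function scrape_page($page, $referer)
{
    // one cookie jar shared by every request, so the session survives between calls
    $file_cookie = "cookies/cookies.tmp";

    $ch = curl_init($page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file_cookie);  // cookies are saved here when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file_cookie); // send cookies saved by earlier requests
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    // certificate checks are switched off, so any certificate is accepted
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");

    $response = curl_exec($ch);
    if ($response === false) {
        // curl_error() must be called BEFORE curl_close(), not after
        error_log("curl error for $page: " . curl_error($ch));
    }
    curl_close($ch);

    return $response;
}
```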
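To test the theory that the captcha request is what changes the cookie, it can help to look at the jar file itself after each call to scrape_page. libcurl writes it in the Netscape cookie-file format: one cookie per line, seven tab-separated fields, with HttpOnly cookies marked by a "#HttpOnly_" prefix on the domain. A small sketch that turns that text into name => value pairs; read_cookie_jar is a hypothetical helper, not a PHP or curl function.

```php
<?php
// Hypothetical helper: parse Netscape-format cookie jar text into [name => value].
function read_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookie: strip the marker and parse the rest normally
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) < 7) {
            continue; // not a cookie line
        }
        $cookies[$fields[5]] = $fields[6];
    }
    return $cookies;
}

$jar = "# Netscape HTTP Cookie File\n"
     . "#HttpOnly_site.com\tFALSE\t/\tFALSE\t0\tPHPSESSID\tabc123\n"
     . "site.com\tFALSE\t/\tFALSE\t0\tlang\ten\n";
print_r(read_cookie_jar($jar));
```

For example, calling print_r(read_cookie_jar(file_get_contents("cookies/cookies.tmp"))); right after the register page request and again after the captcha request shows whether the session cookie's value changed in between. In this simple version, cookies with the same name from different domains overwrite each other.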