I have a script where a user enters something, and then I scrape Google in real time (unfortunately a part of Google that doesn't have an API yet, or my job would be a lot easier).
Because of this, as you might expect, I've run into Google's "Automated Queries" error (it comes up after 20 or so people use the site). I figured this wouldn't be that big a deal: I could just show Google's captcha to the person on my website, take what they type in, and pass it back to Google.
So here's what happens. This:
PHP:
echo file_get_contents('http://google.com/something');
shows the content of this URL:
google.com/sorry/Captcha
So I scraped the ID of the captcha and showed the captcha to the user. Then, when they hit submit, I do the following:
PHP:
file_get_contents('http://google.com/sorry/Captcha?continue=http://google.com/something&id=8722769811594829024&captcha=troing&submit=I\'m+human!');
(Of course, in the real code I fill in id and captcha with the scraped captcha ID and whatever the user typed in.)
Unfortunately, this isn't working: when I echo the result, I'm still getting the captcha page. When I load the same URL in an iframe it does work, which means I've built the right URL, but with file_get_contents it does not. The problem with the iframe is that Google sees the request as coming from the user's IP address, not my server's. So when I later call file_get_contents to pull the data I need to scrape, Google still sees my server's IP as blacklisted and I get the automated queries error.
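The submission URL above can also be assembled with PHP's built-in http_build_query(), which URL-encodes each value, including the nested continue URL and whatever the visitor typed. This is a minimal sketch: the helper name buildCaptchaUrl() is hypothetical, and the field names (continue, id, captcha, submit) are simply copied from the URL in the question, not taken from any documented Google API. It only tidies how the request string is built; it sends the same parameters as the original call.

```php
<?php
// Hypothetical helper: builds the captcha-submission URL from the question
// out of the scraped captcha ID and the text the visitor typed in.
// Field names are copied from the original URL; this is not an official API.
function buildCaptchaUrl(string $continue, string $id, string $answer): string
{
    $params = [
        'continue' => $continue,
        'id'       => $id,
        'captcha'  => $answer,
        'submit'   => "I'm human!",
    ];
    // http_build_query() URL-encodes every value (spaces become '+');
    // '&' is passed explicitly so the php.ini arg_separator setting can't change it.
    return 'http://google.com/sorry/Captcha?' . http_build_query($params, '', '&');
}

// Usage with the example values from the question:
echo buildCaptchaUrl('http://google.com/something', '8722769811594829024', 'troing'), "\n";
```

With those values it prints the same request as the hand-written URL, just fully encoded (for example, `I'm human!` becomes `I%27m+human%21`).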
So the question is: how do I take Google's captcha (which the user solves for me) and post the answer back to Google so that my server is no longer blacklisted?
Any suggestions? If you give me a good explanation that I can get working, I'll send some $$ your way.