Scraping Google / Search API

slayerment

CrowdFreedom.com
Sep 5, 2007
Los Angeles, CA
www.slayerment.com
I have been scraping Google for a while with pretty much no problem, as I have always been able to pull 100 results at a time by adding the num parameter to my query. That no longer seems to work, and I can only get 10 results at a time now. I could probably change my search preferences to show 100 results, but it looks like that is set through some JS that I would rather not deal with if possible.

So my question is, how do you guys recommend grabbing 100 results at a time from Google? Should I scrape 10 pages of 10 results? Should I find a way to save my search preferences to display 100 results? Do they even have an API that you can use? It looks like the only API is some custom search BS that does nothing. Is there some more obvious thing that I am missing? Does anybody have any thoughts on this?

Thanks much!
 


I started a thread recently about coding a rank tracker, and I had originally planned to pull 100 results at a time too.

Technically, I found that it is possible to pull 100 results at a time by using num=100 in the query string and adding as_qdr=all, which sets the date restriction parameter to all dates.
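
As a rough sketch of what that query string looks like, something like the following builds the URL. The num and as_qdr parameters are the ones mentioned above; the base URL and the build_search_url helper are my own assumptions for illustration, and this only builds the URL, it doesn't fetch anything.

```python
from urllib.parse import urlencode

# Hypothetical helper: the base URL is an assumption, while num and
# as_qdr are the query parameters described in the post above.
def build_search_url(query, num=100):
    params = {
        "q": query,        # the search phrase
        "num": num,        # results requested per page
        "as_qdr": "all",   # date restriction set to all dates
    }
    return "https://www.google.com/search?" + urlencode(params)

print(build_search_url("rank tracker"))
```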

The problem I found with this, though, was that the results were being clustered by domain, so the positions weren't reliable.

For example, if a domain had URLs ranked in 1st, 50th and 100th position, then pulling 100 results with num=100 would cluster those three pages into 1st, 2nd and 3rd.
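
To make the effect concrete, here is a small simulation of that kind of clustering. It is only a model of what I observed, not how Google actually does it: cluster_by_domain and the example URLs are made up, and the rule assumed is that every URL from a domain gets pulled up next to that domain's highest-ranked URL.

```python
from urllib.parse import urlparse

def cluster_by_domain(urls):
    """Group URLs by domain, keeping each domain at the position of its
    first (highest-ranked) URL. A simplified model, not Google's logic."""
    groups = {}
    for url in urls:
        domain = urlparse(url).netloc
        groups.setdefault(domain, []).append(url)
    # dicts preserve insertion order, so domains stay in first-seen order
    return [url for group in groups.values() for url in group]

# Made-up "true" ranking: 100 results, one domain at positions 1, 50 and 100.
true_order = [f"https://site{i}.test/page" for i in range(1, 101)]
for pos in (1, 50, 100):
    true_order[pos - 1] = f"https://example.com/page{pos}"

clustered = cluster_by_domain(true_order)
print(clustered[:3])
```

In this model the pages that really sit at 50th and 100th show up as 2nd and 3rd, which is why the positions from a single 100-result pull can't be trusted for rank tracking.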

Because of this, I'm now pulling 10 results per page. If the order of the URLs you're pulling isn't important (i.e. you're not trying to track rankings), then I guess this won't be an issue for you.
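
A minimal sketch of the 10-per-page approach is below. It assumes paging is done with a start offset parameter (start=0, 10, 20 and so on), which isn't mentioned above, and page_urls and absolute_rank are hypothetical helper names. It only builds the URLs and works out the overall position; fetching and parsing are left out.

```python
from urllib.parse import urlencode

# Assumption: pages are selected with a "start" offset in the query string.
def page_urls(query, pages=10, per_page=10):
    urls = []
    for page in range(pages):
        params = {"q": query, "start": page * per_page}
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls

def absolute_rank(page_index, position_on_page, per_page=10):
    """Turn a 0-based page index and a 1-based position on that page
    into an overall ranking position."""
    return page_index * per_page + position_on_page

for url in page_urls("rank tracker", pages=3):
    print(url)
```

With this, the 10th result on the fifth page (page_index 4) comes out as absolute_rank(4, 10), i.e. position 50 overall.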

If anyone has a better way to retrieve 100 results at a time without the clustering, I'd love to hear about it (please).

In relation to your other questions:

- You could save your search preferences, but then this will also result in more personalisation AFAIK, which I wouldn't have thought is a good thing when you're trying to track rankings. If you're not tracking rankings, then it's not an issue.

- I think there is an API, but I think there are restrictions on the volume of queries you can run. I guess its suitability would depend on what you intend to use the data for and how many queries you want to run. For me personally, I want ranking data as close as possible to what a regular person searching would see, so I wouldn't trust the API anyway.

Look forward to hearing some other people's thoughts.