Creating my own Google ranking tracker script. Advice?

Mahzkrieg

I'd like to make a script that lets me specify my websites and the keywords I want to track for each one. The script would then iterate through my list daily and record what position each of my websites holds for each keyword.

It's basically my own minimalist version of the traditional "rank tracking" software like Link Assistant and SEOMoz's Rank Tracker.

Mostly it's just for fun in an effort to learn real practical programming.

Question

Google's APIs seem very limited. The AJAX API, for example, seems to be way off from the real Google results.

Is it better to have my webserver crawl Google results itself to find the position of my websites or am I just overlooking the right API?
 


Should've posted this in the Newbie section.

Googling "scraping google with <programming language>" offers more than enough help. Looks like Google's restrictive API forces you to scrape its SERPs.
 
If you want accurate results, then you should scrape the regular SERPs.

You'll have to be careful about the number of queries you run in a short period of time, especially if you use Google search operators like site: or inurl:.
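As a rough sketch of what the throttled version looks like (Python here, with the third-party requests library assumed; the 5-15 second pause is a guess at a polite delay, not a known safe limit):

# Minimal sketch: query Google one keyword at a time with a randomized pause
# between requests. The pause length is a guess, not a known safe threshold.
import random
import time

import requests  # assumes the third-party 'requests' package is installed

HEADERS = {"User-Agent": "Mozilla/5.0"}  # browser-like UA; the default client UA is more likely to get blocked

def fetch_serp(keyword, start=0):
    """Fetch one page of Google results for a keyword (start=0, 10, 20, ...)."""
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": keyword, "start": start},
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text

def fetch_all(keywords):
    """Yield (keyword, html) pairs, sleeping between queries to avoid a regular footprint."""
    for keyword in keywords:
        yield keyword, fetch_serp(keyword)
        time.sleep(random.uniform(5, 15))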
 
For the love of god, don't use Google's AJAX API... I posted a Ruby rank-tracking script here a few months back... if I hadn't just put down a 12-pack of beer I'd post the link for you... PM me if you want to discuss options.
 
I actually planned to write this with Ruby on Rails, since its syntax and enforced MVC structure have always appealed to me, but I work in such different environments (between my home desktop and coffee-shop laptop) that I was spending more time in the Linux command line than actually getting started.

I'm too green with Linux and the terminal to troubleshoot anything that doesn't work out of the box, so I'll just go with PHP + Cloakfish (for proxies).

My goal is to make a website where I can register, add websites to my account, and then add keywords for each website. The scraper would run daily, track positions, and generate graphs.

Then I'd like to learn how to scale it so I can let others register, particularly WF newbies who could use the utility.
 
There are a few members here who have a rank tracking service; sescout is what it's called, I believe.

What you use to track rankings daily and what you use to manage and display the data should be completely different apps. PHP would not be a good choice for running the rank tracking, in my opinion; I'd do Ruby or Python feeding data into a database. The front end can be Rails/PHP/whatever.
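In rough outline, the scraper half only needs to drop one row per check into a shared database that the front end later reads for graphs. A throwaway sketch in Python with SQLite; the table and column names are made up for illustration:

# Sketch of the "scraper feeds a database, front end only reads it" split.
# SQLite keeps the example self-contained; any database works the same way.
import sqlite3
from datetime import date

def save_ranking(db_path, site, keyword, position):
    """Append today's position for one site/keyword pair."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rankings (
               checked_on TEXT,
               site       TEXT,
               keyword    TEXT,
               position   INTEGER
           )"""
    )
    conn.execute(
        "INSERT INTO rankings VALUES (?, ?, ?, ?)",
        (date.today().isoformat(), site, keyword, position),
    )
    conn.commit()
    conn.close()

The front end (Rails, PHP, Django, whatever) then just queries the rankings table by site and keyword to draw the graphs.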
 
You'll have to watch the number of requests you send to Google; if it gets out of hand, they could ban your server IP. To get around this you could use multiple IPs to do the scraping.
 
This ^^. Get a bunch of reliable proxies AND don't get greedy. My theory is that they tolerate scraping as long as it doesn't burn too much of their resources.
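In practice that just means cycling each request through the proxy pool. A small Python sketch, assuming the requests library; the proxy addresses are placeholders for whatever reliable proxies you end up buying:

# Round-robin the requests across a pool of proxies so no single IP
# carries all the query volume. Proxy URLs below are placeholders.
import itertools

import requests

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]
proxy_cycle = itertools.cycle(PROXIES)

def fetch_via_proxy(url, params=None):
    """Send one GET through the next proxy in the rotation."""
    proxy = next(proxy_cycle)
    return requests.get(
        url,
        params=params,
        proxies={"http": proxy, "https": proxy},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )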
 
As this is mostly an educational venture (I don't have much practical programming experience, and I'd like to see whether I like Ruby or Python better), I'm building this app in each of PHP, Python, and Ruby. Then I'll run with my favorite. So far, I have the PHP version working.

Question: Why is PHP a bad choice for the rank-tracking muscle work? And why would Ruby or Python be better choices for it?
 
PHP isn't threaded and isn't designed to handle long-running tasks the way Ruby and Python are.
 
PHP's curl_multi can fire off requests in parallel, but it waits for every request in the batch to complete before moving on to the next batch (so it's not true multithreading, only a workaround). Seeing as these are simple one-page scrapes, that isn't a roadblock for building a SERP scraper.
 
This is true; two ways to slice the pie. I like to separate the back-end bots from the front-end web app, and any chance I get to play with Ruby, I take :)
 
I'm going to assume you don't need this to be really fast. PHP/cURL is fine if you're comfortable with it. I have let a script loop through and save SERPs for over 1,000,000 keywords without running into rate-limit problems. I've only had a captcha come up when I had obvious footprints in my terms, but that doesn't matter for what you're doing. It goes fast too. No proxies needed. If you use multiple sockets, though, captchas come up quickly.

If you want to start building multi-threaded scrapers and the like, look into Python, or even Visual Studio if you like creating easy GUIs. Most of the time I just need a command-line app.
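For the Python route, a small thread pool covers most of it. A rough sketch reusing the single-page fetch_serp() from earlier in the thread; keep in mind that hitting Google in parallel from one IP is exactly what brings the captchas on, so pair this with proxies and modest worker counts:

# Fetch several keywords concurrently with a small thread pool.
# fetch_serp() is the single-page fetch sketched earlier in the thread.
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_many(keywords, workers=5):
    """Return {keyword: html or None} after fetching all keywords in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_serp, kw): kw for kw in keywords}
        for future in as_completed(futures):
            keyword = futures[future]
            try:
                results[keyword] = future.result()
            except Exception:
                results[keyword] = None  # log it and retry later instead of crashing the run
    return results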
 
This project has really made me cover huge educational ground, as someone who had only dabbled in PHP and was very reliant on my managed server's customer support to get anything done.

As per dchuck's first post, I decided to dive into Python (after reading up on the differences between Python and Ruby). Within an hour, I had imported the BeautifulSoup module to parse HTML and built an application that scrapes the first 10 pages of results for a list of keywords and returns the position of any domain you want to track. I got it to read a CSV file of 100 keywords and, as danny said, it crawled them all without a hitch.
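In rough outline, that part looks something like the sketch below. The "h3 a" result-link selector is an assumption; Google's markup changes, so adjust it to whatever the pages you get back actually contain. fetch_serp() is the throttled fetch sketched earlier in the thread:

# Walk up to 10 result pages for a keyword and return the 1-based position of
# the first result whose URL contains the tracked domain. The "h3 a" selector
# is an assumption about the SERP markup and will need adjusting over time.
import csv

from bs4 import BeautifulSoup

def find_position(domain, keyword, pages=10):
    position = 0
    for page in range(pages):
        html = fetch_serp(keyword, start=page * 10)
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.select("h3 a"):
            position += 1
            if domain in link.get("href", ""):
                return position
    return None  # not found within the first `pages` pages

def track_from_csv(csv_path, domain):
    """Print the position of `domain` for every keyword listed in the CSV file."""
    with open(csv_path) as f:
        for row in csv.reader(f):
            keyword = row[0]
            print(keyword, find_position(domain, keyword))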

Then I looked into how to set up a pure Python application on my web server (instead of using, say, a PHP front end with a Python back end). I started a $10 VPS at Linode.com and installed Ubuntu Server, Apache2, and the Flask microframework for Python (though I'll probably move to Django, because I'm pretty new and it has a huge community).

Finals have slowed me down a bit since I made this thread, but I'd like to soon have a Django site running my Python scraper, and to get a registration/login/add-website/add-keywords-to-track interface working this coming week. My biggest weakness is not knowing how to smartly design and structure an application, and having zero knowledge of best practices. For example, I'm not sure how I should have my scraper.py run automatically every day to check for Google SERP ranking changes. Is that just something I put in a cron job?
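From what I can tell, yes: a daily crontab entry pointing at the script is the usual answer. Something like the line below, with the paths being placeholders for wherever python and scraper.py actually live:

# Edit the crontab with `crontab -e`, then add a line like this to run the
# scraper every day at 6:00 AM and append its output to a log file:
0 6 * * * /usr/bin/python /home/me/rank-tracker/scraper.py >> /home/me/rank-tracker/scraper.log 2>&1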
 
I can make a lawnmower go 100 mph, doesn't make it a sportscar

*cough ruby blows chunks cough*
