Ruby/Watir: Adwords Scraper

hehejo

Go for it, good practice if anything ;)

dchuk wrote an awesome tutorial on using Watir, may come in handy: How To Write Your First Ruby Web Bot In Watir

How can I save the captcha image?

I think I can select it like this, but when I call save on it I get a 'not implemented' error:
@browser.image(:xpath, "//*[@id='gwt-debug-captcha-container']/div[2]/img")

I'm using watir-webdriver.

I will share the bot when I'm done, but I need a little help here...
 


Lemme add a few things for anyone else reading.

Let me recommend something similar to FirePath: Introducing SelectorGadget: point and click CSS selectors
It's not even an app/add-on, just a bookmarklet. You click and drag it into your bookmarks bar, click it when ya wanna use it, and baby now you've got a stew goin.

It's incredible for working out which CSS selectors and XPath expressions will select what you want on a page. You can click/unclick elements and it'll try to figure out what other elements you may want.

For instance, I just clicked the first forecast and it assumed I wanted all of them. If I only wanted the first two, I could deselect the rest and it'd tell me which XPath/CSS selectors to use.

SelectorGadget tells me I can use `forecasts = doc.css(".twc-forecast-temperature")` with Nokogiri to grab those nodes.



Since you're using that tutorial, know that you can cut out the bloat of Watir by just using Nokogiri itself. Watir/Selenium/etc. are for when you need browser rendering, functionality, and drivers.

Example of the tutorial with just Nokogiri.
Code:
require 'open-uri'
require 'nokogiri'

# Zip code from the command line, defaulting to 78705.
zipcode = ARGV[0] || 78705
url = "http://www.weather.com/weather/hourbyhour/graph/#{zipcode}"

# Kernel#open no longer accepts URLs as of Ruby 3.0, so use URI.open from open-uri.
doc = Nokogiri::HTML(URI.open(url))
raise 'Zipcode not found' if doc.text.include?("can't find the page you requested")

# Each .hbhWxHour node holds one hour of the forecast.
doc.css('.hbhWxHour').each do |hour|
  time = hour.css('.hbhWxTime').text
  temp = hour.css('.hbhWxTemp').text
  precip = hour.css('.hbhWxPrecip').text[/\d*%/] # keep just the percentage
  puts "%5s: %6s (%s)" % [time, temp, precip]
end
 
You already know the image's URL, since you can select the img element via XPath and read its src.

Why don't you fire another request via Mechanize or HTTParty to grab & save the image?
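
Here's a rough sketch of that idea. I've swapped in Ruby's built-in Net::HTTP instead of Mechanize or HTTParty so it has no gem dependencies. The helper names (absolute_image_url, cookie_header, save_image) are made up for illustration, and I'm assuming the cookies come as an array of hashes with :name and :value keys, which is what I'd expect @browser.cookies.to_a to return in watir-webdriver. Treat it as a starting point, not tested against AdWords.

```ruby
require 'net/http'
require 'uri'

# Resolve the img src (which may be relative) against the page URL.
def absolute_image_url(page_url, img_src)
  URI.join(page_url, img_src).to_s
end

# Build a Cookie header from an array of { name:, value: } hashes.
# Assumption: this is the shape of the cookies the browser hands back.
def cookie_header(cookies)
  cookies.map { |c| "#{c[:name]}=#{c[:value]}" }.join('; ')
end

# Fetch the image with a plain GET and write the raw bytes to path.
def save_image(image_url, path, cookies: [])
  uri = URI(image_url)
  request = Net::HTTP::Get.new(uri)
  request['Cookie'] = cookie_header(cookies) unless cookies.empty?

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "Download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  File.binwrite(path, response.body)
  path
end

# Hypothetical usage with the element from the question:
#   img = @browser.image(:xpath, "//*[@id='gwt-debug-captcha-container']/div[2]/img")
#   url = absolute_image_url(@browser.url, img.src)
#   save_image(url, 'captcha.png', cookies: @browser.cookies.to_a)
```

One caveat: a captcha endpoint may hand out a different image on every request, so what you download this way might not match what the browser is showing. Passing the browser's cookies along gives you the best chance of staying in the same session, but if the images don't match you'd be better off screenshotting the page instead.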

Didn't know about ChunkyPNG. Seems like overkill here, but I'll check it out.