Hey,
Selenium 2 seems to be a big step up from the old pre-WebDriver Selenium.
There are still a few issues with the Python connector being unable to set Firefox proxies, but I think that is being fixed soon.
If anyone wants to give it a shot with Python, you might find this useful. It's a little function that retrieves an item from the in-memory Firefox cache and saves it to the hard drive. You could probably edit it to send the binary data directly to your captcha solver of choice if you wanted.
[high=python]
import binascii
import uuid


def recover_file_from_cache(browser, key):
    '''
    @arg: browser - instance of selenium.webdriver.Firefox
    @arg: key - the url of the image
    @usage: recover_file_from_cache(browser, 'http://example.com/myjpeg.jpg')
    '''
    cachepath = 'about:cache-entry?client=HTTP&sb=1&key=' + key
    b = browser
    # remember the main window before opening a second one
    main_window = b.current_window_handle
    # open a new window to retrieve the cached image from
    b.execute_script('window.open()')
    cache_window = [a for a in b.window_handles if a != main_window][0]
    b.switch_to_window(cache_window)
    b.get(cachepath)
    # extract the hex data from the page and save it to the hard drive.
    representation = b.find_element_by_tag_name('pre').text
    # a[11:73] keeps the hex columns and drops the offset and ASCII columns
    cleandump = [a[11:73] for a in representation.strip().split('\n')]
    hs = ''.join(cleandump).replace(' ', '')  # hex string
    hb = binascii.a2b_hex(hs)  # hex to binary
    # replace with the path you want to save the captchas to.
    path = '/home/h/Desktop/captchas/' + str(uuid.uuid4().time_low) + '.jpg'
    with open(path, 'wb') as f:
        f.write(hb)  # write binary data to file
    # close the cache window and switch back to the main browser window
    b.close()
    b.switch_to_window(main_window)
[/high]
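If you would rather hand the bytes straight to a solver than write a file, the hex-parsing step can be pulled out on its own. This is only a rough sketch: it assumes the dump layout implied by the `[11:73]` slice above (an offset column, then the hex bytes, then an ASCII column), and `hex_dump_to_bytes` is a made-up helper name, not part of Selenium.

```python
import binascii


def hex_dump_to_bytes(dump_text):
    """Turn an about:cache-entry style hex dump into raw bytes.

    Assumption: each line looks like the dump the function above reads,
    with the hex bytes sitting in character columns 11 to 72.
    """
    hex_parts = []
    for line in dump_text.strip().split('\n'):
        # Same slice as the original function: skip the offset column
        # and cut off the trailing ASCII column.
        hex_parts.append(line[11:73].replace(' ', ''))
    return binascii.a2b_hex(''.join(hex_parts))
```

You would call it with the text of the `pre` element and pass the returned bytes to whatever solver you use instead of opening a file.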