Node.js (CoffeeScript):
Code:
log = console.log
http = require "http"
$ = require "jquery"
{parse} = require "url"

# visits url, returns html response body
getHtml = (url, callback) ->
  {hostname, path} = parse url
  log "sending GET: #{url}"
  body = ""
  http.get {hostname, path, port: 80}, (res) ->
    res.setEncoding "utf8"
    res.on "data", (chunk) -> body += chunk
    res.on "end", -> callback body

getHtml "http://danneu.com", (html) ->
  $page = $("body").append html # the jquery module has an internal "body" to latch onto.
  $links = $page.find("a")
  log(link.href) for link in $links
For fun, here's what I came up with in Ruby and Python:
Ruby:
Code:
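require "open-uri"
require "nokogiri"

# URI.open (Ruby 2.5+); Ruby 3.0 dropped open-uri's Kernel#open override for URLs
doc = Nokogiri::HTML(URI.open("http://danneu.com"))
links = doc.css("a")
urls = links.map { |link| link["href"] }.compact  # href strings, skipping <a> tags without one
puts urls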
Python (code I wrote years ago):
Code:
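import sys
import urllib2

if __name__ == "__main__":
    # assumes BeautifulSoup 3 unpacked in ./BeautifulSoup
    sys.path.append("./BeautifulSoup")
    from BeautifulSoup import BeautifulSoup

    url = "http://danneu.com"
    page = urllib2.build_opener().open(url)
    soup = BeautifulSoup(page)
    for link in soup.findAll("a"):
        # link.href would search for a child <href> tag (always None);
        # .get("href") reads the attribute instead
        print link.get("href")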
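For comparison, here's a rough Python 3 sketch of the same link scraper using only the standard library. It swaps in `html.parser` for BeautifulSoup and `urllib.request` for `urllib2`, so it's a different approach, not a port of the code above. The function names `extract_links` and `get_links` are made up for illustration, and decoding the body as UTF-8 is an assumption.

```python
# Python 3 sketch: collect href values from <a> tags with the stdlib only
# (html.parser stands in for BeautifulSoup, urllib.request for urllib2).
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Records the href attribute of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value may be None
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_links(html):
    """Return the href of every <a> tag in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def get_links(url):
    """Fetch url and return its link targets (assumes a UTF-8 body)."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Usage would be something like `for href in get_links("http://danneu.com"): print(href)`. Since `urlopen` also understands `file://` URLs, you can point it at a saved page when you don't want to hit the network.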
Of course, the Node version is asynchronous (the entire point of Node) and launches all the requests at once, while the synchronous Ruby and Python examples only launch a request after the previous one returns. For 10 requests:
- Node.js: 20 seconds
- Ruby & Python: over a minute
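To see where a gap like that comes from, here's a small Python `asyncio` simulation of the two request patterns. It's only a sketch: `fake_fetch` is a hypothetical stand-in that sleeps for a fixed latency instead of doing real HTTP, so it models the scheduling, not the network. Ten sequential calls should take roughly ten times the latency, while ten concurrent ones take roughly one latency.

```python
# Simulation only: fake_fetch sleeps instead of doing real HTTP, so the
# timings reflect the scheduling pattern, not any real server.
import asyncio
import time


async def fake_fetch(url, latency):
    """Hypothetical stand-in for an HTTP GET that takes `latency` seconds."""
    await asyncio.sleep(latency)
    return f"<html>{url}</html>"


async def sequential(urls, latency):
    # Each request starts only after the previous one has finished.
    return [await fake_fetch(u, latency) for u in urls]


async def concurrent(urls, latency):
    # All requests start at once; gather waits for all of them and
    # returns the results in the same order as urls.
    return await asyncio.gather(*(fake_fetch(u, latency) for u in urls))


def timed(coro_fn, urls, latency):
    """Run coro_fn(urls, latency) to completion; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(urls, latency))
    return results, time.perf_counter() - start
```

With ten URLs and a 0.05 s latency, `timed(sequential, urls, 0.05)` should come out around half a second and `timed(concurrent, urls, 0.05)` around a tenth of that, which is the same shape as the Node vs. Ruby/Python numbers.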
If that mattered, you'd use one of Ruby's or Python's many good parallel HTTP libraries, or just launch a few threads by hand. But numbers are always fun, especially when the comparison is nonsensical.
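The "few threads by hand" route in Python might look roughly like this, using `concurrent.futures.ThreadPoolExecutor`. Again a sketch under assumptions: `fake_blocking_fetch` is a hypothetical placeholder that just sleeps, and `fetch_all` and its `workers=5` default are made up for illustration. For real use you'd pass in a blocking fetch function of your own, for instance one built on `urllib.request.urlopen`.

```python
# Sketch: run blocking fetches on a small thread pool.
# fake_blocking_fetch is hypothetical; pass a real blocking fetch for actual use.
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_fetch(url, latency=0.05):
    """Hypothetical stand-in for a blocking HTTP GET."""
    time.sleep(latency)
    return url.upper()


def fetch_all(urls, fetch=fake_blocking_fetch, workers=5):
    """Call fetch(url) for every url on a pool of threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

With 5 workers and 10 URLs, the simulated fetches run in about two rounds instead of ten, so the wall-clock time drops roughly fivefold without any async code.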
My main point was just to demonstrate how to make an HTTP request with Node and use the simple jQuery module. I find myself doing ad hoc scraping from the command line all the time and was tired of looking up XPath selectors and shit. I already know jQuery. I'm also still on my latest CoffeeScript kick.