Hey guys,
Just something I whipped up in about 10 minutes. It's an interactive scraper: you plug in the URL you want to scrape and a query string of tags, classes (or ids), then click the button.
Just unzip it to your desktop and open it with IE. Click 'allow active content' and plug in the sample data I have below.
So far I have only tested it in IE7, where it works. Pages that contain JavaScript will throw some warnings at the bottom of the page, but nothing that should stop the scrape. Once I package it into an Adobe AIR app, those errors will be nonexistent. The intent here is to demonstrate my example with little to no effort.
It currently renders the scraped code as the normal HTML you would see in your browser, but it only displays the elements you ask for in the query.
Here's an example. Let's say that on your target page, you know the data you want is in a table with an id of 'mytable'. Inside that table are some rows, but you only want the ones that have a class of 'thisrow'. Your query string would look like this:
#mytable tr.thisrow
If the table didn't have an id you could use
table tr.thisrow
The scraper would only target elements that fit that pattern. If you wanted only the cells that are in each of those rows you would use
table tr.thisrow td
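To make the pattern matching concrete, here is a rough Python sketch of how a query like the ones above could be applied to a page. It is an illustration only, not the code inside the scraper: it uses the standard library's html.parser, supports just tag, #id and .class tokens separated by spaces, assumes well-formed markup, and returns the text of each match rather than rendered HTML. The names scrape, SelectorScraper and SAMPLE are made up for this example.

```python
from html.parser import HTMLParser


def parse_selector(query):
    """Split 'table tr.thisrow td' into [(tag, id, class), ...] steps."""
    steps = []
    for token in query.split():
        tag, elem_id, cls = token, None, None
        if "#" in tag:
            tag, elem_id = tag.split("#", 1)
        elif "." in tag:
            tag, cls = tag.split(".", 1)
        steps.append((tag or None, elem_id, cls))
    return steps


def step_matches(step, element):
    """Check one selector step against one (tag, id, classes) element."""
    tag, elem_id, cls = step
    el_tag, el_id, el_classes = element
    return ((tag is None or tag == el_tag)
            and (elem_id is None or elem_id == el_id)
            and (cls is None or cls in el_classes))


def path_matches(steps, stack):
    """Last step must match the current element, earlier steps its ancestors in order."""
    if not step_matches(steps[-1], stack[-1]):
        return False
    remaining = list(steps[:-1])
    for ancestor in stack[:-1]:
        if remaining and step_matches(remaining[0], ancestor):
            remaining.pop(0)
    return not remaining


class SelectorScraper(HTMLParser):
    VOID = {"br", "img", "input", "hr", "meta", "link"}  # tags with no closing tag

    def __init__(self, query):
        super().__init__()
        self.steps = parse_selector(query)
        self.stack = []    # currently open elements
        self.active = []   # (depth, result index) of matches still open
        self.results = []  # text pieces collected for each match

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.stack.append((tag, attrs.get("id"), classes))
        if path_matches(self.steps, self.stack):
            self.active.append((len(self.stack), len(self.results)))
            self.results.append([])

    def handle_endtag(self, tag):
        # Assumes well-formed markup: the closing tag matches the top of the stack.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.active = [(d, i) for d, i in self.active if d <= len(self.stack)]

    def handle_data(self, data):
        text = data.strip()
        if text:
            for _, i in self.active:
                self.results[i].append(text)


def scrape(html, query):
    """Return the text of every element in html that matches query."""
    scraper = SelectorScraper(query)
    scraper.feed(html)
    return [" ".join(pieces) for pieces in scraper.results]


# Made-up sample page for the example queries.
SAMPLE = """
<table id="mytable">
  <tr class="thisrow"><td>A1</td><td>A2</td></tr>
  <tr class="other"><td>B1</td><td>B2</td></tr>
  <tr class="thisrow"><td>C1</td><td>C2</td></tr>
</table>
"""

print(scrape(SAMPLE, "#mytable tr.thisrow"))   # one entry per matching row
print(scrape(SAMPLE, "table tr.thisrow td"))   # one entry per cell in those rows
```

The first query returns one entry per 'thisrow' row and skips the row with a different class; adding td to the end narrows the match to the individual cells inside those rows.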
Now as an example, I'm including a URL and pattern that I just tested. It's for Apple iPhones and it grabs all the rows on the page. I have other scrapers that allow you to filter and grab more than one page at a time. (Like all hundred kajillion of them.)
Code:
url to use: http://search [B]DOT[/B] ebay [B]DOT[/B] com/search/search.dll?from=R40&_trksid=m37&satitle=apple+iPhone&category0=
pattern to use: table.ebItemlist tbody tr
Let me know what you think. If you have questions or want to see something different, let me know.
Thanks,