YetAnotherScraper - YellowPages .com & .ca (PICS)

Status
Not open for further replies.

WildPointer

New member
May 20, 2007
Hey guys. I finished working on my scraper. It comes in two versions, one for US and one for Canada.

YetAnotherScraper

Features
  • Collect all business listings easily and efficiently
  • Available for US (.com) & Canada (.ca)!
  • Hassle-free installation - Just upload and begin using
  • No limit on the number of entries it can extract!
  • Can be used on your computer or run remotely
  • Doesn't eat up RAM, meaning it can be used locally in the background!
  • Platform independent - operates completely in browser
  • Fetch a specific page, a range of pages, or ALL pages
  • Output in XML, CSV, or HTML
  • Files automatically compressed after completion
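
For readers wondering what the last two features amount to, here is a minimal sketch, not the script's actual code (which isn't shown in this thread). The field names, the XML element names, and the choice of gzip are all assumptions made for illustration; the real output layout and archive format may differ.

```python
import csv
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical field names; the real script's columns are not documented here.
FIELDS = ["Name", "Phone", "Address", "City"]

def to_csv(records):
    """Serialize a list of dicts as CSV text with every field quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_xml(records):
    """Serialize the same records as <listings><listing>...</listing></listings>."""
    root = ET.Element("listings")
    for rec in records:
        item = ET.SubElement(root, "listing")
        for field in FIELDS:
            ET.SubElement(item, field.lower()).text = rec.get(field, "")
    return ET.tostring(root, encoding="unicode")

def compress(text):
    """Gzip the serialized output (gzip is an assumed format); returns bytes."""
    return gzip.compress(text.encode("utf-8"))
```

Quoting every CSV field keeps addresses that contain commas in a single column.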

[Screenshots: thumb_scraper_main_ready.jpg, thumb_scraper_usa_action.jpg, thumb_scraper_output.jpg]

Price is currently $49.99 for each, or $79.99 for both together.

I hope the FAQ, documentation and tutorial on the main site cover all your questions and concerns, but feel free to send a PM or a reply if you want to know more.

More details on the official site: Easily Extract Data from YellowPages (US & CAN) - YetAnotherScraper.com

It's Friday night, and I'm going out, so I won't be able to reply or process orders until late tonight or early tomorrow (but no later!)
 


The two biggest advantages are compression support, which helps when you're running the script remotely to fetch tens of thousands of files, and the ability to request specific pages or a range of pages. This lets you resume if a run gets interrupted, or fetch a large set of records a little at a time if it's too much to get in one go.
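
To make the page-selection idea concrete, here is a rough sketch of how a page spec could be turned into a list of pages. The spec syntax ("7", "3-10", "all") and the function name are assumptions for illustration only, not the script's real interface.

```python
def parse_pages(spec, last_page):
    """Turn a page spec into a list of page numbers.

    Accepted forms (all hypothetical): "7", "3-10", or "all".
    Ranges are clipped to last_page.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, last_page + 1))
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(spec)
    if start < 1 or end < start:
        raise ValueError(f"invalid page spec: {spec!r}")
    return list(range(start, min(end, last_page) + 1))
```

With something like this, resuming a run that died on page 41 of 100 is just a matter of asking for "41-100".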

It also has XML support and is probably more verbose than the other scripts floating around, updating you in real time on what it's doing.

btw, Insomniac's script has an error. In the CSV file he posted:

"Name","Phone","Address","City","Zip"
"Toronto Cosmetic Clinic","416-221-5554","5400 Yonge Street, Unit 110","Toronto","ON M2N 5R5"
"Kester David A Dr","1-877-733-9005","#350, 943 W Broadway","Vancouver","BC V5Z 4E1"

"ON" and "BC" are part of the zip, but they aren't supposed to be. They are actually provinces. It would be like having "CA 90210" as a zip. I'm particularly picky about my work and take details VERY seriously. It took me about 20 minutes to finish the script, and then 2 days to test and tweak each function to handle errors gracefully and make sure every character, space and symbol is where it's supposed to be.
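
For illustration, here is one way the province could be separated from the postal code. This is a sketch under the assumption that the field always looks like the two sample rows above; the function name is made up, and it is not taken from any of the scripts discussed here.

```python
import re

# Canadian postal codes have the form letter-digit-letter, space, digit-letter-digit.
# In the sample rows, a two-letter province abbreviation is prepended to the code.
PROVINCE_POSTAL = re.compile(r"^([A-Z]{2})\s+([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")

def split_province(value):
    """Split 'ON M2N 5R5' into ('ON', 'M2N 5R5').

    Returns ('', value) unchanged when the input doesn't match the pattern.
    """
    match = PROVINCE_POSTAL.match(value.strip().upper())
    if not match:
        return "", value
    province, first_half, second_half = match.groups()
    return province, f"{first_half} {second_half}"
```

Applied to the two rows above, this gives "ON" / "M2N 5R5" and "BC" / "V5Z 4E1", which can then go into separate Province and Postal Code columns.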

But I'm not gonna bash krazyjosh's or Insomniac's work. I've never tried their scripts.


 
HOW LONG it took to write does not matter; the fact that he wrote it and that it may work is what does. I don't care if it took 20 mins if it relieves me from having to write it myself.

Pay the man for the time he saves you and the money you make in the future.

And no, I have not and do not need to buy his program :-)
 
you shouldn't admit it just took 20 minutes.
The "time" required to complete a task isn't really an accurate way to appraise the task. Every day, we pay for things that take other people a minimal amount of effort or time to create or do (basically what cashflowrusty said). I agree it feels bad paying $50 for something that takes 5 minutes of work... but you're not counting the time it took me to learn how to code, my experience, learning regular expressions, etc. We trade money for time on a regular basis.

A mistake on my part: I mentioned ranges above, but the Canadian version doesn't support page ranges; only the US version does (it's in the documentation). Also, the script isn't encoded, so you're free to edit it, learn from the code, use parts of the code for your other projects, etc., provided you aren't re-selling it or distributing the code without permission.
 

For the sake of comparison, if I gave two shits about Canada (which I don't) I would have split the province up, but since they use a retarded post code system then this is something I didn't know about.

Mine also automatically resumes scraping, so range support is completely redundant. Also, who gives two shits about compression? Speed is all that matters.
 
Oh, and while I'm at it, here are the portions of my disclaimer you violated:

"Any attempt to bug me about this product will result in public humiliation"

Sending me PMs while acting like a potential customer is considered bugging me, so you get what you agreed to receive: public humiliation.

"This product is not for resale, and the code is obfuscated to look like ugly shit"

So don't ask me if my code is obfuscated even though I've already said it is in my thread. Asking for unobfuscated code is basically asking to steal my product dumbass.

Questions you asked:

"how much and what if page format changes? also, do you only accept PP?"

All of these questions are answered in my thread; this proves you're a dumbshit who can't read. RTFM.
 
but you're not counting the time it took me to learn how to code, my experience, learning regular expressions

People aren't paying you to learn to code, they are paying you for your code. Do not charge them for shit you invested your own time into, only what they are directly paying for.
 
First off, I didn't PM you because I wanted to humiliate you. I PMd you because I wanted a Canadian harvester. I decided to write my own after you told me your update policy. Want me to share that here? I did you a favor by not doing so.

I didn't ignore reading your fine print because I couldn't, I did so because your arrogant 1px by 1px disclaimer strains my eyes. But that's my fault, I'll take the blame for this.

I like how you mention performance when your code is obfuscated, making it slower, but performance isn't much of an issue in this case due to the nature of this app. You can make your code a few milliseconds faster at most, but the bottlenecks are environmental. Server speed, time of day, etc, all influence performance far more than our code would. The only way to substantially make this program faster would be to use multiple threads gathering multiple pages at the same time.
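
As a rough sketch of the "multiple pages at the same time" idea, something like the following would overlap the network waits. Neither script's code is shown in this thread, so this is purely illustrative: `fetch_page` is a hypothetical caller-supplied function, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_pages(page_numbers, fetch_page, max_workers=8):
    """Fetch several pages concurrently and return {page_number: result}.

    fetch_page is a caller-supplied (hypothetical) function that takes a page
    number and returns that page's content. Most of its time is spent waiting
    on the remote server, which is why threads help here.
    """
    pages = list(page_numbers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with pages.
        results = list(pool.map(fetch_page, pages))
    return dict(zip(pages, results))
```

Because the bottleneck is the wait on the remote server rather than the parsing, total time should drop roughly with the number of pages in flight, up to whatever the server tolerates.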

People aren't paying you to learn to code, they are paying you for your code. Do not charge them for shit you invested your own time into, only what they are directly paying for.
We're charging people $50 for a scraper that took us each what.. 30 mins, max, to code? Why charge $50 then? Because the people who buy it are people who can't, or don't have time to, make one themselves.

My point was that the time it takes something to get done isn't an accurate way to appraise its value. Just because a doctor can diagnose me in 10 minutes doesn't mean the appointment isn't worth $300. I can go see a mechanic for a diagnosis, and he can charge me $10, but he didn't invest his time to go to med school.

and I like how you try to put me on the spot for potentially over-charging, when in another thread you claimed to be selling scrapers for $2,000/copy.

and on top of that, your new code is still buggy. Look at your first entry:
http://ypharvest.com/PerformanceParts.xml

Now can you please publicly humiliate me in another thread? I'm not talking shit in your thread. I even PMd you the bug instead of posting it publicly.
 
First off, I didn't PM you because I wanted to humiliate you.

Trust me, no humiliation taken, but you asked to be humiliated by doing so.


I didn't ignore reading your fine print because I couldn't, I did so because your arrogant 1px by 1px disclaimer strains my eyes. But that's my fault, I'll take the blame for this.

1) Ctrl+C
2) Ctrl+V
3) Change font and read

I like how you mention performance when your code is obfuscated, making it slower, but performance isn't much of an issue in this case due to the nature of this app.

Wrong, you clearly don't understand all the facets of obfuscation. This is obfuscated, NOT encoded. All variable, function and class names have been rewritten as random strings, and all whitespace and comments have been stripped. If anything, this form of obfuscation makes the code run faster, as there is less code to interpret.

You can make your code a few milliseconds faster at most, but the bottlenecks are environmental. Server speed, time of day, etc, all influence performance far more than our code would. The only way to substantially make this program faster would be to use multiple threads gathering multiple pages at the same time.

Or you can simply use multi curl which, while I'm at it, is not threaded even though it has thread-like behavior.

We're charging people $50 for a scraper that took us each what.. 30 mins, max, to code? Why charge $50 then? Because the people who buy it are people who can't, or don't have time to, make one themselves

Wrong. You are charging for a product; your mistake was trying to justify the price you are selling it for. If people want it, they will pay it.

when in another thread you claimed to be selling scrapers for $2,000/copy.

You are correct, the other app was far more advanced, and in fact had several weeks of development in it, plus the target market was corporations doing manual data mining. I charge what I wish at whatever time, and in this situation I feel CA data is worthless and will treat it as such.

and on top of that, your new code is still buggy. Look at your first entry: http://ypharvest.com/PerformanceParts.xml

If you're referring to the odd characters, that is NOT a bug; that is simply a UTF-8 encoded character that your browser is having trouble displaying.
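
The effect being described can be reproduced in a few lines. The actual character in that XML file isn't shown in this thread, so "é" below is only an illustrative stand-in, and Latin-1 is an assumed wrong charset.

```python
def misread_utf8(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)

original = "Caf\u00e9"              # "Café", an illustrative stand-in
garbled = misread_utf8(original)    # "CafÃ©": two odd characters instead of one
# Reversing the wrong decode recovers the original text.
restored = garbled.encode("latin-1").decode("utf-8")
```

The bytes in the file are valid UTF-8 either way; what changes is only the charset the viewer assumes when displaying them.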

Now can you please publicly humiliate me in another thread? I'm not talking shit in your thread.

Nope, you requested it when you PM'd me, it's in my disclaimer and you should have read it before wasting any of my time or mentioning me in this thread.
 