Hi All,
OK, I swiped this gem from BHW and didn't see it here, and it's really too good to pass up, folks. Apparently Yahoo Voices and the Yahoo Contributor Network are closing down permanently. While this sucks for those of us who built links there, it's also pretty sweet in another way. All this content is getting deleted from the interwebz. Gone. Unless, of course, someone were to scrape the content and repost it to save it for posterity. If it happens to be on the topic of your website and includes links to your website, that's just the cost of doing business.
If I'm not being clear, let me help.
1) Yahoo Voices is going down
2) Their content is getting deleted
3) You're going to scrape/save it
4) Feeding Frenzy
Here's HOW you're going to do this:
1) The footprint runs like this: site:voices.yahoo.com "my keywords"
2) Scrape all the URLs using ScrapeBox. All of them. You'll have to turn off the option that deletes duplicate domains, since every result sits on the same domain.
3) Export the URLs to a text file
4) Use the Google cache operator and the strip-images option to get a raw, unadulterated, text-only version of the articles
5) Use wget with the normal circumvention parameters (delay, random delay, user agent) to download these. ALL of them.
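For steps 3 and 4, here's a rough Python sketch of turning the exported URL list into text-only cache URLs. Assumptions on my part, not from the original tip: the helper names are made up for illustration, and I'm reading "cache operators and strip images operators" as the `cache:` prefix plus a `strip=1` parameter on the webcache.googleusercontent.com address. Check that against what you actually see in your browser before relying on it.

```python
from urllib.parse import quote

# Assumed base address for cached pages (not stated in the post).
CACHE_BASE = "http://webcache.googleusercontent.com/search"


def dedupe(urls):
    """Drop blank lines and exact duplicate URLs, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def cache_url(url):
    """Build the assumed text-only cache address for one article URL."""
    # strip=1 is assumed to request the version without images.
    return f"{CACHE_BASE}?q=cache:{quote(url, safe='')}&strip=1"


def build_cache_list(lines):
    """Turn raw lines from the exported text file into cache URLs."""
    return [cache_url(url) for url in dedupe(lines)]
```

Feed it the lines of your export (for example `open("urls.txt").read().splitlines()`, where `urls.txt` is a hypothetical filename) and write the result to a new text file, one URL per line, for the download step.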
It's a literal feeding frenzy. I've got ScrapeBox screaming right now getting my proxies ready. Scraping begins shortly. I'm going to load up a whole range of keywords with the footprint. I'm also going to be running a few instances of wget through proxies to make this thing work.
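If you're loading up a whole range of keywords, something like this builds the query list for you. It's only a sketch: the function name is hypothetical and it just fills each keyword into the footprint from step 1.

```python
# Footprint from step 1, with a placeholder for the keyword.
FOOTPRINT = 'site:voices.yahoo.com "{keyword}"'


def footprint_queries(keywords):
    """Return one footprint query per non-empty keyword."""
    queries = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword:
            queries.append(FOOTPRINT.format(keyword=keyword))
    return queries
```

Paste the output into your scraper's keyword box, one query per line.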
Of course, only do this if you're desperate and can't afford your own high-quality viral-style copy from a writer here on BHW. Or for Tier 1s. Imagine what you can do with...
10k relevant topic articles no longer indexed...
-Word AI (not required)
Get it while it's still hot in the Google cache. I'm not sure of an easy way to scrape the Bing cache, otherwise I'd recommend that as well.
Questions? Post them. Otherwise, get ScrapeBox and wget whistling.
XH