Near Unlimited Free Content on Almost Any Topic

xha44a · Aug 3, 2014

Hi All,

OK I swiped this gem from BHW but didn't see it here, but it's really too good to pass up folks. Apparently Yahoo Voices and the Yahoo contributor network are closing down permanently. While this sucks for those of us who built links there, it also is pretty sweet in another way. All this content is getting deleted from the interwebz. Gone. Unless of course, someone were to scrape the content and repost it to save it for posterity. If it happens to be on the topic of your website and include links to your website, that's just the cost of doing business.

If I'm not being clear, let me help.

1) Yahoo Voices is going down
2) Their content is getting deleted
3) You're going to scrape/save it
4) Feeding Frenzy

Here's HOW you're going to do this

1) Footprint runs like this. site:voices.yahoo.com "my keywords"
2) Scrape all the URLs using ScrapeBox. All of them. You'll have to turn off "delete duplicate domains", since every result sits on the same domain
3) Export the URLs to a text file
4) Use Google's cache operator with the strip-images (text-only) option to give you a raw, unadulterated version of the articles
5) Use wget and the normal circumvention parameters (delay, random delay, user agent) to download these. ALL of them.
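
For anyone who'd rather script the first three steps than click through them, here's a rough sketch in Python. It only builds the footprint queries and cleans up an exported URL list; it doesn't scrape anything. The function names `build_queries` and `dedupe_urls` are made up for illustration, and the sample URLs are placeholders.

```python
def build_queries(keywords, site="voices.yahoo.com"):
    """Return one footprint query per keyword phrase, skipping blank entries."""
    return [f'site:{site} "{kw.strip()}"' for kw in keywords if kw.strip()]


def dedupe_urls(urls):
    """Drop exact duplicate URLs but keep every distinct URL, even when
    they all share one domain (the reason "delete duplicate domains" is off).
    Order of first appearance is preserved."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    for query in build_queries(["chess openings", "payday loans"]):
        print(query)
    # Placeholder URLs, not real articles
    print(dedupe_urls([
        "http://voices.yahoo.com/example-a.html",
        "http://voices.yahoo.com/example-b.html",
        "http://voices.yahoo.com/example-a.html",
    ]))
```

The dedupe step works on whole URLs rather than domains, which matches step 2: every result lives on the same domain, so domain-level dedupe would leave you with a single line.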

It's a literal feeding frenzy. I've got scrapebox screaming right now getting my proxies ready. Scraping begins shortly. Going to load up a whole range of keywords and the footprint. I'm also going to be running a few instances of wget through proxies in order to make this thing work.

Of course only do this if you're desperate and can't afford your own high-quality viral-style copy from a writer here on BHW. Or Tier 1's. Imagine what you can do with...

10k relevant topic articles no longer indexed...
-Word AI (not required)

Get it while it's still hot in the Google cache. I'm not sure of an easy way to scrape the Bing cache, otherwise I'd recommend that as well.

Questions, post them. Otherwise get scrapebox and wget whistling

XH

evolutionvision · Aug 4, 2014

Correct me if I'm wrong here, but I would assume that there are already numerous scraper sites syndicating this content. So if you were to use it, your site would be seen just as another one of those.
?

dzianis · Aug 4, 2014

evolutionvision said:
there are already numerous scraper sites syndicating this content. So if you were to use it, your site would be seen just as another one of those.
?

I think so too, but I just ran a quick check.

I searched for "yahoo voices anus" and extracted Goog's cache of this page

Code:

http://voices.yahoo.com/what-causes-anal-itching-4127261.html

...then googled the first paragraph in quotes:

No scrapers here!

dynamicsoul · Aug 4, 2014

I googled in quotes around 10 articles from voices cache in my niche.. around 70% of them would be seen as dupes due to the articles being published on the original authors' personal sites, or in news syndication.

If you are going to do this for a blog network (not bad idea).. I'd set up some way of checking your scrapes in the serps for dupes..
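
One way to sketch the comparison half of that check in Python, assuming you already have the text of whatever page turned up in the serps (this does not query any search engine). It uses word shingles and Jaccard overlap, which is my pick for illustration and not something anyone in the thread said they use. The names `shingles`, `jaccard`, `is_probable_dupe` and the 0.5 threshold are all assumptions to tune.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_probable_dupe(a, b, threshold=0.5, size=5):
    """Flag the pair as a likely duplicate when overlap reaches the threshold."""
    return jaccard(a, b, size) >= threshold


if __name__ == "__main__":
    scraped = "the quick brown fox jumps over the lazy dog near the river bank"
    found = "the quick brown fox jumps over the lazy dog near the old mill"
    print(round(jaccard(scraped, found), 2), is_probable_dupe(scraped, found))
```

Exact-match quoting in a search box only catches verbatim copies; shingle overlap also catches copies with a reworded intro or a trimmed ending, at the cost of needing the other page's text on hand.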

cardine · Aug 4, 2014

I have 100% of Yahoo Voices scraped and downloaded and we've done some experiments to see how abused the content is. It really depends on the articles and niches. There are many articles that have been syndicated a lot, but just as many that haven't been syndicated at all.

So there are a lot of gems for putting together your own site by just re-using the content, but in all other cases there is still the option of spinning.

conjamuk · Aug 4, 2014

I am pretty sure it will still be in some google database even if it does not show in search and still be seen as dupe content.

Only way to get high quality content for free that is unlikely to be duped is to search in a different language and outsource the cleanup to someone. Some parts are very readable.
This was from google.de

Update 2.0 is not a Data Refresh the algorithm change "Payday Loan Update" , but as a comprehensive algorithm change by Google to understand, as the underlying technology for the detection of Webspam and illegal SEO techniques significantly has evolved and improved.

Google also intends to use this algorithm change especially on very spam-prone queries as well as specific areas such as credit, pornography, gambling, drugs and prescription pharmaceuticals, from.

The two Payday Loan Google updates have nothing with another algorithm change, such as the Panda update or Penguin Update , emulate.

What is the Payday Loan differs from Penguin Update 2.0 Update 2.0?
Both Google updates , while pursuing the same goal, to improve the quality on the search results pages, but do it completely differently before.

The Penguin Update 2.0 algorithmically attempts an unnatural or manipulated by paid links back link profile of a website to track down and punish with sufficient facts from the entire domain. For prevention, as well as with an existing algorithmic penalty by the Penguin update, webmasters should their study site on an unnatural backlink profile and by means SISTRIX link rating test automatically.

The Payday Loan Update 2.0 does not focus directly on by "classical" Link Building bought or links but punishes sites from which try to manipulate their rankings by illegal SEO tactics. Google thinks in this context of "illegal SEO tactics", also known as blackhat methods that explicitly generated by eg chopped or malware-infected

xha44a · Aug 4, 2014

Hey,

I didn't think of syndication - I ran about 4 articles from my niche and came up with NOTHING. So it's possible depending on the competition level, the articles may be unique, or they may be widely used. I guess it depends on your niche. That said, I've got a proxy mesh running right now scraping these babies to HD as we speak.

Worst case scenario I download 5k articles related to my niche and I can run em all through WordAI as suggested above - and have 5k decently unique articles.

As far as Google databases go, even if I let them sit for 6 months - still 5k unique articles is nothing to sniff at.

Also, the idea of downloading from other languages is pretty AWESOME indeed. Never really thought of this.

Cheers guys! Enjoy, and if you're having trouble with wget and the Google IP bans (cache) lmk I got a sweet solution

XH

cardine · Aug 4, 2014

Your best bet would have been to do this before Yahoo Voices went offline. Now that it has gone offline, I think you'd get better results pulling articles from the Wayback Machine than you would from Google Cache.

xha44a · Aug 4, 2014

cardine said:
Your best bet would have been to do this before Yahoo Voices went offline. Now that it has gone offline, I think you'd get better results pulling articles from the Wayback Machine than you would from Google Cache.

Wayback Machine has very little Yahoo Voices content. Google has a ton. I'm hammering Google's cache using a random rotating proxy service with wget. 1 article download per 10 seconds. While scraping Google for more URLs.

Scrapebox FTW!

conjamuk · Aug 4, 2014

SEO is an acronym for the English Search Engine Optimization . In Swedish also used the term search engine optimization .

The goal of SEO is about to be at the top of Google for the keywords that are most popular and relevant to your site. Then get lots of visitors and maybe even earn big money.

You can divide the SEO in two parts based on the two main criteria used for the assessment:

Google analyzes your website's relevance and how it is built and therefore is SEO go hand in hand with substance web design , web development , accessibility , ease of use and copywriting .
Google analyzes how many and who links to you and therefore can SEO go hand in hand with materials viral marketing , PR (public relations) and the creation of high quality content .

A sitemap - also called sitemap or site map (in English Site Map ) - is a subpage of a website that is often seen in a dynamic displays links in a logically to all pages on a site.

The site map helps visitors quickly find what they are looking for and is also a means for search engines to easily find all the pages on the website.

The site map can happily contain text and headings, and should be linked on the site all the pages in a visible place, in the footer for example.

As a rule of thumb, I usually say that if you can not reach all the pages from one click from the home page so you should have a site map.

Holy shit Swedish articles translate really well.

potentialeight · Aug 4, 2014

It's pretty interesting to me that they're going down. I used to write for them a lot about 7-8 years ago when they were Associated Content, back before they realized they were paying entirely too much. I didn't realize how good I had it at the time since you could write on literally almost anything that wasn't adult and get something in the neighborhood of $0.03 to $0.05 per word.

I was in college and playing chess pretty seriously, and I would write out my analysis for chess games I had played and then add an intro paragraph and submit that for like $10-$15 a pop. It was pretty sweet, and it's what planted the seed to eventually transition to writing with what I do now.

lukep · Aug 4, 2014

Wow, I made a ton of cash with associated content back in the day... They really were paying too damn much. I can't imagine why this happened to them?!
