Wayback Machine Scraper?
I was thinking about this very issue earlier today. It would RULE to have this for when I buy an aged domain. In fact, I want archive.org + Google Cache + Baidu Cache. That would rule. But I don't have the inclination to code it this evening.

In the meantime, I suggest you try this little-known tool called Warrick. I haven't ever heard anyone mention it here or on any other forum I lurk on, so I assume either no one knows about it, or everyone knows about it and thinks it sucks.

But it works OK. When you have a dozen new domains you need to get content for, you let it run and it saves whatever pages it can find.

I would rather have something prettier and with more options but right now this is the best thing I know of for what you need.
 
The tough part is that Archive.org is not wget- or curl-friendly. The links in the content are a disaster, which I could personally remedy; I just need all the images, content, and everything for a website contained therein. It's a tougher problem than you think.
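To give an idea of what "remedy the links" means: Wayback snapshots rewrite every URL so it points back into web.archive.org, with a timestamp (and sometimes an asset modifier like `im_`) stuck in front of the original address. A rough sketch of undoing that with a regex (the function name and the exact set of modifiers handled are my own guesses; a real scraper would need to cover more cases, like protocol-relative links):

```python
import re

# Wayback snapshot URLs embed the original URL after a timestamp:
#   https://web.archive.org/web/20080101000000/http://example.com/page
# Asset links often carry a modifier before the slash, e.g. "20080101000000im_".
WAYBACK_RE = re.compile(
    r"https?://web\.archive\.org/web/\d{1,14}(?:im_|js_|cs_|if_)?/"
)

def unwayback(url: str) -> str:
    """Strip the Wayback prefix so the link points at the original site again."""
    return WAYBACK_RE.sub("", url)
```

Run that over every href/src in the saved HTML and the links at least point where they used to.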
 
Why not? (I just checked with bluehatseo but maybe other sites are worse)

Save each image if it doesn't already exist
Save the HTML code
Get all links on the page

Repeat for every link you haven't visited yet
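Those steps are basically a breadth-first crawl. A minimal, stdlib-only sketch (the function names, the on-disk naming scheme, and the `limit` safety valve are all my own choices, not anything Archive.org requires):

```python
import os
import urllib.parse
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect <a href> page links and <img src> image links from one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.pages, self.images = set(), set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.pages.add(urllib.parse.urljoin(self.base, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.add(urllib.parse.urljoin(self.base, attrs["src"]))

def crawl(start_url, out_dir="mirror", limit=50):
    """Save HTML + images, follow links, repeat for unvisited pages."""
    seen, queue = set(), [start_url]
    os.makedirs(out_dir, exist_ok=True)
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue  # dead link in the archive; skip it
        # Save HTML code (URL-encoded filename keeps one flat directory)
        name = urllib.parse.quote(url, safe="") + ".html"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(html)
        # Get all links on the page
        parser = LinkExtractor(url)
        parser.feed(html)
        # Save images if they don't already exist
        for img in parser.images:
            path = os.path.join(out_dir, urllib.parse.quote(img, safe=""))
            if not os.path.exists(path):
                try:
                    with open(path, "wb") as f:
                        f.write(urlopen(img, timeout=10).read())
                except Exception:
                    pass
        # Repeat for links we haven't visited yet
        queue.extend(p for p in parser.pages if p not in seen)
    return seen
```

Obviously a real version also needs to stay on-domain and rewrite the Wayback link prefixes, but the loop itself really is that simple.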

If you need help feel free to hit me up on skype: patrickhehejo