Wayback Machine Scraper?
I was thinking about this very issue earlier today. It would RULE to have this for when I buy an aged domain. In fact, I want archive.org + Google Cache + Baidu Cache. That would rule. But I don't have the inclination to code it this evening.

In the meantime, I suggest you try this little-known tool called Warrick. I haven't ever heard anyone mention it here or on any other forum I lurk on, so I assume either no one knows about it, or everyone knows about it and thinks it sucks.

But it works OK. When you have a dozen new domains you need to get content for, you let it run and it saves whatever pages it can find.

I would rather have something prettier and with more options but right now this is the best thing I know of for what you need.
 
The tough part is that Archive.org is not wget- or curl-friendly. The links in the content are a disaster, which I could personally remedy; I just need all the images, content, and everything for a website contained therein. It's a tougher problem than you think.
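To give an idea of what "remedy the links" means: Wayback snapshots rewrite every URL so it points back into web.archive.org, with a timestamp (and sometimes an asset modifier like `im_`) stuck in front of the original address. A rough sketch of undoing that with a regex (the function name and the exact set of modifiers handled are my own guesses; a real scraper would need to cover more cases, like protocol-relative links):

```python
import re

# Wayback snapshot URLs embed the original URL after a timestamp:
#   https://web.archive.org/web/20080101000000/http://example.com/page
# Asset links often carry a modifier before the slash, e.g. "20080101000000im_".
WAYBACK_RE = re.compile(
    r"https?://web\.archive\.org/web/\d{1,14}(?:im_|js_|cs_|if_)?/"
)

def unwayback(url: str) -> str:
    """Strip the Wayback prefix so the link points at the original site again."""
    return WAYBACK_RE.sub("", url)
```

Run that over every href/src in the saved HTML and the links at least point where they used to.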
 
Why not? (I just checked with bluehatseo but maybe other sites are worse)

Save each image if it doesn't already exist
Save the HTML code
Get all links on the page

Repeat for every link you haven't visited yet
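Those steps are basically a breadth-first crawl. A minimal, stdlib-only sketch (the function names, the on-disk naming scheme, and the `limit` safety valve are all my own choices, not anything Archive.org requires):

```python
import os
import urllib.parse
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect <a href> page links and <img src> image links from one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.pages, self.images = set(), set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.pages.add(urllib.parse.urljoin(self.base, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.add(urllib.parse.urljoin(self.base, attrs["src"]))

def crawl(start_url, out_dir="mirror", limit=50):
    """Save HTML + images, follow links, repeat for unvisited pages."""
    seen, queue = set(), [start_url]
    os.makedirs(out_dir, exist_ok=True)
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue  # dead link in the archive; skip it
        # Save HTML code (URL-encoded filename keeps one flat directory)
        name = urllib.parse.quote(url, safe="") + ".html"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(html)
        # Get all links on the page
        parser = LinkExtractor(url)
        parser.feed(html)
        # Save images if they don't already exist
        for img in parser.images:
            path = os.path.join(out_dir, urllib.parse.quote(img, safe=""))
            if not os.path.exists(path):
                try:
                    with open(path, "wb") as f:
                        f.write(urlopen(img, timeout=10).read())
                except Exception:
                    pass
        # Repeat for links we haven't visited yet
        queue.extend(p for p in parser.pages if p not in seen)
    return seen
```

Obviously a real version also needs to stay on-domain and rewrite the Wayback link prefixes, but the loop itself really is that simple.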

If you need help feel free to hit me up on skype: patrickhehejo