I'm a little over 4 hours into scraping about 4,000 RSS feeds with some software I wrote, and I'm 40% done. And that's only scraping the first page of each feed, at 15 "products" per page. Some of these feeds have 300+ pages. I'd need my own mini-Google to pull off scraping that much data!
The file is already so big that I can't fit it into memory all at once. I'm sure the site I'm pulling the content from just loves me by now. If I'd had a bit more foresight about how big the file would get, I would've had the scraper split the output into chunks automatically. As it stands, I'll have to write more software that loads chunks of it into memory at a time just to break it apart into CSV files.
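The splitting step doesn't actually need the whole file in memory, though. Here's a rough sketch of what I have in mind, assuming the dump is something line-oriented like one JSON object per line; the filename, chunk size, and column names below are made up, not what my scraper actually produces:

```python
import csv
import json

# Hypothetical names: "products.jsonl" as the big scrape dump, and a few
# guessed-at columns. Adjust both to whatever the scraper really wrote.
SOURCE = "products.jsonl"
ROWS_PER_CHUNK = 50_000
FIELDS = ["feed", "title", "url", "price"]

def split_into_csv_chunks(source=SOURCE, rows_per_chunk=ROWS_PER_CHUNK):
    chunk_index = 0
    out = None
    writer = None
    with open(source, encoding="utf-8") as big_file:
        # Iterating the file object streams it line by line, so the whole
        # thing never has to sit in memory at once.
        for row_number, line in enumerate(big_file):
            if row_number % rows_per_chunk == 0:
                # Start a new numbered CSV chunk every N rows.
                if out:
                    out.close()
                chunk_index += 1
                out = open(f"products_{chunk_index:04d}.csv", "w",
                           newline="", encoding="utf-8")
                writer = csv.DictWriter(out, fieldnames=FIELDS,
                                        extrasaction="ignore")
                writer.writeheader()
            record = json.loads(line)
            writer.writerow(record)
    if out:
        out.close()

if __name__ == "__main__":
    split_into_csv_chunks()
```

Streaming it this way sidesteps the memory problem entirely; the only thing that grows is the number of CSV files on disk.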
Just thought I'd share
