I've been writing this scraper in PHP for the past several months, and it now collects pretty much all the data I want. The problem is that grabbing all of the data takes a long time, around 5-6 hours for a complete "set", and when I run a job that size I almost never get a complete set of data back. Either the app crashes, the browser crashes, or something undetermined goes wrong. I am closing my curl and MySQL sessions.
I've already started writing some error handling so I can pick up where I left off, but it still pisses me off when I get up in the morning expecting to see 1000 results and only have 200. I'm guessing that if I ran this from the command line I might not run into as many issues, but I still need to test that. Does anyone know how to pass POST variables when running a script from the CLI? I have no idea. Any other suggestions? I know some of you do some pretty aggressive scraping, so any advice you can give me is much appreciated.
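On the CLI question: a script started as `php scraper.php ...` gets no `$_POST` at all; its arguments arrive in `$argv` (also available as `$_SERVER['argv']`). Below is a minimal sketch, assuming the scraper currently reads its inputs from `$_POST`. It accepts `key=value` pairs on the command line and parses them with `parse_str()` into an array with the same shape as `$_POST`. It also shows one possible way to "pick up where I left off" with a small checkpoint file. The script name, parameter names, and every helper function here (`cliParams`, `requestParams`, `loadCheckpoint`, `saveCheckpoint`) are hypothetical.

```php
<?php
// Sketch: accept POST-style parameters on the command line and keep a
// resumable checkpoint. All function and file names here are hypothetical.

// Turn ["scraper.php", "start=200", "category=books"] into
// ["start" => "200", "category" => "books"], the same shape $_POST would have.
function cliParams(array $argv): array
{
    $params = [];
    parse_str(implode('&', array_slice($argv, 1)), $params);
    return $params;
}

// Use real POST data under a web server, command-line arguments under the CLI.
function requestParams(): array
{
    if (PHP_SAPI === 'cli') {
        return cliParams($_SERVER['argv'] ?? []);
    }
    return $_POST;
}

// Read the index of the next item to scrape; 0 if no checkpoint exists yet.
function loadCheckpoint(string $file): int
{
    if (!is_file($file)) {
        return 0;
    }
    $data = json_decode((string) file_get_contents($file), true);
    return (is_array($data) && isset($data['next'])) ? (int) $data['next'] : 0;
}

// Record progress after each item is stored. Writing to a temp file and then
// renaming it (atomic on POSIX filesystems) means a crash mid-write leaves the
// previous checkpoint intact.
function saveCheckpoint(string $file, int $next): void
{
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode(['next' => $next]));
    rename($tmp, $file);
}
```

You would run it as `php scraper.php start=200 category=books`, call `loadCheckpoint()` once at startup to decide where to begin, and call `saveCheckpoint()` after each result is written to MySQL. Because the arguments go through `parse_str()`, values containing `&`, `+` or `%` would need to be URL-encoded. Under the CLI SAPI, `max_execution_time` defaults to 0 (no limit), which is one reason long jobs tend to survive better there than behind a browser request.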