I'd like to write a script that scrapes content off a target site.
Does anyone have a tutorial, or can anyone give me a general pointer?
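class spider
{
    // This class grabs the content from the sites
    private $curl;

    public function setup()
    {
        $cookieJar = 'cookies.txt';
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($this->curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
        // Older PHP builds refuse CURLOPT_FOLLOWLOCATION when open_basedir or
        // safe_mode is on, so only follow redirects when both are off.
        $safeMode = filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN);
        if (ini_get('open_basedir') == '' && !$safeMode) {
            curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        }
    }

    public function get($url)
    {
        // Start a fresh handle for this URL, apply the options, then fetch
        $this->curl = curl_init($url);
        $this->setup();
        return $this->request();
    }

    public function request()
    {
        // Returns the page body as a string, or false on failure
        return curl_exec($this->curl);
    }
}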
$content = htmlspecialchars(file_get_contents($url)); // $url is the address of the page you want
It's not about the scraper itself, it's about what you do with the code once you've got it. I hope you know your PHP string functions, because you'll need them, along with some regular expressions to strip out the parts you need. For directories and the like it should be easy enough.
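To give a rough idea of that second step, here's a minimal sketch that pulls a title and a list of links out of some markup using strpos/substr and preg_match_all. The sample HTML, the between() helper and the "listing" div id are all made up for the example; a real site's markup will look different, so treat the patterns as a starting point only.

```php
<?php
// Sample markup standing in for whatever the scraper returned (invented for this example).
$html = '<html><head><title>Example Directory</title></head><body>'
      . '<div id="listing"><a href="/site/one">Site One</a>'
      . '<a href="http://example.com/two">Site Two</a></div>'
      . '<div id="footer"><a href="/about">About</a></div></body></html>';

// Hypothetical helper: return the text between two markers, or null if either is missing.
function between($haystack, $start, $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Plain string functions: cut out the page title and the listing block.
$title   = between($html, '<title>', '</title>');
$listing = between($html, '<div id="listing">', '</div>');

// Regex: collect the href and the link text of every anchor inside the listing.
$links = array();
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is';
if ($listing !== null && preg_match_all($pattern, $listing, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $links[] = array(
            'url'  => $match[1],
            'text' => trim(strip_tags($match[2])),
        );
    }
}

echo $title, "\n";
print_r($links);
```

Regexes like this break easily when the markup changes (single-quoted attributes, nested divs and so on), so for anything messier PHP's built-in DOMDocument is probably the safer route.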