CURL question

Status
Not open for further replies.

alexb

Señor Member
Dec 6, 2007
Indiana
I don't know a damn thing about curl, so I was hoping one of you web dev type people could help me out.

Let's say I'm trying to scrape a website for every phrase between italics tags. Would that be a relatively simple script?
 


There are libraries in a lot of languages that help with that. I use Ruby and it would be under 10 lines of code or so. Dunno about PHP, but I'm sure it's not too hard.
 
PHP:
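<?php

// Fetch the page. CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.tizag.com/htmlT/htmlitalic.php");
curl_setopt($ch, CURLOPT_HEADER, false); // keep the response headers out of $data
curl_setopt($ch, CURLOPT_USERAGENT, "YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/)");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);

// curl_exec() returns false on failure, so check before matching.
if ($data === false) {
    die("curl error: " . curl_error($ch));
}

// Lazy match between <i> and </i>; the "s" modifier lets "." span newlines.
preg_match_all('/<i>(.*?)<\/i>/s', $data, $matches);
print_r($matches);

?>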

output:
Array
(
[0] => Array
(
[0] => <i>tag</i>
[1] => <i>Emphasized</i>
[2] => <i>blockquote</i>
[3] => <i>addresses</i>
[4] => <i>Hyper Text Markup Language</i>
[5] => <i>Hyper Text Markup Language</i>
[6] => <i>MD</i>
[7] => <i>HTML Links</i>
)

[1] => Array
(
[0] => tag
[1] => Emphasized
[2] => blockquote
[3] => addresses
[4] => Hyper Text Markup Language
[5] => Hyper Text Markup Language
[6] => MD
[7] => HTML Links
)

)
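
For what it's worth, the regex only catches a bare <i> tag; something like <I> or <i class="..."> slips past it. Here is a rough sketch of the same extraction using PHP's bundled DOM extension instead. The function name italic_phrases() and the sample markup are made up for illustration, and the code assumes the DOM extension is enabled (it usually is).

```php
<?php
// Hypothetical helper (not from the thread): collect the text of every <i> element.
function italic_phrases(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $phrases = [];
    foreach ($doc->getElementsByTagName('i') as $node) {
        $phrases[] = trim($node->textContent);
    }
    return $phrases;
}

// Stand-in markup for the demo; in practice pass the HTML body returned by curl_exec().
$sample = '<p>The <i>tag</i> is often used for <I class="x">Emphasized</I> text.</p>';
print_r(italic_phrases($sample));
```

The parser handles uppercase tags, attributes, and nested markup, which is where the regex approach tends to fall over.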
 
I wrote a function to do this (I spend a lot of time scraping values).

Code:
    // Assumes both functions are static methods of a class named Func, as the Func:: calls imply.
    public static function svalueall($source, $tag1, $tag2)
    {
        /**
         * Variant of svalue() that returns an array of every match on the page.
         */
        $source = str_replace($tag1, '<tagged>', $source);
        $source = str_replace($tag2, '</tagged>', $source);

        // The "s" modifier lets "." match newlines, so multi-line values are captured too.
        preg_match_all('#<tagged>(.*?)</tagged>#s', $source, $result);

        /**
         * $result[1] holds only the captured text. A stray marker can still end up
         * inside a value, so filter the list before returning it.
         */
        return Func::idiotfilter($result[1]);
    }

    public static function idiotfilter($string)
    {
        // str_replace() also accepts arrays, so this handles one string or a list of matches.
        return str_replace(array('<tagged>', '</tagged>'), '', $string);
    }
 
And the complementary function, svalue($source, 'front', 'back'):

Code:
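    public static function svalue($source, $tag1, $tag2)
    {
        /**
         * A scraping function for pulling the first value found between 2 strings.
         */
        $source = str_replace($tag1, '<tagged>', $source);
        $source = str_replace($tag2, '</tagged>', $source);

        // Lazy (.*?) stops at the first closing marker; "s" lets "." span newlines.
        if (!preg_match('#<tagged>(.*?)</tagged>#s', $source, $result)) {
            return ''; // no match; an explicit check replaces the old @ error suppression
        }

        /**
         * A stray marker may get attached to the value, so filter it out.
         */
        return Func::idiotfilter($result[1]);
    }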

These were the first functions I ever wrote in PHP, 3 years ago.
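
A possible variation on the same idea, as a sketch only: escape the two marker strings with preg_quote() and build them straight into the pattern, which skips the <tagged> placeholder step and the cleanup filter. The names between_all() and between_first() and the sample markup are made up here, not part of the functions above.

```php
<?php
// Hypothetical helpers (names are not from the thread): same idea as
// svalueall()/svalue(), but the markers are escaped into the pattern directly.
function between_all(string $source, string $front, string $back): array
{
    // preg_quote() escapes regex metacharacters and the "#" delimiter in the markers.
    $pattern = '#' . preg_quote($front, '#') . '(.*?)' . preg_quote($back, '#') . '#s';
    preg_match_all($pattern, $source, $result);
    return $result[1]; // captured text only, so no marker cleanup is needed
}

function between_first(string $source, string $front, string $back): ?string
{
    $all = between_all($source, $front, $back);
    return $all[0] ?? null; // null when nothing matched
}

// Made-up sample markup for the demo.
$html = '<td class="price">$4.99</td><td class="price">$12.50</td>';
print_r(between_all($html, '<td class="price">', '</td>'));
var_dump(between_first($html, '<b>', '</b>'));
```

Because the source is never rewritten, a page that already contains the literal text <tagged> can't confuse the match.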
 