PHP script to scrape Google Ads

cipheriam

New member
Aug 31, 2008
I searched the net and can't find a script to scrape the ads in the Google SERPs. Does anyone know of a PHP script that can do this? Specifically, I want a script that goes through a set of keywords, records the top 10 ads for each keyword, and then returns the 5 ads that appear most often across all of the keywords. Should I be off to RentACoder?
 


Have you searched this forum?
(My guess is no, because you'd probably have found a similar script.)

Having the script go back and refresh a few times to see which ads stick to the top is not something I've seen posted here. Frankly, I don't see the point when you're already grabbing the ads, but to gauge frequency all you'd have to do is run the same script a couple of times with the code posted in the 'war chest' (see the post at the top of this forum).
 
Yes, I searched the forum; there is nothing on the forum or in the war chest thread. If anyone has a link to a script, I would much appreciate it.

Cheers!
 
This is not everything you wanted, but it's a start. You're going to have to write your own curl class to use with it, one that returns the HTML of the page you're scraping, which should be super easy. Part of my curl class was written by someone else and I don't want to give out their code without their permission. After that, all you have to do is build your own code / DB to check for similar ads across multiple keywords. Good luck :P
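For anyone who needs a starting point for that curl class, here is a rough sketch of what it might look like. It is not the class I actually use: the only thing the google class below relies on is a getFile($url) method that returns the page HTML, so the constructor arguments, the user agent string and the timeout here are just assumptions for illustration.

```php
<?php
// Hypothetical minimal stand-in for the curl class used by the google class.
// Only getFile() is required; everything else here is an assumed default.
class curl
{
    private $userAgent;
    private $timeout;

    public function __construct($userAgent = 'Mozilla/5.0 (compatible; example-scraper)', $timeout = 15)
    {
        $this->userAgent = $userAgent;
        $this->timeout   = $timeout;
    }

    // Fetch $url and return the response body as a string, or false on failure.
    public function getFile($url)
    {
        $ch = curl_init($url);
        if ($ch === false) {
            return false;
        }

        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,   // follow redirects
            CURLOPT_USERAGENT      => $this->userAgent,
            CURLOPT_CONNECTTIMEOUT => $this->timeout,
            CURLOPT_TIMEOUT        => $this->timeout,
        ));

        $html = curl_exec($ch);
        if ($html === false) {
            error_log('curl error: ' . curl_error($ch));
        }

        return $html;
    }
}
```

Since getFile() can return false, it is probably worth checking the result before running the regexes on it.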

Code:
class google{

    public $c; // HTTP client; must provide getFile($url) returning the page HTML

    function __construct(){

        $this->c = new curl(); // replace this with your curl
    }

    function getAds($keyword){

        $keyword = urlencode($keyword); // encodes spaces as "+" and escapes other special characters
        echo "Checking $keyword\n";

        $html = $this->c->getFile("http://www.google.com/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=en&q=$keyword&btnG=Google+Search");



        preg_match_all('/<a id=pa[1-5] href=".*?&q=(.*?)">(.*?)<\/a><\/h3><cite>(.*?)<\/cite>      (.*?)<li/is', $html, $topads);

        $strippedtopads = array();

        $strippedtopads['url'] = $this->stripTags($topads[1]);
        $strippedtopads['title'] = $this->stripTags($topads[2]);
        $strippedtopads['adbody'] = $this->stripTags($topads[4]);
        $strippedtopads['displayurl'] = $this->stripTags($topads[3]);
        $strippedtopads['baseurl'] = $this->baseurls($strippedtopads['url']);
        //print_r($strippedtopads);



        preg_match_all('/<a id=an[1-9] href=".*?&q=(.*?)">(.*?)<\/a><\/h3>(.*?)<br><cite>(.*?)<\/cite></is', $html, $sideads);

        //print_r($sideads);

        $strippedsideads = array();

        $strippedsideads['url'] = $this->stripTags($sideads[1]);
        $strippedsideads['title'] = $this->stripTags($sideads[2]);
        $strippedsideads['adbody'] = $this->stripTags($sideads[3]);
        $strippedsideads['displayurl'] = $this->stripTags($sideads[4]);
        $strippedsideads['baseurl'] = $this->baseurls($strippedsideads['url']);
        //print_r($strippedsideads);
        //echo $html;
        
        
        $ads = array($strippedtopads, $strippedsideads);
        
        return $ads;

    }

    function stripTags($text){

        // Arrays are cleaned element by element.
        if(is_array($text)){

            $strippedtext = array();

            foreach($text as $t){
                $strippedtext[] = $this->stripTags($t);
            }

            return $strippedtext;
        }

        // Single string: turn <br> into a space, drop tags, decode entities and URL escapes.
        $text = str_replace("<br>", " ", (string)$text);

        return trim(urldecode(html_entity_decode(strip_tags($text), ENT_QUOTES)));
    }

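    function baseurls($urls){

        // Always return an array, even when there were no matches.
        $baseurls = array();

        if(is_array($urls)){

            foreach($urls as $url){

                $parseurl = parse_url($url);

                // parse_url() returns false on badly malformed URLs and omits missing parts.
                $host = isset($parseurl['host']) ? $parseurl['host'] : '';
                $path = isset($parseurl['path']) ? $parseurl['path'] : '';
                $baseurls[] = strtolower($host . $path);

            }
        }

        return $baseurls;
    }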


}
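
For the "common ads across multiple keywords" part, one way to do it without a DB is to count how many keywords each base URL shows up for and keep the most frequent ones. This is only a sketch: topCommonAds() and its parameters are names I made up for illustration, and it assumes you have already collected the 'baseurl' lists from getAds() into an array keyed by keyword.

```php
<?php
// Hypothetical helper: given keyword => list of base URLs (in ad order),
// return the $limit base URLs that appear for the most keywords.
function topCommonAds(array $baseUrlsByKeyword, $limit = 5)
{
    $counts = array();

    foreach ($baseUrlsByKeyword as $keyword => $baseUrls) {
        // Look at the top 10 ads only, and count each advertiser once per keyword.
        $unique = array_unique(array_slice($baseUrls, 0, 10));

        foreach ($unique as $baseUrl) {
            if ($baseUrl === '') {
                continue; // skip ads whose URL could not be parsed
            }
            $counts[$baseUrl] = isset($counts[$baseUrl]) ? $counts[$baseUrl] + 1 : 1;
        }
    }

    // Sort by keyword count (highest first), then alphabetically so ties are stable.
    $urls = array_keys($counts);
    usort($urls, function ($a, $b) use ($counts) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a];
        }
        return strcmp((string)$a, (string)$b);
    });

    $top = array();
    foreach (array_slice($urls, 0, $limit) as $url) {
        $top[$url] = $counts[$url];
    }

    return $top;
}

// Possible usage with the class above (assumes $keywords is your keyword list):
// $g = new google();
// $byKeyword = array();
// foreach ($keywords as $kw) {
//     list($top, $side) = $g->getAds($kw);
//     $byKeyword[$kw] = array_merge((array)$top['baseurl'], (array)$side['baseurl']);
// }
// print_r(topCommonAds($byKeyword, 5));
```

Counting each advertiser once per keyword keeps a site that runs both a top ad and a side ad for the same keyword from being counted twice.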