Automated Drupal Commenting with cURL

keliix06

Feb 13, 2007
I'm trying to build an auto-commenter that posts to Drupal sites. I'm pretty sure my code is fine, but Drupal seems to generate a form_build_id that's different every time you load the page. I have an ugly hack with file_get_contents that grabs the value, but when I then post to the form with cURL the comment doesn't get saved, so the value must no longer match.

Can I load the page, get the value, and post to the form in the same session?

This is the code I'm using right now

Code:
// SETUP THE FORM DATA
$url = 'http://'.$cs.'/?q=comment/reply/'.$node;

$html = file_get_contents($url);
$form_build_id = strstr($html,'name="form_build_id" id="');
$form_build_id = strstr($form_build_id,'value="');
$form_build_id = substr($form_build_id,7);
$pos = strpos($form_build_id,'"');
$form_build_id = substr($form_build_id,0,$pos);

$post_data['form_build_id'] = $form_build_id;
$post_data['form_id'] = 'comment_form';
$post_data['name'] = $name;
$post_data['mail'] = $email;
$post_data['homepage'] = $website;
$post_data['comment'] = $comment;

// URL-encode every key and value, not just the comment
foreach ( $post_data as $key => $value) {
    $post_items[] = urlencode($key) . '=' . urlencode($value);
}
$post_string = implode ('&', $post_items);

// POST THE FORM
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $ua);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$result = curl_exec($ch);
print_r(curl_getinfo($ch));
echo curl_errno($ch) . '-' . curl_error($ch);
curl_close($ch);
The error checking stuff is really just there while figuring this out.

I've never built any tools like this before, but doing these by hand is a pain in the ass, so the option is to have one of my employees do it or automate it. I'd much rather automate. I know some of you must have this figured out already, any help is appreciated.

I'll be more than happy to add this to the war chest once I can get it figured out.
 


I think you'll need cookies enabled: grab the page, pull out the id with a regular expression, then do your post with the same cookies and that id. The id is most likely per session.
 
Expanding on what mattseh said: yes, Drupal protects against CSRF by putting a hidden "token" in the form, and that token is checked against a value tied to the session your session cookie identifies.
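
To make the idea concrete, here is a minimal sketch of a session-bound form token. It is not Drupal's actual code: the function names and the 'form_token' key are made up for illustration, and a plain array stands in for the server-side session. It only shows why a value fetched in one session is rejected in another.

```php
<?php
// Hypothetical sketch of the general pattern, NOT Drupal's real implementation.

// When the form is rendered: store a random token in the visitor's
// session data and return it so it can be embedded as a hidden field.
function issue_form_token(array &$session): string
{
    $token = bin2hex(random_bytes(16));
    $session['form_token'] = $token;
    return $token;
}

// When the form is submitted: the posted value must match the one
// stored for the same session.
function check_form_token(array $session, string $submitted): bool
{
    return isset($session['form_token'])
        && hash_equals($session['form_token'], $submitted);
}

// Same session: the token that came with the page is accepted.
$sessionA = [];
$token = issue_form_token($sessionA);
var_dump(check_form_token($sessionA, $token)); // bool(true)

// A request that arrives without the cookie gets a fresh, empty session,
// so the very same token is rejected.
$sessionB = [];
var_dump(check_form_token($sessionB, $token)); // bool(false)
```

That is the situation when the page is loaded with file_get_contents and the post is sent by a separate cURL handle with no shared cookies: the second request lands in a different session.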

If you search wickedfire for "Logging into Wordpress" or something similar, you'll come across a post showing login logic that translates directly to commenting.

Here are the highlights: your cURL session needs a cookie jar.

Code:
// temp file that holds the session cookies between requests
$cookie = tempnam(sys_get_temp_dir(),'COO');

// init CURL
$ch = curl_init();
curl_setopt_array($ch, 
 array(
       CURLOPT_VERBOSE => true,
       CURLOPT_FOLLOWLOCATION => true,
       CURLOPT_COOKIEJAR => $cookie,
       CURLOPT_COOKIEFILE => $cookie,
       CURLOPT_USERAGENT => 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
       // certificate checks are switched off here; leave them on outside of testing
       CURLOPT_SSL_VERIFYPEER => false,
       CURLOPT_SSL_VERIFYHOST => false,
       CURLOPT_HEADER => false,
       CURLOPT_RETURNTRANSFER => true
       ));

// go to login page and grab data
curl_setopt_array($ch, array(
                             CURLOPT_URL => "https://www.site.faux/page.html",
                             CURLOPT_HTTPGET => true,
                             ));
$content = curl_exec($ch);

This gets your cookie jar set up.

Now you need to parse out the form's inputs. Don't use regexes on HTML. HTML isn't made to be parsed with regex; it's meant to be tokenized.

Code:
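$doc = new DOMDocument();
@$doc->loadHTML($content); // suppress warnings from sloppy real-world markup
$the_form=$doc->getElementById("id-of-comment-form");
if($the_form === null)
  die("comment form not found");

$inputs=$the_form->getElementsByTagName('input');

// copy every named input (hidden token fields included) into the post data
$post_data=array();
foreach($inputs as $input){
  $input_name=$input->getAttribute("name");
  $input_value=$input->getAttribute("value");

  if(strlen($input_name) == 0)
    continue;
  $post_data[$input_name]=$input_value;
}


$post_data['comment']="your comment goes here";

// build the urlencoded POST body
$comment_query = '';
foreach($post_data as $key => $val) {
  $comment_query .= rawurlencode($key).'='.rawurlencode($val).'&';
}
$comment_query = rtrim($comment_query, '&'); // drop the trailing '&'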


curl_setopt_array($ch, array(
// note: you can get this CURLOPT_URL from the form's "action" attribute;
// in this case it is hard coded.
                             CURLOPT_URL => "https://www.site.faux/post_comment",
                             CURLOPT_HTTPGET => false,
                             CURLOPT_POST => true,
                             CURLOPT_POSTFIELDS => $comment_query
                             ));
$content = curl_exec($ch);

Tell us how you make out.
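
To illustrate the point about tokenizing instead of string matching, here is a small self-contained sketch. The helper name form_inputs and the sample markup are made up for the example. Note that the first hidden input lists value before name, which is exactly the kind of variation that breaks a strstr or regex approach but makes no difference to a DOM parser.

```php
<?php
// Hypothetical helper: collect name => value for every named <input>
// inside the form with the given id. $formId is assumed to be a plain
// identifier without quotes, since it is pasted into the XPath expression.
function form_inputs(string $html, string $formId): array
{
    // Real-world HTML is rarely valid; collect parser errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];
    foreach ($xpath->query('//form[@id="' . $formId . '"]//input[@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

$html = '<html><body><form id="comment-form" action="/post" method="post">'
      . '<input type="hidden" value="form-abc123" id="x" name="form_build_id">'
      . '<input type="hidden" name="form_id" value="comment_form">'
      . '<input type="submit" value="Save">'
      . '</form></body></html>';

print_r(form_inputs($html, 'comment-form'));
```

The unnamed submit button is skipped, so only form_build_id and form_id come back.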
 
Looks like we're getting closer. I had to close the cURL connection between the requests; otherwise it looked like it was just executing the first one twice. Now they run as separate requests, and the form_build_id is getting added to the postfields, correctly as far as I can tell. I'm not sure how to read the temp cookie from the script, but looking at it from the command line it only has the session id, which is all I get in the browser, so that should be fine.

This is what the code looks like now

Code:
// GET THE FORM
$cookie = tempnam(sys_get_temp_dir(),'COO');

$ch = curl_init();
curl_setopt_array($ch, 
array(
    CURLOPT_VERBOSE => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_COOKIEJAR => $cookie,
    CURLOPT_COOKIEFILE => $cookie,
    CURLOPT_USERAGENT => $ua,
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_SSL_VERIFYHOST => false,
    CURLOPT_HEADER => false,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_URL => $url,
    CURLOPT_HTTPGET => true,
    ));
$content = curl_exec($ch);
echo '<br />';
print_r(curl_getinfo($ch));
curl_close($ch);

// GET THE FIELDS
$doc = new DOMDocument();
$doc->loadHTML($content);
$the_form=$doc->getElementById("comment-form");

$inputs=$the_form->getElementsByTagName('input');

foreach($inputs as $input){
  $input_name=$input->getAttribute("name");
  $input_value=$input->getAttribute("value");

  if($input_name == 'form_build_id'){
      $post_data[$input_name]=$input_value;
  }
}


$post_data['comment']= $comment;
$post_data['name']= $name;
$post_data['mail']= $email;
$post_data['homepage']= $website;

// URL-encode every key and value, not just the comment
$post_items = array();
foreach ( $post_data as $key => $value) {
    $post_items[] = urlencode($key) . '=' . urlencode($value);
}
$comment_query = implode ('&', $post_items);

// POST THE COMMENT
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_VERBOSE => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_COOKIEJAR => $cookie,
    CURLOPT_COOKIEFILE => $cookie,
    CURLOPT_USERAGENT => $ua,
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_SSL_VERIFYHOST => false,
    CURLOPT_HEADER => false,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_URL => $url,
    CURLOPT_HTTPGET => false,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $comment_query
    ));
$content = curl_exec($ch);
echo '<br />';
print_r(curl_getinfo($ch));
curl_close($ch);

This is the output from the curl_getinfo calls as well as the $comment_query:

Array ( => http://REMOVED/?q=comment/reply...p on the dom parsing, that makes life easier.
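
On reading the temp cookie file from the script: the file libcurl writes for CURLOPT_COOKIEJAR is plain text in the Netscape cookie format, one cookie per line with seven tab-separated fields. A rough sketch of a reader follows. The function name parse_cookie_jar and the sample cookie are made up for illustration.

```php
<?php
// Parse the text of a Netscape-format cookie file into name => value pairs.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // libcurl marks HttpOnly cookies with this prefix; strip it first
        // so the line is not mistaken for a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $parts = explode("\t", $line);
        if (count($parts) === 7) {
            // domain, subdomain flag, path, secure, expiry, name, value
            $cookies[$parts[5]] = $parts[6];
        }
    }
    return $cookies;
}

$sample = "# Netscape HTTP Cookie File\n"
        . "example.test\tFALSE\t/\tFALSE\t0\tSESSabc\t12345\n";
print_r(parse_cookie_jar($sample));
```

In the script this would be called as parse_cookie_jar(file_get_contents($cookie)). libcurl only writes the jar when the handle is cleaned up, so the file is still empty while the first handle is open.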
 
Well, if you install a local copy of Drupal and loop through $form_values, you'll see a couple of fields you're missing.

Edit - Actually that's for FAPI forms - I believe comments may be different. I'm installing a new copy of Drupal tonight so I'll see what I come up with.
 
Here is what's in comment_values

array(12) {
["subject"]=> string(19) "this is the subject"
["comment"]=> string(19) "this is the comment"
["format"]=> string(1) "1"
["cid"]=> NULL
["pid"]=> string(1) "2"
["nid"]=> string(1) "1"
["uid"]=> NULL
["op"]=> string(4) "Save"
["submit"]=> string(4) "Save"
["preview"]=> string(7) "Preview"
["form_build_id"]=> string(37) "form-7d41a1f6ad5b6c5b4ca2cb340039834f"
["form_id"]=> string(12) "comment_form" }
 