Regex question, PHP

tomaszjot · Dec 18, 2011

I have this HTML and need to grab the price from it.

Code:

<div class="somediv">                                                                                                                                    <p class="pricediv">  
               £33.11             </p> 
                                             </div>

The price is 33.11 (as you can probably see).

I've tried to build a regex to get it, with no success. I'm totally rubbish with regex, so I've got a few questions:

1. Should I treat the empty space between lines in some special way? Like use \n or something to go to the next line?

2. What if the price div is wrapped in a <noscript> tag? Should that change my approach?

3. I built a very simple regex to get the price:

Code:

$regex = '/.*?<div class="somediv "><p class="pricediv">(.+?)<\/p<\/div>.*?/';

It doesn't work, of course. I've built working regexes for some other pages but I'm stuck on this one.

Any help much appreciated.

pickledegg · Dec 18, 2011

Don't have the actual answer, but check out http://gskinner.com/RegExr/. It's perfect for playing about with regex on the fly and learning by doing.

Insomniac · Dec 18, 2011

Given the available information, I'd be inclined to just be lazy:

$regex = '/£([0-9\.]*)\s/';
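
A minimal sketch of how that pattern could be run against the fragment from the first post. It assumes both the script and the page are UTF-8 encoded, so the pound sign in the pattern matches the one in the HTML byte for byte; the variable names are only illustrative.

```php
<?php
// The HTML fragment from the first post, line breaks and padding included.
$html = <<<'HTML'
<div class="somediv">                 <p class="pricediv">
               £33.11             </p>
                                             </div>
HTML;

// A pound sign, then digits and dots (captured), then one whitespace character.
$regex = '/£([0-9\.]*)\s/';

$price = null;
if (preg_match($regex, $html, $m)) {
    $price = $m[1]; // first capture group: the number without the pound sign
}

echo $price, "\n"; // prints 33.11
```

Note that this grabs the first pound amount anywhere in the input, so it is only safe if the price is the first one on the page.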

mattinator · Dec 18, 2011

I would go about solving the problem using existing tools, namely: PHP Simple HTML DOM Parser

It looks like once you grab the page in question you may be able to just do something like this:

$ret = $html->find('div[id=foo]');

Unless you have a good reason not to do this (e.g. you're trying to write very lean code), using this DOM parser would be the way I'd go about it.

rish3 · Dec 18, 2011

If you want to use a regex, you'll need the 's' modifier (PCRE_DOTALL) so that the . also matches newlines:

Code:

'/regex/s'
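
To illustrate the difference, here is a small sketch that runs the same pattern with and without the modifier. The shortened HTML string and the variable names are made up for the example.

```php
<?php
// A cut-down version of the fragment: the price sits on the line after the <p> tag.
$html = "<p class=\"pricediv\">\n   £33.11   </p>";

// Without /s the dot cannot cross the newline right after the tag, so nothing matches.
$withoutS = preg_match('/<p class="pricediv">(.+?)<\/p>/', $html, $a);

// With /s the dot matches newlines too, so the capture spans both lines.
$withS = preg_match('/<p class="pricediv">(.+?)<\/p>/s', $html, $b);

// The capture still carries the surrounding whitespace, so trim it.
$price = $withS ? trim($b[1]) : null;

var_dump($withoutS, $withS, $price);
```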

tomaszjot · Dec 18, 2011

Thanks for the help guys, will work on it.

jdrmar · Dec 18, 2011

PHP Simple HTML DOM Parser is a nice tool but tends to break down on big pages that are heavy on JavaScript and the like. In those cases use XPath. It's got a bit of a learning curve but is well worth it if you're going to do any serious scraping.

PHP: DOMXPath - Manual
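
A rough sketch of the XPath route using PHP's built-in DOM extension (assumed to be enabled). The XPath expression is just one guess at a selector for the fragment in the first post, and the final regex pulls out only the numeric part.

```php
<?php
$html = <<<'HTML'
<div class="somediv">
    <p class="pricediv">
        £33.11
    </p>
</div>
HTML;

// Scraped pages are rarely valid HTML, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@class="somediv"]/p[@class="pricediv"]');

$price = null;
if ($nodes !== false && $nodes->length > 0) {
    // textContent includes the padding and the currency symbol;
    // keep only the first run of digits with an optional decimal part.
    if (preg_match('/[0-9]+(?:\.[0-9]+)?/', $nodes->item(0)->textContent, $m)) {
        $price = $m[0];
    }
}

echo $price, "\n"; // prints 33.11
```

Because the parser deals with the whitespace and line breaks, nothing in the selector has to account for them.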

nickCR · Dec 18, 2011

I would leave out the DIV and just look for the P and the pound sign. The DOM/XPath route is probably your best bet for getting at the paragraph.
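
If you stay with regex, one possible pattern along those lines is sketched below. The pattern is an illustration rather than a tested recipe for the real page, and it assumes the class attribute is written exactly as in the first post.

```php
<?php
$html = "<div class=\"somediv\">\n  <p class=\"pricediv\">\n     £33.11   </p>\n</div>";

// Anchor on the paragraph and the pound sign only.
// \s* absorbs any spaces and newlines on either side of the price.
$regex = '/<p class="pricediv">\s*£([0-9]+(?:\.[0-9]+)?)\s*<\/p>/';

$price = preg_match($regex, $html, $m) ? $m[1] : null;

echo $price, "\n"; // prints 33.11
```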

harrymouni · Dec 18, 2011

Umm, unless I'm missing something, the [\s] character class matches all whitespace (newlines included), so this works on your string. I tested it at preg_match - regular expression PHP functions - functions-online

Code:

/<div class="somediv">[\s]+<p class="pricediv">[\s]+(.+?)[\s]+<\/p>[\s]+<\/div>/
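
For reference, a short sketch of that pattern used with preg_match on the fragment from the first post. The capture group includes the pound sign, so the last step, which is an addition for the example, strips everything except digits and the dot.

```php
<?php
$html = <<<'HTML'
<div class="somediv">                 <p class="pricediv">
               £33.11             </p>
                                             </div>
HTML;

$regex = '/<div class="somediv">[\s]+<p class="pricediv">[\s]+(.+?)[\s]+<\/p>[\s]+<\/div>/';

$price = null;
if (preg_match($regex, $html, $m)) {
    // $m[1] holds "£33.11"; drop every byte that is not a digit or a dot.
    $price = preg_replace('/[^0-9.]/', '', $m[1]);
}

echo $price, "\n"; // prints 33.11
```

The pattern needs at least one whitespace character between each pair of tags, so it would stop matching if the page were ever minified.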

Also, you're probably better off asking all programming questions on Stack Overflow these days.
