Anyone good with Regex? Know a good tutorial?

Status
Not open for further replies.

Enigmabomb

New member
Feb 26, 2007
Than Franthithco
I'm trying to parse something that looks like this:

<tr>
<td><b><a href="/foo/Apple" title="Apple">Apples</a></b></td>
<td>A lot of apples</td>
<td>
I want to get the info after /foo/.

I'm currently using this, but it's not working.

"^<a href=\"/foo(.*)\""

Thanks for your help!

Josh
 


Well, I found a regex parser, and it's badass. My pattern now works in it, but not in PHP?

\"/foo/(.*?)\"

That's the winner; it returns Apple in the above engine. However,

preg_match_all("\"/foo/(.*?)\"/",$html,$imgarray);

Puts /foo/

I suspect I'm missing something.
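
One thing worth checking here: PHP's preg_* functions expect the whole pattern to be wrapped in a pair of delimiters, which most online testers add for you. The sketch below is only an illustration, not necessarily the fix the poster ended up with. It assumes `#` as the delimiter (my choice, so the slashes in /foo/ need no escaping) and reuses the sample HTML from the first post.

```php
<?php
// Sample input copied from the first post.
$html = '<tr>
<td><b><a href="/foo/Apple" title="Apple">Apples</a></b></td>
<td>A lot of apples</td>
<td>';

// '#' opens and closes the pattern; the double quotes are now ordinary
// characters inside the regex rather than accidental delimiters.
$count = preg_match_all('#"/foo/(.*?)"#', $html, $imgarray);

// $imgarray[0] holds the full matches, $imgarray[1] the first capture group.
print_r($imgarray[1]);
```

With the default flags, preg_match_all groups results by capture group, so the captured names end up in $imgarray[1], not $imgarray[0].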
 
"^<a href=\"/foo(.*)\""
^ matches the beginning of the string (or of each line, in multiline mode).

Since <a href... isn't at the beginning of the string, or even of its own line, your original regex will not match it.
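
To see the effect of the anchor in isolation, here is a small sketch that runs the same pattern with and without the leading ^ against the one line from the first post. The pattern is a tidied version of the original (a slash after foo, a lazy quantifier, and `#` delimiters are my assumptions), so treat it as an illustration and not as the poster's exact code.

```php
<?php
// The line of interest from the first post.
$line = '<td><b><a href="/foo/Apple" title="Apple">Apples</a></b></td>';

// Anchored: fails, because the subject starts with <td>, not <a href=...
$anchored = preg_match('#^<a href="/foo/(.*?)"#', $line, $m1);

// Unanchored: the same pattern is free to match anywhere in the subject.
$unanchored = preg_match('#<a href="/foo/(.*?)"#', $line, $m2);

var_dump($anchored);   // int(0)
var_dump($unanchored); // int(1)
echo $m2[1], "\n";     // Apple
```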

I've been programming since I was 12 and using/mastering regular expressions has been one of the most painstaking things ever.
 
I agree with you wholeheartedly. I've spent HOURS on one fucking line of code because I want this thing to be sexy. Sure, I could do it with strpos and substr, but that would be like 50 lines, and god knows how many iterations.

What'd you use to learn this cryptic shit?
 
What'd you use to learn this cryptic shit?
Nothing. Without looking at the above answers, I could probably tell you how to do it in 3 languages. It's not hard or complicated once you know how to use what you are given.

In PHP...
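// The slashes in /foo/ are escaped because / is also the pattern delimiter.
preg_match("/<td><b><a href=\"\/foo\/(.*?)\" title=(.*)<\/td>/",$VAR,$matches);
// $matches[0] is the whole match; $matches[1] is the first capture group.
echo $matches[1];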

Jason
 
It's definitely a learned skill to develop good regexes. It's also good to 'future-proof' regexes against breaking, especially if you're using them in scrapers. This can be done with character classes and some foresight. Without seeing the entire block of HTML you're working with, I'd offer up:

/href=['"]\/foo\/([^'"]*)/


would match:

Code:
 <tr>
<td><b><a href="/foo/Apple" title="Apple">Apples</a></b></td>
<td>A lot of apples</td>
<td>
...or....

Code:
 <tr>
<td><b><a href='/foo/Apple' title='Apple'>Apples</a></b></td>
<td>A lot of apples</td>
<td>
...or....

Code:
 <tr>
<td><b><a title="Apple" href="/foo/Apple">Apples</a></b></td>
<td>A lot of apples</td>
<td>
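
As a quick check of the claim above, the following sketch runs that pattern against the anchor line from each of the three variants. The variable names ($pattern, $variants, $results) are mine and purely illustrative.

```php
<?php
// The pattern suggested above; \' is only escaped for the PHP string literal.
$pattern = '/href=[\'"]\/foo\/([^\'"]*)/';

// The <a> line from each of the three variants: double quotes,
// single quotes, and attributes in a different order.
$variants = [
    '<td><b><a href="/foo/Apple" title="Apple">Apples</a></b></td>',
    "<td><b><a href='/foo/Apple' title='Apple'>Apples</a></b></td>",
    '<td><b><a title="Apple" href="/foo/Apple">Apples</a></b></td>',
];

$results = [];
foreach ($variants as $html) {
    // preg_match returns 1 on a match; $m[1] is the text after /foo/.
    if (preg_match($pattern, $html, $m) === 1) {
        $results[] = $m[1];
    }
}

print_r($results);
```

The pattern ignores everything before href and stops at the next quote of either kind, which is why attribute order and quote style don't matter.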
 
I use a little desktop program called "The Regex Coach" at weitz.de/regex-coach/ which allows you to step through the matching process character by character, so you can find out exactly where it goes wrong (or right). Very useful.

I find regex useful, but sometimes mind-numbing. I guess if you use it all the time it's fine, but I use it once a month and seem to have to relearn it most times.

Crispin
 