I am scraping content from a site, and for the most part things are going okay. Except I have a problem with one portion of the scraped content: I want to remove part of it and keep the rest.
Here's the portion I am dealing with:
Original HTML:
<span class="prefix">DATE:</span> May 25, 2006 <span class="prefix">AUTHOR:</span> <a href="http://samplesite.com/authors/Jack" target="_blank">JACK</a> </p> <p><span class="prefix">JACK SAYS:</span> “blah blah blah etc.”</p></div>
What I want is to strip everything else out so that I end up with just "JACK SAYS: 'blah blah blah'" and none of the other junk.
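A minimal sketch in Python using the standard re module. It assumes every entry has the same shape as the snippet above: the part to keep is a span with class "prefix" whose text ends in "SAYS:", followed by the quote and a closing </p>. The function name extract_says is hypothetical.

```python
import re

# The entry from the question, as a single string of HTML.
snippet = (
    '<span class="prefix">DATE:</span> May 25, 2006 '
    '<span class="prefix">AUTHOR:</span> '
    '<a href="http://samplesite.com/authors/Jack" target="_blank">JACK</a> </p> '
    '<p><span class="prefix">JACK SAYS:</span> “blah blah blah etc.”</p></div>'
)

# Assumption: the part to keep is a <span class="prefix"> whose text ends in
# "SAYS:", followed by the quote, up to the next closing </p>.
# [^<]* keeps the label inside a single span, so DATE: and AUTHOR: never match.
SAYS_RE = re.compile(r'<span class="prefix">([^<]*SAYS:)</span>\s*(.*?)</p>', re.DOTALL)


def extract_says(html):
    """Return 'NAME SAYS: quote' for one entry, or None if no SAYS span is found."""
    m = SAYS_RE.search(html)
    if m is None:
        return None
    label, quote = m.group(1).strip(), m.group(2).strip()
    return f"{label} {quote}"


print(extract_says(snippet))  # JACK SAYS: “blah blah blah etc.”
```

Because the pattern never looks at the DATE or AUTHOR spans, the varying dates and authors do not matter. It is brittle, though: any markup inside the quote itself would be kept as-is.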
Part of the difficulty is that I have about 100 entries, each with a different date, different AUTHOR info, etc., and I want to do the same thing to every entry.
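For many entries, one alternative to a regex is Python's standard html.parser module, which tolerates the stray closing tags in these fragments. This sketch assumes each entry is a separate HTML string and that the paragraph to keep is any <p> whose text contains "SAYS:". SaysExtractor and extract_all are hypothetical names, and the second entry below is made-up data with the same structure.

```python
from html.parser import HTMLParser


class SaysExtractor(HTMLParser):
    """Collects the text of every <p> whose text contains 'SAYS:'."""

    def __init__(self):
        super().__init__()
        self.results = []   # finished "NAME SAYS: quote" strings
        self._in_p = False  # True while inside a <p> opened in this input
        self._buf = []      # text pieces seen inside the current <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # A stray </p> with no matching <p> (as in the fragment) is ignored.
        if tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # collapse runs of whitespace
            if "SAYS:" in text:
                self.results.append(text)
            self._in_p = False


def extract_all(entries):
    """Run a fresh parser over each entry's HTML and gather all SAYS paragraphs."""
    out = []
    for html in entries:
        parser = SaysExtractor()
        parser.feed(html)
        parser.close()
        out.extend(parser.results)
    return out


entries = [
    '<span class="prefix">DATE:</span> May 25, 2006 <span class="prefix">AUTHOR:</span> '
    '<a href="http://samplesite.com/authors/Jack" target="_blank">JACK</a> </p> '
    '<p><span class="prefix">JACK SAYS:</span> “blah blah blah etc.”</p></div>',
    # Hypothetical second entry: different date and author, same structure.
    '<span class="prefix">DATE:</span> June 1, 2006 <span class="prefix">AUTHOR:</span> '
    '<a href="http://samplesite.com/authors/Jill" target="_blank">JILL</a> </p> '
    '<p><span class="prefix">JILL SAYS:</span> “something else”</p></div>',
]

for line in extract_all(entries):
    print(line)
```

Using a fresh parser per entry keeps leftover state from one malformed fragment from leaking into the next.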
(I would post what I've already tried, but none of it has worked.) If you can solve this, it will be much appreciated.