Need a way to chop keywords

Status
Not open for further replies.

ummdav1d

New member
Dec 15, 2007
Hey, I run into this problem a lot. Say I have a lengthy list of keywords like this:

<asdjoashdjkasfasdasfa>Keyword I want
<akdjasljkhgkjhnaskdlasasdasks>Keyword I want again
<askodjljaslfnaldaslk>Another keyword I want

etc.

Usually I just use a text editor's search and replace option to get rid of all the extra stuff. In cases like these, though, I want a way to define custom patterns to look for and remove. Basically, look for an opening '<', have a wildcard for both the content and the number of characters, and then look for the closing '>'.

I imagine there's something out there. This is just one example; there have been times where I've sat here for 45 minutes cleaning lists up manually.

I guess I could program a script if I really had to, but I'm rather rusty and was hoping someone knew of an existing option. Thanks
 


Can you be a little more specific about what your typical raw keyword data looks like, and what it looks like after you clean it up?
 
It sounds like a regular expression will get you there.

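You could search for a pattern like <.*> and replace it with nothing.

If that regex ends up consuming most of your document, the .* is being greedy: it runs to the last '>' it can find instead of the first. A non-greedy form such as <.*?> or <[^>]*> stops at the first closing '>'. Google "regular expression greedy" for the details.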
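
As a rough sketch of the greedy issue, here is how it plays out in Python's re module (the sample strings and variable names are made up for illustration; any regex-capable editor should behave much the same way):

```python
import re

# Sample lines in the same shape as the original post.
lines = [
    "<asdjoashdjkasfasdasfa>Keyword I want",
    "<akdjasljkhgkjhnaskdlasasdasks>Keyword I want again",
]

# '<', then any run of characters that are not '>', then '>'.
TAG = re.compile(r"<[^>]*>")
cleaned = [TAG.sub("", line).strip() for line in lines]

# Greedy vs. non-greedy on a line that contains two bracketed chunks.
two_chunks = "<abc>first<def>second"
greedy = re.sub(r"<.*>", "", two_chunks)   # .* runs to the LAST '>'
lazy = re.sub(r"<.*?>", "", two_chunks)    # .*? stops at the FIRST '>'

print(cleaned)  # ['Keyword I want', 'Keyword I want again']
print(greedy)   # second
print(lazy)     # firstsecond
```

With only one bracketed chunk per line, as in the list above, the greedy and non-greedy patterns give the same result; the difference only shows up when a line has more than one '>'.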

You can control the number of characters a wildcard matches using {n,m} in place of the *.

So something like .{2,6} will match any character sequence with a length of 2 to 6 characters. E.g. <.{2,6}>
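
A quick way to check the {n,m} behaviour, again sketched in Python with made-up sample strings:

```python
import re

# '<', then any 2 to 6 characters, then '>'.
SHORT_TAG = re.compile(r"<.{2,6}>")

samples = [
    "<ab>keep",            # 2 characters inside: removed
    "<abcdef>keep",        # 6 characters inside: removed
    "<a>too short",        # 1 character inside: left alone
    "<abcdefgh>too long",  # 8 characters inside: left alone
]
result = [SHORT_TAG.sub("", s) for s in samples]

print(result)
# ['keep', 'keep', '<a>too short', '<abcdefgh>too long']
```

Note that '.' also matches '>', so if a line can contain several '>' characters, [^>]{2,6} is probably the safer thing to put between the brackets.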

There are a couple of text editors that I know of that will replace text using regular expressions. jEdit is free and well regarded.

I hope that helps.
 
Here's one way (may not be the best way, but I use this occasionally to get rid of excess data). Save the list as a .txt file, then open the file in Excel. Excel will bring up an import dialog box. Choose the "Delimited" option under "Original Data Type" and hit Next. On the next screen, uncheck "Tab", check "Other", type > in the "Other" field, and hit Finish. That'll break the data up into 2 columns, the first being the stuff inside the brackets and the second being the keywords. You can copy the second column to wherever you want.
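
If you'd rather skip the spreadsheet, the same "split on >" idea can be sketched in a few lines of Python. This assumes each line has the junk first, then a '>', then the keyword; the sample list is illustrative only:

```python
lines = [
    "<asdjoashdjkasfasdasfa>Keyword I want",
    "<akdjasljkhgkjhnaskdlasasdasks>Keyword I want again",
    "Keyword with no junk in front",
]

keywords = []
for line in lines:
    # partition() splits at the first '>' only, like a two-column import.
    before, sep, after = line.partition(">")
    # If there was no '>' on the line, keep the line as it is.
    keywords.append(after.strip() if sep else line.strip())

print(keywords)
# ['Keyword I want', 'Keyword I want again', 'Keyword with no junk in front']
```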
 
Excellent guys, thanks a lot. I downloaded jEdit and it seems to have some great features that I can definitely see myself using in the future. I used the Excel method to manipulate this particular list. Thanks again guys!
 
If you're just getting rid of HTML tags, the PHP strip_tags function would work fine. PM me if you need me to fire up a quick script for you.
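
For anyone not on PHP, a similar effect can be roughly approximated with Python's standard html.parser module. This is only a sketch, not a port of strip_tags: the class and function names below are made up for illustration, and it simply keeps the text between tags.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_sketch(text):
    # Hypothetical helper name; feeds the text through the parser
    # and joins whatever text it saw outside of tags.
    parser = TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(strip_tags_sketch("<p>cheap <i>red</i> shoes</p>"))  # cheap red shoes
```

It only helps when the junk really looks like HTML tags; for arbitrary text between '<' and '>' a regex is the more predictable tool.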

- Q.
 