Anybody up for some regexp help?

nis · Jun 14, 2007

In that noble quest for non-duplicate content, I thought of a nice little trick.

If I write a text about, oh, let's say bunnies, it would be nice to upload it to several places and have it different in each place (of course with links to my sites about bunnies and stuffed with keywords (the text, of course, not the bunnies (Ooh! Nested parentheses in a sentence, who's still with me?))).

So what if I wrote a small text like this: "Bunnies are %small|furry|delicious in a stew|dangerous|%." and had a quick regexp that searches for a block of pipe-separated keywords between % signs and replaces the whole block with just one of the keywords/phrases inside it?

Running that example through the regexp could give:
"Bunnies are furry." or
"Bunnies are delicious in a stew."

It can probably be done with several commands, but hell, regular expressions are so powerful that they must be able to do this too.

Can it be done?
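For what it's worth: a regexp alone can find the blocks, but picking one option at random needs a little code. A minimal sketch for current PHP (7.4+), assuming blocks are never nested and the text contains no other % signs, hands each match to a callback via preg_replace_callback(). The function name spin() is made up for this example.

```php
<?php
// Replace every %option|option|...% block with one randomly chosen option.
// Sketch only: assumes blocks are never nested, the text contains no other
// '%' signs, and every block has at least one non-empty option.
function spin(string $text): string
{
    return preg_replace_callback(
        '/%([^%]+)%/', // a block: '%', then anything except '%', then '%'
        function (array $m): string {
            // Split the block body on '|' and drop the empty entries
            // left by a leading or trailing '|'.
            $options = array_values(array_filter(
                explode('|', $m[1]),
                fn (string $o): bool => $o !== ''
            ));
            return $options[mt_rand(0, count($options) - 1)];
        },
        $text
    );
}

echo spin('Bunnies are %small|furry|delicious in a stew|dangerous|%.'), "\n";
```

The callback runs once per match, so two identical blocks in the same text get independent picks.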

BlueMendicant · Jun 14, 2007

PHP:

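//$page is the page of text
$matches = array();
// Non-greedy (.*?) so each %...% block is captured separately
preg_match_all('/%(.*?)%/', $page, $matches);
// explode() splits on a literal "|"; array_filter() drops the empty
// entry left by a trailing "|"
$options = array_values(array_filter(explode('|', $matches[1][0]), 'strlen'));
// rand() includes both bounds, so the top index is count - 1
echo $options[rand(0, count($options) - 1)];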

Didn't test it. It should randomly echo one of the options from the first %...% block; you can figure out the rest. I apologize if there are any bugs.

nis · Jun 14, 2007

You beat me to it.

I was doing something along the same lines to pass the time.
Here is what I came up with:

Code:

<?php
	$text = 'Bunnies are %small|furry|delicious in a stew|dangerous%, and taste like %shit|chicken|gumdrops%.';

	echo uniqify($text);
	
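	function uniqify($text)
	{
		// Find every %...% block (non-greedy, so neighbouring blocks don't merge)
		preg_match_all('/\%.+?\%/', $text, $matches);
		foreach ($matches[0] as $m) {
			// Strip the % markers and split the block into its options
			$m = str_replace('%', '', $m);
			$ms = explode('|', $m);
			// Swap the block for one random option (identical blocks all get the same pick)
			$text = str_replace('%'.$m.'%', $ms[rand(0, count($ms)-1)], $text);
		}
		return $text;
	}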
?>

Here are a couple of samples:

Bunnies are dangerous, and taste like chicken.
Bunnies are furry, and taste like chicken.
Bunnies are delicious in a stew, and taste like chicken.
Bunnies are delicious in a stew, and taste like shit.
Bunnies are small, and taste like shit.
Bunnies are furry, and taste like shit.
Bunnies are dangerous, and taste like gumdrops.
Bunnies are furry, and taste like chicken.
Bunnies are small, and taste like gumdrops.
Bunnies are furry, and taste like shit.
Bunnies are furry, and taste like chicken.
Bunnies are delicious in a stew, and taste like chicken.
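Since each block is picked independently, this template has 4 × 3 = 12 possible sentences, and the twelve random samples above already contain repeats ("furry, and taste like chicken" shows up three times). If every copy has to differ, one option is to list all variants up front and hand them out in turn. A small recursive sketch, with a hypothetical function name allVariants() and the same assumptions (no nesting, no stray % signs):

```php
<?php
// Expand a %a|b|c% template into every possible variant (a Cartesian product).
// Sketch only: assumes blocks are not nested and the text has no other '%' signs.
function allVariants(string $text): array
{
    // Find the first remaining block; if there is none, the text is finished.
    if (!preg_match('/%([^%]+)%/', $text, $m, PREG_OFFSET_CAPTURE)) {
        return [$text];
    }
    [$block, $offset] = $m[0];

    // Options of this block, ignoring empty entries from a trailing '|'.
    $options = array_filter(explode('|', $m[1][0]), fn (string $o): bool => $o !== '');

    $variants = [];
    foreach ($options as $option) {
        // Put this option in place of the block, then expand the remaining blocks.
        $next = substr_replace($text, $option, $offset, strlen($block));
        foreach (allVariants($next) as $variant) {
            $variants[] = $variant;
        }
    }
    return $variants;
}

foreach (allVariants('%Hi|Hello%, %there|world%!') as $v) {
    echo $v, "\n";
}
// Hi, there!
// Hi, world!
// Hello, there!
// Hello, world!
```

The count multiplies with every block, so long templates grow very quickly.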

Tomorrow, I will do a Ruby version.

BlueMendicant · Jun 14, 2007

It'd be interesting to hook a similar script up to a thesaurus; then you'd have a nice blackhat script to rewrite content automatically. Run Wikipedia through it and you have an entire BH site. It'll be more readable than plain Markov text, too.

aim · Jun 14, 2007

I used to play this game about 10 years ago. I think it was called Mad Libs ;-)

The problem here is that there is still going to be a lot of manual work to associate the main keyword (bunnies) with their adjectives... but I guess that is where the thesaurus idea could come in.

krazyjosh5 · Jun 14, 2007

RegexBuddy really is your buddy.

nis · Jun 15, 2007

BlueMendicant said:
It'd be interesting to hook a similar script up to a thesaurus; then you'd have a nice blackhat script to rewrite content automatically. Run Wikipedia through it and you have an entire BH site. It'll be more readable than plain Markov text, too.

Well... how would the script find out which words to look up in the thesaurus? If you are going to mark them manually anyway, it's not much more work to pick a few replacement words yourself. Plus, the end result will be much more readable.

I did a search for "net" in a thesaurus. Here is what came up:

cyberspace, earnings, internet, lucre, mesh, meshing, meshwork, net income, net profit, network, profit, profits
clear, nett, sack, sack up, web
final, last, nett

All of them are synonyms for "net", but only some are valid in any given context:

$text = 'The %cyberspace|earnings|internet|lucre|mesh|meshing|meshwork|net income|net profit|network|profit|profits|clear|nett|sack|sack up|web|final|last|nett%, is the best place to be online.'; =>

The sack up, is the best place to be online.
The network, is the best place to be online.
The internet, is the best place to be online.
The net profit, is the best place to be online.
The sack, is the best place to be online.
The web, is the best place to be online.
The meshwork, is the best place to be online.
The clear, is the best place to be online.
The profit, is the best place to be online.
The cyberspace, is the best place to be online.
The lucre, is the best place to be online.
The cyberspace, is the best place to be online.
The internet, is the best place to be online.
The net income, is the best place to be online.
The profit, is the best place to be online.
The net profit, is the best place to be online.
The web, is the best place to be online.
The sack up, is the best place to be online.
The net profit, is the best place to be online.

BlueMendicant · Jun 15, 2007

If it's a BH/MFA site, you don't really care how much sense the content makes as long as it ranks for some really long-tail keyword. If you were trying to build a quality content site, you wouldn't be doing this in the first place. Of course, if you want something better than complete garbage, the thesaurus is not the way to go.

web2jerk · Jun 17, 2007

i am also interested.This is a very hard question for me,but i am very interested who is the best solution!who know post here!I wait a great post!

nis · Jun 17, 2007

web2jerk said:
i am also interested.This is a very hard question for me,but i am very interested who is the best solution!who know post here!I wait a great post!

Well... so far nobody has come up with a pure regexp solution. But I made a usable PHP function, as you can see above.
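One likely reason no pure regexp answer turned up: a regexp replacement is deterministic, so it can rewrite every block but cannot choose at random. For instance, a single preg_replace() with a backreference always keeps the first option. A sketch, with a made-up function name firstOption() and assuming the text has no % signs outside the blocks:

```php
<?php
// A plain regexp replacement is deterministic: this one rewrites every
// %a|b|c% block, but always keeps the first option.
// Sketch only: assumes the text has no '%' signs outside the blocks.
function firstOption(string $text): string
{
    // %         opening marker
    // ([^|%]*)  first option, captured as $1
    // [^%]*     the rest of the block ("|b|c|"), discarded
    // %         closing marker
    return preg_replace('/%([^|%]*)[^%]*%/', '$1', $text);
}

echo firstOption('Bunnies are %small|furry|dangerous|%.'), "\n"; // Bunnies are small.
```

The random choice is exactly the part that needs PHP code, which is why every working version in this thread calls rand().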

web2jerk · Jun 17, 2007

no,i think that nobady have a good regexo solution...well,who have a good idea please post here!

ashbeats · Jun 18, 2007

Another function that accomplishes the same thing.

Code:

<?php

$subject = "Bunnies are %small|furry|delicious in a stew|dangerous|% and they like to %fly above|run under|play with|kill|% giraffes";

echo replacer($subject);

function replacer($subject)
{
    // Find each %...% block (word characters, spaces and pipes only)
    if (preg_match_all('/%[\\w \\|]*?%/', $subject, $regs)) {
        foreach ($regs[0] as $m) {
            // Capture each option: the text after a "%" or "|" delimiter
            if (preg_match_all('/(?=%|\\|)([%\\|])([^|\\t\\r\\n]*)(?:\\|%)?/', $m, $opts)) {
                $subject = str_replace($m, $opts[2][rand(0, count($opts[2]) - 1)], $subject);
            }
        }
    }
    // Return outside the if, so text without any blocks is passed through unchanged
    return $subject;
}
?>

nis · Jun 18, 2007

Almost the same thing. With your version you have to put a | after the last word in the list.

%small|furry|% Will work.
%small|furry% Will not work.

I know I wrote it that way in the original post, as I thought it would be easier when using only a regexp. But I took it out in the function I made. Not having to put a trailing | after the last word makes it more "natural", so it's easier to write for us mere humans, which I think is better for the purposes I am going to use it for.

Yours might be easier to use in a fully automated environment.

ashbeats · Jun 18, 2007

Code:

<?php

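$subject = "Bunnies are %small|furry|delicious in a stew|dangerous% and they like to %fly above|run under|play with|kill|% giraffes";

echo replacer($subject);

function replacer($subject)
{
    // Find each %...% block (word characters, spaces and pipes only)
    if (preg_match_all('/%[\\w \\|]*?%/', $subject, $regs)) {
        foreach ($regs[0] as $m) {
            // Capture each option; "%" is excluded from the option text and the
            // closing "%" or "|%" is optional, so a trailing "|" is no longer required
            if (preg_match_all('/(?=%|\\|)([%\\|])([^|\\t\\r\\n%]*)(?:\\|?%)?/', $m, $opts)) {
                $subject = str_replace($m, $opts[2][rand(0, count($opts[2]) - 1)], $subject);
            }
        }
    }
    // Return outside the if, so text without any blocks is passed through unchanged
    return $subject;
}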
?>
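The inner pattern above is dense. PCRE's x (extended) modifier ignores whitespace outside character classes and treats # as the start of a comment, so the same pattern can be spread out and annotated. A sketch, intended to match exactly what the one-line version matches:

```php
<?php
// ashbeats' second inner pattern, spread out with the x (extended) modifier:
// whitespace outside character classes is ignored and '#' starts a comment.
$pattern = '/
    (?=%|\|)          # look ahead: we must be standing on a "%" or a "|"
    ([%|])            # group 1: consume that delimiter
    ([^|\t\r\n%]*)    # group 2: the option text, up to the next delimiter
    (?:\|?%)?         # optionally swallow a closing "%" or "|%"
/x';

preg_match_all($pattern, '%small|furry|delicious in a stew|dangerous%', $m);
// $m[2] now holds: small, furry, delicious in a stew, dangerous
print_r($m[2]);
```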

DomainRealty · Jun 18, 2007

web2jerk said:
no,i think that nobady have a good regexo solution...well,who have a good idea please post here!

Here's my solution:

1. Go to fucking school and learn English.
2. Go learn some fucking PHP and PERL
3. Read the ENTIRE forum.
4. Rinse, lather and repeat.

dealasite · Jun 18, 2007

DomainRealty said:
Here's my solution:

1. Go to fucking school and learn English.
2. Go learn some fucking PHP and PERL
3. Read the ENTIRE forum.
4. Rinse, lather and repeat.

lol

nis · Jun 18, 2007

DomainRealty said:
Here's my solution:

1. Go to fucking school and learn English.
2. Go learn some fucking PHP and PERL
3. Read the ENTIRE forum.
4. Rinse, lather and repeat.

I thought about saying something like that, but then I figured that someone else might have more fun doing it than me.

Oh, and Ashbeats, it seems like you have way better regexp-fu than I do. I take one look at that second preg_match_all pattern and I get a headache.

ashbeats · Jun 18, 2007

Regex buddy is a good friend.
