Anybody up for some regexp help?

Status
Not open for further replies.

nis

New member
Mar 4, 2007
In the noble quest for non-duplicate content, I thought of a nice little trick.

Say I write a text about, oh, let's say bunnies. It would be nice to upload it to several places and have it read differently at each one (of course with links to my sites about bunnies, and stuffed with keywords (the text, of course, not the bunnies (ooh! Nested parentheses in a sentence, who's still with me? ;) )))

So what if I wrote a small text like this: "Bunnies are %small|furry|delicious in a stew|dangerous|%." and had a quick regexp search for each %...% block of pipe-separated keywords and replace the whole block with just one of the keywords/phrases inside it?

Running that example text through the regexp could give:
"Bunnies are furry." or
"Bunnies are delicious in a stew."

It can probably be done with several commands, but, hell, regular expressions are so powerful they must be able to do this too :)

Can it be done?
 


PHP:
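// $page is the page of text
if (preg_match_all('/%(.*?)%/', $page, $matches)) {            // lazy .*? gives one match per %...% block
    $options = explode('|', $matches[1][0]);                    // options inside the first block
    $options = array_values(array_filter($options, 'strlen'));  // drop the empty entry a trailing | leaves
    echo $options[mt_rand(0, count($options) - 1)];             // bounds are inclusive, hence the -1
}
Didn't test it; it should randomly echo one of the options in the first block. You can figure out the rest... I apologize if there are any bugs.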
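The catch with a pattern like /%(.*)%/ is that .* is greedy: on a line with two blocks it runs from the first % to the last one and swallows everything in between. A small demonstration using the bunny text (the sample string is just for illustration):

```php
<?php
// Greedy vs. lazy: with two %...% blocks on one line, a greedy .* runs
// from the first % to the last one; the lazy .*? stops at the nearest %.
$text = 'Bunnies are %small|furry%, and taste like %chicken|gumdrops%.';

preg_match_all('/%(.*)%/', $text, $greedy);
preg_match_all('/%(.*?)%/', $text, $lazy);

print_r($greedy[1]); // one element: "small|furry%, and taste like %chicken|gumdrops"
print_r($lazy[1]);   // two elements: "small|furry" and "chicken|gumdrops"
```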
 
You beat me :)
I was doing something along the same lines to pass the time.
Here is what I came up with:

Code:
<?php
	$text = 'Bunnies are %small|furry|delicious in a stew|dangerous%, and taste like %shit|chicken|gumdrops%.';

	echo uniqify($text);
	
	function uniqify($text)
	{
		preg_match_all('/\%.+?\%/', $text, $matches);
		foreach ($matches[0] as $m) {
			$m = str_replace('%', '', $m);
			$ms = explode('|', $m);
			$text = str_replace('%'.$m.'%', $ms[rand(0, count($ms)-1)], $text);
		}
		return $text;
	}
?>

Here are a couple of samples:

Bunnies are dangerous, and taste like chicken.
Bunnies are furry, and taste like chicken.
Bunnies are delicious in a stew, and taste like chicken.
Bunnies are delicious in a stew, and taste like shit.
Bunnies are small, and taste like shit.
Bunnies are furry, and taste like shit.
Bunnies are dangerous, and taste like gumdrops.
Bunnies are furry, and taste like chicken.
Bunnies are small, and taste like gumdrops.
Bunnies are furry, and taste like shit.
Bunnies are furry, and taste like chicken.
Bunnies are delicious in a stew, and taste like chicken.

Tomorrow, I will do a Ruby version.
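Side note on "different each place": uniqify() above picks new options on every call, so a given site's page would change on every load. If you want each site to get its own stable version, one way is to seed PHP's generator from a site identifier. A rough sketch, assuming PHP 7.1 or later, where a fixed mt_srand() seed gives a repeatable mt_rand() sequence on a given PHP version; uniqifyFor() and the site keys are made-up names:

```php
<?php
// Hypothetical helper: same template + same site key => same variant.
// Assumes blocks never nest and options never contain '%'.
function uniqifyFor(string $text, string $siteKey): string
{
    mt_srand(crc32($siteKey)); // seed from the site key, so mt_rand() repeats per site
    return preg_replace_callback('/%([^%]*)%/', function (array $m): string {
        // Split on '|' and drop empty options (e.g. from a trailing '|')
        $options = array_values(array_filter(explode('|', $m[1]), 'strlen'));
        return $options ? $options[mt_rand(0, count($options) - 1)] : '';
    }, $text);
}

$text = 'Bunnies are %small|furry|delicious in a stew|dangerous%, and taste like %shit|chicken|gumdrops%.';
echo uniqifyFor($text, 'bunny-site-one.example'), "\n"; // hypothetical site key
echo uniqifyFor($text, 'bunny-site-two.example'), "\n";
```

Keep in mind that mt_srand() reseeds the global generator, so other mt_rand() calls in the same request become predictable too; calling mt_srand() with no argument afterwards reseeds it randomly.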
 
It'd be interesting to hook a similar script up to a thesaurus; then you'd have a nice blackhat script to rewrite content automatically. Run Wikipedia through it and you have an entire BH site. It'll be more readable than plain Markov text, too.
 
I used to play this game about 10 years ago. I think it was called Mad Libs ;-)

The problem here is that there is still going to be a lot of manual work to associate the main keyword (bunnies) with its adjectives... but I guess that is where the thesaurus idea could come in.
 

Well... how would the script figure out which words to look up in the thesaurus? If you are going to mark them manually anyway, it's not much more work to find a few replacement words yourself. Plus, the end result will be much more readable.
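If you do pick the replacement words by hand, a script can at least do the tedious part of wrapping every occurrence in %...% markup. A sketch, assuming a hand-made map from each word to alternatives that fit your context (markup() and the map are hypothetical); whole-word matching keeps "net" from being wrapped inside "network":

```php
<?php
// Hypothetical helper: wrap each mapped word in %word|alt1|alt2% markup.
// The synonym map is assumed to be hand-picked for the context.
function markup(string $text, array $synonyms): string
{
    if ($synonyms === []) {
        return $text;
    }
    $keys = array_map('strval', array_keys($synonyms));
    // Longest words first, so "net income" wins over "net" in the alternation
    usort($keys, fn($a, $b) => strlen($b) <=> strlen($a));
    $pattern = '/\b(' . implode('|', array_map(fn($w) => preg_quote($w, '/'), $keys)) . ')\b/';

    // One pass over the text, so words inside an inserted block are never wrapped again
    return preg_replace_callback($pattern, function (array $m) use ($synonyms): string {
        return '%' . implode('|', array_merge([$m[1]], $synonyms[$m[1]])) . '%';
    }, $text);
}

echo markup('The net is the best place to be online.', ['net' => ['internet', 'web']]), "\n";
// The %net|internet|web% is the best place to be online.
```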

I did a search for "net" in a thesaurus. Here is what came up:

cyberspace, earnings, internet, lucre, mesh, meshing, meshwork, net income, net profit, network, profit, profits
clear, nett, sack, sack up, web
final, last, nett

All of them are synonyms for "net", but only some are valid in any given context:

$text = 'The %cyberspace|earnings|internet|lucre|mesh|meshing|meshwork|net income|net profit|network|profit|profits|clear|nett|sack|sack up|web|final|last|nett%, is the best place to be online.'; =>

The sack up, is the best place to be online.
The network, is the best place to be online.
The internet, is the best place to be online.
The net profit, is the best place to be online.
The sack, is the best place to be online.
The web, is the best place to be online.
The meshwork, is the best place to be online.
The clear, is the best place to be online.
The profit, is the best place to be online.
The cyberspace, is the best place to be online.
The lucre, is the best place to be online.
The cyberspace, is the best place to be online.
The internet, is the best place to be online.
The net income, is the best place to be online.
The profit, is the best place to be online.
The net profit, is the best place to be online.
The web, is the best place to be online.
The sack up, is the best place to be online.
The net profit, is the best place to be online.
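For a sense of scale: the number of distinct outputs is the product of the number of distinct options in each block, so 4 × 3 = 12 for the bunny sentence and 19 for this one ("nett" is listed twice). A quick sketch to compute it; countVariants() is a made-up name, and it assumes blocks never nest:

```php
<?php
// Hypothetical helper: count how many distinct outputs a %a|b|c% template can produce.
// Empty options (trailing '|') are ignored and duplicates are counted once.
function countVariants(string $text): int
{
    preg_match_all('/%([^%]*)%/', $text, $blocks);
    $total = 1;
    foreach ($blocks[1] as $block) {
        $options = array_unique(array_filter(explode('|', $block), 'strlen'));
        $total *= max(1, count($options)); // an empty block still yields one output
    }
    return $total;
}

echo countVariants('Bunnies are %small|furry|delicious in a stew|dangerous%, and taste like %shit|chicken|gumdrops%.'), "\n"; // 12
```

Only a handful of those 19 "net" variants actually read well, which is the point made above.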
 
If it's a BH/MFA site, you don't really care how much sense the content makes as long as it ranks for some really long-tail keyword. If you were trying to build a quality content site, you wouldn't be doing this in the first place. Of course, if you want something better than complete garbage, the thesaurus is not the way to go.
 
I am also interested. This is a very hard question for me, but I am very interested in the best solution! If you know, post here! I'm waiting for a great post!
 

Well... so far nobody has come up with a pure regexp solution, but I made a usable PHP function, as you can see above.
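Strictly speaking, a regexp alone can't pick an option at random; it only matches and substitutes. The closest thing in PHP is preg_replace_callback(): one pattern finds every block and a small callback chooses what replaces it. A minimal sketch, assuming blocks never nest and options never contain %; spin() is a made-up name. Empty options are dropped, so the block works with or without a trailing |:

```php
<?php
// Hypothetical helper: one pattern finds each %...% block, the callback picks an option.
// Assumes blocks never nest and options never contain '%'.
function spin(string $text): string
{
    return preg_replace_callback('/%([^%]*)%/', function (array $m): string {
        // Split on '|' and drop empty options, so a trailing '|' is harmless
        $options = array_values(array_filter(explode('|', $m[1]), 'strlen'));
        return $options ? $options[mt_rand(0, count($options) - 1)] : '';
    }, $text);
}

echo spin('Bunnies are %small|furry|delicious in a stew|dangerous|%.'), "\n";
echo spin('Bunnies are %small|furry%, and taste like %chicken|gumdrops%.'), "\n";
```

Unlike a str_replace() over the whole text, each block is handled separately, so two identical blocks can still come out different.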
 
Another function that accomplishes the same thing. :)

Code:
<?php

$subject = "Bunnies are %small|furry|delicious in a stew|dangerous|% and they like to %fly above|run under|play with|kill|% giraffes";

echo replacer($subject);

function replacer($subject)
{
    // Find every %...% block made of word characters, spaces and pipes
    if (preg_match_all('/%[\\w \\|]*?%/', $subject, $regs)) {
        foreach ($regs[0] as $m) {
            // Group 2 holds each option (this version expects a trailing | before the closing %)
            if (preg_match_all('/(?=%|\\|)([%\\|])([^|\\t\\r\\n]*)(?:\\|%)?/', $m, $opts)) {
                $subject = str_replace($m, $opts[2][rand(0, count($opts[2]) - 1)], $subject);
            }
        }
    }
    return $subject; // return the text unchanged when there are no blocks
}
?>
 
Almost the same thing. With your version you have to put a | after the last word in the list.

%small|furry|% Will work.
%small|furry% Will not work.

I know I wrote it that way in the original post, as I thought it would be easier when using only a regexp, but I took it out in the function I made. Not having to put a trailing | after the last word makes it more "natural", so it's easier to write for us mere humans, which I think is better for what I'm going to use it for.
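The trailing | trips up the simple approach because explode() returns an empty string for it. Filtering out empty strings makes both forms behave the same. A sketch with a hypothetical options() helper:

```php
<?php
// Hypothetical helper: split a block's contents into options, dropping the
// empty strings explode() returns for a trailing, leading or doubled '|'.
// 'strlen' is used instead of the default filter so an option like "0" is kept.
function options(string $block): array
{
    return array_values(array_filter(explode('|', $block), 'strlen'));
}

print_r(explode('|', 'small|furry|')); // three elements; the last one is ""
print_r(options('small|furry|'));      // small, furry
print_r(options('small|furry'));       // small, furry
```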

Yours might be easier to use in a fully automated environment.
 
Hey nis,

Lol, that was my mistake. Fixed it now; it works with both:
- "%small|big|furry|%"
- "%small|big|furry%"

Code:
<?php

$subject = "Bunnies are %small|furry|delicious in a stew|dangerous% and they like to %fly above|run under|play with|kill|% giraffes";

echo replacer($subject);

function replacer($subject)
{
    // Find every %...% block made of word characters, spaces and pipes
    if (preg_match_all('/%[\\w \\|]*?%/', $subject, $regs)) {
        foreach ($regs[0] as $m) {
            // Group 2 holds each option; a closing % or |% is consumed, so both forms work
            if (preg_match_all('/(?=%|\\|)([%\\|])([^|\\t\\r\\n%]*)(?:\\|?%)?/', $m, $opts)) {
                $subject = str_replace($m, $opts[2][rand(0, count($opts[2]) - 1)], $subject);
            }
        }
    }
    return $subject; // return the text unchanged when there are no blocks
}
?>
 
Here's my solution:

1. Go to fucking school and learn English.
2. Go learn some fucking PHP and Perl
3. Read the ENTIRE forum.
4. Rinse, lather and repeat.

I thought about saying something like that, but then I figured that someone else might have more fun doing it than me :)

Oh, and Ashbeats, it seems like you have way better regexp-fu than I do. I take one look at that second preg_match_all pattern and I get a headache ;)
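For the headache: PCRE's x modifier ignores whitespace and # comments in a pattern, so the inner pattern from the fixed replacer() can be spread out and annotated piece by piece. It is the same pattern, just laid out differently; a sketch, not tested against every edge case:

```php
<?php
// The inner pattern from the fixed replacer(), written with the x modifier
// so each piece can carry a comment (whitespace and # comments are ignored).
$pattern = '/
    (?=%|\|)         # only start at a % or a |
    ([%|])           # group 1: the delimiter itself
    ([^|\t\r\n%]*)   # group 2: the option text, up to the next | or %
    (?:\|?%)?        # also consume a closing % or |% so it is not matched again
/x';

preg_match_all($pattern, '%small|furry|delicious in a stew|dangerous%', $m);
print_r($m[2]); // small, furry, delicious in a stew, dangerous
```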
 