this is entering the realm of NLP but if you want a fairly simple and effective approach...
establish max keyword length (i.e. num words).
normalize your text, i.e. convert it all to lower case and stem the words (turn 'cats' into 'cat' so they compare correctly).
remove stopwords (google it to get a list of common ones).
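A rough sketch of the normalize and stopword steps might look like the following. The function name normalizeText(), the tiny stopword list, and the "strip a trailing s" rule are all made up for illustration; the stripping rule is only a stand-in for a real stemmer (e.g. a Porter stemmer), which you'd want for anything serious.

```php
<?php
// Hypothetical helper: lower-case, tokenize, crudely stem, and drop stopwords.
function normalizeText(string $text, array $stopwords): array
{
    // Lower-case, then split on anything that is not a letter, digit or apostrophe.
    $tokens = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $words = [];
    foreach ($tokens as $token) {
        if (in_array($token, $stopwords, true)) {
            continue; // skip common words
        }
        // Placeholder "stemmer": only strips a trailing plural "s".
        // This is an assumption for the sketch, not a real stemming algorithm.
        if (strlen($token) > 3 && substr($token, -1) === 's' && substr($token, -2) !== 'ss') {
            $token = substr($token, 0, -1);
        }
        $words[] = $token;
    }
    return $words;
}

// Tiny illustrative stopword list; a real one would be much longer.
$stopwords = ['the', 'a', 'and', 'of', 'on', 'in', 'is', 'are'];
$words = normalizeText('The cats and the dogs are on the mat', $stopwords);
print_r($words);
```

With this sample sentence the result should be the three words cat, dog and mat.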
create a separate array for each combo length (1 word, 2 words, 3 words, etc.), use the actual word combo as the array key, and pass through the text keeping a count for every combo so you end up with a frequency count. Should be something pretty easy like
Code:
// you'll have to write getWordCombo() yourself, of course.
$wordCombo = getWordCombo($wordPosition);
// isset() avoids an "undefined array key" warning the first time a combo is seen
if (!isset($combos[$wordCombo])) {
    $combos[$wordCombo] = 1;
} else {
    $combos[$wordCombo] = $combos[$wordCombo] + 1;
}
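To show how that loop could fit together for every combo length, here is one possible sketch. This version of getWordCombo() takes the word list and the combo length as extra parameters, which is my assumption and not the signature used in the snippet above; countCombos() is also a made-up name.

```php
<?php
// Hypothetical helper: the combo of $length words starting at $position,
// or null if it would run past the end of the text.
function getWordCombo(array $words, int $position, int $length): ?string
{
    if ($position + $length > count($words)) {
        return null;
    }
    return implode(' ', array_slice($words, $position, $length));
}

// Returns $combos[$length][$combo] = count, one sub-array per combo length.
function countCombos(array $words, int $maxLength): array
{
    $combos = [];
    for ($length = 1; $length <= $maxLength; $length++) {
        $combos[$length] = [];
        for ($position = 0; $position < count($words); $position++) {
            $combo = getWordCombo($words, $position, $length);
            if ($combo === null) {
                break; // no more full-length combos from here on
            }
            // ?? 0 starts the count at zero the first time a combo is seen.
            $combos[$length][$combo] = ($combos[$length][$combo] ?? 0) + 1;
        }
    }
    return $combos;
}

// Sample input: already normalized, stopwords removed.
$words = ['black', 'cat', 'sat', 'black', 'cat', 'ran'];
$combos = countCombos($words, 3);
print_r($combos);
```

In this sample 'black cat' is counted twice and every 3 word combo once.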
average the frequency counts for each length of word combo (i.e. one average for the 1 word combos, one for the 2 word combos, and so on), then divide each individual combo's frequency by the average for its length to give you a relative frequency.
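As a sketch of the relative frequency step, assuming the counts are stored as $combos[$length][$combo] = count (that layout and the name relativeFrequencies() are my own choices, not something fixed by the steps above):

```php
<?php
// Divides each combo's count by the average count for combos of the same length.
function relativeFrequencies(array $combos): array
{
    $relative = [];
    foreach ($combos as $length => $counts) {
        $relative[$length] = [];
        if (count($counts) === 0) {
            continue; // nothing of this length, avoid dividing by zero
        }
        $average = array_sum($counts) / count($counts);
        foreach ($counts as $combo => $count) {
            $relative[$length][$combo] = $count / $average;
        }
    }
    return $relative;
}

// Sample counts, made up for illustration.
$combos = [
    1 => ['black' => 2, 'cat' => 2, 'sat' => 1, 'ran' => 1],
    2 => ['black cat' => 2, 'cat sat' => 1, 'sat black' => 1, 'cat ran' => 1],
];
$relative = relativeFrequencies($combos);
print_r($relative);
```

Here the 2 word combos average 1.25 occurrences, so 'black cat' gets 2 / 1.25 = 1.6 and the others get 0.8.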
last step, which is very useful for establishing the relative importance of the combination vs. the constituent words... for each word in each word combo, divide the number of times it appears in that combo by the total number of times it appears as a single word. Do that for each word in the combo, multiply the result by the relative frequency you calculated earlier, then multiply by the length of the word combo to cancel out the effect of the multiple divisions (this last step should make your 3 word combo scores comparable to your 2 word and 4 word scores, with a slight weighting in favor of longer word combos)...
sort by score...
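Here is one way to read the scoring and sorting steps, as a sketch only. I'm assuming "do that for each word and multiply" means multiplying the per-word ratios together, and that the data is laid out as $combos[$length][$combo] and $relative[$length][$combo]; scoreCombos() is a made-up name and the sample numbers are invented.

```php
<?php
// score = relative frequency * combo length * product of (combo count / single-word count)
function scoreCombos(array $combos, array $relative): array
{
    $scores = [];
    foreach ($combos as $length => $counts) {
        foreach ($counts as $combo => $count) {
            $score = $relative[$length][$combo] * $length;
            foreach (explode(' ', (string) $combo) as $word) {
                // How much of this word's total usage happens inside this combo.
                $score *= $count / $combos[1][$word];
            }
            $scores[$combo] = $score;
        }
    }
    arsort($scores); // highest score first, keys (the combos) preserved
    return $scores;
}

// Sample data, made up for illustration.
$combos = [
    1 => ['black' => 2, 'cat' => 2, 'sat' => 1, 'ran' => 1],
    2 => ['black cat' => 2, 'cat sat' => 1, 'sat black' => 1, 'cat ran' => 1],
];
$relative = [
    1 => ['black' => 2 / 1.5, 'cat' => 2 / 1.5, 'sat' => 1 / 1.5, 'ran' => 1 / 1.5],
    2 => ['black cat' => 1.6, 'cat sat' => 0.8, 'sat black' => 0.8, 'cat ran' => 0.8],
];
$scores = scoreCombos($combos, $relative);
print_r($scores);
```

With these numbers 'black cat' comes out on top, because both of its words only ever appear as part of that combo.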
this might be helpful in getting your head around the important concepts here:
LingPipe: Significant Phrases Tutorial
It sounds a lot trickier than it is... just break up your code into logical chunks and build them one at a time, making sure you've got the right output for each stage before you move on to the next...