For the purpose of identifying [possible] bot-generated usernames.
Suppose you have a username like "bilbomoothof" ... it may be nonsense, but it still contains pro
Off the top of my head, you could look for syllables, making use of soundex. That's the direction I would explore, based on the assumption that a pronounceable word has at least one syllable.
EDIT: Here's a function for counting syllables:
    function count_syllables($word) {
        // Patterns that each subtract one syllable when they match
        $subsyl = array(
            'cial',
            'tia',
            'cius',
            'cious',
            'giu',
            'ion',
            'iou',
            'sia$',
            '.ely$',
        );
        // Patterns that each add one syllable when they match
        $addsyl = array(
            'ia',
            'riet',
            'dien',
            'iu',
            'io',
            'ii',
            '[aeiouym]bl$',
            '[aeiou]{3}',
            '^mc',
            'ism$',
            '([^aeiouy])\1l$',
            '[^l]lien',
            '^coa[dglx].',
            '[^gq]ua[^auieo]',
            'dnt$',
        );
        // Based on Greg Fast's Perl module Lingua::EN::Syllables
        // Lowercase the word and strip everything that is not a letter
        $word = preg_replace('/[^a-z]/', '', strtolower($word));
        // Split on consonant runs so that each remaining part is a vowel group
        $word_parts = preg_split('/[^aeiouy]+/', $word);
        // Initialize explicitly: count() on an undefined variable fails
        // when the word contains no vowels at all
        $valid_word_parts = array();
        foreach ($word_parts as $value) {
            if ($value !== '') {
                $valid_word_parts[] = $value;
            }
        }
        $syllables = 0;
        // Thanks to Joe Kovar for correcting a bug in the following lines
        foreach ($subsyl as $syl) {
            $syllables -= preg_match('~' . $syl . '~', $word);
        }
        foreach ($addsyl as $syl) {
            $syllables += preg_match('~' . $syl . '~', $word);
        }
        if (strlen($word) == 1) {
            $syllables++;
        }
        $syllables += count($valid_word_parts);
        // Report at least one syllable, even for words without vowels
        $syllables = ($syllables == 0) ? 1 : $syllables;
        return $syllables;
    }
From this very interesting link:
http://www.addedbytes.com/php/flesch-kincaid-function/
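
To connect this back to the username question, here is a rough sketch of how a syllable count might be turned into a "looks pronounceable" check. Nothing in it comes from the linked article: `looks_pronounceable()`, `vowel_group_count()` and the cutoff of 4 letters per syllable are my own assumptions for illustration, and the cutoff in particular is a guess you would need to tune on real data. Note that `count_syllables()` never returns 0, so the "at least one syllable" idea is checked here by looking for a vowel directly.

```php
<?php
// Hypothetical stand-in counter so the sketch runs on its own:
// it simply counts runs of vowels. Swap in 'count_syllables' for real use.
function vowel_group_count(string $word): int
{
    return (int) preg_match_all('/[aeiouy]+/', $word);
}

// Hypothetical helper: treats a name as pronounceable if it contains at
// least one vowel and does not pack too many letters into each syllable.
// The default cutoff of 4.0 is an untested assumption, not a known constant.
function looks_pronounceable(
    string $username,
    callable $countSyllables,
    float $maxLettersPerSyllable = 4.0
): bool {
    // Keep letters only, so digits and punctuation do not skew the ratio
    $letters = preg_replace('/[^a-z]/', '', strtolower($username));

    // No vowels means no syllable, so call it unpronounceable
    if ($letters === '' || !preg_match('/[aeiouy]/', $letters)) {
        return false;
    }

    // Guard against a counter that returns zero or a negative number
    $syllables = max(1, (int) $countSyllables($letters));

    return strlen($letters) / $syllables <= $maxLettersPerSyllable;
}

var_dump(looks_pronounceable('bilbomoothof', 'vowel_group_count'));
var_dump(looks_pronounceable('xkcdqzrt', 'vowel_group_count'));
```

With the function above loaded, you would pass `'count_syllables'` as the second argument instead of the stand-in. Expect false positives either way: a real word like "strengths" has nine letters in one syllable and would be flagged, so treat the result as one signal among several rather than a verdict.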