Wikipedia lists a lot of emoticons people can use. I want to match that list against the words in a string. This is what I have now:
$string = "Lorem ipsum :-) do
I would start with the simplest implementation, using str_replace and those arrays with spaces. If the performance is unacceptable, try a single regular expression per emotion (one pattern for all happy faces, one for all sad ones). That compresses things quite a bit:
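$emoticons = array(
    '[HAPPY]' => '/ [:=]-?[\)\]] /',   // :) :-) :] =) =-) ...
    '[SAD]'   => '/ [:=]-?[\(\[\|] /'  // :( :-( :[ :| =( ...
);
// preg_replace needs delimited patterns; the patterns consume the surrounding spaces, so put them back around the tag.
foreach ($emoticons as $tag => $pattern) {
    $string = preg_replace($pattern, " $tag ", $string);
}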
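A runnable sketch of both suggestions, assuming a small hypothetical lookup table in place of the question's arrays (which are cut off above). The helper names tag_with_str_replace and tag_with_regex are made up for illustration. Note that preg_replace needs delimited patterns such as `/.../`, and in this sketch each replacement re-inserts the spaces that the patterns consume, so neighbouring words stay separated.

```php
<?php
// Hypothetical lookup table for the str_replace baseline: each emoticon is
// padded with spaces so it only matches as a separate word.
$map = array(
    ' :-) ' => ' [HAPPY] ',
    ' :) '  => ' [HAPPY] ',
    ' :-( ' => ' [SAD] ',
    ' :( '  => ' [SAD] ',
);

// One delimited regular expression per emotion; the replacement puts back
// the two spaces that the pattern consumed.
$patterns = array(
    '/ [:=]-?[\)\]] /'   => ' [HAPPY] ',
    '/ [:=]-?[\(\[\|] /' => ' [SAD] ',
);

// Plain substring replacement, applied key by key in array order.
function tag_with_str_replace(string $text, array $map): string
{
    return str_replace(array_keys($map), array_values($map), $text);
}

// preg_replace also accepts arrays and applies the patterns one after another.
function tag_with_regex(string $text, array $patterns): string
{
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}

$string = 'Lorem ipsum :-) dolor =( sit';
echo tag_with_str_replace($string, $map), "\n"; // Lorem ipsum [HAPPY] dolor =( sit
echo tag_with_regex($string, $patterns), "\n";  // Lorem ipsum [HAPPY] dolor [SAD] sit
```

Both variants only see emoticons with a space on each side, so one at the very start or end of the string is not tagged unless you pad the string first.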
If performance is still unacceptable, you can use something fancier, like a trie, i.e. a prefix tree built from the emoticons (compare the suffix tree: http://en.wikipedia.org/wiki/Suffix_tree ), which lets you scan the string only once for all emoticons. The concept is simple: the root of the tree is a space (since you want to match a space before the emoticon), the root's children are ':' and '=', the children of ':' are ']', ')', '-', and so on. A single loop scans the string character by character. When you find a space, you move to the root, then check whether the next character is one of the nodes at the next level (':' or '='); if so, you move down to that node, and so on. If at any point the current character is not a child of the current node, you go back to the root and wait for the next space. If you are on a node where an emoticon ends and the next character is a space, you have found a match.
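A minimal PHP sketch of that single-pass scan, assuming a small hypothetical emoticon list (not the Wikipedia one) and treating the start and end of the string as spaces. The helper names build_trie and find_emoticons are made up for illustration. Each trie node is a plain array with a 'children' map and an 'emotion' label that is non-null only where an emoticon ends; null for the current node means "waiting for the next space".

```php
<?php
// Build a trie from a map of emoticon => emotion label.
function build_trie(array $emoticons): array
{
    $root = array('children' => array(), 'emotion' => null);
    foreach ($emoticons as $emoticon => $emotion) {
        $emoticon = (string) $emoticon;
        $node = &$root;                       // walk down by reference so we can add nodes
        for ($i = 0, $n = strlen($emoticon); $i < $n; $i++) {
            $ch = $emoticon[$i];
            if (!isset($node['children'][$ch])) {
                $node['children'][$ch] = array('children' => array(), 'emotion' => null);
            }
            $node = &$node['children'][$ch];
        }
        $node['emotion'] = $emotion;          // mark the node where this emoticon ends
        unset($node);                         // break the reference before the next word
    }
    return $root;
}

// Scan the text once; returns a list of [offset, emoticon, emotion].
function find_emoticons(string $text, array $root): array
{
    $matches = array();
    $len = strlen($text);
    $node = $root;                            // assumption: start of string acts like a space
    $start = 0;
    for ($i = 0; $i <= $len; $i++) {
        // Assumption: end of string acts like a trailing space.
        $ch = $i < $len ? $text[$i] : ' ';
        if ($node !== null && isset($node['children'][$ch])) {
            $node = $node['children'][$ch];   // next character found: go one level down
            continue;
        }
        // A space right after a node that ends an emoticon completes a match.
        if ($ch === ' ' && $node !== null && $node['emotion'] !== null) {
            $matches[] = array($start, substr($text, $start, $i - $start), $node['emotion']);
        }
        if ($ch === ' ') {
            $node = $root;                    // a space starts a new candidate
            $start = $i + 1;
        } else {
            $node = null;                     // mismatch: wait for the next space
        }
    }
    return $matches;
}

$trie = build_trie(array(':-)' => 'HAPPY', ':)' => 'HAPPY', '=)' => 'HAPPY',
                         ':-(' => 'SAD', ':(' => 'SAD', '=(' => 'SAD'));
foreach (find_emoticons('Lorem ipsum :-) dolor =( sit', $trie) as [$offset, $emoticon, $emotion]) {
    echo "$offset $emoticon $emotion\n";      // prints "12 :-) HAPPY" then "22 =( SAD"
}
```

Because the space that ends one emoticon also starts the next candidate, two consecutive emoticons such as " :) :-) " are both found; a pattern that consumes the space on both sides, like the ones above, misses the second of two adjacent matches.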