Split a text into single words

野趣味 2020-11-30 07:56

I would like to split a text into single words using PHP. Do you have any idea how to achieve this?

My approach:

function tokenizer($text) {
    $tex         


        
6 Answers
  •  盖世英雄少女心
    2020-11-30 08:46

    I would first convert the string to lowercase before splitting it. That makes the i modifier and the subsequent array processing unnecessary. I would also use the \W shorthand for non-word characters and add a + quantifier, so that a run of consecutive non-word characters is treated as a single delimiter.

    $text = 'This is an example text, it contains commas and full stops. Exclamation marks, too! Question marks? All punctuation marks you know.';
    $result = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    

    Edit: Use Unicode character properties instead of \W, as marcog suggested. Something like [\p{P}\p{Z}] (punctuation and separator characters) would match the delimiters more specifically than \W does.
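
    The Unicode variant could be sketched roughly as follows. This is only an illustration under a few assumptions: the input is valid UTF-8, the function name split_words_unicode is made up for this example, and \s has been added to the character class because \p{Z} does not cover tabs or newlines. The u modifier is needed so that PCRE treats both pattern and subject as UTF-8.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original answer.
function split_words_unicode(string $text): array
{
    // Prefer mb_strtolower() so non-ASCII letters are lowercased as well.
    // It needs the mbstring extension, so fall back to strtolower() without it.
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // \p{P} = punctuation, \p{Z} = separators (spaces, line/paragraph separators).
    // \s is an addition here (assumption) to also split on tabs and newlines.
    // The u modifier switches PCRE to UTF-8 mode.
    $words = preg_split('/[\p{P}\p{Z}\s]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    return $words === false ? [] : $words;
}

$words = split_words_unicode("Hello, wörld… naïve café?");
print_r($words); // hello, wörld, naïve, café
```

    Note that symbols such as + or $ belong to \p{S}, so they are not treated as delimiters by this class, and an apostrophe counts as punctuation, so "it's" would be split into "it" and "s".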
