What is the best way to remove punctuation marks, symbols, diacritics, special characters?

Posted by 雨燕双飞 on 2019-12-30 07:59:09

Question


I use the following lines of code to remove punctuation marks, symbols, and so on, as listed in the array:

$pattern_page = array("+",",",".","-","'","\"","&","!","?",":",";","#","~","=","/","$","£","^","(",")","_","<",">");

$pg_url = str_replace($pattern_page, ' ', strtolower($pg_url));

but I want to simplify this. Listing every character I want to remove in an array looks silly, and there may be other special characters I want to remove as well.

I thought of using the regular expression below,

$pg_url = preg_replace("/\W+/", " ", $pg_url);

but it doesn't remove the underscore (_).

What is the best way to remove all of these characters? Can a regular expression do that?


Answer 1:


Depending on how greedy you'd like to be, you could do something like:

$pg_url = preg_replace("/[^a-zA-Z 0-9]+/", " ", $pg_url);

This replaces every run of characters that are not ASCII letters, digits, or spaces with a single space.
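As a minimal sketch of how that pattern behaves, the snippet below wraps it in a helper and runs it on a sample string. The function name `clean_ascii` and the sample input are assumptions made for illustration; they are not part of the original answer. The extra step that collapses repeated spaces and trims the ends is also an addition.

```php
<?php
// Hypothetical helper: keep ASCII letters, digits and spaces only.
function clean_ascii(string $input): string
{
    // Replace each run of disallowed characters with one space.
    $out = preg_replace('/[^a-zA-Z 0-9]+/', ' ', strtolower($input));
    // Collapse the repeated spaces this can leave behind, then trim the ends.
    return trim(preg_replace('/ +/', ' ', $out));
}

// Hypothetical sample input.
echo clean_ascii("Hello, World! It's 2019 -- what_now?"), PHP_EOL;
// prints: hello world it s 2019 what now
```

Because the underscore is not in the allowed set, it is replaced along with the punctuation.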




Answer 2:


Use classes:

preg_replace('/[^[:alpha:]]/', '', $input);

This removes anything that the currently set locale does not consider an alphabetic character, which includes digits and spaces. If it is punctuation you want to eliminate, the class is [:punct:].

\W means "any non-word character" and is the opposite of \w. Since \w includes the underscore (_), \W does not match it, which is why your pattern leaves underscores in place.
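A short sketch of the [:punct:] variant, assuming plain ASCII input and no `u` modifier. The helper name `strip_punct` and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: replace runs of ASCII punctuation with one space.
function strip_punct(string $input): string
{
    // [[:punct:]] covers printable ASCII characters that are neither
    // alphanumeric nor whitespace, such as ! & ' + , - . / : ; = ? _ ~
    return preg_replace('/[[:punct:]]+/', ' ', $input);
}

echo strip_punct("rock&roll"), PHP_EOL;   // prints: rock roll
echo strip_punct("my_page.html"), PHP_EOL; // prints: my page html
```

Unlike the [:alpha:] pattern, this one leaves digits and existing spaces untouched.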
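One way to make the \W approach cover the underscore as well is to add `_` to the character class explicitly. The following is a sketch under that assumption; the helper name `words_only` and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: treat every non-word character AND the underscore
// as a separator.
function words_only(string $input): string
{
    // \W matches anything outside [A-Za-z0-9_]; listing _ alongside it
    // closes the gap. Each run of separators becomes a single space.
    return trim(preg_replace('/[\W_]+/', ' ', strtolower($input)));
}

echo words_only("My_Page-Title: Part #2!"), PHP_EOL;
// prints: my page title part 2
```

Since a space is itself a non-word character, runs that mix spaces and punctuation collapse into one space, so no separate clean-up pass is needed here.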



Source: https://stackoverflow.com/questions/4762546/what-is-the-best-way-to-remove-punctuation-marks-symbols-diacritics-special-c
