How to remove non-text characters from a string in PHP

Submitted by 你。 on 2019-12-08 04:03:50

Question


How can I remove characters like 🎧🎬 from a string? YouTube video titles sometimes contain characters like these. I don't want to remove characters like !@#$%^&*().

I am currently using preg_replace('/[^A-Za-z0-9\-]/', '', $VideoTitle), but it also removes the spaces and symbols such as # that I want to keep.
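To make the problem concrete, here is a minimal sketch that runs the question's current pattern on one of the sample titles. Because the character class keeps only ASCII letters, digits and -, it drops the emojis but also every space and the #. The variable names $title and $clean are illustrative.

```php
<?php
// The question's pattern: keep only ASCII letters, digits and '-'.
// There is no /u flag, so each of the emoji's four UTF-8 bytes is removed individually.
$title = 'TAYLOR SWIFT - SHAKE IT OFF 🎬🎧 #1989';
$clean = preg_replace('/[^A-Za-z0-9\-]/', '', $title);

echo $clean, PHP_EOL; // TAYLORSWIFT-SHAKEITOFF1989
```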

Sample array:

$VideoTitles[]='Sia 2017 Cheap Thrills 2017 live 🎧🎬'; 

$VideoTitles[]='TAYLOR SWIFT - SHAKE IT OFF 🎬🎧 #1989'; 

Expected Output:

Sia 2017 Cheap Thrills 2017 live 
TAYLOR SWIFT - SHAKE IT OFF #1989

Answer 1:


Code with sample input:

$VideoTitles=[
    'Kilian à Dijon #4 • Vlog #2 • Primark again !? 🎬 - YouTube',
    'Funfesty 🎧 🎬 on Twitter: "Je commence à avoir mal à la tête à force',
    'Sia 2017 Cheap Thrills 2017 live 🎧🎬'
];

$VideoTitles=preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u','',$VideoTitles);  // remove each out-of-range character together with the whitespace on one side of it

var_export($VideoTitles);

Output:

array (
  0 => 'Kilian à Dijon #4 • Vlog #2 • Primark again !? - YouTube',
  1 => 'Funfesty on Twitter: "Je commence à avoir mal à la tête à force',
  2 => 'Sia 2017 Cheap Thrills 2017 live',
)
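As a check against the question's own samples, the sketch below (an illustration, not part of the original answer) applies the same pattern to both sample titles. The Sia title comes out as expected, but in the TAYLOR SWIFT title the two adjacent emojis each take one neighbouring space, so OFF and #1989 end up joined. The $twoStep variant is an assumed alternative: it removes the out-of-range characters first, then collapses leftover whitespace runs to a single space and trims both ends.

```php
<?php
// The two sample titles from the question.
$VideoTitles = [
    'Sia 2017 Cheap Thrills 2017 live 🎧🎬',
    'TAYLOR SWIFT - SHAKE IT OFF 🎬🎧 #1989',
];

// One-pass pattern from the answer above.
// The second title becomes 'TAYLOR SWIFT - SHAKE IT OFF#1989'.
$onePass = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u', '', $VideoTitles);
var_export($onePass);

// Assumed alternative: strip out-of-range characters, collapse runs of
// two or more whitespace characters into one space, then trim the ends.
$twoStep = array_map(
    'trim',
    preg_replace(['/[^ -\x{2122}]/u', '/\s{2,}/u'], ['', ' '], $VideoTitles)
);
var_export($twoStep);
```

The two-step version costs an extra pass over each string, but its result does not depend on how emojis and spaces happen to alternate.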

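The above regex pattern uses a character range from \x20 to \x{2122} (space to the trade mark sign ™). I selected this range because it covers the vast majority of word-related characters, including accented letters and non-English scripts such as Greek, Cyrillic, Hebrew and Arabic. It does not reach the CJK ideographs, Japanese kana or Korean Hangul syllables, which lie far above U+2122, so text in those scripts would be stripped. Admittedly, the range also includes many non-word-related characters. For greater specificity you may like to use two separate ranges, such as /[^\x{20}-\x{60}\x{7B}-\x{FF}]/ui. This removes everything outside two ranges, matched case-insensitively: space to grave accent, and left curly bracket to latin small letter y with diaeresis. The lowercase letters a-z (\x61-\x7A) fall between the two ranges, but the i flag lets them match their uppercase counterparts in the first range, so they are kept.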

If you find this range unnecessarily generous for your data, choose a narrower character range that suits it.

For instance, you might prefer the much narrower /[^\x20-\x7E]/u (space to tilde, i.e. printable ASCII). However, if you apply it to either of the French $VideoTitles above, you will mangle the text by removing legitimate letters such as à and ê.
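The sketch below compares these narrower alternatives on a hypothetical input: a fragment of the second French title with an emoji appended. It illustrates why the i flag matters for the two-range pattern, and what the printable-ASCII pattern does to accented letters. The variable names are illustrative.

```php
<?php
// Hypothetical input: a fragment of the second French title plus one emoji.
$fragment = 'Je commence à avoir mal à la tête 🎧';

// Two ranges with /i: a-z (\x61-\x7A) lies between the ranges, but /i lets
// lowercase letters match A-Z in the first range, so they survive.
$twoRanges = preg_replace('/[^\x{20}-\x{60}\x{7B}-\x{FF}]/ui', '', $fragment);

// The same two ranges without /i: lowercase ASCII letters are stripped.
$twoRangesNoI = preg_replace('/[^\x{20}-\x{60}\x{7B}-\x{FF}]/u', '', $fragment);

// Printable ASCII only: the accented letters à and ê are stripped as well.
$asciiOnly = preg_replace('/[^\x20-\x7E]/u', '', $fragment);

var_export([$twoRanges, $twoRangesNoI, $asciiOnly]);
```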

A Unicode character table listing each character with its code point can help you understand what lies inside the aforementioned ranges and beyond.

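And remember to include the Unicode flag u after your closing delimiter. Without it, PCRE reads the pattern and the subject byte by byte, so code points above \xFF, such as \x{2122}, are rejected and the pattern fails to compile.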
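A minimal sketch of why the u flag matters, assuming PHP's bundled PCRE. Without the flag, the \x{2122} escape cannot compile, so preg_replace() emits a warning and returns null rather than the cleaned string.

```php
<?php
$title = 'Sia 2017 Cheap Thrills 2017 live 🎧🎬';

// Without /u, PCRE works byte by byte, so \x{2122} (greater than 0xFF) is
// rejected at compile time: preg_replace() raises a warning (silenced here
// with @) and returns null instead of a string.
$withoutU = @preg_replace('/[^ -\x{2122}]/', '', $title);
var_dump($withoutU); // NULL

// With /u, the pattern and the subject are read as UTF-8 code points.
$withU = preg_replace('/[^ -\x{2122}]/u', '', $title);
var_dump($withU); // string(33) "Sia 2017 Cheap Thrills 2017 live "
```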


For completeness, I should add that the literal, narrow solution for removing just the two emojis would be:

$VideoTitle=preg_replace('/[\x{1F3A7}\x{1F3AC}]/u','',$VideoTitle);  // omit 2 emojis

These emojis are called "clapper board (U+1F3AC)" and "headphone (U+1F3A7)".
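To show what "narrow" means in practice, this sketch applies that pattern to the TAYLOR SWIFT sample and to the Sia sample with an extra 🔥 (U+1F525) appended purely for illustration. Only the two listed code points are removed, so other emojis survive, and the spaces on either side of the removed pair are left behind as a doubled space.

```php
<?php
// The narrow pattern removes exactly U+1F3A7 and U+1F3AC and nothing else.
$narrow = '/[\x{1F3A7}\x{1F3AC}]/u';

$a = preg_replace($narrow, '', 'TAYLOR SWIFT - SHAKE IT OFF 🎬🎧 #1989');
// 🔥 (U+1F525) is appended for illustration; it is not in the narrow class.
$b = preg_replace($narrow, '', 'Sia 2017 Cheap Thrills 2017 live 🎧🎬🔥');

var_export([$a, $b]);
```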




Answer 2:


function removeEmoticon($text) {
    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $cleanText      = preg_replace($regexEmoticons, '', $text);

    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $cleanText    = preg_replace($regexSymbols, '', $cleanText);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $cleanText      = preg_replace($regexTransport, '', $cleanText);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $cleanText = preg_replace($regexMisc, '', $cleanText);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $cleanText     = preg_replace($regexDingbats, '', $cleanText);

    return $cleanText;
}
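The function above makes five passes over the string. A possible single-pass variant, sketched here under the hypothetical name removeEmoticonSinglePass(), merges the same five ranges into one character class; because each range matches single code points, the result should be the same as running the five replacements in sequence. Like the original, it leaves the whitespace around removed emojis in place.

```php
<?php
// Sketch: the five ranges from removeEmoticon() merged into a single
// character class, so preg_replace() scans the string once, not five times.
function removeEmoticonSinglePass(string $text): string
{
    $pattern = '/['
        . '\x{1F600}-\x{1F64F}'  // Emoticons
        . '\x{1F300}-\x{1F5FF}'  // Miscellaneous Symbols and Pictographs
        . '\x{1F680}-\x{1F6FF}'  // Transport and Map Symbols
        . '\x{2600}-\x{26FF}'    // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}'    // Dingbats
        . ']/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // this sketch falls back to the unmodified text in that case.
    return preg_replace($pattern, '', $text) ?? $text;
}

echo removeEmoticonSinglePass('TAYLOR SWIFT - SHAKE IT OFF 🎬🎧 #1989'), PHP_EOL;
```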



Source: https://stackoverflow.com/questions/43097087/how-to-remove-non-text-chars-from-string-php
