How do you implement a good profanity filter?

误落风尘 2020-11-22 04:27

Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs

21 answers
  •  独厮守ぢ
    2020-11-22 05:26

    Regarding your "trick the system" subquestion: you can handle that by normalizing both the "bad word" list and the user-entered text before you search. For example, use a series of regexes (or a character-translation function such as PHP's strtr) to convert [z$5] to "s", [4@] to "a", and so on, then compare the normalized "bad word" list against the normalized text. Note that normalization can introduce additional false positives, because an innocent string can collapse onto a listed word once its digits and symbols are mapped to letters; a model number like "A55", for instance, would normalize to "ass".
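As a rough illustration of the normalization idea, here is a minimal Python sketch (Python is used only for illustration; the same mapping could be written with PHP's strtr). The substitution table covers just the characters mentioned above, and the names `LEET_MAP`, `normalize`, and `contains_bad_word` are hypothetical, as is the placeholder word list in the usage note.

```python
import re
from typing import Iterable

# Hypothetical substitution table covering only the characters named in the
# answer: [z$5] -> "s" and [4@] -> "a". A real filter would need more entries.
LEET_MAP = str.maketrans({"z": "s", "$": "s", "5": "s", "4": "a", "@": "a"})


def normalize(text: str) -> str:
    """Lowercase the text and fold look-alike characters onto plain letters."""
    return text.lower().translate(LEET_MAP)


def contains_bad_word(text: str, bad_words: Iterable[str]) -> bool:
    """Return True if any normalized bad word appears as a whole word
    in the normalized text."""
    normalized_text = normalize(text)
    for word in bad_words:
        # Normalize the list entry too, so both sides use the same alphabet.
        pattern = r"\b" + re.escape(normalize(word)) + r"\b"
        if re.search(pattern, normalized_text):
            return True
    return False
```

With a placeholder list such as `["darn"]`, `contains_bad_word("what a d4rn mess", ["darn"])` matches because "d4rn" normalizes to "darn". Because the list entries pass through the same mapping, an entry containing "z" still matches itself, but the mapping is lossy, which is where the extra false positives come from.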

    The larger challenge is to come up with something that will let people quote "The pen is mightier than the sword" while blocking "p e n i s".
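One possible heuristic for that case, offered as a sketch rather than a solution: join runs of single letters separated by spaces before matching, and match only on word boundaries. The names `SPACED_RUN`, `collapse_spaced_letters`, and `contains_word` are hypothetical, and the threshold of three letters is an arbitrary assumption.

```python
import re

# Matches a run of three or more single letters separated by single spaces,
# e.g. "b a d". The minimum length of three is an arbitrary assumption.
SPACED_RUN = re.compile(r"\b(?:[a-z] ){2,}[a-z]\b")


def collapse_spaced_letters(text: str) -> str:
    """Lowercase the text and join spaced-out single letters into one token."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text.lower())


def contains_word(text: str, word: str) -> bool:
    """Whole-word search for `word` after collapsing spaced-out letters."""
    collapsed = collapse_spaced_letters(text)
    return re.search(r"\b" + re.escape(word) + r"\b", collapsed) is not None
```

Here the quote is left alone because "pen" and "is" are ordinary multi-letter words with a space between them, while the spaced-out version collapses into a single token and matches. The sketch only handles single spaces; dots, dashes, or mixed separators would slip through, and legitimately spaced initials would be joined as well.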
