How do you implement a good profanity filter?

误落风尘 2020-11-22 04:27

Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs

21 answers
  •  独厮守ぢ
    2020-11-22 05:10

    Profanity filters are a bad idea. You can't catch every swear word, and if you try, you get false positives.

    Catching Words

    Let's just say you want to catch the F-Word. Easy, right? Well, let's see.

    You can search a string for "fuck." Unfortunately, people trick filters all the time, and a filter that looks only for that exact string won't pick up "fuk."
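    A minimal Python sketch of that naive approach, for illustration only. The names `BANNED` and `naive_filter` are hypothetical, and the word list holds just the single example from this answer.

```python
# A deliberately naive filter: exact, case-sensitive substring search.
BANNED = ["fuck"]


def naive_filter(text: str) -> bool:
    """Return True if any banned word appears verbatim inside text."""
    return any(word in text for word in BANNED)


print(naive_filter("what the fuck"))  # True
print(naive_filter("what the fuk"))   # False: the misspelling slips through
print(naive_filter("what the Fuck"))  # False: even capitalization defeats it
```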

    You can try to check for multiple spellings and variants of the word, but every extra variant is one more pattern to test against every input, which slows your code down. To catch the F-Word, you need to look for "fuc", "Fuc", "fuk", "Fuk", "F***", etc. And the list goes on and on.
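    Extending the naive sketch with the spellings listed above shows both problems at once. This is an illustrative assumption, not a recommended design: `VARIANTS` and `variant_filter` are hypothetical names, and any spelling missing from the list still gets through.

```python
# Hypothetical variant list built from the spellings above; a real list
# would be much longer and still never complete.
VARIANTS = ["fuck", "fuc", "Fuc", "fuk", "Fuk", "F***"]


def variant_filter(text: str) -> bool:
    """Return True if any listed variant appears verbatim inside text."""
    # In the worst case (no match) this does one substring scan of the
    # text per variant, so the cost grows linearly with the list length.
    return any(variant in text for variant in VARIANTS)


print(variant_filter("fuk off"))   # True
print(variant_filter("F*** off"))  # True
print(variant_filter("FUK off"))   # False: all-caps was never listed
print(variant_filter("phuk off"))  # False: neither was this spelling
```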

    Avoiding Innocence

    Okay, so how about making it case-insensitive and ignoring spaces so it catches "F u C k"? That might sound like a good idea, but someone can just bypass the profanity filter with "F.U.C.K."
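    A minimal sketch of that normalization, assuming the filter lowercases the input and deletes all whitespace before searching. The helper names `normalize` and `spaced_filter` are hypothetical.

```python
BANNED = ["fuck"]


def normalize(text: str) -> str:
    """Lowercase the text and drop all whitespace."""
    return "".join(text.lower().split())


def spaced_filter(text: str) -> bool:
    """Search the normalized text for any banned word."""
    normalized = normalize(text)
    return any(word in normalized for word in BANNED)


print(spaced_filter("F u C k"))   # True: spaces and case no longer help
print(spaced_filter("F.U.C.K."))  # False: the dots still break the match
```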

    So you ignore punctuation too.

    Now you have a real problem: a sentence like "Hello, there!" gets flagged for "hell," and "Whassup?" gets flagged for "ass."
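    A small sketch, assuming the filter now keeps only letters and digits before a substring search. It catches the dotted spelling, and the same normalization produces exactly the false positives described above. The names and word list are hypothetical.

```python
BANNED = ["fuck", "hell", "ass"]


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def strict_filter(text: str) -> bool:
    """Search the normalized text for any banned word."""
    normalized = normalize(text)
    return any(word in normalized for word in BANNED)


print(strict_filter("F.U.C.K."))       # True: the dotted spelling is caught
print(strict_filter("Hello, there!"))  # True: "hellothere" contains "hell"
print(strict_filter("Whassup?"))       # True: "whassup" contains "ass"
```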

    And there are plenty of innocent words you have to exclude from the filter, such as "Constitution," because it contains "tit."
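    One way to sketch such exclusions, assuming a hand-maintained list of allowed words that are deleted from the normalized text before the banned-word search. `ALLOWED` and `filter_with_exceptions` are hypothetical names; every innocent word not yet on the list is still flagged.

```python
BANNED = ["tit"]
# Hypothetical hand-maintained list of innocent words containing a banned one.
ALLOWED = ["constitution"]


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def filter_with_exceptions(text: str) -> bool:
    normalized = normalize(text)
    # Delete allowed words first so their innocent substrings cannot match.
    for word in ALLOWED:
        normalized = normalized.replace(word, "")
    return any(word in normalized for word in BANNED)


print(filter_with_exceptions("Read the Constitution."))  # False: excepted
print(filter_with_exceptions("I watched Titanic."))      # True: not listed yet
```

    Deleting an allowed word also splices its neighbours together, so "ti constitution t" normalizes to "tit" and is flagged: the exception list itself creates new edge cases.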

    People can also use substitute words, such as "Frack." Do you block that too? What about "pen is" for "penis"? Your program doesn't have the intelligence to know whether a string is good or bad.
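    A sketch of what happens if you do block them, assuming the same letters-and-digits-only normalization as a plain substring search. The filter matches characters, not meaning, so the spaced-out trick is caught along with perfectly harmless sentences. Names and word list are hypothetical.

```python
BANNED = ["penis", "frack"]


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def strict_filter(text: str) -> bool:
    """Search the normalized text for any banned word."""
    normalized = normalize(text)
    return any(word in normalized for word in BANNED)


print(strict_filter("what a pen is"))                        # True: caught
print(strict_filter("The pen is mightier than the sword."))  # True: proverb flagged
print(strict_filter("Fracking is controversial."))           # True: also flagged
```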

    Don't use profanity filters. They're hard to develop, and they slow your code to a crawl.
