How do you implement a good profanity filter?

误落风尘 2020-11-22 04:27

Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs

21 answers

独厮守ぢ (original poster)

2020-11-22 05:10

Profanity filters are a bad idea. The reason is that you can't catch every swear word, and the harder you try, the more false positives you get.

Catching Words

Let's just say you want to catch the F-Word. Easy, right? Well, let's see.

You can loop through a string looking for "fuck." Unfortunately, people have learned to trick filters: someone types "fuk" instead, and the profanity filter doesn't pick it up.
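As a minimal sketch of that exact-match approach (the function name and the one-word blocklist are made up for illustration), a plain substring check might look like this:

```python
BANNED = ["fuck"]  # hypothetical one-word blocklist


def contains_banned(text, banned=BANNED):
    """Return True if any banned term occurs as an exact substring of text."""
    return any(term in text for term in banned)


print(contains_banned("what the fuck"))  # exact spelling is caught
print(contains_banned("what the fuk"))   # the misspelling slips through
```

The second call comes back False, which is the whole problem: the check only knows the spellings it was given.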

You can try to check for multiple spellings and variants of the word, but every extra variant is one more comparison, and that slows your code down. To catch the F-Word, you need to look for "fuc", "Fuc", "fuk", "Fuk", "F***", and so on. The list goes on and on.
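One way to keep a variant list manageable, sketched here under the assumption that the variants are stored as plain strings, is to fold them into a single case-insensitive regular expression. The names VARIANTS, PATTERN, and is_flagged are invented for this example, and the list is nowhere near complete:

```python
import re

# Hand-maintained variants, illustrative only.
VARIANTS = ["fuc", "fuk", "f***"]

# re.escape keeps the asterisks in "f***" literal instead of regex operators.
PATTERN = re.compile("|".join(re.escape(v) for v in VARIANTS), re.IGNORECASE)


def is_flagged(text):
    """Return True if any listed variant appears anywhere in text, in any case."""
    return PATTERN.search(text) is not None


print(is_flagged("Fuk that"))   # listed variant, different case
print(is_flagged("F*** that"))  # listed variant with literal asterisks
print(is_flagged("fvck that"))  # unlisted variant is missed
```

Case-insensitivity removes the need to list "Fuc" and "Fuk" separately, but any spelling nobody thought to add still gets through.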

Avoiding Innocence

Okay, so how about making it case-insensitive and ignoring spaces, so that it catches "F u C k"? That might sound like a good idea, but someone can just bypass the profanity filter with "F.U.C.K."
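A rough sketch of that idea, with hypothetical helper names, lowercases the input and removes whitespace before searching:

```python
def normalize(text):
    """Lowercase the text and drop all whitespace; punctuation is left in place."""
    return "".join(text.lower().split())


def is_flagged(text, banned=("fuck",)):
    """Return True if a banned term appears in the normalized text."""
    cleaned = normalize(text)
    return any(term in cleaned for term in banned)


print(is_flagged("F u C k"))   # spaces and case no longer hide the word
print(is_flagged("F.U.C.K."))  # the dots survive normalization, so this is missed
```

The dotted version normalizes to "f.u.c.k.", which does not contain the banned term.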

So you ignore punctuation too.

Now you have a real problem: a sentence like "Hello, there!" gets flagged for "hell," and "Whassup?" gets flagged for "ass."

And there are a bunch of words that you have to exclude from the filter, such as "Constitution," because there's "tit" in it.
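The false positives are easy to reproduce. In the sketch below, the blocklist, the allowlist, and all function names are assumptions made up for the example. The first function strips everything except letters and searches the result; the second checks word by word and skips words on an exclusion list:

```python
import re

BANNED = ("hell", "ass", "tit")                 # hypothetical blocklist
ALLOWED = {"hello", "whassup", "constitution"}  # hypothetical exclusion list


def squash(text):
    """Lowercase the text and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


def naive_hits(text):
    """Banned terms found after stripping spaces and punctuation from the whole text."""
    cleaned = squash(text)
    return [term for term in BANNED if term in cleaned]


def hits_with_allowlist(text):
    """Banned terms found word by word, skipping words on the exclusion list."""
    hits = []
    for word in text.split():
        cleaned = squash(word)
        if cleaned in ALLOWED:
            continue
        hits.extend(term for term in BANNED if term in cleaned)
    return hits


print(naive_hits("Hello, there!"))            # ['hell']
print(naive_hits("Whassup?"))                 # ['ass']
print(naive_hits("Constitution"))             # ['tit']
print(hits_with_allowlist("Hello, there!"))   # []
print(hits_with_allowlist("Constitutional"))  # ['tit']
```

The exclusion list only helps with words someone remembered to add ("Constitutional" is still flagged), and checking word by word gives up the ability to catch spaced-out spellings such as "F u C k".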

People can also use substitute words, such as "Frack." Do you block that too? What about "pen is" for "penis"? Your program doesn't have the artificial intelligence to know whether a string is good or bad.

Don't use profanity filters. They're hard to develop, and they're slow.
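To see why the "pen is" case has no good answer, here is a small sketch (hypothetical names again) of a filter that ignores spaces. It catches the evasion, but it flags an innocent sentence in exactly the same way:

```python
import re


def squash(text):
    """Lowercase the text and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


def is_flagged(text, banned=("penis",)):
    """Return True if a banned term appears once spaces and punctuation are removed."""
    cleaned = squash(text)
    return any(term in cleaned for term in banned)


print(is_flagged("pen is"))           # the evasion is caught
print(is_flagged("My pen is blue."))  # so is a harmless sentence
```

Both calls return True. A pure string match has no way to tell the two apart.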
