Many of us deal with user input, search queries, and other situations where the text can contain profanity or undesirable language. Oftentimes this needs to be filtered out.
Profanity filters are a bad idea. The reason is that you can't catch every swear word, and the harder you try, the more false positives you get.
Let's just say you want to catch the F-Word. Easy, right? Well, let's see.
You can scan a string for "fuck." Unfortunately, people have learned to trick filters: misspell it as "fuk" and the check never fires.
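Here's a quick sketch of that naive approach in Python (the function name is just for illustration):

```python
# Naive approach: a plain substring check.
def contains_profanity(text: str) -> bool:
    return "fuck" in text

print(contains_profanity("what the fuck"))  # True
print(contains_profanity("what the fuk"))   # False -- the misspelling slips through
```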
You can check for multiple spellings and variants of the word, but that bloats the list and slows your code down. To catch the F-Word you now need to look for "fuc", "Fuc", "fuk", "Fuk", "F***", and so on. The list never stops growing.
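A variant list might look something like this sketch. Every message now gets scanned against the whole list, and every new misspelling means another entry:

```python
# Every trick spelling someone invents means another entry here.
VARIANTS = ["fuck", "Fuck", "fuc", "Fuc", "fuk", "Fuk", "f***", "F***"]

def contains_profanity(text: str) -> bool:
    return any(variant in text for variant in VARIANTS)

print(contains_profanity("what the Fuk"))   # True, because it's listed
print(contains_profanity("what the fvck"))  # False, because it isn't -- yet
```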
Okay, so how about making it case-insensitive and ignoring spaces so it catches "F u C k"? That might sound like a good idea, but someone can just bypass the filter with "F.U.C.K."
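Here's roughly what that normalization looks like, and why it still isn't enough:

```python
# Lowercase and strip spaces before checking.
def contains_profanity(text: str) -> bool:
    normalized = text.lower().replace(" ", "")
    return "fuck" in normalized

print(contains_profanity("F u C k"))   # True -- case and spaces handled
print(contains_profanity("F.U.C.K."))  # False -- periods still get through
```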
Fine, so you strip punctuation too.
Now you have a real problem: "Hello, there!" gets flagged for "hell," and "Whassup?" gets flagged for "ass."
And there's a whole pile of innocent words you have to exclude from the filter, such as "Constitution," because it has "tit" in it.
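Here's a sketch of the punctuation-stripping version (the word list is just for illustration). Notice how many perfectly innocent inputs it now flags:

```python
import string

# Strip punctuation and spaces as well, then check a small word list.
BAD_WORDS = ["fuck", "hell", "ass", "tit"]

def contains_profanity(text: str) -> bool:
    table = str.maketrans("", "", string.punctuation + " ")
    normalized = text.lower().translate(table)
    return any(word in normalized for word in BAD_WORDS)

print(contains_profanity("F.U.C.K."))       # True -- the bypass is caught now
print(contains_profanity("Hello, there!"))  # True -- "hell" inside "hello"
print(contains_profanity("Whassup?"))       # True -- "ass" inside "whassup"
print(contains_profanity("Constitution"))   # True -- "tit" inside it
```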
People can also use substitute words, like "Frack." Do you block those too? What about writing "pen is" instead of "penis"? Your program has no real understanding of context, so it can't tell a harmless string from a bad one.
Don't use profanity filters. They're hard to get right, easy to bypass, and they slow everything to a crawl.