Bounced email parsing

Submitted by 淺唱寂寞╮ on 2019-12-05 14:35:42

You could set up a system that lets an operator review messages, select strings, and then categorize from there. Over time, you could hope to get that 1 in 10 down to 1 in 100 or 1 in 1,000. There will always be more corner cases, however.
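As a rough illustration of the operator-driven approach, here is a minimal sketch in Python. The rule table, the category names ("hard", "soft"), and the function names are all hypothetical; the idea is only that anything no rule matches lands in a review queue, and the operator extends the rules from there.

```python
# Hypothetical rules an operator has collected so far: substring -> category.
RULES = {
    "user unknown": "hard",
    "no such user": "hard",
    "mailbox full": "soft",
    "over quota": "soft",
}

def categorize(body, rules=RULES):
    """Return the category of the first matching rule, or None if nothing matches."""
    text = body.lower()
    for needle, category in rules.items():
        if needle in text:
            return category
    return None

def triage(messages, rules=RULES):
    """Split messages into categorized ones and a queue for manual review."""
    categorized, review_queue = [], []
    for body in messages:
        category = categorize(body, rules)
        if category is None:
            review_queue.append(body)
        else:
            categorized.append((category, body))
    return categorized, review_queue
```

Each time the operator picks a new string out of the review queue and adds it to the rules, fewer messages of that kind need manual attention on the next run.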

Also not a definitive answer, but in a similar spirit to Kyle's response, you could use a Bayesian, token-based spam filter to "learn" what bounce messages look like and then automatically route them to whatever you want to handle the bounced mail.

In other words, you set up an account where you train SpamAssassin, spamprobe, or a similar tool to treat a bunch of different bounce messages (and only bounce messages) as "junk", then let that spam filter be a second line of filtering after whatever you've developed.

So, let's say your solution, the first filter, finds 90% of bounced messages. You have your system do whatever it normally does with bounces, then save them to a bounce-messages mailbox, which is periodically scanned by SpamAssassin/spamprobe to learn those messages as "junk".

You then also run SpamAssassin, spamprobe, or a similar tool as a second filter on anything yours doesn't flag as a bounce. It makes its own estimate of bounced-ness, and whatever it considers "junk" (because you've trained it to think bounce = junk) you also route to your program.

This still requires a little bit of manual review, but in theory it should get better and better over time as you rely on the spam system's learning to account for the edge cases.
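The two-stage setup can be sketched roughly as follows. This is not how SpamAssassin or spamprobe are actually invoked; `TokenFilter` is a tiny hand-rolled naive Bayes classifier standing in for them, and `first_filter` is a hypothetical placeholder for your own bounce detector.

```python
import math
import re
from collections import Counter

class TokenFilter:
    """Tiny naive Bayes token classifier; a stand-in for a real spam filter."""

    def __init__(self):
        self.tokens = {"bounce": Counter(), "other": Counter()}
        self.docs = {"bounce": 0, "other": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        self.tokens[label].update(self.tokenize(text))
        self.docs[label] += 1

    def score(self, text, label):
        # Log prior plus Laplace-smoothed log likelihood of each token.
        vocab = len(set(self.tokens["bounce"]) | set(self.tokens["other"]))
        total = sum(self.tokens[label].values())
        logp = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in self.tokenize(text):
            logp += math.log((self.tokens[label][tok] + 1) / (total + vocab + 1))
        return logp

    def is_bounce(self, text):
        return self.score(text, "bounce") > self.score(text, "other")

def first_filter(text):
    # Hypothetical hand-written check standing in for "your solution".
    return "delivery status notification" in text.lower()

def route(text, learned):
    """Return 'bounce' or 'inbox'; first-filter hits also train the second filter."""
    if first_filter(text):
        learned.learn(text, "bounce")
        return "bounce"
    return "bounce" if learned.is_bounce(text) else "inbox"
```

In a real deployment you would also feed the learner ordinary, non-bounce mail as the "other" class, otherwise it has nothing to compare bounces against.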

We are facing the same problem and have not found a "perfect" solution either. I think you could either

  • use a service provider (with a proper mail API), which lets you "outsource" the problem and gives you a high detection rate, or
  • use a simple filter to catch at least (say) 80% of the bounces. In our setup, this was enough to keep our database in a reasonable state.
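One possible shape for such a simple filter, sketched with Python's standard `email` package: it only recognizes bounces that follow the standard delivery status notification format (`multipart/report` with a `message/delivery-status` part, per RFC 3464) and ignores everything else. The function name `failed_recipients` is made up for this example, and how many real-world bounces follow this format will vary by setup.

```python
import email

def failed_recipients(raw_message):
    """Return (address, status) pairs for failed deliveries in a standard DSN.

    Non-standard bounces and ordinary mail yield an empty list.
    """
    msg = email.message_from_string(raw_message)
    if msg.get_content_type() != "multipart/report":
        return []
    if str(msg.get_param("report-type", "")).lower() != "delivery-status":
        return []
    failed = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue  # malformed part; leave it for manual handling
        # The parser exposes each header block of the status part as a Message.
        for block in blocks:
            if block.get("Action", "").strip().lower() != "failed":
                continue
            recipient = block.get("Final-Recipient", "")
            # The field looks like "rfc822; user@example.com".
            address = recipient.split(";", 1)[-1].strip()
            failed.append((address, block.get("Status", "").strip()))
    return failed
```

A status code starting with 5 indicates a permanent failure and one starting with 4 a transient failure, so the returned status can be used to decide whether to mark an address as dead in the database or simply retry later.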