Neural networks for email spam detection

半阙折子戏 2020-12-23 18:07

Let's say you have access to an email account with a history of received emails from the last few years (~10k emails), classified into 2 groups:

  • genuine email
4 Answers
  •  半阙折子戏
    2020-12-23 18:39

    Chad, the answers you've gotten so far are reasonable, but I'll respond to your update that:

    I am set on using neural networks as the main aspect on the project is to test how the NN approach would work for spam detection.

    Well, then you have a problem: an empirical test like this can't prove unsuitability.

    You're probably best off learning a bit about what NNs actually do and don't do, to see why they are not a particularly good fit for this sort of classification problem. A helpful way to think about them is as universal function approximators. For some idea of how this all fits together in the area of classification (which is what the spam filtering problem is), browsing an intro text like Pattern Classification might be helpful.

    Failing that, if you are dead set on seeing it run, just use any general NN library for the network itself. Most of your work is going to be deciding how to represent the input data anyway. The 'best' network structure is non-obvious, and it probably doesn't matter that much. The inputs will have to be a number of (normalized) measurements (features) on the corpus itself. Some are obvious (counts of 'spam' words, etc.), some much less so. This is the part you can really play around with, but you should expect to do poorly compared to Bayesian filters (which have their own problems here) due to the nature of the problem.
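
    To make the "normalized measurements" idea concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the word list, the four features, and the two toy emails are made up, and the single sigmoid unit is only a stand-in for whatever NN library you end up using.

```python
import math
import re

# Hypothetical word list, for illustration only; a real filter would derive
# its vocabulary from the labelled corpus.
SPAM_WORDS = {"free", "winner", "prize", "urgent"}


def extract_features(text):
    """Map one email body to a fixed-length list of values in [0, 1]."""
    tokens = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    spam_hits = sum(1 for t in tokens if t.lower() in SPAM_WORDS)
    upper = sum(1 for c in letters if c.isupper())
    links = len(re.findall(r"https?://", text))
    return [
        spam_hits / max(len(tokens), 1),      # share of 'spam' words
        upper / max(len(letters), 1),         # share of capital letters
        text.count("!") / max(len(text), 1),  # exclamation-mark density
        min(links, 10) / 10,                  # link count, capped at 10
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Output of a single sigmoid unit: closer to 1 means 'more spam-like'."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, epochs=500, lr=1.0):
    """Fit the unit with plain stochastic gradient descent on log loss."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - score(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


# Two made-up emails standing in for the ~10k labelled ones (1 = spam, 0 = genuine).
spam = "FREE prize!!! You are a WINNER, click http://example.com now!"
genuine = "Hi Chad, the meeting is moved to Tuesday. See you then."
samples = [(extract_features(spam), 1), (extract_features(genuine), 0)]
weights, bias = train(samples)
spam_score = score(weights, bias, samples[0][0])
genuine_score = score(weights, bias, samples[1][0])
```

    The network side is the boring part here: swapping the single unit for a multi-layer network from a library changes only train and score. The part worth experimenting with is extract_features, since that decides what the network can see at all.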
