nominal-value inputs for Neural Network

痴心易碎 提交于 2019-12-06 14:00:32

Since you don't provide much detail, my answer can't be very specific.

Generally speaking neural networks tend to perform worse when coding nominal values as numeric values since the transformation will impose a (probably) false ordering on the variables. Mixing inputs with very varied levels also tend to worsen the performance.

However, given the little information provided here there is no way of telling if this is the reason that the networks performance is "not as amazing" as you want. It could just as well be the case that you don't have enough training data, or that your training data contains a lot of noise. Perhaps you need to pre-scale your data, perhaps there is an error in your network code, perhaps you have chosen ill-suited values of constants for your learning algorithm...

The reasons a neural network doesn't perform as expected are many and diverse (on of them beeing unreasonably high expectations). Without much more information there is no way of knowing what the problem is in your case.

As a general note, a better way for coding nominal values would be a binary vector. In your case, in addition to the 4 continuous-valued inputs, you'd have 8 binary input neurons, where only one is activated (1) and the other 7 are inactive.

The way you did it implies an artificial relationship between the computation methods, which is almost certainly an artifact. For example, 1 and 2 are numerically (and from your network's point of view!) nearer than 1 and 8. But are the methods nr. 1 and 2 really more similar, or related, than the methods 1 and 8?

Mapping categories to numerical values is not a good practice in statistics. Especially in the case of neural networks. Bear in mind that neural networks tend to map similar inputs to similar outputs. If you map category A to 1 and category B to 2 (both as inputs), the NN will try to output similar values for both categories, even if they have nothing to do with each other.

A sparser representation is preferred. If you have 4 categories, map them like this:

A -> 0001

B -> 0010

etc

Take a look at the "Subject: How should categories be encoded?" in this link: ftp://ftp.sas.com/pub/neural/FAQ2.html#A_cat

The previous answers are right - do not map nominal values into arbitrary numeric ones. However, if the attribute has an ordinal nature ("Low", "Medium", High" for example), you can replace the nominal values by ascending numeric values. Note that this may not be the optimal solution - since there is no guarantee for example that "High"=3 by the nature of your data. Instead, use one-hot bit encoding as suggested. The reason for this is that a neural network is very similar to regression in the sense that multiple numeric values go through some kind of an aggregating function - but this happens multiple times. Each input is also multiplied by a weight. So when you enter a numeric value, it undergoes a series of mathematical manipulations that adjusts its weights in the network. So if you use numeric values for non-nomial data - nominal values that were mapped to closer numeric values will be treated about the same in the best case, in the worst case - it can harm your model.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!