Testing the Keras sentiment classification with model.predict

Submitted by 不问归期 on 2019-12-06 08:07:20

So what you basically need to do is as follows:

  1. Tokenize sequences: convert each string into words (features). For example: "hello my name is georgio" becomes ["hello", "my", "name", "is", "georgio"].
  2. Next, remove stop words (very common words such as "the" or "is" that carry little meaning on their own).
  3. This stage is optional. It may lead to faulty results, but I think it's worth a try: stem your words (features). That reduces the number of features, which leads to a faster run. Again, it's optional and might cause some failures. For example, if you stem the word 'parking' you get 'park', which has a different meaning.
  4. Next, create a dictionary (a mapping from words to integers). Each word gets a unique number, and from this point on we use only this number.
  5. Computers understand numbers only so we need to talk in their language. We'll take the dictionary from stage 4 and replace each word in our corpus with its matching number.
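  6. Now we need to split our data set into two groups: training and testing sets. The first (training) will train our NN model and the second (testing) will help us figure out how good our NN is. Keras does not ship a dedicated cross-validation function, but you can hold out part of the data with the `validation_split` argument of `model.fit`, or split it beforehand with scikit-learn's `train_test_split`.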
  7. Next, define the maximum number of features our NN can take as input. Keras calls this parameter 'maxlen'. You don't really have to set it manually: if you leave it out, Keras' `pad_sequences` uses the length of the longest sentence in your corpus.
  8. Next, let's say Keras found that the longest sentence in your corpus has 20 words (features), and one of your sentences is the example from the first stage, whose length is 5 (if we remove stop words it'll be shorter). In that case we need to add zeros, 15 of them. This is called padding a sequence; we do it so that every input sequence has the same length.
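The steps above can be sketched in plain Python, without Keras, to make the data flow concrete. This is only an illustration under assumptions: the tiny stop-word list, the sample corpus, and the helper names (`tokenize`, `build_dictionary`, `encode`, `pad`, `split`) are all made up for the example, and stemming (stage 3) is left out. Zeros are added at the front, which is also what Keras' `pad_sequences` does by default.

```python
import random

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"my", "is", "the", "a", "an"}

def tokenize(sentence):
    """Stages 1-2: lowercase, split on whitespace, drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

def build_dictionary(tokenized_sentences):
    """Stage 4: give every distinct word a unique number, starting at 1.
    Index 0 is kept free so it can be used for padding."""
    dictionary = {}
    for tokens in tokenized_sentences:
        for word in tokens:
            if word not in dictionary:
                dictionary[word] = len(dictionary) + 1
    return dictionary

def encode(tokens, dictionary):
    """Stage 5: replace each word with its number; unknown words are skipped."""
    return [dictionary[w] for w in tokens if w in dictionary]

def pad(sequence, maxlen):
    """Stage 8: add zeros at the front until the sequence has maxlen items.
    Sequences longer than maxlen keep only their last maxlen items."""
    if len(sequence) > maxlen:
        sequence = sequence[len(sequence) - maxlen:]
    return [0] * (maxlen - len(sequence)) + sequence

def split(samples, test_fraction=0.25, seed=0):
    """Stage 6: shuffle, then hold out a fraction of the samples for testing.
    In a real project the labels would be shuffled and split alongside."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up corpus for illustration.
corpus = [
    "hello my name is georgio",
    "the movie is great",
    "a terrible boring film",
    "great film",
]

tokenized = [tokenize(s) for s in corpus]
dictionary = build_dictionary(tokenized)
encoded = [encode(t, dictionary) for t in tokenized]
maxlen = max(len(seq) for seq in encoded)  # stage 7: longest sentence
padded = [pad(seq, maxlen) for seq in encoded]
train, test = split(padded)

print(dictionary)
print(padded)
```

With Keras installed, `Tokenizer` and `pad_sequences` cover stages 4, 5, 7 and 8, so a hand-written version like this is mainly useful for seeing what those utilities do.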
Panuwat Assawinjaipetch

This might help. http://keras.io/models/

Here is a sample usage: How to use Keras for XOR

You probably have to convert your corpus into an ndarray first and pass it to your model.predict.

From what I can see so far, the input to model.predict for the trained model should be a sequence of 100 word indices, where each index is that word's position in the dictionary. So if you want to use it with your own corpus, you have to convert your corpus according to that same dictionary and see whether the result is 0 or 1.
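As a rough sketch of that conversion, the snippet below turns one sentence into a NumPy array of shape (1, 100) that could be handed to model.predict. The dictionary contents and the helper name `to_model_input` are hypothetical; in practice the dictionary must be the same one used to encode the training data. The length of 100 is taken from the description above, and the call to `model.predict` is shown only as a comment because no trained model is defined here.

```python
import numpy as np

MAXLEN = 100  # input length mentioned above; adjust to match your model

def to_model_input(sentence, dictionary, maxlen=MAXLEN):
    """Encode one sentence as a (1, maxlen) array of word indices.
    Words missing from the dictionary are skipped; zeros pad the front."""
    indices = [dictionary[w] for w in sentence.lower().split() if w in dictionary]
    indices = indices[-maxlen:]  # keep at most the last maxlen indices
    row = np.zeros(maxlen, dtype="int32")
    if indices:
        row[-len(indices):] = indices
    return row.reshape(1, maxlen)

# Hypothetical dictionary; use the one built from your training corpus.
dictionary = {"great": 1, "film": 2, "boring": 3}

x = to_model_input("A great film", dictionary)
print(x.shape)

# With a trained Keras model (not defined in this sketch):
# prediction = model.predict(x)
```

If the model ends in a single sigmoid unit, the prediction is a value between 0 and 1, so a threshold such as 0.5 is commonly used to turn it into a 0 or 1 label.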
