Question
I am looking for a machine learning method that can detect certain kinds of questions.
For example:
User: Please tell me your name?
AI : (the AI detects that the user wants to know its name)
My name is [AI's name].
My dataset looks like this:
[label], [question]
1 , What's your name?
1 , Tell me your name.
...
The problem is that the input can also contain sentences that are not questions.
For example:
User: Hello, my name is [User name]
AI : (this is not a question)
(hand off to another process)
(->) Nice to meet you.
There are only 10 to 20 question categories, but the number of possible sentences that are not questions is far too large to enumerate.
Do you know how to solve this problem, or of any related task?
Answer 1:
You'll probably want to factor the problem into three parts.
- First, you'll want to map the arbitrary-length sequence of text onto a fixed-length vector. For this, you might look at Le and Mikolov's "Distributed Representations of Sentences and Documents", which introduces paragraph vectors.
- Once you have that, the problem reduces to a simple classification task. You have a set of vectors and the categories each maps to. A network with one hidden layer and a softmax output layer should probably be sufficient. This will give you a distribution over categories.
- Finally, you need to determine a confidence level for each prediction. There are two broad approaches that come to mind.
  - First, you can introduce a new category for "miscellaneous", and add any sentences that don't fit into one of the "real" categories to it. The weakness of this approach is that this class is actually the union of many unrelated classes and thus might be difficult to learn, since the points will be scattered all over the paragraph vector space rather than existing in a nice cluster.
  - Alternately, you might look at the values of the output neurons before you normalize using softmax, and output "unknown category" if the value for the most likely category doesn't exceed some threshold. You'll probably want to tune the threshold value to maximize accuracy over a validation set.
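As a rough illustration of the thresholding idea, here is a minimal sketch in plain Python. It assumes you already have the pre-softmax outputs (logits) of your classifier for each sentence. The function names, the toy scores, the candidate thresholds, and the use of None as the "unknown category" marker are all made up for the example and do not come from any library.

```python
from typing import List, Optional, Sequence

# Hypothetical marker for "not one of the known question categories".
UNKNOWN = None


def predict_with_reject(logits: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the largest pre-softmax score,
    or UNKNOWN if that score does not reach the threshold."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best if logits[best] >= threshold else UNKNOWN


def accuracy(all_logits: List[Sequence[float]],
             labels: List[Optional[int]],
             threshold: float) -> float:
    """Fraction of examples whose prediction (UNKNOWN included) matches the label."""
    hits = sum(predict_with_reject(scores, threshold) == label
               for scores, label in zip(all_logits, labels))
    return hits / len(labels)


def tune_threshold(all_logits: List[Sequence[float]],
                   labels: List[Optional[int]],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest validation accuracy.
    On ties, the first such candidate is returned."""
    return max(candidates, key=lambda t: accuracy(all_logits, labels, t))


# Toy validation set with invented logits for 3 categories.
# A label of None means "this sentence is not a known question".
val_logits = [
    [4.0, 0.5, -1.0],  # clearly category 0
    [0.2, 3.5, 0.1],   # clearly category 1
    [0.3, 0.4, 0.2],   # no category stands out
    [1.0, 0.9, 1.1],   # no category stands out
]
val_labels = [0, 1, None, None]

best_t = tune_threshold(val_logits, val_labels, candidates=[0.0, 1.0, 2.0, 3.0, 5.0])
print("chosen threshold:", best_t)
print("prediction for a new sentence:", predict_with_reject([0.1, 0.3, 0.2], best_t))
```

In practice the logits would come from the trained network, and the validation set should contain a realistic mix of known questions and other sentences, since the chosen threshold depends heavily on that mix.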
I can't guarantee that this will work, but it's the approach I would try first.
Source: https://stackoverflow.com/questions/53183467/nlpnatural-language-processing-how-to-detect-question-with-any-method