NLP (natural language processing): How to detect a question with any method?

Submitted by 醉酒当歌 on 2020-04-12 07:35:31

Question


I am looking for a machine learning method that detects which question a user is asking.

For example:

 User: Please tell me your name?
 AI  : (AI detects that the user wants to know its name)
       My name is [AI's name].

My dataset is as follows.

[label], [question]    
   1   , What's your name?    
   1   , Tell me your name.
   ...

The problem is that the input can also contain sentences that are not questions.

For example:

User: Hello, my name is [User name]
AI  : (this is not a question)
      (hand off to another process)
      (->) Nice to meet you.

There are only 10 to 20 question categories, but there are far too many sentences that are not questions.

Do you know how to solve this problem, or of any related task?


Answer 1:


You'll probably want to factor the problem into three parts.

  • First, you'll want to map the arbitrary-length sequence of text onto a fixed-length vector. For this, you might look at Le and Mikolov's "Distributed Representations of Sentences and Documents" (the paragraph vector model).
  • Once you have that, the problem reduces to a simple classification task. You have a set of vectors and the categories each maps to. A network with one hidden layer and a softmax output layer should probably be sufficient. This will give you a distribution over categories.
  • Finally, you need to determine a confidence level for each prediction. There are two broad approaches that come to mind.
    1. First, you can introduce a new category for "miscellaneous", and add any sentences that don't fit into one of the "real" categories to it. The weakness of this approach is that this class is actually the union of many unrelated classes and thus might be difficult to learn, since the points will be scattered all over the paragraph vector space rather than existing in a nice cluster.
    2. Alternatively, you might look at the values of the output neurons before you normalize using softmax (the logits), and output "unknown category" if the value for the most likely category doesn't exceed some threshold. You'll probably want to tune the threshold value to maximize accuracy over a validation set.
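
The second approach can be sketched roughly as follows. This is only an illustration, not a definitive implementation: it assumes the classifier already produces one pre-softmax value (logit) per category, and the names `UNKNOWN`, `predict_with_rejection`, `accuracy` and `tune_threshold`, the logit values, and the candidate thresholds are all made up for the example.

```python
from typing import Sequence

# Hypothetical label meaning "none of the known question categories".
UNKNOWN = -1


def predict_with_rejection(logits: Sequence[float], threshold: float) -> int:
    """Return the index of the largest logit, or UNKNOWN if it does not exceed the threshold."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best if logits[best] > threshold else UNKNOWN


def accuracy(batch_logits, labels, threshold: float) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(
        predict_with_rejection(logits, threshold) == label
        for logits, label in zip(batch_logits, labels)
    )
    return correct / len(labels)


def tune_threshold(batch_logits, labels, candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest accuracy on the validation set."""
    return max(candidates, key=lambda t: accuracy(batch_logits, labels, t))


# Made-up validation logits for 3 categories (assumption: real values would
# come from the trained network). UNKNOWN marks sentences that are not questions.
val_logits = [
    [4.1, 0.3, -1.2],  # clearly category 0
    [0.2, 3.5, 0.1],   # clearly category 1
    [-0.5, 0.4, 2.9],  # clearly category 2
    [0.9, 1.1, 0.8],   # no category stands out
    [0.3, 0.2, 0.6],   # no category stands out
]
val_labels = [0, 1, 2, UNKNOWN, UNKNOWN]

best_threshold = tune_threshold(val_logits, val_labels, candidates=[0.0, 1.0, 2.0, 3.0, 4.0])
print(best_threshold)
print(predict_with_rejection([0.5, 0.7, 0.1], best_threshold))
```

Note that the validation set has to contain some non-question sentences; otherwise the lowest threshold would always look best.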

I can't guarantee that this will work, but it's the approach I would try first.



Source: https://stackoverflow.com/questions/53183467/nlpnatural-language-processing-how-to-detect-question-with-any-method
