Is there an open source Java library/algorithm for finding if a particular piece of text is a question or not?
I am working on a question answering system that needs t
In a syntactic parse of a question, the correct structure will be in the form of:
(SBARQ (WH+ (W+) ...)
       (SQ ...*
           (V+) ...*)
       (?))
So, using any one of the available syntactic parsers, a tree whose top node is SBARQ, optionally with an embedded SQ, indicates that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.
For example:
(SBARQ
  (WHNP
    (WP What))
  (SQ
    (VBZ is)
    (NP
      (DT the)
      (NN question)))
  (. ?))
Of course, a lot of preceding clauses will cause errors in the parse (which can be worked around), as will really poorly written questions. For example, the title of this post, "How to find out if a sentence is a question?", will have an SBARQ but not an SQ.
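The SBARQ/SQ check described above can be sketched in plain Java. This is only an illustration, not a parser: it assumes you already have a Penn Treebank-style bracketed parse as a string from whichever parser you use, and the class and method names (ParseQuestionCheck, isWhQuestion, hasInvertedClause) are made up for this example. It also assumes the parser may wrap its output in a ROOT node, which is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects a bracketed parse string, it does not parse raw text.
final class ParseQuestionCheck {

    // Collects constituent labels in order of appearance (the token right after each "(").
    static List<String> labels(String parse) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < parse.length()) {
            if (parse.charAt(i) == '(') {
                int j = i + 1;
                while (j < parse.length()
                        && !Character.isWhitespace(parse.charAt(j))
                        && parse.charAt(j) != '('
                        && parse.charAt(j) != ')') {
                    j++;
                }
                if (j > i + 1) {
                    out.add(parse.substring(i + 1, j));
                }
                i = j;
            } else {
                i++;
            }
        }
        return out;
    }

    // True if the top constituent (ignoring an assumed ROOT wrapper) is SBARQ.
    static boolean isWhQuestion(String parse) {
        for (String label : labels(parse)) {
            if (!label.equals("ROOT")) {
                return label.equals("SBARQ");
            }
        }
        return false;
    }

    // True if an SQ (inverted clause) appears anywhere in the tree.
    static boolean hasInvertedClause(String parse) {
        return labels(parse).contains("SQ");
    }

    public static void main(String[] args) {
        String tree = "(SBARQ (WHNP (WP What)) (SQ (VBZ is) "
                + "(NP (DT the) (NN question))) (. ?))";
        System.out.println("SBARQ at top: " + isWhQuestion(tree));
        System.out.println("SQ embedded:  " + hasInvertedClause(tree));
    }
}
```

Because the SQ is optional, isWhQuestion alone is the looser test; requiring hasInvertedClause as well is stricter but would reject inputs like the title of this post.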