Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Is there an open source Java library/algorithm for finding if a particular piece of text is a question or not?
I am working on a question answering system that needs to analyze if the text input by user is a question.
I think the problem can probably be solved by using opensource NLP libraries but its obviously more complicated than simple part of speech tagging. So if someone can instead tell the algorithm for it by using an existing opensource NLP library, that would be good too.
Also let me know if you know a library/toolkit that uses data mining to solve this problem. Although it will be difficult to get sufficient data for training purposes, I will be able to use stack exchange data for training.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
135 views
Welcome To Ask or Share your Answers For Others

1 Answer

In a syntactic parse of a question, the correct structure will be in the form of:

(SBARQ (WH+ (W+) ...)
       (SQ ...*
           (V+) ...*)
       (?))

So, using anyone of the syntactic parsers available, a tree with an SBARQ node having an embedded SQ (optionally) will be an indicator the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how) and the SQ holds the inverted phrase.

i.e.:

(SBARQ 
  (WHNP 
    (WP What)) 
  (SQ 
    (VBZ is) 
    (NP 
      (DT the) 
      (NN question)))
  (. ?))

Of course, having a lot of preceeding clauses will cause errors in the parse (that can be worked around), as will really poorly-written questions. For example, the title of this post "How to find out if a sentence is a question?" will have an SBARQ, but not an SQ.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...