Independent clause boundary disambiguation, and independent clause segmentation – any tools to do this?

Posted by 岁酱吖の on 2019-12-04 05:29:32

To the best of my knowledge, there is no readily available tool that solves this exact problem. NLP systems usually do not try to identify the different types of sentences and clauses defined by English grammar. There is one paper published at EMNLP that provides an algorithm using the SBAR tag in parse trees to identify independent and dependent clauses in a sentence.

You should find section 3 of this paper useful. It discusses English syntax in some detail, but I don't think the entire paper is relevant to your question.

Note that they used the Berkeley parser (demo available here), but you can of course use any other constituency parsing tool (e.g. the Stanford parser, demo available here).

Jeff Kang

Chthonic Project gives some good information here:

Clause Extraction using Stanford parser

Part of the answer:

It is probably better if you primarily use the constituency-based parse tree, and not the dependencies.

The clauses are indicated by the SBAR tag, which is a clause introduced by a (possibly empty) subordinating conjunction.

All you need to do is the following:

  1. Identify the non-root clausal nodes in the parse tree
  2. Remove (but retain separately) the subtrees rooted at these clausal nodes from the main tree.
  3. In the main tree (after removal of subtrees in step 2), remove any hanging prepositions, subordinating conjunctions and adverbs.
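
A minimal sketch of steps 1 and 2, assuming you already have the parser's output as a Penn Treebank style bracketed string. It uses only the Python standard library; the function names (`parse_tree`, `leaves`, `split_clauses`) and the example parse are made up for illustration and are not taken from any parser's API. Step 3 (cleaning up hanging prepositions, conjunctions and adverbs) is left out, and an SBAR nested inside another SBAR stays inside the outer one.

```python
def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def split_clauses(node, clause_labels=("SBAR",), is_root=True):
    """Steps 1 and 2: cut out non-root clausal subtrees.

    Returns (pruned_tree, extracted_subtrees). Constituents left empty
    by the removal are dropped from the pruned tree.
    """
    if isinstance(node, str):
        return node, []
    label, children = node
    if not is_root and label in clause_labels:
        return None, [node]  # remove, but retain separately
    kept, extracted = [], []
    for child in children:
        pruned, found = split_clauses(child, clause_labels, is_root=False)
        extracted.extend(found)
        if pruned is not None:
            kept.append(pruned)
    if not kept and not is_root:
        return None, extracted
    return (label, kept), extracted


# Hand-written parse for illustration, not actual parser output.
example = ("(ROOT (S (NP (PRP I)) (VP (VBD left) "
           "(SBAR (IN because) (S (NP (PRP it)) (VP (VBD rained))))) (. .)))")

main, clauses = split_clauses(parse_tree(example))
print(" ".join(leaves(main)))        # I left .
for clause in clauses:
    print(" ".join(leaves(clause)))  # because it rained
```

If you want to treat other clause-level tags from the Penn Treebank list as clausal too, pass them through `clause_labels`, e.g. `clause_labels=("SBAR", "SBARQ")`.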

For a list of all clausal tags (and, in fact, all Penn Treebank tags), see this list: http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html

For an online parse-tree visualization, you may want to use the online Berkeley parser demo.

It helps a lot in forming a better intuition.

Here's the image generated for your example sentence:

I don't know of any tools that do clause segmentation, but rhetorical structure theory has a concept called the "elementary discourse unit" (EDU), which works much like a clause. EDUs are, however, sometimes slightly smaller than clauses.

See section 2.0 of this manual for more information about this concept:

https://www.isi.edu/~marcu/discourse/tagging-ref-manual.pdf

Some software available online can segment sentences into their elementary discourse units, for instance:

http://alt.qcri.org/tools/discourse-parser/

and

https://github.com/jiyfeng/DPLP

Via user YourWelcomeOrMine from the subreddit /r/LanguageTechnology/:

“I would check out Stanford's CoreNLP. I believe you can customize how a sentence is broken up.”

Via user Breakthrough from Superuser:

I've found different classifiers using the NPS Chat Corpus training set to be very effective for a similar application.
