How to parse text into sentences

星月不相逢 2020-12-06 07:31

I'm trying to break up a paragraph into sentences. Here is my code so far:

import java.util.*;

public class StringSplit {
 public static void main(String a         


        
7 Answers
  •  孤城傲影
    2020-12-06 08:04

    The problem you describe is an NLP (Natural Language Processing) problem. It is fine to write a crude rule engine, but it might not scale up to handle arbitrary English text.
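To make "crude rule engine" concrete, here is a minimal sketch of one, not the asker's code and not a recommended implementation. It assumes a sentence ends at a token that finishes with `.`, `!` or `?` and is followed by a capitalized token, unless the token is a single initial or appears in a small abbreviation list. The class name `CrudeSentenceSplitter` and the abbreviation list are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CrudeSentenceSplitter {
    // Hypothetical, hand-picked list; a realistic one would need to be far longer.
    private static final Set<String> ABBREVIATIONS = new HashSet<>(Arrays.asList(
            "Mr.", "Mrs.", "Ms.", "Dr.", "Prof.",
            "Jan.", "Feb.", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."));

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return sentences;
        }
        String[] tokens = text.trim().split("\\s+");
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(tokens[i]);
            boolean isLast = (i == tokens.length - 1);
            if (isLast || endsSentence(tokens[i], tokens[i + 1])) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        return sentences;
    }

    // Rule: terminal punctuation, not a known abbreviation, not a single
    // initial such as "W.", and the next token starts with an upper-case letter.
    private static boolean endsSentence(String token, String next) {
        if (!token.matches(".*[.!?][\"')\\]]*")) {
            return false;
        }
        if (ABBREVIATIONS.contains(token) || token.matches("[A-Z]\\.")) {
            return false;
        }
        return Character.isUpperCase(next.charAt(0));
    }

    public static void main(String[] args) {
        String text = "The outcome of the negotiations is vital, because the current tax "
                + "levels signed into law by President George W. Bush expire on Dec. 31. "
                + "Unless Congress acts, tax rates on virtually all Americans who pay "
                + "income taxes will rise on Jan. 1. That could affect economic growth "
                + "and even holiday sales.";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

The limits show up quickly: any abbreviation missing from the list (for example "Gov." in "Gov. Smith spoke.") is treated as a sentence end, which is why rules like these tend not to scale to general text.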

    For deeper insight and a Java library, see http://nlp.stanford.edu/software/lex-parser.shtml and http://nlp.stanford.edu:8080/parser/index.jsp, as well as the similar question for Ruby, "How do you parse a paragraph of text into sentences? (perferrably in Ruby)".

    For example, the text:

    The outcome of the negotiations is vital, because the current tax levels signed into law by President George W. Bush expire on Dec. 31. Unless Congress acts, tax rates on virtually all Americans who pay income taxes will rise on Jan. 1. That could affect economic growth and even holiday sales.

    after tagging becomes:

    The/DT outcome/NN of/IN the/DT negotiations/NNS is/VBZ vital/JJ ,/, because/IN the/DT current/JJ tax/NN levels/NNS signed/VBN into/IN law/NN by/IN President/NNP George/NNP W./NNP Bush/NNP expire/VBP on/RP Dec./NNP 31/CD ./. Unless/IN Congress/NNP acts/VBZ ,/, tax/NN rates/NNS on/IN virtually/RB all/RB Americans/NNPS who/WP pay/VBP income/NN taxes/NNS will/MD rise/VB on/IN Jan./NNP 1/CD ./. That/DT could/MD affect/VB economic/JJ growth/NN and/CC even/RB holiday/NN sales/NNS ./.

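    Note how the tagger distinguishes the sentence-ending full stop, which becomes its own token (./.), from the period in "Dec. 31", which stays attached to the abbreviation (Dec./NNP).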
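Once the text is tagged like this, recovering sentences is mostly a matter of grouping tokens up to each sentence-final token. The sketch below does not call the Stanford parser. It only post-processes a string that is assumed to already be in the `word/TAG` format shown above, with sentence-final punctuation tagged `.` and commas tagged `,`. The class name `TaggedSentenceGrouper` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedSentenceGrouper {
    // Groups whitespace-separated word/TAG tokens into sentences,
    // closing a sentence whenever a token carries the tag ".".
    public static List<String> sentences(String tagged) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // not in word/TAG form, so ignore it
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            boolean punctuation = tag.equals(".") || tag.equals(",") || tag.equals(":");
            // Re-attach punctuation to the preceding word; otherwise add a space.
            if (current.length() > 0 && !punctuation) {
                current.append(' ');
            }
            current.append(word);
            if (tag.equals(".")) {
                result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString()); // trailing text with no final "./."
        }
        return result;
    }

    public static void main(String[] args) {
        String tagged = "Unless/IN Congress/NNP acts/VBZ ,/, tax/NN rates/NNS will/MD "
                + "rise/VB on/IN Jan./NNP 1/CD ./. That/DT could/MD affect/VB "
                + "economic/JJ growth/NN ./.";
        for (String sentence : sentences(tagged)) {
            System.out.println(sentence);
        }
    }
}
```

Because "Jan./NNP" is tagged as a proper noun rather than as punctuation, its period never triggers a split. The hard decision has already been made by the tagger, and the grouping code stays trivial.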
