Splitting strings with regular expressions by punctuation, whitespace, etc. in Java

感动是毒 2020-12-01 12:28

I have this text file that I read into a Java application and then count the words in it line by line. Right now I am splitting the lines into words by a

St         


        
4 answers
  •  感情败类
    2020-12-01 12:48

    Try:

    line.split("[\\.,\\s!;?:\"]+");

    or, to treat the apostrophe as a delimiter too:

    line.split("[\\.,\\s!;?:\"']+");
    

    This is a character class: it matches any one of the characters . , ! ; ? : " ' or a whitespace character (\s covers spaces, tabs and newlines). Note that it does not include / or \. The trailing + makes a run of several such characters count as a single delimiter.
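As a rough illustration of the pattern above, here is a minimal sketch of a per-line word count. The class name `PunctuationSplit` and the helper `countWords` are hypothetical, not from the question. One detail worth checking in your own data: `String.split` drops trailing empty strings, but a line that starts with a delimiter (for example an opening quote) produces an empty first token, so the sketch skips empty tokens when counting.

```java
import java.util.Arrays;

public class PunctuationSplit {
    // Character class from the answer: period, comma, any whitespace,
    // ! ; ? : double quote and apostrophe. The + merges adjacent delimiters.
    public static final String DELIMITERS = "[\\.,\\s!;?:\"']+";

    // Hypothetical helper: counts the non-empty tokens in one line.
    public static int countWords(String line) {
        int count = 0;
        for (String token : line.split(DELIMITERS)) {
            // A leading delimiter leaves an empty first token; skip it.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Hello, world! How are you?";
        System.out.println(Arrays.toString(line.split(DELIMITERS)));
        System.out.println(countWords(line));
        System.out.println(countWords("\"Quoted\" start"));
    }
}
```

If the apostrophe should stay inside words such as don't, the first pattern from the answer (without ') could be swapped in for `DELIMITERS`.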

    That should be accurate enough for most text. A more precise regex would need more information about the kind of text you are parsing, because ' can be a word delimiter (as a quotation mark) as well as part of a word (as in don't). Most punctuation that delimits words sits next to whitespace, so splitting on [\\s]+ alone is a close approximation too, but it gives the wrong count on short quotations such as: She said:"no".
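The difference between the two approaches can be seen on the quotation example itself. The following is only a sketch; the class name `SplitComparison` and the helper `countWith` are made up for illustration.

```java
public class SplitComparison {
    // Pattern from the answer that splits on punctuation and whitespace.
    public static final String PUNCTUATION_AND_SPACE = "[\\.,\\s!;?:\"']+";
    // Whitespace-only approximation mentioned in the answer.
    public static final String WHITESPACE_ONLY = "[\\s]+";

    // Hypothetical helper: counts the non-empty tokens produced by a regex.
    public static int countWith(String regex, String line) {
        int count = 0;
        for (String token : line.split(regex)) {
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "She said:\"no\".";
        // Tokens: She, said, no
        System.out.println(countWith(PUNCTUATION_AND_SPACE, line));
        // Tokens: She, said:"no".
        System.out.println(countWith(WHITESPACE_ONLY, line));
    }
}
```

With the whitespace-only pattern, said:"no". stays together as one token, which is why the count comes out one short on this line.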
