I have this text file that I read into a Java application and then count the words in it line by line. Right now I am splitting the lines into words by a
St
Try:
line.split("[\\.,\\s!;?:\"]+");
or, to treat the apostrophe as a delimiter too:
line.split("[\\.,\\s!;?:\"']+");
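This is a character class: it matches any one of the characters . , ! ; ? : " (and ' in the second variant), plus any whitespace. Note that \s covers tabs and line breaks as well as the space, and that neither / nor \ is a delimiter here. The + makes a run of several delimiter characters in a row count as a single separator.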
That should be accurate enough for most text.
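One thing to watch when you turn the split into a count: String.split drops trailing empty strings but keeps a leading one, so a line that starts with a delimiter (or an empty line) would be over-counted by one if you just take the array length. Here is a rough sketch of a per-line counter that skips empty tokens. The class and method names (WordCount, countWords) are made up for the example, not something from your code.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are assumptions.
class WordCount {
    // Same delimiter class as above, compiled once so it can be reused per line.
    private static final Pattern DELIMITERS = Pattern.compile("[\\.,\\s!;?:\"]+");

    // Counts the non-empty tokens of a single line.
    static int countWords(String line) {
        int count = 0;
        for (String token : DELIMITERS.split(line)) {
            // split() yields an empty first token when the line starts
            // with a delimiter, and [""] for an empty line; skip those.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  Hello, world!"));
        System.out.println(countWords(""));
    }
}
```

Call countWords on each line as you read it and add up the results.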
A more precise regex would need more information about the kind of text you need to parse, because ' can be a word delimiter as well. Most punctuation that delimits words sits next to whitespace anyway, so splitting on [\\s]+ would be a close approximation too (but it gives the wrong count on short quotations like: She said:"no".).
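To see how the choice of delimiters changes the result, here is a small comparison sketch. DelimiterComparison and count are hypothetical names for the example, and the sample sentence "I don't know" is my own, added to show the apostrophe case.

```java
// Hypothetical comparison helper; names and the second sample are assumptions.
class DelimiterComparison {
    // Counts the non-empty tokens produced by splitting line on regex.
    static int count(String line, String regex) {
        int n = 0;
        for (String token : line.split(regex)) {
            if (!token.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String quote = "She said:\"no\".";
        // Whitespace only: said:"no". stays glued together as one token.
        System.out.println(count(quote, "\\s+"));              // 2
        // Punctuation class: She / said / no.
        System.out.println(count(quote, "[\\.,\\s!;?:\"]+"));   // 3

        String contraction = "I don't know";
        // Without the apostrophe in the class, don't is one word.
        System.out.println(count(contraction, "[\\.,\\s!;?:\"]+"));   // 3
        // With it, don't is split into don and t.
        System.out.println(count(contraction, "[\\.,\\s!;?:\"']+"));  // 4
    }
}
```

So whether to include ' depends on whether your text uses it mostly for quoting or mostly inside contractions.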