I need my Java program to take a string like:
"This is a sample sentence."
and turn it into a string array like:
{"this"
string.replaceAll() does not work correctly with a locale other than the default one, at least in jdk7u10.
This example builds a word dictionary from a text file encoded in the Windows Cyrillic charset CP1251:
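import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Class name is illustrative; the original snippet showed only main().
public class WordDictionary {
    public static void main(String[] args) {
        String fileName = "Tolstoy_VoinaMir.txt";
        try {
            // Read the whole file, decoding it as Windows Cyrillic (CP1251).
            List<String> lines = Files.readAllLines(Paths.get(fileName),
                    Charset.forName("CP1251"));
            // TreeSet keeps the words unique and sorted.
            Set<String> words = new TreeSet<>();
            for (String s : lines) {
                for (String w : s.split("\\s+")) {
                    // \p{Punct} covers ASCII punctuation only.
                    w = w.replaceAll("\\p{Punct}", "");
                    if (!w.isEmpty()) {   // skip tokens that were only punctuation
                        words.add(w);
                    }
                }
            }
            for (String w : words) {
                System.out.println(w);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}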
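For the sentence from the question itself, a minimal sketch could look like the one below. It is not part of the answer above: the class name SentenceWords and the helper toWords are made up for illustration. It also swaps \p{Punct} for \p{P}, on the assumption that you want Unicode punctuation (for example the guillemets common in Russian text) removed as well. By default \p{Punct} matches ASCII punctuation only, while \p{P} matches every Unicode punctuation category but leaves symbols such as $ or + in place.

```java
import java.util.Arrays;
import java.util.Locale;

public class SentenceWords {

    // Hypothetical helper: lower-cases the input, strips Unicode
    // punctuation, and splits the rest on runs of whitespace.
    static String[] toWords(String sentence) {
        String cleaned = sentence
                .toLowerCase(Locale.ROOT)      // locale-independent lower-casing
                .replaceAll("\\p{P}", "")      // \p{P} = any Unicode punctuation
                .trim();
        if (cleaned.isEmpty()) {
            return new String[0];              // avoid returning {""}
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [this, is, a, sample, sentence]
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));

        // \u00AB and \u00BB are guillemets; \p{Punct} alone would keep them.
        // prints [war, and, peace, vol, 1]
        System.out.println(Arrays.toString(toWords("\u00ABWar and Peace\u00BB, vol. 1.")));
    }
}
```

Locale.ROOT is used so the lower-casing does not change with the machine's default locale; pass a specific Locale instead if you need language-specific case rules.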