As part of a project I\'m working on, I\'d like to clean up a file I generate of duplicate line entries. These duplicates often won\'t occur near each other, however. I came
A similar approach
public void stripDuplicatesFromFile(String filename) {
IOUtils.writeLines(
new LinkedHashSet<String>(IOUtils.readLines(new FileInputStream(filename)),
"\n", new FileOutputStream(filename + ".uniq"));
}
void deleteDuplicates(File filename) throws IOException{
@SuppressWarnings("resource")
BufferedReader reader = new BufferedReader(new FileReader(filename));
Set<String> lines = new LinkedHashSet<String>();
String line;
String delims = " ";
System.out.println("Read the duplicate contents now and writing to file");
while((line=reader.readLine())!=null){
line = line.trim();
StringTokenizer str = new StringTokenizer(line, delims);
while (str.hasMoreElements()) {
line = (String) str.nextElement();
lines.add(line);
BufferedWriter writer = new BufferedWriter(new FileWriter(filename));
for(String unique: lines){
writer.write(unique+" ");
}
writer.close();
}
}
System.out.println(lines);
System.out.println("Duplicate removal successful");
}