Question
I'm looking for a small code snippet that detects the line(s) in a file that contain unacceptable entries and alerts the user, but I could not find one.
For example, I have the following file:
myFile.txt:
Field1,Field2,Field3,Field4,Field5,Field6,Field7
a,b,a,d,e,f,g
h,i,h,i,h,ff,f27
f31,f32,f33,f34,f35,f36,f37
f41,f42,f43,f44,f45,f46,f47
f51,f52,f53,f54,f55,f56,f57
f61,f62,a,b,a,f66,f67
f71,f72,f73,f74,f75,f76,f77
f81,f82,f83,f84,f85,f86,f87
f91,f92,f93,f94,f95,f96,f97
f101,f102,f103,f104,f105,f106,f107
f111,f112,f113,f114,f115,f116,f117
f121,f122,f123,f124,f125,f126,f127
f131,f132,f133,f134,f135,f136,f137
f141,f142,f143,f144,f145,f146,f147
f151,f152,f153,f154,f155,f156,f157
f161,a,b,a,f165,f166,f167
i,h,ff,f174,f175,f176,f177
f181,f182,f183,f184,f185,f186,f187
f191,f192,f193,f194,f195,f196,f197
f201,f202,f203,f204,f205,f206,f207
f211,f212,f213,f214,f215,f216,f217
f221,f222,f223,f224,f225,f226,f227
f231,f232,f233,f234,f235,f236,f237
f241,f242,f243,f244,f245,f246,f247
f251,f252,f253,f254,f255,f256,f257
f261,f262,f263,f264,f265,f266,f267
f271,f272,f273,f274,f275,f276,f277
f281,f282,f283,i,h,ff,f287
fn1,fn2,fn3,fn4,fn5,fn6,fn7
f301,f302,f303,f304,f305,f306,f307
All values in the text file are treated as strings.
Unacceptable entries
An unacceptable entry in a line is a field f(i,j) whose tuple [f(i,j-1), f(i,j), f(i,j+1)] already occurs earlier or later in the text file. In other words, for a targeted field X with left neighbor XL and right neighbor XR, check whether the tuple [XL, X, XR] also occurs elsewhere in the file. If it does, the output should say: field X on line N is problematic because the tuple [XL, X, XR] is already defined on a previous line.
The output should display:
- all the lines that cause a conflict, that is:
 + the first line containing the tuple (this first occurrence is accepted when the file is read), and
 + the problematic lines that follow it (these would be ignored);
- the line number of the accepted first occurrence of the tuple;
- the line numbers, if any, of the rejected occurrences that would be ignored;
- the tuples [XL, X, XR] that cause the problem.
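To make the definition concrete: in a row with n fields, every field except the first and the last has both neighbors, so the row yields n - 2 tuples [XL, X, XR]. A minimal sketch of forming them, assuming plain comma-separated rows with no quoting; the class name Windows and method tuples are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Windows {
    // Returns every tuple [XL, X, XR] of one comma-separated row, from left to right.
    // A row with n fields yields n - 2 tuples: the first and last fields lack a neighbor.
    static List<List<String>> tuples(String row) {
        String[] f = row.split(",");
        List<List<String>> out = new ArrayList<>();
        for (int j = 1; j + 1 < f.length; j++) {
            out.add(Arrays.asList(f[j - 1], f[j], f[j + 1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Row 2 of myFile.txt: [h, i, h] appears twice on the same line
        System.out.println(tuples("h,i,h,i,h,ff,f27"));
        // prints [[h, i, h], [i, h, i], [h, i, h], [i, h, ff], [h, ff, f27]]
    }
}
```

Because List implements value-based equals and hashCode, these tuples can be used directly as keys in a HashMap to find repeats.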
Example:
Field1;Field2;Field3;Field4;Field5;Field6;Field7<--------Headers
a;b;a;d;e;f;g
h;i;h;i;h;ff;f27
f31;f32;f33;f34;f35;f36;f37
f41;f42;f43;f44;f45;f46;f47
f51;f52;f53;f54;f55;f56;f57
f61;f62;a;b;a;f66;f67
............................
f161;a;b;a;f165;f166;f167
i;h;ff;f174;f175;f176;f177
...........................
f281;f282;f283;i;h;ff;f287
fn1;fn2;fn3;fn4;fn5;fn6;fn7
It should display:
[a;b;a], accepted on line 1 but rejected on lines: 6,16
Line accepted is: a;b;a;d;e;f;g
Line(s) rejected are: f61;f62;a;b;a;f66;f67
f161;a;b;a;f165;f166;f167
[h;i;h], Not accepted at all. rejected on lines: 2
Line accepted is: empty
Lines rejected: h;i;h;i;h;ff;f27
[i;h;ff], Not accepted at all. rejected on lines: 2,17,28
Line accepted is: empty
Lines rejected:
h;i;h;i;h;ff;f27
i;h;ff;f174;f175;f176;f177
f281;f282;f283;i;h;ff;f287
N.B.: "Not accepted at all" is displayed when the list of accepted lines is empty, i.e. when the problem occurs on the same line (for example, [h;i;h] occurs twice on line 2).
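The reporting rules above can be sketched as follows. This is a hedged sketch, not the method used in the answers below. Assumptions: data lines are numbered from 1 with the header excluded (as in the expected output); a line is rejected outright when any tuple occurs twice on it; and a repeated tuple whose first line is such a rejected line gets "Not accepted at all" (my reading of why [i;h;ff] is reported that way). It prints only the summary line per tuple, not the accepted/rejected line text. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TupleReport {
    // One summary line per tuple [XL;X;XR] that occurs more than once.
    // rows excludes the header; line numbers are 1-based data-line numbers.
    static List<String> report(List<String> rows) {
        // Tuple -> line numbers in reading order; a number repeats if the tuple occurs twice on that line
        Map<List<String>, List<Integer>> occ = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] f = rows.get(i).split(",");
            for (int j = 1; j + 1 < f.length; j++) {
                occ.computeIfAbsent(Arrays.asList(f[j - 1], f[j], f[j + 1]), k -> new ArrayList<>())
                   .add(i + 1);
            }
        }
        // Assumed rule: a line on which some tuple repeats is rejected outright
        Set<Integer> rejectedLines = new HashSet<>();
        for (List<Integer> lines : occ.values()) {
            for (Integer line : lines) {
                if (Collections.frequency(lines, line) > 1) {
                    rejectedLines.add(line);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<List<String>, List<Integer>> e : occ.entrySet()) {
            if (e.getValue().size() < 2) {
                continue; // tuple occurs only once: nothing to report
            }
            List<Integer> distinct = new ArrayList<>(new LinkedHashSet<>(e.getValue()));
            String tuple = "[" + String.join(";", e.getKey()) + "]";
            int first = distinct.get(0);
            if (rejectedLines.contains(first)) {
                out.add(tuple + ", Not accepted at all. rejected on lines: " + join(distinct));
            } else {
                out.add(tuple + ", accepted on line " + first + " but rejected on lines: "
                        + join(distinct.subList(1, distinct.size())));
            }
        }
        return out;
    }

    private static String join(List<Integer> lines) {
        return lines.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Usage: java TupleReport myFile.txt
    public static void main(String[] args) throws IOException {
        List<String> all = Files.readAllLines(Paths.get(args[0]));
        report(all.subList(1, all.size())).forEach(System.out::println); // skip the header line
    }
}
```

Run against myFile.txt as shown above, this should print the three summary lines [a;b;a] (accepted on line 1, rejected on 6,16), [h;i;h] (rejected on 2) and [i;h;ff] (rejected on 2,17,28). A LinkedHashMap is used so tuples are reported in the order they are first read.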
Any advice or help is welcome.
Update
I have posted an answer.
Thank you very much.
Answer 1:
This is sort of the point of objects. You should create an object model that reflects the things you are working with.
So first you would create a class, something like this:
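import java.util.Objects;

public class SeptTuple {
    public final String field1, field2, field3, field4, field5, field6, field7;

    public SeptTuple(String f1, String f2, String f3, String f4, String f5, String f6, String f7) {
        field1 = f1;
        field2 = f2;
        field3 = f3;
        field4 = f4;
        field5 = f5;
        field6 = f6;
        field7 = f7;
    }

    @Override
    public boolean equals(Object o) {
        if(!(o instanceof SeptTuple))
            return false;
        SeptTuple s = (SeptTuple)o;
        return Objects.equals(field1, s.field1) && Objects.equals(field2, s.field2)
                && Objects.equals(field3, s.field3) && Objects.equals(field4, s.field4)
                && Objects.equals(field5, s.field5) && Objects.equals(field6, s.field6)
                && Objects.equals(field7, s.field7);
    }

    // Must be spelled hashCode (capital C) to actually override Object.hashCode
    @Override
    public int hashCode() {
        // If 2 objects are equal, they must return the same hashcode
        return Objects.hash(field1, field2, field3, field4, field5, field6, field7);
    }
}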
And then once you have that, finding dupes is as easy as:
Map<SeptTuple, SeptTuple> map = new HashMap<>();
// ... for each row read from the file, build a SeptTuple newSeptTuple from its fields ...
// If already set, map will return the old value on put
SeptTuple temp = map.put(newSeptTuple, newSeptTuple);
if(temp != null) {
    // handle clash
}
If you need to find equal parts in subsets of each row, then break this solution down into as many objects as you need to accurately represent each element of the tuple. (You will need to make 3 classes to represent each part of your tuple.)
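Applied to the question, one small object per [XL, X, XR] tuple is enough. A minimal sketch, assuming Java 16+ so that a record supplies equals and hashCode automatically; the names TripleClash, Triple and clashes are hypothetical. It swaps put for Map.putIfAbsent, which also returns the value already stored for an equal key but keeps the first one, so the map always remembers the line where the tuple first appeared:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TripleClash {
    // Like SeptTuple, but for three fields; a record derives equals/hashCode from its components
    record Triple(String left, String mid, String right) {}

    // Reports each tuple occurrence already seen on the same or an earlier (1-based) row
    static List<String> clashes(List<String> rows) {
        Map<Triple, Integer> firstLineOf = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] f = rows.get(i).split(",");
            for (int j = 1; j + 1 < f.length; j++) {
                Triple t = new Triple(f[j - 1], f[j], f[j + 1]);
                // putIfAbsent keeps the first line and returns it if the tuple was seen before
                Integer earlier = firstLineOf.putIfAbsent(t, i + 1);
                if (earlier != null) {
                    out.add("[" + t.left() + ";" + t.mid() + ";" + t.right() + "] on line " + (i + 1)
                            + " already defined on line " + earlier);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        clashes(List.of("a,b,a,d,e,f,g", "f61,f62,a,b,a,f66,f67")).forEach(System.out::println);
        // prints [a;b;a] on line 2 already defined on line 1
    }
}
```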
Answer 2:
Here is a solution that could be used if needed.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
public class TextFileProgram {
private static <T> Set<T> findDuplicates(Collection<T> list) {
Set<T> duplicates = new LinkedHashSet<T>();
Set<T> uniques = new HashSet<T>();
for(T t : list) {
if(!uniques.add(t)) {
duplicates.add(t);
}
}
return duplicates;
}
private static boolean hasDuplicates(HashMap<Integer, List<String>> datamap) {
// Duplicates exist if collapsing the values into a set loses entries
Set<List<String>> valueset = new HashSet<List<String>>(datamap.values());
return datamap.size() != valueset.size();
}
static HashMap<Integer, List<String>> findTriplets(ArrayList<Line> data) {
HashMap<Integer, List<String>> hm = new HashMap<Integer, List<String>>();
int j = 0;
for(int i = 0; i < data.size(); i++) {
String line = data.get(i).toString();
String[] arr = line.split(",");
final int L = arr.length;
final int K = 3;
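List<String> list = new ArrayList<String>(Arrays.asList(arr));
// One key per window [arr[z], arr[z+1], arr[z+2]]: a 7-field line yields L - 2 = 5 windows,
// which is why the code below maps a key back to its line with key / 5 (index into data)
// and key / 5 + 2 (1-based file line, counting the header as line 1).
for(int z = 0; z < L - 2; z++) {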
hm.put(j, list.subList(z, z + K));
j++;
}
}
return hm;
}
public static <T, E> Set<T> getKeysByValue(Map<T, E> map, E value) {
Set<T> keys = new HashSet<T>();
for(Entry<T, E> entry : map.entrySet()) {
if(Objects.equals(value, entry.getValue())) {
keys.add(entry.getKey());
}
}
return keys;
}
public static boolean getDataFromFile() {
ArrayList<Line> data = new ArrayList<Line>();
FileInputStream fis = null;
BufferedReader br = null;
boolean done = false;
String result1 = "";
String line = "";
String result2 = "";
try {
File mFile = new File("C:\\siebog-master\\maven-demo\\" + "TestTuples.txt");
fis = new FileInputStream(mFile);
br = new BufferedReader(new InputStreamReader(fis));
int iteration = 0;
while((line = br.readLine()) != null) {
if(iteration < 1) {
iteration++;
result1 = result1 + line + System.getProperty("line.separator");
continue;
}
String[] pair = line.split(",");
data.add(new Line(pair[0], pair[1], pair[2], pair[3], pair[4], pair[5], pair[6]));
}
HashMap<Integer, List<String>> hm = findTriplets(data);
boolean isContainingDuplicates = hasDuplicates(hm);
if(isContainingDuplicates) {
Collection<List<String>> valuesList = hm.values();
Set<List<String>> set = findDuplicates(valuesList);
Set<Integer> setOfAlreadyRejected = new HashSet<Integer>();
for(List<String> li : set) {
Set<String> setToTestForDuplicate = new HashSet<String>(li);
Set<Integer> myKeySet = getKeysByValue(hm, li);
int index = 0;
boolean allreadyDone = false;
ArrayList<Integer> sortedList = new ArrayList<Integer>(myKeySet);
Collections.sort(sortedList);
for(Integer key : myKeySet) {
if(index == 0) {
String value = hm.get(key).toString();
System.out.print(value);
}
index++;
if(setToTestForDuplicate.size() < li.size() && !allreadyDone) {
System.out.print(", Not accepted at all. rejected on lines: ");
System.out.println((key / 5 + 2) + " ");// number of rejected
setOfAlreadyRejected.add(key / 5 + 2);// added to set of rejected
System.out.println("Line accepted is: empty");
System.out.print("Line rejected :");
System.out.println(" " + data.get(key / 5));
allreadyDone = true;
break;
}
else if(set.size() >= li.size() || allreadyDone) {
int z = 0;
for(Integer s : sortedList) {
boolean blnAlreadyExistsOnSetOfRejected = false;
if(z == 0) {
blnAlreadyExistsOnSetOfRejected = setOfAlreadyRejected.contains((Integer.valueOf(s) / 5 + 2));
if(blnAlreadyExistsOnSetOfRejected) {
System.out.print(" , Not accepted on line ");
System.out.print(" " + (Integer.valueOf(s) / 5 + 2)
+ " because already rejected on the same line ");
System.out.println(" " + (Integer.valueOf(s) / 5 + 2) + " ");
System.out.print("Line rejected : ");
System.out.println(" " + data.get(s / 5));
}
else {
System.out.print(" , accepted on line ");
System.out.print(" " + (Integer.valueOf(s) / 5 + 2) + " rejected on lines: ");
}
}
else {
System.out.println(" " + (Integer.valueOf(s) / 5 + 2) + " ");
System.out.print("Line rejected : ");
System.out.println(" " + data.get(s / 5));
}
z++;
}
System.out.println();
break;
}
}
System.out.println();
}
}
}
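catch(IOException ex) {
// FileNotFoundException is a subclass of IOException, so one handler covers both;
// report the problem instead of silently swallowing it
System.err.println("Could not read the file: " + ex.getMessage());
}
finally {
try {
// br wraps fis, so closing br also closes fis; both are still null if the file could not be opened
if(br != null) {
br.close();
done = true;
}
else if(fis != null) {
fis.close();
}
}
catch(IOException ex) {
System.err.println("Could not close the file: " + ex.getMessage());
}
}
return done;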
}
public static void main(String[] args) {
getDataFromFile();
}
}
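// Line.java: a public class has to live in its own source file, separate from TextFileProgram.java
public class Line implements Comparable<Line> {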
private String fieldOne;
private String fieldTwo;
private String fieldThree;
private String fieldFour;
private String fieldFive;
private String fieldSix;
private String fieldSeven;
public Line(String fieldOne,
String fieldTwo,
String fieldThree,
String fieldFour,
String fieldFive,
String fieldSix,
String fieldSeven) {
super();
this.fieldOne = fieldOne;
this.fieldThree = fieldThree;
this.fieldFive = fieldFive;
this.fieldSix = fieldSix;
this.fieldFour = fieldFour;
this.fieldTwo = fieldTwo;
this.fieldSeven = fieldSeven;
}
public Line(String fieldOne) {
super();
this.fieldOne = fieldOne;
this.fieldThree = "";
this.fieldFive = "";
this.fieldSix = "";
this.fieldFour = "";
this.fieldTwo = "";
this.fieldSeven = "";
}
public Line(String fieldOne, String fieldTwo) {
super();
this.fieldOne = fieldOne;
this.fieldThree = "";
this.fieldFive = "";
this.fieldSix = "";
this.fieldFour = "";
this.fieldTwo = fieldTwo;
this.fieldSeven = "";
}
public Line(String fieldOne, String fieldTwo, String fieldThree) {
super();
this.fieldOne = fieldOne;
this.fieldThree = fieldThree;
this.fieldFive = "";
this.fieldSix = "";
this.fieldFour = "";
this.fieldTwo = fieldTwo;
this.fieldSeven = "";
}
public Line(String fieldOne, String fieldTwo, String fieldThree, String fieldFour) {
super();
this.fieldOne = fieldOne;
this.fieldThree = fieldThree;
this.fieldFive = "";
this.fieldSix = "";
this.fieldFour = fieldFour;
this.fieldTwo = fieldTwo;
this.fieldSeven = "";
}
public Line(String fieldOne, String fieldTwo, String fieldThree, String fieldFour, String fieldFive) {
super();
this.fieldOne = fieldOne;
this.fieldThree = fieldThree;
this.fieldFive = fieldFive;
this.fieldSix = "";
this.fieldFour = fieldFour;
this.fieldTwo = fieldTwo;
this.fieldSeven = "";
}
public Line(String fieldOne,
String fieldTwo,
String fieldThree,
String fieldFour,
String fieldFive,
String fieldSix) {
super();
this.fieldOne = fieldOne;
this.fieldThree = fieldThree;
this.fieldFive = fieldFive;
this.fieldSix = fieldSix;
this.fieldFour = fieldFour;
this.fieldTwo = fieldTwo;
this.fieldSeven = "";
}
public String getFieldOne() {
return fieldOne;
}
public void setFieldOne(String fieldOne) {
this.fieldOne = fieldOne;
}
public String getFieldTwo() {
return fieldTwo;
}
public void setFieldTwo(String fieldTwo) {
this.fieldTwo = fieldTwo; // "this." is needed: the parameter shadows the field
}
public String getFieldThree() {
return fieldThree;
}
public void setFieldThree(String fieldThree) {
this.fieldThree = fieldThree;
}
public String getFieldFour() {
return fieldFour;
}
public void setFieldFour(String fieldFour) {
this.fieldFour = fieldFour;
}
public String getFieldFive() {
return fieldFive;
}
public void setFieldFive(String fieldFive) {
this.fieldFive = fieldFive;
}
public String getFieldSix() {
return fieldSix;
}
public void setFieldSix(String fieldSix) {
this.fieldSix = fieldSix;
}
public String getFieldSeven() {
return fieldSeven;
}
public void setFieldSeven(String fieldSeven) {
this.fieldSeven = fieldSeven;
}
// Easy to print and show the row data
@Override
public String toString() {
if(fieldTwo == null || fieldTwo.isEmpty())
return fieldOne;
else if(fieldThree == null || fieldThree.isEmpty())
return fieldOne + "," + fieldTwo;
else if(fieldFour == null || fieldFour.isEmpty())
return fieldOne + "," + fieldTwo + "," + fieldThree;
else if(fieldFive == null || fieldFive.isEmpty())
return fieldOne + "," + fieldTwo + "," + fieldThree + "," + fieldFour;
else if(fieldSix == null || fieldSix.isEmpty())
return fieldOne + "," + fieldTwo + "," + fieldThree + "," + fieldFour + "," + fieldFive;
else if(fieldSeven == null || fieldSeven.isEmpty())
return fieldOne + "," + fieldTwo + "," + fieldThree + "," + fieldFour + "," + fieldFive + "," + fieldSix;
else
return fieldOne + "," + fieldTwo + "," + fieldThree + "," + fieldFour + "," + fieldFive + "," + fieldSix + ","
+ fieldSeven;
}
// sort based on column "fieldOne"
@Override
public int compareTo(Line o) {
return this.fieldOne.compareTo(o.fieldOne);
}
}
TestTuples.txt used for the test:
Field1,Field2,Field3,Field4,Field5,Field6,Field7
a,b,a,d,e,f,g
h,i,h,i,h,ff,f27
f31,f32,f33,f34,f35,f36,f37
f41,f42,f43,f44,f45,f46,f47
f51,f52,f53,f54,f55,f56,f57
f61,f62,a,b,a,f66,f67
f71,f72,f73,f74,f75,f76,f77
f81,f82,f83,f84,f85,f86,f87
f91,f92,f93,f94,f95,f96,f97
f101,f102,f103,f104,f105,f106,f107
f111,f112,f113,f114,f115,f116,f117
f121,f122,f123,f124,f125,f126,f127
f131,f132,f133,f134,f135,f136,f137
f141,f142,f143,f144,f145,f146,f147
f151,f152,f153,f154,f155,f156,f157
f161,a,b,a,f165,f166,f167
i,h,ff,f174,f175,f176,f177
f181,f182,f183,f184,f185,f186,f187
f191,f192,f193,f194,f195,f196,f197
f201,f202,f203,f204,f205,f206,f207
f211,f212,f213,f214,f215,f216,f217
f221,f222,f223,f224,f225,f226,f227
f231,f232,f233,f234,f235,f236,f237
f241,f242,f243,f244,f245,f246,f247
f251,f252,f253,f254,f255,f256,f257
f261,f262,f263,f264,f265,f266,f267
f271,f272,f273,f274,f275,f276,f277
f281,f282,f283,i,h,ff,f287
fn1,fn2,fn3,fn4,fn5,fn6,fn7
f301,f302,f303,f304,f305,f306,f307
OUTPUT
[h, i, h], Not accepted at all. rejected on lines: 3
Line accepted is: empty
Line rejected : h,i,h,i,h,ff,f27
[a, b, a], Not accepted at all. rejected on lines: 2
Line accepted is: empty
Line rejected : a,b,a,d,e,f,g
[i, h, ff] , Not accepted on line 3 because already rejected on the same line 3
Line rejected : h,i,h,i,h,ff,f27
18
Line rejected : i,h,ff,f174,f175,f176,f177
29
Line rejected : f281,f282,f283,i,h,ff,f287
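The line numbers in this output are 1-based file lines with the header counted as line 1, so each is one higher than in the question's expected output, which counts data lines only. They come from the key arithmetic in findTriplets, which assigns consecutive keys, five per row. A small sketch of that mapping, assuming every row has exactly 7 fields; the class and method names are hypothetical:

```java
public class KeyToLine {
    // findTriplets gives each window a consecutive key; a 7-field row yields 7 - 2 = 5 windows
    static final int WINDOWS_PER_ROW = 7 - 2;

    // 0-based index into the data list, as in data.get(key / 5)
    static int dataIndex(int key) { return key / WINDOWS_PER_ROW; }

    // 1-based file line: +1 for 1-based numbering, +1 for the header line
    static int fileLine(int key) { return key / WINDOWS_PER_ROW + 2; }

    // 0-based position of the window's first field within its row
    static int column(int key) { return key % WINDOWS_PER_ROW; }

    public static void main(String[] args) {
        // Keys 5 and 7 are the two [h, i, h] windows of "h,i,h,i,h,ff,f27"
        System.out.println(fileLine(5) + " " + fileLine(7) + " " + column(7)); // prints 3 3 2
    }
}
```

If rows could have a different number of fields, this fixed divisor would no longer hold, and storing the row index alongside each window would be safer.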
Source: https://stackoverflow.com/questions/51428372/detect-repeated-tuples-fi-j-1-fi-j-fi-j1-on-txt-file-using-java