Question
I have a text file that contains multiple reports. Each report starts with the literal "REPORT ID" followed by a specific value, e.g. ABCD. In the simple case, I want to extract the data of only those reports whose value is, for example, ABCD. In the more complex case, I want to extract the data of only those reports whose TAG1 value (2nd line) is 1000375351 and whose report value is ABCD.
I have done it the traditional way: my decideAndExtract(String line) function has the required logic. But how can I use the Java 9 stream methods takeWhile and dropWhile to deal with it efficiently?
try (Stream<String> lines = Files.lines(filePath)) {
    lines.forEach(this::decideAndExtract);
}
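The question does not show decideAndExtract, so the following is only a guess at what such a line-by-line approach might look like. Because a single line cannot tell which report it belongs to, the function has to keep state across calls. The class ReportLineFilter, its fields, and the choice to collect only DATA lines are hypothetical; the TAG1 check assumes the value starts with the wanted number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the question's decideAndExtract: it remembers
// the current report's ID and whether that report matched so far.
public class ReportLineFilter {
    private final String wantedId;
    private final String wantedTag1;
    private String currentId;   // REPORT ID of the report being read
    private boolean keep;       // whether the current report matches
    private final List<String> extracted = new ArrayList<>();

    public ReportLineFilter(String wantedId, String wantedTag1) {
        this.wantedId = wantedId;
        this.wantedTag1 = wantedTag1;
    }

    public void decideAndExtract(String line) {
        if (line.startsWith("REPORT ID:")) {
            currentId = line.substring("REPORT ID:".length()).trim();
            keep = false;                                   // undecided until TAG1 is seen
        } else if (line.startsWith("TAG1:")) {
            String tag1 = line.substring("TAG1:".length()).trim();
            keep = wantedId.equals(currentId) && tag1.startsWith(wantedTag1);
        } else if (keep && line.startsWith("DATA")) {
            extracted.add(line);                            // only lines of matching reports
        }
    }

    public List<String> extracted() {
        return extracted;
    }

    public static void main(String[] args) {
        ReportLineFilter f = new ReportLineFilter("ABCD", "1000375351");
        Stream.of("REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                  "REPORT ID: WXYZ", "TAG1: 1000375351 PR", "DATA1: 1111111111 T")
              .forEach(f::decideAndExtract);
        System.out.println(f.extracted()); // [DATA1: 7399910002 T]
    }
}
```

This state is exactly what takeWhile and dropWhile cannot express on their own, since each of them only decides where a single prefix of the stream ends.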
Sample text file data:
REPORT ID: ABCD
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
REPORT ID: WXYZ
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
REPORT ID: ABCD
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
Answer 1:
It seems to be a common anti-pattern to reach for Files.lines whenever a Stream over a file is needed, regardless of whether processing individual lines is actually needed.
The first tool of your choice, when pattern matching over a file is needed, should be Scanner:
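import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Stream;

Pattern p = Pattern.compile(
        "REPORT ID: ABCD\\s*\\R"
      + "TAG1\\s*:\\s*(.*?)\\R"
      + "DATA1\\s*:\\s*(.*?)\\R"
      + "DATA2\\s*:\\s*(.*?)\\R"
      + "DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
try (Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
     Stream<MatchResult> st = sc.findAll(p)) {
    // group(1) is the TAG1 value, groups 2 to 4 are DATA1 to DATA3
    st.forEach(mr -> System.out.println("found tag1: " + mr.group(1)
            + ", data: " + String.join(", ", mr.group(2), mr.group(3), mr.group(4))));
}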
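To try the first pattern without a file, here is a self-contained sketch that runs it on an in-memory copy of the sample data via the Scanner(String) constructor. The class FindAllDemo and the method data1OfAbcdReports are illustrative names, and findAll requires Java 9.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: the first Scanner.findAll pattern, applied to a String instead of a file.
public class FindAllDemo {
    static final Pattern ABCD_REPORT = Pattern.compile(
            "REPORT ID: ABCD\\s*\\R"
          + "TAG1\\s*:\\s*(.*?)\\R"
          + "DATA1\\s*:\\s*(.*?)\\R"
          + "DATA2\\s*:\\s*(.*?)\\R"
          + "DATA3\\s*:\\s*(.*?)\\R");

    // Returns the DATA1 value (group 2) of every ABCD report found in the text.
    public static List<String> data1OfAbcdReports(String text) {
        try (Scanner sc = new Scanner(text)) {
            return sc.findAll(ABCD_REPORT)
                     .map(mr -> mr.group(2))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640", "Some Lines Here",
                "REPORT ID: WXYZ", "TAG1: 1000375351 PR", "DATA1: 1111111111 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640", "Some Lines Here") + "\n";
        System.out.println(data1OfAbcdReports(sample)); // [7399910002 T]
    }
}
```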
It's easy to adapt the pattern, e.g. use
Pattern p = Pattern.compile(
        "REPORT ID: ABCD\\s*\\R"
      + "TAG1: (1000375351 PR)\\R"
      + "DATA1\\s*:\\s*(.*?)\\R"
      + "DATA2\\s*:\\s*(.*?)\\R"
      + "DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
as the pattern to fulfill your more complex criteria.
But you could also provide arbitrary filter conditions in the Stream:
Pattern p = Pattern.compile(
        "REPORT ID: (.*?)\\s*\\R"
      + "TAG1: (.*?)\\R"
      + "DATA1\\s*:\\s*(.*?)\\R"
      + "DATA2\\s*:\\s*(.*?)\\R"
      + "DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
try (Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
     Stream<MatchResult> st = sc.findAll(p)) {
    st.filter(mr -> mr.group(1).equals("ABCD") && mr.group(2).equals("1000375351 PR"))
      .forEach(mr -> System.out.println(
              "found data: " + String.join(", ", mr.group(3), mr.group(4), mr.group(5))));
}
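A self-contained sketch of this filter variant on an in-memory sample that contains two ABCD reports, one of which has a different TAG1 value (the second TAG1 value 9999999999 is made up for illustration). Only the report satisfying both conditions survives; the class FilterDemo and method matchingData1 are illustrative names.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: match every report generically, then decide in filter().
public class FilterDemo {
    static final Pattern ANY_REPORT = Pattern.compile(
            "REPORT ID: (.*?)\\s*\\R"
          + "TAG1: (.*?)\\R"
          + "DATA1\\s*:\\s*(.*?)\\R"
          + "DATA2\\s*:\\s*(.*?)\\R"
          + "DATA3\\s*:\\s*(.*?)\\R");

    // Returns DATA1 (group 3) of every report with the given ID and TAG1 value.
    public static List<String> matchingData1(String text, String id, String tag1) {
        try (Scanner sc = new Scanner(text)) {
            return sc.findAll(ANY_REPORT)
                     .filter(mr -> mr.group(1).equals(id) && mr.group(2).equals(tag1))
                     .map(mr -> mr.group(3))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640",
                "REPORT ID: ABCD", "TAG1: 9999999999 PR", "DATA1: 5555555555 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640") + "\n";
        System.out.println(matchingData1(sample, "ABCD", "1000375351 PR")); // [7399910002 T]
    }
}
```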
allowing more complex constructs than the equals calls of the example. (Note that the group numbers changed for this example.)
E.g., to support a variable order of the data items after the “REPORT ID”, you can use
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Pattern p = Pattern.compile("REPORT ID: (.*?)\\s*\\R(((TAG1|DATA[1-3])\\s*:.*?\\R){4})");
Pattern nl = Pattern.compile("\\R"), sep = Pattern.compile("\\s*:\\s*");
try (Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
     Stream<MatchResult> st = sc.findAll(p)) {
    st.filter(mr -> mr.group(1).equals("ABCD"))
      // split the four lines of group(2) into a key -> value map
      .map(mr -> nl.splitAsStream(mr.group(2))
              .map(s -> sep.split(s, 2))
              .collect(Collectors.toMap(a -> a[0], a -> a[1])))
      .filter(map -> "1000375351 PR".equals(map.get("TAG1")))
      .forEach(map -> System.out.println("found data: " + map));
}
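A runnable sketch of the variable-order variant on an in-memory report whose TAG1 and DATA lines are deliberately shuffled. Each report body becomes a key to value map before the TAG1 check. The class VariableOrderDemo and method find are illustrative; note that Collectors.toMap would throw if a key appeared twice in one report.

```java
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: the four TAG1/DATA lines may come in any order.
public class VariableOrderDemo {
    static final Pattern REPORT =
            Pattern.compile("REPORT ID: (.*?)\\s*\\R(((TAG1|DATA[1-3])\\s*:.*?\\R){4})");
    static final Pattern NL = Pattern.compile("\\R");
    static final Pattern SEP = Pattern.compile("\\s*:\\s*");

    public static List<Map<String, String>> find(String text, String id, String tag1) {
        try (Scanner sc = new Scanner(text)) {
            return sc.findAll(REPORT)
                     .filter(mr -> mr.group(1).equals(id))
                     .map(mr -> NL.splitAsStream(mr.group(2))   // one element per line
                                  .map(s -> SEP.split(s, 2))     // [key, value]
                                  .collect(Collectors.toMap(a -> a[0], a -> a[1])))
                     .filter(map -> tag1.equals(map.get("TAG1")))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "REPORT ID: ABCD", "DATA2: 4754400002 B", "TAG1: 1000375351 PR",
                "DATA3 : 1000640", "DATA1: 7399910002 T", "Some Lines Here") + "\n";
        List<Map<String, String>> found = find(sample, "ABCD", "1000375351 PR");
        System.out.println(found.get(0).get("DATA1")); // 7399910002 T
    }
}
```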
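Scanner.findAll is available since Java 9, but if you have to support Java 8, you can use the findAll implementation of this answer. Note that the Scanner(Path, Charset) constructor used above was added in Java 10; on Java 9, new Scanner(filePath, "UTF-8") can be used instead.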
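If neither findAll nor the linked implementation is an option, a simpler Java 8-compatible alternative (a plain Matcher.find() loop, not the implementation from the linked answer) is sketched below. It assumes the whole file fits in memory as a String, e.g. via new String(Files.readAllBytes(filePath), StandardCharsets.UTF_8); the class Java8FindAll is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java 8-compatible sketch: collect all matches of p in text into a list.
public class Java8FindAll {
    public static List<MatchResult> findAll(Pattern p, CharSequence text) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            results.add(m.toMatchResult());   // immutable snapshot of this match
        }
        return results;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("REPORT ID: (.*?)\\s*\\R");
        String text = "REPORT ID: ABCD\nTAG1: x\nREPORT ID: WXYZ\n";
        for (MatchResult mr : findAll(p, text)) {
            System.out.println(mr.group(1));   // ABCD, then WXYZ
        }
    }
}
```

Calling .stream() on the returned list yields a Stream<MatchResult> that can be filtered the same way as the findAll stream.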
Answer 2:
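dropWhile and takeWhile don't work the way you expect. They only look at a prefix of the stream: dropWhile discards elements as long as the condition holds and then passes on every remaining element unchecked, while takeWhile passes on elements as long as the condition holds and stops at the first element for which it fails.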
If you need to check a condition on all elements and choose only some of them, you should use Stream.filter instead.
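To make the difference concrete, here is a small sketch on a few lines in the question's format (the in-memory list, the made-up second TAG1 value, and the class name WhileVsFilter are illustrative assumptions). takeWhile and dropWhile each cut the stream at one point, whereas filter tests every line.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch contrasting takeWhile/dropWhile (prefix-based) with filter (per element).
public class WhileVsFilter {
    static final List<String> LINES = List.of(
            "REPORT ID: ABCD", "TAG1: 1000375351 PR",
            "REPORT ID: WXYZ", "TAG1: 1000375351 PR",
            "REPORT ID: ABCD", "TAG1: 2000000000 PR");

    // takeWhile stops at the first line that fails the test (the WXYZ header)
    // and never reaches the second ABCD report.
    public static List<String> firstReport() {
        return LINES.stream()
                .takeWhile(l -> !l.equals("REPORT ID: WXYZ"))
                .collect(Collectors.toList());
    }

    // dropWhile stops testing at the WXYZ header and passes every later line
    // through, including the lines of the second ABCD report.
    public static List<String> fromWxyz() {
        return LINES.stream()
                .dropWhile(l -> !l.equals("REPORT ID: WXYZ"))
                .collect(Collectors.toList());
    }

    // filter evaluates the predicate for every line independently.
    public static List<String> abcdHeaders() {
        return LINES.stream()
                .filter(l -> l.equals("REPORT ID: ABCD"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstReport());        // [REPORT ID: ABCD, TAG1: 1000375351 PR]
        System.out.println(fromWxyz().size());    // 4
        System.out.println(abcdHeaders().size()); // 2
    }
}
```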
Answer 3:
You can do the search in two steps:
First, create a list of all reports as a List<String>. The code below uses an indicator string to split the report entries.
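// lines is the Stream<String> from Files.lines(filePath), as in the question;
// needs java.util.Arrays, java.util.List and java.util.stream.Collectors
String newReportIndicator = "=====";
// Mark the start of every report, join all lines, then split into one String per report.
// The first element is empty because the file starts with a header; the filter below skips it.
List<String> reports = Arrays.asList(lines
        .map(l -> (l.startsWith("REPORT ID: ") ? newReportIndicator : "") + l)
        .collect(Collectors.joining(System.lineSeparator()))
        .split(newReportIndicator));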
After that, filter the reports according to your conditions.
The main filtering code:
List<String> reportsToFind = reports.stream()
        .filter(r -> {
            List<String> list = Arrays.asList(r.split(System.lineSeparator()));
            String header = list.get(0).trim();
            return header.endsWith("ABCD")
                    && list.stream()
                           .filter(l -> l.startsWith("TAG1:") && l.endsWith("1000375351 PR"))
                           .count() == 1;
        })
        .collect(Collectors.toList());
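Joining all lines and splitting on a sentinel such as "=====" breaks if that string ever occurs in the data. As an alternative (a different technique from this answer's), the sketch below groups lines into one List per report and then filters the groups. groupReports and matches are hypothetical helpers; matches uses anyMatch (at least one matching TAG1 line) rather than count() == 1, and grouping assumes a sequential stream such as the one from Files.lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Alternative to the sentinel-string approach: one List<String> per report.
public class GroupReports {
    // Starts a new group at every "REPORT ID: " line; lines before the first
    // header are dropped.
    public static List<List<String>> groupReports(Stream<String> lines) {
        List<List<String>> reports = new ArrayList<>();
        lines.forEachOrdered(l -> {
            if (l.startsWith("REPORT ID: ")) {
                reports.add(new ArrayList<>());             // start a new report
            }
            if (!reports.isEmpty()) {
                reports.get(reports.size() - 1).add(l);     // append to the current report
            }
        });
        return reports;
    }

    // Header ends with ABCD and some TAG1 line ends with 1000375351 PR.
    public static boolean matches(List<String> report) {
        return report.get(0).trim().endsWith("ABCD")
                && report.stream().anyMatch(l -> l.startsWith("TAG1:") && l.endsWith("1000375351 PR"));
    }

    public static void main(String[] args) {
        List<List<String>> found = groupReports(Stream.of(
                        "REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                        "REPORT ID: WXYZ", "TAG1: 1000375351 PR", "DATA1: 1111111111 T"))
                .stream()
                .filter(GroupReports::matches)
                .collect(Collectors.toList());
        System.out.println(found); // [[REPORT ID: ABCD, TAG1: 1000375351 PR, DATA1: 7399910002 T]]
    }
}
```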
Source: https://stackoverflow.com/questions/57332614/java-9-takewhile-and-dropwhile-to-read-and-skip-certain-lines