Question
I have a CSV file like this:
"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK
The CSV file goes on. I worked out that there are duplicate user_ids within the file, so what I am trying to do is find out which users have the most 'true' values in the 'liked_ad' column. I am stuck on how to do this in Java and would appreciate any help.
This is what I have so far, which just parses the file:
    public static void main(String[] args) throws FileNotFoundException
    {
        Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));
        scanner.useDelimiter(",");
        while (scanner.hasNext()) {
            System.out.print(scanner.next() + " | ");
        }
        scanner.close();
    }
I'm not sure where to go from here.
Answer 1:
You can store the number of true values of liked_ad for each user_id in a Map<String, Integer> and then sort the map's entries by value in descending order.
    import java.io.File;
    import java.io.IOException;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) throws IOException {
            Scanner scanner = new Scanner(new File("file.txt"));
            // Ignore the header line
            if (scanner.hasNextLine()) {
                scanner.nextLine();
            }
            // Store the frequency of liked_ad for each user_id
            Map<String, Integer> map = new HashMap<>();
            while (scanner.hasNextLine()) {
                String[] data = scanner.nextLine().split(",");
                if (data.length >= 3 && Boolean.parseBoolean(data[2])) {
                    map.merge(data[0], 1, Integer::sum);
                }
            }
            // Sort the Map on values and display each entry
            map.entrySet().stream().sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                    .forEach(System.out::println);
        }
    }
Given the following data in the file:
"user_id","age","liked_ad","location"
1145,40,true,UK
2145,34,true,USA
6786,25,true,UK
6786,25,true,UK
1145,40,true,UK
2145,34,true,USA
9025,21,false,USA
1145,40,false,UK
1145,40,true,UK
the output will be (the relative order of users with equal counts is not guaranteed):
1145=3
6786=2
2145=2
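Sorting on the value alone leaves the order of users with equal counts unspecified. As a variant of the approach above, the counting and sorting could be sketched with the Streams API, breaking ties by user_id so that the output is reproducible. The class name LikeRanking, the method rankByLikes, and the tie-break rule are assumptions made for this illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, named for this sketch only
class LikeRanking {

    // Counts rows with liked_ad == true per user_id and returns them ordered by
    // count (highest first); ties are broken by user_id (an assumed rule).
    static Map<String, Long> rankByLikes(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                .skip(1) // skip the header line
                .map(line -> line.split(","))
                .filter(cols -> cols.length >= 3 && Boolean.parseBoolean(cols[2].trim()))
                .collect(Collectors.groupingBy(cols -> cols[0].trim(), Collectors.counting()));

        // LinkedHashMap keeps the sorted insertion order
        Map<String, Long> ranked = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .forEach(e -> ranked.put(e.getKey(), e.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"user_id\",\"age\",\"liked_ad\",\"location\"",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "6786,25,true,UK",
                "6786,25,true,UK",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "9025,21,false,USA",
                "1145,40,false,UK",
                "1145,40,true,UK");
        // Prints 1145=3, 2145=2, 6786=2 (one entry per line)
        rankByLikes(lines).forEach((user, likes) -> System.out.println(user + "=" + likes));
    }
}
```

The tie-break compares user_id values as strings, which is enough for a stable order here; a numeric comparison would need the ids parsed first.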
Answer 2:
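The following code should do what you want to achieve. It reads the file passed as the first command-line argument, counts the true values of liked_ad per user_id, and prints the users ordered by count, highest first:
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class Main {
        public static void main(String[] args) throws IOException {
            // Count the rows with liked_ad == true for each user_id
            Map<String, Integer> stats = new HashMap<>();
            Files.readAllLines(Paths.get(args[0])).forEach((line) -> {
                String[] columns = line.split(",");
                // The header and short lines are skipped: "liked_ad" does not parse as true
                if (columns.length >= 3 && Boolean.parseBoolean(columns[2])) {
                    stats.compute(columns[0], (key, value) -> value == null ? 1 : value + 1);
                }
            });
            // Print the users ordered by count, highest first
            stats.entrySet().stream()
                    .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                    .forEach(entry -> System.out.println(entry.getKey() + ": " + entry.getValue()));
        }
    }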
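The question asks which users have the most true answers, so once the counts are in a map, only the entries with the highest count may be needed. A minimal sketch of that last step follows. The class TopLikers and the method usersWithMostLikes are hypothetical names introduced here, and the sketch assumes the counts have already been collected into a Map<String, Integer> as in the answers above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, named for this sketch only
class TopLikers {

    // Returns every user_id whose count equals the maximum count, sorted by user_id.
    // An empty map yields an empty list.
    static List<String> usersWithMostLikes(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return Collections.emptyList();
        }
        int max = Collections.max(counts.values());
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == max) // keep only the top scorers
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Counts taken from the sample data in Answer 1
        Map<String, Integer> counts = new HashMap<>();
        counts.put("1145", 3);
        counts.put("6786", 2);
        counts.put("2145", 2);
        System.out.println(usersWithMostLikes(counts)); // prints [1145]
    }
}
```

Returning a list rather than a single id keeps the result meaningful when several users share the highest count.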
Source: https://stackoverflow.com/questions/65062471/grouping-duplicates-in-csv-file-and-ranking-data-based-on-certain-values