Grouping duplicates in CSV file and ranking data based on certain values

Question

I have a CSV file like this:

"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK

The file continues with more rows. I worked out that some user_ids appear more than once in the file, and what I am trying to do is find out which users have the most 'true' answers in the 'liked_ad' column. I am super stuck on how to do this in Java and would appreciate any help.

This is what I have so far, which just parses the file:

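    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) throws FileNotFoundException
        {
            Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));

            // Split on commas and on line breaks; with "," alone, the last field of one
            // row and the first field of the next row come back as a single token
            scanner.useDelimiter(",|\\R");

            while (scanner.hasNext()) {
                System.out.print(scanner.next() + " | ");
            }

            scanner.close();
        }
    }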

I'm stuck on where to go from here to get those per-user counts.
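A natural next step after printing tokens is to read the file one line at a time and turn each row into a value with named fields, so that later steps can group by user_id and filter on liked_ad. The following is a minimal sketch under stated assumptions: the class names `CsvRows` and `Row` and the helper `unquote` are hypothetical, the first line is assumed to be the header, and no field is assumed to contain an embedded comma (true for the sample shown, but not for CSV in general).

```java
import java.util.ArrayList;
import java.util.List;

public class CsvRows {
    // Hypothetical holder for one data row; field names mirror the CSV header
    static final class Row {
        final String userId;
        final int age;
        final boolean likedAd;
        final String location;

        Row(String userId, int age, boolean likedAd, String location) {
            this.userId = userId;
            this.age = age;
            this.likedAd = likedAd;
            this.location = location;
        }
    }

    // Trims a field and removes surrounding double quotes, as used in the header line
    static String unquote(String field) {
        String f = field.trim();
        if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
            return f.substring(1, f.length() - 1);
        }
        return f;
    }

    // Skips the header (line 0) and blank lines, then splits each row on commas.
    // Assumes no field contains an embedded comma.
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.split(",");
            if (f.length < 4) {
                continue; // skip malformed rows instead of failing
            }
            rows.add(new Row(unquote(f[0]), Integer.parseInt(unquote(f[1])),
                    Boolean.parseBoolean(unquote(f[2])), unquote(f[3])));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows copied from the question
        List<String> lines = List.of(
                "\"user_id\",\"age\",\"liked_ad\",\"location\"",
                "2145,34,true,USA",
                "6786,25,true,UK",
                "9025,21,false,USA",
                "1145,40,false,UK");
        for (Row r : parse(lines)) {
            System.out.println(r.userId + " liked_ad=" + r.likedAd);
        }
    }
}
```

In a real program the lines would come from `Files.readAllLines(Paths.get("src/main/resources/advert-data.csv"))` instead of the inline sample.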

Answer 1:

You can count how many times liked_ad is true for each user_id in a Map<String, Integer> and then sort the map's entries by value.

    import java.io.File;
    import java.io.IOException;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) throws IOException {
            Map<String, Integer> map = new HashMap<>();

            // try-with-resources closes the Scanner when reading is done
            try (Scanner scanner = new Scanner(new File("file.txt"))) {
                // Ignore the header line
                if (scanner.hasNextLine()) {
                    scanner.nextLine();
                }

                // Count the rows where liked_ad (third column) is true, per user_id
                while (scanner.hasNextLine()) {
                    String[] data = scanner.nextLine().split(",");
                    if (data.length >= 3 && Boolean.parseBoolean(data[2].trim())) {
                        map.merge(data[0].trim(), 1, Integer::sum);
                    }
                }
            }

            // Sort the entries by count, highest first, and print each one
            map.entrySet().stream()
                    .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                    .forEach(System.out::println);
        }
    }

Given the following data in the file:

"user_id","age","liked_ad","location"
1145,40,true,UK
2145,34,true,USA
6786,25,true,UK
6786,25,true,UK
1145,40,true,UK
2145,34,true,USA
9025,21,false,USA
1145,40,false,UK
1145,40,true,UK

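the output will be as follows (users with the same count, such as 6786 and 2145, may be printed in either order, because HashMap does not guarantee an iteration order):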

1145=3
6786=2
2145=2
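The same counting can also be written with the Stream API. The following is a minimal sketch using `Collectors.groupingBy` with `Collectors.counting()`, a swapped-in technique that is not part of either answer; the class and method names `LikeCounts`, `countLikes` and `rank` are hypothetical. It reads from a list of lines so it can be tried without a file, and it breaks ties by user_id so that equal counts always print in the same order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LikeCounts {
    // Counts rows whose liked_ad column (index 2) is "true", grouped by user_id (index 0).
    // Skips the header line and any row with fewer than three columns.
    static Map<String, Long> countLikes(List<String> lines) {
        return lines.stream()
                .skip(1)
                .map(line -> line.split(","))
                .filter(cols -> cols.length >= 3 && Boolean.parseBoolean(cols[2].trim()))
                .collect(Collectors.groupingBy(cols -> cols[0].trim(), Collectors.counting()));
    }

    // Sorts by count (highest first), breaking ties by user_id so the order is repeatable
    static List<Map.Entry<String, Long>> rank(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data from Answer 1
        List<String> lines = List.of(
                "\"user_id\",\"age\",\"liked_ad\",\"location\"",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "6786,25,true,UK",
                "6786,25,true,UK",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "9025,21,false,USA",
                "1145,40,false,UK",
                "1145,40,true,UK");
        for (Map.Entry<String, Long> e : rank(countLikes(lines))) {
            System.out.println(e.getKey() + "=" + e.getValue()); // 1145=3, then 2145=2, then 6786=2
        }
    }
}
```

To run it on the real file, the lines can come from `Files.readAllLines(Paths.get(...))`. Users whose liked_ad is never true (9025 here) do not appear in the result at all.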

Answer 2:

The following code should do what you want to achieve:

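    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Map;
    import java.util.TreeMap;

    public class Main {
        public static void main(String[] args) throws IOException {
            // Count the "true" values of liked_ad per user_id. A TreeMap keeps the keys
            // in user_id order, so users with equal counts are printed in a fixed order.
            Map<String, Integer> stats = new TreeMap<>();

            Files.readAllLines(Paths.get(args[0])).forEach((line) -> {
                String[] columns = line.split(",");
                // The header's third cell is liked_ad in quotes, not true, so the header row is skipped
                if (columns.length > 2 && Boolean.valueOf(columns[2].trim())) {
                    stats.compute(columns[0].trim(), (key, value) -> value == null ? 1 : value + 1);
                }
            });

            // A TreeMap is ordered by key, not by count, so sort the entries by count (highest first)
            stats.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                    .forEach(entry -> System.out.println(entry.getKey() + ": " + entry.getValue()));
        }
    }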
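The title also mentions ranking. The answers above list users by their counts but do not assign rank numbers, so tied users (such as 2145 and 6786 in Answer 1's sample) are just printed one after the other. If explicit ranks are wanted, here is a minimal sketch, assuming the per-user counts are already in a `Map<String, Integer>` like the ones built above. It uses standard competition ranking (1, 2, 2, 4, ...); the class `CompetitionRank` and method `rank` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompetitionRank {
    // Assigns standard competition ranks: users with equal counts share a rank, and
    // the next distinct count skips the tied positions. Ties are listed in user_id
    // order; the returned LinkedHashMap iterates from best rank to worst.
    static Map<String, Integer> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));

        Map<String, Integer> ranks = new LinkedHashMap<>();
        int rank = 0;
        Integer previous = null;
        for (int i = 0; i < entries.size(); i++) {
            Map.Entry<String, Integer> e = entries.get(i);
            if (!e.getValue().equals(previous)) {
                rank = i + 1; // first entry with a new count takes its 1-based position
                previous = e.getValue();
            }
            ranks.put(e.getKey(), rank);
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Counts for Answer 1's sample data
        Map<String, Integer> counts = Map.of("1145", 3, "6786", 2, "2145", 2);
        rank(counts).forEach((user, r) ->
                System.out.println(r + ". " + user + " (" + counts.get(user) + ")"));
        // prints "1. 1145 (3)", then "2. 2145 (2)", then "2. 6786 (2)"
    }
}
```

Standard competition ranking is only one convention; dense ranking (1, 2, 2, 3) would instead increase the rank by one at each new count, by keeping a separate counter rather than using the position `i + 1`.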

Source: https://stackoverflow.com/questions/65062471/grouping-duplicates-in-csv-file-and-ranking-data-based-on-certain-values

Tags

java

csv