Merge CSV files into a single file with no repeated headers

挽巷 2020-12-30 10:39

I have some CSV files with the same column headers. For example:

File A

header1,header2,header3
one,two,three
four,five,six

File B

5 Answers
  • 2020-12-30 10:46

    This should work. It checks that each file being merged has the same headers as the first file and throws an exception otherwise. The data rows are appended to the first file. Exception handling (closing the streams, etc.) is left as an exercise.

    String[] headers = null;
    String firstFile = "/path/to/firstFile.dat";
    Scanner scanner = new Scanner(new File(firstFile));
    
    if (scanner.hasNextLine())
        headers = scanner.nextLine().split(",");
    
    scanner.close();
    
    Iterator<File> iterFiles = listOfFilesToBeMerged.iterator();
    BufferedWriter writer = new BufferedWriter(new FileWriter(firstFile, true));
    
    while (iterFiles.hasNext()) {
      File nextFile = iterFiles.next();
      BufferedReader reader = new BufferedReader(new FileReader(nextFile));
    
      String line = null;
      String[] firstLine = null;
      if ((line = reader.readLine()) != null)
        firstLine = line.split(",");
    
      if (!Arrays.equals (headers, firstLine))
        throw new FileMergeException("Header mismatch between CSV files: '" +
                  firstFile + "' and '" + nextFile.getAbsolutePath() + "'");
    
      while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.newLine();
      }
    
      reader.close();
    }
    writer.close();
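
The exercise mentioned above (closing the streams) could be handled with try-with-resources. The following is only a sketch under stated assumptions: the class name `SafeCsvAppend` and the method `appendAll` are hypothetical, headers are compared as whole lines rather than split on commas, a plain `IOException` stands in for the answer's `FileMergeException`, and the target file is assumed to end with a newline.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of the original answer.
public class SafeCsvAppend {

    /** Appends the data rows of each extra file to target after checking its header. */
    public static void appendAll(Path target, List<Path> extras) throws IOException {
        // Read the header of the target file; the reader is closed automatically.
        String header;
        try (BufferedReader first = Files.newBufferedReader(target, StandardCharsets.UTF_8)) {
            header = first.readLine();
        }
        if (header == null) {
            throw new IOException("No header line in " + target);
        }

        // Assumption: target already ends with a line break, otherwise the
        // first appended row would be glued to its last line.
        try (BufferedWriter writer = Files.newBufferedWriter(
                target, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
            for (Path extra : extras) {
                try (BufferedReader reader = Files.newBufferedReader(extra, StandardCharsets.UTF_8)) {
                    String firstLine = reader.readLine();
                    if (!header.equals(firstLine)) {
                        throw new IOException("Header mismatch between '" + target
                                + "' and '" + extra + "'");
                    }
                    // Copy every remaining line (the data rows) unchanged.
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

Both the reader and the writer are closed even when the header check throws, which is the part the original snippet leaves open.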
    
  • 2020-12-30 10:49

    Before:

    idFile#x_y.csv

    After:

    idFile.csv

    For example:

    100#1_2.csv + 100#2_2.csv > 100.csv

    100#1_2.csv contains:

    "one","two","three"
    "a","b","c"
    "d","e","f"
    

    100#2_2.csv contains:

    "one","two","three"
    "g","h","i"
    "j","k","l"
    

    100.csv contains:

    "one","two","three"
    "a","b","c"
    "d","e","f"    
    "g","h","i"
    "j","k","l"
    

    Source:

    //MergeDemo.java
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    //import java.util.Arrays;
    import java.util.Iterator;
    import java.util.Scanner;
    
    public class MergeDemo {
    
        public static void main(String[] args) {
    
            String idFile = "100";
            int numFiles = 2; // number of parts (the y in idFile#x_y.csv); the example above has two
    
            try {
                mergeCsvFiles(idFile, numFiles);
            } catch (IOException e) {
                e.printStackTrace();
            }
    
        }
    
        private static void mergeCsvFiles(String idFile, int numFiles) throws IOException {
    
            // Variables
            ArrayList<File> files = new ArrayList<File>();
            Iterator<File> iterFiles;
            File fileOutput;
            BufferedWriter fileWriter;
            BufferedReader fileReader;
            String csvFile;
            String csvFinal = "C:\\out\\" + idFile + ".csv";
            String[] headers = null;
            String header = null;
    
            // Files: Input
            for (int i = 1; i <= numFiles; i++) {
                csvFile = "C:\\in\\" + idFile + "#" + i + "_" + numFiles + ".csv";
                files.add(new File(csvFile));
            }
    
            // Files: Output
            fileOutput = new File(csvFinal);
            if (fileOutput.exists()) {
                fileOutput.delete();
            }
            try {
                fileOutput.createNewFile();
                // log
                // System.out.println("Output: " + fileOutput);
            } catch (IOException e) {
                // log
            }
    
            iterFiles = files.iterator();
            fileWriter = new BufferedWriter(new FileWriter(csvFinal, true));
    
            // Headers
            Scanner scanner = new Scanner(files.get(0));
            if (scanner.hasNextLine())
                header = scanner.nextLine();
            // if (scanner.hasNextLine()) headers = scanner.nextLine().split(";");
            scanner.close();
    
            /*
             * System.out.println(header); for(String s: headers){
             * fileWriter.write(s); System.out.println(s); }
             */
    
            fileWriter.write(header);
            fileWriter.newLine();
    
            while (iterFiles.hasNext()) {
    
                String line;
    
                File nextFile = iterFiles.next();
                fileReader = new BufferedReader(new FileReader(nextFile));
    
                // Skip this file's header line; the header was already written once above.
                fileReader.readLine();
    
                while ((line = fileReader.readLine()) != null) {
                    fileWriter.write(line);
                    fileWriter.newLine();
                }
                fileReader.close();
            }
    
            fileWriter.close();
    
        }
    
    }
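
The source above builds the input file names from a known part count. If the count is not known in advance, the parts could instead be discovered from the `idFile#x_y.csv` naming pattern. This is a rough sketch, not part of the original answer: the class `PartFinder`, its methods, and the regular expression are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for locating part files such as 100#1_2.csv.
public class PartFinder {

    // group 1 = id, group 2 = part index (x), group 3 = part count (y)
    private static final Pattern PART = Pattern.compile("(.+)#(\\d+)_(\\d+)\\.csv");

    /** Returns the part files for idFile found in dir, ordered by part index. */
    public static List<Path> findParts(Path dir, String idFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                Matcher m = PART.matcher(p.getFileName().toString());
                if (m.matches() && m.group(1).equals(idFile)) {
                    parts.add(p);
                }
            }
        }
        // Sort numerically so that part 10 comes after part 9, not after part 1.
        parts.sort(Comparator.comparingInt(PartFinder::partIndex));
        return parts;
    }

    /** Extracts the numeric part index from a file name matching the pattern. */
    static int partIndex(Path p) {
        Matcher m = PART.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a part file: " + p);
        }
        return Integer.parseInt(m.group(2));
    }
}
```

The returned list could then replace the `files` list that `mergeCsvFiles` fills in its first loop.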
    
  • 2020-12-30 10:50

    Here is an example:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    
    public class MergeCsv {
    
        public static void main(String[] args) throws IOException {
            List<Path> paths = Arrays.asList(Paths.get("c:/temp/file1.csv"), Paths.get("c:/temp/file2.csv"));
            List<String> mergedLines = getMergedLines(paths);
            Path target = Paths.get("c:/temp/merged.csv");
            Files.write(target, mergedLines, StandardCharsets.UTF_8);
        }
    
        // Reads every file fully into memory and keeps only the first header line.
        private static List<String> getMergedLines(List<Path> paths) throws IOException {
            List<String> mergedLines = new ArrayList<>();
            for (Path p : paths) {
                List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
                if (!lines.isEmpty()) {
                    if (mergedLines.isEmpty()) {
                        mergedLines.add(lines.get(0)); // add header only once
                    }
                    mergedLines.addAll(lines.subList(1, lines.size()));
                }
            }
            return mergedLines;
        }
    }
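
The example above holds every line of every file in memory before writing. For large files, a streaming variant may be preferable. The sketch below is one possible way to do that, not the answerer's code: the class name `StreamingMerge` and the method `merge` are hypothetical, and no header comparison is performed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical streaming variant: copies line by line instead of buffering whole files.
public class StreamingMerge {

    /** Writes the first non-empty file's header once, then the data rows of every file. */
    public static void merge(List<Path> sources, Path target) throws IOException {
        boolean headerWritten = false;
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path source : sources) {
                try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                    String header = reader.readLine();
                    if (header == null) {
                        continue; // empty file, nothing to copy
                    }
                    if (!headerWritten) {
                        writer.write(header);
                        writer.newLine();
                        headerWritten = true;
                    }
                    // Copy the remaining lines (data rows) as they are read.
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

Memory use then depends on the longest line rather than on the total size of the inputs.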
    
  • 2020-12-30 10:53

    Late here but Fuzzy-Csv (https://github.com/kayr/fuzzy-csv/) was designed just for that.

    This is what the code would look like:

            String csv1 = "NAME,SURNAME,AGE\n" +
                    "Fred,Krueger,Unknown";
    
            String csv2 = "NAME,MIDDLENAME,SURNAME,AGE\n" +
                    "Jason,Noname,Scarry,16";
    
            FuzzyCSVTable t1 = FuzzyCSVTable.parseCsv(csv1);
            FuzzyCSVTable t2 = FuzzyCSVTable.parseCsv(csv2);
    
            FuzzyCSVTable output = t1.mergeByColumn(t2);
    
            output.printTable();
    

    Output

    ╔═══════╤═════════╤═════════╤════════════╗
    ║ NAME  │ SURNAME │ AGE     │ MIDDLENAME ║
    ╠═══════╪═════════╪═════════╪════════════╣
    ║ Fred  │ Krueger │ Unknown │ -          ║
    ╟───────┼─────────┼─────────┼────────────╢
    ║ Jason │ Scarry  │ 16      │ Noname     ║
    ╚═══════╧═════════╧═════════╧════════════╝
    

    You can re-export your CSV using one of the helper methods:

    output.write("FilePath.csv");
    
    or 
    
    output.toCsvString()
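
For readers who want the same idea without a library, merging by column name can be approximated in plain Java. The sketch below is not Fuzzy-Csv's implementation or API; the class `ColumnMerge` and its methods are hypothetical, parsing is a naive comma split that ignores quoted fields, and missing cells are filled with an empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical plain-Java approximation of a merge by column name.
public class ColumnMerge {

    /** Naive parser: one row per line, cells split on commas (no quoting support). */
    public static List<List<String>> parse(String csv) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : csv.split("\n")) {
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return rows;
    }

    /** Row 0 of each table is its header; the result uses the union of both headers. */
    public static List<List<String>> mergeByColumn(List<List<String>> t1, List<List<String>> t2) {
        // Keep the first table's column order, then add columns only the second has.
        LinkedHashSet<String> columns = new LinkedHashSet<>(t1.get(0));
        columns.addAll(t2.get(0));
        List<String> header = new ArrayList<>(columns);

        List<List<String>> out = new ArrayList<>();
        out.add(header);
        appendRows(out, header, t1);
        appendRows(out, header, t2);
        return out;
    }

    // Re-orders each data row of table to match header, filling gaps with "".
    private static void appendRows(List<List<String>> out, List<String> header,
                                   List<List<String>> table) {
        List<String> sourceHeader = table.get(0);
        for (List<String> row : table.subList(1, table.size())) {
            List<String> merged = new ArrayList<>();
            for (String column : header) {
                int idx = sourceHeader.indexOf(column);
                merged.add(idx >= 0 && idx < row.size() ? row.get(idx) : "");
            }
            out.add(merged);
        }
    }
}
```

With the two strings from the answer, this yields the columns NAME, SURNAME, AGE, MIDDLENAME, with an empty MIDDLENAME cell for the first row.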
    
    
  • 2020-12-30 10:54

    It seems a bit heavyweight to do this in Java. It's trivial in a Linux shell, where tail --lines=+2 prints FileB from its second line onward and so skips the header:

    (cat FileA ; tail --lines=+2 FileB) > FileC
    