Issues converting csv to xls in Java? Only core Java experience needed - question not related to import

甜味超标 2020-12-22 08:35

First of all, I understand that it's unusual that I want to up-convert like this, but please bear with me. We get these CSV files via a website export and we have no other option.

4 Answers
  • 2020-12-22 08:54

    Rather than depending on split(), write your own parser to handle this situation. Have your grammar treat all the characters between a matching pair of double quotes (") or single quotes (') as a single token.
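
    A minimal sketch of that idea in core Java, assuming one record per line, no escape mechanism inside quotes, and that the quote characters themselves are dropped from the token. The class and method names (`QuoteAwareSplitter.split`) are hypothetical. Be aware that treating `'` as a quote character will misread values such as `O'Brien`; standard CSV (RFC 4180) only uses double quotes.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class QuoteAwareSplitter {

        // Splits one line on `separator`, treating everything between a matching
        // pair of '"' or '\'' characters as part of a single token.
        // Assumptions: no escaping inside quotes, quote characters are not kept,
        // and an unterminated quote runs to the end of the line.
        public static List<String> split(String line, char separator) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            char openQuote = 0; // 0 = outside quotes, otherwise the quote char that opened
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (openQuote != 0) {
                    if (c == openQuote) {
                        openQuote = 0;     // matching quote closes the quoted section
                    } else {
                        current.append(c); // separators and the other quote type are literal
                    }
                } else if (c == '"' || c == '\'') {
                    openQuote = c;         // opening quote
                } else if (c == separator) {
                    tokens.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append(c);
                }
            }
            tokens.add(current.toString()); // last token, possibly empty
            return tokens;
        }
    }
    ```

    For example, `QuoteAwareSplitter.split("1,\"Smith, John\",'a, b',x", ',')` returns the four tokens `1`, `Smith, John`, `a, b` and `x`.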

  • 2020-12-22 08:55

    I know this doesn't really help you with your immediate problem, but my advice is: don't do it at all. You'll be very lucky if you get away with just dealing with embedded commas. What about embedded double quotes? Embedded line breaks? etc. etc...
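
    To make those failure modes concrete: `"1,\"Smith, John\"".split(",")` returns three pieces instead of two, and reading with `readLine()` cuts a quoted field that contains a line break into two records. The sketch below (the class name `CsvSketch` is hypothetical) scans the whole text character by character instead, assuming RFC 4180-style quoting with `,` as the separator. It does no trimming, treats a `"` anywhere in an unquoted field as the start of quoting, and lets an unterminated quote run to the end of the input. It is only meant to show how much a hand-written parser has to cover, not to replace a tested library.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class CsvSketch {

        // Parses CSV text into records. Because it scans the whole text rather than
        // one line at a time, a quoted field may contain commas, doubled quotes ("")
        // and line breaks. Assumptions: ',' separator, LF or CRLF line endings.
        public static List<List<String>> parse(String text) {
            List<List<String>> records = new ArrayList<>();
            List<String> record = new ArrayList<>();
            StringBuilder field = new StringBuilder();
            boolean inQuotes = false;
            boolean pending = false; // true if the current record has unflushed content
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                pending = true;
                if (inQuotes) {
                    if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote: "" becomes "
                        i++;
                    } else if (c == '"') {
                        inQuotes = false;  // closing quote
                    } else {
                        field.append(c);   // commas and line breaks are kept literally
                    }
                } else if (c == '"') {
                    inQuotes = true;       // opening quote (lenient: accepted anywhere in a field)
                } else if (c == ',') {
                    record.add(field.toString());
                    field.setLength(0);
                } else if (c == '\n' || c == '\r') {
                    if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                        i++;               // CRLF counts as a single line break
                    }
                    record.add(field.toString());
                    field.setLength(0);
                    records.add(record);
                    record = new ArrayList<>();
                    pending = false;
                } else {
                    field.append(c);
                }
            }
            if (pending) {                 // last record had no trailing line break
                record.add(field.toString());
                records.add(record);
            }
            return records;
        }
    }
    ```

    Note that even this sketch ignores encodings, alternative separators and malformed input, which is exactly the point about reaching for a library.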

    Quite honestly? The answer is to find a library which parses CSVs and use that. I'm pretty sure nearly every single developer in the world has fallen into the "oh, CSV is such a simple format, I'll parse it myself" trap. I know I have.

    There's a great post about the problems with roll-your-own CSV Parsers which I love referring people to (I'm cruel like that). It's a .NET-related post, but it still applies to your situation. Note that you're only up to step #2 of 5... there's a lot to go.

  • 2020-12-22 08:58

    Just check the CSV char by char and set a toggle whenever a quote occurs. Here's a kickoff example:

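    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    
    public class CsvParser {
    
        public static List<List<String>> parseCsv(InputStream input, char separator)
            throws IOException
        {
            List<List<String>> csv = new ArrayList<>();
            // try-with-resources closes the reader even if parsing fails
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(input, StandardCharsets.UTF_8))) {
                for (String record; (record = reader.readLine()) != null;) {
                    boolean quoted = false; // true while inside a "..." section
                    StringBuilder fieldBuilder = new StringBuilder();
                    List<String> fields = new ArrayList<>();
                    for (int i = 0; i < record.length(); i++) {
                        char c = record.charAt(i);
                        if (c == '"') {
                            quoted = !quoted; // an escaped "" toggles twice, so it stays quoted
                        }
                        if (!quoted && c == separator) {
                            fields.add(clean(fieldBuilder)); // separator outside quotes ends the field
                            fieldBuilder.setLength(0);
                        } else {
                            fieldBuilder.append(c);
                        }
                    }
                    fields.add(clean(fieldBuilder)); // last field, empty after a trailing separator
                    csv.add(fields);
                }
            }
            return csv;
        }
    
        // Trims the field, then strips the surrounding quotes and turns "" into ".
        // Trimming first means whitespace outside the quotes cannot leave a stray quote behind.
        private static String clean(StringBuilder fieldBuilder) {
            String field = fieldBuilder.toString().trim();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            return field.replace("\"\"", "\"");
        }
    }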
    

    However, you can also just use any third-party Java CSV API, which may offer more features.

  • 2020-12-22 09:08

    I managed to answer my own question. With a bit of searching, I found this little PDF:

    http://www.objectmentor.com/resources/articles/tfd.pdf

    From there, I adapted the code on page 35 to work with my program. All credit goes to Jeff Langr, 2001. All I did was update it to current Java standards.

    Here's the code for all the people who may encounter this problem in the future.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.util.ArrayList;
    
    public class CSVReader {
    
        private BufferedReader reader;
        private String line;
        private static final String DOUBLE_QUOTE = "\"";
        private static final String COMMENT_SYMBOL = "#";
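        // States of the record parser's finite-state machine
        private static final char stateINIT = 'S';                 // start of record, leading blanks skipped
        private static final char stateCOMMENT = '#';              // rest of the line is ignored
        private static final char stateQUOTED_DATA = 'q';          // inside a "..." value
        private static final char stateQUOTE_IN_QUOTED_DATA = 'Q'; // saw '"' inside "...": closing quote or first half of ""
        private static final char stateDATA = 'D';                 // inside an unquoted value
        private static final char stateNEW_TOKEN = 'N';            // just after a ','
        private static final char stateWHITESPACE = 'W';           // blanks after a ',' before the value starts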
    
        public CSVReader(String filename) throws IOException {
            reader = new BufferedReader(new java.io.FileReader(filename));
            loadNextNonCommentLine();
        }
    
        public ArrayList<String> next() throws IOException {
            if (line == null)
                throw new IOException("Read past end of file");
            ArrayList<String> columns = columnsFromCSVRecord(line);
            loadNextNonCommentLine();
            return columns;
        }
    
        public boolean hasNext() {
            return line != null;
        }
    
        void loadNextNonCommentLine() throws IOException {
            do
                line = reader.readLine();
            while (line != null && line.startsWith(COMMENT_SYMBOL));
            if (line == null)
                reader.close();
        }
    
        // Splits one record into columns by feeding it through the state machine one character at a time.
        public ArrayList<String> columnsFromCSVRecord(String line) throws IOException {
            char state = stateINIT;
            char ch;
            int i = 0;
            ArrayList<String> tokens = new ArrayList<String>();
            StringBuffer buffer = new StringBuffer();
            char[] charArray = line.toCharArray();
            while (i < charArray.length) {
                ch = charArray[i++];
                switch (state) {
                case stateINIT:
                    switch (ch) {
                    case '"':
                        buffer.append(ch);
                        state = stateQUOTED_DATA;
                        break;
                    case ',':
                        state = stateNEW_TOKEN;
                        tokens.add(clean(buffer));
                        buffer = new StringBuffer();
                        break;
                    case '\t':
                    case ' ':
                        break;
                    case '#':
                        state = stateCOMMENT;
                        break;
                    default:
                        state = stateDATA;
                        buffer.append(ch);
                        break;
                    }
                    break;
                case stateCOMMENT:
                    break;
                case stateQUOTED_DATA:
                    switch (ch) {
                    case '"':
                        buffer.append(ch);
                        state = stateQUOTE_IN_QUOTED_DATA;
                        break;
                    default:
                        buffer.append(ch);
                        break;
                    }
                    break;
                case stateQUOTE_IN_QUOTED_DATA:
                    switch (ch) {
                    case '"':
                        state = stateQUOTED_DATA;
                        break;
                    case ',':
                        state = stateNEW_TOKEN;
                        tokens.add(clean(buffer));
                        buffer = new StringBuffer();
                        break;
                    case ' ':
                    case '\t':
                        break;
                    case '#':
                        tokens.add(clean(buffer));
                        state = stateCOMMENT;
                        break;
                    default:
                        throw new IOException("badly formed CSV record:" + line);
                    }
                    break;
                case stateDATA:
                    switch (ch) {
                    case '#':
                        tokens.add(clean(buffer));
                        state = stateCOMMENT;
                        break;
                    case ',':
                        state = stateNEW_TOKEN;
                        tokens.add(clean(buffer));
                        buffer = new StringBuffer();
                        break;
                    default:
                        buffer.append(ch);
                        break;
                    }
                    break;
                case stateNEW_TOKEN:
                    switch (ch) {
                    case '#':
                        tokens.add(clean(buffer));
                        state = stateCOMMENT;
                        break;
                    case ',':
                        tokens.add(clean(buffer));
                        buffer = new StringBuffer();
                        break;
                    case ' ':
                    case '\t':
                        state = stateWHITESPACE;
                        break;
                    case '"':
                        buffer.append(ch);
                        state = stateQUOTED_DATA;
                        break;
                    default:
                        state = stateDATA;
                        buffer.append(ch);
                        break;
                    }
                    break;
                case stateWHITESPACE:
                    switch (ch) {
                    case '#':
                        state = stateCOMMENT;
                        break;
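                    case ',':
                        // A blank column such as "a, ,b" is still a column: record it as empty
                        // so later columns do not shift left.
                        tokens.add(clean(buffer));
                        buffer = new StringBuffer();
                        state = stateNEW_TOKEN;
                        break;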
                    case '"':
                        buffer.append(ch);
                        state = stateQUOTED_DATA;
                        break;
                    case ' ':
                    case '\t':
                        break;
                    default:
                        state = stateDATA;
                        buffer.append(ch);
                        break;
                    }
                    break;
                default:
                    break;
                }
            }
            if (state == stateQUOTED_DATA)
                throw new IOException("Unmatched quotes in line:\n" + line);
            if (state != stateCOMMENT)
                tokens.add(clean(buffer));
            return tokens;
        }
    
        // Trims the value and removes the enclosing quotes that the quoted states keep in the buffer.
        public String clean(StringBuffer buffer) {
            String string = buffer.toString().trim();
            if (string.startsWith(DOUBLE_QUOTE))
                return string.substring(1, string.length() - 1);
            return string;
        }
    }
    