Read CSV file with Java - comma delimiter in text field

逝去的感伤 2020-12-22 10:29

I have a comma-separated CSV file that contains NASDAQ symbols. I use a Scanner to read the file:

  s = new Scanner(new File("C:\\nasdaq_companylist.csv")).useD


        
4 Answers
  • 2020-12-22 11:12

    As others have correctly pointed out, rolling your own CSV parser is not a good idea, as it will usually leave huge security holes in your system.

    That said, I use this regex:

    "((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)"
    

    which does a good job with well-formed CSV data. You will need to use a Pattern and a Matcher with it.

    This is what it does:

    /*
     ( - Field Group
       (?: - Non-capturing (because the outer group will do the capturing) consume of quoted strings
        \"  - Start with a quote
        [^\"]*? - Non-greedy match on anything that is not a quote
        \" - End with a quote
       )* - And repeat
      | - Or
       [^\"] - Not starting with a quote
       [^,]*? - Non-greedy match on anything that is not a comma
     ) - End field group
     ( - Separator group
      [,]|$ - Comma separator or end of line
     ) - End separator group 
    */
    

    Note that it parses the data into two groups: the field and the separator. It also leaves the quote characters in the field, so you may wish to remove them, replace "" with ", and so on.
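
    As a rough illustration of how the regex above could be driven with a Pattern and a Matcher, here is a minimal sketch. The class name CsvRegexSketch, the method parseLine, and the sample line are made up for this example. It assumes one well-formed record per line, strips the surrounding quotes, and turns "" into ". The loop stops once the separator group is empty, because otherwise the pattern would match one extra empty field at the end of the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CsvRegexSketch {
    // The regex from the answer, unchanged: group 1 is the field, group 2 the separator.
    private static final Pattern FIELD =
            Pattern.compile("((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group(1);
            // Quoted field: drop the outer quotes and un-double any inner quotes.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
            // An empty separator means the match ended at end of line, so stop here.
            if (m.group(2).isEmpty()) {
                break;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma inside a quoted field.
        for (String field : parseLine("AAPL,\"Apple, Inc.\",123")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

    This only behaves sensibly on well-formed input. A stray quote in the middle of an unquoted field, or a record that spans several lines, is outside what this sketch handles.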

  • 2020-12-22 11:14

    Your safest bet is to use a CSV parsing library. Your comma is enclosed in quotes, so you would need to implement logic that looks for quoted commas. You would also need to plan for other situations, such as a quote within a quoted field, escape sequences, etc. It is better to use a ready-made, tested solution. Search Google and you will find several. CSV files can be tricky to parse on your own.
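
    To give an idea of what "logic to look for quoted commas" involves, here is a minimal character-by-character sketch. The class name QuotedCommaSketch and the method parseLine are hypothetical. It assumes a single-line record and treats a doubled quote inside a quoted field as one literal quote. It does not cover backslash escapes, records spanning several lines, or malformed input, which is part of why a tested library is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, shown only to illustrate the quoting logic.
class QuotedCommaSketch {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas in here are plain text
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma inside a quoted field.
        for (String field : parseLine("AAPL,\"Apple, Inc.\",123")) {
            System.out.println("[" + field + "]");
        }
    }
}
```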

  • 2020-12-22 11:22

    I hope you can remove \\s* from your regular expression. Then you could have:

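    while (s.hasNext()) {
        String symbol = s.next();
        // A token that opens with a quote may have been cut at a comma inside the quotes.
        if (symbol.startsWith("\"")) {
            // Keep appending tokens until the closing quote shows up.
            while ((!symbol.endsWith("\"") || symbol.length() == 1) && s.hasNext()) {
                symbol += "," + s.next();
            }
        }
    ...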
    
  • 2020-12-22 11:26

    Unless this is homework, you should not parse CSV yourself. Use one of the existing libraries, for example this one: http://commons.apache.org/sandbox/csv/

    Or google "java csv parser" and choose another.

    But if you wish to implement the logic yourself, you should use the negative lookahead feature of regular expressions (see http://download.oracle.com/javase/1,5.0/docs/api/java/util/regex/Pattern.html).
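
    One possible way to apply a negative lookahead here, sketched under the assumption that every line is well formed (quotes always come in pairs): split on a comma unless an odd number of quote characters remains after it, since an odd count means the comma sits inside a quoted field. The class name CsvLookaheadSketch, the method splitLine, and the sample line are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class illustrating a lookahead-based split.
class CsvLookaheadSketch {
    // A comma NOT followed by an odd number of quotes up to end of line:
    //   [^"]*"               one quote,
    //   (?:[^"]*"[^"]*")*    then any number of quote pairs,
    //   [^"]*$               then no more quotes until the end.
    private static final String SEPARATOR =
            ",(?![^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> splitLine(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return Arrays.asList(line.split(SEPARATOR, -1));
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma inside a quoted field.
        for (String field : splitLine("AAPL,\"Apple, Inc.\",123")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

    The fields keep their surrounding quotes, so they still need to be stripped afterwards. The lookahead also rescans the rest of the line at every comma, which can get slow on very long lines.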
