Dealing with commas in a CSV file

傲寒 2020-11-21 06:53

I am looking for suggestions on how to handle a CSV file that our customers create and then upload, and that may contain a comma inside a value, such as a company name.

27 answers
  • 2020-11-21 07:07

    CsvHelper, a .NET library available through NuGet, handles pretty much any well-formed CSV.

    Example to map to a class:

    var csv = new CsvReader( textReader );
    var records = csv.GetRecords<MyClass>();
    

    Example to read individual fields:

    var csv = new CsvReader( textReader );
    while( csv.Read() )
    {
        var intField = csv.GetField<int>( 0 );
        var stringField = csv.GetField<string>( 1 );
        var boolField = csv.GetField<bool>( "HeaderName" );
    }
    

    Letting the client drive the file format:
    The comma (,) is the standard field delimiter, and the double quote (") is the standard character for quoting fields that contain a delimiter, a quote, or a line ending.
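
    For comparison, here is a minimal sketch of the same quoting convention using Python's standard csv module rather than CsvHelper. The sample rows are made up for illustration; the point is only that a writer quotes the fields that need it and a reader undoes the quoting.

```python
import csv
import io

# Hypothetical sample rows; one value contains a comma, another a quote.
rows = [["id", "company"], ["1", "Acme, Inc."], ["2", 'Say "hi" Ltd']]

# csv.writer quotes only the fields that need it (QUOTE_MINIMAL is the default)
# and doubles any quote character that appears inside a quoted field.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# csv.reader reverses the quoting, so the embedded comma survives the round trip.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```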

    To use (for example) # for fields and ' for escaping:

    var csv = new CsvReader( textReader );
    csv.Configuration.Delimiter = "#";
    csv.Configuration.Quote = '\'';
    // read the file however meets your needs
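
    The same idea of a custom delimiter and quote character, sketched with Python's standard csv module instead of CsvHelper. The input string is a hypothetical example; delimiter and quotechar are real parameters of csv.reader.

```python
import csv
import io

# Hypothetical input that uses '#' between fields and ' around fields
# that contain the delimiter.
data = "name#city\n'Acme # Sons'#Berlin\n"

reader = csv.reader(io.StringIO(data), delimiter="#", quotechar="'")
records = list(reader)
print(records)
```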
    


  • 2020-11-21 07:07

    If you are on a *nix system with access to sed, and the unwanted commas can occur only in one specific field of your CSV, you can use the following one-liner to enclose that field in double quotes, as RFC 4180 Section 2 proposes:

    sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile
    

    Depending on which field may contain the unwanted comma(s), you have to alter or extend the capturing groups of the regex (and the substitution).
    The example above encloses the fourth field (out of six) in quotation marks.


    Combined with the --in-place option, sed applies these changes directly to the file.

    In order to "build" the right regex, there's a simple principle to follow:

    1. For every field in your CSV that comes before the field with the unwanted comma(s) you write one [^,]*, and put them all together in a capturing group.
    2. For the field that contains the unwanted comma(s) you write (.*).
    3. For every field after the field with the unwanted comma(s) you write one ,.* and put them all together in a capturing group.
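
    The three steps above can be sketched as a small helper that builds the regex for any field position. This uses Python's re module instead of sed, and the function name quote_field and its parameters are made up for illustration. It assumes every line has exactly the expected number of fields apart from the extra commas in the one affected field.

```python
import re


def quote_field(line: str, field: int, total: int) -> str:
    """Enclose the 1-based `field` of `line` in double quotes.

    `total` is the number of fields the line is supposed to have; only
    `field` is allowed to contain extra commas.
    """
    before = "[^,]*," * (field - 1)   # step 1: one "[^,]*," per preceding field
    after = ",.*" * (total - field)   # step 3: one ",.*" per following field
    # step 2: "(.*)" captures the field with the unwanted comma(s)
    pattern = re.compile(f"^({before})(.*)({after})$")
    return pattern.sub(r'\1"\2"\3', line)


# Fourth field out of six, with one unwanted comma inside it.
print(quote_field("a,b,c,d,e,f,g", 4, 6))
# First and last field out of three.
print(quote_field("Acme, Inc.,Berlin,DE", 1, 3))
print(quote_field("1,Berlin,Acme, Inc.", 3, 3))
```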

    Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.

    (.*)(,.*,.*)                     #first field (out of three fields), regex
    "\1"\2                           #first field, substitution
    
    ([^,]*,[^,]*,)(.*)               #last field (out of three fields), regex
    \1"\2"                           #last field, substitution
    
    
    ([^,]*,)(.*)(,.*,.*,.*)          #second field (out of five fields)
    ([^,]*,[^,]*,)(.*)(,.*)          #third field (out of four fields)
    ([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)
    

    If you want to remove the unwanted comma(s) with sed instead of enclosing them with quotation marks refer to this answer.

  • 2020-11-21 07:08

    Here's a neat little workaround:

    You can use a Greek Lower Numeral Sign instead (U+0375)

    It looks like this ͵

    Using this method saves you a lot of resources too...
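
    A minimal sketch of what this substitution could look like in Python. The helper names are made up, and it assumes the data never contains U+0375 already. Note that the file then no longer holds real commas in those values, so whoever reads it has to know about the convention and swap the character back.

```python
GREEK_LOWER_NUMERAL_SIGN = "\u0375"  # looks like a comma, but is a different character


def hide_commas(value: str) -> str:
    """Replace real commas so the value cannot collide with the delimiter."""
    return value.replace(",", GREEK_LOWER_NUMERAL_SIGN)


def restore_commas(value: str) -> str:
    """Undo hide_commas after the line has been split into fields."""
    return value.replace(GREEK_LOWER_NUMERAL_SIGN, ",")


fields = ["1", "Acme, Inc.", "Berlin"]
line = ",".join(hide_commas(f) for f in fields)
recovered = [restore_commas(f) for f in line.split(",")]
print(line)
print(recovered)
```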

  • 2020-11-21 07:08

    Just use SoftCircuits.CsvParser on NuGet. It will handle all those details for you and efficiently handles very large files. And, if needed, it can even import/export objects by mapping columns to object properties. In addition, my testing showed it averages nearly 4 times faster than the popular CsvHelper.

  • 2020-11-21 07:08

    An example might help to show how commas can be displayed in a .csv file. Create a simple text file with the lines below, save it with the suffix ".csv", and open it with Excel 2000 on Windows 10.

    aa,bb,cc,d;d
    "In the spreadsheet presentation, the below line should look like the above line except the below shows a displayed comma instead of a semicolon between the d's."
    aa,bb,cc,"d,d", This works even in Excel

    aa,bb,cc,"d,d", This works even in Excel 2000
    aa,bb,cc,"d ,d", This works even in Excel 2000
    aa,bb,cc,"d , d", This works even in Excel 2000

    aa,bb,cc, " d,d", This fails in Excel 2000 due to the space before the 1st quote
    aa,bb,cc, " d ,d", This fails in Excel 2000 due to the space before the 1st quote
    aa,bb,cc, " d , d", This fails in Excel 2000 due to the space before the 1st quote

    aa,bb,cc,"d,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
    aa,bb,cc,"d ,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
    aa,bb,cc,"d , d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.

    Rule: to display a comma in a cell (field) of a .csv file, start and end the field with a double quote, and avoid white space before the first quote.
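
    Parsers other than Excel can show the same sensitivity to a space before the opening quote. As a sketch, Python's standard csv module treats the quote as ordinary text by default when a space precedes it, and only honours it when skipinitialspace=True is passed. The two sample lines are adapted from the file above.

```python
import csv

lines = [
    'aa,bb,cc,"d,d",ok',      # quote directly after the delimiter
    'aa,bb,cc, "d,d",space',  # space before the opening quote
]

# Default behaviour: the quote after the space is not treated as a field
# quote, so the comma inside it splits the field in two.
default_rows = list(csv.reader(lines))

# skipinitialspace=True drops the space after the delimiter, so the quote
# is recognised again and the field stays in one piece.
lenient_rows = list(csv.reader(lines, skipinitialspace=True))

print(default_rows[1])
print(lenient_rows[1])
```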

  • 2020-11-21 07:11

    I used the PapaParse library to parse the CSV file into key-value pairs, with the first (header) row of the file supplying the keys.

    Here is the example that I use:

    https://codesandbox.io/embed/llqmrp96pm

    It includes a dummy.csv file for the CSV parsing demo.

    I used it within ReactJS, though it is simple to replicate in an app written in any language.
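
    As a sketch of the same header-to-key-value idea in another language, here is what it might look like with Python's standard csv.DictReader. This is not the PapaParse API, and the sample data is invented for illustration.

```python
import csv
import io

# Hypothetical file content: the first row is the header, and one value
# contains a comma inside double quotes.
sample = 'name,company\nAnn,"Acme, Inc."\nBob,Globex\n'

# DictReader uses the header row as keys for every following row.
records = list(csv.DictReader(io.StringIO(sample)))
for record in records:
    print(record["name"], "->", record["company"])
```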
