Apps Script Utilities.parseCsv assumes new row on line breaks within double quotes

前端 未结 5 822
渐次进展
渐次进展 2020-12-18 08:39

When using Utilities.parseCsv() linebreaks encased inside double quotes are assumed to be new rows entirely. The output array from this function will have sever

相关标签:
5条回答
  • 2020-12-18 09:25

    I had this same problem and have finally figured it out. Thanks Douglas for the Regex/code (a bit over my head I must say) it matches up nicely to the field in question. Unfortunately, that is only half the battle. The replace shown will simply replaces the entire field with \r\n. So that only works when whatever is between the "" in the CSV file is only \r\n. If it is embedded in the field with other data it silently destroys that data. To solve the other half of the problem, you need to use a function as your replace. The replace takes the matching field as a parameter so so you can execute a simple replace call in the function to address just that field. Example...

    Data:

    "Student","Officer
    

    RD

    Special Member","Member",705,"2016-07-25 22:40:04 EDT"
    

    Code to process:

    var dataString = myBlob().getDataAsString(); 
    var escapedString = dataString.replace(/(?=["'])(?:"[^"\](?:\[\s\S][^"\])"|'[^'\]\r\n(?:\[\s\S][^'\]\r\n)')/g, function(match) { return match.replace(/\r\n/g,"\r\n")} ); 
    var csvData = Utilities.parseCsv(escapedString);
    

    Now the "Officer\r\nRD\r\nSpecial Member" field gets evaluated individually so the match.replace call in the replace function can be very straight forward and simple.

    0 讨论(0)
  • 2020-12-18 09:28

    So I had a somewhat large csv file about 10MB 50k rows, which contained a field at the end of each row with comments that users enter with all sorts of characters inside. I found the proposed regex solution was working when I tested a small set of the rows, but when I threw the big file to it, there was an error again and after trying a few things with the regex I even got to crash the whole runtime.

    BTW I'm running my code on the V8 runtime.

    After scratching my head for about an hour and with not really helpful error messages from AppsSript runtime. I had an idea, what if some weird users where deciding to use back-slashes in some weird ways making some escapes go wrong. So I tried replacing all back-slashes in my data with something else for a while until I had the array that parseCsv() returns. It worked! My hypothesis is that having a \ at the end of lines was breaking the replacement.

    So my final solution is:

    function testParse() {
        let csv =
            '"title1","title2","title3"\r\n' +
            '1,"person1","A ""comment"" with a \\ and \\\r\n a second line"\r\n' +
            '2,"person2","Another comment"';
    
        let sanitizedString =
            csv.replace(/\\/g, '::back-slash::')
                .replace(/(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r?\n(?:\\[\s\S][^'\\]\r?\n)*')/g,
                    match => match.replace(/\r?\n/g, "::newline::"));
        let arr = Utilities.parseCsv(sanitizedString);
        for (let i = 0, rows = arr.length; i < rows; i++) {
            for (let j = 0, cols = arr[i].length; j < cols; j++) {
                arr[i][j] = 
                    arr[i][j].replace(/::back-slash::/g,'\\')
                        .replace(/::newline::/g,'\r\n');
    
            }
        }
        Logger.log(arr)
    }

    Output:

    [20-02-18 11:29:03:980 CST] [[title1, title2, title3], [1, person1, A "comment" with a \ and \
     a second line], [2, person2, Another comment]]
    
    0 讨论(0)
  • 2020-12-18 09:35

    It may be helpful for you to use Sheets API. In my case, it works fine without replacing the CSV text that contains double-quoted multi-line text.

    First, you need to make sure of bellow:

    Enabling advanced services

    To use an advanced Google service, follow these instructions:

    1. In the script editor, select Resources > Advanced Google services....
    2. In the Advanced Google Service dialog that appears, click the on/off switch next to the service you want to use.
    3. Click OK in the dialog.

    If it is ok, you can import a CSV text data into a sheet with:

    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var sheet = ss.getSheetByName('some_name');
    const resource = {
        requests: [
            {
                pasteData: {
                    data: csvText, // Your CSV data string
                    coordinate: {sheetId: sheet.getSheetId()},
                    delimiter: ",",
                }
            }
    
        ]
    };
    Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
    

    or for TypeScript, which can be used by clasp:

    var ss = SpreadsheetApp.getActiveSpreadsheet();
    var sheet = ss.getSheetByName('some_name');
    const resource: GoogleAppsScript.Sheets.Schema.BatchUpdateSpreadsheetRequest = {
        requests: [
            {
                pasteData: {
                    data: csvText, // Your CSV data string
                    coordinate: {sheetId: sheet.getSheetId()},
                    delimiter: ",",
                }
            }
    
        ]
    };
    Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
    
    0 讨论(0)
  • 2020-12-18 09:38

    Retrieved and slightly modified a regex from another reply on another post: https://stackoverflow.com/a/29452781/3547347

    Regex: (?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r\n(?:\\[\s\S][^'\\]\r\n)*')

    Code:

      var dataString = myBlob.getDataAsString();
      var escapedString = dataString.replace(/(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r\n(?:\\[\s\S][^'\\]\r\n)*')/g, '\\r\\n');
    
    0 讨论(0)
  • 2020-12-18 09:44

    To avoid trying to understand regular expressions, I found a workaround below, not using Utilities.parseCsv(). I'm copying the data line by line.

    Here is how it goes:

    If you can find a way to add an extra column to the end of your CSV, that contains the exact same value all the time, then you can force a specific "line break separator" according to that value.

    Then, you copy the whole line into column A and use google app script' dedicated splitTextToColumns() method...

    In the example below, I'm getting the CSV from an HTML form. This works because I also have admin access to the database the user takes the CSV from, so I could force that last column on all CSV files...

    function updateSheet(form) {
      var fileData = form.myFile;
      // gets value from form
      blob = fileData.getBlob();
      var name = String(form.folderId);
      // gets value from form
    
      var ss = SpreadsheetApp.getActiveSpreadsheet();
      var sheet = ss.setActiveSheet(ss.getSheetByName(name), true);
    
      sheet.clearContents().clearFormats();
    
      var values = [];
      // below, the "Dronix" value is the value that I could force at the end of each row
      var rows = blob.contents.split('"Dronix",\n');
    
      if (rows.length > 1) {
        for (var r = 2, max_r = rows.length; r < max_r; ++r) {
          sheet.getRange(r + 6, 1, 1, 1).setValue(String(rows[r]));
        }
      }
    
      var spreadsheet = SpreadsheetApp.getActive();
      spreadsheet.getRange("A:A").activate();
      spreadsheet.getRange("A:A").splitTextToColumns();
    }
    
    0 讨论(0)
提交回复
热议问题