using papa parse for big csv files

拟墨画扇 提交于 2020-01-13 03:13:09

问题


I am trying to load a file that has about 100k in lines and so far the browser has been crashing ( locally ). I looked on the internet and saw Papa Parse seems to handle large files. Now it is reduced down to about 3-4 minutes to load into the textarea. Once the file is loaded, I then want to do some more jQuery to do counts and things so the process is taking awhile. Is there a way to make the csv load faster? Am I using the program correctly?

<div id="tabs">
<ul>
  <li><a href="#tabs-4">Generate a Report</a></li>
</ul>
<div id="tabs-4">
  <h2>Generating a CSV report</h2>
  <h4>Input Data:</h4>      
  <input id="myFile" type="file" name="files" value="Load File" />
  <button onclick="loadFileAsText()">Load Selected File</button>
  <form action="./" method="post">
  <textarea id="input3" style="height:150px;"></textarea>

  <input id="run3" type="button" value="Run" />
  <input id="runSplit" type="button" value="Run Split" />
  <input id="downloadLink" type="button" value="Download" />
  </form>
</div>
</div>

$(function () {
    $("#tabs").tabs();
});

var data = $('#input3').val();

function handleFileSelect(evt) {
    var file = evt.target.files[0];

Papa.parse(file, {
    header: true,
    dynamicTyping: true,
    complete: function (results) {
        data = results;
    }
});
}

$(document).ready(function () {

    $('#myFile').change(function(handleFileSelect){

    });
});


function loadFileAsText() {
    var fileToLoad = document.getElementById("myFile").files[0];

    var fileReader = new FileReader();
    fileReader.onload = function (fileLoadedEvent) {
        var textFromFileLoaded = fileLoadedEvent.target.result;
        document.getElementById("input3").value = textFromFileLoaded;
    };
    fileReader.readAsText(fileToLoad, "UTF-8");
}

回答1:


You probably are using it correctly, it is just the program will take some time to parse through all 100k lines!

This is probably a good use case scenario for Web Workers.

If you've never used them before, this site gives a decent rundown, but the key part is:

Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.

Browser coverage is pretty decent as well, with IE10 and below being the only semi-modern browsers that don't support it.

Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.

I'll try to get a working example with web workers for you, but also note that this won't speed up the script, it'll just make it process asynchronously so your page stays responsive.

EDIT:

(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parser script within worker.js using the importScript function (which is globally defined within the worker thread). See the MDN page for more info on that.)

Here is my working example:

csv.html

<!doctype html>
<html>
<head>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>

<body>
  <input id="myFile" type="file" name="files" value="Load File" />
  <br>
  <button class="load-file">Load and Parse Selected CSV File</button>
  <div id="report"></div>

<script>
// initialize our parsed_csv to be used wherever we want
var parsed_csv;
var start_time, end_time;

// document.ready
$(function() {

  $('.load-file').on('click', function(e) {
    start_time = performance.now();
    $('#report').text('Processing...');

    console.log('initialize worker');

    var worker = new Worker('worker.js');
    worker.addEventListener('message', function(ev) {
      console.log('received raw CSV, now parsing...');

      // Parse our CSV raw text
      Papa.parse(ev.data, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
            // Save result in a globally accessible var
          parsed_csv = results;
          console.log('parsed CSV!');
          console.log(parsed_csv);

          $('#report').text(parsed_csv.data.length + ' rows processed');
          end_time = performance.now();
          console.log('Took ' + (end_time - start_time) + " milliseconds to load and process the CSV file.")
        }
      });

      // Terminate our worker
      worker.terminate();
    }, false);

    // Submit our file to load
    var file_to_load = document.getElementById("myFile").files[0];

    console.log('call our worker');
    worker.postMessage({file: file_to_load});
  });

});
</script>
</body>

</html>

worker.js

self.addEventListener('message', function(e) {
    console.log('worker is running');

    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');

        var textFromFileLoaded = fileLoadedEvent.target.result;

        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);

GIF of it processing, takes less than a second (all running locally)




回答2:


As of v5, PapaParse has now baked in WebWorkers.

A simple example of invoking the worker within Papaparse is below

Papa.parse(bigFile, {
    worker: true,
    step: function(results) {
        console.log("Row:", results.data);
    }
});

No need to re-implement if you have your own worker with PP, but for future projects, some may find it easier to use PapaParse's solution.



来源:https://stackoverflow.com/questions/38100011/using-papa-parse-for-big-csv-files

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!