Fastest way to parse large CSV files in Pandas

悲哀的现实 2020-12-09 08:22

I am using pandas to analyse the large data files at http://www.nielda.co.uk/betfair/data/ (each is around 100 MB in size).

Each load from CSV takes a few seconds.

3 Answers
  •  心在旅途
    2020-12-09 09:22

    Modin is an early-stage project at UC Berkeley's RISELab designed to facilitate the use of distributed computing for data science. It is a multiprocess DataFrame library with an API identical to that of pandas, so users can speed up their existing pandas workflows. The project reports that Modin accelerates pandas queries by 4x on an 8-core machine, while requiring users to change only a single line of code in their notebooks.

    pip install modin
    

    If you want to use Dask as the engine (the quotes keep shells such as zsh from expanding the brackets):

    pip install "modin[dask]"
    

    Then import Modin in place of pandas:

    import modin.pandas as pd
    

    It uses all available CPU cores to read the CSV file, and its API is almost identical to that of pandas.
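
    To make the single-line change concrete, here is a minimal sketch of how the import swap might look in a script. The file name, column names, and the helpers `write_sample_csv` and `load_csv` are hypothetical and exist only for illustration. The fallback to plain pandas is also an assumption, added so the script still runs where Modin is not installed. Any actual speedup depends on your machine and data, so measure it on your own files.

```python
import csv
import os
import tempfile

try:
    # Drop-in replacement for pandas; requires an engine such as Dask or Ray.
    import modin.pandas as pd
    BACKEND = "modin"
except ImportError:
    # Assumption for this sketch: fall back to plain pandas if Modin is missing.
    import pandas as pd
    BACKEND = "pandas"


def write_sample_csv(path, n_rows=1000):
    """Write a small, made-up CSV so the example is self-contained."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["event_id", "odds", "volume"])
        for i in range(n_rows):
            writer.writerow([i, 1.5 + (i % 10) * 0.1, i * 2])


def load_csv(path):
    """Read a CSV with whichever backend was imported as `pd`."""
    return pd.read_csv(path)


if __name__ == "__main__":
    # The __main__ guard matters because Modin engines start worker processes.
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sample.csv")
        write_sample_csv(sample_path)
        df = load_csv(sample_path)
        print(f"backend={BACKEND}, shape={df.shape}")
```

    The rest of the code keeps calling `pd.read_csv` and other pandas-style methods unchanged; only the import line differs between the two backends.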
