Read a small random sample from a big CSV file into a Python data frame

暖寄归人 2020-11-27 02:37

The CSV file that I want to read does not fit into main memory. How can I read a few (~10K) random lines of it and do some simple statistics on the selected data frame?

13 Answers
  •  旧巷少年郎
    2020-11-27 03:22

    Read the data file:

    import pandas as pd
    # Do not pass 'r' here: the second positional argument of read_csv is the separator, not a file mode
    df = pd.read_csv('data.csv')
    

    First, check the shape of df:

    # shape is an attribute (rows, columns), not a method
    print(df.shape)
    

    Create a small sample of 1000 rows from df:

    # replace must be the boolean False; the string 'False' is truthy and would sample with replacement
    sample_data = df.sample(n=1000, replace=False)
    

    Check the shape of sample_data:

    print(sample_data.shape)
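
The steps above load the entire file before sampling, so they only work when the CSV fits in memory. For the case in the question, one possible approach is to pick the row numbers first and let `read_csv` skip everything else. The following is only a sketch: `sample_csv` is a hypothetical helper name, and it assumes the file has exactly one header line, no trailing blank lines, and no line breaks inside quoted fields (otherwise the line count will not match the row count).

```python
import random

import pandas as pd


def sample_csv(path, n, seed=None):
    """Read n randomly chosen data rows from a CSV without loading all of it."""
    rng = random.Random(seed)

    # First pass: count data rows. Assumes one header line and no
    # newlines embedded inside quoted fields.
    with open(path, "rb") as f:
        total = sum(1 for _ in f) - 1

    # Never ask for more rows than the file contains.
    n = min(n, max(total, 0))

    # File line numbers to keep (line 0 is the header, data starts at 1).
    keep = set(rng.sample(range(1, total + 1), n))

    # Second pass: skip every line that is neither the header nor selected.
    # Only the n selected rows are ever held in memory.
    return pd.read_csv(path, skiprows=lambda i: i > 0 and i not in keep)
```

Calling `sample_csv('data.csv', 10_000)` would then give a data frame of about 10K rows, on which `describe()` or similar simple statistics can be run. The file is read twice and the callable `skiprows` is slower than a plain read, but memory use stays proportional to the sample size rather than the file size.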
    
