Is there a better way to read files?

Submitted by 六眼飞鱼酱① on 2020-12-15 07:34:27

Question


Every time I read a CSV file into lists, I use this long method. Can it be simplified?

  1. Create an empty list for each column
  2. Read the file row by row and append each field to its list

import csv
import sys

filename = 'mtms_excelExtraction_m_Model_Definition.csv'
Ana_Type = []
Ana_Length = []
Ana_Text = []
Ana_Space = []
# newline='' is what the csv module recommends when opening files
with open(filename, 'rt', newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            Ana_Type.append(row[0])
            Ana_Length.append(row[1])
            Ana_Text.append(row[2])
            Ana_Space.append(row[3])
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

Answer 1:


This is a good opportunity for you to start using pandas and working with DataFrames.

import pandas as pd

df = pd.read_csv(filename)  # filename as defined in the question

One or two lines of code (depending on whether you count the import) and you're done!
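read_csv returns a DataFrame rather than four lists. A minimal sketch of getting the lists back, assuming the file has no header row (the question's csv.reader loop treats every row as data); the in-memory sample data and the column names passed to names= are illustrative, not taken from the real file. dtype=str keeps every field as a string, as csv.reader does.

```python
import io

import pandas as pd

# Made-up stand-in for the real file; header=None because the question's
# loop treats every row as data. Column names here are hypothetical.
sample = io.StringIO("t1,10,hello,2\nt2,20,world,3\n")
df = pd.read_csv(
    sample,
    header=None,
    names=["Ana_Type", "Ana_Length", "Ana_Text", "Ana_Space"],
    dtype=str,  # keep fields as strings, like csv.reader
)

# Each column can be turned back into a plain Python list
Ana_Type = df["Ana_Type"].tolist()
print(Ana_Type)  # ['t1', 't2']
```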




Answer 2:


This is essentially the numpy way of processing the CSV file, without using numpy. Whether it is better than your original method is largely a matter of taste. Like the numpy and Pandas approaches, it loads the whole file into memory and then transposes it into lists:

import csv

with open(filename, 'rt', newline='') as f:
    reader = csv.reader(f)
    tmp = list(reader)
# transpose: column j collects field j of every row
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [[tmp[i][j] for i in range(len(tmp))]
                                             for j in range(len(tmp[0]))]

It uses less code and builds the lists with comprehensions instead of repeated appends, but it uses more memory (as numpy or pandas would).

Depending on how you later process the data, numpy or Pandas could be a nice option, but IMHO using them only to load a CSV file into lists is not worth it.
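One caveat with the comprehension in this answer: range(len(tmp[0])) raises IndexError when the file is empty. A minimal sketch with a hypothetical read_columns helper that takes the expected number of columns instead, so an empty file simply yields empty lists; the sample data is made up.

```python
import csv
import io


def read_columns(f, ncols):
    """Hypothetical helper: read a CSV file object into ncols column lists."""
    rows = list(csv.reader(f))
    # Iterate over a fixed column count rather than len(rows[0]), so an
    # empty file gives ncols empty lists instead of an IndexError.
    return [[row[j] for row in rows] for j in range(ncols)]


Ana_Type, Ana_Length, Ana_Text, Ana_Space = read_columns(
    io.StringIO("t1,l1,x1,s1\nt2,l2,x2,s2\n"), 4
)
print(Ana_Space)  # ['s1', 's2']

empty = read_columns(io.StringIO(""), 4)
print(empty)  # [[], [], [], []]
```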




Answer 3:


You can use a csv.DictReader:

import csv

with open(filename, 'rt', newline='') as f:
    data = list(csv.DictReader(f, fieldnames=["Type", "Length", "Text", "Space"]))

print(data)

This will give you a single list of dict objects, one per row.
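Because fieldnames is passed explicitly, the first row is read as data rather than as a header. A small sketch with made-up sample data showing how to read a field and how to pull one column back out of the list of dicts:

```python
import csv
import io

# Made-up sample data standing in for the real file (no header row)
sample = io.StringIO("t1,10,hello,2\nt2,20,world,3\n")
data = list(csv.DictReader(sample, fieldnames=["Type", "Length", "Text", "Space"]))

print(data[0]["Text"])  # hello

# Pull a single column back out as a list
Ana_Type = [row["Type"] for row in data]
print(Ana_Type)  # ['t1', 't2']
```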




Answer 4:


Try this:

import csv
from collections import defaultdict
d = defaultdict(list)
with open(filename, mode='r', newline='') as csv_file:
    # without fieldnames, DictReader uses the first row as the keys
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        for k, v in row.items():
            d[k].append(v)

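Since no fieldnames are passed, DictReader takes the keys from the file's first (header) row. With a header of Ana_Type,Ana_Length,Ana_Text,Ana_Space you then get:

d.keys()
dict_keys(['Ana_Type', 'Ana_Length', 'Ana_Text', 'Ana_Space'])

d.get('Ana_Type')
['bla', 'bla1', 'df', 'ccc']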
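The same dict of columns can also be built without the inner loop by combining the header row with the zip transposition shown in another answer. A sketch with made-up sample data, assuming the header names match the output above. One difference: if the file has a header but no data rows, this gives an empty dict rather than keys with empty lists.

```python
import csv
import io

# Made-up sample file with a header row
sample = io.StringIO(
    "Ana_Type,Ana_Length,Ana_Text,Ana_Space\n"
    "bla,1,a,x\n"
    "bla1,2,b,y\n"
)
reader = csv.reader(sample)
header = next(reader)  # first row holds the column names
# zip(*reader) turns the remaining rows into one tuple per column
d = {name: list(col) for name, col in zip(header, zip(*reader))}
print(d["Ana_Type"])  # ['bla', 'bla1']
```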



Answer 5:


The repeated calls to list.append can be avoided by reading the CSV and transposing the rows with the built-in zip function.

import io, csv

# Create an example file
buf = io.StringIO('type1,length1,text1,space1\ntype2,length2,text2,space2\ntype3,length3,text3,space3')

reader = csv.reader(buf)
# Uncomment the next line if there is a header row
# next(reader)

Ana_Types, Ana_Length, Ana_Text, Ana_Space = zip(*reader)

print(Ana_Types)
('type1', 'type2', 'type3')
print(Ana_Length)
('length1', 'length2', 'length3')
...

If you need lists rather than tuples, use a list comprehension to convert them. The reader can only be iterated once, so this line replaces the zip(*reader) line above rather than following it:

Ana_Types, Ana_Length, Ana_Text, Ana_Space = [list(x) for x in zip(*reader)]
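One caveat: zip stops at its shortest argument, so if any row has fewer fields than the others, whole trailing columns are silently dropped and the four-way unpacking fails. A sketch using itertools.zip_longest to pad short rows instead; the sample data (with a deliberately short second row) and the '' fill value are assumptions for illustration.

```python
import csv
import io
from itertools import zip_longest

# Made-up data: the second row is missing its last field
buf = io.StringIO('type1,length1,text1,space1\ntype2,length2,text2\n')
rows = list(csv.reader(buf))

# Plain zip only yields 3 columns here, because it stops at the shortest row
print(len(list(zip(*rows))))  # 3

# zip_longest pads missing fields with fillvalue instead of dropping columns
Ana_Types, Ana_Length, Ana_Text, Ana_Space = [
    list(x) for x in zip_longest(*rows, fillvalue='')
]
print(Ana_Space)  # ['space1', '']
```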



Answer 6:


This could be useful:

import numpy as np
# read the rows with NumPy; set delimiter to whatever separates the fields
# in your file (',' for a standard comma-separated CSV)
rows = np.genfromtxt('data.csv', dtype='str', delimiter=';')
# call numpy.transpose to convert the rows to columns
cols = np.transpose(rows)

# get the columns as lists
Ana_Type = list(cols[0])
Ana_Length = list(cols[1])
Ana_Text = list(cols[2])
Ana_Space = list(cols[3])

Edit: note that the first element of each list will be the column name, because the header row is read as data (example with test data):

['Date', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06']
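If the file has a header row, genfromtxt's skip_header parameter drops it before the data is parsed. A sketch with made-up semicolon-separated data held in memory (encoding=None is passed so lines are handled as text); .T transposes rows into columns and .tolist() turns them into plain Python lists of strings.

```python
import io

import numpy as np

# Made-up data with a header row, standing in for data.csv
sample = io.StringIO("Type;Length;Text;Space\nt1;10;a;x\nt2;20;b;y\n")
rows = np.genfromtxt(sample, dtype=str, delimiter=';', skip_header=1, encoding=None)

# Transpose to columns, then unpack into plain Python lists
Ana_Type, Ana_Length, Ana_Text, Ana_Space = rows.T.tolist()
print(Ana_Type)  # ['t1', 't2']
```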


Source: https://stackoverflow.com/questions/63056391/is-there-any-better-way-for-reading-files
