Question
Every time I read a CSV file into lists, I use this long method. Can it be simplified?
- Creating empty List
- Reading file row-wise and appending to the list
import csv, sys
filename = 'mtms_excelExtraction_m_Model_Definition.csv'
Ana_Type = []
Ana_Length = []
Ana_Text = []
Ana_Space = []
with open(filename, 'rt') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            Ana_Type.append(row[0])
            Ana_Length.append(row[1])
            Ana_Text.append(row[2])
            Ana_Space.append(row[3])
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
Answer 1:
This is a good opportunity for you to start using pandas and working with DataFrames.
import pandas as pd
df = pd.read_csv(path_to_csv)
1-2 (depending on if you count the import) lines of code and you're done!
Answer 2:
This is essentially the numpy way of processing the csv file, without using numpy. Whether it is better than your original method is largely a matter of taste. Like the numpy or Pandas approach, it loads the whole file into memory and then transposes it into lists:
import csv
with open(filename, 'rt') as f:
    reader = csv.reader(f)
    tmp = list(reader)
# Transpose: build one list per column index j
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [[tmp[i][j] for i in range(len(tmp))]
                                             for j in range(len(tmp[0]))]
It uses less code and builds the lists with comprehensions instead of repeated appends, but it needs more memory (as would numpy or pandas).
Depending on how you later process the data, numpy or Pandas could be a nice option, but IMHO using them only to load a csv file into lists is not worth it.
Answer 3:
You can use a DictReader:
import csv
with open(filename, 'rt') as f:
    data = list(csv.DictReader(f, fieldnames=["Type", "Length", "Text", "Space"]))
print(data)
This will give you a single list of dict objects, one per row.
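If you still want one list per column, it can be derived from the list of dicts afterwards. Here is a minimal sketch, assuming a file without a header row; the in-memory sample data is made up for illustration and stands in for the real file:

```python
import csv
import io

# Hypothetical sample data (no header row), standing in for the real file
buf = io.StringIO("type1,10,text1,space1\ntype2,20,text2,space2\n")
fieldnames = ["Type", "Length", "Text", "Space"]
data = list(csv.DictReader(buf, fieldnames=fieldnames))

# Pull a single column out of the list of row dicts
types = [row["Type"] for row in data]

# Or build every column at once, keyed by field name
columns = {name: [row[name] for row in data] for name in fieldnames}
```

Note that all values are strings at this point; convert them (for example with int) if you need numbers.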
Answer 4:
Try this:
import csv
from collections import defaultdict
d = defaultdict(list)
with open(filename, mode='r') as csv_file:
    # Without fieldnames, DictReader takes the column names from the first row
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        for k, v in row.items():
            d[k].append(v)
Next:
d.keys()
dict_keys(['Ana_Type', 'Ana_Length', 'Ana_Text', 'Ana_Space'])
Next:
d.get('Ana_Type')
['bla','bla1','df','ccc']
Answer 5:
The repetitive calls to list.append can be avoided by reading the csv and using the zip builtin function to transpose the rows.
import io, csv
# Create an example file
buf = io.StringIO('type1,length1,text1,space1\ntype2,length2,text2,space2\ntype3,length3,text3,space3')
reader = csv.reader(buf)
# Uncomment the next line if there is a header row
# next(reader)
Ana_Types, Ana_Length, Ana_Text, Ana_Space = zip(*reader)
print(Ana_Types)
('type1', 'type2', 'type3')
print(Ana_Length)
('length1', 'length2', 'length3')
...
If you need lists rather than tuples, you can use a list comprehension or generator expression to convert them. The reader is consumed by the first zip call, so run this on a fresh reader:
Ana_Types, Ana_Length, Ana_Text, Ana_Space = [list(x) for x in zip(*reader)]
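Put together, a small sketch of this approach for a file that does have a header row might look like the following. The sample data and header names are assumptions for illustration only:

```python
import csv
import io

# Hypothetical sample data with a header row (not the original file)
buf = io.StringIO(
    "Ana_Type,Ana_Length,Ana_Text,Ana_Space\n"
    "type1,length1,text1,space1\n"
    "type2,length2,text2,space2\n"
)
reader = csv.reader(buf)
header = next(reader)  # consume the header row before transposing

# zip(*reader) transposes rows into columns; list() converts each tuple
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [list(col) for col in zip(*reader)]

# The reader is now exhausted, so iterating it again yields nothing
leftover = list(reader)
```

Keep in mind that zip stops at the shortest row, so a row with fewer than four fields would silently truncate the result.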
Answer 6:
This could be useful:
import numpy as np
# read the rows with NumPy (set delimiter to match your file)
rows = np.genfromtxt('data.csv',dtype='str',delimiter=';')
# call numpy.transpose to convert the rows to columns
cols = np.transpose(rows)
# get the stuff as lists
Ana_Type = list(cols[0])
Ana_Length = list(cols[1])
Ana_Text = list(cols[2])
Ana_Space = list(cols[3])
Edit: note that if the file has a header row, the first element of each list will be the column name (example with test data):
['Date', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06']
Source: https://stackoverflow.com/questions/63056391/is-there-any-better-way-for-reading-files