Python: skip comment lines marked with # in csv.DictReader

Posted by ぃ、小莉子 on 2019-12-17 08:36:44

Question


Processing CSV files with csv.DictReader is great - but I have CSV files with comment lines in (indicated by a hash at the start of a line), for example:

# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished

The csv module doesn't include any way to skip such lines.

I could easily do something hacky, but I imagine there's a nice way to wrap a csv.DictReader around some other iterator object that preprocesses the input and discards those lines.


Answer 1:


Actually this works nicely with filter:

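import csv

# newline='' is what the csv module recommends; the with-block closes the file
with open('samples.csv', newline='') as fp:
    # Drop lines that start with '#' before DictReader sees them
    rdr = csv.DictReader(filter(lambda row: not row.startswith('#'), fp))
    for row in rdr:
        print(row)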
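
The same filtering can also be written as a small generator, which makes it easy to tolerate whitespace before the hash. This is only a minimal sketch: the helper name skip_comments and the in-memory sample data are assumptions for illustration, not part of the original answer.

```python
import csv
import io

def skip_comments(lines, marker='#'):
    """Yield only lines whose first non-blank character is not the marker."""
    for line in lines:
        if not line.lstrip().startswith(marker):
            yield line

# Hypothetical sample data standing in for an open file object
sample = io.StringIO(
    "# step size=1.61853\n"
    "val0,val1,val2\n"
    "0.206895,0.797923,0.202077\n"
    "  # adaptation finished\n"
)

rows = list(csv.DictReader(skip_comments(sample)))
for row in rows:
    print(row)
```

Because csv.DictReader accepts any iterable of strings, the generator can wrap a real file object in exactly the same way.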



Answer 2:


Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the specific case of the OP, it is limited in that # must appear as the first symbol. A more generic solution would be:

import csv

def decomment(csvfile):
    """Yield each line with any '#' comment removed, skipping lines left empty."""
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

with open('dummy.csv') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)

As an example, the following dummy.csv file:

# comment
 # comment
a,b,c # comment
1,2,3
10,20,30
# comment

returns

['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']

Of course, this works just as well with csv.DictReader().
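
As a rough illustration of that claim, the sketch below feeds the same decomment generator into csv.DictReader. The dummy.csv contents are supplied through io.StringIO (an assumption made so the example is self-contained). Note that splitting on '#' also truncates any field that legitimately contains a hash, even inside quotes, so this approach presumably only suits data where '#' never appears in values.

```python
import csv
import io

def decomment(csvfile):
    """Yield each line with any '#' comment removed, skipping lines left empty."""
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

# In-memory stand-in for the dummy.csv file shown above
dummy = io.StringIO(
    "# comment\n"
    " # comment\n"
    "a,b,c # comment\n"
    "1,2,3\n"
    "10,20,30\n"
    "# comment\n"
)

# The first surviving line ("a,b,c") becomes the field names
records = list(csv.DictReader(decomment(dummy)))
for record in records:
    print(record)
```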




Answer 3:


Another option is to read the CSV file with pandas, whose read_csv accepts a comment character.

Here's some sample code:

import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',     # field separator
                 comment='#', # ignore everything from '#' to the end of the line
                 index_col=0, # number or label of index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn'  # pandas >= 1.3; replaces error_bad_lines=False, warn_bad_lines=True
                 ).sort_index()
print(df)
df.fillna('no value', inplace=True) # replace NaN with 'no value'
print(df)

For this csv file:

a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82

we will get this output:

       b   c     d   e
a                     
1    NaN  16   NaN  55
8   77.0  77   NaN  16
13  19.0  25  28.0  82
           b   c         d   e
a                             
1   no value  16  no value  55
8         77  77  no value  16
13        19  25        28  82


Source: https://stackoverflow.com/questions/14158868/python-skip-comment-lines-marked-with-in-csv-dictreader
