Why do my lists become strings after saving to csv and re-opening? Python

Submitted by 魔方 西西 on 2020-01-14 05:34:07

Question


I have a Dataframe in which each row contains a sentence followed by a list of part-of-speech tags, created with spaCy:

df.head()

   question             POS_tags            
0  A title for my ...   [DT, NN, IN,...]  
1  If one of the ...    [IN, CD, IN,...]  

When I write the DataFrame to a CSV file (encoding='utf-8') and re-open it, the data format seems to have changed: each POS tag now appears wrapped in single quotes, like this:

df.head()

   question             POS_tags                    
0  A title for my ...   ['DT', 'NN', 'IN',...]  
1  If one of the ...    ['IN', 'CD', 'IN',...]  

When I now try to use the POS tags in further operations, it turns out they are no longer lists but strings, quotation marks included. They still look like lists, but they are not. This becomes clear when I run:

q = df['POS_tags']
q = list(q)
print(q)

Which results in:

["['DT', 'NN', 'IN']"]

What is going on here?

I want either the column 'POS_tags' to still contain lists after saving to CSV and re-opening, or an operation I can apply to the 'POS_tags' column to get back the same lists that spaCy originally created. Any advice on how to do this?
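The behaviour can be reproduced in a few lines. This is a minimal sketch that assumes a hypothetical two-row DataFrame standing in for the spaCy output and writes to an in-memory buffer (io.StringIO) instead of a real file. It shows that to_csv writes each list as its text representation and read_csv hands that text back as a plain string.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the spaCy output described above
df = pd.DataFrame({
    "question": ["A title for my question", "If one of the values"],
    "POS_tags": [["DT", "NN", "IN"], ["IN", "CD", "IN"]],
})

# Write to an in-memory CSV buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)

# Each list is stored as its string form, e.g. "['DT', 'NN', 'IN']"
print(buf.getvalue())

# Read it back: read_csv only sees text, so the cell comes back as a str
buf.seek(0)
df2 = pd.read_csv(buf)
print(type(df2.loc[0, "POS_tags"]))
print(repr(df2.loc[0, "POS_tags"]))
```

Because CSV has no notion of a "list" cell, nothing in the file tells read_csv to rebuild one; the quotes seen in the question are simply part of the stored text.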


Answer 1:


To preserve the exact structure of the DataFrame, an easy solution is to serialize it in pickle format with pd.to_pickle instead of using CSV. A CSV file stores every cell as plain text (a list is written out as its string representation, e.g. "['DT', 'NN', 'IN']"), so all information about data types is thrown away and has to be reconstructed manually after re-import. One drawback of pickle is that it is not human-readable.

import pandas as pd

# Save to pickle
df.to_pickle('pickle-file.pkl')
# Save with compression
df.to_pickle('pickle-file.pkl.gz', compression='gzip')

# Load pickle from disk
df = pd.read_pickle('pickle-file.pkl')   # or...
df = pd.read_pickle('pickle-file.pkl.gz', compression='gzip')
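To check that pickle really keeps the lists intact, a small round trip can be run in a temporary directory. This is a sketch using the same hypothetical two-row DataFrame as a stand-in; the file name is illustrative, and it relies on pandas inferring gzip compression from the ".gz" suffix (the default compression="infer").

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the spaCy-tagged DataFrame
df = pd.DataFrame({
    "question": ["A title for my question", "If one of the values"],
    "POS_tags": [["DT", "NN", "IN"], ["IN", "CD", "IN"]],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pickle-file.pkl.gz")
    # Compression is inferred from the .gz suffix
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cells are still Python lists, with the same contents as before
print(type(restored.loc[0, "POS_tags"]))
print(restored["POS_tags"].tolist() == df["POS_tags"].tolist())
```

Comparing the columns via tolist() keeps the check to plain Python list equality, which avoids any ambiguity about how pandas compares list-valued object cells.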

Fixing lists after importing from CSV

If you've already imported from CSV, this should convert the POS_tags column from strings back to Python lists:

from ast import literal_eval

# literal_eval safely parses each string such as "['DT', 'NN', 'IN']" into a real list
df['POS_tags'] = df['POS_tags'].apply(literal_eval)
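The same conversion can also be done while reading, through the converters argument of pd.read_csv, so the column never exists as strings in the DataFrame. This is a sketch that assumes the CSV layout produced by to_csv above; the inline csv_text is a hypothetical stand-in for the real file, whose path would be passed instead.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for the saved CSV; in practice pass the file path
csv_text = (
    "question,POS_tags\n"
    "A title for my question,\"['DT', 'NN', 'IN']\"\n"
    "If one of the values,\"['IN', 'CD', 'IN']\"\n"
)

# converters runs literal_eval on every raw POS_tags cell during parsing
df = pd.read_csv(io.StringIO(csv_text), converters={"POS_tags": literal_eval})

print(df.loc[0, "POS_tags"])
print(type(df.loc[0, "POS_tags"]))
```

One caveat: an empty cell most likely reaches the converter as an empty string, which literal_eval rejects, so a column with missing values would need a small wrapper around literal_eval.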


Source: https://stackoverflow.com/questions/49580996/why-do-my-lists-become-strings-after-saving-to-csv-and-re-opening-python
