Why do my lists become strings after saving to csv and re-opening? Python

Question

I have a DataFrame in which each row contains a sentence followed by a list of part-of-speech tags, created with spaCy:

df.head()

   question             POS_tags            
0  A title for my ...   [DT, NN, IN,...]  
1  If one of the ...    [IN, CD, IN,...]

When I write the DataFrame to a CSV file (encoding='utf-8') and re-open it, the data format appears to have changed: the POS tags now appear between quotes ' ', like this:

df.head()

   question             POS_tags                    
0  A title for my ...   ['DT', 'NN', 'IN',...]  
1  If one of the ...    ['IN', 'CD', 'IN',...]

When I now try to use the POS tags for some operations, it turns out they are no longer lists but have become strings that even include the quotation marks. They still look like lists but are not. This is clear when doing:

q = df['POS_tags']
q = list(q)
print(q)

Which results in:

["['DT', 'NN', 'IN']"]

What is going on here?

Either I want the 'POS_tags' column to still contain lists after saving to CSV and re-opening, or I want an operation on that column that restores the lists spaCy originally created. Any advice on how to do this?

Answer 1:

To preserve the exact structure of the DataFrame, an easy solution is to serialize it in pickle format with df.to_pickle instead of using CSV. CSV stores every value as plain text, so a list is written out as its string representation and read back as a string; the original Python objects then have to be reconstructed manually after re-import. One drawback of pickle is that it's not human-readable.

import pandas as pd

# Save to pickle
df.to_pickle('pickle-file.pkl')
# Save with compression
df.to_pickle('pickle-file.pkl.gz', compression='gzip')

# Load pickle from disk
df = pd.read_pickle('pickle-file.pkl')   # or...
df = pd.read_pickle('pickle-file.pkl.gz', compression='gzip')
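
To see the difference between the two formats side by side, here is a minimal sketch. The toy DataFrame, the variable names, and the temporary file name are made up for illustration and are not from the original question:

```python
import io
import os
import tempfile

import pandas as pd

# Toy stand-in for the DataFrame in the question (values are made up).
df = pd.DataFrame({
    "question": ["A title for my ...", "If one of the ..."],
    "POS_tags": [["DT", "NN", "IN"], ["IN", "CD", "IN"]],
})

# CSV round trip: each list is written out as its text representation.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Pickle round trip: the Python objects themselves are serialized.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "pos_tags.pkl")  # hypothetical file name
    df.to_pickle(pickle_path)
    from_pickle = pd.read_pickle(pickle_path)

print(type(from_csv.loc[0, "POS_tags"]))     # <class 'str'>
print(type(from_pickle.loc[0, "POS_tags"]))  # <class 'list'>
```

The CSV version comes back as the string "['DT', 'NN', 'IN']", which is exactly the symptom described in the question, while the pickle version comes back as a real list.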

Fixing lists after importing from CSV

If you've already imported from CSV, this should convert the POS_tags column from strings back to Python lists. literal_eval parses a string containing a Python literal without executing arbitrary code:

from ast import literal_eval
df['POS_tags'] = df['POS_tags'].apply(literal_eval)
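
If you control the code that reads the CSV, the same conversion can probably be done at load time through the converters argument of pd.read_csv. This is a sketch under assumptions: the toy data and variable names are made up, and it assumes every cell in the column holds a valid list literal (an empty cell would make literal_eval raise an error):

```python
import io
from ast import literal_eval

import pandas as pd

# Toy stand-in for the original DataFrame (values are made up).
df = pd.DataFrame({
    "question": ["A title for my ...", "If one of the ..."],
    "POS_tags": [["DT", "NN", "IN"], ["IN", "CD", "IN"]],
})

# Simulate a CSV file on disk with an in-memory buffer.
csv_buffer = io.StringIO(df.to_csv(index=False))

# literal_eval is applied to every cell of POS_tags while the file is parsed.
restored = pd.read_csv(csv_buffer, converters={"POS_tags": literal_eval})

print(restored.loc[0, "POS_tags"])        # ['DT', 'NN', 'IN']
print(type(restored.loc[0, "POS_tags"]))  # <class 'list'>
```

This avoids a separate apply step after loading, at the cost of having to name the list columns in every read_csv call.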

Source: https://stackoverflow.com/questions/49580996/why-do-my-lists-become-strings-after-saving-to-csv-and-re-opening-python