Setting column types while reading csv with pandas

Submitted by 余生长醉 on 2020-06-25 08:36:38

Question


I'm trying to read a CSV file into a pandas DataFrame with the following code:

import pandas as pd

dp = pd.read_csv('products.csv', header = 0,  dtype = {'name': str,'review': str,
                                                      'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

I can sort of understand that pandas might find it difficult to convert a string representation of a dictionary into a dictionary, given this and this. But how can the content of the "rating" column be both str and numpy.int64?

By the way, tweaks such as not specifying an engine or header do not change anything.

Thanks and regards


Answer 1:


In your loop you are doing:

for col in dp.columns:
    print 'column', col,':', type(col[0])

and you are correctly seeing str as the output everywhere because col[0] is the first letter of the name of the column, which is a string.

For example, if you run this loop:

for col in dp.columns:
    print 'column', col,':', col[0]

you will see that the first letter of each column name is printed; that is what col[0] is.

Your loop only iterates on the column names, not on the series data.

What you really want is to check the type of each column's data (not its header or part of its header) in a loop.

So do this instead to get the types of the column data (non-header data):

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

This is similar to what you did when printing the type of the rating column separately.
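A self-contained Python 3 sketch of the difference, using a small in-memory CSV (the sample rows are made up for illustration, not taken from products.csv). It contrasts the first character of each column name with the type of the first value stored in each column; `.iloc[0]` is used so the lookup is positional and does not depend on the index labels.

```python
import io

import pandas as pd

# Hypothetical sample data with the same four columns as in the question.
CSV_TEXT = """name,review,rating,word_count
Widget,Works well,5,"{'works': 1, 'well': 1}"
Gadget,Broke quickly,1,"{'broke': 1, 'quickly': 1}"
"""

dp = pd.read_csv(io.StringIO(CSV_TEXT), header=0)

header_types = {}
value_types = {}
for col in dp.columns:
    # col is the column *name*; col[0] is just its first character.
    header_types[col] = type(col[0]).__name__
    # dp[col].iloc[0] is the first *value* stored in that column.
    value_types[col] = type(dp[col].iloc[0]).__name__
    print("column", col, ":", col[0], header_types[col], "|", value_types[col])
```

Every entry in `header_types` is `str`, while `value_types` reports `int64` (a NumPy scalar) for `rating` and `str` for the text columns.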




Answer 2:


Use:

dp.info()

to see the datatypes of the columns. dp.columns refers to the column header names, which are strings.
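For a quick check without a loop, `DataFrame.dtypes` returns one dtype per column, and `info()` prints the same dtypes together with non-null counts. A minimal sketch on made-up data, passing a `dtype` mapping similar to the question's. Two assumptions differ from the original call: there is no dict dtype in pandas, so `word_count` is requested as plain text, and `"int64"` is written out instead of `int` so the integer width does not depend on the platform.

```python
import io

import pandas as pd

# Hypothetical sample rows; the column names match the question.
CSV_TEXT = """name,review,rating,word_count
Widget,Works well,5,"{'works': 1, 'well': 1}"
Gadget,Broke quickly,1,"{'broke': 1, 'quickly': 1}"
"""

dp = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    # No dict dtype exists, so word_count is read as text here.
    dtype={"name": str, "review": str, "rating": "int64", "word_count": str},
)

# One dtype per column; the column names themselves are always strings.
print(dp.dtypes)
dp.info()
```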




Answer 3:


I think you should check this question first: Pandas: change data type of columns

When you google "pandas dataframe column type", it is among the top 5 results.
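As a complement to that link, a minimal sketch of changing column types after loading, using `astype` and `pd.to_numeric` (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical frame where the rating column was read in as text.
df = pd.DataFrame({"name": ["Widget", "Gadget"], "rating": ["5", "1"]})

# astype converts the whole column; it raises if any value cannot be converted.
df["rating"] = df["rating"].astype("int64")

# to_numeric with errors="coerce" turns unparseable values into NaN instead of raising.
extra = pd.to_numeric(pd.Series(["3", "n/a"]), errors="coerce")

print(df.dtypes)
print(extra.tolist())
```

`astype` is the stricter choice; `to_numeric(..., errors="coerce")` is useful when a column may contain stray non-numeric entries.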




Answer 4:


Use read_table with "," as the delimiter, and pass literal_eval as the converter function for the columns whose values are Python literals (lists, dicts, and so on):

from ast import literal_eval

import pandas as pd

# Parse the list-like text columns into real Python objects while reading.
recipes = pd.read_table("\\souravD\\PP_recipes.csv", sep=r',',
                      names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens", "techniques","calorie_level","ingredient_ids"],
                      converters = {'name_tokens' : literal_eval,
                                    'ingredient_tokens' : literal_eval,
                                    'steps_tokens' : literal_eval,
                                    'techniques' : literal_eval,
                                    'ingredient_ids' : literal_eval},header=0)

[Image: the recipes DataFrame after changing the data types]
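The same `converters` idea applies to the `word_count` column from the question. A sketch, assuming that column holds Python-literal dictionaries such as `{'works': 1}` (the sample rows below are made up). It uses `read_csv`, whose default comma separator matches `read_table(..., sep=',')`.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical sample rows; word_count holds dict literals stored as text.
CSV_TEXT = """name,review,rating,word_count
Widget,Works well,5,"{'works': 1, 'well': 1}"
Gadget,Broke quickly,1,"{'broke': 1, 'quickly': 1}"
"""

dp = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"name": str, "review": str, "rating": "int64"},
    # literal_eval safely parses each cell's text into a Python dict.
    converters={"word_count": literal_eval},
)

first_counts = dp["word_count"].iloc[0]
print(type(first_counts), first_counts)
```

`literal_eval` only accepts Python literals, so unlike `eval` it will not execute arbitrary code found in the file.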



Source: https://stackoverflow.com/questions/36195485/setting-column-types-while-reading-csv-with-pandas
