I need to import a CSV file that has 300+ columns. Of these, only the first column needs to be specified as a category; the rest of the columns should be float.
Read it twice: the first time to get all the column names, the second time specifying dtype when reading.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.to_csv('tmp.csv',index=False)
path = 'tmp.csv'
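# First pass: read the file to discover the column names
df = pd.read_csv(path)
type_dict = {}
for key in df.columns:
    if key == 'A':
        type_dict[key] = 'category'
    else:
        type_dict[key] = np.float32
# Second pass: read again with the dtype mapping applied
df = pd.read_csv(path, dtype=type_dict)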
print(df.dtypes)
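If the file is large, the first pass probably does not need to load any data. The following is a minimal sketch of the same two-pass idea that reads only the header with nrows=0 and picks the category column by position instead of by name. It assumes the category column is always the first one; the inline CSV text is a made-up stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: one label column, then numeric columns.
csv_text = "A,B,C,D\nx,1,2,3\ny,4,5,6\nx,7,8,9\n"

# Pass 1: nrows=0 parses only the header row, so no data rows are loaded.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# First column (by position) -> category; every other column -> float32.
type_dict = {
    col: "category" if i == 0 else np.float32
    for i, col in enumerate(columns)
}

# Pass 2: read the full data with the dtype mapping applied.
df = pd.read_csv(io.StringIO(csv_text), dtype=type_dict)
print(df.dtypes)
```

With a real file, replace io.StringIO(csv_text) with the file path in both read_csv calls.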