Using Python 2.7.11 and pandas 0.18.1:
Suppose we have the following CSV file:
YEAR,MONTH,ID
2011,JAN,1
2011,FEB,1
2011,MAR,1
Is there any way to read it as a pandas data frame and convert the MONTH column from month-name strings into month numbers, like this?
YEAR,MONTH,ID
2011,1,1
2011,2,1
2011,3,1
Some pandas functions such as "dt.strftime('%b')" don't seem to work. Could someone explain how to do this?
Probably the easiest, and one of the fastest, methods is to create a mapping dict and apply it with map, as follows:
In [2]: df
Out[2]:
   YEAR MONTH  ID
0  2011   JAN   1
1  2011   FEB   1
2  2011   MAR   1
In [3]: d = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, }
In [4]: df.MONTH = df.MONTH.map(d)
In [5]: df
Out[5]:
   YEAR  MONTH  ID
0  2011      1   1
1  2011      2   1
2  2011      3   1
You may want to use df.MONTH = df.MONTH.str.upper().map(d) if not all MONTH values are in upper case.
Another method, slower but more robust:
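For reference, here is a minimal self-contained sketch of the mapping approach, starting from the CSV text in the question. The in-memory buffer and the name month_to_num are only for illustration (a real script would pass a file path to read_csv), and the dict is written out in full on the assumption that the data may contain any of the twelve English abbreviations. Values missing from the dict would come out as NaN.

```python
import io

import pandas as pd

# Sample data from the question, read from an in-memory buffer instead of a file.
csv_text = u"YEAR,MONTH,ID\n2011,JAN,1\n2011,FEB,1\n2011,MAR,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Full mapping from upper-case English month abbreviations to month numbers.
month_to_num = {
    'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
    'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12,
}

# Normalise case first so that 'Jan' or 'jan' would also be matched.
df['MONTH'] = df['MONTH'].str.upper().map(month_to_num)

print(df)
```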
In [11]: pd.to_datetime(df.MONTH, format='%b').dt.month
Out[11]:
0    1
1    2
2    3
Name: MONTH, dtype: int64
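As a rough illustration of what "more robust" can mean here, the sketch below feeds mixed-case values plus one invalid entry through to_datetime. The sample Series and the use of errors='coerce' are my own additions, not part of the original session; with errors='coerce', values that cannot be parsed become NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical input: mixed case plus one value that is not a month abbreviation.
months = pd.Series(['JAN', 'feb', 'Mar', 'XYZ'], name='MONTH')

# '%b' matches abbreviated month names; unparseable values become NaT.
parsed = pd.to_datetime(months, format='%b', errors='coerce')

# The month of NaT is NaN, so the result is a float column when any value is invalid.
month_numbers = parsed.dt.month

print(month_numbers)
```

If every value parses, the result is an integer column, as in the output shown above.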
UPDATE: we can create a mapping automatically (thanks to @Quetzalcoatl)
import calendar
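# month_abbr[0] is an empty string, so skip it; upper-case the keys to match 'JAN', 'FEB', ...
d = dict((v.upper(), k) for k, v in enumerate(calendar.month_abbr) if v)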
or alternatively (using only Pandas):
# keys are upper-case abbreviations, values are month numbers 1..12
d = dict((m.upper(), i) for i, m in enumerate(pd.date_range('2000-01-01', freq='MS', periods=12).strftime('%b'), 1))
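Whichever way the mapping is generated, its keys need to be the month abbreviations (in the same case as the data) and its values the month numbers. A small sketch of building it from calendar.month_abbr and applying it, assuming an English locale (the abbreviations are locale-dependent) and using an illustrative name month_to_num:

```python
import calendar

import pandas as pd

# calendar.month_abbr[0] is an empty string, hence the `if abbr` filter.
# Keys are upper-cased to match data such as 'JAN', 'FEB', 'MAR'.
month_to_num = {abbr.upper(): num
                for num, abbr in enumerate(calendar.month_abbr) if abbr}

df = pd.DataFrame({'YEAR': [2011, 2011, 2011],
                   'MONTH': ['JAN', 'FEB', 'MAR'],
                   'ID': [1, 1, 1]})

df['MONTH'] = df['MONTH'].map(month_to_num)

print(df)
```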
Source: https://stackoverflow.com/questions/42684530/convert-a-column-in-a-python-pandas-from-string-month-into-int