Transposing one column in python pandas with the simplest index possible

Question

I have the following data (data_current):

import pandas as pd
import numpy as np

data_current=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']})
data_current

What I would like to do is transpose one of the columns, so that instead of multiple rows with the same medicine and different diseases I have one row per medicine with several disease columns. It is also important to keep the index as simple as possible (0, 1, 2, ...): I don't want to set 'medicine' as the index, because I will merge on some other key. So I need to get data_needed:

data_needed=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':[np.nan,np.nan,'lupus']})
data_needed

Answer 1:

Here's one way to achieve the output.

First, group by medicine and collect the diseases into a list:

In [368]: md = (data_current.groupby('medicine')
                            .apply(lambda x: x['disease'].tolist())
                            .reset_index())

In [369]: md
Out[369]:
         medicine                0
0  fried tomatoes   [hypertension]
1       green tea           [acne]
2      meditation  [cancer, lupus]

Then convert the lists in that column into separate columns:

In [370]: dval = pd.DataFrame(md[0].tolist())

In [371]: dval
Out[371]:
              0      1
0  hypertension   None
1          acne   None
2        cancer  lupus

Now you can concat md with dval, after dropping the list column from md:

In [372]: md = md.drop(0, axis=1)

In [373]: data_final = pd.concat([md, dval], axis=1)

And, rename the columns as you want.

In [374]: data_final.columns = ['medicine', 'disease_1', 'disease_2']

In [375]: data_final
Out[375]:
         medicine     disease_1 disease_2
0  fried tomatoes  hypertension      None
1       green tea          acne      None
2      meditation        cancer     lupus
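
The steps above can be collected into one small helper. This is only a sketch of the same idea: the function name spread_diseases is made up for illustration, and it uses agg(list) instead of apply with a lambda, which is a substitution rather than the code shown above. The column names are generated from the number of list positions, so they do not have to be typed by hand.

```python
import pandas as pd

def spread_diseases(df):
    """Hypothetical helper: one row per medicine, one column per disease position."""
    # Collect each medicine's diseases into a list (groupby sorts the medicines).
    lists = df.groupby('medicine')['disease'].agg(list)
    # Expand the lists into columns; shorter lists are padded with missing values.
    wide = pd.DataFrame(lists.tolist())
    wide.columns = ['disease_%d' % (i + 1) for i in wide.columns]
    # Put the medicine names back as an ordinary column, keeping the 0..n-1 index.
    wide.insert(0, 'medicine', lists.index.tolist())
    return wide

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})
data_final = spread_diseases(data_current)
```

Because medicine stays a regular column and the index is a plain 0, 1, 2, ..., the result can be merged on any other key afterwards.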

Answer 2:

I think you want a pivot table. See this link for more information: http://pandas.pydata.org/pandas-docs/stable/reshaping.html

Do you find the output from this acceptable?

data_current.pivot(index='medicine', columns='disease', values='disease')

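Answer 3:

dc = data_current
# Map every distinct disease to its own header: diseases_0, diseases_1, ...
diseases = dc['disease'].unique()
dc['disease_header'] = dc['disease'].replace(
    dict(zip(diseases, ['diseases_%d' % i for i in range(len(diseases))])))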

This will give us:

In [548]: dc
Out[548]: 
        disease        medicine disease_header
0          acne       green tea     diseases_0
1  hypertension  fried tomatoes     diseases_1
2        cancer      meditation     diseases_2
3         lupus      meditation     diseases_3

And, finally we can pivot:

In [547]: dc.pivot(columns='disease_header', index='medicine', values='disease').reset_index()
Out[547]: 
disease_header        medicine diseases_0    diseases_1 diseases_2 diseases_3
0               fried tomatoes        NaN  hypertension        NaN        NaN
1                    green tea       acne           NaN        NaN        NaN
2                   meditation        NaN           NaN     cancer      lupus
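
Note that this yields one column per distinct disease rather than the disease_1, disease_2 layout asked for in the question. A possible variation, sketched here as an assumption rather than as part of the answer above, is to number the diseases within each medicine using groupby(...).cumcount() and pivot on that number instead. The names position, labelled and data_wide are illustrative only.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# Number each disease within its medicine: 1 for the first, 2 for the second, ...
position = data_current.groupby('medicine').cumcount() + 1
# Build the header per row without modifying data_current itself.
labelled = data_current.assign(disease_header='disease_' + position.astype(str))

data_wide = (labelled.pivot(index='medicine', columns='disease_header', values='disease')
                     .reset_index()                 # medicine becomes a column, index is 0..n-1
                     .rename_axis(columns=None))    # drop the leftover 'disease_header' axis label
```

With the sample data this gives the columns medicine, disease_1 and disease_2, and only meditation has a value in disease_2.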

Source: https://stackoverflow.com/questions/29942167/transposing-one-column-in-python-pandas-with-the-simplest-index-possible

Tags

python

pandas

transpose