How to do intersection of dataframes in pandas

女生的网名这么多〃 submitted on 2021-02-11 14:53:31

Question


I have a DataFrame like the following:

   Title                                              ASIN        State          SellerSKU     Quantity  FBAStock  QuantityToShip
1  Daedal crafters- Pack of Two Gajra (Orange and...  B075T64ZWJ  WEST BENGAL    DC216         1         0         1
2  Daedal Dream Catchers - Intricate Web Design(B...  B06XBRRYVK  KARNATAKA      DDC63BB       1         24        0
3  Daedal Dream Catchers- Blue and White Four Rin...  B07428QBJ9  MAHARASHTRA    12-16RT-1H8B  1         4         0
4  Daedal dream catchers- Crescent wine DDC21         B01DI70P9W  UTTAR PRADESH  70-PK4Z-6VSP  1         10        0

The columns are:

Title   ASIN    State   SellerSKU   Quantity    FBAStock    QuantityToShip 

I have another DataFrame that contains a subset of the rows of the one above; only the "Quantity" column differs. It has the columns:

ASIN State Quantity

How do I intersect or merge this smaller DataFrame with the first one so that its Quantity overwrites the original Quantity, matching rows on the ASIN and State columns?
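A minimal sketch of one way to do the overwrite with a merge (this is not the approach taken in the answer). It assumes the smaller frame is called `updates` and holds each (ASIN, State) pair at most once; the rows are a few values taken from the table above, and the updated quantities are made up for illustration. A left merge keeps every row of the original frame, matched rows pick up the new value in a `Quantity_new` column, and `fillna` falls back to the old Quantity everywhere else.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
original = pd.DataFrame({
    'ASIN': ['B075T64ZWJ', 'B06XBRRYVK', 'B07428QBJ9', 'B01DI70P9W'],
    'State': ['WEST BENGAL', 'KARNATAKA', 'MAHARASHTRA', 'UTTAR PRADESH'],
    'Quantity': [1, 1, 1, 1],
    'FBAStock': [0, 24, 4, 10],
})
# Hypothetical smaller frame: a subset of rows with an updated Quantity
updates = pd.DataFrame({
    'ASIN': ['B06XBRRYVK', 'B01DI70P9W'],
    'State': ['KARNATAKA', 'UTTAR PRADESH'],
    'Quantity': [5, 3],
})

# Left merge keeps every row of `original`; matched rows gain Quantity_new.
# validate='m:1' makes pandas raise if an (ASIN, State) pair repeats in `updates`.
merged = original.merge(updates, on=['ASIN', 'State'], how='left',
                        suffixes=('', '_new'), validate='m:1')

# Take the new Quantity where a match exists, otherwise keep the old one
merged['Quantity'] = merged['Quantity_new'].fillna(merged['Quantity']).astype(int)
result = merged.drop(columns='Quantity_new')
print(result)
```

The `astype(int)` is there because unmatched rows make `Quantity_new` a float column (NaN), so the filled result comes back as float.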

If it can be done by merging, how? I'm not familiar with SQL join terms like 'inner', 'left', etc.
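On the SQL terms: the `how` argument of `DataFrame.merge` decides which keys survive. `'inner'` keeps only keys present in both frames, `'left'` keeps every row of the left frame (filling NaN where the right frame has no match), `'right'` is the mirror image, and `'outer'` keeps keys from either side. A small illustration with two hypothetical frames:

```python
import pandas as pd

# Hypothetical frames: left has states a, b, c; right has b, c, d
left = pd.DataFrame({'State': ['a', 'b', 'c'], 'Quantity': [1, 2, 3]})
right = pd.DataFrame({'State': ['b', 'c', 'd'], 'NewQuantity': [20, 30, 40]})

# inner: only keys found in BOTH frames survive (b and c)
inner = left.merge(right, on='State', how='inner')

# left: every row of `left` survives; 'a' has no match, so NewQuantity is NaN
left_join = left.merge(right, on='State', how='left')

print(inner)
print(left_join)
```

For the overwrite asked about here, a left merge is the natural fit, since every original row has to stay.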

Purpose:

I am modifying the original DataFrame like this:

# Count rows per (State, ASIN, Quantity) combination
new = (originalDF.groupby(['State', 'ASIN', 'Quantity'])
                 .size()
                 .reset_index(name='Count'))
# Multiply Quantity by that count, then drop the helper column
new['Quantity'] = new['Quantity'] * new['Count']
new = new.drop(columns='Count')

Now I want to bring the remaining columns of originalDF into the new DataFrame, matching rows on its ASIN and State columns (the Quantity column of the new DataFrame is the one I want in the final result).
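One way to do this, sketched under assumptions: reuse the aggregation from the question, take the remaining columns of `originalDF` once per (State, ASIN) pair, and left-merge them onto `new`. The sample rows and shortened titles are hypothetical, and `drop_duplicates` simply keeps the first row of each pair, so this assumes columns such as Title and SellerSKU are identical within a pair.

```python
import pandas as pd

# Hypothetical sample: two identical orders for the same ASIN/State
originalDF = pd.DataFrame({
    'Title': ['Gajra', 'Gajra', 'Dream Catcher'],
    'ASIN': ['B075T64ZWJ', 'B075T64ZWJ', 'B06XBRRYVK'],
    'State': ['WEST BENGAL', 'WEST BENGAL', 'KARNATAKA'],
    'SellerSKU': ['DC216', 'DC216', 'DDC63BB'],
    'Quantity': [1, 1, 1],
})

# Aggregate as in the question: Quantity times the number of duplicate rows
new = (originalDF.groupby(['State', 'ASIN', 'Quantity'])
                 .size()
                 .reset_index(name='Count'))
new['Quantity'] = new['Quantity'] * new['Count']
new = new.drop(columns='Count')

# Remaining columns, one row per (State, ASIN); keeps the first row of each pair
others = (originalDF.drop(columns='Quantity')
                    .drop_duplicates(subset=['State', 'ASIN']))

# Left merge keeps every aggregated row and attaches Title, SellerSKU, ...
final = new.merge(others, on=['State', 'ASIN'], how='left')
print(final)
```

Because `groupby` sorts its keys by default, the rows of `final` come out ordered by State rather than in the original row order.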


Answer 1:


I believe you want transform with 'size' to get the size of each group as a new column, then multiply the Quantity column by it with *=:

import pandas as pd

originalDF = pd.DataFrame({'State': list('aaabbb'),
                           'ASIN': list('cfcccc'),
                           'Quantity': [100] * 6})

# Multiply Quantity by the size of its (State, ASIN, Quantity) group
originalDF['Quantity'] *= (originalDF.groupby(['State', 'ASIN', 'Quantity'])['State']
                                     .transform('size'))

print(originalDF)
  State ASIN  Quantity
0     a    c       200
1     a    f       100
2     a    c       200
3     b    c       300
4     b    c       300
5     b    c       300

Detail:

print(originalDF.groupby(['State', 'ASIN', 'Quantity'])['State']
                .transform('size'))

0    2
1    1
2    2
3    3
4    3
5    3
Name: State, dtype: int64
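The difference that makes transform suitable here: `size()` returns one value per group, so its result is shorter than the frame, while `transform('size')` repeats each group's size on every original row and keeps the frame's index, so it can multiply `Quantity` in place. A minimal comparison on the same sample data (the names `df`, `keys`, `per_group` and `per_row` are just for illustration):

```python
import pandas as pd

# Same sample frame as in the answer
df = pd.DataFrame({'State': list('aaabbb'),
                   'ASIN': list('cfcccc'),
                   'Quantity': [100] * 6})
keys = ['State', 'ASIN', 'Quantity']

# size(): one row per group (3 groups here), indexed by the group keys
per_group = df.groupby(keys).size()

# transform('size'): the group size repeated for each of the 6 original rows,
# aligned to df's index, so it works directly in arithmetic on df's columns
per_row = df.groupby(keys)['State'].transform('size')

print(per_group)
print(per_row)
```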


Source: https://stackoverflow.com/questions/50680259/how-to-do-intersection-of-dataframes-in-pandas
