Pandas calculate result dataframe from a dataframe of multiple trades at same timestamp

China☆狼群 submitted on 2020-12-14 07:57:15

Question


I have a dataframe containing trades with duplicated timestamps, where buy and sell orders are split over several rows. In my example the total order amount is the sum over the rows that share a timestamp for that particular stock. I have created a simplified dataframe to show what the data looks like. I would like to end up with a dataframe holding the results of the trades and a trade ID for each trade. All trades are long positions, i.e. buy and try to sell at a higher price. The ID column for the desired output df2 is answered in this thread: Create ID column in a pandas dataframe

import pandas as pd
from datetime import datetime
import numpy as np
string_date =['2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 02:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 08:00:00',
             '2018-01-01 08:00:00',
             '2018-01-01 08:00:00',
             '2018-02-01 12:00:00',
            ]



data ={'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B'],
                    'deal': ['buy', 'buy', 'buy','buy','buy','sell','sell','sell','buy','buy','buy','sell','sell','sell','sell','sell','sell','buy'],
                    'amount':[1,2,3,4,10,8,1,1,3,2,5,2,2,6,3,3,4,5],
                    'price':[10,10,10,10,2,20,20,20,3,3,3,1,1,1,2,2,2,11]}

df = pd.DataFrame(data, index =string_date)
df
Out[245]: 
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:00:00     A   buy       2     10
2018-01-01 01:00:00     A   buy       3     10
2018-01-01 01:00:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:00:00     C   buy       2      3
2018-01-01 04:00:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11

One desired output:

string_date2 =['2018-01-01 01:00:00',
               '2018-01-01 02:00:00',
               '2018-01-01 03:00:00',
               '2018-01-01 04:00:00',
               '2018-01-01 07:00:00',
               '2018-01-01 08:00:00',
               '2018-02-01 12:00:00',
               ]

data2 ={'stock': ['A','B', 'A', 'C', 'B','C','B'],
                    'deal': ['buy', 'buy','sell','buy','sell','sell','buy'],
                    'amount':[10,10,10,10,10,10,5],
                    'price':[10,2,20,3,1,2,11],
                    'ID': ['1', '2','1','3','2','3','4']
                    }

df2 = pd.DataFrame(data2, index =string_date2) 

df2
Out[226]: 
                    stock  deal  amount  price ID
2018-01-01 01:00:00     A   buy      10     10  1
2018-01-01 02:00:00     B   buy      10      2  2
2018-01-01 03:00:00     A  sell      10     20  1
2018-01-01 04:00:00     C   buy      10      3  3
2018-01-01 07:00:00     B  sell      10      1  2
2018-01-01 08:00:00     C  sell      10      2  3
2018-02-01 12:00:00     B   buy       5     11  4

Any ideas?


Answer 1:


This solution assumes a 'Long Only' portfolio where short sales are not allowed. Once a position is opened for a given stock, the transaction is assigned a new trade ID. Buys that increase the position in that stock keep the same trade ID, as do any sell transactions that reduce it (including the final sale that brings the position quantity to zero). A subsequent buy transaction in that same stock results in a new trade ID.

In order to maintain consistent trade identifiers with a growing log of transactions, I created a class TradeTracker to track and assign trade identifiers for each transaction.

import numpy as np
import pandas as pd

# Create sample dataframe.    
dates = [
    '2018-01-01 01:00:00',
    '2018-01-01 01:01:00',
    '2018-01-01 01:02:00',
    '2018-01-01 01:03:00',
    '2018-01-01 02:00:00',
    '2018-01-01 03:00:00',
    '2018-01-01 03:01:00',
    '2018-01-01 03:03:00',
    '2018-01-01 04:00:00',
    '2018-01-01 04:01:00',
    '2018-01-01 04:02:00',
    '2018-01-01 07:00:00',
    '2018-01-01 07:01:00',
    '2018-01-01 07:02:00',
    '2018-01-01 08:00:00',
    '2018-01-01 08:01:00',
    '2018-01-01 08:02:00',
    '2018-02-01 12:00:00',
    '2018-03-01 12:00:00',
]
data = {
    'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B','A'],
    'deal': ['buy', 'buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell', 'buy', 'buy', 'buy',
             'sell', 'sell', 'sell', 'sell', 'sell', 'sell', 'buy', 'buy'],
    'amount': [1, 2, 3, 4, 10, 8, 1, 1, 3, 2, 5, 2, 2, 6, 3, 3, 4, 5, 10],
    'price': [10, 10, 10, 10, 2, 20, 20, 20, 3, 3, 3, 1, 1, 1, 2, 2, 2, 11, 15]
}
df = pd.DataFrame(data, index=pd.to_datetime(dates))
>>> df
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:01:00     A   buy       2     10
2018-01-01 01:02:00     A   buy       3     10
2018-01-01 01:03:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:01:00     A  sell       1     20
2018-01-01 03:03:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:01:00     C   buy       2      3
2018-01-01 04:02:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:01:00     B  sell       2      1
2018-01-01 07:02:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:01:00     C  sell       3      2
2018-01-01 08:02:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11
2018-03-01 12:00:00     A   buy      10     15

# Add `position` column representing the cumulative buys and sells for a given stock.
df['position'] = (
    df
    .assign(temp_amount=np.where(df['deal'].eq('buy'), df['amount'], -df['amount']))
    .groupby(['stock'])['temp_amount']
    .cumsum()
)

# Create a class to track trade identifiers and instantiate it.
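class TradeTracker:
    """Hand out trade IDs for a long-only transaction log, one row at a time."""

    def __init__(self):
        self.trade_counter = 0  # last trade ID handed out
        self.trade_ids = {}     # stock -> ID of its currently open trade

    def get_trade_id(self, stock, position):
        if position == 0:
            # Position closed: return the open trade's ID and forget it.
            trade_id = self.trade_ids.pop(stock)
        elif stock not in self.trade_ids:
            # No open trade for this stock: start a new one.
            self.trade_counter += 1
            self.trade_ids[stock] = trade_id = self.trade_counter
        else:
            # Position changed but is still open: reuse the current ID.
            trade_id = self.trade_ids[stock]
        return trade_id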

trade_tracker = TradeTracker()

# Add a `trade_id` column using our custom class in a list comprehension.
df['trade_id'] = [trade_tracker.get_trade_id(stock, position) 
                  for stock, position in df[['stock', 'position']].to_numpy()]

>>> df
                    stock  deal  amount  price  position  trade_id
2018-01-01 01:00:00     A   buy       1     10         1         1
2018-01-01 01:01:00     A   buy       2     10         3         1
2018-01-01 01:02:00     A   buy       3     10         6         1
2018-01-01 01:03:00     A   buy       4     10        10         1
2018-01-01 02:00:00     B   buy      10      2        10         2
2018-01-01 03:00:00     A  sell       8     20         2         1
2018-01-01 03:01:00     A  sell       1     20         1         1
2018-01-01 03:03:00     A  sell       1     20         0         1
2018-01-01 04:00:00     C   buy       3      3         3         3
2018-01-01 04:01:00     C   buy       2      3         5         3
2018-01-01 04:02:00     C   buy       5      3        10         3
2018-01-01 07:00:00     B  sell       2      1         8         2
2018-01-01 07:01:00     B  sell       2      1         6         2
2018-01-01 07:02:00     B  sell       6      1         0         2
2018-01-01 08:00:00     C  sell       3      2         7         3
2018-01-01 08:01:00     C  sell       3      2         4         3
2018-01-01 08:02:00     C  sell       4      2         0         3
2018-02-01 12:00:00     B   buy       5     11         5         4
2018-03-01 12:00:00     A   buy      10     15        10         5
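For a log that is already complete, the same rule can probably be written without a stateful class. The following is only a sketch under the answer's long-only assumption: a row opens a new trade when the stock's position before that row is zero. The helper name assign_trade_ids and the small log frame are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def assign_trade_ids(trades: pd.DataFrame) -> pd.Series:
    """Return one trade ID per row for a long-only log ordered by time.

    Hypothetical helper; expects 'stock', 'deal' and 'amount' columns.
    """
    # Work on a positional index so duplicated timestamps cause no alignment issues.
    t = trades.reset_index(drop=True)
    t['signed'] = np.where(t['deal'].eq('buy'), t['amount'], -t['amount'])
    t['position'] = t.groupby('stock')['signed'].cumsum()

    # Position held in the same stock just before each row (0 for the first row).
    previous = t.groupby('stock')['position'].shift(fill_value=0)
    opens_trade = previous.eq(0)

    # Number the opening rows globally, then carry each number forward per stock.
    t['trade_id'] = opens_trade.cumsum().where(opens_trade)
    ids = t.groupby('stock')['trade_id'].ffill().astype(int)
    return pd.Series(ids.to_numpy(), index=trades.index, name='trade_id')


# Illustrative data only; the first two rows share a timestamp on purpose.
log = pd.DataFrame(
    {
        'stock': ['A', 'A', 'B', 'A', 'B', 'A'],
        'deal': ['buy', 'buy', 'buy', 'sell', 'sell', 'buy'],
        'amount': [1, 2, 5, 3, 5, 4],
    },
    index=pd.to_datetime([
        '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00',
        '2018-01-01 03:00:00', '2018-01-01 04:00:00', '2018-01-02 01:00:00',
    ]),
)
ids = assign_trade_ids(log)
```

Unlike the TradeTracker instance, this version keeps no state between calls, so it would have to be rerun on the whole log whenever new transactions are appended.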



Answer 2:


Changed your string_date to this:

In [2295]: string_date =['2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 02:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-02-01 12:00:00',
      ...:             ]
      ...: 

So df now is:

In [2297]: df
Out[2297]: 
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:00:00     A   buy       2     10
2018-01-01 01:00:00     A   buy       3     10
2018-01-01 01:00:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:00:00     C   buy       2      3
2018-01-01 04:00:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11

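You can use GroupBy.agg to sum the amounts per timestamp, stock and deal, then number the buy rows and forward-fill that number within each stock to get the ID: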

In [2302]: x = df.reset_index().groupby(['index', 'stock', 'deal'], as_index=False).agg({'amount': 'sum', 'price': 'max'}).set_index('index')

In [2303]: m = x['deal'] == 'buy'

In [2305]: x['ID'] = m.cumsum().where(m)

In [2307]: x['ID'] = x.groupby('stock')['ID'].ffill()

In [2308]: x
Out[2308]: 
                     stock  deal  amount  price   ID
index                                              
2018-01-01 01:00:00     A   buy      10     10  1.0
2018-01-01 02:00:00     B   buy      10      2  2.0
2018-01-01 03:00:00     A  sell      10     20  1.0
2018-01-01 04:00:00     C   buy      10      3  3.0
2018-01-01 07:00:00     B  sell      10      1  2.0
2018-01-01 08:00:00     C  sell      10      2  3.0
2018-02-01 12:00:00     B   buy       5     11  4.0
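The question also asks for the results of the trades. One possible way to get there from the aggregated frame is sketched below. It rebuilds the frame shown above by hand, and the column names cash_flow, signed_amount, open_amount, pnl and closed are assumptions chosen for this illustration. Profit is taken as sell value minus buy value per ID and is left empty for trades whose position is still open.

```python
import pandas as pd

# The aggregated frame from the output above, rebuilt by hand for a standalone example.
x = pd.DataFrame(
    {
        'stock': ['A', 'B', 'A', 'C', 'B', 'C', 'B'],
        'deal': ['buy', 'buy', 'sell', 'buy', 'sell', 'sell', 'buy'],
        'amount': [10, 10, 10, 10, 10, 10, 5],
        'price': [10, 2, 20, 3, 1, 2, 11],
        'ID': [1.0, 2.0, 1.0, 3.0, 2.0, 3.0, 4.0],
    },
    index=[
        '2018-01-01 01:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00',
        '2018-01-01 04:00:00', '2018-01-01 07:00:00', '2018-01-01 08:00:00',
        '2018-02-01 12:00:00',
    ],
)

# Sells bring cash in (+), buys pay cash out (-).
value = x['amount'] * x['price']
x['cash_flow'] = value.where(x['deal'].eq('sell'), -value)
# Buys add to the position (+), sells reduce it (-).
x['signed_amount'] = x['amount'].where(x['deal'].eq('buy'), -x['amount'])

result = x.groupby('ID').agg(
    stock=('stock', 'first'),
    open_amount=('signed_amount', 'sum'),
    pnl=('cash_flow', 'sum'),
)
result.index = result.index.astype(int)  # the forward fill left the IDs as floats

# A trade is closed once everything bought has been sold again.
result['closed'] = result['open_amount'].eq(0)
# For open trades the cash flow is only the purchase cost, so no result is reported yet.
result['pnl'] = result['pnl'].where(result['closed'])
```

With the sample data, trade 1 ends with a profit of 100, trades 2 and 3 each lose 10, and trade 4 is still open with 5 shares of B.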


Source: https://stackoverflow.com/questions/64872407/pandas-calculate-result-dataframe-from-a-dataframe-of-multiple-trades-at-same-ti
