binning

weighted numpy bincount for 2D IDs array and 1D weights

泄露秘密 submitted on 2020-07-09 08:39:50
Question: I am using numpy_indexed to apply a vectorized NumPy bincount, as follows:

    import numpy as np
    import numpy_indexed as npi

    rowidx, colidx = np.indices(index_tri.shape)
    (cols, rows), B = npi.count((index_tri.flatten(), rowidx.flatten()))

where index_tri is the following matrix:

    index_tri = np.array([[ 0,  0,  0,  7,  1,  3],
                          [ 1,  2,  2,  9,  8,  9],
                          [ 3,  1,  1,  4,  9,  1],
                          [ 5,  6,  6, 10, 10, 10],
                          [ 7,  8,  9,  4,  3,  3],
                          [ 3,  8,  6,  3,  8,  6],
                          [ 4,  3,  3,  7,  8,  9],
                          [10, 10, 10,  5,  6,  6],
                          [ 4,  9,  1,  3,  1,  1],
                          …
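
A plain-NumPy alternative that does not need numpy_indexed is to shift each row's IDs into its own block of bin slots, so that a single np.bincount call covers every row at once and the result is a dense (rows × IDs) matrix. This is a minimal sketch assuming the IDs are non-negative integers and that the 1D weights have one entry per column of the ID array (broadcast across rows); that weight layout is an assumption, since the question is cut off before it says. The helper name rowwise_bincount and the example weights are hypothetical.

```python
import numpy as np

def rowwise_bincount(ids, weights=None, minlength=None):
    """Hypothetical helper: weighted bincount of each row of a 2D non-negative int array.

    Returns an (n_rows, n_bins) array where out[r, k] is the sum of the weights of
    the entries in row r equal to k (a plain count when weights is None).
    """
    ids = np.asarray(ids)
    n_rows = ids.shape[0]
    n_bins = int(ids.max()) + 1
    if minlength is not None:
        n_bins = max(n_bins, minlength)
    # Offset row r by r * n_bins so every row gets its own non-overlapping bin range.
    offset_ids = ids + n_bins * np.arange(n_rows)[:, None]
    if weights is not None:
        # Assumption: weights broadcast to ids.shape (e.g. one weight per column).
        weights = np.broadcast_to(np.asarray(weights, dtype=float), ids.shape).ravel()
    flat = np.bincount(offset_ids.ravel(), weights=weights, minlength=n_rows * n_bins)
    return flat.reshape(n_rows, n_bins)

# First three rows of index_tri from the question.
ids = np.array([[0, 0, 0, 7, 1, 3],
                [1, 2, 2, 9, 8, 9],
                [3, 1, 1, 4, 9, 1]])
col_weights = np.array([1.0, 0.5, 0.5, 2.0, 1.0, 1.0])  # hypothetical per-column weights

counts = rowwise_bincount(ids)                 # unweighted counts per row
weighted = rowwise_bincount(ids, col_weights)  # weighted sums per row
print(counts)
print(weighted)
```

Unlike npi.count, which returns only the (ID, row) pairs that occur, this returns zeros for absent IDs; minlength can be raised so that every row has the same number of columns across calls.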

Pandas DataFrame: mean of column B values within column A windows

孤者浪人 submitted on 2020-07-07 06:45:11
Question: If I have a pandas DataFrame in Python such as the following:

    import numpy as np
    import pandas as pd

    a = np.random.uniform(0, 10, 20)
    b = np.random.uniform(0, 1, 20)
    data = np.vstack([a, b]).T
    df = pd.DataFrame(data)
    df.columns = ['A', 'B']
    df.sort_values(by=['A'])

               A         B
    5   0.057519  0.465408
    14  1.610972  0.398077
    3   1.725556  0.397708
    17  1.734124  0.600723
    11  1.944105  0.694152
    19  3.265799  0.878538
    13  3.352460  0.770505
    10  3.865299  0.064723
    16  4.137863  0.659662
    12  5.597172  0.122269
    7   5.990105  0.667533
    6   6.410582  …
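
One common way to get the mean of B per A window is to label each row with its window via pd.cut and then group on that label. This is a minimal sketch assuming fixed, non-overlapping, half-open windows; the unit-width edges [0, 1), [1, 2), … and the helper name mean_b_per_window are hypothetical, since the question is cut off before it defines the windows.

```python
import numpy as np
import pandas as pd

def mean_b_per_window(df, edges):
    """Hypothetical helper: mean of column B for rows whose A lies in each [left, right) window."""
    # Label every row with the half-open A window it falls into.
    windows = pd.cut(df["A"], bins=edges, right=False)
    # observed=False keeps empty windows in the result; their mean is NaN.
    return df.groupby(windows, observed=False)["B"].mean()

# Random data built the same way as in the question (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.uniform(0, 10, 20), "B": rng.uniform(0, 1, 20)})

# Hypothetical window edges: ten unit-width windows covering [0, 10).
means = mean_b_per_window(df, edges=np.arange(0, 11))
print(means)
```

Sorting by A is not required for this approach, because pd.cut assigns windows row by row.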

pandas df iteration, binning of data based on time in milliseconds

帅比萌擦擦* submitted on 2020-04-17 22:54:08
Question: I have refocused my questions and tried to be as specific as possible. Below I also include the code I have used so far.

(1) When pulling data from SQL, my time column is in a mixed format that contains a letter, which is hard to work with. To avoid issues with that, I tried df.time = pd.to_timedelta(df.time, unit='ms'), which works, but I don't know how to extract the hours and minutes. Example: 2019.11.22D01:18:00.01000. I just need the column 'time' in the following format: '01:18:00.01000'.
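
For strings in the "<date>D<time>" form shown, one option is to split on the 'D' to get the requested text, and to parse the full stamp with an explicit format to get numeric fields for binning. This is a minimal sketch; the sample values are made up in the question's format, and the 1-second bucket size is an arbitrary example. If the column is instead numeric milliseconds, pd.to_timedelta(..., unit='ms') followed by .dt.components gives the same fields.

```python
import pandas as pd

# Hypothetical sample values in the "<date>D<time>" format from the question.
df = pd.DataFrame({"time": ["2019.11.22D01:18:00.01000",
                            "2019.11.22D01:18:00.25000",
                            "2019.11.22D01:19:30.50000"]})

# Keep only the text after the 'D' separator, e.g. '01:18:00.01000'.
df["clock"] = df["time"].str.split("D").str[1]

# Parse the full stamp with an explicit format (%f accepts up to 6 fractional digits),
# then subtract midnight to get the time of day as a Timedelta.
stamps = pd.to_datetime(df["time"], format="%Y.%m.%dD%H:%M:%S.%f")
time_of_day = stamps - stamps.dt.normalize()

# Hours, minutes, seconds and milliseconds as separate integer columns.
parts = time_of_day.dt.components[["hours", "minutes", "seconds", "milliseconds"]]

# Example binning: floor each time of day to a 1-second bucket.
df["bucket"] = time_of_day.dt.floor("1s")
print(df)
print(parts)
```

Grouping on df["bucket"] (or on a coarser floor such as "1min") then aggregates the rows that fall in each time bin.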

After binning a column of a dataframe, how to make a new dataframe to count the number of elements in each bin?

不想你离开。 submitted on 2020-03-20 07:29:27
Question: Say I have a dataframe, df:

    >>> df
        Age  Score
        19      1
        20      2
        24      3
        19      2
        24      3
        24      1
        24      3
        20      1
        19      1
        20      3
        22      2
        22      1

I want to construct a new dataframe that bins Age and stores the total number of elements in each bin in separate Score columns:

    Age    Score 1  Score 2  Score 3
    19-21        2        4        3
    22-24        2        2        9

This is my way of doing it, which I feel is highly convoluted (it shouldn't be this difficult):

    import numpy as np
    import pandas as pd

    data = pd.DataFrame(columns=['Age', 'Score'])
    data['Age']…
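
A shorter route is to bin Age with pd.cut and cross-tabulate the bins against Score with pd.crosstab, which counts the rows in every (bin, score) cell. This is a minimal sketch using the sample data from the question; the bin edges (18, 21] and (21, 24] are chosen to reproduce the question's 19-21 and 22-24 labels.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Age":   [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
    "Score": [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1],
})

# Bins (18, 21] and (21, 24], labelled like the question's expected output.
age_bin = pd.cut(df["Age"], bins=[18, 21, 24], labels=["19-21", "22-24"])

# Count rows per (age bin, score) pair; one column per distinct score.
counts = pd.crosstab(age_bin, df["Score"]).add_prefix("Score ")
print(counts)
```

On this sample the row counts are 3, 2, 1 for 19-21 and 2, 1, 3 for 22-24, which differ from the expected table in the question; if a different aggregation is intended, pd.crosstab also accepts values= and aggfunc= arguments.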

Two-dimensional np.digitize

醉酒当歌 submitted on 2020-02-23 15:24:53
Question: I have two-dimensional data and a set of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it occupies. This is exactly what np.digitize is for, but as far as I can tell, it only handles one-dimensional data. This Stack Exchange answer seems to address it, but it is fully generalized to n dimensions. Is there a more straightforward solution for two dimensions?

Answer 1: You can already get the bin index of each…
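
Two ways to get per-point 2D bin indices: apply np.digitize to each axis separately, or pass expand_binnumbers=True to binned_statistic_2d, whose fourth return value binnumber then has shape (2, N) with one row per dimension. This is a minimal sketch on made-up random data with hypothetical bin edges. Both use 1-based indices where 0 and len(edges) mark points outside the edges; the one known difference is that binned_statistic_2d puts points lying exactly on the rightmost edge into the last bin, while np.digitize does not, and the sample data here never hits that edge.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Made-up 2D data with a value attached to each point.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = rng.uniform(0, 5, 200)
values = rng.normal(size=200)

# Hypothetical bin edges: 5 bins along x, 3 bins along y.
x_edges = np.linspace(0, 10, 6)
y_edges = np.linspace(0, 5, 4)

# Option 1: np.digitize per axis; index i means edges[i-1] <= v < edges[i].
ix = np.digitize(x, x_edges)
iy = np.digitize(y, y_edges)

# Option 2: let binned_statistic_2d report the bin of every point.
# With expand_binnumbers=True, binnumber[0] is the x index and binnumber[1] the y index.
stat, xe, ye, binnumber = binned_statistic_2d(
    x, y, values, statistic="mean", bins=[x_edges, y_edges], expand_binnumbers=True
)

print(np.array_equal(binnumber[0], ix), np.array_equal(binnumber[1], iy))
```

For in-range points, stat[ix - 1, iy - 1] then looks up the statistic of the bin each point falls in.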