pandas

Explode dict from Pandas column

风流意气都作罢 posted on 2021-02-10 04:49:39
Question: I have the following df:

       movie_id   rating_all
    0  tt7653254  [{'age': 'all', 'avg_rating': 8.1, 'count': 109326}, {'age': '<18', 'avg_rating': 8.8, 'count': 318}, {'age': '18-29', 'avg_rating': 8.3, 'count': 29740}, {'age': '30-44', 'avg_rating': 8.0, 'count': 33012}, {'age': '45+', 'avg_rating': 7.7, 'count': 7875}]
    1  tt8579674  [{'age': 'all', 'avg_rating': 8.6, 'count': 9420}, {'age': '<18', 'avg_rating': 9.1, 'count': 35}, {'age': '18-29', 'avg_rating': 8.7, 'count': 2437}, {'age': '30-44',
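The excerpt is cut off before the desired output is shown, so the following is only a minimal sketch. It assumes the goal is a long table with one row per age bracket, and it uses only the complete entries visible in the question. The names `exploded`, `ratings`, and `long_df` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question. The second row is truncated in the
# source, so only its complete dicts are included here.
df = pd.DataFrame({
    "movie_id": ["tt7653254", "tt8579674"],
    "rating_all": [
        [
            {"age": "all", "avg_rating": 8.1, "count": 109326},
            {"age": "<18", "avg_rating": 8.8, "count": 318},
            {"age": "18-29", "avg_rating": 8.3, "count": 29740},
            {"age": "30-44", "avg_rating": 8.0, "count": 33012},
            {"age": "45+", "avg_rating": 7.7, "count": 7875},
        ],
        [
            {"age": "all", "avg_rating": 8.6, "count": 9420},
            {"age": "<18", "avg_rating": 9.1, "count": 35},
            {"age": "18-29", "avg_rating": 8.7, "count": 2437},
        ],
    ],
})

# Step 1: one row per dict in the list column.
exploded = df.explode("rating_all").reset_index(drop=True)

# Step 2: one column per dict key.
ratings = pd.json_normalize(exploded["rating_all"].tolist())

# Step 3: put the movie id back next to the expanded columns.
long_df = pd.concat([exploded[["movie_id"]], ratings], axis=1)
print(long_df)
```

If a wide layout (one row per movie) is wanted instead, `long_df` could be pivoted on `age` afterwards.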

Randomly selecting from Pandas groups with equal probability — unexpected behavior

好久不见. posted on 2021-02-10 04:45:42
Question: I have 12 unique groups that I am trying to randomly sample from, each with a different number of observations. I want to randomly sample from the entire population (dataframe), with each group having the same probability of being selected. The simplest example of this would be a dataframe with 2 groups.

       groups  probability
    0  a       0.25
    1  a       0.25
    2  b       0.5

using np.random.choice(df['groups'], p=df['probability'], size=100) Each iteration will now have a 50% chance of selecting group a and a 50%
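The setup described in the question can be sketched as follows. This only illustrates how per-row weights that give every group the same total probability might be computed; the "unexpected behavior" itself is cut off in the excerpt and is not addressed. The seed and the variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Two-group example from the question: group "a" has two rows, "b" has one.
df = pd.DataFrame({"groups": ["a", "a", "b"]})

# Weight each row by 1 / (number of groups * size of its own group), so the
# weights inside every group add up to the same total.
group_sizes = df.groupby("groups")["groups"].transform("size")
n_groups = df["groups"].nunique()
df["probability"] = 1.0 / (n_groups * group_sizes)

# Seeded generator, chosen only so the example is repeatable.
rng = np.random.default_rng(0)
sample = rng.choice(
    df["groups"].to_numpy(), p=df["probability"].to_numpy(), size=100
)

# Total probability mass per group: equal by construction.
print(df.groupby("groups")["probability"].sum())
```

The same weights can also be passed to `df.sample(n=..., weights="probability", replace=True)` when whole rows are needed rather than just the group labels.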

create a data frame based on the minimum value of two data frames pandas python

纵饮孤独 posted on 2021-02-10 04:31:19
Question: I have two data frames with different sizes. I want to replace the values of the first data frame with the values of the second data frame, but only where the values of the second data frame are less than those of the first. In other words, I want the minimum of the two data frames at each position, for the indices the two dataframes have in common.

df1:

        A   B  C
    0   0  12  7
    1  15  20  0
    2   7   0  3

df2:

       A   B  C
    1  4  25  8
    2  0   0  5

result df:

       A   B  C
    0  0  12  7
    1  4  20  0
    2  0   0  3

Answer 1: Use: pd.concat(
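The answer is cut off after `pd.concat(`, so what it goes on to do is unknown. As a separate illustration, and not the answer's method, here is a minimal sketch that aligns `df2` to `df1` and overwrites only the cells where `df2` is smaller. The names `aligned` and `result` are illustrative.

```python
import pandas as pd

# Frames copied from the question.
df1 = pd.DataFrame(
    {"A": [0, 15, 7], "B": [12, 20, 0], "C": [7, 0, 3]}, index=[0, 1, 2]
)
df2 = pd.DataFrame({"A": [4, 0], "B": [25, 0], "C": [8, 5]}, index=[1, 2])

# Give df2 the same shape as df1; rows that df2 lacks become NaN.
aligned = df2.reindex(index=df1.index, columns=df1.columns)

# Replace a df1 cell only where the aligned df2 cell is strictly smaller.
# Comparisons against NaN are False, so unmatched rows keep df1's values.
result = df1.mask(aligned < df1, aligned)

# Every surviving value is non-missing, so the original dtypes can be restored.
result = result.astype(df1.dtypes.to_dict())
print(result)
```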

Pandas dataframe max and min value

江枫思渺然 posted on 2021-02-10 04:14:15
Question: I have a pandas dataframe that looks like the following:

    +-----+---+---+--+--+
    |     | A | B |  |  |
    +-----+---+---+--+--+
    | 288 | 1 | 4 |  |  |
    +-----+---+---+--+--+
    | 245 | 2 | 3 |  |  |
    +-----+---+---+--+--+
    | 543 | 3 | 6 |  |  |
    +-----+---+---+--+--+
    | 867 | 1 | 9 |  |  |
    +-----+---+---+--+--+
    | 345 | 2 | 7 |  |  |
    +-----+---+---+--+--+
    | 122 | 3 | 8 |  |  |
    +-----+---+---+--+--+
    | 233 | 1 | 1 |  |  |
    +-----+---+---+--+--+
    | 346 | 2 | 6 |  |  |
    +-----+---+---+--+--+
    | 765 | 3 | 3 |  |  |
    +-----+---+---+--+--+

Pandas GroupBy : How to get top n values based on a column

ぐ巨炮叔叔 posted on 2021-02-10 03:56:34
Question: Forgive me if this is a basic question, but I am new to pandas. I have a dataframe with a column A, and I would like to get the top n rows based on the count in column A. For instance, the raw data looks like:

    A  B    C
    x  12   ere
    x  34   bfhg
    z  6    bgn
    z  8    rty
    y  567  hmmu,,u
    x  545  fghfgj
    x  44   zxcbv

Note that this is just a small sample of the data that I am actually working with. So if we look at column A, value x appears 4 times, y appears 2 times and z appears 1 time. How can I get the top n values
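The question is cut off, so this sketch assumes that "top n" means the n most frequent values in column A, followed by the rows that carry those values. The choice `n = 2` and the names `top_values` and `top_rows` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["x", "x", "z", "z", "y", "x", "x"],
    "B": [12, 34, 6, 8, 567, 545, 44],
    "C": ["ere", "bfhg", "bgn", "rty", "hmmu,,u", "fghfgj", "zxcbv"],
})

n = 2  # how many of the most frequent values to keep (assumed)

# Count occurrences of each value in A and keep the n largest counts.
top_values = df["A"].value_counts().nlargest(n).index

# Keep only the rows whose A value is among the top n.
top_rows = df[df["A"].isin(top_values)]
print(top_rows)
```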

Pandas: pairwise multiplication of columns based on column name

陌路散爱 posted on 2021-02-10 03:41:24
Question: I have the following DataFrame

    >>> df = pd.DataFrame({'ap1_X':[1,2,3,4], 'as1_X':[1,2,3,4], 'ap2_X':[2,2,2,2], 'as2_X':[3,3,3,3]})
    >>> df
       ap1_X  as1_X  ap2_X  as2_X
    0      1      1      2      3
    1      2      2      2      3
    2      3      3      2      3
    3      4      4      2      3

I would like to multiply ap1_X with as1_X and put that value in as1_X, and similarly for ap2_X with as2_X. The common identifier here is the number that comes after the ap or as. The final DataFrame should look like this

    >>> df
       ap1_X  as1_X  ap2_X  as2_X
    0      1      1      2      6
    1      2      4      2      6
    2      3      9      2      6
    3      4     16      2      6

I
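One way the pairing described in the question might be done is to walk over the `ap<number>_` columns and derive each partner name by swapping the `ap` prefix for `as`. This is a minimal sketch that assumes every column follows exactly that naming pattern.

```python
import pandas as pd

# DataFrame copied from the question.
df = pd.DataFrame({
    "ap1_X": [1, 2, 3, 4],
    "as1_X": [1, 2, 3, 4],
    "ap2_X": [2, 2, 2, 2],
    "as2_X": [3, 3, 3, 3],
})

# For each "ap<number>_..." column, find the "as<number>_..." partner and
# overwrite the partner with the element-wise product.
for ap_col in df.filter(regex=r"^ap\d+_").columns:
    as_col = "as" + ap_col[2:]  # same suffix, different prefix
    if as_col in df.columns:
        df[as_col] = df[ap_col] * df[as_col]

print(df)
```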

TypingError: Failed in nopython mode pipeline (step: nopython frontend)

爷,独闯天下 posted on 2021-02-10 03:39:48
Question: I am trying to write my first function using numba jit. I have a pandas dataframe that I need to iterate through to find the root mean square of each 350 points. Since Python's for loop is quite slow, I decided to try numba jit. The code is:

    @jit(nopython=True)
    def find_rms(data, length):
        res = []
        for i in range(length, len(data)):
            interval = np.array(data[i-length:i])
            interval =np.power(interval, 2)
            sum = interval.sum()
            resI = sum/length
            resI = np.sqrt(res)
            res.appennd(resI)
        return res
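Two things stand out in the posted code regardless of numba: `np.sqrt(res)` takes the root of the list `res` rather than of `resI`, and `appennd` is misspelled. As an alternative that avoids the explicit loop altogether, here is a sketch using a pandas rolling window. It is not a numba fix, and the function name `rolling_rms` and the toy input are assumptions for the example.

```python
import numpy as np
import pandas as pd


def rolling_rms(values, length):
    """Root mean square over every full window of `length` consecutive points."""
    s = pd.Series(values, dtype="float64")
    # Square, average over the window, then take the square root.
    rms = np.sqrt(s.pow(2).rolling(length).mean())
    # The first length - 1 positions have no full window and are NaN.
    return rms.dropna().to_numpy()


# Toy input; the question uses windows of 350 points on real data.
data = [3.0, 4.0, 3.0, 4.0]
rms = rolling_rms(data, 2)
print(rms)
```

Note that this returns one value per full window, including the window that ends on the last point, whereas the loop in the question stops one window earlier because its slices end at `i - 1`.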

multiindex selecting in pandas

扶醉桌前 posted on 2021-02-10 03:26:58
Question: I have problems understanding multiindex selecting in pandas.

                        0  1  2  3
    first second third
    C     one    mean   3  4  2  7
                 std    4  1  7  7
          two    mean   3  1  4  7
                 std    5  6  7  0
          three  mean   7  0  2  5
                 std    7  3  7  1
    H     one    mean   2  4  3  3
                 std    5  5  3  5
          two    mean   5  7  0  6
                 std    0  1  0  2
          three  mean   5  2  5  1
                 std    9  0  4  6
    V     one    mean   3  7  3  9
                 std    8  7  9  3
          two    mean   1  9  9  0
                 std    1  1  5  1
          three  mean   3  1  0  6
                 std    6  2  7  4

I need to create new rows:
- 'CH' : ['CH',:,'mean'] => ['C',:,'mean'] - ['H',:,'mean']
- 'CH' : ['CH',:,'std'] => (['C',:,'std']**2 + [
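The first of the two rules above (the 'mean' rows) can be sketched with `DataFrame.xs`, which selects on several index levels at once. The 'std' rule is cut off in the excerpt and is deliberately left out. The variable names are illustrative, and the frame is rebuilt from the numbers shown in the question.

```python
import pandas as pd

# Rebuild the frame from the question.
index = pd.MultiIndex.from_product(
    [["C", "H", "V"], ["one", "two", "three"], ["mean", "std"]],
    names=["first", "second", "third"],
)
values = [
    [3, 4, 2, 7], [4, 1, 7, 7], [3, 1, 4, 7],
    [5, 6, 7, 0], [7, 0, 2, 5], [7, 3, 7, 1],
    [2, 4, 3, 3], [5, 5, 3, 5], [5, 7, 0, 6],
    [0, 1, 0, 2], [5, 2, 5, 1], [9, 0, 4, 6],
    [3, 7, 3, 9], [8, 7, 9, 3], [1, 9, 9, 0],
    [1, 1, 5, 1], [3, 1, 0, 6], [6, 2, 7, 4],
]
df = pd.DataFrame(values, index=index)

# Select all 'mean' rows of C and of H; the result is indexed by 'second' only.
c_mean = df.xs(("C", "mean"), level=["first", "third"])
h_mean = df.xs(("H", "mean"), level=["first", "third"])

# ['CH', :, 'mean'] = ['C', :, 'mean'] - ['H', :, 'mean']
ch_mean = c_mean - h_mean

# Restore the three-level index so the new rows fit into the original frame.
ch_mean.index = pd.MultiIndex.from_product(
    [["CH"], ch_mean.index, ["mean"]], names=df.index.names
)
out = pd.concat([df, ch_mean])
print(out.tail(3))
```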
