dataframe

Identify start and end time of a value per id in a data frame

淺唱寂寞╮ submitted on 2021-01-29 13:33:46
Question: This relates to my previous question on identifying the occurrence of a value in a data frame per id. This time I am trying to identify consecutive measurements per id with a length of 4 or more. For example, below is a consecutive occurrence of w with a length of 4:

id t1 t2 t3 t4 t5 t6
1  s  s  w  w  w  w

For the same id, here is a consecutive occurrence of w with a length of 4, as well as 4 non-w occurrences after the last w:

id t3 t4 t5 t6 t7 t8 t9 t10
1  w  w  w  w  r  s  s  s

I would like to
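The run detection described in the question can be sketched in plain Python. This is only an illustration under assumptions: the question's own language is not shown in this excerpt, the helper name `find_runs` is hypothetical, and the sample row simply mirrors the second example (t3 to t10) for id 1.

```python
from itertools import groupby

def find_runs(times, values, target="w", min_len=4):
    """Return (start_time, end_time, length) for every run of `target`
    that spans at least `min_len` consecutive measurements."""
    runs = []
    pos = 0  # position of the first element of the current run
    for value, group in groupby(values):
        length = len(list(group))
        if value == target and length >= min_len:
            runs.append((times[pos], times[pos + length - 1], length))
        pos += length
    return runs

# Hypothetical data for id 1, mirroring the second example in the question.
times = ["t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
values = ["w", "w", "w", "w", "r", "s", "s", "s"]

runs_by_id = {1: find_runs(times, values)}
print(runs_by_id)  # {1: [('t3', 't6', 4)]}
```

Shorter runs of w are ignored, so an id with only three consecutive w values yields an empty list.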

Get average by months of a time series (all Januaries, all Februaries, etc)

爱⌒轻易说出口 submitted on 2021-01-29 13:33:34
Question: I have a time series of daily data from 1992 to 2018. So far I have converted it to monthly data, but I also need to obtain anomalies per month, and I need the average of each month over all years, ending up with 12 averages: one for each month, computed from the individual averages of each year. I have done the following using Pandas:

df = pd.read_excel(filename, "Daily", index_col=0)
df = df.resample("M").mean()

I have been trying to find out how to obtain now the average of each month every the
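A minimal sketch of one way to get the 12 per-month averages and the monthly anomalies, assuming the index is a DatetimeIndex. The synthetic series and the column name `value` are stand-ins for the Excel sheet, which is not available here, and `"MS"` (month start) is used for the resample only so the monthly labels behave the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the "Daily" sheet (assumption).
idx = pd.date_range("1992-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# One mean per calendar month of each year, labelled by month start.
monthly = df.resample("MS").mean()

# Average of all Januaries, all Februaries, etc.: 12 rows indexed 1..12.
climatology = monthly.groupby(monthly.index.month).mean()

# Anomaly = monthly mean minus the long-term mean of that calendar month.
anomalies = monthly - monthly.groupby(monthly.index.month).transform("mean")

print(climatology)
```

Grouping by `monthly.index.month` averages the yearly monthly means, which matches the "average of each individual average" wording in the question. Grouping the daily data directly by month would weight long and short months differently.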

How to create (correctly) a NumPy array from Pandas DF

 ̄綄美尐妖づ submitted on 2021-01-29 13:19:23
Question: I'm trying to create a NumPy array for the "label" column from a pandas data frame. My df:

       label  vector
0      0      1:0.044509422 2:-0.03092437 3:0.054365806 4:-...
1      0      1:-0.007471546 2:-0.062329583 3:0.012314787 4...
2      0      1:-0.009525825 2:0.0028720177 3:0.0029517233 ...
3      1      1:-0.0040618754 2:-0.03754585 3:0.008025528 4...
4      0      1:0.039150625 2:-0.08689039 3:0.09603256 4:0....
...    ...    ...
59996  1      1:0.01846487 2:-0.012882819 3:0.035375785 4:-...
59997  1      1:0.01435293 2:-0.00683616 3:0.009475072 4:-0...
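For the "label" column alone, a sketch like the following is usually enough. The five-row frame is a shortened stand-in for the one in the question: the labels match the first five rows shown, and the vector strings are cut down to their first two components.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's frame (assumption).
df = pd.DataFrame({
    "label": [0, 0, 0, 1, 0],
    "vector": [
        "1:0.044509422 2:-0.03092437",
        "1:-0.007471546 2:-0.062329583",
        "1:-0.009525825 2:0.0028720177",
        "1:-0.0040618754 2:-0.03754585",
        "1:0.039150625 2:-0.08689039",
    ],
})

# Series.to_numpy() returns a one-dimensional ndarray of the column values.
labels = df["label"].to_numpy(dtype=np.int64)

print(labels.shape, labels.dtype)  # (5,) int64
```

Selecting with a single column name gives a 1-D array. Selecting with a list, as in `df[["label"]]`, would give a 2-D array of shape (5, 1) instead.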

problem with pandas efficiency when working with dates

冷暖自知 submitted on 2021-01-29 13:12:28
Question: I have a piece of code that runs but does not scale well with bigger datasets AT ALL. We are talking about minutes with big datasets. Here is a toy dataset to illustrate the issue:

     Id      Supplier  Avg_NetAmountSpent  Date        Quantity  NetAmount
0    185781  SAXON     2953.500000         2020-05-10  401       9294
1    185781  SAXON     2953.500000         2020-05-09  3502      8890
2    185781  SAXON     2953.500000         2020-05-08  7380      8381
3    185781  SAXON     2953.500000         2020-05-08  3384      1734
4    185781  SAXON     2953.500000         2020-05-08  4826      4910
612  467809  SAXONIS

Secondary axis in plotly for R and Shiny

戏子无情 submitted on 2021-01-29 13:00:38
Question: EDIT: Regarding my question 2, it seems it is a bug that hasn't been fixed yet, as it is not their top priority at the moment. Someone suggested trying katex instead of latex, but I am not sure how that works: https://github.com/plotly/plotly.js/issues/559 I have attached the output of my code: https://i.stack.imgur.com/u65if.jpg. I am trying to plot two y axes and a common x axis using plotly. The issues I am facing are: I would like the primary and the secondary y axis ticks to share the same gridline.

Summing rows based on keyword within index

纵然是瞬间 submitted on 2021-01-29 12:56:29
Question: I am trying to sum multiple rows together based on a keyword that is part of the index, but is not the entire index. For example, the index could look like:

                    Count
1234_Banana_Green   43
4321_Banana_Yellow  34
2244_Banana_Brown   23
12345_Apple_Red     45

I would like to sum all of the rows that have the same "keyword" within them and create a total "banana" row. Is there a way to do this without searching for the keyword "banana"? For my purposes, this keyword changes every time and I would like to
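One possible approach that never names the keyword is to derive it from the index and group on it. The sketch below assumes the keyword is always the second underscore-separated field, as in the sample index. If the real index has a different layout, the split position would need to change.

```python
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame(
    {"Count": [43, 34, 23, 45]},
    index=[
        "1234_Banana_Green",
        "4321_Banana_Yellow",
        "2244_Banana_Brown",
        "12345_Apple_Red",
    ],
)

# Assumption: the keyword is the second field of "<number>_<keyword>_<colour>".
keyword = df.index.str.split("_").str[1]

# Sum all rows that share a keyword; no keyword is hard-coded.
totals = df.groupby(keyword).sum()

print(totals)
```

Here `totals` has one row per distinct keyword, so the three Banana rows collapse into a single row with Count 100.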

Pandas DataFrame merge, ends up with more rows

耗尽温柔 submitted on 2021-01-29 12:41:01
Question: I am doing

a_df = a_df.merge(b_df, how='left', on=['col1', 'col2'])

After this, a_df actually has more rows than before the operation. How is this possible? They both have millions of rows, so it's hard for me to narrow down the problem. I am probably missing something about how a left merge works.

Answer 1: The problem is duplicates: when the key columns contain duplicated values, the left join returns all combinations of the duplicated key pairs from both DataFrames. Check the sample below: a_df = pd.DataFrame({'A':list('abcdef'), 'B':[4,5,4,5,5
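A small self-contained sketch of the duplicate-key effect, using hypothetical three-row frames rather than the answer's own sample (which is cut off above). It also shows two checks that may help on large frames: listing the duplicated keys, and passing `validate="many_to_one"` so that `merge` raises instead of silently adding rows.

```python
import pandas as pd

a_df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["x", "y", "z"], "a": [10, 20, 30]})
# Hypothetical right frame: the key (1, "x") appears twice.
b_df = pd.DataFrame({"col1": [1, 1, 2], "col2": ["x", "x", "y"], "b": [100, 101, 200]})

merged = a_df.merge(b_df, how="left", on=["col1", "col2"])
print(len(a_df), len(merged))  # 3 4

# Rows of b_df whose key occurs more than once.
dup_keys = b_df[b_df.duplicated(["col1", "col2"], keep=False)]
print(dup_keys)

# validate makes the merge fail loudly when the right keys are not unique.
try:
    a_df.merge(b_df, how="left", on=["col1", "col2"], validate="many_to_one")
except pd.errors.MergeError as exc:
    print("merge rejected:", exc)

# One possible fix, if keeping the first match per key is acceptable.
b_unique = b_df.drop_duplicates(["col1", "col2"])
merged_unique = a_df.merge(b_unique, how="left", on=["col1", "col2"], validate="many_to_one")
print(len(merged_unique))  # 3
```

Whether dropping duplicates is the right fix depends on the data. Aggregating b_df per key before merging is the usual alternative when every matching row matters.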

Removing Empty Dataframes with pandas

烈酒焚心 submitted on 2021-01-29 12:36:50
Question: I have written the following code to request pages and use regex to look for strings that resemble interest rates. The overall code works; however, it is creating multiple empty dataframes and I can't get the code to drop the empty frames to clean up my output. I have been trying to use .dropna, .drop, and .empty to get rid of the dataframes, but the output remains unchanged and keeps printing the empty dataframes along with the information I already have. Is there a method I am not aware
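The question's code is not shown in this excerpt, so the following is only a sketch under the assumption that the scraped results end up in a list of DataFrames. The list name `frames` and its contents are hypothetical. The point is that `.dropna` and `.drop` remove rows or columns inside one frame, while `.empty` is a boolean property that can be used to filter the frames themselves.

```python
import pandas as pd

# Hypothetical scrape results: some pages produced no matches.
frames = [
    pd.DataFrame({"rate": ["3.25%", "4.10%"]}),
    pd.DataFrame(),              # no rows, no columns
    pd.DataFrame({"rate": []}),  # a column but no rows
    pd.DataFrame({"rate": ["2.75%"]}),
]

# DataFrame.empty is True when the frame has no rows or no columns.
non_empty = [df for df in frames if not df.empty]

combined = pd.concat(non_empty, ignore_index=True)
print(combined)
```

If the frames are printed inside a loop instead of collected, the same test works as a guard: `if not df.empty: print(df)`.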

Getting descriptive statistics with (analytic) weighting using describe() in python

梦想的初衷 submitted on 2021-01-29 12:34:53
Question: I was trying to translate code from Stata to Python. The original code in Stata:

by year, sort : summarize age [aweight = wt]

Normally a simple describe() call will do:

dataframe.groupby("year")["age"].describe()

But I could not find a way to translate the aweight option into Python, i.e. to give descriptive statistics of a dataset under analytic/variance weighting. Code to generate the dataset in Python:

dataframe = {'year': [2016,2016,2020, 2020], 'age': [41,65, 35,28],
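pandas' `describe()` has no weight argument, so one option is a small helper applied per group. The sketch below is not a verified reproduction of Stata's output: the helper name `weighted_describe` is hypothetical, the `wt` values are made up because the question's data is cut off, and the standard deviation assumes that analytic weights are rescaled to sum to the number of observations and then used with an n - 1 denominator. That assumption should be checked against Stata's results.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    "year": [2016, 2016, 2020, 2020],
    "age": [41, 65, 35, 28],
    "wt": [1.0, 3.0, 2.0, 2.0],  # hypothetical weights, not from the question
})

def weighted_describe(group, value="age", weight="wt"):
    """Weighted count/mean/std/min/max for one group (sketch, see assumptions)."""
    x = group[value].to_numpy(dtype=float)
    w = group[weight].to_numpy(dtype=float)
    n = len(x)
    mean = np.average(x, weights=w)
    w_norm = w * n / w.sum()  # assumption: weights rescaled to sum to n
    var = (w_norm * (x - mean) ** 2).sum() / (n - 1) if n > 1 else np.nan
    return pd.Series({
        "count": n,
        "mean": mean,
        "std": np.sqrt(var),
        "min": x.min(),
        "max": x.max(),
    })

# One row of statistics per year.
summary = pd.DataFrame(
    {year: weighted_describe(g) for year, g in dataframe.groupby("year")}
).T
summary.index.name = "year"

print(summary)
```

With equal weights inside a group the mean and standard deviation reduce to the unweighted ones, which gives a quick sanity check against `describe()`.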

(in R) Add metadata from a vector to a set of columns of a dataframe?

做~自己de王妃 submitted on 2021-01-29 12:32:59
Question: I would like to use values from a character vector that I created as label attributes for a set of variables in a dataframe. I thought this simple solution should work, yet it does not:

x <- rep("text", time=19) %>% paste(1:19, sep = " ") #character vector with names of label attributes I want
attr(mydataframe[var_names], "label") <- x #var_names and x have the same length

Thanks for your help!

Answer 1: Hmisc supports column labels. Using the built-in data frame anscombe, which has 8 columns: library