dataframe

pandas add new row based on sum/difference of other rows

我与影子孤独终老i submitted on 2021-02-08 09:24:45
Question: df have:

    id  measure  t1  t2  t3
    1   savings  1   2   5
    1   income   10  15  14
    1   misc     5   5   5
    2   savings  3   6   12
    2   income   4   20  80
    2   misc     1   1   1

df want: for each id, add a new row to measure called spend, calculated as measure=income minus measure=savings for each of the periods t1, t2 and t3:

    id  measure  t1  t2  t3
    1   savings  1   2   5
    1   income   10  15  14
    1   misc     5   5   5
    1   spend    9   13  9
    2   savings  3   6   12
    2   income   4   20  80
    2   misc     1   1   1
    2   spend    1   14  68

Trying: df.loc[df['Measure'] == 'spend'] = df.loc[df[
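One way to build the spend rows in pandas, sketched on the sample data above: select the income and savings rows, index both by id so they line up, subtract, and concatenate the result back. This is a minimal sketch, not the asker's code; it assumes the column is named `measure` (the attempt above uses `Measure`) and that every id has exactly one income row and one savings row.

```python
import pandas as pd

# Sample data from the question ("df have").
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "measure": ["savings", "income", "misc"] * 2,
    "t1": [1, 10, 5, 3, 4, 1],
    "t2": [2, 15, 5, 6, 20, 1],
    "t3": [5, 14, 5, 12, 80, 1],
})
periods = ["t1", "t2", "t3"]

# Index by id so subtraction aligns income and savings of the same id.
income = df[df["measure"] == "income"].set_index("id")[periods]
savings = df[df["measure"] == "savings"].set_index("id")[periods]

spend = (income - savings).reset_index()
spend["measure"] = "spend"

# Append the new rows, then stable-sort by id so each spend row
# lands after that id's existing measures.
out = (
    pd.concat([df, spend[df.columns]], ignore_index=True)
    .sort_values("id", kind="stable")
    .reset_index(drop=True)
)
print(out)
```

A stable sort is used so the original row order within each id is preserved; an unstable sort could shuffle savings, income and misc.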

Add missing day rows in stock market data to maintain continuity in pandas dataframe

大兔子大兔子 submitted on 2021-02-08 09:15:59
Question: So I have around 13 years of daily stock market data (open, high, low, close). The problem is that the markets are sometimes closed in between, so Monday to Friday do not always appear continuously. See below:

        Date        Day        Open     High     Low     Close    Adjusted Close
    0   17-09-2007  Monday     6898     6977.2   6843    6897.1   6897.100098
    1   18-09-2007  Tuesday    6921.15  7078.95  6883.6  7059.65  7059.649902
    2   19-09-2007  Wednesday  7111     7419.35  7111    7401.85  7401.850098
    3   20-09-2007  Thursday   7404.95  7462.9   7343.6  7390.15  7390
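A minimal pandas sketch for inserting the missing weekdays, assuming "continuity" means one row for every Monday to Friday and that filling a closed day with the previous day's prices is acceptable (leaving the row as NaN is an equally valid choice; drop the `ffill()` call for that). It uses three of the question's rows with 19-09-2007 removed to simulate a closed day, and keeps only the Open and Close columns for brevity.

```python
import pandas as pd

# Three rows from the question; 19-09-2007 (Wednesday) is removed on purpose
# to simulate a day the market was closed.
df = pd.DataFrame({
    "Date": ["17-09-2007", "18-09-2007", "20-09-2007"],
    "Open": [6898.0, 6921.15, 7404.95],
    "Close": [6897.1, 7059.65, 7390.15],
})

# Dates are day-first (dd-mm-yyyy), so pass the format explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# asfreq("B") reindexes onto every business day (Mon-Fri) between the first
# and last date; days that were missing come back as rows of NaN.
full = df.set_index("Date").asfreq("B")

# Assumed fill policy: carry the previous trading day's prices forward.
full = full.ffill()

# Rebuild the Day column from the new index and restore Date as a column.
full["Day"] = full.index.day_name()
full = full.rename_axis("Date").reset_index()
print(full)
```

Note that `"B"` only knows about weekends, not exchange holidays, which is what the question asks for here (every Monday to Friday present).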

Test whether position x is between any start (i=1 to i=max) and end (i=1 to i=max) positions stored in lists

独自空忆成欢 submitted on 2021-02-08 09:10:17
Question: I have a simple data frame that specifies start and end positions within lists. These start and end positions define i regions. Now I would like to test whether a given position lies within such a region and, if so, which region (i) it falls in. Here is a simple example data frame:

    start <- list(c(5,10,15), c(5), c(6,11), c(6,11))
    end <- list(c(7,11,17), c(10), c(8,12), c(8,12))
    imax <- c(3,1,2,2)
    position <- c(11,6,9,8)
    example <- data.frame(start = I(start), end = I(end), imax

Show top n rows for every column in Pandas data frame

本秂侑毒 submitted on 2021-02-08 09:04:37
Question: I have the following sample CSV.

    ,cid1,cid2,cid3
    rid1,0.1,0.4,0.3
    rid2,1.0,0.1,0.5
    rid3,0.2,0.5,0.1
    rid4,0.3,0.4,0.8
    rid5,0.2,0.3,0.7
    rid6,0.9,0.2,0.1
    rid7,0.4,0.8,0.9
    rid8,0.6,0.5,0.7
    rid9,0.3,0.9,0.4

I want to show the n rows with the highest values for every column in the file. Ideally, I would like to get the following output (for n = 3):

    cid1 rid2 1.0
    cid1 rid6 0.9
    cid1 rid8 0.6
    # Blank lines separating columns are optional.
    cid2 rid9 0.9
    cid2 rid7 0.8
    cid2 rid8 0.5
    cid3 rid7 0.9
    cid3 rid4 0
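A short pandas sketch that applies `Series.nlargest` to each column of the sample CSV. Ties are broken by the default `keep="first"`, so for cid2 the third row is rid3 (also 0.5) rather than the rid8 shown in the desired output; the question does not say how ties should be resolved.

```python
import io

import pandas as pd

# The sample CSV from the question; the first column holds the row ids.
csv_text = """,cid1,cid2,cid3
rid1,0.1,0.4,0.3
rid2,1.0,0.1,0.5
rid3,0.2,0.5,0.1
rid4,0.3,0.4,0.8
rid5,0.2,0.3,0.7
rid6,0.9,0.2,0.1
rid7,0.4,0.8,0.9
rid8,0.6,0.5,0.7
rid9,0.3,0.9,0.4
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

n = 3
rows = []
for col in df.columns:
    # nlargest returns the n highest values in descending order,
    # keeping the earlier row when values tie.
    for rid, value in df[col].nlargest(n).items():
        rows.append((col, rid, value))

for col, rid, value in rows:
    print(col, rid, value)
```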

pandas dataframe.apply — converting hex string to int number

泪湿孤枕 submitted on 2021-02-08 08:42:13
Question: I am very new to both Python and pandas. I would like to know how to convert DataFrame elements from hex strings to integers. I have followed the solution in "convert pandas dataframe column from hex string to int"; however, it is still not working. The following is my code:

    df = pd.read_csv(filename, delim_whitespace = True, header = None, usecols = range(7,23,2))
    for i in range(num_frame):
        skipheader = lineNum[header_padding + i*2]
        data = df.iloc[skipheader:skipheader
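A hedged sketch of hex-to-int conversion with `DataFrame.apply`, using hypothetical whitespace-separated input instead of the asker's file. Because the rest of the question is cut off, the actual cause of the failure can't be confirmed; one common pitfall is that `read_csv` parses values such as `10` or `00` as decimal numbers, after which `int(x, 16)` raises a `TypeError` because it only accepts strings with an explicit base. Passing `dtype=str` keeps every value as text. The example uses `sep=r"\s+"`, which is equivalent to `delim_whitespace=True` (deprecated in pandas 2.2).

```python
import io

import pandas as pd

# Hypothetical input standing in for the real file and column range.
raw = io.StringIO("0a ff 10\n1F 00 7fff\n")

# dtype=str stops pandas from turning "10" or "00" into decimal integers.
df = pd.read_csv(raw, sep=r"\s+", header=None, dtype=str)

# apply walks the columns; map converts each hex string with int(s, 16).
# int() with base 16 also accepts an optional "0x" prefix.
converted = df.apply(lambda col: col.map(lambda s: int(s, 16)))
print(converted)
```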

Merge Maps in scala dataframe

南楼画角 submitted on 2021-02-08 08:32:30
Question: I have a DataFrame with columns col1, col2 and col3. col1 and col2 are strings; col3 is a Map[String,String], defined below:

     |-- col3: map (nullable = true)
     |    |-- key: string
     |    |-- value: string (valueContainsNull = true)

I have grouped by col1 and col2 and aggregated using collect_list to get an Array of Maps, stored in col4:

    df.groupBy($"col1", $"col2").agg(collect_list($"col3").as("col4"))

     |-- col4: array (nullable = true)
     |    |-- element: map (containsNull = true)
     |    |    |-- key: string
     |    |    |-- value:

Creating new column based on multiple possible cell possibilities across several columns

人盡茶涼 submitted on 2021-02-08 08:25:50
Question:

    data[, allkneePR := Reduce(`|`, lapply(.SD, `==`, "0082")), .SDcols=PR1:PR3]

Hey, I'm trying to look for different diagnoses, c("0082", "0083", "0084"), across a range of rows and columns in a data.table (the dataset is huge). If any of the columns PR1:PR3 contains "0082", "0083" or "0084", I want another column that indicates TRUE. Right now this works with the code above, but I am trying to add in multiple diagnoses, not just "0082". I tried the any() function which doesn't work