dataframe | 易学教程

pandas add new row based on sum/difference of other rows

Question

df have

    id  measure  t1  t2  t3
    1   savings  1   2   5
    1   income   10  15  14
    1   misc     5   5   5
    2   savings  3   6   12
    2   income   4   20  80
    2   misc     1   1   1

df want: add a new row to measure for each id, called spend, calculated as measure=income minus measure=savings, for each of the periods t1, t2, t3, for each id.

    id  measure  t1  t2  t3
    1   savings  1   2   5
    1   income   10  15  14
    1   misc     5   5   5
    1   spend    9   13  9
    2   savings  3   6   12
    2   income   4   20  80
    2   misc     1   1   1
    2   spend    1   14  68

Trying: df.loc[df['Measure'] == 'spend'] = df.loc[df[
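
One possible approach to the question above, offered as a minimal sketch rather than the accepted answer: select the income and savings rows, index both by id so the subtraction aligns per id, and append the difference as new rows. The names `periods`, `income`, `savings`, `spend` and `result` are illustrative, and the sketch assumes each id has exactly one income row and one savings row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "measure": ["savings", "income", "misc"] * 2,
    "t1": [1, 10, 5, 3, 4, 1],
    "t2": [2, 15, 5, 6, 20, 1],
    "t3": [5, 14, 5, 12, 80, 1],
})

periods = ["t1", "t2", "t3"]

# Index each measure by id so the subtraction aligns rows per id.
# Assumption: one income row and one savings row per id.
income = df[df["measure"] == "income"].set_index("id")[periods]
savings = df[df["measure"] == "savings"].set_index("id")[periods]

spend = (income - savings).reset_index()
spend.insert(1, "measure", "spend")

# Append the new rows; a stable sort on id places each spend row
# after the rows that already exist for that id.
result = (
    pd.concat([df, spend], ignore_index=True)
    .sort_values("id", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

The stable sort is what keeps the original savings, income, misc order within each id; an unstable sort could shuffle rows that share an id.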

Add missing day rows in stock market data to maintain continuity in pandas dataframe

Question

I have around 13 years of daily stock market data (open, high, low, close). The problem is that the markets are sometimes closed in between, so Monday to Friday does not always appear as a continuous run. See below:

       Date        Day        Open     High     Low     Close    Adjusted Close
    0  17-09-2007  Monday     6898     6977.2   6843    6897.1   6897.100098
    1  18-09-2007  Tuesday    6921.15  7078.95  6883.6  7059.65  7059.649902
    2  19-09-2007  Wednesday  7111     7419.35  7111    7401.85  7401.850098
    3  20-09-2007  Thursday   7404.95  7462.9   7343.6  7390.15  7390
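
A minimal sketch of one way to restore the missing weekdays, assuming the goal is a row for every Monday-to-Friday date with empty values where the market was closed (the excerpt is cut off before it says how the gaps should be filled). The sample keeps only the Open and Close columns and deliberately omits 19-09-2007 to create a gap; `all_weekdays` and `filled` are illustrative names.

```python
import pandas as pd

# Values taken from the question; the 19-09-2007 row is left out
# on purpose to simulate a day on which the market was closed.
df = pd.DataFrame({
    "Date": ["17-09-2007", "18-09-2007", "20-09-2007"],
    "Open": [6898.0, 6921.15, 7404.95],
    "Close": [6897.1, 7059.65, 7390.15],
})

# The dates are day-first, so parse them with an explicit format.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df = df.set_index("Date")

# Every Monday-to-Friday date between the first and last observed day.
all_weekdays = pd.bdate_range(df.index.min(), df.index.max())

# Reindexing adds the missing dates as rows filled with NaN.
filled = df.reindex(all_weekdays)
filled.index.name = "Date"
filled.insert(0, "Day", filled.index.day_name())
print(filled)
```

If the gaps should carry the previous day's prices instead of staying empty, calling `ffill()` on the price columns would be one option; which behaviour is wanted is not stated in the excerpt.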

Test whether position x is between any start (i=1 to i=max) and end (i=1 to i=max) positions stored in lists

Question

I have a simple data frame that specifies start and end positions within lists. These start and end positions define i regions. Now I would like to test whether a given position lies within such a region and, if so, I need to know in which region (i). Here is a simple example data frame:

    start <- list(c(5,10,15), c(5) ,c(6,11),c(6,11))
    end <- list(c(7,11,17), c(10), c(8,12),c(8,12))
    imax <- c(3,1,2,2)
    position <- c(11,6,9,8)
    example <- data.frame(start = I(start), end = I(end), imax

Show top n rows for every column in Pandas data frame

Question

I have the following sample CSV.

    ,cid1,cid2,cid3
    rid1,0.1,0.4,0.3
    rid2,1.0,0.1,0.5
    rid3,0.2,0.5,0.1
    rid4,0.3,0.4,0.8
    rid5,0.2,0.3,0.7
    rid6,0.9,0.2,0.1
    rid7,0.4,0.8,0.9
    rid8,0.6,0.5,0.7
    rid9,0.3,0.9,0.4

I want to show the n rows with the highest value for every column in the file. Ideally, I would like to get the following output (for n = 3).

    cid1 rid2 1.0
    cid1 rid6 0.9
    cid1 rid8 0.6
    # Blank lines separating columns are optional.
    cid2 rid9 0.9
    cid2 rid7 0.8
    cid2 rid8 0.5
    cid3 rid7 0.9
    cid3 rid4 0
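
A minimal sketch of one way to get this kind of output, assuming the first CSV column holds the row labels: loop over the columns and take `Series.nlargest(n)` for each. The names `csv_text` and `top_rows` are illustrative.

```python
import io

import pandas as pd

# Sample CSV copied from the question.
csv_text = """,cid1,cid2,cid3
rid1,0.1,0.4,0.3
rid2,1.0,0.1,0.5
rid3,0.2,0.5,0.1
rid4,0.3,0.4,0.8
rid5,0.2,0.3,0.7
rid6,0.9,0.2,0.1
rid7,0.4,0.8,0.9
rid8,0.6,0.5,0.7
rid9,0.3,0.9,0.4
"""

# Use the first column (rid1, rid2, ...) as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

n = 3

# For each column, keep the n largest values together with their row labels.
top_rows = {col: df[col].nlargest(n) for col in df.columns}

for col, top in top_rows.items():
    for rid, value in top.items():
        print(col, rid, value)
    print()  # blank line between columns
```

Ties at the cut-off are resolved by the `keep` argument of `nlargest`, so when two rows share the third-highest value (as with 0.5 in cid2) the row selected may differ from the one shown in the question.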

pandas dataframe.apply — converting hex string to int number

Question

I am very new to both Python and pandas. I would like to know how to convert dataframe elements from hex strings to integers. I have followed the solution provided in "convert pandas dataframe column from hex string to int"; however, it is still not working. The following is my code:

    df = pd.read_csv(filename, delim_whitespace = True, header = None, usecols = range(7,23,2))
    for i in range(num_frame):
        skipheader = lineNum[header_padding + i*2]
        data = df.iloc[skipheader:skipheader
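
A minimal sketch of the conversion itself, using a small made-up frame because the question's input file is not shown. `DataFrame.apply` hands each column to the function as a Series, so the element-wise conversion is done with `Series.map`. The helper `hex_to_int` and the sample values are hypothetical.

```python
import pandas as pd

# Made-up sample of hex strings; the real file from the question is not shown.
df = pd.DataFrame({"a": ["0A", "ff"], "b": ["10", "7F"]})


def hex_to_int(text):
    """Parse one hex string such as '0A' or 'ff' as a base-16 integer."""
    return int(text, 16)


# apply() passes a whole column at a time, so map the helper over its elements.
converted = df.apply(lambda column: column.map(hex_to_int))
print(converted)
```

If a column contains only digits, `read_csv` may already have parsed it as decimal integers, in which case `int(value, 16)` raises a `TypeError`; passing `dtype=str` to `read_csv` keeps every value as text. Whether that is the cause of the problem in the question cannot be told from the excerpt.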

Merge Maps in scala dataframe

Question

I have a dataframe with columns col1, col2, col3. col1 and col2 are strings. col3 is a Map[String,String], defined below:

    |-- col3: map (nullable = true)
    |    |-- key: string
    |    |-- value: string (valueContainsNull = true)

I have grouped by col1, col2 and aggregated using collect_list to get an Array of Maps, stored in col4.

    df.groupBy($"col1", $"col2").agg(collect_list($"col3").as("col4"))

    |-- col4: array (nullable = true)
    |    |-- element: map (containsNull = true)
    |    |    |-- key: string
    |    |    |-- value:

Creating new column based on multiple possible cell possibilities across several columns

Question

    data[, allkneePR := Reduce(`|`, lapply(.SD, `==`, "0082")), .SDcols=PR1:PR3]

Hey, I'm trying to look for different diagnoses, c("0082", "0083", "0084"), across a range of rows and columns in a data.table (the dataset is huge). If any of the columns PR1:PR3 contains "0082", "0083" or "0084", I want another column that indicates TRUE. Right now this works with the above code, but I am trying to add in multiple diagnoses, not just "0082". I tried the any() function which doesn't work