dataframe

Apply FUN row-wise on data frame with integer and character variables

Submitted by 白昼怎懂夜的黑 on 2021-01-28 06:07:51
Question: A completely basic question, and forgive me if it is a duplicate.

```r
set.seed(1)
df <- data.frame(id = c('a', 'a', 'b', 'b', 'a'),
                 a = sample(1:10, size = 5, replace = T),
                 b = sample(1:10, size = 5, replace = T),
                 c = sample(1:10, size = 5, replace = T))
```

Then,

```
> df
  id  a  b  c
1  a  3  9  3
2  a  4 10  2
3  b  6  7  7
4  b 10  7  4
5  a  3  1  8
```

To return the column name (a, b or c) with the largest value, and if this is in the id variable take the second highest, I use the below function:

```r
FUN <- function(r) { top <- names(r[,c('a', 'b…
```
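Since the asker's FUN is cut off above, here is only a minimal sketch of the stated goal under one assumption: the id column is excluded before taking the row-wise maximum, so the "second highest" fallback never arises. max.col is my choice, not necessarily the asker's approach.

```r
set.seed(1)
df <- data.frame(id = c('a', 'a', 'b', 'b', 'a'),
                 a = sample(1:10, size = 5, replace = TRUE),
                 b = sample(1:10, size = 5, replace = TRUE),
                 c = sample(1:10, size = 5, replace = TRUE))

# Row-wise index of the largest value among the numeric columns only,
# mapped back to a column name; ties resolve to the leftmost column.
num_cols <- c("a", "b", "c")
df$top <- num_cols[max.col(df[num_cols], ties.method = "first")]
df
```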

combining real and imag columns in dataframe into complex number to obtain magnitude using np.abs

Submitted by 眉间皱痕 on 2021-01-28 06:04:21
Question: I have a data frame that has complex numbers split into a real and an imaginary column. I want to add a column (two, actually, one for each channel) to the dataframe that computes the log magnitude:

```
     ch1_real  ch1_imag  ch2_real  ch2_imag  ch1_phase  ch2_phase  distance
79   0.011960 -0.003418  0.005127 -0.019530     -15.95    -75.290       0.0
78  -0.009766 -0.005371 -0.015870  0.010010    -151.20    147.800       1.0
343  0.002197  0.010990  0.003662 -0.013180      78.69    -74.480       2.0
80  -0.002686  0.010740  0.011960  0.013430     104.00     48…
```
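The question body is truncated, so the snippet below is a sketch rather than the asker's code: it rebuilds the two shown channels from the printout above, recombines them into complex values, and takes the magnitude with np.abs. Interpreting "log magnitude" as 20*log10 (dB) is an assumption.

```python
import numpy as np
import pandas as pd

# First two rows reproduced from the question's printout.
df = pd.DataFrame({
    "ch1_real": [0.011960, -0.009766],
    "ch1_imag": [-0.003418, -0.005371],
    "ch2_real": [0.005127, -0.015870],
    "ch2_imag": [-0.019530, 0.010010],
})

for ch in ("ch1", "ch2"):
    z = df[f"{ch}_real"] + 1j * df[f"{ch}_imag"]    # recombine into complex
    df[f"{ch}_log_mag"] = 20 * np.log10(np.abs(z))  # magnitude in dB

print(df)
```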

Cumulative aggregates within tidyverse

Submitted by ◇◆丶佛笑我妖孽 on 2021-01-28 05:51:51
Question: Say I have a tibble (or data.table) which consists of two columns:

```r
a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))
```

Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number; for example, you can think of myfun as being the standard deviation. Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val, cumulated and grouped…
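The question is cut off, so the sketch below assumes the usual reading of "cumulated and grouped": within each id, row i gets myfun(val[1:i]). sd stands in for myfun, and purrr::map_dbl over a growing slice is one straightforward way to express it.

```r
library(dplyr)
library(purrr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

myfun <- sd  # stand-in: any numeric vector -> single number

a %>%
  group_by(id) %>%
  # Apply myfun to val[1:1], val[1:2], ... within each group.
  # Note sd() of a length-1 vector is NA, so the first row per id is NA.
  mutate(result = map_dbl(seq_along(val), ~ myfun(val[1:.x]))) %>%
  ungroup()
```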

Combine value part of Tuple2 which is a map, into single map grouping by the key of Tuple2

Submitted by ↘锁芯ラ on 2021-01-28 05:45:13
Question: I am doing this in Scala and Spark. I have a Dataset of Tuple2, as Dataset[(String, Map[String, String])]. Below is an example of the values in the Dataset:

```
(A, {1->100, 2->200, 3->100})
(B, {1->400, 4->300, 5->900})
(C, {6->100, 4->200, 5->100})
(B, {1->500, 9->300, 11->900})
(C, {7->100, 8->200, 5->800})
```

If you notice, the key (the first element of the Tuple2) can be repeated. Also, the map belonging to the same Tuple2 key can itself contain duplicate keys (in the second part of the Tuple2). I…
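The question is cut off before it says how duplicate map keys should combine, so the sketch below assumes the values are summed as integers; groupByKey plus reduceGroups is one way to fold together all maps sharing a Tuple2 key. It assumes a spark-shell style session where spark and its implicits are available.

```scala
import org.apache.spark.sql.Dataset
import spark.implicits._

val ds: Dataset[(String, Map[String, String])] = Seq(
  ("A", Map("1" -> "100", "2" -> "200", "3" -> "100")),
  ("B", Map("1" -> "400", "4" -> "300", "5" -> "900")),
  ("B", Map("1" -> "500", "9" -> "300", "11" -> "900"))
).toDS()

// Merge two string-valued maps; values under the same key are summed
// as Ints (an assumption, since the question is truncated).
def mergeMaps(m1: Map[String, String], m2: Map[String, String]): Map[String, String] =
  (m1.keySet ++ m2.keySet).map { k =>
    val total = m1.get(k).map(_.toInt).getOrElse(0) + m2.get(k).map(_.toInt).getOrElse(0)
    k -> total.toString
  }.toMap

val merged = ds
  .groupByKey(_._1)
  .reduceGroups((x, y) => (x._1, mergeMaps(x._2, y._2)))
  .map { case (key, (_, m)) => (key, m) }

merged.show(truncate = false)
```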

Data.frame from list of rows

Submitted by て烟熏妆下的殇ゞ on 2021-01-28 05:40:05
Question: A rather simple question. Given a named, JSON-like list of data.frame rows, how would one transform this into a proper data.frame in a concise manner while keeping the column classes and row names intact?

```r
df_list <- lapply(1:10, function(x) list(a = 1, b = 'hello', c = 3 - 1i))
names(df_list) <- LETTERS[1:10]
```

Desired result:

```r
data.frame(a = rep(1, 10), b = rep('hello', 10), c = rep(3 - 1i, 10))
```

Answer 1: An option with unnest_wider:

```r
library(dplyr)
library(tidyr)
tibble(col1 = df_list) %>% unnest…
```
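The quoted answer is cut off at unnest, so as a complementary sketch here is a base-R route (my choice, not the answer's): convert each row-list to a one-row data.frame and rbind them, which keeps the column classes (numeric, character, complex) and turns names(df_list) into row names.

```r
df_list <- lapply(1:10, function(x) list(a = 1, b = 'hello', c = 3 - 1i))
names(df_list) <- LETTERS[1:10]

# rbind the one-row data.frames; the list names become row names.
res <- do.call(rbind, lapply(df_list, as.data.frame, stringsAsFactors = FALSE))
str(res)
```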

Stacking select columns as rows in pandas dataframe

Submitted by ☆樱花仙子☆ on 2021-01-28 05:23:53
Question: Suppose I have df_in below:

```python
df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})
```

df_in:

```
+---+---+---+---+
|   | X | A | B |
+---+---+---+---+
| 0 | a | 1 | 1 |
| 1 | b | 0 | 1 |
| 2 | c | 0 | 0 |
+---+---+---+---+
```

I want to achieve something like the following:

```python
df_out = pd.DataFrame({'X': ['a', 'a', 'b'], 'Y': ['A', 'B', 'B']})
```

df_out:

```
+---+---+---+
|   | X | Y |
+---+---+---+
| 0 | a | A |
| 1 | a | B |
| 2 | b | B |
+---+---+---+
```

I also have a list containing the…
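The question is cut off where it mentions a list of columns, so the sketch below assumes that list is ['A', 'B']: melt those indicator columns into long form and keep only the rows flagged with 1, which reproduces the shown df_out.

```python
import pandas as pd

df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})

cols = ['A', 'B']  # assumed to be the list the question refers to
df_out = (df_in.melt(id_vars='X', value_vars=cols, var_name='Y')
                .query('value == 1')   # keep only the 1-flags
                .drop(columns='value')
                .sort_values(['X', 'Y'])
                .reset_index(drop=True))
print(df_out)
```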

Sum one column values if other columns are matched

Submitted by 被刻印的时光 ゝ on 2021-01-28 05:13:49
Question: I have a Spark dataframe like this:

```
word1  word2  co-occur
-----  -----  --------
w1     w2     10
w2     w1     15
w2     w3     11
```

And my expected result is:

```
word1  word2  co-occur
-----  -----  --------
w1     w2     25
w2     w3     11
```

I tried the dataframe's groupBy and aggregate functions, but I couldn't come up with the solution.

Answer 1: You need a single column containing both words in sorted order; this column can then be used for the groupBy. You can create a new column with an array containing word1 and word2 as follows:

```scala
df.withColumn(…
```
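The answer's code stops at df.withColumn(, so the following is only a sketch of how that approach typically continues (array_sort requires Spark 2.4+; the pair column name is mine). Again assuming a spark-shell session:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("w1", "w2", 10), ("w2", "w1", 15), ("w2", "w3", 11))
  .toDF("word1", "word2", "co-occur")

// Sort the two words into an array so (w1, w2) and (w2, w1) collapse
// onto the same grouping key, then sum the counts.
val result = df
  .withColumn("pair", array_sort(array(col("word1"), col("word2"))))
  .groupBy("pair")
  .agg(sum("co-occur").as("co-occur"))
  .select(col("pair")(0).as("word1"), col("pair")(1).as("word2"), col("co-occur"))

result.show()
```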

Specify Multi-Level columns using pd.read_clipboard?

Submitted by 混江龙づ霸主 on 2021-01-28 05:02:13
Question: Here's some data from another question:

```
main    Meas1      Meas2  Meas3  Meas4   Meas5
sublvl  Value      Value  Value  Value   Value
count   7.000000   1.0    1.0    582.00  97.000000
mean    30         37.0   26.0   33.03   16.635350
```

I would like to read in this data in such a way that the first column is actually the index, and the first two rows are treated as multi-level columns, where MeasX is the first level and Value is the second level. How can I do this using pd.read_clipboard?

My pd.read_clipboard series: How do you handle column…
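A sketch of one way to do this: read_clipboard passes its keyword arguments through to read_csv, so header=[0, 1] builds the two-level columns and index_col=0 makes the first column the index. The demo below uses read_csv on a literal string so it runs without a clipboard; with the data actually copied, pd.read_clipboard(sep=r"\s+", header=[0, 1], index_col=0) should behave the same (assuming whitespace-delimited data).

```python
import io
import pandas as pd

data = """\
main Meas1 Meas2 Meas3 Meas4 Meas5
sublvl Value Value Value Value Value
count 7.000000 1.0 1.0 582.00 97.000000
mean 30 37.0 26.0 33.03 16.635350
"""

# Two header rows become a column MultiIndex; column 0 becomes the index.
df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=[0, 1], index_col=0)
print(df.columns)
print(df)
```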

ValueError: could not convert string to float: (pd.Series)

Submitted by 假如想象 on 2021-01-28 05:00:40
Question: I'm failing to execute a lambda function on the code snippet below. My desired goal is to split the columns (btts_x and btts_y) into separate parts for further calculation. The lambda function succeeds on the first column, btts_x (see btts_x_1 and btts_x_2), but fails on column btts_y, as revealed in the ValueError traceback. I think I need to pass a re.sub() inside the lambda function; however, I'm stuck on it and would appreciate help! Note: special character(s) \n\n in Team_x &…
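The snippet and traceback are cut off, so the following is a hypothetical reconstruction: the btts columns are assumed to hold strings such as "45%\n\n55%", and a re.sub() strips the non-numeric characters before the float conversion that the ValueError complains about. The column contents and the regex are assumptions, not the asker's data.

```python
import re
import pandas as pd

df = pd.DataFrame({"btts_x": ["45%\n\n55%", "60%\n\n40%"],
                   "btts_y": ["30%\n\n70%", "55%\n\n45%"]})

def split_pct(s):
    # Split on the embedded "\n\n", then keep only digits and dots so
    # float() cannot choke on stray characters like "%".
    return [float(re.sub(r"[^\d.]", "", part)) for part in s.split("\n\n")]

for col in ("btts_x", "btts_y"):
    df[[f"{col}_1", f"{col}_2"]] = df[col].apply(split_pct).tolist()

print(df)
```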