dataframe

Apply FUN row-wise on data frame with integer and character variables

Submitted by 白昼怎懂夜的黑 on 2021-01-28 06:07:51
Question: A completely basic question, and forgive me if it is a duplicate.

```r
set.seed(1)
df <- data.frame(id = c('a', 'a', 'b', 'b', 'a'),
                 a = sample(1:10, size = 5, replace = T),
                 b = sample(1:10, size = 5, replace = T),
                 c = sample(1:10, size = 5, replace = T))
```

Then,

```
> df
  id  a  b  c
1  a  3  9  3
2  a  4 10  2
3  b  6  7  7
4  b 10  7  4
5  a  3  1  8
```

To return the column name (a, b or c) with the largest value, and if this is in the id variable take the second highest, I use the below function:

```r
FUN <- function(r) { top <- names(r[,c('a', 'b…
```
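Since the asker's FUN is cut off above, here is only a minimal sketch of the stated goal under one assumption: the id column is excluded before taking the row-wise maximum, so the "second highest" fallback never arises. max.col is my choice, not necessarily the asker's approach.

```r
set.seed(1)
df <- data.frame(id = c('a', 'a', 'b', 'b', 'a'),
                 a = sample(1:10, size = 5, replace = TRUE),
                 b = sample(1:10, size = 5, replace = TRUE),
                 c = sample(1:10, size = 5, replace = TRUE))

# Row-wise index of the largest value among the numeric columns only,
# mapped back to a column name; ties resolve to the leftmost column.
num_cols <- c("a", "b", "c")
df$top <- num_cols[max.col(df[num_cols], ties.method = "first")]
df
```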

combining real and imag columns in dataframe into complex number to obtain magnitude using np.abs

Submitted by 眉间皱痕 on 2021-01-28 06:04:21
Question: I have a data frame that has complex numbers split into a real and an imaginary column. I want to add a column (two, actually, one for each channel) to the dataframe that computes the log magnitude:

```
     ch1_real  ch1_imag  ch2_real  ch2_imag  ch1_phase  ch2_phase  distance
79   0.011960 -0.003418  0.005127 -0.019530     -15.95    -75.290       0.0
78  -0.009766 -0.005371 -0.015870  0.010010    -151.20    147.800       1.0
343  0.002197  0.010990  0.003662 -0.013180      78.69    -74.480       2.0
80  -0.002686  0.010740  0.011960  0.013430     104.00     48…
```
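The question body is truncated, so the snippet below is a sketch rather than the asker's code: it rebuilds the two shown channels from the printout above, recombines them into complex values, and takes the magnitude with np.abs. Interpreting "log magnitude" as 20*log10 (dB) is an assumption.

```python
import numpy as np
import pandas as pd

# First two rows reproduced from the question's printout.
df = pd.DataFrame({
    "ch1_real": [0.011960, -0.009766],
    "ch1_imag": [-0.003418, -0.005371],
    "ch2_real": [0.005127, -0.015870],
    "ch2_imag": [-0.019530, 0.010010],
})

for ch in ("ch1", "ch2"):
    z = df[f"{ch}_real"] + 1j * df[f"{ch}_imag"]    # recombine into complex
    df[f"{ch}_log_mag"] = 20 * np.log10(np.abs(z))  # magnitude in dB

print(df)
```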

Cumulative aggregates within tidyverse

Submitted by ◇◆丶佛笑我妖孽 on 2021-01-28 05:51:51
Question: Say I have a tibble (or data.table) which consists of two columns:

```r
a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))
```

Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number; for example, you can think of myfun as being the standard deviation. Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val, cumulated and grouped…
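The question is cut off, so the sketch below assumes the usual reading of "cumulated and grouped": within each id, row i gets myfun(val[1:i]). sd stands in for myfun, and purrr::map_dbl over a growing slice is one straightforward way to express it.

```r
library(dplyr)
library(purrr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

myfun <- sd  # stand-in: any numeric vector -> single number

a %>%
  group_by(id) %>%
  # Apply myfun to val[1:1], val[1:2], ... within each group.
  # Note sd() of a length-1 vector is NA, so the first row per id is NA.
  mutate(result = map_dbl(seq_along(val), ~ myfun(val[1:.x]))) %>%
  ungroup()
```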

Combine value part of Tuple2 which is a map, into single map grouping by the key of Tuple2

Submitted by ↘锁芯ラ on 2021-01-28 05:45:13
Question: I am doing this in Scala and Spark. I have a Dataset of Tuple2, as Dataset[(String, Map[String, String])]. Below is an example of the values in the Dataset:

```
(A, {1->100, 2->200, 3->100})
(B, {1->400, 4->300, 5->900})
(C, {6->100, 4->200, 5->100})
(B, {1->500, 9->300, 11->900})
(C, {7->100, 8->200, 5->800})
```

If you notice, the key (the first element of the Tuple2) can be repeated. Also, the map belonging to the same Tuple2 key can itself contain duplicate keys (in the second part of the Tuple2). I…
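The question is cut off before it says how duplicate map keys should combine, so the sketch below assumes the values are summed as integers; groupByKey plus reduceGroups is one way to fold together all maps sharing a Tuple2 key. It assumes a spark-shell style session where spark and its implicits are available.

```scala
import org.apache.spark.sql.Dataset
import spark.implicits._

val ds: Dataset[(String, Map[String, String])] = Seq(
  ("A", Map("1" -> "100", "2" -> "200", "3" -> "100")),
  ("B", Map("1" -> "400", "4" -> "300", "5" -> "900")),
  ("B", Map("1" -> "500", "9" -> "300", "11" -> "900"))
).toDS()

// Merge two string-valued maps; values under the same key are summed
// as Ints (an assumption, since the question is truncated).
def mergeMaps(m1: Map[String, String], m2: Map[String, String]): Map[String, String] =
  (m1.keySet ++ m2.keySet).map { k =>
    val total = m1.get(k).map(_.toInt).getOrElse(0) + m2.get(k).map(_.toInt).getOrElse(0)
    k -> total.toString
  }.toMap

val merged = ds
  .groupByKey(_._1)
  .reduceGroups((x, y) => (x._1, mergeMaps(x._2, y._2)))
  .map { case (key, (_, m)) => (key, m) }

merged.show(truncate = false)
```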

Data.frame from list of rows

Submitted by て烟熏妆下的殇ゞ on 2021-01-28 05:40:05
Question: A rather simple question. Given a named, JSON-like list of data.frame rows, how would one transform this into a proper data.frame in a concise manner while keeping the column classes and row names intact?

```r
df_list <- lapply(1:10, function(x) list(a = 1, b = 'hello', c = 3 - 1i))
names(df_list) <- LETTERS[1:10]
```

Desired result:

```r
data.frame(a = rep(1, 10), b = rep('hello', 10), c = rep(3 - 1i, 10))
```

Answer 1: An option with unnest_wider:

```r
library(dplyr)
library(tidyr)
tibble(col1 = df_list) %>% unnest…
```
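The quoted answer is cut off at unnest, so as a complementary sketch here is a base-R route (my choice, not the answer's): convert each row-list to a one-row data.frame and rbind them, which keeps the column classes (numeric, character, complex) and turns names(df_list) into row names.

```r
df_list <- lapply(1:10, function(x) list(a = 1, b = 'hello', c = 3 - 1i))
names(df_list) <- LETTERS[1:10]

# rbind the one-row data.frames; the list names become row names.
res <- do.call(rbind, lapply(df_list, as.data.frame, stringsAsFactors = FALSE))
str(res)
```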

Stacking select columns as rows in pandas dataframe

Submitted by ☆樱花仙子☆ on 2021-01-28 05:23:53
Question: Suppose I have df_in below:

```python
df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})
```

df_in:

```
+---+---+---+---+
|   | X | A | B |
+---+---+---+---+
| 0 | a | 1 | 1 |
| 1 | b | 0 | 1 |
| 2 | c | 0 | 0 |
+---+---+---+---+
```

I want to achieve something like the following:

```python
df_out = pd.DataFrame({'X': ['a', 'a', 'b'], 'Y': ['A', 'B', 'B']})
```

df_out:

```
+---+---+---+
|   | X | Y |
+---+---+---+
| 0 | a | A |
| 1 | a | B |
| 2 | b | B |
+---+---+---+
```

I also have a list containing the…
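The question is cut off where it mentions a list of columns, so the sketch below assumes that list is ['A', 'B']: melt those indicator columns into long form and keep only the rows flagged with 1, which reproduces the shown df_out.

```python
import pandas as pd

df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})

cols = ['A', 'B']  # assumed to be the list the question refers to
df_out = (df_in.melt(id_vars='X', value_vars=cols, var_name='Y')
                .query('value == 1')   # keep only the 1-flags
                .drop(columns='value')
                .sort_values(['X', 'Y'])
                .reset_index(drop=True))
print(df_out)
```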

Sum one column values if other columns are matched

Submitted by 被刻印的时光 ゝ on 2021-01-28 05:13:49
Question: I have a Spark dataframe like this:

```
word1  word2  co-occur
-----  -----  --------
w1     w2     10
w2     w1     15
w2     w3     11
```

And my expected result is:

```
word1  word2  co-occur
-----  -----  --------
w1     w2     25
w2     w3     11
```

I tried the dataframe's groupBy and aggregate functions, but I couldn't come up with the solution.

Answer 1: You need a single column containing both words in sorted order; this column can then be used for the groupBy. You can create a new column with an array containing word1 and word2 as follows:

```scala
df.withColumn(…
```
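The answer's code stops at df.withColumn(, so the following is only a sketch of how that approach typically continues (array_sort requires Spark 2.4+; the pair column name is mine). Again assuming a spark-shell session:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("w1", "w2", 10), ("w2", "w1", 15), ("w2", "w3", 11))
  .toDF("word1", "word2", "co-occur")

// Sort the two words into an array so (w1, w2) and (w2, w1) collapse
// onto the same grouping key, then sum the counts.
val result = df
  .withColumn("pair", array_sort(array(col("word1"), col("word2"))))
  .groupBy("pair")
  .agg(sum("co-occur").as("co-occur"))
  .select(col("pair")(0).as("word1"), col("pair")(1).as("word2"), col("co-occur"))

result.show()
```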

Specify Multi-Level columns using pd.read_clipboard?

Submitted by 混江龙づ霸主 on 2021-01-28 05:02:13
Question: Here's some data from another question:

```
main    Meas1      Meas2  Meas3  Meas4   Meas5
sublvl  Value      Value  Value  Value   Value
count   7.000000   1.0    1.0    582.00  97.000000
mean    30         37.0   26.0   33.03   16.635350
```

I would like to read in this data in such a way that the first column is actually the index, and the first two rows are treated as multi-level columns, where MeasX is the first level and Value is the second level. How can I do this using pd.read_clipboard?

My pd.read_clipboard series: How do you handle column…
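A sketch of one way to do this: read_clipboard passes its keyword arguments through to read_csv, so header=[0, 1] builds the two-level columns and index_col=0 makes the first column the index. The demo below uses read_csv on a literal string so it runs without a clipboard; with the data actually copied, pd.read_clipboard(sep=r"\s+", header=[0, 1], index_col=0) should behave the same (assuming whitespace-delimited data).

```python
import io
import pandas as pd

data = """\
main Meas1 Meas2 Meas3 Meas4 Meas5
sublvl Value Value Value Value Value
count 7.000000 1.0 1.0 582.00 97.000000
mean 30 37.0 26.0 33.03 16.635350
"""

# Two header rows become a column MultiIndex; column 0 becomes the index.
df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=[0, 1], index_col=0)
print(df.columns)
print(df)
```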

ValueError: could not convert string to float: (pd.Series)

Submitted by 假如想象 on 2021-01-28 05:00:40
Question: I'm failing to execute a lambda function on the code snippet below. My desired goal is to split the columns (btts_x and btts_y) into separate parts for further calculation. The lambda function succeeds on the first column, btts_x (see btts_x_1 and btts_x_2), but fails on column btts_y, as revealed in the ValueError traceback. I think I need to pass a re.sub() inside the lambda function; however, I'm stuck on it and would appreciate help! Note: special character(s) \n\n in Team_x &…
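The snippet and traceback are cut off, so the following is a hypothetical reconstruction: the btts columns are assumed to hold strings such as "45%\n\n55%", and a re.sub() strips the non-numeric characters before the float conversion that the ValueError complains about. The column contents and the regex are assumptions, not the asker's data.

```python
import re
import pandas as pd

df = pd.DataFrame({"btts_x": ["45%\n\n55%", "60%\n\n40%"],
                   "btts_y": ["30%\n\n70%", "55%\n\n45%"]})

def split_pct(s):
    # Split on the embedded "\n\n", then keep only digits and dots so
    # float() cannot choke on stray characters like "%".
    return [float(re.sub(r"[^\d.]", "", part)) for part in s.split("\n\n")]

for col in ("btts_x", "btts_y"):
    df[[f"{col}_1", f"{col}_2"]] = df[col].apply(split_pct).tolist()

print(df)
```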