dataframe

Reorder Multi-indexed dataframe columns based on reference

血红的双手 Posted on 2021-02-04 19:55:07
Question: I have a multi-indexed dataframe with names attached to the column levels. The data table looks something like this:

(df1)
TIME TMC     111N1  111P2  111N3  111P4
DATE EPOCH
0            143    113    103    NaN
1            183    NaN    NaN    NaN
2            NaN    NaN    NaN    NaN
3            143    NaN    NaN    NaN

I'd like to shuffle the columns around so that they match the order specified by the row index of a reference dataframe (df2):

       A1  A2  A3  A4       A5
Name
111N3  PA  PL  er  0.75543  35
111P4  PA  PL  er  0.09413  35
111N1  PA  PL  er  4.21557  35
111P2  PA  PL  er  1.31989  35

i.e.
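Although the excerpt cuts off here, a minimal pandas sketch of one possible reordering follows, assuming df1's column level that holds the 111xx codes is named TMC and df2's row index (Name) holds the same codes; the variable names are taken from the excerpt:

import pandas as pd

# Reindex df1's columns at the 'TMC' level so they follow df2's row order.
df1_reordered = df1.reindex(columns=df2.index, level='TMC')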

Coalesce columns and create another column to specify source

房东的猫 Posted on 2021-02-04 19:49:10
Question: I'm using dplyr::coalesce() to combine several columns into one. Originally, across these columns, each row has only one column with an actual value while the other columns are NA. Based on the coalescing, I want to create an additional column that specifies the source column from which the coalesced value was taken. My attempt is inspired by existing functionality in other dplyr functions. For example, dplyr::bind_rows() has an .id argument that specifies the source dataframe for each row in
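As a point of comparison, a pandas analogue of the idea (not the dplyr solution the question asks for), with hypothetical columns a, b, c, may make the goal concrete:

import numpy as np
import pandas as pd

# Hypothetical data: each row has exactly one non-NA value across a, b, c.
df = pd.DataFrame({'a': [1, np.nan, np.nan],
                   'b': [np.nan, 2, np.nan],
                   'c': [np.nan, np.nan, 3]})

# Coalesce: first non-NA value, scanning the columns left to right.
df['value'] = df[['a', 'b', 'c']].bfill(axis=1).iloc[:, 0]
# Source: name of the first column that is not NA in each row.
df['source'] = df[['a', 'b', 'c']].notna().idxmax(axis=1)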

Extract specific rows based on the set cut-off values in columns

◇◆丶佛笑我妖孽 Posted on 2021-02-04 19:46:46
Question: I have a TAB-delimited .txt file that looks like this.

Gene_name  A   B   C   D    E   F
Gene1      1   0   5   2    0   0
Gene2      4   45  0   0    32  1
Gene3      0   23  0   4    0   54
Gene4      12  0   6   8    7   4
Gene5      4   0   0   6    0   7
Gene6      0   6   8   0    0   5
Gene7      13  45  64  234  0   6
Gene8      11  6   0   7    7   9
Gene9      6   0   12  34   0   11
Gene10     23  4   6   7    89  0

I want to extract the rows in which at least 3 columns have values > 0. How do I do this using pandas? I am clueless about how to use conditions in .txt files. Thanks very much! Update: adding on to this question, how do I
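A minimal pandas sketch, assuming the file is saved as genes.txt (a hypothetical name) with the layout shown above:

import pandas as pd

# Read the TAB-delimited file, then keep rows where at least 3 of the
# numeric columns are greater than 0.
df = pd.read_csv('genes.txt', sep='\t')
filtered = df[(df.drop(columns='Gene_name') > 0).sum(axis=1) >= 3]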

Consecutive occurrence in a data frame

冷暖自知 Posted on 2021-02-04 19:43:05
Question: I have the above data frame containing different measurements. I would like to identify consecutive measurements of w (runs of length 6 or more) taken at a time t. For example, in the case of id 1, from t3:t8 there are 6 consecutive w measures recorded. I would like to save the results into 2 data frames: df1: at least 6 consecutive measurements of w (per id) before the first occurrence of w; df2: from the timing of the last occurrence of w (per id) there are less than 6
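The data frame itself is not shown in the excerpt, but a pandas sketch of the general run-length idea, under assumed column names id, t, and measure, could look like this:

import pandas as pd

# Assumed layout: one row per (id, t) with the measurement in 'measure'.
df = df.sort_values(['id', 't'])
is_w = df['measure'].eq('w')
# A new run starts whenever the w/non-w flag changes within an id.
run_id = (is_w != is_w.groupby(df['id']).shift()).groupby(df['id']).cumsum()
# Length of the run each row belongs to.
run_len = is_w.groupby([df['id'], run_id]).transform('size')
runs_of_6_or_more = df[is_w & (run_len >= 6)]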

Ignoring NA when summing multiple columns with dplyr

久未见 Posted on 2021-02-04 19:29:05
Question: I am summing across multiple columns, some of which have NA. I am using dplyr::mutate and then writing out the arithmetic sum of the columns. But the columns have NA, and I would like to treat them as zero. I was able to get it to work with rowSums (see below), but not using mutate. Using mutate makes it more readable, and it also lets me subtract columns. The example is below.

require(dplyr)
data(iris)
iris <- tbl_df(iris)
iris[2,3] <- NA
iris <- mutate(iris, sum =
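For comparison only, a pandas analogue of a row-wise sum that treats NA as zero (hypothetical columns x and y; not the dplyr answer the question is after):

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [4.0, 5.0, np.nan]})
# fillna(0) makes the NA-as-zero treatment explicit; plain .sum(axis=1)
# would also skip NA by default.
df['sum'] = df[['x', 'y']].fillna(0).sum(axis=1)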

Groupby and aggregate using lambda functions

僤鯓⒐⒋嵵緔 Posted on 2021-02-04 18:36:06
Question: I am trying to groupby-aggregate a dataframe using lambda functions that are created programmatically, so that I can simulate a one-hot encoding of the categories present in a column.

Dataframe:

df = pd.DataFrame(np.array([[10, 'A'], [10, 'B'], [20, 'A'], [30, 'B']]), columns=['ID', 'category'])

ID  category
10  A
10  B
20  A
30  B

Expected result:

ID  A  B
10  1  1
20  1  0
30  0  1

What I am trying:

one_hot_columns = ['A', 'B']
lambdas = [lambda x: 1 if x.eq(column).any() else 0 for column in one_hot
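One likely pitfall with the approach in the excerpt is that lambdas created in a loop all close over the same column variable, so every one of them ends up testing the last category. A short sketch of an alternative that produces the expected table without the lambda list (using pandas.crosstab instead of groupby-aggregate):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[10, 'A'], [10, 'B'], [20, 'A'], [30, 'B']]),
                  columns=['ID', 'category'])

# Count category occurrences per ID, then clip the counts to 0/1.
one_hot = pd.crosstab(df['ID'], df['category']).gt(0).astype(int).reset_index()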

Read list of file names from web into R

◇◆丶佛笑我妖孽 Posted on 2021-02-04 18:11:06
Question: I am trying to read a lot of csv files into R from a website. There are multiple years of daily (business days only) files. All of the files have the same data structure. I can successfully read one file using the following logic:

# enter user credentials
user <- "JohnDoe"
password <- "SecretPassword"
credentials <- paste(user,":",password,"@",sep="")
web.site <- "downloads.theice.com/Settlement_Reports_CSV/Power/"

# construct path to data
path <- paste("https://", credentials, web.site, sep="
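As a rough sketch of the same idea in Python (not the R answer the question is looking for), assuming the server returns an HTML directory listing with links to the .csv files; the URL and credentials are the placeholders from the question:

import io
import re

import pandas as pd
import requests

base = "https://downloads.theice.com/Settlement_Reports_CSV/Power/"
auth = ("JohnDoe", "SecretPassword")

# Pull the file names out of the directory listing, then read each csv.
listing = requests.get(base, auth=auth).text
names = re.findall(r'href="([^"]+\.csv)"', listing)
frames = [pd.read_csv(io.StringIO(requests.get(base + n, auth=auth).text))
          for n in names]
all_data = pd.concat(frames, ignore_index=True)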

How to map one dataframe to another (python pandas)?

假如想象 Posted on 2021-02-04 18:08:14
Question: Given these two dataframes, how do I get the intended output dataframe? The long way would be to loop through the rows of the dataframe with iloc and then use the map function, after converting df2 to a dict, to map the x and y values to their scores. This seems tedious and would take long to run on a large dataframe. I'm hoping there's a cleaner solution.

df1:
ID  A  B  C
1   x  x  y
2   y  x  y
3   x  y  y

df2:
ID  score_x  score_y
1   20       30
2   15       17
3   18       22

output:
ID  A   B   C
1   20  20  30
2   17  15  17
3   18  22  22

Note: the
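A minimal sketch of one possible approach: melt both frames to long form, join on ID plus the x/y key, and pivot back to the original layout (column and variable names follow the excerpt):

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'A': ['x', 'y', 'x'],
                    'B': ['x', 'x', 'y'],
                    'C': ['y', 'y', 'y']})
df2 = pd.DataFrame({'ID': [1, 2, 3],
                    'score_x': [20, 15, 18],
                    'score_y': [30, 17, 22]})

long = df1.melt(id_vars='ID', var_name='col', value_name='key')
scores = df2.melt(id_vars='ID', var_name='key', value_name='score')
scores['key'] = scores['key'].str.replace('score_', '', regex=False)

out = (long.merge(scores, on=['ID', 'key'])
           .pivot(index='ID', columns='col', values='score')
           .reset_index())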
