dataframe | 易学教程

Can you use loc to select a range of columns plus a column outside of the range?

Question: Suppose I want to select a range of columns from a dataframe, say 'column_1' through 'column_60'. I know I could use loc like this:

    df.loc[:, 'column_1':'column_60']

That gives me all rows in columns 1-60. But what if I want that range of columns plus 'column_81'? This doesn't work:

    df.loc[:, 'column_1':'column_60', 'column_81']

It throws a "Too many indexers" error. Is there another way to state this using loc? Or is loc even the best function to use in this case? Many thanks.
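One possible way to get a label range plus one extra column is sketched below. This is not taken from the original thread. The 90-column frame is hypothetical sample data, and the two options shown (concatenating two selections, or building integer positions with `np.r_` and using `iloc`) are assumptions about what would suit the asker.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with columns column_1 ... column_90.
df = pd.DataFrame(
    np.zeros((3, 90)),
    columns=[f"column_{i}" for i in range(1, 91)],
)

# Option 1: select the label range with loc, select the extra column
# separately, then glue the two pieces together column-wise.
subset = pd.concat(
    [df.loc[:, "column_1":"column_60"], df[["column_81"]]],
    axis=1,
)

# Option 2: translate labels to integer positions and combine a slice
# with a single position using np.r_, then index with iloc.
start = df.columns.get_loc("column_1")
stop = df.columns.get_loc("column_60") + 1  # iloc slices exclude the end
extra = df.columns.get_loc("column_81")
subset_iloc = df.iloc[:, np.r_[start:stop, extra]]
```

Both options return the 60 columns of the range followed by 'column_81'. The "Too many indexers" error in the question arises because loc on a DataFrame accepts at most one row indexer and one column indexer, so the third comma-separated item is read as an extra axis.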

Julia DataFrames - How to do one-hot encoding?

Question: I'm using Julia's DataFrames.jl package. In it, I have a dataframe with a column containing a list of strings (e.g. ["Type A", "Type B", "Type D"]). How does one then perform one-hot encoding? I wasn't able to find a pre-built function in the DataFrames.jl package. Here is an example of what I want to do:

Original dataframe

    col1 | col2
    102  | [a]
    103  | [a,b]
    102  | [c,b]

After one-hot encoding

    col1 | a | b | c
    102  | 1 | 0 | 0
    103  | 1 | 1 | 0
    102  | 0 | 1 | 1

Answer 1: It is easy

Adding column to pandas DataFrame containing list of other columns' values

Question: I have a DataFrame to which I need to add a column. The column needs to be a list of two values.

Current table:

       lat  long  other_value
    0   50    50  x
    1   60    50  y
    2   70    50  z
    3   80    50  a

Needed table:

       lat  long  other_value  new_column
    0   50    50  x            [50, 50]
    1   60    50  y            [60, 50]
    2   70    50  z            [70, 50]
    3   80    50  a            [80, 50]

I know this is super simple, but the documentation doesn't seem to cover this (at least not obviously).

Answer 1: One way is to use tolist():

    >>> df['new_column'] = df[['lat', 'long']].values.tolist()
    >>>
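The tolist() approach named in the answer can be sketched end to end on the question's sample data. The frame construction is reconstructed from the "Current table" in the question. Everything else is a minimal illustration and may not match how the truncated answer continues.

```python
import pandas as pd

# Sample data reconstructed from the question's "Current table".
df = pd.DataFrame(
    {
        "lat": [50, 60, 70, 80],
        "long": [50, 50, 50, 50],
        "other_value": ["x", "y", "z", "a"],
    }
)

# Take the two columns as a 2-D array, convert it to a list of
# [lat, long] pairs, and assign one pair per row.
df["new_column"] = df[["lat", "long"]].values.tolist()
```

Each cell of new_column then holds a plain Python list such as [50, 50].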

Merge by lat/lon in R [duplicate]

Question: This question already has answers here: Geographic / geospatial distance between 2 lists of lat/lon points (coordinates) (3 answers). Closed 4 years ago. My question is similar to "Merging two data frames, both with coordinates based on the closest location". I would like to merge two dataframes in R by latitude and longitude:

Dataframe 1

    structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L, 49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L ), PPP2000_40 = c(4606, 6575,

Best way to change class of column of data frame in R

Question: Once again a seemingly easy problem, but ... I have this small data frame named "d1" in R:

         [,1]    [,2]
    [1,] "SHY"   "75000"
    [2,] "IGLIX" "25000"

All I want to do is convert the characters in column 2 to numerics. After fiddling with this for an hour, the only thing I have found that works is:

    a <- data.frame(d1[,1])
    b <- data.frame(as.numeric(d1[,2]))
    cbind(a, b)

which gives me:

      d1...1. as.numeric.d1...2..
    1 SHY     75000
    2 IGLIX   25000

Surely there is an easier way to do this? I tried "apply" unsuccessfully.

R: Sum until 0 is reached and then restart

Question: Adding on to what has already been said or commented on this post: "Cumulative sum until maximum reached, then repeat from zero in the next row". I have a similar dataframe with more than 50k observations. It was read from a csv file and is the outcome of several operations already performed on it. Pasting a sample here:

        Home Date      Time    Appliance Run value
    679 2    1/21/2017 1:30:00 0         1   0
    680 2    1/21/2017 1:45:00 0         1   0
    681 2    1/21/2017 2:00:00 0         1   0
    682 2    1/21/2017 2:15:00 0         1   0
    683 2

Cox proportional hazard model

Question: I am trying to run a Cox proportional hazard model on data from 4 groups. Here's the data: I am using this code:

    time_Allo_NHL <- c(28,32,49,84,357,933,1078,1183,1560,2114,2144)
    censor_Allo_NHL <- c(rep(1,5), rep(0,6))
    time_Auto_NHL <- c(42,53,57,63,81,140,176,210,252,476,524,1037)
    censor_Auto_NHL <- c(rep(1,7), rep(0,1), rep(1,1), rep(0,1), rep(1,1), rep(0,1))
    time_Allo_HOD <- c(2,4,72,77,79)
    censor_Allo_HOD <- c(rep(1,5))
    time_Auto_HOD <- c(30,36,41,52,62,108,132,180,307,406,446,484,748,1290,1345)

How to stack a dataframe in R [duplicate]

Question: This question already has answers here: Reshaping data.frame from wide to long format (9 answers). Closed 2 years ago. I have a data frame that I would like to stack in R so that I end up with three columns. Below is some example data in its current format.

    > dput(df)
    structure(list(Day = c("d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"), A1 = c(14L, 24L, 22L, NA, NA, NA, NA, NA, NA, NA), A2 = c(9L, 15L, 34L, 2L, 12L, 34L, 234L, 34L, NA, NA ), A3 = c(3L, 4L, 19L, 76L, 34L, 34L,

Calculate e.g. a mean in a list with multi-column data.frames

Question: I have a list of several data.frames. Each data.frame has several columns. By using mean(mylist$first_dataframe$a) I can get the mean of a in this one data.frame. However, I do not know how to calculate it over all the data.frames stored in my list, or for specific data.frames only. I could use a loop, but I was told that apply() and its variations are better. I tried several solutions I found via search, but somehow it just doesn't work. I assume I need to use unlist(). Could you provide an