data-manipulation | 易学教程

How to change the resolution (or regrid) data in R

Question: I have a dataset consisting of lon, lat, and a monthly mean variable (e.g. temperature or precipitation) covering 1961 to 1970. The dataset has a resolution of 0.5 by 0.5 degrees lon/lat, covers the whole globe, and was downloaded as an .NC file. I extracted the data in R using:
    library(ncdf)
    f <- open.ncdf("D:/CRU/cru_ts3.21.1961.1970.tmp.dat.nc")
    A <- get.var.ncdf(nc=f,varid="tmp")
    B <- get.var.ncdf(nc=f,varid="lon")
    C <- get.var.ncdf(nc=f,varid="lat")
    D <- cbind(expand.grid

insert missing category for each group in pandas dataframe

I need to insert the missing categories for each group. Here is an example:
    import pandas as pd
    import numpy as np
    df = pd.DataFrame({
        "group": [1, 1, 1, 2, 2],
        "cat": ['a', 'b', 'c', 'a', 'c'],
        "value": range(5),
        "value2": np.array(range(5)) * 2})
    df
    # test dataframe
    cat  group  value  value2
    a    1      0      0
    b    1      1      2
    c    1      2      4
    a    2      3      6
    c    2      4      8
Say I have some categories = ['a', 'b', 'c', 'd']. If the cat column does not contain a category from the list, I would like to insert a row for it, for each group, with value 0. How can I insert a row per group when a category is missing, so as to get all the categories for each group?
    cat group value
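One possible way to fill in the missing categories for the question above, sketched with pandas: build the full group-by-category index with MultiIndex.from_product and reindex onto it. This is an illustrative sketch rather than the answer from the original thread. It assumes each (group, cat) pair occurs at most once, and the names full_index and filled are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same test dataframe as in the question.
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2],
    "cat": ["a", "b", "c", "a", "c"],
    "value": range(5),
    "value2": np.array(range(5)) * 2,
})
categories = ["a", "b", "c", "d"]

# Every (group, cat) combination that should exist in the result.
# full_index is a hypothetical name, not taken from the question.
full_index = pd.MultiIndex.from_product(
    [df["group"].unique(), categories], names=["group", "cat"]
)

# Reindex onto the full set of pairs; rows that were missing get 0
# in every value column. Assumes (group, cat) pairs are unique.
filled = (
    df.set_index(["group", "cat"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(filled)
```

With the sample data, filled has one row per group for each of the four categories, and the rows that were absent from df carry 0 in both value and value2.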

Faster equivalent to group_by %>% expand in R

I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID, and gives a Start_year. It looks like this:
    ID Start_year
    01 1999
    02 2004
    03 2015
    04 2007
    etc...
I need to create a table with multiple rows for each ID, showing each year from their Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015, ID2 would have 12 rows 2004:2015, ID3 would have 1 row 2015, and ID4 would have 9 rows 2007:2015. For a subset of my data I can get this to work using the following code:

loop to create a new variable based on other cases in R (very basic)

I have a dataframe with three variables: ID, group, and nominated_ID. I want to know the group that nominated_ID belongs to. I'm imagining that for each case, we take nominated_ID, find the case where it is equal to ID, and then set the nominated_Group variable in the original case equal to the group variable in the matched case. (If there is no match, set it to NA.) I wouldn't be surprised if this can be done without a loop, so I'm open-minded about the solution. Thanks so much for your help. Know that I did try to look for similar questions before posting. You can achieve this in one

Passing strings as arguments in dplyr verbs

Question: I would like to be able to define arguments for dplyr verbs as strings:
    condition <- "dist > 50"
and then use these strings in dplyr functions:
    require(ggplot2)
    ds <- cars
    ds1 <- ds %>% filter (eval(condition))
    ds1
But it throws an error:
    Error: filter condition does not evaluate to a logical vector.
The code should evaluate as:
    ds1<- ds %>% filter(dist > 50)
    ds1
Resulting in:
    ds1
       speed dist
    1     14   60
    2     14   80
    3     15   54
    4     18   56
    5     18   76
    6     18   84
    7     19   68
    8     20   52
    9     20   56
    10    20   64
    11    22   66
    12    23   54
    13    24   70
    14    24

How to Detect and Mark Change within a Column in Another Column

I'm trying to mark when a process starts and ends. The code needs to detect when the change begins and when it ends, and mark it in another column. Example data:
    date process
    2007 0
    2008 1
    2009 1
    2010 1
    2011 1
    2012 1
    2013 0
Goal:
    date process Status
    2007 0       NA
    2008 1       Process_START
    2009 1       NA
    2010 1       NA
    2011 1       NA
    2012 1       Process_END
    2013 0       NA
Maybe by calculating diff and lagging it in both directions:
    dif <- diff(df1$process)
    df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
    levels(df1$Status) <- c(rep(NA, 4), "Start", "End", "Start&End")
    #   date process Status
    # 1 2007       0   <NA>
    # 2

R: merge two data frames when either of two criteria matches

Say I have two dataframes like the following:
    n = c(2, 3, 5, 5, 6, 7)
    s = c("aa", "bb", "cc", "dd", "ee", "ff")
    b = c(2, 4, 5, 4, 3, 2)
    df = data.frame(n, s, b)
    #  n  s b
    #1 2 aa 2
    #2 3 bb 4
    #3 5 cc 5
    #4 5 dd 4
    #5 6 ee 3
    #6 7 ff 2
    n2 = c(5, 6, 7, 6)
    s2 = c("aa", "bb", "cc", "ll")
    b2 = c("hh", "nn", "ff", "dd")
    df2 = data.frame(n2, s2, b2)
    #  n2 s2 b2
    #1  5 aa hh
    #2  6 bb nn
    #3  7 cc ff
    #4  6 ll dd
I want to merge them to achieve the following result:
    #n  s b n2 s2 b2
    #2 aa 2  5 aa hh
    #3 bb 4  6 bb nn
    #5 cc 5  7 cc ff
    #5 dd 4  6 ll dd
Basically, what I want to achieve is to merge the two dataframes

create OLAP cube in R programming language

Hi, I have the following data:
      Function SB    `Country Region` `+1 Function` `+1 SB` `+1 Country Region`
      <chr>    <chr> <chr>            <chr>         <chr>   <chr>
    1 ENG      SB10  AMER             ENG           SB10    AMER
    2 IT       SB07  EMEA             IT            SB07    EMEA
    3 QLT      SB05  EMEA             QLT           SB05    EMEA
    4 MFG      SB07  EMEA             MFG           SB07    EMEA
    5 MFG      SB04  EMEA             MFG           SB05    EMEA
    6 SCM      SB08  EMEA             SCM           SB08    EMEA
I want to create a 3-dimensional OLAP cube in which the columns Function, SB, and Country Region are in the rows and +1 Function, +1 SB, and +1 Country Region are in the columns. The output should have the following format:
    `+1 Function` `+1 SB` `+1 Country Region`
    Function SB Country Region
Thank you. Adding

Function to compute 3D gradient with unevenly spaced sample locations

I have experimental observations in a volume:
    import numpy as np
    # observations are not uniformly spaced
    x = np.random.normal(0, 1, 10)
    y = np.random.normal(5, 2, 10)
    z = np.random.normal(10, 3, 10)
    xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')
    # fake temperatures at those coords
    tt = xx*2 + yy*2 + zz*2
    # sample distances
    dx = np.diff(x)
    dy = np.diff(y)
    dz = np.diff(z)
    grad = np.gradient(tt, [dx, dy, dz])  # returns error
This gives me the error: ValueError: operands could not be broadcast together with shapes (10,10,10) (3,9) (10,10,10). EDIT: according to @jay-kominek in the comments
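For the gradient question above, one option worth trying, assuming NumPy 1.13 or later, is to pass the coordinate arrays themselves to np.gradient, one per axis, instead of a list of np.diff results. The sketch below is not taken from the original thread. The fixed, sorted coordinate values are made up for illustration (the question draws them at random), and the data are assumed to lie on a rectilinear grid as produced by np.meshgrid.

```python
import numpy as np

# Illustrative, unevenly spaced but sorted sample coordinates
# (hypothetical values; the question uses random draws instead).
x = np.array([0.0, 0.4, 1.0, 1.1, 2.5])
y = np.array([3.0, 3.5, 5.0, 7.2])
z = np.array([8.0, 9.5, 10.0, 13.0, 14.0, 16.5])

xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")

# Fake temperatures, linear in each coordinate as in the question.
tt = xx * 2 + yy * 2 + zz * 2

# One 1-D coordinate array per axis; each must match the length of
# the corresponding dimension of tt.
gx, gy, gz = np.gradient(tt, x, y, z)

print(gx.shape, gy.shape, gz.shape)
print(np.allclose(gx, 2.0), np.allclose(gy, 2.0), np.allclose(gz, 2.0))
```

Because tt is linear in x, y, and z, each returned component should be close to 2 everywhere, which makes this a convenient sanity check before applying the same call to real measurements. Sorting the coordinates first avoids zero or sign-changing spacings along an axis.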

Generating a moving sum variable in R

Question: I suspect this is a fairly simple question with multiple solutions, but I'm still a bit of a novice in R, and an exhaustive search didn't yield answers that addressed what I want to do. I'm trying to create, for lack of a better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of the observations for the five previous years: 1981, 1982, 1983, 1984, and 1985. Here is an example of