data-manipulation

How to change the resolution (or regrid) data in R

自作多情 posted on 2019-12-06 06:04:35
Question: I have a dataset consisting of lon, lat and a monthly mean variable (e.g. temperature or precipitation) covering 1961 to 1970. The dataset has a resolution of 0.5 by 0.5 degrees lon/lat, covers the whole globe, and was downloaded as an .NC file, from which I extracted the data in R using:

    library(ncdf)
    f <- open.ncdf("D:/CRU/cru_ts3.21.1961.1970.tmp.dat.nc")
    A <- get.var.ncdf(nc=f,varid="tmp")
    B <- get.var.ncdf(nc=f,varid="lon")
    C <- get.var.ncdf(nc=f,varid="lat")
    D <- cbind(expand.grid

insert missing category for each group in pandas dataframe

允我心安 posted on 2019-12-06 04:24:02
I need to insert the missing categories for each group. Here is an example:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({
        "group": [1, 1, 1, 2, 2],
        "cat": ['a', 'b', 'c', 'a', 'c'],
        "value": range(5),
        "value2": np.array(range(5)) * 2})
    df  # test dataframe

    cat  group  value  value2
    a    1      0      0
    b    1      1      2
    c    1      2      4
    a    2      3      6
    c    2      4      8

Say I have some categories = ['a', 'b', 'c', 'd']. If the cat column of a group does not contain a category from the list, I would like to insert a row for that group with value 0. How do I insert a row per group for each missing category, so as to get all the categories for each group:

    cat  group  value
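One possible way to get every category in every group, sketched under the assumption that each (group, cat) pair occurs at most once and that both value columns should be filled with 0: build the full group-by-category index with `pd.MultiIndex.from_product` and reindex onto it. The names `full_index` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2],
    "cat": ["a", "b", "c", "a", "c"],
    "value": range(5),
    "value2": np.array(range(5)) * 2,
})
categories = ["a", "b", "c", "d"]

# Every combination of an observed group and a wanted category.
full_index = pd.MultiIndex.from_product(
    [df["group"].unique(), categories], names=["group", "cat"]
)

# Reindexing adds the missing (group, cat) rows; fill_value=0 fills
# all value columns of the new rows (an assumption about the desired fill).
result = (
    df.set_index(["group", "cat"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(result)
```

If (group, cat) pairs can repeat, `set_index(...).reindex(...)` raises on the duplicate index, so the data would need to be aggregated first.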

Faster equivalent to group_by %>% expand in R

自闭症网瘾萝莉.ら posted on 2019-12-06 03:44:18
I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID and gives a Start_year. It looks like this:

    ID  Start_year
    01  1999
    02  2004
    03  2015
    04  2007
    etc...

I need to create a table with multiple rows for each ID, showing each year from its Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015, ID2 would have 12 rows (2004:2015), ID3 would have 1 row (2015), and ID4 would have 9 rows (2007:2015). For a subset of my data I can get this to work using the following code:
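(The asker's R code is cut off in this excerpt.) As a separate illustration of the expansion described above, and not a reconstruction of that code, here is a minimal pandas sketch of the same idea: build one list of years per ID and explode it into rows. The names `starts`, `END_YEAR`, `Year` and `expanded` are assumptions made for the example.

```python
import pandas as pd

starts = pd.DataFrame({
    "ID": ["01", "02", "03", "04"],
    "Start_year": [1999, 2004, 2015, 2007],
})
END_YEAR = 2015  # last year wanted for every ID, taken from the question

expanded = (
    # One list of years per row, from Start_year through END_YEAR inclusive.
    starts.assign(
        Year=[list(range(start, END_YEAR + 1)) for start in starts["Start_year"]]
    )
    # explode turns each list element into its own row.
    .explode("Year")
    .astype({"Year": int})
    .reset_index(drop=True)
)
print(expanded.groupby("ID").size())
```

The row counts per ID match the ones stated in the question (17, 12, 1 and 9).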

loop to create a new variable based on other cases in R (very basic)

别等时光非礼了梦想. posted on 2019-12-06 00:42:19
I have a dataframe with three variables: ID , group , and nominated_ID . I want to know the group that nominated_ID belongs in. I'm imagining that for each case, we take nominated_ID , find the case where it is equal to ID , and then set the nominated_Group variable in the original case equal to the group variable in the matched case. (If there is no match, set it to NA) I wouldn't be surprised if this can be done without a loop, so I'm open-minded about the solution. Thanks so much for your help. Know that I did try to look for similar questions before posting. You can achieve this in one
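The lookup described in the question (find the row whose ID equals nominated_ID and copy its group, or NA when there is no match) does not need a loop. A minimal pandas sketch of that idea, assuming IDs are unique and using made-up sample data:

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "group": ["x", "y", "x", "z"],
    "nominated_ID": [2, 4, 9, 1],
})

# Lookup table from ID to group (requires unique IDs).
id_to_group = df.set_index("ID")["group"]

# map() looks each nominated_ID up in the table; IDs with no match become NaN.
df["nominated_Group"] = df["nominated_ID"].map(id_to_group)
print(df)
```

Here nominated_ID 9 does not exist as an ID, so its nominated_Group is missing.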

Passing strings as arguments in dplyr verbs

霸气de小男生 posted on 2019-12-05 14:55:17
Question: I would like to be able to define arguments for dplyr verbs as strings:

    condition <- "dist > 50"

and then use these strings in dplyr functions:

    require(ggplot2)
    ds <- cars
    ds1 <- ds %>% filter (eval(condition))
    ds1

But it throws an error:

    Error: filter condition does not evaluate to a logical vector.

The code should evaluate as:

    ds1 <- ds %>% filter(dist > 50)
    ds1

Resulting in:

    ds1
       speed dist
    1     14   60
    2     14   80
    3     15   54
    4     18   56
    5     18   76
    6     18   84
    7     19   68
    8     20   52
    9     20   56
    10    20   64
    11    22   66
    12    23   54
    13    24   70
    14    24

How to Detect and Mark Change within a Column in Another Column

99封情书 posted on 2019-12-05 07:49:55
I'm trying to mark when a process starts and ends. The code needs to detect when the change begins and when it ends, and mark it in another column. Example data:

    date  process
    2007  0
    2008  1
    2009  1
    2010  1
    2011  1
    2012  1
    2013  0

Goal:

    date  process  Status
    2007  0        NA
    2008  1        Process_START
    2009  1        NA
    2010  1        NA
    2011  1        NA
    2012  1        Process_END
    2013  0        NA

Maybe by calculating diff and lagging it in both directions:

    dif <- diff(df1$process)
    df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
    levels(df1$Status) <- c(rep(NA, 4), "Start", "End", "Start&End")
    #   date process Status
    # 1 2007       0   <NA>
    # 2
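For comparison with the diff-based idea above, the same start/end detection can be sketched in pandas by comparing each row with its neighbours. This is an analogue, not the R answer itself. Two assumptions are labeled in the code: a run that touches the first or last row counts as starting or ending there, and the combined label `Process_START&END` for a one-row run is invented for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": range(2007, 2014),
    "process": [0, 1, 1, 1, 1, 1, 0],
})

# Neighbouring values; rows outside the data are treated as 0 (assumption).
prev = df1["process"].shift(1, fill_value=0)
nxt = df1["process"].shift(-1, fill_value=0)

active = df1["process"] == 1
is_start = active & (prev == 0)  # 0 -> 1 transition
is_end = active & (nxt == 0)     # 1 -> 0 transition follows this row

status = pd.Series([None] * len(df1), index=df1.index, dtype=object)
status.loc[is_start] = "Process_START"
status.loc[is_end] = "Process_END"
# Hypothetical label for a run of length one (starts and ends on the same row).
status.loc[is_start & is_end] = "Process_START&END"

df1["Status"] = status
print(df1)
```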

R: merge two data frames when either of two criteria matches

隐身守侯 posted on 2019-12-04 20:52:39
Say I have two dataframes like the following:

    n = c(2, 3, 5, 5, 6, 7)
    s = c("aa", "bb", "cc", "dd", "ee", "ff")
    b = c(2, 4, 5, 4, 3, 2)
    df = data.frame(n, s, b)
    #  n  s b
    #1 2 aa 2
    #2 3 bb 4
    #3 5 cc 5
    #4 5 dd 4
    #5 6 ee 3
    #6 7 ff 2

    n2 = c(5, 6, 7, 6)
    s2 = c("aa", "bb", "cc", "ll")
    b2 = c("hh", "nn", "ff", "dd")
    df2 = data.frame(n2, s2, b2)
    #  n2 s2 b2
    #1  5 aa hh
    #2  6 bb nn
    #3  7 cc ff
    #4  6 ll dd

I want to merge them to achieve the following result:

    #n  s b n2 s2 b2
    #2 aa 2  5 aa hh
    #3 bb 4  6 bb nn
    #5 cc 5  7 cc ff
    #5 dd 4  6 ll dd

Basically, what I want to achieve is to merge the two dataframes

create OLAP cube in R programming language

和自甴很熟 posted on 2019-12-04 18:25:43
Hi, I have the following data:

      Function SB    `Country Region` `+1 Function` `+1 SB` `+1 Country Region`
      <chr>    <chr> <chr>            <chr>         <chr>   <chr>
    1 ENG      SB10  AMER             ENG           SB10    AMER
    2 IT       SB07  EMEA             IT            SB07    EMEA
    3 QLT      SB05  EMEA             QLT           SB05    EMEA
    4 MFG      SB07  EMEA             MFG           SB07    EMEA
    5 MFG      SB04  EMEA             MFG           SB05    EMEA
    6 SCM      SB08  EMEA             SCM           SB08    EMEA

I want to create a 3-dimensional OLAP cube in which the columns Function, SB and Country Region are in the rows and +1 Function, +1 SB and +1 Country Region are in the columns. The output should have the following format:

    `+1 Function` `+1 SB` `+1 Country Region` Function SB Country Region

Thank you. Adding

Function to compute 3D gradient with unevenly spaced sample locations

不羁岁月 posted on 2019-12-04 17:04:06
I have experimental observations in a volume:

    import numpy as np
    # observations are not uniformly spaced
    x = np.random.normal(0, 1, 10)
    y = np.random.normal(5, 2, 10)
    z = np.random.normal(10, 3, 10)
    xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')
    # fake temperatures at those coords
    tt = xx*2 + yy*2 + zz*2
    # sample distances
    dx = np.diff(x)
    dy = np.diff(y)
    dz = np.diff(z)
    grad = np.gradient(tt, [dx, dy, dz])  # returns error

This gives me the error: ValueError: operands could not be broadcast together with shapes (10,10,10) (3,9) (10,10,10). EDIT: according to @jay-kominek in the comments
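For reference, `np.gradient` in NumPy 1.13 and later accepts one coordinate array per axis instead of spacings, which covers unevenly spaced samples on a rectilinear grid. The sketch below is not the comment referred to above. It swaps the random samples for small, hand-picked, sorted coordinates (an assumption made so the result is reproducible) and checks that the linear field from the question has a gradient of 2 along every axis.

```python
import numpy as np

# Hand-picked, unevenly spaced, sorted coordinates (illustrative values,
# standing in for the random samples in the question).
x = np.array([-1.3, -0.4, 0.1, 0.9, 2.0])
y = np.array([1.0, 3.5, 4.2, 6.8, 7.1])
z = np.array([4.0, 8.5, 9.0, 12.5, 15.0])

xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")
tt = xx * 2 + yy * 2 + zz * 2  # same fake temperature field as in the question

# Pass the coordinate arrays themselves, one per axis, not np.diff of them.
gx, gy, gz = np.gradient(tt, x, y, z)

print(gx.shape)              # same shape as tt
print(np.allclose(gx, 2.0))  # the field is linear, so d(tt)/dx is 2 everywhere
```

This applies only when the samples lie on a grid formed by the outer product of the three coordinate vectors, as `meshgrid` produces here. Fully scattered points would need a different approach.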

Generating a moving sum variable in R

∥☆過路亽.° posted on 2019-12-04 13:20:10
Question: I suspect this is a somewhat simple question with multiple solutions, but I'm still a bit of a novice in R and an exhaustive search didn't yield answers that spoke well to what I'm wanting to do. I'm trying to create, for lack of a better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of all previous observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of
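The lagged moving sums defined above can be sketched in pandas as an analogue (the question itself is about R, and its example data is cut off here). The sketch assumes one row per consecutive year for a single unit; the column names `year`, `x`, `sum3` and `sum5` and the values are made up for illustration.

```python
import pandas as pd

# Made-up data: one observation per consecutive year.
df = pd.DataFrame({
    "year": range(1981, 1987),
    "x": [1, 2, 3, 4, 5, 6],
})

# shift(1) lags by one year so the current year is excluded,
# then rolling(n).sum() adds up the n previous years.
lagged = df["x"].shift(1)
df["sum3"] = lagged.rolling(3).sum()
df["sum5"] = lagged.rolling(5).sum()
print(df)
```

For 1986 this gives sum5 = 1 + 2 + 3 + 4 + 5 = 15 and sum3 = 3 + 4 + 5 = 12. Years without enough history are left missing. With several units in one frame, the same two steps would have to be applied per unit, for example inside a groupby.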