subset

subsetting list in R

Submitted by 匆匆过客 on 2019-12-19 09:28:13
Question: I'm using the Mcomp package in R, which contains datasets for forecasting. The data is organized by yearly, quarterly and monthly frequency. I can easily subset this into a list but cannot subset it further using an additional condition. ##Subset monthly data library("Mcomp") mon <- subset(M3,"monthly") Each element in the mon list has the following structure; as an example, mon$N1500 has the structure $ N1500:List of 9 ..$ st : chr "M99" ..$ type : chr "MICRO" ..$ period : chr "MONTHLY" ..$
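The further-subsetting step the question asks about is just filtering list elements on a metadata field. A minimal Python sketch of the same idea, using a toy dict standing in for the M3 list (the series names and fields here are illustrative, not the real Mcomp contents):

```python
# Hypothetical stand-in for the M3 list: each series is a dict of metadata.
m3 = {
    "N0001": {"period": "YEARLY", "type": "MICRO"},
    "N1402": {"period": "QUARTERLY", "type": "MACRO"},
    "N1500": {"period": "MONTHLY", "type": "MICRO"},
    "N2000": {"period": "MONTHLY", "type": "MACRO"},
}

# First subset by frequency, then apply the additional condition --
# the step the question is asking about.
mon = {k: v for k, v in m3.items() if v["period"] == "MONTHLY"}
mon_micro = {k: v for k, v in mon.items() if v["type"] == "MICRO"}
print(sorted(mon_micro))  # ['N1500']
```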

Return rows establishing a “closest value to” in R

Submitted by 可紊 on 2019-12-19 08:27:08
Question: I have a data frame with different IDs and I want to make a subgroup in which, for each ID, I obtain only the one row whose value of variable Y is closest to 0.5. This is my data frame: df <- data.frame(ID=c("DB1", "DB1", "DB2", "DB2", "DB3", "DB3", "DB4", "DB4", "DB4"), X=c(0.04, 0.10, 0.10, 0.20, 0.02, 0.30, 0.01, 0.20, 0.30), Y=c(0.34, 0.49, 0.51, 0.53, 0.48, 0.49, 0.49, 0.50, 1.0) ) This is what I want to get: ID X Y DB1 0.10 0.49 DB2 0.10 0.51 DB3 0.30 0.49 DB4 0.20 0.50 I know I can add
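The "closest to 0.5 per ID" logic can be sketched in Python/pandas (an analogue, not the R answer itself): group the absolute distance from 0.5 by ID and keep the row at each group's idxmin.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["DB1", "DB1", "DB2", "DB2", "DB3", "DB3", "DB4", "DB4", "DB4"],
    "X":  [0.04, 0.10, 0.10, 0.20, 0.02, 0.30, 0.01, 0.20, 0.30],
    "Y":  [0.34, 0.49, 0.51, 0.53, 0.48, 0.49, 0.49, 0.50, 1.0],
})

# For each ID, keep the row whose Y is closest to 0.5.
idx = (df["Y"] - 0.5).abs().groupby(df["ID"]).idxmin()
closest = df.loc[idx].reset_index(drop=True)
print(closest)
```

This reproduces the desired output table: one row per ID, with DB4 resolved to the exact 0.50 match.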

R: Efficiently locating time series segments with maximal cross-correlation to input segment?

Submitted by 六眼飞鱼酱① on 2019-12-19 05:53:14
Question: I have a long numerical time series of approximately 200,000 rows (let's call it Z). In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query segment q. I want to locate within Z the y (~300) time-series segments of length x that are most correlated with q. What is an efficient way to accomplish this? Answer 1: The code below finds the 300 segments you are looking for and runs in 8 seconds on my none-too-powerful Windows laptop, so it should be
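One way to sketch this kind of search (an illustrative reconstruction in Python/numpy, not the answer's actual code) is a vectorized sliding-window correlation: view every length-x window of Z at once, correlate each against q, and take the top-y start indices. Shown here on a shortened series:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal(2000)   # stand-in for the ~200,000-point series
x, y = 30, 5                    # segment length and number of matches to keep

q = Z[100:100 + x]              # the query segment

# Correlate q against every length-x window of Z without an explicit loop.
wins = np.lib.stride_tricks.sliding_window_view(Z, x)
wc = wins - wins.mean(axis=1, keepdims=True)
qc = q - q.mean()
corr = (wc @ qc) / (np.linalg.norm(wc, axis=1) * np.linalg.norm(qc))

best = np.argsort(corr)[::-1][:y]   # start indices of the y best segments
print(best[0])  # 100 -- the query matches itself perfectly
```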

How to subset a list based on the length of its elements in R

Submitted by 蓝咒 on 2019-12-19 04:41:17
Question: In R I have a function (coordinates from the package sp) which looks up 11 fields of data for each IP address you supply. I have a list of IPs called ip.addresses: > head(ip.addresses) [1] "128.177.90.11" "71.179.12.143" "66.31.55.111" "98.204.243.187" "67.231.207.9" "67.61.248.12" Note: those or any other IPs can be used to reproduce this problem. So I apply the function to that object with sapply: ips.info <- sapply(ip.addresses, ip2coordinates) and get a list called ips.info as my
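Subsetting by element length is the core idea here. A Python sketch with a toy stand-in for the sapply() result (the records below are fabricated, not real geolocation output): keep only the elements whose length equals the full 11 fields.

```python
# Hypothetical stand-in for ips.info: most lookups return 11 fields,
# a failed/partial lookup returns fewer.
ips_info = {
    "128.177.90.11": {f"field{i}": i for i in range(11)},  # complete record
    "71.179.12.143": {f"field{i}": i for i in range(11)},  # complete record
    "66.31.55.111":  {"field0": 0},                        # partial lookup
}

# Subset the list by the length of its elements.
complete = {ip: rec for ip, rec in ips_info.items() if len(rec) == 11}
print(sorted(complete))
```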

Regression by subset in R [duplicate]

Submitted by 泪湿孤枕 on 2019-12-19 04:12:29
Question: This question already has answers here: Linear Regression and group by in R (10 answers). Closed 3 years ago. I am new to R and am trying to run a linear regression on multiple subsets ("Cases") of data in a single file. I have 50 different cases, so I don't want to run 50 separate regressions; it would be nice to automate this. I have found and experimented with the ddply method, but for some reason it returns the same coefficients for every case. The code I'm using is as follows:
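The intended behaviour, one fit per group so each case gets its own coefficients, can be sketched in Python/pandas with synthetic data (two toy "Cases" with known slopes, recovered per group):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rows = []
for case, slope in [("A", 2.0), ("B", -1.0)]:   # two toy "Cases"
    x = np.arange(10.0)
    y = slope * x + 3.0 + rng.normal(0, 0.01, 10)
    rows.append(pd.DataFrame({"Case": case, "x": x, "y": y}))
df = pd.concat(rows, ignore_index=True)

# One least-squares fit per Case: each group yields its own coefficients,
# which is what the per-subset regression should produce.
coefs = df.groupby("Case").apply(
    lambda g: pd.Series(np.polyfit(g["x"], g["y"], 1),
                        index=["slope", "intercept"])
)
print(coefs.round(2))
```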

R remove stopwords from a character vector using %in%

Submitted by 大憨熊 on 2019-12-19 03:44:23
Question: I have a data frame with strings from which I'd like to remove stop words. I'm trying to avoid the tm package, as it's a large data set and tm seems to run a bit slowly. I am using the tm stopword dictionary. library(plyr) library(tm) stopWords <- stopwords("en") class(stopWords) df1 <- data.frame(id = seq(1,5,1), string1 = NA) head(df1) df1$string1[1] <- "This string is a string." df1$string1[2] <- "This string is a slightly longer string." df1$string1[3] <- "This string is an even
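The %in%-style approach, testing each word against a stopword set instead of running the text through tm, looks like this in Python (a tiny stopword set stands in for tm's stopwords("en")):

```python
# Tiny stand-in for tm's stopwords("en").
stop_words = {"this", "is", "a", "an", "the"}

strings = [
    "This string is a string.",
    "This string is a slightly longer string.",
]

def remove_stopwords(s):
    # Split, drop words whose lowercased, punctuation-stripped form is a
    # stopword, and rejoin -- the membership test is the %in% equivalent.
    keep = [w for w in s.split() if w.lower().strip(".,;:!?") not in stop_words]
    return " ".join(keep)

cleaned = [remove_stopwords(s) for s in strings]
print(cleaned[0])  # string string.
```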

Pandas style object with multi-index

Submitted by ⅰ亾dé卋堺 on 2019-12-18 15:17:52
Question: I am formatting a pandas DataFrame with Styler to highlight columns and format numbers. I also want to apply a MultiIndex to make it clearer, more pleasant and easier to read. Since I apply the Styler to a subset of columns, it does not work with the MultiIndex. Example: arrays = [np.hstack([['One']*2, ['Two']*2]) , ['A', 'B', 'C', 'D']] columns = pd.MultiIndex.from_arrays(arrays) data = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD')) data.columns = columns import seaborn as sns cm = sns.light
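The usual resolution is that Styler's subset argument is interpreted with DataFrame.loc semantics, so under MultiIndex columns the keys must be full tuples (or a pd.IndexSlice), not the bare second-level labels that worked before. A sketch of the indexer, verified via .loc since the same indexer is what gets passed as subset=:

```python
import numpy as np
import pandas as pd

arrays = [["One", "One", "Two", "Two"], ["A", "B", "C", "D"]]
columns = pd.MultiIndex.from_arrays(arrays)
data = pd.DataFrame(np.random.randn(5, 4), columns=columns)

# Under a MultiIndex, column keys are full tuples (or an IndexSlice),
# not bare second-level labels like 'A'.
sub = pd.IndexSlice[:, [("One", "A"), ("Two", "C")]]

# `subset` in Styler methods resolves exactly like .loc, so checking the
# indexer with .loc shows which cells a style such as
# data.style.set_properties(**{"background-color": "yellow"}, subset=sub)
# would hit.
picked = data.loc[sub]
print(list(picked.columns))  # [('One', 'A'), ('Two', 'C')]
```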

Find elements not in smaller character vector list but in big list

Submitted by 半世苍凉 on 2019-12-18 13:33:10
Question: I have two lists, one big and one small, and I want to know which elements of the big list are not in the small one. The lists have class [1] "character" "vector" "data.frameRowLabels" [4] "SuperClassMethod". Here is a small example and the error I am getting: A <- c("A", "B", "C", "D") B <- c("A", "B", "C") new <- A[!B] Error in !B : invalid argument type The expected output is new <- c("D") Answer 1: Look at help("%in%") - there's an example all the way at the bottom of that page that addresses this
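The %in%-based fix (A[!(A %in% B)], equivalently setdiff(A, B)) translates directly to a membership filter; a Python sketch:

```python
A = ["A", "B", "C", "D"]
B = ["A", "B", "C"]

# R's A[!(A %in% B)] / setdiff(A, B): keep elements of A absent from B.
b = set(B)  # set membership makes the lookup O(1) per element
new = [x for x in A if x not in b]
print(new)  # ['D']
```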

How to remove groups of observation with dplyr::filter()

Submitted by 别说谁变了你拦得住时间么 on 2019-12-18 12:19:02
Question: For the following data ds <- read.table(header = TRUE, text =" id year attend 1 2007 1 1 2008 1 1 2009 1 1 2010 1 1 2011 1 8 2007 3 8 2008 NA 8 2009 3 8 2010 NA 8 2011 3 9 2007 2 9 2008 3 9 2009 3 9 2010 5 9 2011 5 10 2007 4 10 2008 4 10 2009 2 10 2010 NA 10 2011 NA ") ds <- ds %>% dplyr::mutate(time=year-2000) print(ds) How would I write a dplyr::filter() command to keep only the ids that don't have a single NA? Only subjects with ids 1 and 9 should remain after the filter. Answer 1: Use filter
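The grouped-filter logic, keeping only ids whose attend column has no NA, can be sketched in Python/pandas as the analogue of group_by(id) %>% filter(!any(is.na(attend))):

```python
import numpy as np
import pandas as pd

# The question's data, with NA as np.nan.
ds = pd.DataFrame({
    "id":     [1]*5 + [8]*5 + [9]*5 + [10]*5,
    "year":   list(range(2007, 2012)) * 4,
    "attend": [1, 1, 1, 1, 1,
               3, np.nan, 3, np.nan, 3,
               2, 3, 3, 5, 5,
               4, 4, 2, np.nan, np.nan],
})

# Drop every group (id) that contains at least one missing value.
kept = ds.groupby("id").filter(lambda g: g["attend"].notna().all())
print(sorted(kept["id"].unique()))  # [1, 9]
```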