subset

How to write the remaining data frame in R after randomly subsetting the data

淺唱寂寞╮ submitted on 2019-12-11 10:00:11
Question: I took a random sample from a data frame, but I don't know how to get the remaining data frame.

    df <- data.frame(x=rep(1:3,each=2),y=6:1,z=letters[1:6])
    #select 3 random rows
    df[sample(nrow(df),3),]

What I want is the remaining data frame with the other 3 rows.

Answer 1: sample draws new random numbers each time you run it, so if you want to reproduce its results you will need to either call set.seed first or save its result in a variable. Addressing your question, you simply need to add - before your
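
The idea in the answer above (save the sampled indices, then negate them) can be sketched as follows. This is a minimal illustration rather than the answer's own code: the names idx, sampled and remaining, and the seed value, are assumptions.

```r
# Same example data as in the question
df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])

set.seed(42)                 # arbitrary seed, only to make the draw repeatable
idx <- sample(nrow(df), 3)   # row indices of the random sample, saved for reuse

sampled   <- df[idx, ]       # the 3 sampled rows (note the trailing comma)
remaining <- df[-idx, ]      # negative indices drop those rows, leaving the other 3
```

Saving idx once is what keeps the two pieces consistent; calling sample a second time would draw a different set of rows.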

Extract large Matlab dataset subsets

五迷三道 submitted on 2019-12-11 09:54:08
Question: Referencing and assigning a subset of a MATLAB dataset appears to be extremely inefficient and possibly scales like rows^2. Example: alldata is a large dataset of mixed data, say 150,000 rows by 25 columns (integer, boolean and string). The format for the dataset is:

    'format', '%s%u%u%u%u%u%s%s%s%s%s%s%s%u%u%u%u%s%u%s%s%u%s%s%s%s%u%s%u%s%s%s%u%s'

I then convert 2 integer columns to boolean. The following subset assignment:

    somedata = alldata(1:m,:)

takes >7 sec for m = 10,000 and

Random subset containing at least one instance of each factor

旧时模样 submitted on 2019-12-11 08:47:28
Question: Let's define a data.frame df with 3 columns and 10 rows. The third column is the class and the first two are variables.

    var1 <- rnorm(10)
    var2 <- rnorm(10,2)
    class <- as.factor(c(1,2,3,1,2,1,2,1,3,3))
    df <- data.frame(var1=var1,var2=var2,class=class)

How can I randomly split df into two subsets so that sub.df1 and sub.df2 each have at least one instance of each class?

Answer 1: This works:

    set.seed(123)
    partition <- function(x, n = 2) sample(c(1:n, sample(1:n, length(x) - n, TRUE)))
    split(df, as.integer
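
Since the answer's split call is cut off above, here is one possible way to apply the partition function, offered as a sketch and not as the answer's actual completion. It assumes every class has at least n rows; the use of ave and the names grp and subs are choices made for this illustration.

```r
set.seed(123)
df <- data.frame(var1 = rnorm(10), var2 = rnorm(10, 2),
                 class = as.factor(c(1, 2, 3, 1, 2, 1, 2, 1, 3, 3)))

# Group labels for the rows of one class: each of 1..n appears once,
# the remaining labels are drawn at random, and the result is shuffled.
partition <- function(x, n = 2) {
  sample(c(1:n, sample(1:n, length(x) - n, replace = TRUE)))
}

# Apply partition within each class, then split the rows by the resulting label
grp  <- ave(as.integer(df$class), df$class, FUN = partition)
subs <- split(df, grp)
```

Because every class contributes at least one row to each label, both elements of subs contain all three classes.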

Subset a df using partial match with multiple criteria

北城余情 submitted on 2019-12-11 08:37:49
Question: This is the dataset:

    company <- c("Coca-Cola Inc.", "DF, CocaCola", "COCA-COLA", "PepsiCo Inc.", "Beverages Distribution")
    brand <- c("Coca-Cola Zero","N/A", "Coca-Cola", "Pepsi", "soft drink")
    vol <- c("2456","1653", "19", "2766", "167")
    data <- data.frame(company, brand, vol)
    data

Which results in:

                     company          brand  vol
    1         Coca-Cola Inc. Coca-Cola Zero 2456
    2           DF, CocaCola            N/A 1653
    3              COCA-COLA      Coca-Cola   19
    4           PepsiCo Inc.          Pepsi 2766
    5 Beverages Distribution     soft drink  167

Let's say, this is imported
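
The question is cut off before it states its criteria, so the following is only a guess at the kind of filter the title describes: keep rows where either company or brand partially matches a pattern. The pattern "coca-?cola" and the names hit_company, hit_brand and coke are assumptions for illustration.

```r
company <- c("Coca-Cola Inc.", "DF, CocaCola", "COCA-COLA", "PepsiCo Inc.", "Beverages Distribution")
brand   <- c("Coca-Cola Zero", "N/A", "Coca-Cola", "Pepsi", "soft drink")
vol     <- c("2456", "1653", "19", "2766", "167")
data    <- data.frame(company, brand, vol)

# Assumed criterion: "coca" followed by an optional hyphen and "cola", any case
pattern <- "coca-?cola"

hit_company <- grepl(pattern, data$company, ignore.case = TRUE)
hit_brand   <- grepl(pattern, data$brand,   ignore.case = TRUE)

# Keep a row if either column matches; use & instead of | to require both
coke <- data[hit_company | hit_brand, ]
```

grepl returns one logical value per row, so several criteria can be combined with | or & before indexing.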

Selecting and identifying a subset of elements based on criteria

蓝咒 submitted on 2019-12-11 07:45:17
Question: I would like to select a subset of elements from a whole that satisfy certain conditions. There are about 20 elements, each having multiple attributes. I would like to select the five elements that show the least discrepancy from a fixed criterion on one attribute and have the highest average value on another attribute. Lastly, I would like to apply the function over multiple sets of 20 elements. Thus far, I have been able to identify the subsets "by hand," but I'd like to be able

Skipping empty data frame in for loop in R

天大地大妈咪最大 submitted on 2019-12-11 07:35:01
Question: I am currently working on a project in which I am going through a large data frame with information about certain events. For these events I am interested in calculating the average speed of a ball. To do this I am using a for loop, which first subsets the data frame to get a data frame that only includes information from a certain event. It then calculates the average ball speed for that event and determines which team had possession of the ball. Now I have come upon a problem
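
The question is cut off before the problem is described, but going by the title, one common way to skip an empty subset inside a for loop is to test nrow and call next. The sketch below uses made-up data; the column names event and speed and the vector avg_speed are hypothetical.

```r
# Hypothetical data: there are no rows for event 2
events <- data.frame(event = c(1, 1, 3, 3, 3),
                     speed = c(10, 12, 8, 9, 10))

avg_speed <- numeric(0)
for (id in 1:3) {
  ev <- events[events$event == id, ]   # rows belonging to this event only
  if (nrow(ev) == 0) next              # empty subset: skip to the next event
  avg_speed[as.character(id)] <- mean(ev$speed)
}
```

After the loop, avg_speed holds one named entry per non-empty event, and the empty event is simply absent.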

Unable to create an array from a table

若如初见. submitted on 2019-12-11 07:34:31
Question: I'm trying to load an external CSV file using MATLAB. I managed to download it using webread, but I only need a subset of the columns. I tried

    Tb = webread('https://datahub.io/machine-learning/iris/r/iris.csv');
    X = [sepallength sepalwidth petallength petalwidth];

But I cannot form X this way because the names are not recognized. How can I create X correctly?

Answer 1: The line

    Tb = webread('https://datahub.io/machine-learning/iris/r/iris.csv');

produces a table object with column names you later

How can I create a new dataframe comparing values and getting only most recent data in R?

一个人想着一个人 submitted on 2019-12-11 06:49:56
Question: I have a data frame that contains the Gini Index of countries. Many of the values are NA, so I want to create a new data frame that has, for each country, the most recent Gini Index measured for it. For example, if Brazil has a value for 2012, 2013 and 2015, the new data frame will have only the value for 2015. This is what the data looks like:

       Country.Name Country.Code X2014 X2015 X2016 X2017
    8     Argentina          ARG  41.4    NA  42.4    NA
    9       Armenia          ARM  31.5  32.4  32.5    NA
    13      Austria          AUT  30.5  30.5
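
One way to approach this, sketched under stated assumptions: treat the X-prefixed columns as years, order them oldest to newest, and take the last non-NA value in each row. Only the two complete rows from the excerpt are used. The names gini, year_cols, last_value and latest are introduced for this illustration.

```r
gini <- data.frame(
  Country.Name = c("Argentina", "Armenia"),
  Country.Code = c("ARG", "ARM"),
  X2014 = c(41.4, 31.5), X2015 = c(NA, 32.4),
  X2016 = c(42.4, 32.5), X2017 = c(NA, NA)
)

# Year columns, sorted so the most recent year comes last
year_cols <- sort(grep("^X[0-9]{4}$", names(gini), value = TRUE))

# Last non-NA value of a vector, or NA if it has no values at all
last_value <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA_real_ else v[length(v)]
}

latest <- data.frame(
  Country.Name = gini$Country.Name,
  Gini = apply(as.matrix(gini[year_cols]), 1, last_value)
)
```

Here Argentina gets its 2016 value and Armenia its 2016 value, since 2017 is NA for both.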

Subset an atomic vector in-place

南笙酒味 submitted on 2019-12-11 06:47:52
Question: Continuing from "Subsetting a large vector uses unnecessarily large amounts of memory": Given an atomic vector, for example

    x <- rep_len(1:10, 1e7)

how can I modify x in place to remove elements by numeric index using Rcpp? In R, one can do this, but not in place (i.e. not without duplicating x):

    idrops <- c(5, 4, 9)
    x <- x[-idrops]

A reasonably efficient way to do this would be the following:

    IntegerVector dropElements(IntegerVector x, IntegerVector inds) {
      R_xlen_t n = x.length();
      R_xlen_t

Creating variable out of conditional values in another one

半城伤御伤魂 submitted on 2019-12-11 06:14:00
Question: I have quite a large conflict dataset (71 million observations) with many variables and a daily date. It comes from the GDELT project, and the dataset is structured so that, for each day, there is a target country and a source country of aggression. For example, on 1 January 2000 many countries engaged in aggressive behaviour against others or themselves, and this dataset tracks this. It looks like this:

    clear
    input long date_01 str18 source_01 str19 target_01 str4 cameocode