subset

Reading multiple files and calculating mean based on user input

a 夏天 submitted on 2019-11-26 16:36:38
I am trying to write a function in R that takes three inputs: directory, pollutant, and id. I have a directory on my computer containing over 300 CSV files. What the function should do is shown in the prototype below:

    pollutantmean <- function(directory, pollutant, id = 1:332) {
      ## 'directory' is a character vector of length 1 indicating
      ## the location of the CSV files
      ## 'pollutant' is a character vector of length 1 indicating
      ## the name of the pollutant for which we will calculate the
      ## mean; either "sulfate" or "nitrate".
      ## 'id' is an integer vector indicating the monitor ID numbers
      ## to be

Subset data to contain only columns whose names match a condition

北慕城南 submitted on 2019-11-26 16:00:29
Is there a way for me to subset data based on column names starting with a particular string? I have some columns named like ABC_1, ABC_2, ABC_3 and some like XYZ_1, XYZ_2, XYZ_3, let's say. How can I subset my df based only on columns containing the above portions of text (let's say, ABC or XYZ)? I can use indices, but the columns are too scattered in the data, and it becomes too much hard-coding. Also, I want to include only rows where any of these columns has a value > 0, so if any of the six columns above has a 1 in a row, that row makes the cut into my final data frame.

Try grepl
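
The idea behind the question (match column names against a prefix, then keep rows where any matched column is positive) can be sketched outside R. Below is a minimal Python illustration using plain dictionaries rather than a data frame; the `rows` data, the `id` column, and the helper name `select_prefixed` are made up for the example and are not from the original post.

```python
# Hypothetical data: each row is a dict keyed by column name.
rows = [
    {"id": 1, "ABC_1": 0, "ABC_2": 0, "XYZ_1": 0},
    {"id": 2, "ABC_1": 1, "ABC_2": 0, "XYZ_1": 0},
    {"id": 3, "ABC_1": 0, "ABC_2": 0, "XYZ_1": 2},
]
prefixes = ("ABC", "XYZ")


def select_prefixed(rows, prefixes):
    """Keep columns whose name starts with a prefix, and rows where any kept value is > 0."""
    result = []
    for row in rows:
        # str.startswith accepts a tuple, so one call covers every prefix.
        kept = {name: value for name, value in row.items() if name.startswith(prefixes)}
        if any(value > 0 for value in kept.values()):
            result.append(kept)
    return result


subset = select_prefixed(rows, prefixes)
```

Here the first row is dropped because all of its ABC/XYZ values are zero, and the `id` column is dropped from every row because its name matches neither prefix.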

Subset based on variable column name

别等时光非礼了梦想. submitted on 2019-11-26 15:59:26
Question: I'm wondering how to use the subset function if I don't know the name of the column I want to test. The scenario is this: I have a Shiny app where the user can pick a variable on which to filter (subset) the data table. I receive the column name from the web app as input, and I want to subset based on the value of that column, like so:

    subset(myData, THECOLUMN == someValue)

Except that both THECOLUMN and someValue are variables. Is there a syntax for passing the desired column name as a

Using grep to help subset a data frame in R

廉价感情. submitted on 2019-11-26 15:46:10
Question: I am having trouble subsetting my data. I want the data subsetted on column x, where the first 3 characters are G45. My data frame:

    x <- c("G448", "G459", "G479", "G406")
    y <- c(1:4)
    My.Data <- data.frame (x,y)

I have tried:

    subset (My.Data, x=="G45*")

But I am unsure how to use wildcards. I have also tried grep() to find the indices:

    grep ("G45*", My.Data$x)

but it returns all 4 rows, rather than just those beginning with G45, probably because I am unsure how to use wildcards.

Answer 1: It's pretty
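
Why the pattern matches all four rows can be shown in isolation. The sketch below uses Python's `re` module rather than R, on the assumption that the two engines read this particular pattern the same way: `*` is a quantifier on the preceding character, not a shell-style wildcard. The variable names `loose` and `anchored` are only for illustration.

```python
import re

values = ["G448", "G459", "G479", "G406"]  # column x from the question

# "G45*" means "G4" followed by zero or more "5" characters,
# so any string containing "G4" matches.
loose = [v for v in values if re.search(r"G45*", v)]

# Anchoring at the start and spelling out the literal prefix
# keeps only the values that really begin with "G45".
anchored = [v for v in values if re.search(r"^G45", v)]
```

With the anchored pattern only "G459" survives, while the unanchored `G45*` keeps every value.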

subsetting a Python DataFrame

我的梦境 submitted on 2019-11-26 15:19:17
Question: I am transitioning from R to Python and have just begun using Pandas. I have R code that subsets nicely:

    k1 <- subset(data, Product = p.id & Month < mn & Year == yr, select = c(Time, Product))

Now I want to do something similar in Python. This is what I have so far:

    import pandas as pd
    data = pd.read_csv("../data/monthly_prod_sales.csv")
    #first, index the dataset by Product. And, get all that matches a given 'p.id' and time.
    data.set_index('Product')
    k = data.ix[[p.id, 'Time']]
    # then, index

Calculating all of the subsets of a set of numbers

本秂侑毒 submitted on 2019-11-26 12:54:31
I want to find the subsets of a set of integers. It is the first step of the "Sum of Subsets" algorithm with backtracking. I have written the following code, but it doesn't return the correct answer:

    BTSum(0, nums);
    ///**************
    ArrayList<Integer> list = new ArrayList<Integer>();

    public static ArrayList<Integer> BTSum(int n, ArrayList<Integer> numbers) {
        if (n == numbers.size()) {
            for (Integer integer : list) {
                System.out.print(integer+", ");
            }
            System.out.println("********************");
            list.removeAll(list);
            System.out.println();
        } else {
            for (int i = n; i < numbers.size(); i++) {
                if (i ==
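
For comparison, one common way to enumerate subsets with backtracking is the include/exclude recursion. The following is a minimal sketch in Python, not a repair of the Java code in the question; the function name `subsets` and the sample input are assumptions made for the example.

```python
def subsets(numbers):
    """Return every subset of `numbers` using include/exclude backtracking."""
    result = []
    current = []

    def backtrack(index):
        if index == len(numbers):
            # Copy the working list, because `current` keeps changing afterwards.
            result.append(list(current))
            return
        # Branch 1: leave numbers[index] out.
        backtrack(index + 1)
        # Branch 2: take numbers[index], recurse, then undo the choice.
        current.append(numbers[index])
        backtrack(index + 1)
        current.pop()

    backtrack(0)
    return result


all_subsets = subsets([1, 2, 3])
```

The key design point is that the working list is restored with `pop()` after each recursive call instead of being cleared wholesale, so sibling branches start from the same state. A list of n distinct numbers yields 2^n subsets, including the empty one.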

Subset of rows containing NA (missing) values in a chosen column of a data frame

99封情书 submitted on 2019-11-26 12:35:47
Question: We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (VaR2) that contains the date at which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

    Var1  Var2
    10    2010/01/01
    20    NA
    30    2010/03/01

We would like to use the subset command to define a new data frame new_DF such that it only contains rows that have an NA value in the column (VaR2). In the example given, only Row
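
The filtering step can be sketched in plain Python, with `None` standing in for R's NA (an assumption made for this illustration; it is not how R represents missing values). The point carried over is that missingness is tested with a dedicated check, here `is None`, rather than by comparing against the missing value with equality.

```python
# The three rows from the question; None stands in for R's NA.
rows = [
    {"Var1": 10, "Var2": "2010/01/01"},
    {"Var1": 20, "Var2": None},
    {"Var1": 30, "Var2": "2010/03/01"},
]

# Keep only the rows whose date was not recorded.
missing_date = [row for row in rows if row["Var2"] is None]
```

Only the second row, with Var1 equal to 20, remains in `missing_date`.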

Selecting columns in R data frame based on those *not* in a vector

為{幸葍}努か submitted on 2019-11-26 12:11:24
Question: I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

    df.2 <- df[, c("name1", "name2", "name3")]

But can one use a ! or other tool to select all but those listed columns? For background, I have a data frame with quite a few column vectors and I'd like to avoid: (1) typing out the majority of the names when I could just remove a minority; (2) using the much shorter df.2 <- df[, c(1,3,5)] because when my .csv file changes, my code goes to heck since the

Looping through t.tests for data frame subsets in r

不打扰是莪最后的温柔 submitted on 2019-11-26 11:36:40
Question: I have a data frame 'math.numeric' with 32 variables. Each row represents a student and each variable is an attribute. The students have been put into 5 groups based on their final grade. The data looks as follows:

    head(math.numeric)
    school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason ... group
         1   1  18       2       1       1    4    4    1    5      1 ...     2
         1   1  17       2       1       2    1    1    1    3      1 ...     2
         1   1  15       2       2       2    1    1    1    3      3 ...     3
         1   1  15       2       1       2    4    2    2    4      2 ...     4
         1   1  16       2       1       2    3    3    3    3      2 ...     3
         1   2  16       2       2       2    4    3    4    3      4 ...     4

I am performing t-tests on each

Split/subset a data frame by factors in one column [duplicate]

跟風遠走 submitted on 2019-11-26 11:01:59
Question: This question already has answers here: Split data.frame based on levels of a factor into new data.frames (2 answers). Closed 2 years ago.

My data is like this (for example):

    ID  Rate  State
    1   24    AL
    2   35    MN
    3   46    FL
    4   34    AL
    5   78    MN
    6   99    FL

Data:

    structure(list(ID = 1:6, Rate = c(24L, 35L, 46L, 34L, 78L, 99L), State = structure(c(1L, 3L, 2L, 1L, 3L, 2L), .Label = c("AL","FL", "MN"), class = "factor")), .Names = c("ID", "Rate", "State"), class = "data.frame", row.names = c(NA, -6L
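
Splitting rows by the value of one column amounts to grouping them under that value as a key. A minimal Python sketch of the grouping step, using the six rows from the question as tuples; the variable name `by_state` and the tuple layout `(ID, Rate, State)` are choices made for this example, not part of the original post.

```python
from collections import defaultdict

# (ID, Rate, State) rows from the question.
rows = [
    (1, 24, "AL"), (2, 35, "MN"), (3, 46, "FL"),
    (4, 34, "AL"), (5, 78, "MN"), (6, 99, "FL"),
]

# Group rows under their State value; each key ends up with its own list.
by_state = defaultdict(list)
for row in rows:
    by_state[row[2]].append(row)
```

Each key of `by_state` then holds the rows for one state, for instance `by_state["AL"]` holds the rows with IDs 1 and 4.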