subset | 易学教程

Reading multiple files and calculating mean based on user input

I am trying to write a function in R that takes three inputs: directory, pollutant, and id. I have a directory on my computer containing over 300 CSV files. What the function should do is shown in the prototype below:

    pollutantmean <- function(directory, pollutant, id = 1:332) {
      ## 'directory' is a character vector of length 1 indicating
      ## the location of the CSV files

      ## 'pollutant' is a character vector of length 1 indicating
      ## the name of the pollutant for which we will calculate the
      ## mean; either "sulfate" or "nitrate".

      ## 'id' is an integer vector indicating the monitor ID numbers
      ## to be

Subset data to contain only columns whose names match a condition

Is there a way to subset data based on column names starting with a particular string? I have some columns named like ABC_1, ABC_2, ABC_3 and others like XYZ_1, XYZ_2, XYZ_3. How can I subset my df to only the columns whose names contain those strings (say, ABC or XYZ)? I could use indices, but the columns are too scattered in the data and that would mean too much hard-coding. I also want to keep only the rows where any of these columns has a value > 0, so if any of the 6 columns above has a 1 in a row, that row makes the cut into my final data frame. Try grepl

Subset based on variable column name

Question: I'm wondering how to use the subset function when I don't know in advance the name of the column I want to test. The scenario is this: I have a Shiny app where the user can pick a variable on which to filter (subset) the data table. I receive the column name from the web app as input, and I want to subset based on the value of that column, like so: subset(myData, THECOLUMN == someValue), except that both THECOLUMN and someValue are variables. Is there a syntax for passing the desired column name as a

Using grep to help subset a data frame in R

Question: I am having trouble subsetting my data. I want to subset on column x, keeping the rows where the first 3 characters are G45. My data frame: x <- c("G448", "G459", "G479", "G406"); y <- c(1:4); My.Data <- data.frame(x, y). I have tried subset(My.Data, x == "G45*"), but I am unsure how to use wildcards. I have also tried grep() to find the indices: grep("G45*", My.Data$x), but it returns all 4 rows rather than just those beginning with G45, probably because I am unsure how to use wildcards. Answer 1: It's pretty
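
The wildcard confusion in this question comes from regular-expression semantics, which are largely the same across engines. The sketch below uses Python's standard `re` module rather than R, purely for illustration, and the variable names `loose` and `anchored` are made up here. It shows why the pattern `G45*` matches all four values and how an anchored pattern narrows the match. In R, the analogous filter would presumably be `My.Data[grepl("^G45", My.Data$x), ]`.

```python
import re

# Values of column x, copied from the question.
x = ["G448", "G459", "G479", "G406"]

# As a regex, "G45*" means "G4" followed by zero or more "5" characters,
# so any value containing "G4" matches.
loose = [v for v in x if re.search(r"G45*", v)]

# Anchoring with "^" and dropping the "*" keeps only values that start with "G45".
anchored = [v for v in x if re.search(r"^G45", v)]

print(loose)
print(anchored)
```

The `*` in a regex is a quantifier on the preceding character, not a shell-style "anything" wildcard, which is why the unanchored pattern is so permissive.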

subsetting a Python DataFrame

Question: I am transitioning from R to Python and have just begun using pandas. I have R code that subsets nicely: k1 <- subset(data, Product = p.id & Month < mn & Year == yr, select = c(Time, Product)). Now I want to do something similar in Python. This is what I have so far:

    import pandas as pd
    data = pd.read_csv("../data/monthly_prod_sales.csv")
    # first, index the dataset by Product. And, get all that matches a given 'p.id' and time.
    data.set_index('Product')
    k = data.ix[[p.id, 'Time']]
    # then, index

Calculating all of the subsets of a set of numbers

I want to find the subsets of a set of integers. It is the first step of the "Sum of Subsets" algorithm with backtracking. I have written the following code, but it doesn't return the correct answer:

    BTSum(0, nums);
    ///**************
    ArrayList<Integer> list = new ArrayList<Integer>();

    public static ArrayList<Integer> BTSum(int n, ArrayList<Integer> numbers) {
        if (n == numbers.size()) {
            for (Integer integer : list) {
                System.out.print(integer+", ");
            }
            System.out.println("********************");
            list.removeAll(list);
            System.out.println();
        } else {
            for (int i = n; i < numbers.size(); i++) {
                if (i ==
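
For reference, one common way to enumerate all subsets with backtracking is sketched below in Python. This is not the asker's method or a repair of the truncated Java code. The function name `all_subsets` and the sample input `[1, 2, 3]` are assumptions made for illustration. Each recursive call records a copy of the current selection and then tries extending it with each later element, undoing the choice afterwards.

```python
from typing import List


def all_subsets(numbers: List[int]) -> List[List[int]]:
    """Enumerate every subset of `numbers` by choose/undo backtracking."""
    result: List[List[int]] = []
    current: List[int] = []

    def backtrack(start: int) -> None:
        # Every partial selection is itself a subset, so record a copy of it.
        result.append(current.copy())
        for i in range(start, len(numbers)):
            current.append(numbers[i])  # choose numbers[i]
            backtrack(i + 1)            # extend using later elements only
            current.pop()               # undo the choice

    backtrack(0)
    return result


subsets = all_subsets([1, 2, 3])
print(subsets)
```

Recording a copy of the working list, rather than printing and clearing a shared list, keeps earlier choices intact when the recursion unwinds. A set of n elements yields 2^n subsets, so this is only practical for small inputs.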

Subset of rows containing NA (missing) values in a chosen column of a data frame

Question: We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

    Var1  Var2
    10    2010/01/01
    20    NA
    30    2010/03/01

We would like to use the subset command to define a new data frame new_DF such that it contains only the rows that have an NA value in the column (Var2). In the example given, only Row

Selecting columns in R data frame based on those not in a vector

Question: I'm familiar with extracting columns from an R data frame (or matrix) like so: df.2 <- df[, c("name1", "name2", "name3")]. But can one use a ! or another tool to select all but those listed columns? For background, I have a data frame with quite a few column vectors and I'd like to avoid: (1) typing out the majority of the names when I could just remove a minority; (2) using the much shorter df.2 <- df[, c(1,3,5)], because when my .csv file changes, my code goes to heck since the

Looping through t.tests for data frame subsets in r

Question: I have a data frame 'math.numeric' with 32 variables. Each row represents a student and each variable is an attribute. The students have been put into 5 groups based on their final grade. The data looks as follows:

    head(math.numeric)
    school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason ... group
         1   1  18       2       1       1    4    4    1    5      1         2
         1   1  17       2       1       2    1    1    1    3      1         2
         1   1  15       2       2       2    1    1    1    3      3         3
         1   1  15       2       1       2    4    2    2    4      2         4
         1   1  16       2       1       2    3    3    3    3      2         3
         1   2  16       2       2       2    4    3    4    3      4         4

I am performing t-tests on each

Split/subset a data frame by factors in one column [duplicate]

Question: This question already has answers here: Split data.frame based on levels of a factor into new data.frames (2 answers). Closed 2 years ago. My data is like this (for example):

    ID  Rate  State
    1   24    AL
    2   35    MN
    3   46    FL
    4   34    AL
    5   78    MN
    6   99    FL

Data: structure(list(ID = 1:6, Rate = c(24L, 35L, 46L, 34L, 78L, 99L), State = structure(c(1L, 3L, 2L, 1L, 3L, 2L), .Label = c("AL","FL", "MN"), class = "factor")), .Names = c("ID", "Rate", "State"), class = "data.frame", row.names = c(NA, -6L