subset

In R: subset or dplyr::filter with variable from vector

冷眼眸甩不掉的悲伤 posted on 2019-12-05 11:21:15
    df <- data.frame(a=LETTERS[1:4], b=rnorm(4))
    vals <- c("B","D")

I can filter/subset df with the values in vals with:

    dplyr::filter(df, a %in% vals)
    subset(df, a %in% vals)

Both give:

      a         b
    2 B 0.4481627
    4 D 0.2916513

What if I have the variable name in a vector, e.g.:

    > names(df)[1]
    [1] "a"

Then it doesn't work, I guess because it's quoted:

    dplyr::filter(df, names(df)[1] %in% vals)
    [1] a b
    <0 rows> (or 0-length row.names)

How do you do this?

UPDATE (what if it's dplyr::tbl_df(df))

Answers below work fine for data.frames, but not for dplyr::tbl_df wrapped data:

    df <- dplyr::tbl_df(df)
    dplyr::filter(df,

Data Frame Subset Performance

一曲冷凌霜 posted on 2019-12-05 06:59:36
I have a couple of large data frames (1 million+ rows x 6-10 columns) that I need to subset repeatedly. The subsetting section is the slowest part of my code, and I am curious whether there is a way to do this faster.

    load("https://dl.dropbox.com/u/4131944/Temp/DF_IOSTAT_ALL.rda")
    start_in <- strptime("2012-08-20 13:00", "%Y-%m-%d %H:%M")
    end_in <- strptime("2012-08-20 17:00", "%Y-%m-%d %H:%M")
    system.time(DF_IOSTAT_INT <- DF_IOSTAT_ALL[DF_IOSTAT_ALL$date_stamp >= start_in & DF_IOSTAT_ALL$date_stamp <= end_in,])

    > system.time(DF_IOSTAT_INT <- DF_IOSTAT_ALL[DF_IOSTAT_ALL$date_stamp >= start_in & DF_IOSTAT_ALL

Find all unique subsets of a set of values

大兔子大兔子 posted on 2019-12-05 06:49:52
Question: I have an algorithm problem. I am trying to find all unique subsets of values from a larger set of values. For example, say I have the set {1,3,7,9}. What algorithm can I use to find these subsets of 3?

    {1,3,7}
    {1,3,9}
    {1,7,9}
    {3,7,9}

Subsets should not repeat, and order is unimportant: the set {1,2,3} is the same as the set {3,2,1} for these purposes. Pseudocode (or the regular kind) is encouraged. A brute force approach is obviously possible, but not desired. For example such a brute force method
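One way to enumerate the size-3 subsets without repeats is to pick elements only in increasing position order, so each subset is generated exactly once. The sketch below illustrates that idea; it is not taken from the answers to this question, the helper name `k_subsets` is hypothetical, and it assumes the input holds no duplicate values. Python's standard `itertools.combinations` produces the same result and is shown for comparison.

```python
from itertools import combinations


def k_subsets(items, k):
    """Yield every k-element subset of items exactly once, as a tuple.

    Assumes items contains no duplicate values.
    """
    items = list(items)

    def extend(start, chosen):
        if len(chosen) == k:
            yield tuple(chosen)
            return
        # Only consider positions after the last chosen one, so the same
        # subset is never produced again in a different order.
        for i in range(start, len(items)):
            yield from extend(i + 1, chosen + [items[i]])

    yield from extend(0, [])


values = [1, 3, 7, 9]
print(list(k_subsets(values, 3)))
# The standard library yields the same subsets in the same order:
print(list(combinations(values, 3)))
```

For the set {1,3,7,9} this produces the four subsets listed in the question. In practice `itertools.combinations` is the simpler choice; the recursive version is only there to show the ordering trick.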

How to slice a dataframe by selecting a range of columns and rows based on names and not indexes?

醉酒当歌 posted on 2019-12-05 06:34:50
This is a follow-up question of the question I asked here. There I learned a) how to do this for columns (see below) and b) that the selection of rows and columns seems to be handled quite differently in R, which means that I cannot use the same approach for rows. So suppose I have a pandas dataframe like this:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                      columns=['c' + str(i) for i in range(6)],
                      index=["r" + str(i) for i in range(6)])

        c0  c1  c2  c3  c4  c5
    r0   4   2   3   9   9   0
    r1   9   0   8   1   7   5
    r2   2   6   7   5   4   7
    r3   6   9   9   1   3   4
    r4   1   1   1   3   0   3
    r5   0   8   5   8   2   9

then
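For the name-based slicing that the title asks about, pandas offers label slicing through `.loc`. The following is a minimal sketch, assuming pandas is installed; the frame uses fixed values instead of the random ones in the question so that the result is reproducible, and the chosen labels `r1`..`r3` and `c2`..`c4` are only an illustration.

```python
import pandas as pd

# Deterministic stand-in for the random 6x6 frame in the question:
# the cell at row r<i>, column c<j> holds i * 6 + j.
df = pd.DataFrame(
    [[row * 6 + col for col in range(6)] for row in range(6)],
    columns=["c" + str(i) for i in range(6)],
    index=["r" + str(i) for i in range(6)],
)

# .loc slices by label, and both endpoints of a label slice are included.
block = df.loc["r1":"r3", "c2":"c4"]
print(block)
```

Unlike positional slicing with `.iloc`, a label slice includes its end label, so the result here has three rows and three columns.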

How to subset consecutive rows if they meet a condition

房东的猫 posted on 2019-12-05 03:36:16
I am using R to analyze a number of time series (1951-2013) containing daily values of Max and Min temperatures. The data has the following structure:

    YEAR MONTH DAY  MAX  MIN
    1985     1   1 22.8  9.4
    1985     1   2 28.6 11.7
    1985     1   3 24.7 12.2
    1985     1   4 17.2  8.0
    1985     1   5 17.9  7.6
    1985     1   6 17.7  8.1

I need to find the frequency of heat waves based on this definition: a period of three or more consecutive days with a daily maximum and minimum temperature exceeding the 90th percentile of the maximum and minimum temperatures for all days in the studied period. Basically, I want to subset those consecutive days

Can I define an enum as a subset of another enum's cases?

亡梦爱人 posted on 2019-12-05 03:04:27
Note: This is basically the same question as another one I posted on Stack Overflow yesterday. However, I figured that I used a poor example in that question that didn't quite boil it down to the essence of what I had in mind. As all replies to that original post refer to that first question, I thought it might be a better idea to put the new example in a separate question (no duplication intended).

Model Game Characters That Can Move

Let's define an enum of directions for use in a simple game:

    enum Direction {
        case up
        case down
        case left
        case right
    }

Now in the game I need two kinds of

jq: selecting a subset of keys from an object

戏子无情 posted on 2019-12-05 02:57:08
Given an input JSON string of keys from an array, return an object with only the entries that had keys in the original object and in the input array. I have a solution, but I think that it isn't elegant ( {($k):$input[$k]} feels especially clunky...) and that this is a chance for me to learn.

    jq -n '{"1":"a","2":"b","3":"c"}' \
      | jq --arg keys '["1","3","4"]' \
      '. as $input | ( $keys | fromjson ) | map( . as $k | $input | select(has($k)) | {($k):$input[$k]} ) | add'

Any ideas how to clean this up? I feel like Extracting selected properties from a nested JSON object with jq is a good starting

Merging two data.frames by key column

孤者浪人 posted on 2019-12-05 02:14:48
I have two dataframes. In the first one, I have a KEY/ID column and two variables:

    KEY V1 V2
      1 10  2
      2 20  4
      3 30  6
      4 40  8
      5 50 10

In the second dataframe, I have a KEY/ID column and a third variable:

    KEY V3
      1  5
      2 10
      3 20

I would like to extract the rows of the first dataframe that are also in the second dataframe by matching them on the KEY column. I would also like to add the V3 column to the final dataset:

    KEY V1 V2 V3
      1 10  2  5
      2 20  4 10
      3 30  6 20

These are my attempts using the subset and merge functions:

    subset(data1, data1$KEY == data2$KEY)
    merge(data1, data2, by.x = "KEY", by.y =

R: how to remove certain rows in data.frame

∥☆過路亽.° posted on 2019-12-05 02:10:40
Question:

    > data = data.frame(a = c(100, -99, 322, 155, 256), b = c(23, 11, 25, 25, -999))
    > data
        a    b
    1 100   23
    2 -99   11
    3 322   25
    4 155   25
    5 256 -999

For such a data.frame I would like to remove any row that contains -99 or -999, so my resulting data.frame should only consist of rows 1, 3, and 4. I was thinking of writing a loop for this, but I am hoping there's an easier way. (If my data.frame were to have columns a-z, then the loop method would be very clunky.) My loop would probably look something

Find sum of subset with multiplication

人走茶凉 posted on 2019-12-05 01:15:33
Question: Let's say we have a set {a_1, a_2, a_3, ..., a_n}. The goal is to find a sum that we create in the following way: we find all subsets whose length is 3, then multiply each subset's elements (for the subset {b_1, b_2, b_3} the result will be b_1*b_2*b_3). At the end we sum up all these products. I am looking for the algorithm with the shortest execution time.

Example

SET: {3, 2, 1, 2}

Let S be our sum.

    S = 3*2*1 + 3*2*2 + 2*1*2 + 3*1*2 = 28

Answer 1: It is easier to calculate sum of multiplied triplets
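The excerpt cuts off before the answer's method is shown, so the following is only one possible linear-time approach, not necessarily the one the answer describes. It keeps three running totals while scanning the values once: the sum of the elements seen so far, the sum of products of pairs, and the sum of products of triples. The function names are hypothetical, and the brute-force version is included only as a cross-check.

```python
from itertools import combinations


def sum_of_triple_products(values):
    """Sum of a*b*c over all 3-element selections, in one pass."""
    singles = 0  # sum of all elements seen so far
    pairs = 0    # sum of products over all pairs seen so far
    triples = 0  # sum of products over all triples seen so far
    for x in values:
        # Each new triple ending in x is an earlier pair times x, and each
        # new pair ending in x is an earlier element times x. Updating in
        # this order keeps x from being combined with itself.
        triples += pairs * x
        pairs += singles * x
        singles += x
    return triples


def brute_force(values):
    """Reference version: enumerate every triple explicitly."""
    return sum(a * b * c for a, b, c in combinations(values, 3))


print(sum_of_triple_products([3, 2, 1, 2]))  # 28, matching S in the example
```

The single pass takes O(n) time, compared with O(n^3) for enumerating every triple.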