subset | 易学教程

In R: subset or dplyr::filter with variable from vector

    df <- data.frame(a=LETTERS[1:4], b=rnorm(4))
    vals <- c("B","D")

I can filter/subset df with the values in vals using:

    dplyr::filter(df, a %in% vals)
    subset(df, a %in% vals)

Both give:

      a         b
    2 B 0.4481627
    4 D 0.2916513

What if I have the variable name in a vector, e.g.:

    > names(df)[1]
    [1] "a"

Then it doesn't work, I guess because it's quoted:

    dplyr::filter(df, names(df)[1] %in% vals)
    [1] a b
    <0 rows> (or 0-length row.names)

How do you do this?

UPDATE (what if it's dplyr::tbl_df(df)): Answers below work fine for data.frames, but not for dplyr::tbl_df wrapped data:

    df<-dplyr::tbl_df(df)
    dplyr::filter(df,

Data Frame Subset Performance

I have a couple of large data frames (1 million+ rows x 6-10 columns) that I need to subset repeatedly. The subsetting section is the slowest part of my code, and I am curious whether there is a way to do this faster.

    load("https://dl.dropbox.com/u/4131944/Temp/DF_IOSTAT_ALL.rda")
    start_in <- strptime("2012-08-20 13:00", "%Y-%m-%d %H:%M")
    end_in <- strptime("2012-08-20 17:00", "%Y-%m-%d %H:%M")
    system.time(DF_IOSTAT_INT <- DF_IOSTAT_ALL[DF_IOSTAT_ALL$date_stamp >= start_in & DF_IOSTAT_ALL$date_stamp <= end_in,])

    > system.time(DF_IOSTAT_INT <- DF_IOSTAT_ALL[DF_IOSTAT_ALL$date_stamp >= start_in & DF_IOSTAT_ALL

Find all unique subsets of a set of values

Question: I have an algorithm problem. I am trying to find all unique subsets of values from a larger set of values. For example, say I have the set {1,3,7,9}. What algorithm can I use to find these subsets of 3?

    {1,3,7}
    {1,3,9}
    {1,7,9}
    {3,7,9}

Subsets should not repeat, and order is unimportant: the set {1,2,3} is the same as the set {3,2,1} for these purposes. Pseudocode (or the regular kind) is encouraged. A brute force approach is obviously possible, but not desired. For example such a brute force method
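The question does not name a language, so here is a minimal sketch in Python, assuming the standard library's `itertools.combinations` is acceptable. The helper name `k_subsets` is hypothetical and not from the original post. It deduplicates and sorts the input first, so each subset appears exactly once regardless of input order.

```python
from itertools import combinations


def k_subsets(values, k):
    """Return every k-element subset of `values`, each as a sorted tuple."""
    # Deduplicate and sort so that {3,2,1} and {1,2,3} produce one result.
    unique = sorted(set(values))
    # combinations() emits each selection once, in lexicographic order,
    # without ever generating a reordered copy of the same subset.
    return list(combinations(unique, k))


print(k_subsets([1, 3, 7, 9], 3))
# [(1, 3, 7), (1, 3, 9), (1, 7, 9), (3, 7, 9)]
```

This builds only the C(n, k) subsets that are needed, rather than enumerating all orderings and filtering out duplicates.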

How to slice a dataframe by selecting a range of columns and rows based on names and not indexes?

This is a follow-up to the question I asked here. There I learned a) how to do this for columns (see below) and b) that the selection of rows and columns seems to be handled quite differently in R, which means that I cannot use the same approach for rows. So suppose I have a pandas dataframe like this:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                      columns=['c' + str(i) for i in range(6)],
                      index=["r" + str(i) for i in range(6)])

        c0  c1  c2  c3  c4  c5
    r0   4   2   3   9   9   0
    r1   9   0   8   1   7   5
    r2   2   6   7   5   4   7
    r3   6   9   9   1   3   4
    r4   1   1   1   3   0   3
    r5   0   8   5   8   2   9

then

How to subset consecutive rows if they meet a condition

I am using R to analyze a number of time series (1951-2013) containing daily values of Max and Min temperatures. The data has the following structure:

    YEAR MONTH DAY  MAX  MIN
    1985     1   1 22.8  9.4
    1985     1   2 28.6 11.7
    1985     1   3 24.7 12.2
    1985     1   4 17.2  8.0
    1985     1   5 17.9  7.6
    1985     1   6 17.7  8.1

I need to find the frequency of heat waves based on this definition: a period of three or more consecutive days with a daily maximum and minimum temperature exceeding the 90th percentile of the maximum and minimum temperatures for all days in the studied period. Basically, I want to subset those consecutive days
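The run-detection step of that definition can be sketched independently of R. The Python sketch below assumes one boolean flag per day, in calendar order with no missing days, that is true when both MAX and MIN exceed their 90th-percentile thresholds. Computing the thresholds is left out. The function name `find_runs` and the sample flags are hypothetical illustrations, not data from the post.

```python
from itertools import groupby


def find_runs(flags, min_len=3):
    """Return (start_index, length) for each run of True values
    that lasts at least `min_len` consecutive entries."""
    runs = []
    index = 0
    # groupby() collapses consecutive equal values into one group.
    for is_hot, group in groupby(flags):
        length = len(list(group))
        if is_hot and length >= min_len:
            runs.append((index, length))
        index += length
    return runs


# Hypothetical flags: True means both thresholds were exceeded that day.
hot = [False, True, True, True, False, True, True,
       False, True, True, True, True]
print(find_runs(hot))  # [(1, 3), (8, 4)]
```

The number of heat waves is then the length of the returned list. In R, `rle()` provides the same run-length encoding.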

Can I define an enum as a subset of another enum's cases?

Note: This is basically the same question as another one I posted on Stack Overflow yesterday. However, I realized that the example in that question was a poor one that didn't quite boil it down to the essence of what I had in mind. As all replies to the original post refer to that first example, I thought it would be better to put the new example in a separate question (no duplication intended).

Model Game Characters That Can Move

Let's define an enum of directions for use in a simple game:

    enum Direction {
        case up
        case down
        case left
        case right
    }

Now in the game I need two kinds of

jq: selecting a subset of keys from an object

Given an input JSON string of keys from an array, return an object with only the entries that had keys in the original object and in the input array. I have a solution, but I think that it isn't elegant ( {($k):$input[$k]} feels especially clunky...) and that this is a chance for me to learn.

    jq -n '{"1":"a","2":"b","3":"c"}' \
      | jq --arg keys '["1","3","4"]' \
        '. as $input
         | ( $keys | fromjson )
         | map( . as $k | $input | select(has($k)) | {($k):$input[$k]} )
         | add'

Any ideas how to clean this up? I feel like Extracting selected properties from a nested JSON object with jq is a good starting

Merging two data.frames by key column

I have two dataframes. In the first one, I have a KEY/ID column and two variables:

    KEY V1 V2
      1 10  2
      2 20  4
      3 30  6
      4 40  8
      5 50 10

In the second dataframe, I have a KEY/ID column and a third variable:

    KEY V3
      1  5
      2 10
      3 20

I would like to extract the rows of the first dataframe that are also in the second dataframe by matching them on the KEY column. I would also like to add the V3 column to the final dataset:

    KEY V1 V2 V3
      1 10  2  5
      2 20  4 10
      3 30  6 20

These are my attempts using the subset and merge functions:

    subset(data1, data1$KEY == data2$KEY)
    merge(data1, data2, by.x = "KEY", by.y =

R: how to remove certain rows in data.frame

Question:

    > data = data.frame(a = c(100, -99, 322, 155, 256), b = c(23, 11, 25, 25, -999))
    > data
         a    b
    1  100   23
    2  -99   11
    3  322   25
    4  155   25
    5  256 -999

For such a data.frame I would like to remove any row that contains -99 or -999, so my resulting data.frame should consist only of rows 1, 3, and 4. I was thinking of writing a loop for this, but I am hoping there's an easier way. (If my data.frame were to have columns a-z, then the loop method would be very clunky.) My loop would probably look something

Find sum of subset with multiplication

Question: Let's say we have a set {a_1, a_2, a_3, ..., a_n}. The goal is to compute a sum built in the following way: find all subsets of length 3, multiply the elements of each subset (for the subset {b_1, b_2, b_3} the result is b_1*b_2*b_3), and then add up all these products. I am looking for the algorithm with the shortest execution time.

Example

SET: {3, 2, 1, 2}

Let S be our sum.

    S = 3*2*1 + 3*2*2 + 2*1*2 + 3*1*2 = 28

Answer 1: It is easier to calculate sum of multiplied triplets
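The answer above is cut off, so the following is not a reconstruction of it. It is one possible linear-time sketch in Python that keeps running elementary symmetric sums, with a hypothetical function name `sum_of_triple_products`. As in the question's example, repeated values are treated as distinct elements.

```python
def sum_of_triple_products(values):
    """Sum of b1*b2*b3 over all 3-element selections of `values`.

    e[j] holds the sum of products over all j-element selections
    of the values processed so far (e[0] is the empty product, 1).
    """
    e = [1, 0, 0, 0]
    for x in values:
        # Update from high to low so each x is used at most once
        # per selection.
        for j in (3, 2, 1):
            e[j] += e[j - 1] * x
    return e[3]


print(sum_of_triple_products([3, 2, 1, 2]))  # 28
```

This takes O(n) time and constant extra space, compared with O(n^3) for enumerating every triple.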