subset | 易学教程

Oracle/SQL - Select specified range of sequential records

Question: I'm trying to select a subset of records, 5000 through 10000, from a join. I've gotten queries like this to work in the past, but they were slightly less complex. Here is the query I'm trying to use. If I remove the rownum/rnum references (and therefore the outer select), I receive all my records as expected, so I know that logic is good.

SELECT * FROM ( SELECT unique cl.riid_, rownum as rnum FROM <table 1> cl, <table 3> mil WHERE cl.opt = 0 AND (cl.st_ != 'QT' OR cl.st_ IS NULL) AND cl.hh =
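
The usual shape for this kind of range query is to deduplicate and order in the innermost query, number the rows in a middle query, and filter on that number in the outermost one. In Oracle, ROWNUM is assigned as rows pass the WHERE clause, before ORDER BY is applied, and a ROWNUM column in the select list also makes every row unique, so UNIQUE can no longer remove duplicates. The sketch below is not Oracle code: it shows the same "deduplicate and order first, then cut the window" idea with Python's built-in sqlite3 module, where LIMIT/OFFSET plays the role of the rnum filter. The table `cl`, its two columns, and the helper `select_range` are simplified, hypothetical stand-ins for the ones in the question.

```python
import sqlite3


def select_range(conn, start, end):
    """Return the distinct riid_ values ranked start..end (1-based, inclusive)."""
    # Deduplicate and order first; only then cut out the requested window.
    sql = (
        "SELECT DISTINCT riid_ FROM cl "
        "WHERE opt = 0 "
        "ORDER BY riid_ "
        "LIMIT ? OFFSET ?"
    )
    params = (end - start + 1, start - 1)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cl (riid_ INTEGER, opt INTEGER)")
# Each id is inserted twice to mimic the duplicates a join can produce.
rows = [(i, 0) for i in range(1, 20001)] * 2
conn.executemany("INSERT INTO cl VALUES (?, ?)", rows)

page = select_range(conn, 5000, 10000)
print(len(page), page[0], page[-1])  # 5001 5000 10000
```

On Oracle 12c and later, the row-limiting clause `OFFSET 4999 ROWS FETCH NEXT 5001 ROWS ONLY` expresses the same window without nesting; on older versions the three nested SELECTs described above are the usual route.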

R: make 2 subset vectors so that values are different index-wise, and also different across each vector

Question: Following up on this question, I want to do something similar, but this time I have one more requirement. I want to make 2 vectors by subsetting the same data. I need replace to be set to FALSE because I need all values to be different within a, and all values to be different within b. Apart from that, values cannot be the same in a and b at the same index position. Note that the sampling vector v is always fixed, as is the sample length l. Doing the following, I only fulfil one criterion
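
One straightforward way to meet all three requirements is rejection sampling: draw `a` without replacement, then redraw `b` without replacement until no position matches. A minimal Python sketch, assuming the values in `v` are themselves distinct (otherwise sampling without replacement does not guarantee distinct values); the function name `paired_samples`, the `max_tries` cap, and the example `v` are hypothetical. In R, the same loop can be built from `sample(v, l)`, whose default is `replace = FALSE`.

```python
import random


def paired_samples(v, l, rng=None, max_tries=10_000):
    """Draw a and b of length l from v, each without replacement,
    such that a[i] != b[i] for every index i."""
    rng = rng or random.Random()
    a = rng.sample(v, l)  # distinct values within a
    for _ in range(max_tries):
        b = rng.sample(v, l)  # distinct values within b
        # Accept b only if it differs from a at every index position.
        if all(x != y for x, y in zip(a, b)):
            return a, b
    raise RuntimeError("no valid b found; l may be too large for v")


v = list(range(1, 11))
a, b = paired_samples(v, 5, rng=random.Random(42))
print(a)
print(b)
```

When l equals length(v), b is a random permutation and each redraw succeeds with probability close to 1/e (the share of derangements), so for reasonably long vectors the loop usually finishes after a few tries.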

subsetting Panel Data conditional on consecutive strings of length

Question: I'm stuck trying to subset some panel data, i.e. ids within groups, using dplyr. I want to extract all ids, within each group grp, that have a NUM series with a minimum smaller than 2 and a maximum greater than 2. I've constructed a minimal working example below that should illustrate the issue. I have been working with filter(), row_number() == c(1,n()), and tried to separate the data out and merge it back together with different types of _join, but I am stuck and I am now turning to the SO
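
Taking the condition exactly as stated in the body (the title's "consecutive strings" requirement is not spelled out in the excerpt), it is a per-series test: keep a (grp, id) pair when the smallest NUM is below 2 and the largest is above 2. In dplyr this maps to `group_by(grp, id)` followed by `filter(min(NUM) < 2, max(NUM) > 2)`. The Python sketch below shows the same grouping logic with only the standard library; the helper name and the rows are hypothetical, since the question's example data is not shown.

```python
from collections import defaultdict


def ids_spanning(rows, threshold=2):
    """Return (grp, id) pairs whose NUM series has min < threshold < max."""
    series = defaultdict(list)
    for grp, id_, num in rows:
        series[(grp, id_)].append(num)
    return sorted(
        key for key, nums in series.items()
        if min(nums) < threshold < max(nums)
    )


rows = [
    ("g1", 1, 1), ("g1", 1, 3),      # spans 2 -> kept
    ("g1", 2, 3), ("g1", 2, 4),      # never below 2 -> dropped
    ("g2", 1, 0), ("g2", 1, 1),      # never above 2 -> dropped
    ("g2", 2, 1.5), ("g2", 2, 2.5),  # spans 2 -> kept
]
print(ids_spanning(rows))  # [('g1', 1), ('g2', 2)]
```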

Group-wise subsetting where feasible

Question: I would like to subset rows of my data

library(data.table); set.seed(333); n <- 100
dat <- data.table(id=1:n, group=rep(1:2,each=n/2), x=runif(n,100,120), y=runif(n,200,220), z=runif(n,300,320))
> head(dat)
   id group        x        y        z
1:  1     1 109.3400 208.6732 308.7595
2:  2     1 101.6920 201.0989 310.1080
3:  3     1 119.4697 217.8550 313.9384
4:  4     1 111.4261 205.2945 317.3651
5:  5     1 100.4024 212.2826 305.1375
6:  6     1 114.4711 203.6988 319.4913

in several stages within each group. I need to automate this and it
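
The question is cut off before the exact rule, so the following is only one plausible reading of "where feasible": within each group, apply each subsetting stage only if it leaves at least one row, and otherwise keep that group's rows from the previous stage. The helper `staged_subset`, the stage predicates, and the rows are all hypothetical and only loosely modeled on the `dat` columns above.

```python
from collections import defaultdict


def staged_subset(rows, stages, key="group"):
    """Apply filter stages per group, skipping any stage that would empty the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for g in sorted(groups):
        current = groups[g]
        for stage in stages:
            candidate = [r for r in current if stage(r)]
            if candidate:  # only apply the stage when it is feasible
                current = candidate
        result.extend(current)
    return result


rows = [
    {"id": 1, "group": 1, "x": 101.0, "y": 205.0},
    {"id": 2, "group": 1, "x": 115.0, "y": 218.0},
    {"id": 3, "group": 2, "x": 119.0, "y": 201.0},
    {"id": 4, "group": 2, "x": 117.0, "y": 203.0},
]
stages = [lambda r: r["x"] > 110, lambda r: r["y"] > 210]
print([r["id"] for r in staged_subset(rows, stages)])  # [2, 3, 4]
```

In group 2 the second stage would remove every row, so it is skipped and both rows that survived the first stage are kept.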

How can I skip groups while subsetting with key by in data.table?

Question: I have this DT:

dt=data.table(ID=c(rep(letters[1:2],each=4),'b'),value=seq(1,9))
   ID value
1:  a     1
2:  a     2
3:  a     3
4:  a     4
5:  b     5
6:  b     6
7:  b     7
8:  b     8
9:  b     9

I need to eliminate groups while subsetting, but only when the data fulfils some condition. Something like this does not work:

dt[,{if (.N==4) .SD else NULL v1},by="ID"]

So I need to remove the groups that do not meet the condition. In this example I would like to skip the groups whose length is different from 4, so that I get: ID value 1
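
In data.table, a group is dropped from the result when `j` evaluates to NULL for it, so a common idiom is `dt[, if (.N == 4) .SD, by = ID]`: an `if` without `else` returns NULL for the groups that fail the test. The Python sketch below mirrors that keep-or-drop-per-group logic on the question's data; the function name `keep_groups_of_size` is hypothetical.

```python
from itertools import groupby


def keep_groups_of_size(rows, size):
    """Keep only the rows whose ID group has exactly `size` rows."""
    kept = []
    # groupby needs rows that are contiguous by ID, hence the (stable) sort.
    for _, grp in groupby(sorted(rows, key=lambda r: r[0]), key=lambda r: r[0]):
        grp = list(grp)
        if len(grp) == size:
            kept.extend(grp)
    return kept


ids = ["a"] * 4 + ["b"] * 5
rows = list(zip(ids, range(1, 10)))
print(keep_groups_of_size(rows, 4))
# [('a', 1), ('a', 2), ('a', 3), ('a', 4)]
```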

R finding the first value in a data frame that falls within a given threshold

Question: I am a fairly new user and I need your help with a task that I am stuck on. If my question has been asked or answered before, I would be grateful if you could kindly point me to the relevant page. I have the following data set (lbnp_br), which is optical density (OD) measured over time (in seconds):

time OD
1891 -244.6
1891.5 -244.4
1892 -242
1892.5 -242
1893 -241.1
1893.5 -242.4
1894 -245.2
1894.5 -249.6
**1895 -253.9**
1895.5 -254.5
1896 -251.9
1896.5 -246.7
1897 -242.4
1897.5 -234.6
1898 -225.5
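
The threshold itself is cut off in the question, so the sketch below uses a hypothetical one (OD at or below -250) on the first rows shown; the function name `first_within` and its `lo`/`hi` bounds are illustrative. In R, a comparable lookup is `which(lbnp_br$OD <= -250)[1]`, which returns the index of the first matching row, or NA if there is none.

```python
import math


def first_within(times, values, lo=-math.inf, hi=math.inf):
    """Return (time, value) of the first value in [lo, hi], or None."""
    return next(
        ((t, v) for t, v in zip(times, values) if lo <= v <= hi),
        None,
    )


times = [1891, 1891.5, 1892, 1892.5, 1893, 1893.5, 1894, 1894.5, 1895, 1895.5]
od = [-244.6, -244.4, -242, -242, -241.1, -242.4, -245.2, -249.6, -253.9, -254.5]
print(first_within(times, od, hi=-250))  # (1895, -253.9)
```

Because `next()` stops at the first match, the scan ends as soon as a value falls inside the band instead of testing the whole series.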

Subsetting columns works on data.frame but not on data.table

Question: I can select a few columns from a data.frame:

> z[c("events","users")]
    events  users
1 26246016 201816
2   942767 158793
3 29211295 137205
4 30797086 124314

but not from a data.table:

> best[c("events","users")]
Starting binary search ...Error in `[.data.table`(best, c("events", "users")) :
  typeof x.pixel_id (integer) != typeof i.V1 (character)
Calls: [ -> [.data.table

What do I do? Is there a better way than to turn the data.table back into a data.frame?

Answer 1: Column subsetting should be done

Creating a representative sample from a large CSV

Question: I have the following dataset:

head -2 trip_data_1.csv
medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
89D227B655E5C82AECF13C3F540D4CF4,BA96DE419E711691B9445D6A6307C170,CMT,1,N,2013-01-01 15:11:48,2013-01-01 15:18:10,4,382,1.00,-73.978165,40.757977,-73.989838,40.751171

A simple count of records by date gives me the following output:
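
One standard way to draw a fixed-size uniform sample from a file too large to load is reservoir sampling (Algorithm R): it reads the rows once and keeps k of them, each row having the same chance of ending up in the sample. A uniform sample preserves the per-date proportions in expectation, though not exactly; if exact per-date shares matter, stratifying by the date part of `pickup_datetime` is the usual refinement. A minimal sketch with the standard csv module, assuming the file has a header row as in the question; the function name and the in-memory demo data are hypothetical.

```python
import csv
import io
import random


def reservoir_sample(f, k, rng=None):
    """Uniformly sample k rows from a CSV stream in one pass (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, row in enumerate(csv.DictReader(f)):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # row i replaces a kept row with probability k/(i+1)
    return sample


# Hypothetical stand-in for trip_data_1.csv (only two columns kept).
lines = ["medallion,pickup_datetime"]
lines += [f"m{i},2013-01-{i % 31 + 1:02d} 15:11:48" for i in range(1000)]
demo = io.StringIO("\n".join(lines) + "\n")

rows = reservoir_sample(demo, 50, rng=random.Random(0))
print(len(rows), rows[0]["medallion"])
```

For the real file, the call would be wrapped in `with open("trip_data_1.csv", newline="") as f:`, so memory use stays proportional to k rather than to the file size.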

R Selecting column in a data frame by column in another data frame

Question: I am facing a problem when trying to subset my data; maybe you could help me. What I need is to subset data from the first data frame by a column when this column's value is equal to the value of a column in the second data frame. The following are the data frames I'm using:

> head(places)
  Zona   Poble     lat       lon     alt
1    1  Zorita 40.7353 -0.165748 691.867
2    1 Morella 40.6287 -0.113284 955.719
3    1 Forcall 40.6621 -0.209759 753.882
4    2 Benasal 40.3943 -0.126111 848.171
5    2    Cati 40.4532  0.060409 667.610
6    2
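
Selecting the rows of one table whose key appears in a column of another is a membership test; in R it is commonly written `places[places$Zona %in% other$Zona, ]`, or done as a join with `merge()`. The second data frame is not shown, so this sketch assumes the shared column is `Zona` and calls the second table `other`; its contents and the helper `rows_matching` are hypothetical, while the `places` rows come from the question.

```python
def rows_matching(rows, key, allowed_values):
    """Keep the rows whose `key` column value appears in allowed_values."""
    allowed = set(allowed_values)  # set gives O(1) membership tests
    return [r for r in rows if r[key] in allowed]


places = [
    {"Zona": 1, "Poble": "Zorita"},
    {"Zona": 1, "Poble": "Morella"},
    {"Zona": 1, "Poble": "Forcall"},
    {"Zona": 2, "Poble": "Benasal"},
    {"Zona": 2, "Poble": "Cati"},
]
other = [{"Zona": 2}]  # hypothetical second data frame
selected = rows_matching(places, "Zona", (r["Zona"] for r in other))
print([r["Poble"] for r in selected])  # ['Benasal', 'Cati']
```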

What's the most efficient algorithm for generating all k-subsets of an n-set?

Question: We are given a set of n elements and we'd like to generate all k-subsets of this set. For example, if S={1,2,3} and k=2, then the answer would be {1,2}, {1,3}, {2,3} (order not important). There are {n choose k} k-subsets of an n-set (by definition :-), which is O(n^k) (although this is not tight). Obviously any algorithm for the problem will have to run in time Omega({n choose k}). What is the currently fastest known algorithm for this problem? Can the lower bound of {n choose k} actually
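
When subsets are represented as bitmasks, one compact way to enumerate them is Gosper's hack, which steps from one k-bit mask to the next larger integer with the same number of set bits. When the mask fits in a machine word, each step costs a constant number of integer operations, which matches the Omega({n choose k}) output-size bound up to a constant factor; converting each mask back to a tuple of elements, as this Python sketch does for readability, adds O(n) work per subset. The function name is hypothetical, and Python's `itertools.combinations` is the ready-made alternative.

```python
def k_subsets(items, k):
    """Yield every k-subset of items as a tuple, using Gosper's hack on bitmasks."""
    n = len(items)
    if k == 0:
        yield ()  # the empty set is the only 0-subset
        return
    mask = (1 << k) - 1  # lowest k bits set: the first subset
    while mask < (1 << n):
        yield tuple(items[i] for i in range(n) if mask >> i & 1)
        low = mask & -mask  # lowest set bit
        ripple = mask + low  # carry into the next block of ones
        # Move the remaining ones of the old block back to the bottom.
        mask = (((ripple ^ mask) >> 2) // low) | ripple


print(list(k_subsets([1, 2, 3], 2)))  # [(1, 2), (1, 3), (2, 3)]
```

Masks come out in increasing numeric order (colexicographic order of the subsets), and the loop ends as soon as the next mask would need a bit beyond position n - 1.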