duplicates

Rounding milliseconds of POSIXct in data.table v1.9.2 (ok in 1.8.10)

时间秒杀一切 · Submitted on 2019-12-03 07:02:26
I have a weird result for my data.table v1.9.2:

    DT
                     timestamp
    1: 2013-01-01 17:51:00.707
    2: 2013-01-01 17:51:59.996
    3: 2013-01-01 17:52:00.059
    4: 2013-01-01 17:54:23.901
    5: 2013-01-01 17:54:23.914

    str(DT)
    Classes ‘data.table’ and 'data.frame': 5 obs. of 1 variable:
     $ timestamp: POSIXct, format: "2013-01-01 17:51:00.707" "2013-01-01 17:51:59.996" "2013-01-01 17:52:00.059" "2013-01-01 17:54:23.901" ...
     - attr(*, "sorted")= chr "timestamp"
     - attr(*, ".internal.selfref")=<externalptr>

When I apply the duplicated() function I get the following result:

    duplicated(DT)
    [1] FALSE FALSE FALSE FALSE  TRUE
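The excerpt stops at the symptom. The likely mechanism (my inference, consistent with the title) is that data.table rounds away the last bytes of double-precision values when grouping and deduplicating, so two POSIXct timestamps 13 ms apart can compare equal. Current versions expose this via setNumericRounding(); a minimal sketch:

    library(data.table)

    DT <- data.table(timestamp = as.POSIXct(
      c("2013-01-01 17:54:23.901", "2013-01-01 17:54:23.914"),
      format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))

    # Older releases rounded the last 2 bytes of numeric columns by
    # default; 0 disables rounding so full precision is compared.
    setNumericRounding(0)
    duplicated(DT)   # FALSE FALSE once rounding is off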

average between duplicated rows in R

橙三吉。 · Submitted on 2019-12-03 06:37:23
I have a data frame df with rows that are duplicates for the name column but not for the value column:

    name value etc1 etc2
    A    9     1    X
    A    10    1    X
    A    11    1    X
    B    2     1    Y
    C    40    1    Y
    C    50    1    Y

I need to aggregate the duplicate names into one row, while calculating the mean over the value column. The expected output is as follows:

    name value etc1 etc2
    A    10    1    X
    B    2     1    Y
    C    45    1    Y

I have tried df[duplicated(df$name), ], but of course this does not give me the mean over the duplicates. I would like to use aggregate(), but the problem is that the FUN part of this function will apply to all the other columns …
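The excerpt is cut off before any answer; a minimal sketch of one standard fix is to use aggregate()'s formula interface and put every non-value column on the grouping side, so FUN only ever touches the numeric column:

    df <- data.frame(
      name  = c("A", "A", "A", "B", "C", "C"),
      value = c(9, 10, 11, 2, 40, 50),
      etc1  = 1,
      etc2  = c("X", "X", "X", "Y", "Y", "Y"))

    # Group by every column except 'value'; mean() then only ever
    # sees the numeric column, sidestepping the FUN problem.
    aggregate(value ~ name + etc1 + etc2, data = df, FUN = mean)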

Copy data from Amazon S3 to Redshift and avoid duplicate rows

╄→尐↘猪︶ㄣ · Submitted on 2019-12-03 06:17:17
Question: I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table. Is there a way to implement this using the COPY command? http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html I tried adding a unique constraint and setting a column as primary key, with no luck; Redshift does not seem to enforce unique/primary key constraints.

Answer 1: My solution is to run a 'delete' …
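The answer truncates right after "run a 'delete'", which matches the usual staging-table pattern for Redshift. A sketch, where the table names, key column, S3 path, and IAM role are all placeholders:

    -- Load the new batch into a staging table shaped like the target.
    CREATE TEMP TABLE stage (LIKE target);

    COPY stage
    FROM 's3://my-bucket/batch/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV;

    -- Redshift declares but does not enforce PRIMARY KEY, so remove
    -- rows that already exist before appending the new ones.
    BEGIN;
    DELETE FROM target USING stage WHERE target.id = stage.id;
    INSERT INTO target SELECT * FROM stage;
    COMMIT;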

How to remove duplicated records/observations WITHOUT sorting in SAS?

穿精又带淫゛_ · Submitted on 2019-12-03 06:16:43
I wonder if there is a way to de-duplicate records WITHOUT sorting. Sometimes I want to keep the original order and just remove the duplicated records. Is it possible? BTW, below are the approaches I already know, both of which end up sorting:

    1. proc sql;
         create table yourdata_nodupe as
         select distinct * from abc;
       quit;

    2. proc sort data=YOURDATA nodupkey;
         by var1 var2 var3 var4 var5;
       run;

You could use a hash object to keep track of which values have been seen as you pass through the data set. Only output when you encounter a key that hasn't been observed yet. This outputs in …
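A sketch of the hash-object approach the truncated answer begins to describe (dataset and key variable names are placeholders); the data set is read once in its original order, so output order is preserved:

    data nodup;
      set yourdata;
      /* The hash object persists across iterations and remembers
         every key combination seen so far. */
      if _n_ = 1 then do;
        declare hash seen();
        seen.defineKey('var1', 'var2');
        seen.defineDone();
      end;
      /* check() returns 0 when the key is already in the hash;
         add() inserts it, so each key is output exactly once. */
      if seen.check() ne 0 then do;
        seen.add();
        output;
      end;
    run;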

How to remove repeated elements in a vector, similar to 'set' in Python

时光总嘲笑我的痴心妄想 · Submitted on 2019-12-03 05:32:07
Question: I have a vector with repeated elements, and would like to remove them so that each element appears only once. In Python I could construct a set from a vector to achieve this, but how can I do this in R?

Answer 1: You can check out the unique() function.

    > v = c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)
    > unique(v)
    [1] 1 5 2 6 3

Answer 2: This does the same thing. Slower, but useful if you also want a logical vector of the duplicates:

    v[!duplicated(v)]

Answer 3: To remove contiguous duplicated elements only, you can compare …
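The third answer is cut off; one plausible completion (my sketch, not the original answer) compares each element with its predecessor, so only runs of repeats collapse and the later second 1 survives:

    v <- c(1, 1, 5, 5, 2, 2, 6, 6, 1, 3)

    # Keep an element whenever it differs from the one before it.
    v[c(TRUE, v[-1] != v[-length(v)])]
    #> [1] 1 5 2 6 1 3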

Unbelievable duplicate in an Entity Framework Query

北慕城南 · Submitted on 2019-12-03 05:21:53
My SQL query against a particular view returns me 3 different rows:

    select * from vwSummary
    where vidate >= '10-15-2010'
      and vidate <= '10-15-2010'
      and idno = '0330'
    order by viDate

But if I run the same query through my Entity Framework, I get 3 rows that are all the same, each equivalent to the third row:

    firstVisibleDate = new DateTime(2010, 10, 15);
    lastVisibleDate = new DateTime(2010, 10, 15);
    var p1 = (from v in db.vwSummary
              where v.viDate >= firstVisibleDate
                 && v.viDate <= lastVisibleDate
                 && v.IDNo == "0330"
              select v).ToList();

Can someone please help me to resolve this issue?

EDIT: I …
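The excerpt ends before any answer, but the classic cause (my inference) is that EF infers an entity key for the keyless view; when the inferred key columns hold equal values across rows, identity resolution hands back the same tracked object three times. A sketch of the common workaround, assuming an EF6-style DbContext:

    // AsNoTracking() skips identity resolution, so each database row
    // materializes as a distinct object even when the inferred entity
    // key collides. (The durable fix is mapping a genuinely unique key.)
    var p1 = (from v in db.vwSummary.AsNoTracking()
              where v.viDate >= firstVisibleDate
                 && v.viDate <= lastVisibleDate
                 && v.IDNo == "0330"
              select v).ToList();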

How to count duplicate values in an NSArray?

核能气质少年 · Submitted on 2019-12-03 05:18:09
Question: My NSArray contains duplicates. I can find the duplicates, but how can I count how many times each one repeats?

Answer 1: You can use NSCountedSet for this. Add all your objects to a counted set, then use the countForObject: method to find out how often each object appears.

Answer 2: Example:

    NSArray *names = [NSArray arrayWithObjects:@"John", @"Jane", @"John", nil];
    NSCountedSet *set = [[NSCountedSet alloc] initWithArray:names];
    for (id item in set) {
        NSLog(@"Name=%@, Count=%lu", item,
              (unsigned long)[set countForObject:item]);
    }

generic code duplication detection tool

强颜欢笑 · Submitted on 2019-12-03 04:40:46
I'm looking for a code duplication tool that is language agnostic. It's easy to find language-specific code duplication tools (for Java, C, PHP, …), but I'd like to run some code duplication analysis on templates in a custom syntax. I don't care about advanced parsing of the syntax; straight line-based raw string comparison is fine. Whitespace-insensitive matching would be a plus, but is not required. (It's not that hard to normalize/eliminate whitespace myself.) Does anybody know a tool that can be (mis)used for something like this? Thanks.

Doon: Have a look at Simian; you can use it for …
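The answer truncates right after naming Simian. Since the question says plain line-based raw string comparison is enough, here is a throwaway Python sketch of exactly that (my illustration, unrelated to how Simian actually works):

    import sys
    from collections import defaultdict

    def duplicate_lines(paths, min_len=10):
        """Report normalized lines that occur in more than one place."""
        seen = defaultdict(list)
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    key = "".join(line.split())  # whitespace-insensitive
                    if len(key) >= min_len:      # skip trivial lines
                        seen[key].append((path, lineno))
        return {k: v for k, v in seen.items() if len(v) > 1}

    if __name__ == "__main__":
        for key, places in duplicate_lines(sys.argv[1:]).items():
            print(f"{len(places)}x: {places}")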

Detecting duplicate files

浪子不回头ぞ · Submitted on 2019-12-03 04:37:29
I'd like to detect duplicate files in a directory tree. When two identical files are found, only one of the duplicates will be preserved and the remaining duplicates will be deleted to save disk space. Duplicate means files having the same content, which may differ in file name and path. I was thinking about using hash algorithms for this purpose, but there is a chance that different files have the same hash, so I need some additional mechanism to tell me that the files aren't the same even though the hashes match, because I don't want to delete two different files. Which …
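A common pipeline (a sketch, not from the truncated question): group files by size first, hash only within same-size groups, and byte-compare the candidates whose hashes collide, which removes the false-positive risk the asker worries about:

    import hashlib
    import os
    from collections import defaultdict
    from filecmp import cmp

    def find_duplicates(root):
        """Return paths whose content duplicates an earlier file."""
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

        duplicates = []
        for paths in by_size.values():
            if len(paths) < 2:
                continue  # a unique size can't have a duplicate
            by_hash = defaultdict(list)
            for path in paths:
                with open(path, "rb") as f:  # whole file in memory;
                    digest = hashlib.sha256(f.read()).hexdigest()
                by_hash[digest].append(path)
            for group in by_hash.values():
                # A collision is astronomically unlikely, but a final
                # byte-for-byte comparison makes deletion provably safe.
                for other in group[1:]:
                    if cmp(group[0], other, shallow=False):
                        duplicates.append(other)
        return duplicates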

Best way to detect duplicate uploaded files in a Java Environment?

拈花ヽ惹草 · Submitted on 2019-12-03 03:57:10
As part of a Java-based web app, I'm going to be accepting uploaded .xls and .csv (and possibly other types of) files. Each file will be uniquely renamed with a combination of parameters and a timestamp. I'd like to be able to identify any duplicate files, where duplicate means the exact same file regardless of its name. Ideally, I'd like to detect the duplicates as quickly as possible after the upload, so that the server can include this info in the response (provided the processing time per file doesn't cause too much of a lag). I've read about running MD5 on the files and storing …
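A sketch of the MD5 approach the truncated question starts to describe; the class name and in-memory hash store are assumptions (a real app would persist the hashes in a database):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HexFormat;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public final class UploadDeduplicator {

        // In production this would be a database column keyed by hash;
        // an in-memory set keeps the sketch self-contained.
        private final Set<String> knownHashes = ConcurrentHashMap.newKeySet();

        /** Returns true if a byte-identical file was uploaded before. */
        public boolean isDuplicate(Path uploadedFile)
                throws IOException, NoSuchAlgorithmException {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            try (InputStream in = new DigestInputStream(
                    Files.newInputStream(uploadedFile), md5)) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // DigestInputStream feeds every byte read into md5.
                }
            }
            String hash = HexFormat.of().formatHex(md5.digest()); // Java 17+
            return !knownHashes.add(hash); // add() is false if already seen
        }
    }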