duplicate-removal

Remove duplicate rows, leaving only the oldest row?

懵懂的女人 submitted on 2019-11-27 01:59:09
I have a table of data with many duplicate entries from user submissions. I want to delete all duplicate rows based on the field subscriberEmail, leaving only the original submission. In other words, I want to find every duplicated email and delete those rows, keeping only the original. How can I do this without swapping tables? My table contains a unique ID for each row.

Since you're using the id column as an indicator of which record is the 'original':

    delete x
    from myTable x
    join myTable z on x.subscriberEmail = z.subscriberEmail
    where x.id > z.id

This will leave one record per
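The multi-table `delete x from ... join ...` form above is a vendor extension. MySQL and SQL Server accept it, but SQLite does not. The runnable sketch below expresses the same rule as a correlated EXISTS subquery, using Python's built-in sqlite3 module: delete any row for which another row with the same subscriberEmail and a smaller id exists. The table and column names come from the question. The sample rows are made up for illustration.

```python
import sqlite3

# Same rule as the self-join answer: a row is a duplicate if an older row
# (smaller id) with the same subscriberEmail exists.
DELETE_NEWER_DUPLICATES = """
DELETE FROM myTable
WHERE EXISTS (
    SELECT 1
    FROM myTable AS z
    WHERE z.subscriberEmail = myTable.subscriberEmail
      AND z.id < myTable.id
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)")
# Hypothetical sample data: ids 3 and 4 repeat emails first seen in ids 1 and 2.
conn.executemany(
    "INSERT INTO myTable (id, subscriberEmail) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "b@example.com"),
        (5, "c@example.com"),
    ],
)
conn.execute(DELETE_NEWER_DUPLICATES)
conn.commit()

remaining = conn.execute("SELECT id, subscriberEmail FROM myTable ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com'), (5, 'c@example.com')]
```

The smallest id in each email group never has an older match, so it always survives. Every other row in the group is deleted.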

Removing duplicate columns and rows from a NumPy 2D array

こ雲淡風輕ζ submitted on 2019-11-27 01:02:01
I'm using a 2D array to store pairs of longitudes and latitudes. At one point, I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I've thought of looks very "unoptimized". For example, I'm converting the array to a list of tuples, removing duplicates with set, and then converting it back to an array:

    coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))

Are there any existing solutions, so I don't reinvent the wheel? To make it clear, I
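NumPy has a built-in for this. Since NumPy 1.13, numpy.unique accepts an axis argument, and axis=0 treats each row as a single element. Below is a minimal sketch with made-up coordinates. Note that np.unique returns the rows sorted, so the second helper uses return_index to keep the first occurrence of each row in its original order. Rows are matched by exact floating-point equality.

```python
import numpy as np

def merge_unique_sorted(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stack two (N, 2) coordinate arrays and drop duplicate rows (result is sorted)."""
    return np.unique(np.vstack((a, b)), axis=0)

def merge_unique_in_order(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Same, but keep the first occurrence of each row in its original position."""
    merged = np.vstack((a, b))
    _, first_idx = np.unique(merged, axis=0, return_index=True)
    return merged[np.sort(first_idx)]

# Hypothetical (longitude, latitude) pairs; [2.35, 48.85] appears in both arrays.
coords_a = np.array([[2.35, 48.85], [-0.12, 51.5]])
coords_b = np.array([[2.35, 48.85], [13.4, 52.52]])

print(merge_unique_sorted(coords_a, coords_b))
print(merge_unique_in_order(coords_a, coords_b))
```

Duplicate columns can be removed the same way with np.unique(arr, axis=1).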

Remove all duplicates except last instance

删除回忆录丶 submitted on 2019-11-26 23:20:20
Question: I have a dataset in R with the following layout, as an example:

    ID Date     Tally
    1  2/1/2011 1
    2  2/1/2011 2
    3  2/1/2011 3
    1  2/1/2011 4
    2  2/1/2011 5
    1  2/1/2011 6
    3  2/1/2011 7
    4  2/1/2011 8
    2  2/1/2011 9

I want to remove all instances except the LAST instance of each post ID. Everything I can find online, and the functions I am using, remove everything except the FIRST instance. So my new data frame would look like:

    ID Date     Tally
    1  2/1/2011 6
    3  2/1/2011 7
    4  2/1/2011 8
    2  2/1/2011 9

How do I
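The usual trick is to scan from the end, so that "first seen" becomes "last instance". In R this is commonly written as `df[!duplicated(df$ID, fromLast = TRUE), ]`. The plain-Python sketch below applies the same idea to the question's sample rows, stored as (ID, Date, Tally) tuples. It keeps the surviving rows in their original order.

```python
def keep_last_per_id(rows):
    """Keep only the last row for each ID, preserving the original row order."""
    seen = set()
    kept = []
    for row in reversed(rows):   # walk backwards: the first hit is the last instance
        row_id = row[0]
        if row_id not in seen:
            seen.add(row_id)
            kept.append(row)
    kept.reverse()               # restore the original top-to-bottom order
    return kept

# Sample data from the question as (ID, Date, Tally).
data = [
    (1, "2/1/2011", 1), (2, "2/1/2011", 2), (3, "2/1/2011", 3),
    (1, "2/1/2011", 4), (2, "2/1/2011", 5), (1, "2/1/2011", 6),
    (3, "2/1/2011", 7), (4, "2/1/2011", 8), (2, "2/1/2011", 9),
]
for row in keep_last_per_id(data):
    print(row)
```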

Eliminating duplicate values based on only one column of the table

浪子不回头ぞ submitted on 2019-11-26 22:23:52
My query:

    SELECT sites.siteName, sites.siteIP, history.date
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    ORDER BY siteName, date

First part of the output: [output screenshot not included]

How can I remove the duplicates in the siteName column? I want to keep only the most recent row for each site, based on the date column. In the example output above, I need rows 1, 3, 6 and 10.

This is where the window function row_number() comes in handy:

    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    INNER JOIN (select h.*,
                       row_number() over (partition by siteName order by date desc) as seqnum
                from history h
               ) h
    ON s.siteName = h.siteName and
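The answer's query is cut off after the join condition. A common way to finish it, assumed here rather than taken from the source, is to keep only the rows with seqnum = 1, which is the newest history row per siteName. The sketch below runs that query with Python's sqlite3 module (SQLite has supported ROW_NUMBER() since version 3.25). The table contents are hypothetical. Dates are stored as ISO strings so that text order matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (siteName TEXT PRIMARY KEY, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2013-01-01'), ('alpha', '2013-03-01'), ('alpha', '2013-02-01'),
        ('beta', '2013-01-15');
    """
)

# Number each site's history rows newest-first, then keep only row 1 per site
# (the "AND h.seqnum = 1" completion is an assumption, see above).
LATEST_PER_SITE = """
SELECT s.siteName, s.siteIP, h.date
FROM sites s
INNER JOIN (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY siteName ORDER BY date DESC) AS seqnum
    FROM history h
) h ON s.siteName = h.siteName AND h.seqnum = 1
ORDER BY s.siteName
"""
rows = conn.execute(LATEST_PER_SITE).fetchall()
print(rows)  # [('alpha', '10.0.0.1', '2013-03-01'), ('beta', '10.0.0.2', '2013-01-15')]
```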

duplicates in multiple columns

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-26 20:08:20
I have a data frame like so:

    > df
      a  b c    d
    1 1  2 A 1001
    2 2  4 B 1002
    3 3  6 B 1002
    4 4  8 C 1003
    5 5 10 D 1004
    6 6 12 D 1004
    7 7 13 E 1005
    8 8 14 E 1006

I want to remove the rows where there are repeated values in column c AND column d. So in this example, rows 2, 3, 5 and 6 would be removed. I have used this, which works:

    df[!(df$c %in% df$c[duplicated(df$c)] & df$d %in% df$d[duplicated(df$d)]),]
    > df
      a  b c    d
    1 1  2 A 1001
    4 4  8 C 1003
    7 7 13 E 1005
    8 8 14 E 1006

but it seems clunky, and I can't help but think there is a better way. Any suggestions? In case anyone wants to re-create the data frame, here is
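A more direct formulation is to count each (c, d) pair and keep only the rows whose pair occurs exactly once. This differs slightly from the question's code, which tests c and d separately. A row whose c and d values are each repeated somewhere, but never together, would be dropped by that code and kept here. On the sample data, both approaches give the same result. In R, the pair-based version is usually written as `duplicated(df[c("c", "d")])` combined with its `fromLast = TRUE` counterpart. The plain-Python sketch below uses the question's data as (a, b, c, d) tuples.

```python
from collections import Counter

def drop_repeated_pairs(rows):
    """Drop every row whose (c, d) combination appears more than once."""
    pair_counts = Counter((c, d) for _a, _b, c, d in rows)
    return [row for row in rows if pair_counts[(row[2], row[3])] == 1]

# Sample data from the question as (a, b, c, d).
df = [
    (1, 2, "A", 1001), (2, 4, "B", 1002), (3, 6, "B", 1002), (4, 8, "C", 1003),
    (5, 10, "D", 1004), (6, 12, "D", 1004), (7, 13, "E", 1005), (8, 14, "E", 1006),
]
for row in drop_repeated_pairs(df):
    print(row)
```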

How to delete duplicates in SQL table based on multiple fields

南楼画角 submitted on 2019-11-26 19:09:49
Question: I have a table of games, which is described as follows:

    +---------------+-------------+------+-----+---------+----------------+
    | Field         | Type        | Null | Key | Default | Extra          |
    +---------------+-------------+------+-----+---------+----------------+
    | id            | int(11)     | NO   | PRI | NULL    | auto_increment |
    | date          | date        | NO   |     | NULL    |                |
    | time          | time        | NO   |     | NULL    |                |
    | hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
    | awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
    | locationcity  | varchar(30) | NO   |     |
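The question is cut off before it says which fields define a duplicate. This sketch assumes a duplicate game is one with the same date, time, hometeam_id, awayteam_id and locationcity. One common approach is to keep the lowest id in each such group and delete the rest. The sketch runs that in SQLite via Python's sqlite3 module, with made-up rows. MySQL rejects a DELETE whose subquery reads the same table (error 1093), so there the subquery is typically wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE games (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        date TEXT NOT NULL,
        time TEXT NOT NULL,
        hometeam_id INTEGER NOT NULL,
        awayteam_id INTEGER NOT NULL,
        locationcity TEXT NOT NULL
    )
    """
)
# Hypothetical rows: id 3 repeats id 1 in every field except id.
conn.executemany(
    "INSERT INTO games (date, time, hometeam_id, awayteam_id, locationcity) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2011-09-10", "19:00:00", 1, 2, "Boston"),
        ("2011-09-11", "13:00:00", 3, 4, "Denver"),
        ("2011-09-10", "19:00:00", 1, 2, "Boston"),
    ],
)

# Keep the smallest id per group of identical games (assumed key fields); delete the rest.
conn.execute(
    """
    DELETE FROM games
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM games
        GROUP BY date, time, hometeam_id, awayteam_id, locationcity
    )
    """
)
conn.commit()
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM games ORDER BY id")]
print(remaining_ids)  # [1, 2]
```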

Remove duplicates keeping entry with largest absolute value

假如想象 submitted on 2019-11-26 18:55:10
Let's say I have four samples: id=1, 2, 3, and 4, with one or more measurements on each of those samples:

    > a <- data.frame(id=c(1,1,2,2,3,4), value=c(1,2,3,-4,-5,6))
    > a
      id value
    1  1     1
    2  1     2
    3  2     3
    4  2    -4
    5  3    -5
    6  4     6

I want to remove duplicates, keeping only one entry per ID: the one having the largest absolute value of the "value" column. I.e., this is what I want:

    > a[c(2,4,5,6), ]
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6

How might I do this in R?

    aa <- a[order(a$id, -abs(a$value) ), ]  # sort by id and reverse of abs(value)
    aa[ !duplicated(aa$id), ]               # take the first row within each id
      id value
    2
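The answer sorts by id and by descending absolute value, then keeps the first row per id. The same result can be obtained in one pass by remembering the best value seen so far for each id. The plain-Python sketch below uses the question's data as (id, value) pairs. On a tie in absolute value, it keeps the earlier row.

```python
def keep_largest_abs(rows):
    """Return one (id, value) per id: the value with the largest absolute value."""
    best = {}
    for row_id, value in rows:
        # Replace only on a strictly larger magnitude, so ties keep the earlier row.
        if row_id not in best or abs(value) > abs(best[row_id]):
            best[row_id] = value
    return list(best.items())  # ids in order of first appearance

a = [(1, 1), (1, 2), (2, 3), (2, -4), (3, -5), (4, 6)]
print(keep_largest_abs(a))  # [(1, 2), (2, -4), (3, -5), (4, 6)]
```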

Delete rows that exist in another data frame?

早过忘川 submitted on 2019-11-26 18:52:00
I have the following two data frames (example):

    df1:
    name profile type strand
    A     4.5    1    +
    B     3.2    1    +
    C     5.5    1    +
    D    14.0    1    -
    E    45.1    1    -
    F    32.8    1    -
    G    19.9    1    +

    df2:
    name
    A
    B
    C
    G

I would like to delete the rows in df1 whose df1$name appears in df2$name, to get the following:

    Output:
    name profile type strand
    D    14.0    1    -
    E    45.1    1    -
    F    32.8    1    -

If anyone could tell me which piece of code to use, it would be a big help. It seemed simple at first, but I've been messing it up since yesterday.

You need the %in% operator. So,

    df1[!(df1$name %in% df2$name),]

should give you what you want. df1$name %in% df2$name tests
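The %in% answer is a membership test followed by negation, in other words an anti-join. For comparison, the sketch below performs the same filter in plain Python, using a set for the lookup. Rows are (name, profile, type, strand) tuples taken from the question.

```python
def drop_rows_in(df1_rows, df2_names):
    """Keep the rows of df1 whose name does not appear in df2 (an anti-join)."""
    exclude = set(df2_names)  # fast membership tests, the role %in% plays in R
    return [row for row in df1_rows if row[0] not in exclude]

df1 = [
    ("A", 4.5, 1, "+"), ("B", 3.2, 1, "+"), ("C", 5.5, 1, "+"), ("D", 14.0, 1, "-"),
    ("E", 45.1, 1, "-"), ("F", 32.8, 1, "-"), ("G", 19.9, 1, "+"),
]
df2 = ["A", "B", "C", "G"]
for row in drop_rows_in(df1, df2):
    print(row)
```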

How to merge two List<T> and remove duplicate values in C#

蓝咒 submitted on 2019-11-26 18:36:57
I have two List<T> lists that I need to combine, removing the duplicate values from both. It's a bit hard to explain, so let me show an example of what the code looks like and what I want as a result. In the sample, I use int instead of the ResultAnalysisFileSql class.

    first_list = [1, 12, 12, 5]
    second_list = [12, 5, 7, 9, 1]

Combining the two lists should result in this list:

    resulting_list = [1, 12, 5, 7, 9]

Note that first_list contains two "12" values, and second_list contains an additional 12, 1 and 5. Each of these values appears only once in the result.

ResultAnalysisFileSql class:

    [Serializable]
    public
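In C#, LINQ's Enumerable.Union does this: `first_list.Union(second_list).ToList()` yields each distinct element of both sequences in order of first appearance. For a class such as ResultAnalysisFileSql, Union uses the default equality comparer. The class therefore needs to override Equals and GetHashCode, or an IEqualityComparer<T> must be passed in. The Python sketch below shows the same order-preserving union. It uses a frozen dataclass to stand in for the value-based equality the C# class would need. The dataclass fields are hypothetical, since the question does not show the real class members.

```python
from dataclasses import dataclass

def merge_distinct(first, second):
    """Concatenate two lists and keep each value once, in order of first appearance."""
    return list(dict.fromkeys(first + second))  # dict keys are unique and keep insertion order

first_list = [1, 12, 12, 5]
second_list = [12, 5, 7, 9, 1]
resulting_list = merge_distinct(first_list, second_list)
print(resulting_list)  # [1, 12, 5, 7, 9]

# For objects, equality and hashing must be value-based. These fields are
# hypothetical; the real ResultAnalysisFileSql members are not shown in the question.
@dataclass(frozen=True)
class ResultAnalysisFileSql:
    file_name: str
    line_count: int

merged = merge_distinct(
    [ResultAnalysisFileSql("a.sql", 10)],
    [ResultAnalysisFileSql("a.sql", 10), ResultAnalysisFileSql("b.sql", 3)],
)
print(len(merged))  # 2
```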

Delete duplicate records from a SQL table without a primary key

好久不见. submitted on 2019-11-26 18:31:35
I have the following table with the following records in it:

    create table employee
    (
      EmpId number,
      EmpName varchar2(10),
      EmpSSN varchar2(11)
    );

    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (2, 'Joe', '555-56-5555');
    insert into employee values (3, 'Fred', '555-57-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');
    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into
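With no key column, the database's internal row identifier can still tell otherwise identical rows apart. The number/varchar2 types suggest Oracle, where that identifier is the ROWID pseudocolumn. A common approach there is `delete from employee where rowid not in (select min(rowid) from employee group by EmpId, EmpName, EmpSSN)`. The sketch below runs the same statement in SQLite, whose ordinary tables also carry an implicit rowid. It uses the question's DDL and the first eight inserts shown; the rest of the question is cut off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL from the question; SQLite accepts the Oracle type names as-is.
conn.execute(
    "create table employee (EmpId number, EmpName varchar2(10), EmpSSN varchar2(11))"
)
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [
        (1, "Jack", "555-55-5555"),
        (2, "Joe", "555-56-5555"),
        (3, "Fred", "555-57-5555"),
        (4, "Mike", "555-58-5555"),
        (5, "Cathy", "555-59-5555"),
        (6, "Lisa", "555-70-5555"),
        (1, "Jack", "555-55-5555"),
        (4, "Mike", "555-58-5555"),
    ],
)

# Keep one copy (the lowest rowid) of each group of identical rows; delete the rest.
conn.execute(
    """
    delete from employee
    where rowid not in (
        select min(rowid) from employee group by EmpId, EmpName, EmpSSN
    )
    """
)
conn.commit()
rows = conn.execute("select EmpId, EmpName, EmpSSN from employee order by EmpId").fetchall()
print(len(rows))  # 6
```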