duplicates

Delete duplicate records based on multiple columns

蓝咒 submitted on 2019-12-04 16:05:38
Question: In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records. A record counts as a duplicate when another record has the same :legacy_id and :company. What code can I run to find and delete these duplicates? I was playing around with this:

    Product.select(:legacy_id, :company).group(:legacy_id, :company).having("count(*) > 1")

It seemed to return some of the duplicates, but I wasn't sure how to delete them from there. Any ideas?

Answer 1:
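One common way to delete such duplicates is to keep a single survivor per (legacy_id, company) group, for example the row with the lowest primary key, and delete everything else. A minimal sketch in plain SQL run through Python's sqlite3, assuming a hypothetical `products` table with an integer primary key `id` (not the asker's Rails code); try it on a copy of the data first:

```python
import sqlite3


def delete_duplicates(conn: sqlite3.Connection) -> int:
    """Keep the lowest id for each (legacy_id, company) pair and delete the rest.

    Assumes a hypothetical `products` table with an integer primary key `id`.
    Returns the number of deleted rows.
    """
    cur = conn.execute(
        """
        DELETE FROM products
        WHERE id NOT IN (
            SELECT MIN(id) FROM products GROUP BY legacy_id, company
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Small in-memory demo with two duplicated pairs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, legacy_id INTEGER, company TEXT)"
)
conn.executemany(
    "INSERT INTO products (legacy_id, company) VALUES (?, ?)",
    [(1, "acme"), (1, "acme"), (2, "acme"), (1, "globex"), (2, "acme")],
)
deleted = delete_duplicates(conn)
remaining = conn.execute(
    "SELECT id, legacy_id, company FROM products ORDER BY id"
).fetchall()
print(deleted, remaining)
```

Keeping `MIN(id)` preserves the first imported copy; `MAX(id)` would keep the most recent one instead.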

Return duplicate records

梦想与她 submitted on 2019-12-04 15:26:58
I simply want to return duplicate records from a table. In my case, a record is a duplicate if more than one record has the same values in col1, col2, col3, and col4.

    SELECT col1, col2, col3, col4, COUNT(*) AS cnt
    FROM yourTable
    GROUP BY col1, col2, col3, col4
    HAVING COUNT(*) > 1

If there are additional columns that you want to be shown, you can JOIN the above to the table:

    SELECT t.*, dup.cnt
    FROM yourTable t
    JOIN (
        SELECT col1, col2, col3, col4, COUNT(*) AS cnt
        FROM yourTable
        GROUP BY col1, col2, col3, col4
        HAVING COUNT(*) > 1
    ) AS dup
      ON t.col1 = dup.col1
     AND t.col2 = dup.col2
     AND t.col3 =
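The join query is cut off above; a self-contained sketch of the same pattern, assuming the join simply continues over col3 and col4, can be run against an in-memory SQLite table. The `id` and `note` columns are hypothetical extras, added to show that non-key columns come through the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, col1, col2, col3, col4, note TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable (col1, col2, col3, col4, note) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "x", 10, "first"),
        (1, "a", "x", 10, "second"),  # same key as the row above
        (2, "b", "y", 20, "unique"),
    ],
)

# Join every row back to the duplicated key groups so extra columns stay visible.
rows = conn.execute(
    """
    SELECT t.id, t.note, dup.cnt
    FROM yourTable t
    JOIN (
        SELECT col1, col2, col3, col4, COUNT(*) AS cnt
        FROM yourTable
        GROUP BY col1, col2, col3, col4
        HAVING COUNT(*) > 1
    ) AS dup
      ON t.col1 = dup.col1 AND t.col2 = dup.col2
     AND t.col3 = dup.col3 AND t.col4 = dup.col4
    ORDER BY t.id
    """
).fetchall()
print(rows)
```

Note that `=` never matches NULL, so rows with a NULL in any key column drop out of the join; SQLite's `IS` or PostgreSQL's `IS NOT DISTINCT FROM` are null-safe alternatives.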

Tree with no duplicate children

谁说我不能喝 submitted on 2019-12-04 14:08:37
Using anytree I produced this tree:

    A
    ├── B
    │   └── C
    │       └── D
    │           └── F
    └── B
        └── C
            └── E
                └── G

Is there a way to remove all duplicate children and turn it into the tree below (recursively, for children at every level)?

    A
    └── B
        └── C
            ├── D
            │   └── F
            └── E
                └── G

Edit: What I am trying to achieve is a tree of all links on a website, so everything between slashes would become a child: .../child/... (the second slash is optional). The above is just a representation of my problem, but I hope it's clear. Here is my Node generation:

    root = Node('A')
    for link in links:
        children = link.split('/')
        cur_root =
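Duplicate children disappear if each path segment is looked up among the current node's existing children before a new one is created, which turns the link list into a trie. A minimal sketch that swaps anytree for plain nested dicts (child name mapped to subtree); the link strings are hypothetical, and converting the result back to anytree nodes is left out:

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]  # child name -> subtree (plain dicts, not anytree)


def insert_path(root: Tree, parts: List[str]) -> None:
    """Walk down the tree, reusing an existing child with the same name."""
    node = root
    for part in parts:
        if not part:  # skip empty segments, e.g. from a trailing slash
            continue
        node = node.setdefault(part, {})


def render(tree: Tree, prefix: str = "") -> List[str]:
    """Draw the tree with the same box characters anytree's RenderTree uses."""
    lines = []
    items = list(tree.items())
    for i, (name, child) in enumerate(items):
        last = i == len(items) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


root: Tree = {}
for link in ["B/C/D/F", "B/C/E/G"]:  # hypothetical links below root "A"
    insert_path(root, link.split("/"))

lines = ["A"] + render(root)
print("\n".join(lines))
```

Because dicts keep insertion order, children are listed in the order they were first seen.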

Django Inline for ManyToMany generate duplicate queries

三世轮回 submitted on 2019-12-04 12:53:11
I'm experiencing some major performance issues with my Django admin: lots of duplicate queries, and their number grows with how many inlines I have.

models.py

    from django.db import models


    class Setting(models.Model):
        name = models.CharField(max_length=50, unique=True)

        class Meta:
            ordering = ('name',)

        def __str__(self):
            return self.name


    class DisplayedGroup(models.Model):
        name = models.CharField(max_length=30, unique=True)
        position = models.PositiveSmallIntegerField(default=100)

        class Meta:
            ordering = ('priority',)

        def __str__(self):
            return self.name


    class Machine(models.Model):
        name = models.CharField(max_length=20, unique=True)
        settings

Near Duplicate Detection in Data Streams

最后都变了- submitted on 2019-12-04 12:34:41
Question: I am currently working on a streaming API that generates a lot of textual content. As expected, the API emits many duplicates, and we also have a business requirement to filter near-duplicate data. I did a bit of research on duplicate detection in data streams and read about Stable Bloom Filters: data structures for duplicate detection in data streams with an upper bound on the false-positive rate. But I want to identify near duplicates, and I also looked at
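A Stable Bloom Filter keeps memory fixed on an unbounded stream by replacing bits with small counters and, on every insert, decrementing a few random cells so that old items gradually fade out. A minimal sketch of that idea; the parameter values, the salted SHA-256 hashing, and the class name are illustrative assumptions, not tuned or taken from any library:

```python
import hashlib
import random


class StableBloomFilter:
    """Minimal Stable Bloom Filter sketch with illustrative, untuned parameters.

    m cells, each a small counter capped at max_val. For every incoming item:
      1. report whether all k hashed cells are non-zero (likely seen before),
      2. decrement p randomly chosen cells (with replacement) to forget old items,
      3. set the item's k cells to max_val.
    """

    def __init__(self, m=1000, k=3, p=10, max_val=3, seed=0):
        self.cells = [0] * m
        self.m, self.k, self.p, self.max_val = m, k, p, max_val
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def _positions(self, item: str):
        # Derive k cell indexes from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def seen(self, item: str) -> bool:
        positions = list(self._positions(item))
        duplicate = all(self.cells[pos] > 0 for pos in positions)
        for _ in range(self.p):
            j = self.rng.randrange(self.m)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        for pos in positions:
            self.cells[pos] = self.max_val
        return duplicate


sbf = StableBloomFilter()
first = sbf.seen("hello world")
second = sbf.seen("hello world")
print(first, second)
```

As written this only flags exact repeats. Normalizing text before calling `seen` (lowercasing, collapsing whitespace) catches trivial variations; genuine near-duplicate detection usually needs similarity hashing such as shingling with MinHash or SimHash, which this sketch does not attempt.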

Checking duplicates, sum them and delete one row after summing

跟風遠走 submitted on 2019-12-04 12:32:15
Question: I have a data frame that contains some duplicates. Where rows are duplicated, I want to sum two of their columns and then delete the redundant row. Here is an example of the data:

    Year  ID   Lats     Longs     N    n   c_id
    2015  200  30.5417  -20.5254  150  30  4142
    2015  200  30.5417  -20.5254   90  50  4142

I want to sum columns N and n into one row. The rest of the information, i.e. Lats, Longs, ID and Year, is to remain the same, e.g.:

    Year  ID   Lats     Longs     N    n   c_id
    2015  200  30.5417  -20.5254  240  80  4142

Answer 1: Solution using
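The operation is a group-by on the columns that identify a record, with N and n summed within each group. A dependency-free sketch, assuming rows are plain dicts standing in for data frame rows and that Year, ID, Lats, Longs and c_id together identify a record:

```python
def sum_duplicates(rows, key_cols, sum_cols):
    """Collapse rows that agree on key_cols, summing sum_cols.

    rows: list of dicts (a stand-in for a data frame).
    The first occurrence of each key keeps its position in the output.
    """
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)  # copy so the input is not mutated
        else:
            for c in sum_cols:
                merged[key][c] += row[c]
    return list(merged.values())


rows = [
    {"Year": 2015, "ID": 200, "Lats": 30.5417, "Longs": -20.5254, "N": 150, "n": 30, "c_id": 4142},
    {"Year": 2015, "ID": 200, "Lats": 30.5417, "Longs": -20.5254, "N": 90, "n": 50, "c_id": 4142},
]
result = sum_duplicates(rows, ["Year", "ID", "Lats", "Longs", "c_id"], ["N", "n"])
print(result)
```

In pandas the same step is `df.groupby(["Year", "ID", "Lats", "Longs", "c_id"], as_index=False)[["N", "n"]].sum()`. Either way, grouping on floating-point coordinates relies on exact equality, so values that differ in the last digit will not merge.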

Extract duplicate objects from a List in Java 8

☆樱花仙子☆ submitted on 2019-12-04 12:01:35
Question: This code removes duplicates from the original list, but I want to extract the duplicates from the original list, not remove them (the package name is just part of another project).

Given a Person POJO:

    package at.mavila.learn.kafka.kafkaexercises;

    import org.apache.commons.lang3.builder.ToStringBuilder;

    public class Person {
        private final Long id;
        private final String firstName;
        private final String secondName;

        private Person(final Builder builder) {
            this.id = builder.id;
            this
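Extracting rather than removing duplicates can be done in two passes: count how often each key occurs, then keep every element whose key count exceeds one. A minimal Python sketch of that idea; the `Person` dataclass is a hypothetical stand-in for the POJO, and treating two people as duplicates when their first and second names match is an assumption, since the question's equality rule is cut off:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:  # hypothetical stand-in for the Java POJO above
    id: int
    first_name: str
    second_name: str


def extract_duplicates(items, key=lambda x: x):
    """Return every item whose key occurs more than once, in original order.

    The input list is left untouched.
    """
    counts = Counter(key(item) for item in items)
    return [item for item in items if counts[key(item)] > 1]


people = [
    Person(1, "Ana", "Lopez"),
    Person(2, "Ben", "Kim"),
    Person(3, "Ana", "Lopez"),
]
dups = extract_duplicates(people, key=lambda p: (p.first_name, p.second_name))
print(dups)
```

In Java 8 the counting pass maps naturally onto `Collectors.groupingBy(keyFunction, Collectors.counting())`, followed by a stream filter on the original list.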

Removing duplicates from multiple self left joins

久未见 submitted on 2019-12-04 11:54:21
I am dynamically generating a query like the one below, which creates different combinations of rules by left joining the table to itself (any number of times) and, as part of the join conditions, avoiding rules that share some of the same attributes, e.g.:

    SELECT count(*)
    FROM rules AS t1
    LEFT JOIN rules AS t2
           ON t1.id != t2.id
          AND ...
    LEFT JOIN rules AS t3
           ON t1.id != t2.id
          AND t1.id != t3.id
          AND t2.id != t3.id
          AND ...

I am currently removing duplicates by building an array of ids from the joined rows, then sorting and grouping by it:

    SELECT sort(array[t1.id, t2.id, t3.id]) AS ids
    ...
    GROUP BY ids

I would like to know
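The duplicates arise because `!=` conditions produce ordered tuples, so each set of three rules appears once per ordering. Sorting and grouping collapses them afterwards; requiring `t1.id < t2.id < t3.id` instead would generate each set only once. A small Python model of the fully matched rows, using hypothetical rule ids, shows the two approaches agree:

```python
from itertools import combinations, permutations

ids = [1, 2, 3, 4]  # hypothetical rule ids

# What `t1.id != t2.id AND ...` yields for three joined copies: ordered tuples,
# so every set of rules appears once per ordering (3! = 6 times).
ordered = list(permutations(ids, 3))

# Current approach: sort each tuple and keep distinct ones (like GROUP BY sort(array[...])).
deduped = sorted(set(tuple(sorted(t)) for t in ordered))

# Alternative: require t1.id < t2.id < t3.id, which yields each set exactly once.
ascending = [t for t in ordered if t[0] < t[1] < t[2]]

print(len(ordered), len(deduped), len(ascending))
```

This only models fully matched rows. With LEFT JOIN, rows where a later alias finds no partner (NULLs) differ between `<` and `!=`, so check that the partial combinations are still the ones you want.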

Find possible duplicates in two columns ignoring case and special characters

元气小坏坏 submitted on 2019-12-04 11:37:31
Question: Query:

    SELECT COUNT(*), name, number
    FROM tbl
    GROUP BY name, number
    HAVING COUNT(*) > 1

It sometimes fails to find duplicates that differ only in case, e.g. sunny and Sunny don't show up as duplicates. How can I find all possible duplicates across these two columns in PostgreSQL?

Answer 1: lower() / upper()

Use one of these to fold characters to either lower or upper case. Special characters are not affected:

    SELECT count(*), lower(name), number
    FROM tbl
    GROUP BY lower(name), number
    HAVING count(
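Grouping by `lower(name)` merges case variants but leaves punctuation intact. A sketch of both levels of folding, run through Python's sqlite3 rather than PostgreSQL; the `normalize` SQL function is a hypothetical helper registered from Python, and SQLite's built-in `lower()` only folds ASCII letters, so this shows the query shape rather than PostgreSQL's case folding:

```python
import re
import sqlite3


def normalize(s):
    # Hypothetical stricter key: lowercase and drop everything except ASCII letters/digits.
    return re.sub(r"[^0-9a-z]", "", s.lower()) if s is not None else None


conn = sqlite3.connect(":memory:")
conn.create_function("normalize", 1, normalize)
conn.execute("CREATE TABLE tbl (name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("sunny", 1), ("Sunny", 1), ("sun-ny", 1), ("rainy", 2)],
)

# Case-insensitive only: 'sun-ny' stays in its own group.
case_only = conn.execute(
    "SELECT count(*), lower(name), number FROM tbl "
    "GROUP BY lower(name), number HAVING count(*) > 1"
).fetchall()

# Case- and punctuation-insensitive: all three 'sunny' variants group together.
stricter = conn.execute(
    "SELECT count(*), normalize(name), number FROM tbl "
    "GROUP BY normalize(name), number HAVING count(*) > 1"
).fetchall()
print(case_only, stricter)
```

In PostgreSQL a comparable stricter key would be `regexp_replace(lower(name), '[^a-z0-9]', '', 'g')`, which also strips non-ASCII letters, so adjust the character class if names contain them.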

Exclude duplicate values in certain columns using ddply

烈酒焚心 submitted on 2019-12-04 11:19:07
I have a data frame with the following structure:

    > dftest
           element seqnames     start       end  width strand tx_id   tx_name
    1            1    chr19  58858172  58864865   6694      - 36769 NM_130786
    2           10     chr8  18248755  18258723   9969      + 16614 NM_000015
    3          100    chr20  43248163  43280376  32214      - 37719 NM_000022
    4         1000    chr18  25530930  25757445 226516      - 33839 NM_001792
    5        10000     chr1 243651535 244006584 355050      -  4182 NM_181690
    6        10000     chr1 243663021 244006584 343564      -  4183 NM_005465
    1316 100302285    chr12  12264886  12264967     82      + 24050 NR_036052
    1317 100302285    chr12   9392066   9392147     82      - 25034 NR_036052
    1318 100302285     chr2 232578024 232578105     82
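The question is cut off, so its exact goal is unknown; one common reading of the title is keeping only the first row for each distinct combination of chosen columns. A plain-Python sketch of that reading, using a few rows from the dftest listing with only some columns kept; choosing `element` and `tx_name` as the key is an assumption:

```python
def drop_duplicates(rows, subset):
    """Keep the first row for each distinct combination of the `subset` columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"element": 10000, "start": 243651535, "tx_name": "NM_181690"},
    {"element": 10000, "start": 243663021, "tx_name": "NM_005465"},
    {"element": 100302285, "start": 12264886, "tx_name": "NR_036052"},
    {"element": 100302285, "start": 9392066, "tx_name": "NR_036052"},
]
kept = drop_duplicates(rows, ["element", "tx_name"])
print([r["start"] for r in kept])
```

In R the same filter can be written without ddply as `dftest[!duplicated(dftest[, c("element", "tx_name")]), ]`; pandas offers `drop_duplicates(subset=...)`.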