duplicates

How to break ties when comparing columns in SQL

Submitted by 非 Y 不嫁゛ on 2020-08-10 23:56:13
Question: I am trying to delete duplicates in Postgres. I am using this as the base of my query:

    DELETE FROM case_file AS p
    WHERE EXISTS (
        SELECT FROM case_file AS p1
        WHERE p1.serial_no = p.serial_no
          AND p1.cfh_status_dt < p.cfh_status_dt
    );

It works well, except that when the dates cfh_status_dt are equal, neither of the records is removed. For rows that have the same serial_no and the same date, I would like to keep the one that has a registration_no (if any do; this column also has NULLS
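
One possible way to break the tie, sketched rather than definitive: compare a composite sort key instead of the date alone, so that among rows with equal cfh_status_dt the one with a non-NULL registration_no survives. The id column used as a last-resort tiebreaker is an assumption; substitute whatever unique key case_file actually has.

    -- Sketch: keeps, per serial_no, the row with the earliest date,
    -- preferring a non-NULL registration_no on date ties; "id" is a
    -- hypothetical unique column used to break exact ties.
    DELETE FROM case_file AS p
    WHERE EXISTS (
        SELECT FROM case_file AS p1
        WHERE p1.serial_no = p.serial_no
          AND (p1.cfh_status_dt, (p1.registration_no IS NULL)::int, p1.id)
            < (p.cfh_status_dt,  (p.registration_no IS NULL)::int,  p.id)
    );

The row-constructor comparison is lexicographic: the date decides first, then the boolean cast (0 for rows that have a registration_no, so they sort first), then the unique id, so exactly one row per group survives.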

search for cross-field duplicates in postgresql and bring back matched pairs

Submitted by 白昼怎懂夜的黑 on 2020-08-10 19:15:48
Question: I have a table of contacts. The table contains a mobile_phone column as well as a home_phone column. I'd like to fetch all pairs of duplicate contacts, where a pair is two contacts sharing a phone number. Note that if contact A's mobile_phone matches contact B's home_phone, this is also a duplicate. Here is an example of three contacts that should match:

    contact_id | mobile_phone | home_phone | other columns such as email... | ...
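
A self-join is one way to produce the pairs. A minimal sketch, assuming the table is named contacts and contact_id is unique; the two IN tests cover all four mobile/home combinations.

    -- Sketch: report each matching pair once (a.contact_id < b.contact_id).
    -- NULL phone numbers never match, because NULL IN (...) is not true.
    SELECT a.contact_id AS contact_a, b.contact_id AS contact_b
    FROM contacts AS a
    JOIN contacts AS b
      ON a.contact_id < b.contact_id
     AND (a.mobile_phone IN (b.mobile_phone, b.home_phone)
       OR a.home_phone   IN (b.mobile_phone, b.home_phone));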

Return 0 to second instance of duplicate

Submitted by 旧城冷巷雨未停 on 2020-08-09 18:10:20
Question: I have a data set similar to the following:

    A  B   C
    1  10  5
    1  20  1
    2  30  1
    2  30  1

I'd like to add a column that returns 1 until we hit a duplicate of A and B, at which point it should return 0, but only for the second instance, so:

    A  B   C  D
    1  10  5  1
    1  20  1  1
    2  30  1  1
    2  30  1  0

Any help appreciated.

Answer 1: An option would be

    df$D <- as.integer(!duplicated(df[c("A", "B")]))
    df$D
    # [1] 1 1 1 0

Answer 2: Just a doodle with library(dplyr):

    df %>% group_by(A, B) %>% mutate(D = +((1:n()) == 1))

Or if you want it to be zero
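
For reference, a self-contained version of Answer 1 with the sample data built in:

    # Flag the first occurrence of each (A, B) pair with 1, repeats with 0.
    df <- data.frame(A = c(1, 1, 2, 2),
                     B = c(10, 20, 30, 30),
                     C = c(5, 1, 1, 1))
    df$D <- as.integer(!duplicated(df[c("A", "B")]))
    df
    #   A  B C D
    # 1 1 10 5 1
    # 2 1 20 1 1
    # 3 2 30 1 1
    # 4 2 30 1 0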

Pandas drop duplicates ignoring NaN

Submitted by 此生再无相见时 on 2020-08-09 08:15:06
Question: In a pandas df, I am trying to drop duplicates across multiple columns. A lot of the data per row is NaN. This is only an example; the data is a mixed bag, so many different combinations exist.

    df.drop_duplicates()
           IDnum      name  formNumber
    1        NaN  AP GROUP   028-11964
    2  1364615.0  AP GROUP         NaN
    3        NaN  AP GROUP         NaN

Hoped-for output:

        IDnum      name  formNumber
    1364615.0  AP GROUP   028-11964

EDIT: If the df.drop_duplicates() looks like this, would it change the solution?

    df.drop_duplicates()
       IDnum  name  formNumber
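
One common approach, sketched under the assumption that rows belonging to the same record share the same name value (the real data may need a different grouping key): group on that key and take the first non-NaN entry per column, which collapses the partial rows into one complete row.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "IDnum": [np.nan, 1364615.0, np.nan],
        "name": ["AP GROUP", "AP GROUP", "AP GROUP"],
        "formNumber": ["028-11964", np.nan, np.nan],
    })

    # GroupBy.first() skips NaN, so each column keeps its first real value.
    out = df.groupby("name", as_index=False).first()
    print(out)
    #        name      IDnum formNumber
    # 0  AP GROUP  1364615.0  028-11964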

return indices of duplicated elements corresponding to the unique elements in R

Submitted by ↘锁芯ラ on 2020-08-08 05:57:08
Question: Does anyone know if there's a built-in function in R that can return the indices of duplicated elements corresponding to the unique elements? For instance, I have a vector

    a <- c("A", "B", "B", "C", "C")

unique(a) will give c("A", "B", "C") and duplicated(a) will give c(FALSE, FALSE, TRUE, FALSE, TRUE). Is there a built-in function to get a vector of indices, of the same length as the original vector a, that shows the location of a's elements in the unique vector (which is c(1, 2, 2, 3, 3) in this example)? I.e., something like the output variable
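
For what it's worth, base R can do this with match() against unique():

    a <- c("A", "B", "B", "C", "C")
    match(a, unique(a))
    # [1] 1 2 2 3 3

    # An equivalent spelling via factor codes:
    as.integer(factor(a, levels = unique(a)))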

Bash: find non-repeated elements in an array

Submitted by 六眼飞鱼酱① on 2020-07-30 03:39:08
Question: I'm looking for a way to find the non-repeated elements in an array in Bash. A simple example:

    joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)
    <magic>
    non_repeated=(CVE-2016-3598)

To give context, the goal here is to end up with an array of all package-update CVEs that aren't generally available via 'yum update' on a host due to being excluded. The way I came up with doing such a thing is to populate three preliminary arrays:

    available_updates=()  # just what 'yum
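
A sketch of the usual sort/uniq route (requires bash 4+ for mapfile, and assumes no element contains a newline): uniq -u keeps only lines that occur exactly once in sorted input.

    joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)

    # Print one element per line, sort, keep the non-repeated lines,
    # and read the result back into an array.
    mapfile -t non_repeated < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -u)

    printf '%s\n' "${non_repeated[@]}"   # CVE-2016-3598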

Getting error while using itertools in Python

Submitted by 烂漫一生 on 2020-07-23 07:47:24
Question: This is a continuation of OP1 and OP2. Specifically, the objective is to remove duplicates when more than one dict has the same content for the key paper_title. However, the line throws an error if there is an inconsistency in the way the list is populated, such that a combination of dict and str raises:

    TypeError: string indices must be integers

The complete code which generates the aforementioned error is as below:

    from itertools import groupby

    def extract_secondary():
        # test_list = [{
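
Since the full code is cut off above, here is only a sketch of one way around the error: drop anything that isn't a dict before sorting and grouping on paper_title, so stray strings never get indexed like dicts. dedupe_by_title is a hypothetical name, not the function from the original posts.

    from itertools import groupby

    def dedupe_by_title(records):
        # Keep only dict entries; str items are what triggered the TypeError.
        dicts = [r for r in records if isinstance(r, dict)]
        # groupby() only groups consecutive items, so sort by the key first.
        dicts.sort(key=lambda r: r["paper_title"])
        # Take the first dict from each group of equal titles.
        return [next(grp) for _, grp in groupby(dicts, key=lambda r: r["paper_title"])]

    mixed = [{"paper_title": "A"}, "stray string",
             {"paper_title": "A"}, {"paper_title": "B"}]
    print(dedupe_by_title(mixed))
    # [{'paper_title': 'A'}, {'paper_title': 'B'}]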
