duplicates

Python - Find same values in a list and group together a new list

[亡魂溺海] submitted on 2019-11-27 15:50:04
Question: I'm stuck figuring this out and wonder if anyone could point me in the right direction. From this list:

N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]

I'm trying to create:

L = [[1],[2,2],[3,3,3],[4,4,4,4],[5,5,5,5,5]]

Any values found to be the same are grouped into their own sublist. Here is my attempt so far; I'm thinking I should use a while loop?

global n
n = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5] # Sorted list
l = [] # Empty list to append values to

def compare(val):
    """ This function receives
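Since the list is already sorted, one way to get the desired grouping (a sketch, not from the original thread) is itertools.groupby, which clusters consecutive equal values:

```python
from itertools import groupby

n = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

# groupby only groups *consecutive* equal values, so the input must be sorted
l = [list(group) for _, group in groupby(n)]
print(l)  # [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
```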

Delete duplicate rows in two columns simultaneously [duplicate]

二次信任 submitted on 2019-11-27 15:34:03
This question already has an answer here: duplicates in multiple columns (2 answers). I would like to delete duplicate rows based on two columns, instead of just one. My input df:

RAW.PVAL GR allrl Bak
0.05 fr EN1 B12
0.05 fg EN1 B11
0.45 fr EN2 B10
0.35 fg EN2 B066

My output:

RAW.PVAL GR allrl Bak
0.05 fr EN1 B12
0.45 fg EN2 B10
0.35 fg EN2 B066

I tried df <- subset(df, !duplicated(allrl, RAW.PVAL)), but it does not delete the rows with these two columns simultaneously duplicated. Thank you! If you want to use subset, you could try: subset(df, !duplicated(subset(df, select=c(allrl, RAW.PVAL))))
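The underlying idea, sketched in plain Python for comparison (hypothetical data mirroring the question; the thread itself is about R's duplicated): keep a row only if its two-column key has not been seen before.

```python
rows = [
    {"RAW.PVAL": 0.05, "GR": "fr", "allrl": "EN1", "Bak": "B12"},
    {"RAW.PVAL": 0.05, "GR": "fg", "allrl": "EN1", "Bak": "B11"},
    {"RAW.PVAL": 0.45, "GR": "fr", "allrl": "EN2", "Bak": "B10"},
    {"RAW.PVAL": 0.35, "GR": "fg", "allrl": "EN2", "Bak": "B066"},
]

seen = set()
deduped = []
for row in rows:
    key = (row["allrl"], row["RAW.PVAL"])  # the two columns compared together
    if key not in seen:
        seen.add(key)
        deduped.append(row)  # keep only the first row for each key

print([r["Bak"] for r in deduped])  # ['B12', 'B10', 'B066']
```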

Python: Remove Duplicate Items from Nested list

纵然是瞬间 submitted on 2019-11-27 15:20:51
mylist = [[1,2],[4,5],[3,4],[4,3],[2,1],[1,2]]

I want to remove duplicate items; duplicated items may appear reversed. The result should be:

mylist = [[1,2],[4,5],[3,4]]

How do I achieve this in Python?

If the order matters, you can always use OrderedDict:

>>> from collections import OrderedDict
>>> unq_lst = OrderedDict()
>>> for e in lst: unq_lst.setdefault(frozenset(e), []).append(e)
>>> list(map(list, unq_lst.keys()))
[[1, 2], [4, 5], [3, 4]]

lst = [[1,2],[4,5],[3,4],[4,3],[2,1],[1,2]]
fset = set(frozenset(x) for x in lst)
lst = [list(x) for x in fset]

This won't preserve order from your original list, nor will it preserve order of your
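A Python 3 variant of the same frozenset idea (a sketch, not from the thread) that keeps both the first-seen order of the pairs and the original element order within each pair:

```python
mylist = [[1, 2], [4, 5], [3, 4], [4, 3], [2, 1], [1, 2]]

seen = set()
result = []
for item in mylist:
    key = frozenset(item)  # [1, 2] and [2, 1] map to the same key
    if key not in seen:
        seen.add(key)
        result.append(item)  # keep the first-seen form of the pair
print(result)  # [[1, 2], [4, 5], [3, 4]]
```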

Keep first row by multiple columns in an R data.table

好久不见. submitted on 2019-11-27 15:18:01
I'd like to get the first row only from a data.table, grouped by multiple columns. This is straightforward with a single column, e.g.:

(dt <- data.table(x = c(1, 1, 1, 2), y = c(1, 1, 2, 2), z = c(1, 2, 1, 2)))
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2

dt[!duplicated(x)] # Removes rows 2-3
#    x y z
# 1: 1 1 1
# 2: 2 2 2

But none of these approaches works when trying to remove rows based on two columns, i.e. in this case removing only row 2:

dt[!duplicated(x, y)] # Keeps the original data set
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2

dt[!duplicated(list(x, y))] # Same as

Group objects by multiple properties in array then sum up their values

点点圈 submitted on 2019-11-27 15:07:20
"Grouping elements in array by multiple properties" is the closest match to my question, as it indeed groups objects by multiple keys in an array. The problem is that that solution doesn't sum up the property values and then remove the duplicates; it instead nests all the duplicates in a two-dimensional array.

Expected behavior: I have an array of objects which must be grouped by shape and color.

var arr = [
  {shape: 'square', color: 'red', used: 1, instances: 1},
  {shape: 'square', color: 'red', used: 2, instances: 1},
  {shape: 'circle', color: 'blue', used: 0, instances: 0},
  {shape: 'square', color: 'blue',
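The same group-by-two-keys-and-sum pattern, sketched in Python (the original question is about JavaScript; the data here is the question's, truncated to the complete objects):

```python
from collections import defaultdict

arr = [
    {"shape": "square", "color": "red", "used": 1, "instances": 1},
    {"shape": "square", "color": "red", "used": 2, "instances": 1},
    {"shape": "circle", "color": "blue", "used": 0, "instances": 0},
]

totals = defaultdict(lambda: {"used": 0, "instances": 0})
for obj in arr:
    key = (obj["shape"], obj["color"])  # group by both properties
    totals[key]["used"] += obj["used"]
    totals[key]["instances"] += obj["instances"]

# Flatten back to one merged object per (shape, color) group
merged = [{"shape": s, "color": c, **sums} for (s, c), sums in totals.items()]
print(merged)
```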

How to remove duplicate lines from a file

懵懂的女人 submitted on 2019-11-27 14:57:41
Question: I have a tool that generates tests and predicts the output. The idea is that if I have a failure, I can compare the prediction to the actual output and see where they diverged. The problem is that the actual output contains some lines twice, which confuses diff. I want to remove the duplicates so that I can compare them easily. Basically, something like sort -u but without the sorting. Is there any Unix command-line tool that can do this?

Answer 1: uniq(1)

SYNOPSIS
    uniq [OPTION]... [INPUT [OUTPUT]]
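Note that uniq only collapses *adjacent* duplicate lines. An order-preserving de-duplication over a whole stream, which is what "sort -u without the sorting" asks for, can be sketched in Python as:

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

# Example: duplicates removed without reordering anything
print(list(unique_lines(["a", "b", "a", "c", "b"])))  # ['a', 'b', 'c']
```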

How to keep only unique words within each string in a vector

血红的双手。 submitted on 2019-11-27 14:51:46
I have data that looks like this:

vector = c("hello I like to code hello", "Coding is fun", "fun fun fun")

I want to remove duplicate words (space delimited), i.e. the output should look like:

vector_cleaned
[1] "hello I like to code"
[2] "coding is fun"
[3] "fun"

Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:

vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello i like to code" "coding is fun" "fun"

## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))

Update based on
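The same per-string word de-duplication, sketched in Python for comparison (the thread itself is about R):

```python
vector = ["hello I like to code hello", "Coding is fun", "fun fun fun"]

def unique_words(s):
    # dict.fromkeys keeps the first occurrence of each word and, in
    # Python 3.7+, preserves insertion order
    return " ".join(dict.fromkeys(s.split()))

vector_cleaned = [unique_words(s) for s in vector]
print(vector_cleaned)  # ['hello I like to code', 'Coding is fun', 'fun']
```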

Removing duplicates with unique index

强颜欢笑 submitted on 2019-11-27 14:41:06
I inserted records between two tables on fields A, B, C, D, believing I had created a unique index on A, B, C, D to prevent duplicates. However, I had somehow made only a normal index on those columns, so duplicates got inserted. It is a 20-million-record table. If I change my existing index from normal to unique, or simply add a new unique index on A, B, C, D, will the duplicates be removed, or will adding it fail because duplicate records exist? I'd test it, yet it is 30 million records and I neither wish to mess the table up nor duplicate it.

Paul Spiegel: If you have duplicates in your table and you use ALTER TABLE mytable ADD UNIQUE

Tree contains duplicate file entries

我的梦境 submitted on 2019-11-27 14:35:19
After some issues with our hosting, we decided to move our Git repository to GitHub. So I cloned the repository and tried pushing it to GitHub. However, I stumbled upon some errors we had never encountered before:

C:\repositories\appName [master]> git push -u origin master
Counting objects: 54483, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (18430/18430), done.
error: object 9eac1e639bbf890f4d1d52e04c32d72d5c29082e:contains duplicate file entries
fatal: Error in object
fatal: sha1 file '<stdout>' write error: Invalid arguments
error: failed to push some refs to

Count duplicates within an Array of Objects

∥☆過路亽.° submitted on 2019-11-27 14:30:43
I have an array of objects as follows within my server-side JS:

[
  { "Company": "IBM" },
  { "Person": "ACORD LOMA" },
  { "Company": "IBM" },
  { "Company": "MSFT" },
  { "Place": "New York" }
]

I need to iterate through this structure, detect any duplicates, and then create a count alongside each value where a duplicate is found. Both the key and the value must match to qualify as a duplicate, e.g. "Company": "IBM" is not a match for "Company": "MSFT". I have the option of changing the inbound array of objects if needed. I would like the output to be an object, but am really struggling to get this to work. EDIT
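Counting key/value pairs is a compact way to express this requirement; here it is sketched in Python (the question itself is about server-side JavaScript):

```python
from collections import Counter

records = [
    {"Company": "IBM"},
    {"Person": "ACORD LOMA"},
    {"Company": "IBM"},
    {"Company": "MSFT"},
    {"Place": "New York"},
]

# Both the key and the value must match to count as a duplicate,
# so count (key, value) pairs rather than values alone.
counts = Counter((k, v) for rec in records for k, v in rec.items())
print(counts[("Company", "IBM")])  # 2
```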