duplicates

Counting duplicates in a sorted sequence using command line tools

Submitted by 喜夏-厌秋 on 2019-12-03 03:27:48
Question: I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse-sorted list. There may be duplicates within this sorted list, and I need the count for each unique number. For example, if the output of cmd1 is:

    100 100 100 99 99 26 25 24 24

I need another command that I can pipe the above output to, so that I get:

    100 3
    99 2
    26 1
    25 1
    24 2

Answer 1: how about; $ echo "100 100 100 99 99
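The answer above is cut off, but the classic pipeline for this is sort | uniq -c. A minimal Python sketch of the same idea, assuming the numbers arrive as whitespace-separated text as in the example (the function name count_sorted is mine, not from the original answer):

```python
from collections import Counter

def count_sorted(numbers_text):
    """Count occurrences of each number, preserving the reverse-sorted input order."""
    nums = numbers_text.split()
    counts = Counter(nums)
    # dict.fromkeys keeps first-seen order, which matches the sorted input
    return [(n, counts[n]) for n in dict.fromkeys(nums)]

print(count_sorted("100 100 100 99 99 26 25 24 24"))
# [('100', 3), ('99', 2), ('26', 1), ('25', 1), ('24', 2)]
```

Unlike uniq -c, which only collapses adjacent duplicates, Counter works even on unsorted input; sorting first is what makes uniq -c correct in the shell version.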

How to remove duplicate words from a plain text file using linux command

Submitted by 好久不见. on 2019-12-03 03:13:56
Question: I have a plain text file with words separated by commas, for example:

    word1, word2, word3, word2, word4, word5, word 3, word6, word7, word3

I want to delete the duplicates so it becomes:

    word1, word2, word3, word4, word5, word6, word7

Any ideas? I think egrep can help me, but I'm not sure how to use it exactly....

Answer 1: Assuming that the words are one per line, and the file is already sorted:

    uniq filename

If the file's not sorted:

    sort filename | uniq

If they're not one per line,
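The question actually asks about comma-separated words on one line, which the truncated answer is about to address. A small Python sketch for that exact case, keeping the first occurrence of each word in order (the helper name dedupe_words is mine; note it treats "word 3" with a space as distinct from "word3", just as any text tool would):

```python
def dedupe_words(line):
    """Remove duplicate comma-separated words, keeping first occurrences in order."""
    words = [w.strip() for w in line.split(",")]
    # dict.fromkeys deduplicates while preserving insertion order
    return ", ".join(dict.fromkeys(words))

print(dedupe_words("word1, word2, word3, word2, word4, word5, word3, word6, word7, word3"))
# word1, word2, word3, word4, word5, word6, word7
```

The sort | uniq approach loses the original word order; the dict-based version keeps it.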

How to find duplicate filenames (recursively) in a given directory? BASH

Submitted by 天涯浪子 on 2019-12-03 03:12:37
I need to find every duplicate filename in a given directory tree. I don't know what directory tree the user will give as a script argument, so I don't know the directory hierarchy. I tried this:

    #!/bin/sh
    find -type f | while IFS= read vo
    do
      echo `basename "$vo"`
    done

but that's not really what I want. It finds only one duplicate and then ends, even if there are more duplicate filenames; also, it doesn't print the whole path (only the filename) or a duplicate count. I wanted to do something similar to this command:

    find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

but it doesn't work for me
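The shell pipeline above loses the full paths once the names go through sort | uniq -c. A Python sketch that keeps both the count and every path, matching the asker's case-insensitive comparison (the function name duplicate_filenames is mine):

```python
import os
from collections import defaultdict

def duplicate_filenames(root):
    """Map each lowercased basename to all full paths where it occurs,
    keeping only names that occur more than once."""
    seen = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            seen[name.lower()].append(os.path.join(dirpath, name))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

For each duplicate name this yields the count (len of the path list) and the whole paths, which is exactly what the uniq -c version discards.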

Determine when columns of a data.frame change value and return indices of the change

Submitted by 半城伤御伤魂 on 2019-12-03 02:45:53
I am trying to find a way to determine when a set of columns changes value in a data.frame. Let me get straight to the point; please consider the following example:

    x <- data.frame(cnt=1:10, code=rep('ELEMENT 1',10), val0=rep(5,10), val1=rep(6,10), val2=rep(3,10))
    x[4,]$val0 <- 6

The cnt column is a unique ID (it could be a date or time column; for simplicity it's an int here). The code column is like a code for the set of rows (imagine several such groups, each with a different code). The code and cnt are the keys in my data.table. The val0, val1, val2 columns are something like scores. The data.frame
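The underlying operation is language-agnostic: compare each row's score columns against the previous row's and report where they differ. A plain-Python stand-in for the R example above (change_indices is my name; the example data mirrors x, where only row 4's val0 is changed to 6, so rows 4 and 5 differ from their predecessors):

```python
def change_indices(rows, cols):
    """Return 1-based row indices where any of the given columns
    differs from the previous row."""
    out = []
    prev = None
    for i, row in enumerate(rows, start=1):
        key = tuple(row[c] for c in cols)
        if prev is not None and key != prev:
            out.append(i)
        prev = key
    return out

# Mirror of the R example: ten identical rows, then val0 bumped in row 4
rows = [{"val0": 5, "val1": 6, "val2": 3} for _ in range(10)]
rows[3]["val0"] = 6
print(change_indices(rows, ["val0", "val1", "val2"]))  # [4, 5]
```

Row 5 appears too because the values change back; in R the analogous trick is comparing each row to the lagged previous row.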

Removing duplicate rows in vi?

Submitted by 拥有回忆 on 2019-12-03 00:57:03
Question: I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove them. I am interested in doing this from within vi/vim, if possible.

Answer 1: If you're OK with sorting your file, you can use:

    :sort u

Answer 2: Try this:

    :%s/^\(.*\)\(\n\1\)\+$/\1/

It searches for any line immediately followed by one or more copies of itself, and replaces it with a single copy. Make a copy of your
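The second vim answer only collapses adjacent duplicate lines, which is why it is usually combined with a sort first. The same adjacent-only behavior, sketched in Python for clarity (squash_adjacent_duplicates is my name):

```python
from itertools import groupby

def squash_adjacent_duplicates(lines):
    """Collapse each run of identical consecutive lines to a single copy,
    like the vim substitute; non-adjacent duplicates are kept."""
    return [line for line, _run in groupby(lines)]

print(squash_adjacent_duplicates(["a", "a", "b", "b", "b", "a"]))
# ['a', 'b', 'a']
```

Note the trailing 'a' survives: to remove all duplicates regardless of position you need the sort-first approach (:sort u).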

Remove rows when cells are equal [duplicate]

Submitted by 做~自己de王妃 on 2019-12-03 00:47:38
Question: This question already has an answer here: Find rows in a data frame where two columns are equal (1 answer). Closed 2 years ago.

I have a table:

    df <- read.table(text="
    a b 5
    a a 2
    c a 3
    d d 2
    a a 1
    b d 2
    ")
    colnames(df) <- c("Gen1","Gen2","N")

I would like to remove the rows where Gen1 = Gen2. For this example I would get:

    result <- read.table(text="
    a b 5
    c a 3
    b d 2
    ")
    colnames(result) <- c("Gen1","Gen2","N")

I tried with duplicated, but duplicated compares whole rows, not two columns within a row.
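The operation is just a row filter on two fields being unequal. A Python sketch over the same data, with rows as (Gen1, Gen2, N) tuples (drop_equal is my name):

```python
def drop_equal(rows):
    """Keep only rows whose first two fields differ (Gen1 != Gen2)."""
    return [r for r in rows if r[0] != r[1]]

rows = [("a", "b", 5), ("a", "a", 2), ("c", "a", 3),
        ("d", "d", 2), ("a", "a", 1), ("b", "d", 2)]
print(drop_equal(rows))
# [('a', 'b', 5), ('c', 'a', 3), ('b', 'd', 2)]
```

In R the equivalent one-liner would be a logical subset on the two columns rather than duplicated, which deduplicates across rows.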

Keep only non-duplicate rows based on a Column Value [duplicate]

Submitted by 故事扮演 on 2019-12-02 23:51:40
Question: This question already has answers here: How can I remove all duplicates so that NONE are left in a data frame? (2 answers). Closed 2 years ago.

This is a follow-up to a previous question. The dataset looks like the following:

    dat <- read.table(header=TRUE, text="
    ID Veh oct nov dec jan feb
    1120 1 7 47 152 259 140
    2000 1 5 88 236 251 145
    2000 2 14 72 263 331 147
    1133 1 6 71 207 290 242
    2000 3 7 47 152 259 140
    2002 1 5 88 236 251 145
    2006 1 14 72 263 331 147
    2002 2 6 71 207 290 242
    ")
    dat

    ID Veh
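The linked duplicate asks for removing every copy of a duplicated row, keeping only rows whose key columns occur exactly once. A plain-Python sketch of that rule (keep_unique is my name; the toy data here is mine, since in the dataset above every month-profile occurs twice, which would leave nothing):

```python
from collections import Counter

def keep_unique(rows, key_cols):
    """Keep only rows whose values in key_cols occur exactly once overall."""
    keys = [tuple(r[c] for c in key_cols) for r in rows]
    counts = Counter(keys)
    return [r for r, k in zip(rows, keys) if counts[k] == 1]

rows = [{"id": 1, "v": 7}, {"id": 2, "v": 7}, {"id": 3, "v": 9}]
print(keep_unique(rows, ["v"]))  # [{'id': 3, 'v': 9}]
```

This differs from ordinary deduplication (which keeps one representative of each group); here duplicated groups are dropped entirely.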

Python : How to find duplicates in a list and update these duplicate items by renaming them with a progressive letter added

Submitted by China☆狼群 on 2019-12-02 23:17:18
Question: I have a list of items like this:

    ['T1','T2','T2','T2','T2','T3','T3']

I need to make sure that duplicates are renamed with a progressive letter added, like this:

    ['T1','T2A','T2B','T2C','T2D','T3A','T3B']

but only if there is more than one occurrence of the same item. Also, is it possible to do so without generating a new list? Any ideas?

Answer 1:

    from collections import Counter
    from string import ascii_uppercase as letters

    def gen(L):
        c = Counter(L)
        for elt, count in c.items():
            if count == 1:
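The answer above is truncated; a complete sketch of the same Counter-based idea that also honors the "without generating a new list" request by rewriting the list in place (rename_duplicates is my name; this assumes at most 26 copies of any one item, since it draws suffixes from A-Z):

```python
from collections import Counter
from string import ascii_uppercase

def rename_duplicates(items):
    """Append A, B, C, ... to items that occur more than once.
    Modifies the list in place and returns it."""
    total = Counter(items)      # how many times each item occurs overall
    seen = Counter()            # how many copies we have relabeled so far
    for i, item in enumerate(items):
        if total[item] > 1:
            items[i] = item + ascii_uppercase[seen[item]]
            seen[item] += 1
    return items

L = ['T1', 'T2', 'T2', 'T2', 'T2', 'T3', 'T3']
print(rename_duplicates(L))
# ['T1', 'T2A', 'T2B', 'T2C', 'T2D', 'T3A', 'T3B']
```

Items occurring once ('T1') are left untouched, matching the "only if there is more than one occurrence" condition.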

How do I create a multiple column unique constraint in SQL Server

Submitted by 試著忘記壹切 on 2019-12-02 22:44:57
I have a table that contains, for example, two fields that I want to make unique within the database. For example:

    create table Subscriber
    (
        ID int not null,
        DataSetId int not null,
        Email nvarchar(100) not null,
        ...
    )

The ID column is the primary key, and both DataSetId and Email are indexed. What I want to be able to do is prevent the same Email and DataSetId combination appearing in the table or, to put it another way, the Email value must be unique for a given DataSetId. I tried creating a unique index on the columns:

    CREATE UNIQUE NONCLUSTERED INDEX IX_Subscriber_Email ON Subscriber
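The concept being asked about is a composite (multi-column) unique constraint: uniqueness is enforced on the pair, not on either column alone. A runnable sketch of the behavior using SQLite via Python's stdlib, as a stand-in for SQL Server (the table shape mirrors the question; the constraint name and demo values are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Subscriber (
        ID INTEGER PRIMARY KEY,
        DataSetId INTEGER NOT NULL,
        Email TEXT NOT NULL,
        UNIQUE (DataSetId, Email)   -- composite unique constraint on the pair
    )
""")
conn.execute("INSERT INTO Subscriber (DataSetId, Email) VALUES (1, 'a@example.com')")
# Same email, different DataSetId: allowed
conn.execute("INSERT INTO Subscriber (DataSetId, Email) VALUES (2, 'a@example.com')")
try:
    # Same (DataSetId, Email) pair: rejected
    conn.execute("INSERT INTO Subscriber (DataSetId, Email) VALUES (1, 'a@example.com')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

In SQL Server the equivalent is a unique constraint or unique index over both columns together, e.g. ALTER TABLE Subscriber ADD CONSTRAINT UQ_Subscriber_DataSetId_Email UNIQUE (DataSetId, Email); the index in the question fails because it was declared on Email alone.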

Remove duplicates from array comparing the properties of its objects

Submitted by 故事扮演 on 2019-12-02 22:44:37
Suppose I have a class Event with two properties: action (NSString) and date (NSDate). And suppose I have an array of Event objects. The problem is that the "date" properties can match. I need to remove the duplicates, meaning that two different objects with the same date IS a duplicate. I can remove duplicates in any array of strings or NSDates; they are easy to compare. But how to do it with complex objects, where their properties are to be compared? Don't ask me what I did so far, because the only thing coming to my mind is a bubble sort, but it's a newbie solution, and slow. Quite any help is
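The question is Objective-C, but the idea is language-neutral: key each object by the property that defines "duplicate" and keep the first object per key, which is O(n) instead of pairwise comparison. A Python stand-in (the class and function names are mine, mirroring the Event described above):

```python
from datetime import date

class Event:
    def __init__(self, action, when):
        self.action = action   # stand-in for the NSString property
        self.when = when       # stand-in for the NSDate property

def dedupe_by_date(events):
    """Keep the first event for each date; later events with the
    same date are treated as duplicates and dropped."""
    seen = {}
    for e in events:
        seen.setdefault(e.when, e)
    return list(seen.values())

events = [Event("meet", date(2020, 1, 1)),
          Event("call", date(2020, 1, 1)),   # duplicate date, dropped
          Event("ship", date(2020, 1, 2))]
print([e.action for e in dedupe_by_date(events)])  # ['meet', 'ship']
```

The Objective-C analogue would be a dictionary (or NSOrderedSet with a custom key) keyed on the date property; no sorting is required at all.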