Duplicates

Removing duplicated rows but keeping the ones with a particular value in one column (pandas, Python)

眉间皱痕 submitted on 2019-12-24 16:06:40
Question: I would like to do the following: if two rows have exactly the same value in three columns ("ID", "symbol", and "date") and have either "X" or "T" in one column ("message"), then remove both of these rows. However, if two rows have the same value in the same three columns but a value different from "X" or "T" in the other column, then leave them intact. Here is an example of my data frame: df = pd.DataFrame({"ID":["AA-1", "AA-1", "C-0", "BB-2", "BB-2"], "symbol":["A","A","C","B","B"], "date":["06/24/2014",
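The excerpt cuts off mid-frame, so the sketch below invents the "message" column to complete it and implements one reading of the rule (drop a duplicated (ID, symbol, date) group when all of its "message" values are "X" or "T") using groupby/filter; a minimal sketch, not the asker's actual data:

```python
import pandas as pd

# Frame from the excerpt; the "message" values are invented to fill the truncation.
df = pd.DataFrame({
    "ID":      ["AA-1", "AA-1", "C-0", "BB-2", "BB-2"],
    "symbol":  ["A", "A", "C", "B", "B"],
    "date":    ["06/24/2014"] * 5,
    "message": ["X", "T", "Y", "Z", "Z"],
})

# Keep singletons; drop a duplicated group only when every "message" in it
# is "X" or "T". Groups with any other value pass through intact.
def keep(group):
    if len(group) < 2:
        return True
    return not group["message"].isin(["X", "T"]).all()

result = df.groupby(["ID", "symbol", "date"]).filter(keep)
print(result)  # the AA-1 pair is removed; C-0 and the BB-2 pair survive
```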

MySQL giving duplicate entry error when trying to increment date field?

心不动则不痛 submitted on 2019-12-24 15:27:08
Question: I am reading in data from an XML file. Due to an error at the source it is one day out, so after loading it into the database I use this SQL statement to increment the date:

UPDATE 2011_electricity SET DATE = DATE_ADD( DATE, INTERVAL 1 DAY )

Last week it worked fine, but now I get an error. MySQL said: #1062 - Duplicate entry '2011-07-20' for key 1. I have one primary key, on the date field. This is how the database looks:

date        energy  daynum
2011-06-29  0.05    4197
2011-07-19  0.20    4219
2011-07-20
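The collision happens because the UPDATE moves 2011-07-19 onto 2011-07-20 while the original 2011-07-20 row still holds that primary key; last week the data presumably had no such adjacent dates. In MySQL the usual fix is to shift the rows newest-first by appending ORDER BY DATE DESC to the UPDATE. A Python sketch of the same idea against SQLite (whose UPDATE lacks ORDER BY), with the two dates taken from the excerpt and the second energy value invented:

```python
import sqlite3
from datetime import date, timedelta

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE electricity (date TEXT PRIMARY KEY, energy REAL)")
con.executemany("INSERT INTO electricity VALUES (?, ?)",
                [("2011-07-19", 0.20), ("2011-07-20", 0.30)])

# Walking the dates in descending order means each row is moved onto a key
# that has already been vacated (or was never occupied), so no collision.
for (d,) in con.execute(
        "SELECT date FROM electricity ORDER BY date DESC").fetchall():
    shifted = (date.fromisoformat(d) + timedelta(days=1)).isoformat()
    con.execute("UPDATE electricity SET date = ? WHERE date = ?", (shifted, d))
con.commit()
print(con.execute("SELECT * FROM electricity ORDER BY date").fetchall())
```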

How to find duplicate files in a large filesystem whilst avoiding MemoryError

橙三吉。 submitted on 2019-12-24 15:19:22
Question: I am trying to avoid duplicates in my mp3 collection (quite large). I want to check for duplicates by checking file contents, instead of looking for the same file name. I have written the code below to do this, but it throws a MemoryError after about a minute. Any suggestions on how I can get this to work?

import os
import hashlib

walk = os.walk('H:\MUSIC NEXT GEN')
mySet = set()
dupe = []
hasher = hashlib.md5()
for dirpath, subdirs, files in walk:
    for f in files:
        fileName = os.path.join(dirpath,
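The code is truncated, but two things commonly produce this failure in code of this shape: the single md5 hasher created outside the loop (every digest then depends on all files hashed before it) and reading each file into memory whole. A sketch of the usual fix, with a fresh hasher per file and fixed-size chunked reads (function names are mine):

```python
import os
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Hash one file incrementally so it never sits in memory whole."""
    hasher = hashlib.md5()  # fresh hasher per file, not one shared globally
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def find_dupes(root):
    seen = {}    # digest -> first path seen with that content
    dupes = []   # (duplicate path, original path) pairs
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = file_md5(path)
            if digest in seen:
                dupes.append((path, seen[digest]))
            else:
                seen[digest] = path
    return dupes

# Root taken from the excerpt; a raw string avoids backslash-escape surprises.
print(find_dupes(r"H:\MUSIC NEXT GEN"))
```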

SQL: Counting and Numbering Duplicates - Optimising Correlated Subquery

北城余情 submitted on 2019-12-24 14:44:28
Question: In an SQLite database I have one table where I need to count the duplicates across certain columns (i.e. rows where 3 particular columns are the same) and then also number each of these cases (i.e. if there are 2 occurrences of a particular duplicate, they need to be numbered as 1 and 2). I'm finding it a bit difficult to explain in words, so I'll use a simplified example below. The data I have is similar to the following (first line is the header row; the table is referenced in the following as
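The sample data is cut off, so the schema below (columns a, b, c) is invented; the sketch shows how window functions, available in SQLite 3.25+, yield both the group size and the 1, 2, ... numbering in one pass, which is the standard way to avoid a correlated subquery here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
con.executemany("INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
                [("x", "y", "z"), ("x", "y", "z"), ("p", "q", "r")])

# COUNT(*) OVER gives each row its duplicate-group size; ROW_NUMBER()
# numbers the members of the group 1, 2, ... in id order.
query = """
SELECT id, a, b, c,
       COUNT(*)     OVER (PARTITION BY a, b, c)             AS dup_count,
       ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY id) AS dup_index
FROM t
ORDER BY a, b, c, id
"""
for row in con.execute(query):
    print(row)
```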

Filter rows having duplicate IDs [duplicate]

淺唱寂寞╮ submitted on 2019-12-24 14:18:33
Question: This question already has answers here: Finding ALL duplicate rows, including “elements with smaller subscripts” (5 answers). Closed 2 years ago. My data is like this:

dat <- read.table(header=TRUE, text="
ID   Veh oct nov dec jan feb
1120 1   7   47  152 259 140
2000 1   5   88  236 251 145
2000 2   14  72  263 331 147
1133 1   6   71  207 290 242
2000 3   7   47  152 259 140
2002 1   5   88  236 251 145
2006 1   14  72  263 331 147
2002 2   6   71  207 290 242
")
dat
  ID   Veh oct nov dec jan feb
1 1120 1   7   47  152 259 140
2 2000 1
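The linked duplicate covers the R idiom, typically dat[duplicated(dat$ID) | duplicated(dat$ID, fromLast=TRUE), ]; for comparison, a pandas sketch of the same keep-all-duplicates filter (columns abbreviated from the excerpt):

```python
import pandas as pd

dat = pd.DataFrame({
    "ID":  [1120, 2000, 2000, 1133, 2000, 2002, 2006, 2002],
    "Veh": [1, 1, 2, 1, 3, 1, 1, 2],
    "oct": [7, 5, 14, 6, 7, 5, 14, 6],
})

# keep=False flags every member of a duplicated ID group, mirroring
# duplicated(x) | duplicated(x, fromLast=TRUE) in R.
dupes = dat[dat["ID"].duplicated(keep=False)]
print(dupes)  # all 2000 and 2002 rows
```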

What is the best way to delete duplicate values from a MySQL table?

≡放荡痞女 submitted on 2019-12-24 13:50:30
Question: I have the following SQL to delete duplicate values from a table:

DELETE p1 FROM `ProgramsList` p1, `ProgramsList` p2
WHERE p1.CustId = p2.CustId
  AND p1.CustId = 1
  AND p1.`Id` > p2.`Id`
  AND p1.`ProgramName` = p2.`ProgramName`;

Id is auto-incremented for a given CustId. ProgramName must be unique (currently it is not). The above SQL takes about 4 to 5 hours to complete with about 1,000,000 records. Could anyone suggest a quicker way of deleting duplicates from a table?

Answer 1: First, you might try
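The answer is cut off at "you might try", so what follows is only the generic advice for this pattern: index the compared columns and let a grouped subquery pick one survivor per (CustId, ProgramName) instead of the row-by-row self-join. A self-contained SQLite sketch of that shape, with the schema reconstructed from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ProgramsList (
                   Id INTEGER PRIMARY KEY,
                   CustId INTEGER,
                   ProgramName TEXT)""")
con.executemany("INSERT INTO ProgramsList (CustId, ProgramName) VALUES (?, ?)",
                [(1, "A"), (1, "A"), (1, "B"), (2, "A")])

# Without an index on the compared columns the join in the question is
# quadratic per customer; this composite index is usually the real speedup.
con.execute("CREATE INDEX idx_cust_prog ON ProgramsList (CustId, ProgramName)")

# Keep only the lowest Id in each (CustId, ProgramName) group.
con.execute("""
    DELETE FROM ProgramsList
    WHERE Id NOT IN (SELECT MIN(Id)
                     FROM ProgramsList
                     GROUP BY CustId, ProgramName)
""")
con.commit()
print(con.execute("SELECT * FROM ProgramsList ORDER BY Id").fetchall())
```

Note that MySQL itself rejects a subquery that reads from the DELETE target (error 1093) unless the subquery is wrapped in a derived table, so the statement needs that extra nesting there.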

Is it a Bad Practice to Have Both a .rvmrc and a .ruby-version in a Ruby Project?

五迷三道 submitted on 2019-12-24 12:54:02
Question: There are two Ruby projects I am currently working on which have both a .rvmrc and a .ruby-version file in their root dir. I use rvm to manage my Ruby versions in my local development environment, and have my own local .rvmrc files in my home directory's copies of various Ruby versions, so naturally I get the rvm warning when I change directory into these projects: You are using '.rvmrc', it requires trusting, it is slower and it is not compatible with other ruby managers, you can switch to '
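For reference, the two files pin the same interpreter in different forms. A hypothetical pair (the version number is invented; a real .ruby-version holds only the bare string, and the comment lines here are annotation, not file content):

```
# .rvmrc: shell code that rvm sources on cd, hence the trust prompt
rvm use ruby-2.1.2

# .ruby-version: a bare version string read by rvm, rbenv, and chruby alike
2.1.2
```

Keeping both is redundant but mostly harmless as long as they agree; the warning is rvm nudging projects toward the tool-agnostic file.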

Counting duplicate values

可紊 submitted on 2019-12-24 12:42:11
Question: I am trying to return the number of times a duplicate value occurs in a column in Excel. For instance:

Column A  | Column B
12345678  | Return 1
12345678  | Return 2
12345678  | Return 3
23456789  | Return 1
23456789  | Return 2
34567891  | Return 1

I should have made my example better: above, the dupes are lumped together, but in my case they are not:

Column A  | Column B
12345678  | Return 1
23456789  | Return 1
12345678  | Return 2
23456789  | Return 2
34567891  | Return 1
12345678  | Return 3

Answer 1:
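A common spreadsheet answer is a COUNTIF over an expanding range, e.g. =COUNTIF($A$1:A1,A1) entered in B1 and filled down, which produces the running count regardless of whether the duplicates are adjacent. For comparison, a pandas sketch of the same running count (values copied from the second, interleaved example):

```python
import pandas as pd

df = pd.DataFrame({"A": [12345678, 23456789, 12345678,
                         23456789, 34567891, 12345678]})

# cumcount() numbers each occurrence within its group starting at 0,
# so adding 1 reproduces the "Return 1", "Return 2", ... sequence.
df["B"] = "Return " + (df.groupby("A").cumcount() + 1).astype(str)
print(df)
```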

SQL two criteria from one group-by

Deadly submitted on 2019-12-24 12:34:59
Question: I have a table with some "functionally duplicate" records: different IDs, but the 4 columns of "user data" (out of even more columns) are identical. I've got a query working that will select all records that have such duplicates. Now I want to select, from each group of duplicates, first any row that has column A not null (I've verified from the data that there is at most 1 such row per group) and, if there are none in this particular group, then the row with the minimum value of column ID. How do I
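The question is cut off, but the stated pick (prefer the row where column A is not null, otherwise the smallest ID) maps naturally onto ROW_NUMBER() with a two-key sort. A SQLite sketch with an invented schema, u1 and u2 standing in for the four user-data columns and a for column A:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, u1 TEXT, u2 TEXT, a TEXT)")
con.executemany("INSERT INTO t (u1, u2, a) VALUES (?, ?, ?)",
                [("x", "y", None), ("x", "y", "keep"),
                 ("p", "q", None), ("p", "q", None)])

# (a IS NULL) evaluates to 0 for non-null rows, so they sort first;
# ties fall back to the smallest id. rn = 1 is the chosen row per group.
query = """
SELECT id, u1, u2, a FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY u1, u2
                              ORDER BY (a IS NULL), id) AS rn
    FROM t
) WHERE rn = 1
"""
print(con.execute(query).fetchall())  # -> id 2 for (x, y), id 3 for (p, q)
```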