multiple-columns

How to select and delete columns with duplicate name in pandas DataFrame

怎甘沉沦 submitted on 2020-07-17 07:25:54
Question: I have a huge DataFrame in which some columns share the same name. When I try to pick a column that exists twice (e.g. del df['col name'] or df2 = df['col name']), I get an error. What can I do?
Answer 1: You can address columns by index:
    >>> df = pd.DataFrame([[1,2],[3,4],[5,6]], columns=['a','a'])
    >>> df
       a  a
    0  1  2
    1  3  4
    2  5  6
    >>> df.iloc[:,0]
    0    1
    1    3
    2    5
Or you can rename the columns, like
    >>> df.columns = ['a','b']
    >>> df
       a  b
    0  1  2
    1  3  4
    2  5  6
Answer 2: Another solution: def remove_dup_columns(frame): keep
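The second answer is cut off in this listing, so what follows is not a reconstruction of it. As a minimal sketch of one common approach, assuming pandas is installed and the frame looks like the one in the first answer: a boolean mask built from `df.columns.duplicated()` keeps the first occurrence of each name, and positional indexing with `iloc` drops one specific column. The names `deduped`, `keep_positions` and `without_first` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "a"])

# Keep only the first column for each repeated name.
# Index.duplicated() marks the second and later occurrences as True.
deduped = df.loc[:, ~df.columns.duplicated()]

# Drop one specific column by position (here position 0) and keep the rest.
keep_positions = [i for i in range(df.shape[1]) if i != 0]
without_first = df.iloc[:, keep_positions]

print(deduped)
print(without_first)
```

Selecting by position sidesteps the ambiguity of the label 'a', which is why both steps avoid `df['a']` and `del df['a']`.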

Divide two dataframes with python

放肆的年华 submitted on 2020-05-29 10:20:16
Question: I have two dataframes, df1 and df2.
df1:
    TIMESTAMP            eq1  eq2  eq3
    2016-05-10 13:20:00  40   30   10
    2016-05-10 13:40:00  40   10   20
df2:
    TIMESTAMP            eq1  eq2  eq3
    2016-05-10 13:20:00  10   20   30
    2016-05-10 13:40:00  10   20   20
I would like to divide df1 by df2, each column of df1 by the sum of the matching column of df2, to get this result, df3:
    TIMESTAMP            eq1         eq2         eq3
    2016-05-10 13:20:00  40/(10+10)  30/(20+20)  10/(30+20)
    2016-05-10 13:40:00  40/(10+10)  10/(20+20)  20/(30+20)
Any ideas, please?
Answer 1: You can use div, but before set_index
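The answer is truncated after mentioning `div` and `set_index`, so the rest of it is unknown. One possible reading, sketched under the assumption that the goal is the df3 shown in the question (each df1 value divided by the column total of df2), could look like this. The names `left` and `right` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "TIMESTAMP": ["2016-05-10 13:20:00", "2016-05-10 13:40:00"],
    "eq1": [40, 40], "eq2": [30, 10], "eq3": [10, 20],
})
df2 = pd.DataFrame({
    "TIMESTAMP": ["2016-05-10 13:20:00", "2016-05-10 13:40:00"],
    "eq1": [10, 10], "eq2": [20, 20], "eq3": [30, 20],
})

# Move TIMESTAMP into the index so only the numeric columns are divided.
left = df1.set_index("TIMESTAMP")
right = df2.set_index("TIMESTAMP")

# right.sum() is a Series of per-column totals (eq1=20, eq2=40, eq3=50).
# div() aligns that Series with the column labels of `left`.
df3 = left.div(right.sum()).reset_index()

print(df3)
```

For the sample data this gives 40/20, 30/40 and 10/50 in the first row, which matches the expressions written out in the question.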

Expanding all columns simultaneously in Power Query

安稳与你 submitted on 2020-05-27 13:02:08
Question: I need help expanding all columns in a spreadsheet simultaneously using Power Query. I have transposed the spreadsheet from this: [image] to this: [image]. Each table is a long column of values (9,000+ rows). I would like each column to be a separate ID. Expanding the columns manually would be tedious, and our team regularly adds data from new study participants (IDs), so I need help writing code that can expand all columns simultaneously without having to list the column names (IDs) in the code.

How to merge two files based on data in multiple columns?

大兔子大兔子 submitted on 2020-04-16 03:47:06
Question: I have two separate files, each containing a different number of columns, which I want to merge based on data in multiple columns.
file1
    VMNF01000015.1 1769465 1769675 . . - Focub_II5_mimp_1
    VMNF01000014.1 3225875 3226081 . . + Focub_II5_mimp_1
    VMNF01000014.1 3226046 3226081 . . - Focub_II5_mimp_1
    VMNF01000014.1 3585246 3585281 . . - Focub_II5_mimp_1
    VMNF01000014.1 3692468 3692503 . . - Focub_II5_mimp_1
    VMNF01000014.1 3715380 3715415 . . + Focub_II5_mimp_1
    VMNF01000014.1 2872478 2872511 . . -

Postgres find all rows in database tables matching criteria on a given column

眉间皱痕 submitted on 2020-04-16 03:18:09
Question: I am trying to write sub-queries that search all tables for a column named id. Since multiple tables have an id column, I also want to add the condition id = 3119093. My attempt was:
    Select * from information_schema.tables where id = '3119093' and id IN ( Select table_name from information_schema.columns where column_name = 'id' );
This didn't work, so I tried:
    Select * from information_schema.tables where table_name IN ( Select table_name from information_schema.columns

Encrypting a columnar transposition cipher

亡梦爱人 submitted on 2020-02-24 12:00:08
Question: I'm trying to figure out how to encrypt with a columnar transposition cipher in Python, given an uppercase plaintext string and a numeric key of any length. For example, if the key is 3124 and the string is 'IHAVETWOCATS', it would organize the string like so:
    3124
    IHAV
    ETWO
    CATS
and then return the characters in column 1 first, then column 2, and so on, until finally returning the encrypted string 'HTAAWTIECVOS'. So far I know that I'll need to use an accumulator, and I've been toying with the idea of
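The procedure described in the question can be sketched without an explicit accumulator by slicing the plaintext into columns and reading them in key order. This is a minimal sketch, not the asker's eventual solution. It assumes the key is passed as a string of digits, and it does not pad a short final row. The function name `columnar_encrypt` is hypothetical.

```python
def columnar_encrypt(plaintext: str, key: str) -> str:
    """Encrypt with a columnar transposition, reading columns in key order."""
    width = len(key)
    # Column i holds every width-th character, starting at offset i.
    columns = [plaintext[i::width] for i in range(width)]
    # Visit column positions in ascending order of their key digit.
    # sorted() is stable, so repeated digits keep left-to-right order.
    order = sorted(range(width), key=lambda i: key[i])
    return "".join(columns[i] for i in order)


print(columnar_encrypt("IHAVETWOCATS", "3124"))  # HTAAWTIECVOS
```

With key 3124 the columns are read in the order H-T-A, A-W-T, I-E-C, V-O-S, which reproduces the example string from the question.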

Selecting multiple columns R vs python pandas

牧云@^-^@ submitted on 2020-01-30 03:31:28
Question: I am an R user who is currently learning Python, and I am trying to replicate in Python a method of selecting columns that I use in R. In R, I could select multiple columns like so:
    df[,c(2,4:10)]
In Python, I know how iloc works, but I couldn't combine a single column number with a consecutive range of them. This wouldn't work:
    df.iloc[:,[1,3:10]]
So, I'll have to drop the second column like so:
    df.iloc[:,1:10].drop(df.iloc[:,1:10].columns[1] , axis=1)
Is there a more efficient way of
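No answer is included in this listing. As a hedged sketch of one way to get the R behaviour, assuming pandas is installed: `iloc` accepts a plain list of integer positions, so a single position and a range can be combined by unpacking the range into the list. The 12-column frame and the name `positions` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

# R's df[, c(2, 4:10)] is 1-based and inclusive: columns 2, 4, 5, ..., 10.
# The same columns in 0-based positions are 1, 3, 4, ..., 9.
positions = [1, *range(3, 10)]
selected = df.iloc[:, positions]

print(list(selected.columns))
```

If NumPy is already in use, `numpy.r_[1, 3:10]` builds the same array of positions with slice syntax closer to the R original.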

Selecting multiple columns R vs python pandas

混江龙づ霸主 submitted on 2020-01-30 03:29:15
Question: I am an R user who is currently learning Python, and I am trying to replicate in Python a method of selecting columns that I use in R. In R, I could select multiple columns like so:
    df[,c(2,4:10)]
In Python, I know how iloc works, but I couldn't combine a single column number with a consecutive range of them. This wouldn't work:
    df.iloc[:,[1,3:10]]
So, I'll have to drop the second column like so:
    df.iloc[:,1:10].drop(df.iloc[:,1:10].columns[1] , axis=1)
Is there a more efficient way of

SQL get number of columns in a particular row having a particular value

♀尐吖头ヾ submitted on 2020-01-24 14:27:41
Question: I know that I can get the number of rows having a particular value in a particular column by using the COUNT(*) function, but I want to find the number of COLUMNS in a particular row that have a particular value. Any suggestions on how to do this? I would have posted an example of what I've tried so far, but I'm completely lost on this one...
Edit 1 - Here's some sample data and the expected result:
Table - trackbill
    | u1 | u1paid | u2 | u2paid | u3 | u3paid | u4 | u4paid | u5 |
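The sample data is cut off here and the database engine is not named, so the following does not use the trackbill table. It is a sketch of one general technique: add up one CASE expression per column, each yielding 1 when the column matches and 0 otherwise. The table `demo`, its columns `a`, `b`, `c`, and the target value 1 are all hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for whatever database the asker uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; not the asker's trackbill schema.
conn.execute("CREATE TABLE demo (id INTEGER, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 1), (2, 0, 0, 1)],
)

# One CASE per column: 1 if the column equals the target value, else 0.
query = """
SELECT id,
       (CASE WHEN a = 1 THEN 1 ELSE 0 END)
     + (CASE WHEN b = 1 THEN 1 ELSE 0 END)
     + (CASE WHEN c = 1 THEN 1 ELSE 0 END) AS matches
FROM demo
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

The column list has to be spelled out by hand, which is the main cost of this approach on a wide table.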

Using awk how do I print all lines containing duplicates of specific columns?

微笑、不失礼 submitted on 2020-01-17 06:56:34
Question:
Input:
    a;3;c;1
    a;4;b;2
    a;5;c;1
Output:
    a;3;c;1
    a;5;c;1
Hence, all lines that have duplicates of columns 1, 3 and 4 should be printed.
Answer 1: If a 2-pass approach is OK:
    $ awk -F';' '{key=$1 FS $3 FS $4} NR==FNR{cnt[key]++;next} cnt[key]>1' file file
    a;3;c;1
    a;5;c;1
otherwise:
    $ awk -F';' '
    { key=$1 FS $3 FS $4; a[key,++cnt[key]]=$0 }
    END {
        for (key in cnt)
            if (cnt[key] > 1)
                for (i=1; i<=cnt[key]; i++)
                    print a[key,i]
    }
    ' file
    a;3;c;1
    a;5;c;1
The output order of keys in that second script will be
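For comparison only, the two-pass idea from the first awk command can be sketched in Python. This is a swapped-in equivalent, not part of the original answer. It assumes the lines fit in memory as a list, since the list is traversed twice. The function name and its parameters are hypothetical.

```python
from collections import Counter


def lines_with_duplicate_keys(lines, sep=";", key_fields=(0, 2, 3)):
    """Return lines whose key (columns 1, 3 and 4 by default) occurs more than once."""

    def key_of(line):
        parts = line.split(sep)
        return tuple(parts[i] for i in key_fields)

    # First pass: count how often each key occurs.
    counts = Counter(key_of(line) for line in lines)
    # Second pass: keep lines with a repeated key, in input order.
    return [line for line in lines if counts[key_of(line)] > 1]


sample = ["a;3;c;1", "a;4;b;2", "a;5;c;1"]
result = lines_with_duplicate_keys(sample)
print("\n".join(result))
```

Like the two-pass awk version, this keeps the matching lines in their original input order.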