dataframe

Change column values in a list of dataframes in R

点点圈 submitted on 2021-02-04 18:03:32
Question: I have a list of 12 dataframes; the name of the list is kvish_1_10t.tables. Each dataframe has a column "day_mean" (always the 7th column in all the dataframes). It's important to note that all the dataframes look exactly the same. This is an example of one of the tables:

    X2014_kvish_1_10t
      kvish keta maslul yom nefah                date day_mean
    1     1   10      1   1  1936 2014-09-07 00:00:00 2910.958
    2     1   10      1   1   966 2014-09-07 01:00:00 2910.958
    3     1   10      1   1   737 2014-09-07 02:00:00 2910.958
    4     1   10      1   1   596 2014-09-07 03:00
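A minimal sketch of how such an update could look, assuming the list is kvish_1_10t.tables as in the question; the doubling of day_mean is a placeholder for whatever transformation is actually needed:

    # Apply the same column update to every dataframe in the list.
    kvish_1_10t.tables <- lapply(kvish_1_10t.tables, function(df) {
      df$day_mean <- df$day_mean * 2  # placeholder transformation
      df
    })

Since day_mean is always the 7th column, df[[7]] could stand in for df$day_mean if the name ever varied.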

Merge 2 columns into one in dataframe [closed]

放肆的年华 submitted on 2021-02-04 16:55:12
Question: This should be simple, but I am struggling with it. I want to combine two columns in a single dataframe into one. I have separate columns for customer ID (20227) and year (2009). I want to create a new column that has both (2009_20227).

Answer 1: Some alternative way with function
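A minimal sketch with paste(), assuming an R data.frame with columns named customer_id and year (the question gives neither the language nor the real column names):

    df <- data.frame(customer_id = 20227, year = 2009)
    df$year_customer <- paste(df$year, df$customer_id, sep = "_")
    df$year_customer
    # [1] "2009_20227"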

Save skipped rows in pandas read_csv

淺唱寂寞╮ submitted on 2021-02-04 16:44:27
Question: I have a list of rows to skip (say [1,5,10] --> row numbers), and when I pass it to pandas read_csv, it ignores those rows. But I also need to save these skipped rows in a different text file. I went through the pandas read_csv documentation and a few other articles, but have no idea how to save them into a text file.

Example input file:

    a,b,c
    # Some Junk to Skip 1
    4,5,6
    # Some junk to skip 2
    9,20,9
    2,3,4
    5,6,7

Code:

    skiprows = [1, 3]
    df = pandas.read_csv(file, skiprows=skiprows)

Now output.txt
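One way to capture the skipped lines, sketched under the assumption that the input is a local file "input.csv" and that skiprows holds 0-based line numbers (pandas' convention for a list):

    import pandas as pd

    skiprows = [1, 3]
    df = pd.read_csv("input.csv", skiprows=skiprows)

    # Re-read the raw file and write only the skipped lines out.
    with open("input.csv") as src, open("output.txt", "w") as out:
        for lineno, line in enumerate(src):
            if lineno in skiprows:
                out.write(line)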

R - Changing encoding of a column in a dataframe?

最后都变了- submitted on 2021-02-04 15:32:29
Question: I am trying to change the encoding of a column in a dataframe.

    stri_enc_mark(data_updated$text)
    #  [1] "UTF-8" "ASCII" "ASCII" "UTF-8" "ASCII" "ASCII" "UTF-8" "UTF-8" "UTF-8"
    # [10] "ASCII" "ASCII" "UTF-8" "ASCII" "UTF-8" "ASCII" "UTF-8" "ASCII" "UTF-8"
    # [19] "ASCII" "UTF-8" "ASCII" "UTF-8" "ASCII" "UTF-8" "UTF-8" "ASCII" "ASCII"
    # [28] "ASCII" "ASCII" "UTF-8" "ASCII" "ASCII" "ASCII" "UTF-8" "UTF-8" "ASCII"

When I try to convert it, it does not throw an error, but still has no effect on the
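A minimal sketch of a conversion with stringi (the question already uses stri_enc_mark, so stringi is available). Note that pure-ASCII strings keep reporting "ASCII" because ASCII is a subset of UTF-8, which can make a successful conversion look like it had no effect:

    library(stringi)
    data_updated$text <- stri_enc_toutf8(data_updated$text)
    stri_enc_mark(data_updated$text)  # ASCII-only strings will still show "ASCII"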

Pandas Dataframe check if a value exists using regex

China☆狼群 submitted on 2021-02-04 15:28:27
Question: I have a big dataframe and I want to check if any cell contains the string "admin".

      col1                   col2  ...                   coln
    0  323           roster_admin  ...              rota_user
    1  542  assignment_rule_admin  ...      application_admin
    2  123           contact_user  ...  configuration_manager
    3  235         admin_incident  ...          incident_user
    ...  ...                  ...  ...                    ...

I tried to use df.isin(['*admin*']).any() but it seems like isin doesn't support regex. How can I search through all columns using regex? I have avoided using loops because the dataframe contains over 10 million
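str.contains does accept a regex, so one loop-free sketch is to apply it column-wise over the string columns (the sample data below is re-typed from the question's excerpt):

    import pandas as pd

    df = pd.DataFrame({
        "col1": [323, 542, 123, 235],
        "col2": ["roster_admin", "assignment_rule_admin", "contact_user", "admin_incident"],
        "coln": ["rota_user", "application_admin", "configuration_manager", "incident_user"],
    })

    # Regex match per string column; na=False keeps missing values falsy.
    mask = df.select_dtypes(include="object").apply(
        lambda col: col.str.contains("admin", regex=True, na=False)
    )
    print(mask.any().any())  # True -> at least one cell matches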

Remove columns with same value from a dataframe

久未见 submitted on 2021-02-04 15:08:30
Question: I've got a data frame like this one:

    1 1 1 K 1 K K
    2 1 2 K 1 K K
    3 8 3 K 1 K K
    4 8 2 K 1 K K
    1 1 1 K 1 K K
    2 1 2 K 1 K K

I want to remove all the columns with the same value, i.e. K, so my result will be like this:

    1 1 1 1
    2 1 2 1
    3 8 3 1
    4 8 2 1
    1 1 1 1
    2 1 2 1

I tried to iterate over the columns with a for loop but didn't get anywhere. Any ideas?

Answer 1: To select columns with more than one value, regardless of type:

    uniquelength <- sapply(d, function(x) length(unique(x)))
    d <- subset(d, select = uniquelength > 1)
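Trying the answer on the question's data (column names v1..v7 are made up here, since the original frame is unnamed):

    d <- data.frame(v1 = c(1,2,3,4,1,2), v2 = c(1,1,8,8,1,1),
                    v3 = c(1,2,3,2,1,2), v4 = "K", v5 = 1,
                    v6 = "K", v7 = "K")
    uniquelength <- sapply(d, function(x) length(unique(x)))
    subset(d, select = uniquelength > 1)

Note that this drops every constant column, including the numeric v5 that the question's expected output keeps; restrict the test to character columns if that column should survive.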

How to select duplicate rows with pandas?

空扰寡人 submitted on 2021-02-04 10:59:23
Question: I have a dataframe like this:

    import pandas as pd
    dic = {'A': [100, 200, 250, 300],
           'B': ['ci', 'ci', 'po', 'pa'],
           'C': ['s', 't', 'p', 'w']}
    df = pd.DataFrame(dic)

My goal is to split the rows into 2 dataframes: df1 contains all the rows that do not repeat values along column B (unique rows); df2 contains only the rows that repeat themselves. The result should look like this:

    df1 =      A   B  C    df2 =      A   B  C
          0  250  po  p          0  100  ci  s
          1  300  pa  w          1  250  ci  t

Note: the dataframes could be in general very big and
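duplicated(keep=False) flags every row whose column-B value occurs more than once, which splits the frame in two without a loop; a minimal sketch:

    import pandas as pd

    df = pd.DataFrame({'A': [100, 200, 250, 300],
                       'B': ['ci', 'ci', 'po', 'pa'],
                       'C': ['s', 't', 'p', 'w']})

    dup = df.duplicated(subset='B', keep=False)
    df2 = df[dup].reset_index(drop=True)    # rows with repeated B values
    df1 = df[~dup].reset_index(drop=True)   # rows with unique B values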

Correlation between two dataframes by row

瘦欲@ submitted on 2021-02-04 10:22:13
Question: I have 2 data frames with 5 columns and 100 rows each:

    id price1 price2 price3 price4 price5
    1  11.22  25.33  66.47  53.76  77.42
    2  33.56  33.77  44.77  34.55  57.42
    ...

I would like to get the correlation of the corresponding rows, basically

    for (i in 1:100) {
      cor(df1[i, 1:5], df2[i, 1:5])
    }

but without using a for loop. I'm assuming there's some way to use plyr to do it but can't seem to get it right. Any suggestions?

Answer 1: Depending on whether you want a cool or fast solution you can use either diag(cor
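The truncated answer is reaching for diag(cor(...)). A row-wise sketch (not necessarily the answer's exact code, and using random data here): transpose both frames so rows become columns, correlate, and keep the diagonal:

    set.seed(1)
    df1 <- as.data.frame(matrix(runif(500), nrow = 100))
    df2 <- as.data.frame(matrix(runif(500), nrow = 100))

    # cor() correlates columns, so transpose first; the diagonal pairs
    # row i of df1 with row i of df2.
    row_cors <- diag(cor(t(df1), t(df2)))
    head(row_cors)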

Is there an R dplyr method for merge with all=TRUE?

泄露秘密 submitted on 2021-02-04 09:57:29
Question: I have two R dataframes I want to merge. In base R you can do:

    cost <- data.frame(farm = c('farm A', 'office'), cost = c(10, 100))
    trees <- data.frame(farm = c('farm A', 'farm B'), trees = c(20, 30))
    merge(cost, trees, all = TRUE)

which produces:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA
    3 farm B   NA    30

I am using dplyr, and would prefer a solution such as left_join(cost, trees), which produces something close to what I want:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA

In dplyr I can see
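dplyr's full_join() is the direct counterpart of merge(..., all = TRUE):

    library(dplyr)

    cost <- data.frame(farm = c('farm A', 'office'), cost = c(10, 100))
    trees <- data.frame(farm = c('farm A', 'farm B'), trees = c(20, 30))
    full_join(cost, trees, by = "farm")  # keeps unmatched rows from both sides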
