dataframe | 易学教程

Creating new column based on multiple possible cell possibilities across several columns

Question:
    data[, allkneePR := Reduce(`|`, lapply(.SD, `==`, "0082")), .SDcols=PR1:PR3]
I'm trying to look for several diagnoses, c("0082", "0083", "0084"), across a range of rows and columns in a data.table (the dataset is huge). If the value in any of the columns PR1:PR3 is "0082", "0083", or "0084", I want another column that indicates TRUE. The code above works for a single diagnosis, but I am trying to check for multiple diagnoses, not just "0082". I tried the any() function, which doesn't work

How to convert “svyrep.design” to a data.frame?

Question: I'd like to convert a svyrep.design / survey.design object in R into a data frame. I'm aware that this object would be quite large.
    library(survey)
    data(api) # loads "apiclus2" sample data
    dclus2 <- svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
The code above applies weights to a data frame, turning it into a survey object.
    dclus2 = as.data.frame(dclus2)
Error message:
    # Error in as.data.frame.default(dclus2) :
    # cannot coerce class ‘c("survey.design2", "survey.design")’ to a data.frame
I'd

reshaping a data frame in pandas

Question: Is there a simple way in pandas to reshape the following data frame:
    df = pd.DataFrame({'n':[1,1,2,2,1,1,2,2], 'l':['a','b','a','b','a','b','a','b'], 'v':[12,43,55,19,23,52,61,39], 'g':[0,0,0,0,1,1,1,1] })
to this format?
    g  a1  b1  a2  b2
    0  12  43  55  19
    1  23  52  61  39
Answer 1:
    In [75]: df['ln'] = df['l'] + df['n'].astype(str)
    In [76]: df.set_index(['g', 'ln'])['v'].unstack('ln')
    Out[76]:
    ln  a1  a2  b1  b2
    g
    0   12  55  43  19
    1   23  61  52  39

    [2 rows x 4 columns]
If you need that ordering then:
    In [77]: df.set

Merging two Dataframes in R by ID, One is the subset of the other

Question: I have two dataframes in R: 'dfold' with 175 variables and 'dfnew' with 75 variables. The two dataframes are matched by a primary key ('pid'). dfnew is a subset of dfold: all variables in dfnew are also in dfold, but with updated, imputed values (no NAs anymore). At the same time, dfold has more variables, which I will need in the analysis phase. I would like to merge the two dataframes into dfmerge so as to update the common variables from dfnew --> dfold while at the same time retaining pre

Similarity between 2 dataframe columns

Question: I have two dataframes, and each has a column called Song. However, sometimes the songs are spelled differently. How can I use difflib (or something similar) to get the Song spelling from one dataframe into a new column of the other dataframe? Example:
Dataframe1
    Song          Artist
    like a virgi  madonna
Dataframe2
    Song           Rank
    like a virgin  2
Result
    Song           Artist   SongAlt
    like a virgin  Madonna  like a virgi
Answer 1: Step 1: Merge whatever can be merged
    In [67]: df1
    Out[67]:
       Song          Artist
    0  mysong        myartist
    1  like a virgi
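Since the answer excerpt above is cut off, here is a minimal sketch of the fuzzy-matching idea the question asks about, using only the standard-library difflib module. It is not the original answer's method. The sample titles, the helper name closest_song, and the 0.6 cutoff are assumptions chosen for illustration; in pandas the same helper could be applied to a Song column, but that step is left out here.

```python
import difflib

# Hypothetical sample data modelled on the question (not from the original post).
ranked_songs = ["like a virgin", "material girl", "vogue"]   # spellings in Dataframe2
artist_songs = ["like a virgi", "vouge", "unknown track"]    # spellings in Dataframe1


def closest_song(name, candidates, cutoff=0.6):
    """Return the candidate most similar to `name`, or None if none reaches `cutoff`."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Map every spelling in Dataframe1 to its best match in Dataframe2.
song_alt = {name: closest_song(name, ranked_songs) for name in artist_songs}

for original, matched in song_alt.items():
    print(f"{original!r} -> {matched!r}")
```

The cutoff is a similarity ratio between 0 and 1; raising it reduces false matches at the cost of leaving more titles unmatched, so it is worth tuning on real data.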

Create bins with awk histogram-like

Question: Here's my input file:
    1.37987
    1.21448
    0.624999
    1.28966
    1.77084
    1.088
    1.41667
I would like to create bins of a size of my choice to get histogram-like output, e.g. something like this for bins of 0.1, starting from 0:
    0 0.1 0
    ...
    0.5 0.6 0
    0.6 0.7 1
    ...
    1.0 1.1 1
    1.1 1.2 0
    1.2 1.3 2
    1.3 1.4 1
    ...
My file is too big for R, so I'm looking for an awk solution (I'm also open to anything else that I can understand, as I'm still a Linux beginner). This was sort of already answered in this post: awk
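The question asks for awk but is open to alternatives, so the following is a minimal Python sketch of fixed-width binning, not the awk solution the post refers to. The function name bin_counts and its parameters are hypothetical. It consumes values one at a time, so in principle it could be fed the lines of a large file (converted with float) without loading everything into memory.

```python
import math
from collections import Counter


def bin_counts(values, width, start=0.0):
    """Count how many values fall into each bin [start + i*width, start + (i+1)*width)."""
    counts = Counter()
    for value in values:
        # Index of the bin containing this value. Values lying exactly on a
        # bin edge are subject to ordinary floating-point rounding.
        counts[math.floor((value - start) / width)] += 1
    return counts


# The sample values from the question.
values = [1.37987, 1.21448, 0.624999, 1.28966, 1.77084, 1.088, 1.41667]
width = 0.1
counts = bin_counts(values, width)

# Print every bin from 0 up to the highest occupied one, including empty bins.
for i in range(max(counts) + 1):
    print(f"{i * width:.1f} {(i + 1) * width:.1f} {counts[i]}")
```

A Counter returns 0 for bins that never received a value, which is what lets the loop print the empty rows the question's sample output shows.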

Set the color for scatter-plot with DataFrame.plot

Question: I am using Python to plot a pandas DataFrame, and I set the plot colors like this:
    allDf = pd.DataFrame({ 'x':[0,1,2,4,7,6], 'y':[0,3,2,4,5,7], 'a':[1,1,1,0,0,0], 'c':['red','green','blue','red','green','blue'] },index = ['p1','p2','p3','p4','p5','p6'])
    allDf.plot(kind='scatter',x='x',y='y',c='c')
    plt.show()
However, it doesn't work (every point is blue). If I change the DataFrame definition to this:
    'c':[1,2,1,2,1,2]
the points are colored, but only in black and white. I want to use blue
