dataframe

Read multiple csv files into separate pandas dataframes

折月煮酒 submitted on 2021-02-08 07:49:57
Question: I've seen a few answers on reading multiple CSV files into separate Pandas dataframes, and I'm still running into trouble. I've read my CSV files and file names into a dictionary:

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

This seems to work:

    print(df_dict['file1'])

However, what I'm looking for is a Pandas dataframe called 'file1' through which I can access the data. Is it possible to get this
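
A minimal sketch of the usual approach, assuming the same file_names list: keep using the dictionary (the cleaner option), or, if top-level variables named after the files are really wanted, create them explicitly.

    import os
    import pandas as pd

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']

    # Read every CSV into a dict keyed by file name, as in the question.
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

    # Option 1 (usually preferred): bind the entry you need to a plain variable.
    file1 = df_dict['file1']

    # Option 2 (works, but generally discouraged): create a module-level variable
    # for every file name by writing into globals().
    for name, frame in df_dict.items():
        globals()[name] = frame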

Set the color for scatter-plot with DataFrame.plot

不羁岁月 submitted on 2021-02-08 07:49:51
Question: I am using Python to plot a pandas DataFrame. I set the color for plotting like this:

    allDf = pd.DataFrame({
        'x': [0, 1, 2, 4, 7, 6],
        'y': [0, 3, 2, 4, 5, 7],
        'a': [1, 1, 1, 0, 0, 0],
        'c': ['red', 'green', 'blue', 'red', 'green', 'blue']
    }, index=['p1', 'p2', 'p3', 'p4', 'p5', 'p6'])
    allDf.plot(kind='scatter', x='x', y='y', c='c')
    plt.show()

However, it doesn't work (every point is drawn in blue). If I change the definition of the DataFrame so that 'c': [1, 2, 1, 2, 1, 2], colors do appear, but only in black and white. I want to use blue
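
A minimal sketch of a common fix, assuming the DataFrame above: pass the color values themselves rather than the column name (how pandas interprets c='c' has varied across versions), or call matplotlib's scatter directly.

    import pandas as pd
    import matplotlib.pyplot as plt

    allDf = pd.DataFrame({
        'x': [0, 1, 2, 4, 7, 6],
        'y': [0, 3, 2, 4, 5, 7],
        'a': [1, 1, 1, 0, 0, 0],
        'c': ['red', 'green', 'blue', 'red', 'green', 'blue'],
    }, index=['p1', 'p2', 'p3', 'p4', 'p5', 'p6'])

    # Pass the actual color values so each point gets its own color string.
    allDf.plot(kind='scatter', x='x', y='y', c=allDf['c'])

    # Equivalent call through matplotlib directly.
    plt.scatter(allDf['x'], allDf['y'], c=allDf['c'])
    plt.show()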

pySpark adding columns from a list

非 Y 不嫁゛ submitted on 2021-02-08 07:38:35
Question: I have a dataframe and would like to add columns to it based on values from a list. The list will vary from 3 to 50 values. I'm new to pySpark and I'm trying to append these values as new (empty) columns to my df. I've seen recommended code for adding [one column][1] to a dataframe, but not multiple columns from a list.

    mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName',
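
A minimal sketch of one way to do this, using a small stand-in DataFrame (the question's actual data is not shown): loop over the list and add a null column per name with withColumn and lit(None).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Stand-in for the questioner's dataframe.
    df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'value'])

    mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId',
              'ConformedLeaseTypeId']

    # Add one empty (null) column per name in the list.
    for col_name in mylist:
        df = df.withColumn(col_name, F.lit(None).cast('string'))

    # The same in a single select, which avoids growing the plan one column at a time:
    # df = df.select('*', *[F.lit(None).cast('string').alias(c) for c in mylist])

    df.printSchema()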

R: Compare dates in two dataframes and isolate rows that match within a certain time period in R

做~自己de王妃 submitted on 2021-02-08 07:31:57
Question: I have two dataframes in R:

    df1:
      ID  Date Discharged
       1  2014-08-04
       2  2014-12-10
       3  2015-01-01

    df2:
      ID  Check-in-Date
       1  2013-01-02
       1  2014-08-11
       2  2014-12-14
       2  2015-05-01
       3  2012-05-06
       3  2015-01-05

I need to compare df1 with df2 based on ID and see which person checked in for another appointment within 7 days of being discharged. How would I accomplish this, given that df2 has duplicate IDs? I'd like to create a new column in df1 with 1 if the person checked in and 0 if they didn't. I also need a new
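
The question concerns R, so the sketch below is only an illustration of the matching logic, written in pandas with the sample values shown above: join the two tables on ID, compute the day difference, and flag IDs that have any check-in 0-7 days after discharge.

    import pandas as pd

    df1 = pd.DataFrame({'ID': [1, 2, 3],
                        'Discharged': ['2014-08-04', '2014-12-10', '2015-01-01']})
    df2 = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                        'CheckIn': ['2013-01-02', '2014-08-11', '2014-12-14',
                                    '2015-05-01', '2012-05-06', '2015-01-05']})
    df1['Discharged'] = pd.to_datetime(df1['Discharged'])
    df2['CheckIn'] = pd.to_datetime(df2['CheckIn'])

    # Pair every discharge with every check-in for the same ID, then keep
    # check-ins falling 0-7 days after the discharge date.
    merged = df1.merge(df2, on='ID')
    merged['within7'] = (merged['CheckIn'] - merged['Discharged']).dt.days.between(0, 7)

    # 1 if the person had any check-in inside the window, otherwise 0.
    flag = merged.groupby('ID')['within7'].any().astype(int)
    df1['checked_in_within_7_days'] = df1['ID'].map(flag).fillna(0).astype(int)
    print(df1)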

How to create a Pandas DataFrame from a list of lists with different lengths?

社会主义新天地 submitted on 2021-02-08 07:27:31
Question: I have data in the following format:

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

and I would like a DataFrame with all unique strings as columns and binary occurrence values, like this:

       a  b  c  d  e  f
    0  1  1  1  0  0  0
    1  0  1  1  0  0  0
    2  0  0  1  1  1  1

I have working code using list comprehensions, but it's pretty slow for large data.

    # vocab_list contains all the unique keys, which is obtained when reading in data from file
    df = pd.DataFrame([[1 if word in entry else 0 for word in
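
A minimal sketch of two common faster alternatives, assuming the data shown above: scikit-learn's MultiLabelBinarizer, or a pandas-only route through str.get_dummies.

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

    # Build the 0/1 indicator matrix in one vectorized pass.
    mlb = MultiLabelBinarizer()
    df = pd.DataFrame(mlb.fit_transform(data), columns=mlb.classes_)
    print(df)

    # pandas-only alternative: join each row into one string, then get_dummies.
    df2 = pd.Series(['|'.join(row) for row in data]).str.get_dummies(sep='|')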

Wrap a 2 Column Data Frame in R

烈酒焚心 submitted on 2021-02-08 06:57:23
Question: Let's say I have a data frame with x values (10 in this example) and 2 columns. Is it possible to print that data frame and have it wrap its output across a desired number of rows, rather than print as just x rows? An example below with 10 values.

    Current output:
       V1           V2
        1  -0.54850033
        2  -0.41569523
        3   1.25346656
        4   2.08200119
        5   1.18916344
        .   ..........
       10   0.18345154

    Desired output:
       V1           V2   V1           V2
        1  -0.54850033    6  -0.45362345
        2  -0.41569523    7   1.23466542
        3   1.25346656    8   2.98907097
        4   2.08200119    9
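
The question is about an R data frame, but the wrapping itself is just a reshape; the sketch below only illustrates the idea, in pandas with placeholder data: split the frame into fixed-size row blocks and concatenate the blocks side by side.

    import numpy as np
    import pandas as pd

    # Placeholder two-column frame, not the asker's actual values.
    df = pd.DataFrame({'V1': range(1, 11), 'V2': np.random.randn(10)})

    # Break into blocks of 5 rows and glue them together column-wise.
    n = 5
    blocks = [df.iloc[i:i + n].reset_index(drop=True) for i in range(0, len(df), n)]
    wrapped = pd.concat(blocks, axis=1)
    print(wrapped.to_string(index=False))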

How to use previous N values in pandas column to fill NaNs?

烈酒焚心 submitted on 2021-02-08 06:52:11
Question: Say I have time series data as below.

    df
       priceA  priceB
    0   25.67   30.56
    1   34.12   28.43
    2   37.14   29.08
    3     NaN   34.23
    4   32        NaN
    5   18.75   41.1
    6     NaN   45.12
    7   23      39.67
    8     NaN   36.45
    9   36        NaN

Now I want to fill the NaNs in column priceA with the mean of the previous N values in that column; in this case take N=3. And for column priceB I have to fill each NaN with the value M rows above (current index - M). I tried to write a for loop for it, which is not good practice as my data is too large. Is there a better way to do this?

    N=3
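
A minimal vectorized sketch of both fill rules, assuming M = 2 (the excerpt does not give M's value) and that each rolling mean is computed from the original values only; if a filled value must feed later windows, an explicit loop is still needed.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'priceA': [25.67, 34.12, 37.14, np.nan, 32, 18.75, np.nan, 23, np.nan, 36],
        'priceB': [30.56, 28.43, 29.08, 34.23, np.nan, 41.1, 45.12, 39.67, 36.45, np.nan],
    })

    N = 3
    M = 2  # assumed; not shown in the excerpt

    # priceA: mean of the previous N values (shift(1) keeps the window strictly before the row).
    prev_mean = df['priceA'].rolling(N, min_periods=1).mean().shift(1)
    df['priceA'] = df['priceA'].fillna(prev_mean)

    # priceB: value from M rows above.
    df['priceB'] = df['priceB'].fillna(df['priceB'].shift(M))

    print(df)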
