dataframe

Read multiple csv files into separate pandas dataframes

折月煮酒 submitted on 2021-02-08 07:49:57
Question: I've seen a few answers on reading multiple CSV files into separate Pandas dataframes, and I'm still running into trouble. I've read my CSV files and file names into a dictionary:

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

This seems to work:

    print(df_dict['file1'])

However, what I'm looking for is a Pandas dataframe called 'file1' through which I can access the data. Is it possible to get this
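
A minimal sketch of the usual approach, assuming the same file_names list: keep using the dictionary (the cleaner option), or, if top-level variables named after the files are really wanted, create them explicitly.

    import os
    import pandas as pd

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']

    # Read every CSV into a dict keyed by file name, as in the question.
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

    # Option 1 (usually preferred): bind the entry you need to a plain variable.
    file1 = df_dict['file1']

    # Option 2 (works, but generally discouraged): create a module-level variable
    # for every file name by writing into globals().
    for name, frame in df_dict.items():
        globals()[name] = frame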

Set the color for scatter-plot with DataFrame.plot

不羁岁月 submitted on 2021-02-08 07:49:51
Question: I am using Python to plot a pandas DataFrame. I set the color for plotting like this:

    allDf = pd.DataFrame({
        'x': [0, 1, 2, 4, 7, 6],
        'y': [0, 3, 2, 4, 5, 7],
        'a': [1, 1, 1, 0, 0, 0],
        'c': ['red', 'green', 'blue', 'red', 'green', 'blue']
    }, index=['p1', 'p2', 'p3', 'p4', 'p5', 'p6'])
    allDf.plot(kind='scatter', x='x', y='y', c='c')
    plt.show()

However, it doesn't work (every point is drawn in blue). If I change the definition of the DataFrame so that 'c': [1, 2, 1, 2, 1, 2], colors do appear, but only in black and white. I want to use blue
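
A minimal sketch of a common fix, assuming the DataFrame above: pass the color values themselves rather than the column name (how pandas interprets c='c' has varied across versions), or call matplotlib's scatter directly.

    import pandas as pd
    import matplotlib.pyplot as plt

    allDf = pd.DataFrame({
        'x': [0, 1, 2, 4, 7, 6],
        'y': [0, 3, 2, 4, 5, 7],
        'a': [1, 1, 1, 0, 0, 0],
        'c': ['red', 'green', 'blue', 'red', 'green', 'blue'],
    }, index=['p1', 'p2', 'p3', 'p4', 'p5', 'p6'])

    # Pass the actual color values so each point gets its own color string.
    allDf.plot(kind='scatter', x='x', y='y', c=allDf['c'])

    # Equivalent call through matplotlib directly.
    plt.scatter(allDf['x'], allDf['y'], c=allDf['c'])
    plt.show()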

pySpark adding columns from a list

非 Y 不嫁゛ submitted on 2021-02-08 07:38:35
Question: I have a dataframe and would like to add columns to it based on values from a list. The list will vary from 3 to 50 values. I'm new to pySpark and I'm trying to append these values as new (empty) columns to my df. I've seen recommended code for adding [one column][1] to a dataframe, but not multiple columns from a list.

    mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName',
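
A minimal sketch of one way to do this, using a small stand-in DataFrame (the question's actual data is not shown): loop over the list and add a null column per name with withColumn and lit(None).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Stand-in for the questioner's dataframe.
    df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'value'])

    mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId',
              'ConformedLeaseTypeId']

    # Add one empty (null) column per name in the list.
    for col_name in mylist:
        df = df.withColumn(col_name, F.lit(None).cast('string'))

    # The same in a single select, which avoids growing the plan one column at a time:
    # df = df.select('*', *[F.lit(None).cast('string').alias(c) for c in mylist])

    df.printSchema()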

R: Compare dates in two dataframes and isolate rows that match within a certain time period in R

做~自己de王妃 submitted on 2021-02-08 07:31:57
Question: I have two dataframes in R:

    df1:
      ID  Date Discharged
       1  2014-08-04
       2  2014-12-10
       3  2015-01-01

    df2:
      ID  Check-in-Date
       1  2013-01-02
       1  2014-08-11
       2  2014-12-14
       2  2015-05-01
       3  2012-05-06
       3  2015-01-05

I need to compare df1 with df2 based on ID and see which person checked in for another appointment within 7 days of being discharged. How would I accomplish this, given that df2 has duplicate IDs? I'd like to create a new column in df1 with 1 if the person checked in and 0 if they didn't. I also need a new
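
The question concerns R, so the sketch below is only an illustration of the matching logic, written in pandas with the sample values shown above: join the two tables on ID, compute the day difference, and flag IDs that have any check-in 0-7 days after discharge.

    import pandas as pd

    df1 = pd.DataFrame({'ID': [1, 2, 3],
                        'Discharged': ['2014-08-04', '2014-12-10', '2015-01-01']})
    df2 = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                        'CheckIn': ['2013-01-02', '2014-08-11', '2014-12-14',
                                    '2015-05-01', '2012-05-06', '2015-01-05']})
    df1['Discharged'] = pd.to_datetime(df1['Discharged'])
    df2['CheckIn'] = pd.to_datetime(df2['CheckIn'])

    # Pair every discharge with every check-in for the same ID, then keep
    # check-ins falling 0-7 days after the discharge date.
    merged = df1.merge(df2, on='ID')
    merged['within7'] = (merged['CheckIn'] - merged['Discharged']).dt.days.between(0, 7)

    # 1 if the person had any check-in inside the window, otherwise 0.
    flag = merged.groupby('ID')['within7'].any().astype(int)
    df1['checked_in_within_7_days'] = df1['ID'].map(flag).fillna(0).astype(int)
    print(df1)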

How to create a Pandas DataFrame from a list of lists with different lengths?

社会主义新天地 submitted on 2021-02-08 07:27:31
Question: I have data in the following format:

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

and I would like a DataFrame with all unique strings as columns and binary occurrence values, like this:

       a  b  c  d  e  f
    0  1  1  1  0  0  0
    1  0  1  1  0  0  0
    2  0  0  1  1  1  1

I have working code using list comprehensions, but it's pretty slow for large data.

    # vocab_list contains all the unique keys, which is obtained when reading in data from file
    df = pd.DataFrame([[1 if word in entry else 0 for word in
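
A minimal sketch of two common faster alternatives, assuming the data shown above: scikit-learn's MultiLabelBinarizer, or a pandas-only route through str.get_dummies.

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

    # Build the 0/1 indicator matrix in one vectorized pass.
    mlb = MultiLabelBinarizer()
    df = pd.DataFrame(mlb.fit_transform(data), columns=mlb.classes_)
    print(df)

    # pandas-only alternative: join each row into one string, then get_dummies.
    df2 = pd.Series(['|'.join(row) for row in data]).str.get_dummies(sep='|')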

Wrap a 2 Column Data Frame in R

烈酒焚心 submitted on 2021-02-08 06:57:23
Question: Let's say I have a data frame with x values (10 in this example) and 2 columns. Is it possible to print that data frame and have it wrap its output across a desired number of rows, rather than print as just x rows? An example below with 10 values.

    Current output:
       V1           V2
        1  -0.54850033
        2  -0.41569523
        3   1.25346656
        4   2.08200119
        5   1.18916344
        .   ..........
       10   0.18345154

    Desired output:
       V1           V2   V1           V2
        1  -0.54850033    6  -0.45362345
        2  -0.41569523    7   1.23466542
        3   1.25346656    8   2.98907097
        4   2.08200119    9
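
The question is about an R data frame, but the wrapping itself is just a reshape; the sketch below only illustrates the idea, in pandas with placeholder data: split the frame into fixed-size row blocks and concatenate the blocks side by side.

    import numpy as np
    import pandas as pd

    # Placeholder two-column frame, not the asker's actual values.
    df = pd.DataFrame({'V1': range(1, 11), 'V2': np.random.randn(10)})

    # Break into blocks of 5 rows and glue them together column-wise.
    n = 5
    blocks = [df.iloc[i:i + n].reset_index(drop=True) for i in range(0, len(df), n)]
    wrapped = pd.concat(blocks, axis=1)
    print(wrapped.to_string(index=False))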

How to use previous N values in pandas column to fill NaNs?

烈酒焚心 submitted on 2021-02-08 06:52:11
Question: Say I have time series data as below.

    df
       priceA  priceB
    0   25.67   30.56
    1   34.12   28.43
    2   37.14   29.08
    3     NaN   34.23
    4   32        NaN
    5   18.75   41.1
    6     NaN   45.12
    7   23      39.67
    8     NaN   36.45
    9   36        NaN

Now I want to fill the NaNs in column priceA with the mean of the previous N values in that column; in this case take N=3. And for column priceB I have to fill each NaN with the value M rows above (current index - M). I tried to write a for loop for it, which is not good practice as my data is too large. Is there a better way to do this?

    N=3
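
A minimal vectorized sketch of both fill rules, assuming M = 2 (the excerpt does not give M's value) and that each rolling mean is computed from the original values only; if a filled value must feed later windows, an explicit loop is still needed.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'priceA': [25.67, 34.12, 37.14, np.nan, 32, 18.75, np.nan, 23, np.nan, 36],
        'priceB': [30.56, 28.43, 29.08, 34.23, np.nan, 41.1, 45.12, 39.67, 36.45, np.nan],
    })

    N = 3
    M = 2  # assumed; not shown in the excerpt

    # priceA: mean of the previous N values (shift(1) keeps the window strictly before the row).
    prev_mean = df['priceA'].rolling(N, min_periods=1).mean().shift(1)
    df['priceA'] = df['priceA'].fillna(prev_mean)

    # priceB: value from M rows above.
    df['priceB'] = df['priceB'].fillna(df['priceB'].shift(M))

    print(df)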
