pandas

Read multiple csv files into separate pandas dataframes

梦想的初衷 submitted on 2021-02-08 07:49:11
Question: I've seen a few answers on reading multiple CSV files into separate pandas DataFrames, and am still running into trouble. I've read my CSV files and file names into a dictionary:

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

This seems to work:

    print(df_dict['file1'])

However, what I'm looking for is a pandas DataFrame called 'file1' through which I can access the data. Is it possible to get this
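
As a minimal sketch of working with the dictionary from the question: each value in df_dict is already a full DataFrame, so it can be bound to an ordinary variable name when a short handle is wanted. The in-memory CSV text and the column names a and b are assumptions for illustration; they stand in for the real files, which the question does not show.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for file1.csv and file3.csv.
csv_sources = {
    "file1": "a,b\n1,2\n3,4\n",
    "file3": "a,b\n5,6\n",
}

# Same pattern as the question: one DataFrame per name, keyed by that name.
df_dict = {name: pd.read_csv(io.StringIO(text)) for name, text in csv_sources.items()}

# Each dictionary value is a regular DataFrame; bind it to a variable if desired.
file1 = df_dict["file1"]
total_a = int(file1["a"].sum())
print(file1)
```

Keeping the frames in the dictionary and binding names only where needed avoids creating variables dynamically, which tends to be harder to debug.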

Comparing two Excel files with pandas

与世无争的帅哥 submitted on 2021-02-08 07:49:04
Question: I have two Excel files, A and B. A is the master copy, which holds the up-to-date employee name and organization name (Name and Org). File B contains Name and Org columns with slightly older records, plus many other columns that we are not interested in.

      Name          Org
    0  abc  ddc systems
    1  sdc  ddc systems
    2  csc  ddd systems
    3  rdc      kbf org
    4  rfc      kbf org

I want to do two operations on this: 1) I want to compare Excel B (columns Name and Org) with Excel A (columns Name and Org) and update file B with all
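
The comparison step described above could be sketched as a left merge on Name. This is only an illustration under assumptions: the two frames are built inline instead of being read with pd.read_excel, the Other column and the "old org" value are made up, and the update step is left out because the question is cut off before it is fully stated.

```python
import pandas as pd

# Hypothetical stand-in for file A (master copy).
a = pd.DataFrame({
    "Name": ["abc", "sdc", "csc"],
    "Org": ["ddc systems", "ddc systems", "ddd systems"],
})

# Hypothetical stand-in for file B (older records plus an extra column).
b = pd.DataFrame({
    "Name": ["abc", "sdc", "csc"],
    "Org": ["ddc systems", "old org", "ddd systems"],
    "Other": [1, 2, 3],
})

# Line up B's rows with the master copy by Name.
merged = b.merge(a, on="Name", how="left", suffixes=("_b", "_a"))

# Rows whose Org differs from the master copy. A Name missing from A
# yields NaN in Org_a and is therefore flagged as well.
changed = merged[merged["Org_b"] != merged["Org_a"]]
changed_names = changed["Name"].tolist()
print(changed)
```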

Improve speed parsing XML with elements and namespace, into Pandas

馋奶兔 submitted on 2021-02-08 07:39:23
Question: So I have a 52 MB XML file, which consists of 115139 elements.

    from lxml import etree
    tree = etree.parse(file)
    root = tree.getroot()

    In [76]: len(root)
    Out[76]: 115139

I have this function that iterates over the elements within root and inserts each parsed element into a pandas DataFrame.

    def fnc_parse_xml(file, columns):
        start = datetime.datetime.now()
        df = pd.DataFrame(columns=columns)
        tree = etree.parse(file)
        root = tree.getroot()
        xmlns = './/{' + root.nsmap[None] + '}'
        for loc,e in
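
One commonly suggested way to speed up this kind of loop is to collect the parsed rows in a plain list and build the DataFrame once at the end, since growing a DataFrame row by row is slow. The sketch below assumes that row-by-row insertion is the bottleneck, which the truncated code does not confirm. It also swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra dependencies; the sample XML, the namespace URI, and the function name parse_xml_to_df are hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample document with a default namespace.
XML = """<root xmlns="http://example.com/ns">
  <item><name>a</name><value>1</value></item>
  <item><name>b</name><value>2</value></item>
</root>"""


def parse_xml_to_df(xml_text, columns):
    root = ET.fromstring(xml_text)
    # Namespaced tags look like '{uri}tag'; reuse the root's '{uri}' prefix.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    rows = []
    for elem in root:
        # One dict per element, looking up each column as a direct child.
        rows.append({col: elem.findtext(ns + col) for col in columns})
    # Build the DataFrame once, after the loop.
    return pd.DataFrame(rows, columns=columns)


df = parse_xml_to_df(XML, ["name", "value"])
print(df)
```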

Date issue with scatter and LinearRegression

ⅰ亾dé卋堺 submitted on 2021-02-08 07:29:54
Question: I have two issues, and I believe both are related to the date format. I have a CSV with dates and values:

    2012-01-03 00:00:00 95812
    2012-01-04 00:00:00 101265
    ...
    2016-10-21 00:00:00 93594

After I load it with read_csv, I try to parse the date with:

    X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')

I also tried with:

    dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)

and
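
As a rough sketch of one way to get dates into a form a linear fit can use: parse the column with parse_dates, then convert it to a plain number such as days since the first date. The comma-separated sample, the Sales column name, and its values are assumptions (only Dated appears in the question), and numpy.polyfit is used here in place of scikit-learn's LinearRegression to keep the example dependency-light.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; the real file layout is not fully shown.
CSV = (
    "Dated,Sales\n"
    "2012-01-03 00:00:00,10\n"
    "2012-01-04 00:00:00,12\n"
    "2012-01-05 00:00:00,14\n"
)

# parse_dates converts the column to datetime64 while reading.
X = pd.read_csv(io.StringIO(CSV), parse_dates=["Dated"])

# Regression needs numbers, so express each date as days since the first one.
days = (X["Dated"] - X["Dated"].min()).dt.days.to_numpy()

# Straight-line fit: Sales ~ slope * days + intercept.
slope, intercept = np.polyfit(days, X["Sales"].to_numpy(), deg=1)
print(slope, intercept)
```

The original datetime column can still be used for the x axis of a scatter plot, while the numeric days array feeds the fit.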

Add dataframe and button to same sheet with XlsxWriter

家住魔仙堡 submitted on 2021-02-08 07:29:35
Question: I am able to create an Excel file with the data from a DataFrame in one sheet and a button to run a macro in a second sheet. What I need is to have both the data from the DataFrame and the button in the same sheet. This is the code I found and have tried to modify:

    import pandas as pd
    import xlsxwriter
    df = pd.DataFrame({'Data': [10, 20, 30, 40]})
    writer = pd.ExcelWriter('hellot.xlsx', engine='xlsxwriter')
    worksheet = workbook.add_worksheet()
    #df.to_excel(writer, sheet_name='Sheet1')

Plotly: How to change the format of the values for the x axis?

ぐ巨炮叔叔 submitted on 2021-02-08 07:29:34
Question: I need to create a graph from data with Python. I took my inspiration from various websites and wrote this script:

    import plotly.express as px
    import plotly.graph_objs as go
    import statsmodels.api as sm
    value = [1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 8]
    date = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
    fig = px.scatter(x=date, y=value)
    fig.add_trace(go.Scatter(x=date, y=value, mode='lines', name='MB Used'))
    trend = sm.OLS(value,sm.add_constant(date)).fit().fittedvalues

How to create a Pandas DataFrame from a list of lists with different lengths?

社会主义新天地 submitted on 2021-02-08 07:27:31
Question: I have data in the following format

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

and I would like to have a DataFrame with all unique strings as columns and binary values of occurrence, like this:

       a  b  c  d  e  f
    0  1  1  1  0  0  0
    1  0  1  1  0  0  0
    2  0  0  1  1  1  1

I have working code using list comprehensions, but it's pretty slow for large data.

    # vocab_list contains all the unique keys, which is obtained when reading in data from file
    df = pd.DataFrame([[1 if word in entry else 0 for word in
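
One possible alternative to the nested list comprehension, sketched with the sample data from the question: explode the lists into one row per string, one-hot encode with get_dummies, and collapse back to one row per original list. Whether this is faster on the asker's data is not something the question lets us verify, so treat it as an option to benchmark, not a guaranteed fix.

```python
import pandas as pd

data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

# One row per (list index, string); the index repeats for each list's items.
s = pd.Series(data).explode()

# One-hot encode each string, then take the max per original list index
# so every list collapses to a single 0/1 row.
df = pd.get_dummies(s).groupby(level=0).max().astype(int)
print(df)
```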