pandas | 易学教程

Read multiple csv files into separate pandas dataframes

Question: I've seen a few answers on reading multiple CSV files into separate pandas DataFrames, and am still running into trouble. I've read my CSV files and file names into a dictionary:

    path = os.getcwd()
    file_names = ['file1', 'thisisanotherfile', 'file3']
    df_dict = {x: pd.read_csv('{}/{}.csv'.format(path, x)) for x in file_names}

Which seems to work:

    print(df_dict['file1'])

However, what I'm looking for is a pandas DataFrame called 'file1' where I can access the data. Is it possible to get this
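A minimal sketch of one way to look at this: each value in the dictionary is already a complete DataFrame, so it can be used directly or bound to an ordinary variable. The temporary directory and the `value` column are assumptions made only so the example runs on its own; the file names are the ones from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: write three tiny CSV files into a temporary directory
# so the example is self-contained. Only the file names come from the question.
path = tempfile.mkdtemp()
file_names = ['file1', 'thisisanotherfile', 'file3']
for i, name in enumerate(file_names):
    sample = pd.DataFrame({'value': [i, i + 1]})
    sample.to_csv(os.path.join(path, name + '.csv'), index=False)

# One DataFrame per file, keyed by file name (same idea as in the question).
df_dict = {
    name: pd.read_csv(os.path.join(path, name + '.csv'))
    for name in file_names
}

# Each dictionary value is a regular DataFrame; bind it to a name if that is handier.
file1 = df_dict['file1']
print(file1.head())
```

Keeping the frames in the dictionary is usually easier to maintain than creating variables dynamically, since the set of names stays in one place.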

Comparing two excel file with pandas

Question: I have two Excel files, A and B. A is the master copy, where the up-to-date record of each employee's name and organization (columns Name and Org) is available. File B contains Name and Org columns with slightly older records, plus many other columns that we are not interested in.

      Name  Org
    0 abc   ddc systems
    1 sdc   ddc systems
    2 csc   ddd systems
    3 rdc   kbf org
    4 rfc   kbf org

I want to do two operations on this: 1) I want to compare Excel B (columns Name and Org) with Excel A (columns Name and Org) and update file B with all
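As a rough sketch of the comparison step only, the two sheets can be represented as DataFrames and merged on both columns with `indicator=True` to find rows in B that have no exact match in A. The assumptions here are loud ones: the sample rows are treated as file B, the contents of master A are invented for illustration, and in-memory frames stand in for what `pd.read_excel` would return.

```python
import pandas as pd

# Sample rows quoted in the question, treated here as file B (an assumption).
b = pd.DataFrame({
    'Name': ['abc', 'sdc', 'csc', 'rdc', 'rfc'],
    'Org': ['ddc systems', 'ddc systems', 'ddd systems', 'kbf org', 'kbf org'],
})

# Hypothetical master copy A: identical except for one corrected Org value.
a = b.copy()
a.loc[a['Name'] == 'csc', 'Org'] = 'ddc systems'

# Left-merge B against A on both columns; '_merge' records where each row was found.
compared = b.merge(a, on=['Name', 'Org'], how='left', indicator=True)

# Rows present only in B are the ones whose (Name, Org) pair differs from the master.
outdated = compared.loc[compared['_merge'] == 'left_only', ['Name', 'Org']]
print(outdated)
```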

Improve speed parsing XML with elements and namespace, into Pandas

Question: I have a 52 MB XML file, which consists of 115139 elements.

    from lxml import etree
    tree = etree.parse(file)
    root = tree.getroot()

    In [76]: len(root)
    Out[76]: 115139

I have this function that iterates over the elements within root and inserts each parsed element into a pandas DataFrame.

    def fnc_parse_xml(file, columns):
        start = datetime.datetime.now()
        df = pd.DataFrame(columns=columns)
        tree = etree.parse(file)
        root = tree.getroot()
        xmlns = './/{' + root.nsmap[None] + '}'
        for loc,e in
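Inserting rows into a DataFrame one at a time is a common cause of slowness in loops like this, so one option is to collect plain dictionaries in a list and build the DataFrame once at the end. The sketch below illustrates that pattern under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages, and the XML document, namespace URI, and column names are all hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical document with a default namespace, standing in for the real file.
xml_text = """<root xmlns="http://example.com/ns">
  <item><id>1</id><name>alpha</name></item>
  <item><id>2</id><name>beta</name></item>
</root>"""

columns = ['id', 'name']  # hypothetical column names

root = ET.fromstring(xml_text)

# ElementTree stores the namespace in the tag as '{uri}root'; extract the URI.
namespace = root.tag[1:root.tag.index('}')]
prefix = './/{' + namespace + '}'

# Collect one plain dict per element instead of growing a DataFrame row by row.
rows = []
for element in root:
    row = {}
    for col in columns:
        child = element.find(prefix + col)
        row[col] = child.text if child is not None else None
    rows.append(row)

# Build the DataFrame in a single step at the end.
df = pd.DataFrame(rows, columns=columns)
print(df)
```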

Date issue with scatter and LinearRegression

Question: I have two issues, and I believe both are related to the date format. I have a CSV with dates and values:

    2012-01-03 00:00:00 95812
    2012-01-04 00:00:00 101265
    ...
    2016-10-21 00:00:00 93594

After I load it with read_csv, I'm trying to parse the date with:

    X.Dated = pd.to_datetime(X.Dated, format='%Y-%m-%d %H:%M:%S', errors='raise')

I also tried with:

    dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    X = pd.read_csv('sales.csv', parse_dates=['Dated'], date_parser=dateparse)

and
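A possible starting point, sketched under assumptions: parse the column with `pd.to_datetime`, then derive a numeric column from it, since regression models such as scikit-learn's `LinearRegression` expect numeric features rather than timestamps. The comma-separated layout and the `Value` column name are assumptions; the `Dated` column name, the format string, and the three sample rows come from the question.

```python
import io

import pandas as pd

# Three rows from the question; the comma separator and 'Value' header are assumed.
csv_text = (
    "Dated,Value\n"
    "2012-01-03 00:00:00,95812\n"
    "2012-01-04 00:00:00,101265\n"
    "2016-10-21 00:00:00,93594\n"
)

X = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into real timestamps, using the format from the question.
X['Dated'] = pd.to_datetime(X['Dated'], format='%Y-%m-%d %H:%M:%S', errors='raise')

# Derive a plain integer per date (proleptic Gregorian ordinal) for numeric models.
X['DateOrdinal'] = X['Dated'].map(lambda ts: ts.toordinal())

print(X.dtypes)
print(X)
```

The `Dated` column can still be used for plotting, while `DateOrdinal` serves as the numeric feature.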

Add dataframe and button to same sheet with XlsxWriter

Question: I am able to create an Excel file that has the data from a DataFrame in one sheet and a button to run a macro in a second sheet. What I need is to have both the data from the DataFrame and the button in the same sheet. This is the code I found and have tried to modify:

    import pandas as pd
    import xlsxwriter
    df = pd.DataFrame({'Data': [10, 20, 30, 40]})
    writer = pd.ExcelWriter('hellot.xlsx', engine='xlsxwriter')
    worksheet = workbook.add_worksheet()
    #df.to_excel(writer, sheet_name='Sheet1')

Plotly: How to change the format of the values for the x axis?

Question: I need to create a graph from data with Python. I took my inspiration from various websites and wrote this script:

    import plotly.express as px
    import plotly.graph_objs as go
    import statsmodels.api as sm
    value = [1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 8]
    date = [ 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
    fig = px.scatter(x=date, y=value )
    fig.add_trace(go.Scatter(x=date, y=value, mode='lines',name='MB Used' ))
    trend = sm.OLS(value,sm.add_constant(date)).fit().fittedvalues

How to create a Pandas DataFrame from a list of lists with different lengths?

Question: I have data in the following format:

    data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

and I would like to have a DataFrame with all unique strings as columns and binary values of occurrence, like this:

       a  b  c  d  e  f
    0  1  1  1  0  0  0
    1  0  1  1  0  0  0
    2  0  0  1  1  1  1

I have working code using list comprehensions, but it's pretty slow for large data.

    # vocab_list contains all the unique keys, which is obtained when reading in data from file
    df = pd.DataFrame([[1 if word in entry else 0 for word in
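One vectorized alternative, offered as a sketch rather than a benchmarked answer: flatten the lists with `Series.explode`, one-hot encode the flattened values with `pd.get_dummies`, and collapse back to one row per original list by taking the per-column maximum. It uses the `data` from the question; whether it is faster on large inputs would need measuring.

```python
import pandas as pd

data = [["a", "b", "c"], ["b", "c"], ["d", "e", "f", "c"]]

# One row per string, with the original list position kept as the index.
exploded = pd.Series(data).explode()

# One-hot encode each string, then merge rows that came from the same list.
df = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

print(df)
```

Columns come out in sorted order because `get_dummies` orders the categories it finds.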