pandas

Syntax to use df.apply() with datetime.strptime [duplicate]

你说的曾经没有我的故事 submitted on 2021-02-08 10:25:59
Question: This question already has answers here: How to convert string to datetime format in pandas python? (2 answers). Closed 2 years ago.

Consider the following table 'df':

       date        sales
    0  2021-04-10  483
    1  2022-02-03  226
    2  2021-09-23  374
    3  2021-10-17  186
    4  2021-07-17  35

I would like to convert the column date, which is currently a string, to a date by using apply() and datetime.strptime(). I tried the following:

    format_date = "%Y-%m-%d"
    df["date_new"] = df.loc[:,"date"].apply(datetime.strptime,df.loc[:
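This is a minimal sketch of two ways to do the conversion. It assumes the column holds strings in the %Y-%m-%d format shown above. Series.apply() forwards extra positional arguments to the function through its args parameter, so each element is passed as datetime.strptime(value, format_date). pd.to_datetime() with an explicit format does the same conversion without a Python-level loop.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2021-04-10", "2022-02-03", "2021-09-23", "2021-10-17", "2021-07-17"],
    "sales": [483, 226, 374, 186, 35],
})
format_date = "%Y-%m-%d"

# apply() calls datetime.strptime(value, *args) for each value in the column
df["date_new"] = df["date"].apply(datetime.strptime, args=(format_date,))

# Vectorized alternative: parse the whole column in one call
df["date_vec"] = pd.to_datetime(df["date"], format=format_date)
print(df)
```

Passing the format through the args keyword (or through a lambda such as lambda s: datetime.strptime(s, format_date)) makes it explicit which argument reaches strptime. For large frames the pd.to_datetime() version is usually the better choice.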

Vectorizing an iterative function on Pandas DataFrame

一世执手 submitted on 2021-02-08 10:23:08
Question: I have a dataframe where the first row is the initial condition:

    df = pd.DataFrame({"Year": np.arange(4), "Pop": [0.4] + [np.nan] * 3})

and a function f(x, r) = r*x*(1-x), where r = 2 is a constant and 0 <= x <= 1. I want to produce the following dataframe by applying the function to column Pop row by row, iteratively, i.e. df.Pop[i] = f(df.Pop[i-1], r=2):

    df = pd.DataFrame({"Year": np.arange(4), "Pop": [0.4, 0.48, 0.4992, 0.49999872]})

Question: Is it possible to do this in a vectorized way? I
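Each value depends on the previous one, so the general recursion is inherently sequential. A plain loop over a NumPy array is therefore the straightforward option. For the specific case r = 2 there is a closed form. Substituting y = 1 - 2x turns the map into y[n+1] = y[n]**2, so x[n] = (1 - (1 - 2*x[0])**(2**n)) / 2, which can be evaluated for all years at once.

The sketch below shows both approaches, reusing the question's df and f. The closed form is specific to r = 2 and does not carry over to other values of r.

```python
import numpy as np
import pandas as pd


def f(x, r=2.0):
    """Logistic map step from the question: r * x * (1 - x)."""
    return r * x * (1 - x)


df = pd.DataFrame({"Year": np.arange(4), "Pop": [0.4] + [np.nan] * 3})

# General case: each step needs the previous result, so iterate over a NumPy copy
pop = df["Pop"].to_numpy(copy=True)
for i in range(1, len(pop)):
    pop[i] = f(pop[i - 1])
df["Pop_loop"] = pop

# Special case r = 2: with y = 1 - 2x, y[n] = y[0] ** (2 ** n), evaluated for every row at once
x0 = df["Pop"].iloc[0]
n = df["Year"].to_numpy()
df["Pop_closed"] = 0.5 * (1 - (1 - 2 * x0) ** (2.0 ** n))
print(df)
```

The closed form loses accuracy quickly once the values approach 0.5. It is mainly useful as a check on the loop rather than as a replacement for it.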

Set Xticks frequency to dataframe index

若如初见. submitted on 2021-02-08 10:21:16
Question: I currently have a dataframe whose index is the years from 1990 to 2014 (25 rows). I want my plot to show every year on the X axis. I'm using add_subplot because I plan to have 4 plots in this figure (all of them with the same X axis). To create the dataframe:

    import pandas as pd
    import numpy as np
    index = np.arange(1990,2015,1)
    columns = ['Total Population','Urban Population']
    pop_plot = pd.DataFrame(index=index, columns=columns)
    pop_plot = pop_plot.fillna(0)
    pop_plot['Total
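One way to get a tick for every year is to pass the index itself to set_xticks() on the axes returned by add_subplot(). This assumes matplotlib and an integer year index like the one built above. In this sketch the frame is filled with zeros as in the question. The 2x2 grid position is an assumption based on the four planned plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = np.arange(1990, 2015, 1)
columns = ["Total Population", "Urban Population"]
pop_plot = pd.DataFrame(0.0, index=index, columns=columns)

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(2, 2, 1)  # first of the four planned panels
pop_plot.plot(ax=ax)

# One tick per row of the index, labelled with the year and rotated so all 25 fit
ax.set_xticks(pop_plot.index)
ax.set_xticklabels(pop_plot.index, rotation=90)
```

For the other three panels, fig.add_subplot(2, 2, i, sharex=ax) reuses the same x axis, so the tick settings only need to be applied once.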

Bokeh: Generating graphs in a loop, the output graph's file sizes keep increasing

我与影子孤独终老i submitted on 2021-02-08 10:00:35
Question: I'm using bokeh to plot 100 graph files in a loop.

    for k in files:
        # Read the log file data into a df.
        log_file_name = str(k) + ".csv"
        logged_data = pd.read_csv("csv/" + log_file_name, parse_dates=["dttm_utc"], date_parser=dateparse)
        new_logged_data = logged_data.set_index("dttm_utc")
        # older pandas: new_logged_data.resample("3D", how=[np.mean])
        mean_data = new_logged_data.resample("3D").agg(["mean"])
        # Extract the energy values and time stamps into two series.
        energy_data = mean_data["value"]["mean"]
        time_data = mean_data.index
        # Plotting
        output_file(
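This is a sketch of one commonly suggested fix. It assumes the growth comes from Bokeh's output state carrying over between iterations, which the truncated loop body above does not confirm. The fix is to call reset_output() at the top of each iteration and build a fresh figure for every file. The random data and the temporary output directory are placeholders for the resampled log data.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from bokeh.io import output_file, reset_output, save
from bokeh.plotting import figure

out_dir = tempfile.mkdtemp()  # hypothetical output location
sizes = []
for k in range(3):
    # Placeholder data standing in for time_data / energy_data from one log file
    time_data = pd.date_range("2021-01-01", periods=10, freq="3D")
    energy_data = np.random.default_rng(k).random(10)

    reset_output()  # drop output settings left over from the previous iteration
    path = os.path.join(out_dir, f"{k}.html")
    output_file(path)

    # A new figure per file, so no glyphs from earlier files are included
    p = figure(x_axis_type="datetime", title=f"file {k}")
    p.line(time_data, energy_data)
    save(p)
    sizes.append(os.path.getsize(path))

print(sizes)
```

If the sizes still grow, it is worth checking whether the loop calls show() or adds plots to curdoc() instead of saving each figure on its own.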

How to lag data by x specific days on a multi index pandas dataframe?

只谈情不闲聊 submitted on 2021-02-08 09:58:56
Question: I have a dataframe that has dates, assets, and then price/volume data. I'm trying to pull in data from 7 days ago, but I can't use shift() because my table has missing dates in it.

    date      cusip  price  price_7daysago
    1/1/2017  a      1
    1/1/2017  b      2
    1/2/2017  a      1.2
    1/2/2017  b      2.3
    1/8/2017  a      1.1    1
    1/8/2017  b      2.2    2

I've tried creating a lambda function that uses loc and timedelta to do this shifting, but I was only able to output empty numpy arrays:

    def row_delta(x, df, days,
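shift() moves values by rows rather than by calendar days. One alternative, a different technique from the row_delta/loc approach in the question, works in two steps. First, build a copy of the table with every date moved forward 7 days. Then left-merge that copy back on date and cusip. Rows with no observation exactly 7 days earlier get NaN. Here is a sketch using the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["1/1/2017", "1/1/2017", "1/2/2017", "1/2/2017", "1/8/2017", "1/8/2017"],
        format="%m/%d/%Y",
    ),
    "cusip": ["a", "b", "a", "b", "a", "b"],
    "price": [1, 2, 1.2, 2.3, 1.1, 2.2],
})

# Copy of the prices, re-dated so each row lands on the date 7 days later
lagged = df[["date", "cusip", "price"]].copy()
lagged["date"] = lagged["date"] + pd.Timedelta(days=7)
lagged = lagged.rename(columns={"price": "price_7daysago"})

# Left merge keeps every original row; missing dates simply produce NaN
out = df.merge(lagged, on=["date", "cusip"], how="left")
print(out)
```

If date and cusip form a MultiIndex, calling df.reset_index() first turns them back into columns so the same merge applies.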

Use Pandas to Get Multiple Tables From Webpage

柔情痞子 submitted on 2021-02-08 09:57:32
Question: I am using Pandas to parse the data from the following page: http://kenpom.com/index.php?y=2014

To get the data, I am writing:

    dfs = pd.read_html(url)

The data looks great and is parsed correctly, except that it only takes data from the first 40 rows. It seems to be a problem with how the tables are separated, which keeps pandas from getting all the information. How do you get pandas to get all the data from all the tables on that webpage?

Answer 1: The HTML of the page you have posted has
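pd.read_html() returns a list with one DataFrame per table element it finds. When a page splits one logical table across several table tags, the pieces can be concatenated back together. Below is a minimal sketch with a small inline HTML string standing in for the page. The real kenpom.com markup is not reproduced here, and the column names are hypothetical.

read_html needs lxml, or BeautifulSoup4 with html5lib, installed. If the page's HTML is malformed, flavor="bs4" may parse it more leniently.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page: one logical table split into two <table> elements
html = """
<table>
  <thead><tr><th>Team</th><th>W-L</th></tr></thead>
  <tbody><tr><td>A</td><td>30-5</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Team</th><th>W-L</th></tr></thead>
  <tbody><tr><td>B</td><td>28-7</td></tr></tbody>
</table>
"""

dfs = pd.read_html(StringIO(html))  # list of DataFrames, one per <table>
combined = pd.concat(dfs, ignore_index=True)  # stack the pieces into one frame
print(len(dfs))
print(combined)
```

With a live page, pd.read_html(url) returns the same kind of list. Checking len(dfs) and the shape of each element shows whether the rows were split across tables or were never parsed at all.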

Python for merging multiple files from a directory into one single file

心不动则不痛 submitted on 2021-02-08 09:49:40
Question: I need to combine multiple files from a directory into a single file with as many value columns as there are files. Each file has the same unique IDs, which do not change between files, so I need to merge the files on that ID. For example, file_1 looks like this:

    id     pool1
    ABL1   1352
    ABL12  1236
    ABL13  1022
    ABL14  815
    ABL15  1591
    ABL16  2703

The other files follow the same layout: the first column is the same in every file in the directory, and the second column differs. I am looking for an output which looks
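Here is a minimal sketch of the merge. It assumes each file is whitespace-separated, has an id header in the first column and one value column (like file_1 above), and that all files share the same extension. Each file is read with id as the index. The frames are then concatenated side by side with pd.concat(axis=1), which aligns rows on id, and the result is written out. The directory, file names, helper name merge_on_id, and the second file's contents are hypothetical test data.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_on_id(directory, pattern="*.txt"):
    """Hypothetical helper: one value column per matching file, rows aligned on 'id'."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # Whitespace-separated: 'id' plus one value column per file
        frames.append(pd.read_csv(path, sep=r"\s+", index_col="id"))
    # axis=1 places the frames side by side and aligns them on the id index
    return pd.concat(frames, axis=1).reset_index()


# Hypothetical example files in a temporary directory
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "file_1.txt"), "w") as fh:
    fh.write("id pool1\nABL1 1352\nABL12 1236\nABL13 1022\n")
with open(os.path.join(tmp, "file_2.txt"), "w") as fh:
    fh.write("id pool2\nABL1 10\nABL12 20\nABL13 30\n")

merged = merge_on_id(tmp)
merged.to_csv(os.path.join(tmp, "merged.tsv"), sep="\t", index=False)
print(merged)
```

pd.concat uses an outer join by default, so an id missing from one file shows up as NaN in that file's column rather than being dropped.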