pandas | 易学教程

Python list to pandas dataframe

Question: I have a list that follows this format:

    a = ['date name', '10150425010245 name1', '10150425020245 name2']

I am trying to convert this to a pandas DataFrame:

    newlist = []
    for item in a:
        newlist.append(item.split(' '))

Now, convert this to a DataFrame:

    pd.DataFrame(newlist)

which results in

                    0      1
    0            date   name
    1  10150425010245  name1
    2  10150425020245  name2

I want to have 'date' and 'name' as the header, but I can't manage to do that. Is there a more efficient way to automatically convert a list of strings into a dataframe than
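
One way to get 'date' and 'name' as the header, sketched here under the assumption that the first list element always holds the column names and that fields are separated by a single space, is to split every string and pass the first row as `columns`. The variable names `rows` and `df` are illustrative only.

```python
import pandas as pd

a = ['date name', '10150425010245 name1', '10150425020245 name2']

# Split each string on a single space, as in the question.
rows = [item.split(' ') for item in a]

# Assumption: the first row holds the column names, the rest is data.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```

All values stay strings here; converting the 'date' column to a numeric or datetime type would be a separate step.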

Get column and row index pairs of Pandas DataFrame matching some criteria

Question: Suppose I have a pandas DataFrame like the following. These values are based on a distance matrix.

    A = pd.DataFrame([(1.0,0.8,0.6708203932499369,0.6761234037828132,0.7302967433402214),
                      (0.8,1.0,0.6708203932499369,0.8451542547285166,0.9128709291752769),
                      (0.6708203932499369,0.6708203932499369,1.0,0.5669467095138409,0.6123724356957946),
                      (0.6761234037828132,0.8451542547285166,0.5669467095138409,1.0,0.9258200997725514),
                      (0.7302967433402214,0.9128709291752769,0.6123724356957946,0.9258200997725514,1.0)
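
The excerpt is cut off before the matching criterion is stated, so the following is only a sketch of the general technique: build a boolean mask and read the matching positions with `np.where`. The small 3x3 matrix and the `threshold` value are hypothetical stand-ins, not the data or condition from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric matrix standing in for the one in the question.
A = pd.DataFrame([(1.0, 0.8, 0.6),
                  (0.8, 1.0, 0.9),
                  (0.6, 0.9, 1.0)])

threshold = 0.75  # hypothetical criterion

# Boolean mask of cells that satisfy the criterion.
mask = (A > threshold).to_numpy()

# Keep only the strict upper triangle so the diagonal and mirrored
# (row, col) / (col, row) duplicates are skipped.
mask = np.triu(mask, k=1)

# Translate positional hits back to index and column labels.
rows, cols = np.where(mask)
pairs = list(zip(A.index[rows], A.columns[cols]))
print(pairs)
```

Dropping the `np.triu` step would return every matching cell, including the diagonal.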

python pandas extracting numbers within text to a new column

Question: I have the following text in column A:

    A
    hellothere_3.43
    hellothere_3.9

I would like to extract only the numbers to another new column B (next to A), e.g.:

    B
    3.43
    3.9

I use:

    str.extract('(\d.\d\d)', expand=True)

but this copies only the 3.43 (i.e. the exact number of digits). Is there a way to make it more generic? Many thanks!

Answer 1: Use a regex. Example:

    import pandas as pd

    df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
    df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=True)
    print
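
A variant of the same idea, offered as a sketch rather than as the answer's exact code: the pattern below requires at least one digit and makes the fractional part optional, and the result is converted to floats. The choice of pattern and the `astype(float)` step are assumptions about what the asker wants.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})

# One or more digits, optionally followed by a dot and more digits.
# expand=False returns a Series because there is a single capture group.
df["B"] = df["A"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
print(df)
```

Rows without any number would yield NaN in column B.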

Creating a custom cumulative sum that calculates the downstream quantities given a list of locations and their order

Question: I am trying to come up with some code that will essentially calculate the cumulative value at locations below it. Taking the cumulative sum almost accomplishes this, but some locations contribute to the same downstream point. Additionally, the most upstream points (or starting points) will not have any values contributing to them and can remain their starting value in the final cumulative DataFrame. Let's say I have the following DataFrame for each site. df = pd.DataFrame({ "Site 1": np
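
The question's data and network layout are truncated, so the following is a minimal sketch under stated assumptions: the site values, the `downstream` mapping, and the upstream-to-downstream `order` list are all hypothetical. It walks the sites from most upstream to most downstream and adds each site's running total to the site directly below it.

```python
import pandas as pd

# Hypothetical quantities per site (rows are time steps).
df = pd.DataFrame({
    "Site 1": [1.0, 2.0],
    "Site 2": [3.0, 4.0],
    "Site 3": [5.0, 6.0],
    "Site 4": [7.0, 8.0],
})

# Hypothetical network: Sites 1 and 2 both drain into Site 3,
# which drains into Site 4.
downstream = {"Site 1": "Site 3", "Site 2": "Site 3", "Site 3": "Site 4"}

# Sites listed from most upstream to most downstream, so every site
# is complete before it is pushed to the site below it.
order = ["Site 1", "Site 2", "Site 3", "Site 4"]

cumulative = df.copy()
for site in order:
    target = downstream.get(site)
    if target is not None:
        cumulative[target] = cumulative[target] + cumulative[site]

print(cumulative)
```

The most upstream sites keep their starting values because nothing is ever added to them. For an arbitrary network, `order` would need to be a topological ordering of the sites.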

Gantt Chart for USGS Hydrology Data with Python?

Question: I have compiled a dataframe that contains USGS streamflow data at several different streamgages. Now I want to create a Gantt chart similar to this. Currently, my data has columns as site names and a date index as rows. Here is a sample of my data. The problem with the Gantt chart example I linked is that my data has gaps between the start and end dates that would normally define the horizontal time-lines. Many of the examples I found only account for the start and end date, but not missing
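
One possible first step, sketched with made-up data since the question's sample is not included: split each site's record into contiguous runs of non-missing values, so that every run becomes its own bar rather than one bar from first to last date. The column names `gage_a` and `gage_b` and the helper `data_segments` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily record with gaps (NaN = no data).
idx = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame({
    "gage_a": [1.0, 2.0, np.nan, np.nan, 5.0, 6.0],
    "gage_b": [np.nan, 1.0, 1.0, 1.0, np.nan, np.nan],
}, index=idx)

def data_segments(series):
    """Return (start, end) timestamps for each run of non-missing values."""
    present = series.notna()
    # The run id increases every time presence flips between True and False.
    run_id = present.astype(int).diff().ne(0).cumsum()
    segments = []
    for _, run in series[present].groupby(run_id[present]):
        segments.append((run.index[0], run.index[-1]))
    return segments

segments = {site: data_segments(df[site]) for site in df.columns}
print(segments)
```

Each list of segments could then be drawn as one row of bars, for example with matplotlib's `Axes.broken_barh`, which takes a sequence of (start, width) pairs per row.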

Converting a Dask column into new Dask column of type datetime

Question: I have an unparsed column in a dask dataframe (df) that I am using pandas to convert to datetime and put into a new column in the dask dataframe. However, it breaks because column assignment doesn't support type DatetimeIndex.

    df['New Column'] = pd.to_datetime(np.array(df.index.values), format='%Y/%m/%d %H:%M')

Answer 1: This should work:

    import dask.dataframe as dd

    # note: df is a dask dataframe
    df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')

Source: https://stackoverflow.com/questions

Filling data using .fillNA(), data pulled from Quandl

Question: I've pulled some stock data from Quandl for both Crude Oil prices (WTI) and Caterpillar (CAT) prices. When I concatenate the two dataframes together I'm left with some NaNs. My ultimate goal is to run a .Pearsonr() to assess the correlation (along with p-values), however I can't get Pearsonr() to work because of all the NaNs. So I'm trying to clean them up. When I use the .fillNA() function it doesn't seem to be working. I've even tried .interpolate() as well as .dropna(). None of them appear
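
The excerpt ends before the failing code is shown, so the cause is not known. One common cause, assumed here, is that `fillna`, `interpolate` and `dropna` return a new object and leave the original unchanged unless the result is assigned. The sketch below uses made-up prices in place of the Quandl data; the names `wti`, `cat`, `combined`, `cleaned` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up prices standing in for the Quandl series.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
wti = pd.Series([50.0, np.nan, 52.0, 53.0, np.nan, 51.0], index=idx, name="WTI")
cat = pd.Series([100.0, 101.0, np.nan, 104.0, 105.0, 102.0], index=idx, name="CAT")

combined = pd.concat([wti, cat], axis=1)

# Calling combined.dropna() without assignment would leave `combined`
# unchanged, so keep the returned object.
cleaned = combined.dropna()   # drop rows where either price is missing
filled = combined.ffill()     # or carry the last known price forward

# Pearson correlation on the rows where both prices exist.
corr = cleaned["WTI"].corr(cleaned["CAT"])
print(corr)
```

`Series.corr` defaults to the Pearson method but gives no p-value; `scipy.stats.pearsonr` returns both and can be called on the two cleaned columns.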

How to update a graph using matplotlib

Question: I'm using pandas and matplotlib to draw graphs in Python. I would like a live-updating graph. Here is my code:

    import matplotlib.pyplot as plt
    import matplotlib.animation as animation
    import time
    import numpy as np
    import MySQLdb
    import pandas

    def animate():
        conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="sentiment_index", use_unicode=True, charset="utf8")
        c = conn.cursor()
        query = """ SELECT t_date , score FROM mytable where t_date BETWEEN Date_SUB(NOW(), Interval 2 DAY)
