pandas

Python list to pandas dataframe

我怕爱的太早我们不能终老 posted on 2021-02-08 05:10:32
Question: I have a list that follows this format:

    a = ['date name', '10150425010245 name1', '10150425020245 name2']

I am trying to convert this to a pandas DataFrame:

    newlist = []
    for item in a:
        newlist.append(item.split(' '))

Now, convert this to a DataFrame:

    pd.DataFrame(newlist)

which results in

                    0      1
    0            date   name
    1  10150425010245  name1
    2  10150425020245  name2

I want to have 'date' and 'name' as the header, but I can't manage to do that. Is there a more efficient way to automatically convert a list of strings into a dataframe than
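One possible approach, sketched here under the assumption that the first string always holds the column names and that fields never contain spaces, is to split every item and pass the first row as `columns`:

```python
import pandas as pd

a = ['date name', '10150425010245 name1', '10150425020245 name2']

# Split each string on whitespace; the first row is treated as the header
# (an assumption based on the sample list in the question).
header, *rows = [item.split() for item in a]

df = pd.DataFrame(rows, columns=header)
print(df)
```

Both columns come out as strings here; converting `date` to a numeric or datetime type would be a separate step.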

Get column and row index pairs of Pandas DataFrame matching some criteria

╄→尐↘猪︶ㄣ posted on 2021-02-08 04:59:50
Question: Suppose I have a pandas DataFrame like the following. These values are based on a distance matrix.

    A = pd.DataFrame([(1.0,0.8,0.6708203932499369,0.6761234037828132,0.7302967433402214),
                      (0.8,1.0,0.6708203932499369,0.8451542547285166,0.9128709291752769),
                      (0.6708203932499369,0.6708203932499369,1.0,0.5669467095138409,0.6123724356957946),
                      (0.6761234037828132,0.8451542547285166,0.5669467095138409,1.0,0.9258200997725514),
                      (0.7302967433402214,0.9128709291752769,0.6123724356957946,0.9258200997725514,1.0)
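The excerpt is cut off before the criterion is stated, so the following is only a sketch. It assumes the goal is to list (row, column) label pairs whose value exceeds a threshold, ignoring the diagonal and counting each symmetric pair once. The small matrix and the `threshold` value are hypothetical, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 symmetric matrix with rounded values (illustration only).
A = pd.DataFrame([(1.0, 0.8, 0.67),
                  (0.8, 1.0, 0.91),
                  (0.67, 0.91, 1.0)])

threshold = 0.85  # assumed criterion; the excerpt does not state one

# Boolean mask of matching cells; np.triu with k=1 keeps only the cells
# strictly above the diagonal, so each pair is reported once.
mask = np.triu(A.to_numpy() > threshold, k=1)

# np.where returns the positional row and column indices of the True cells.
rows, cols = np.where(mask)

# Map positions back to the DataFrame's own index and column labels.
pairs = [(A.index[r], A.columns[c]) for r, c in zip(rows, cols)]
print(pairs)
```

If both orientations of each pair are wanted, dropping the `np.triu` call and excluding the diagonal explicitly would be one way to get them.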

python pandas extracting numbers within text to a new column

三世轮回 posted on 2021-02-08 04:44:23
Question: I have the following text in column A:

    A
    hellothere_3.43
    hellothere_3.9

I would like to extract only the numbers to a new column B (next to A), e.g.:

    B
    3.43
    3.9

I use:

    str.extract('(\d.\d\d)', expand=True)

but this copies only the 3.43 (i.e. the exact number of digits). Is there a way to make it more generic? Many thanks!

Answer 1: Use a regex. For example:

    import pandas as pd
    df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
    df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=True)
    print
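As a variation on the quoted answer, the sketch below uses a raw-string pattern that requires at least one digit before an optional decimal part, and converts the result to floats. The conversion to `float` is an addition not present in the excerpt.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})

# \d+ matches the integer part; (?:\.\d+)? optionally matches a decimal
# point followed by one or more digits. The raw string keeps the
# backslashes from being read as Python string escapes.
pattern = r"(\d+(?:\.\d+)?)"

# expand=False returns a Series because the pattern has one capture group.
df["B"] = df["A"].str.extract(pattern, expand=False).astype(float)
print(df)
```

Rows with no match would yield NaN, which `astype(float)` accepts.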

Creating a custom cumulative sum that calculates the downstream quantities given a list of locations and their order

╄→尐↘猪︶ㄣ posted on 2021-02-08 04:41:42
Question: I am trying to come up with some code that will calculate the cumulative value at each location from the locations upstream of it. Taking the cumulative sum almost accomplishes this, but some locations contribute to the same downstream point. Additionally, the most upstream points (or starting points) will not have any values contributing to them and can keep their starting value in the final cumulative DataFrame. Let's say I have the following DataFrame for each site.

    df = pd.DataFrame({ "Site 1": np
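The question's data is truncated, so the following is only a rough sketch of one way such a downstream accumulation could work. The site network (`downstream`), the processing `order`, and all values are hypothetical. The approach assumes each site drains into at most one downstream site and that `order` lists sites from upstream to downstream.

```python
import pandas as pd

# Hypothetical network: Sites 1 and 2 both drain into Site 3,
# which drains into Site 4.
downstream = {"Site 1": "Site 3", "Site 2": "Site 3", "Site 3": "Site 4"}

# Assumed processing order, upstream first, so each site is complete
# before it is added to its downstream neighbour.
order = ["Site 1", "Site 2", "Site 3", "Site 4"]

# Hypothetical per-site quantities, one row per time step.
df = pd.DataFrame({
    "Site 1": [1, 2],
    "Site 2": [10, 20],
    "Site 3": [100, 200],
    "Site 4": [1000, 2000],
})

cumulative = df.copy()
for site in order:
    target = downstream.get(site)
    if target is not None:
        # Add the (already accumulated) upstream total to the next site down.
        cumulative[target] = cumulative[target] + cumulative[site]

print(cumulative)
```

The most upstream sites keep their starting values because nothing is ever added to them.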

Gantt Chart for USGS Hydrology Data with Python?

ε祈祈猫儿з posted on 2021-02-08 04:37:55
Question: I have compiled a dataframe that contains USGS streamflow data at several different streamgages. Now I want to create a Gantt chart similar to this. Currently, my data has site names as columns and a date index as rows. Here is a sample of my data. The problem with the Gantt chart example I linked is that my data has gaps between the start and end dates that would normally define the horizontal timelines. Many of the examples I found only account for the start and end date, but not missing

Converting a Dask column into new Dask column of type datetime

这一生的挚爱 posted on 2021-02-08 04:29:10
Question: I have an unparsed column in a dask dataframe (df) that I am using pandas to convert to datetime and put into a new column in the dask dataframe. However, it breaks because column assignment doesn't support the type DatetimeIndex.

    df['New Column'] = pd.to_datetime(np.array(df.index.values), format='%Y/%m/%d %H:%M')

Answer 1: This should work:

    import dask.dataframe as dd
    # note: df is a dask dataframe
    df['New Column'] = dd.to_datetime(df.index, format='%Y/%m/%d %H:%M')

Source: https://stackoverflow.com/questions

Filling data using .fillNA(), data pulled from Quandl

徘徊边缘 posted on 2021-02-08 03:50:12
Question: I've pulled some stock data from Quandl for both crude oil (WTI) and Caterpillar (CAT) prices. When I concatenate the two dataframes, I'm left with some NaNs. My ultimate goal is to run pearsonr() to assess the correlation (along with p-values); however, I can't get pearsonr() to work because of all the NaNs, so I'm trying to clean them up. When I use the .fillna() function, it doesn't seem to work. I've even tried .interpolate() as well as .dropna(). None of them appear
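The excerpt does not show the asker's code, so the cause is uncertain. One common reason these methods seem to do nothing is that `fillna()`, `interpolate()`, and `dropna()` return a new object by default instead of modifying the original. The sketch below illustrates that with hypothetical prices (not real Quandl data) and computes the Pearson coefficient with pandas alone:

```python
import pandas as pd

# Hypothetical prices on partly overlapping dates (illustration only).
wti = pd.Series(
    [50.0, 51.0, 49.5, 52.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
    name="WTI",
)
cat = pd.Series(
    [140.0, 141.5, 143.0, 142.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
    name="CAT",
)

# Concatenating on the date index leaves NaN where only one series has data.
combined = pd.concat([wti, cat], axis=1)

# dropna() returns a new DataFrame; `combined` itself is left unchanged,
# so the result has to be assigned to a name.
clean = combined.dropna()

# Series.corr uses the Pearson method by default.
r = clean["WTI"].corr(clean["CAT"])
print(clean)
print(r)
```

For the p-value, the two cleaned columns could then be passed to `scipy.stats.pearsonr`, which returns the coefficient together with a p-value.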


How to update a graph using matplotlib

泄露秘密 posted on 2021-02-08 03:48:53
Question: I'm using pandas and matplotlib to draw graphs in Python. I would like a live-updating graph. Here is my code:

    import matplotlib.pyplot as plt
    import matplotlib.animation as animation
    import time
    import numpy as np
    import MySQLdb
    import pandas

    def animate():
        conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="sentiment_index", use_unicode=True, charset="utf8")
        c = conn.cursor()
        query = """ SELECT t_date , score FROM mytable where t_date BETWEEN Date_SUB(NOW(), Interval 2 DAY)
