pandas

Fastest way to read huge MySQL table in python

二次信任 submitted on 2021-02-07 07:56:00
Question: I was trying to read a huge MySQL table with several million rows. I used the pandas library with chunks. See the code below:

    import pandas as pd
    import numpy as np
    import pymysql.cursors

    connection = pymysql.connect(user='xxx', password='xxx', database='xxx', host='xxx')
    try:
        with connection.cursor() as cursor:
            query = "SELECT * FROM example_table;"
            chunks = []
            for chunk in pd.read_sql(query, connection, chunksize=1000):
                chunks.append(chunk)
                #print(len(chunks))
            result = pd
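The chunked-read pattern in the question can be sketched end to end as below. This is only a sketch under stated assumptions: an in-memory SQLite database stands in for the MySQL server so the example runs anywhere, the table contents are made up, and combining the chunks with `pd.concat` is a guess at where the truncated `result = pd` line was heading. It does not claim to be the fastest approach.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for MySQL here so the sketch is self-contained.
# With MySQL, `connection` would come from pymysql.connect(...) as in the question.
connection = sqlite3.connect(":memory:")

# Hypothetical table contents, only to have something to read back.
pd.DataFrame({"id": range(2500), "value": range(2500)}).to_sql(
    "example_table", connection, index=False
)

query = "SELECT * FROM example_table;"

# With chunksize set, read_sql yields DataFrames of at most 1000 rows each.
chunks = []
for chunk in pd.read_sql(query, connection, chunksize=1000):
    chunks.append(chunk)

# Combine the pieces once at the end instead of growing a frame inside the loop.
result = pd.concat(chunks, ignore_index=True)
connection.close()
```

If the full table does not fit in memory, each chunk can be processed and discarded inside the loop rather than appended to a list.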

Pandas groupby year object plotting it year over year

旧街凉风 submitted on 2021-02-07 07:55:05
Question: I want to plot 6 years of 12-month data on one 12-month axis from Dec - Jan.

    import pandas as pd
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt

    df = pd.Series(np.random.randn(72), index=pd.date_range('1/1/2000', periods=72, freq='M'))
    grouped = df.groupby(df.index.map(lambda x: x.year))
    grouped.plot()

So I'm getting breaks in the lines between each year. However, what I want is to have the years stacked over each other. Any simple and clean ways to do
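One possible way to stack the years, sketched here as an assumption rather than as the answer from the original thread, is to reshape the series into a table with one row per month and one column per year. The variable names are illustrative, and `freq='MS'` (month start) is used instead of the question's `freq='M'` only so that the snippet runs unchanged across pandas versions.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: 72 monthly values starting in 2000.
s = pd.Series(
    np.random.randn(72),
    index=pd.date_range("2000-01-01", periods=72, freq="MS"),
)

# Long table with explicit year and month columns, then pivot so that
# rows are months 1..12 and columns are the years 2000..2005.
long_form = pd.DataFrame(
    {"value": s.values, "year": s.index.year, "month": s.index.month}
)
by_year = long_form.pivot(index="month", columns="year", values="value")
```

Calling `by_year.plot()` (with matplotlib installed) would then draw one line per year over a shared 12-month axis.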

Pandas groupby year object plotting it year over year

ぃ、小莉子 submitted on 2021-02-07 07:53:06
Question: I want to plot 6 years of 12-month data on one 12-month axis from Dec - Jan.

    import pandas as pd
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt

    df = pd.Series(np.random.randn(72), index=pd.date_range('1/1/2000', periods=72, freq='M'))
    grouped = df.groupby(df.index.map(lambda x: x.year))
    grouped.plot()

So I'm getting breaks in the lines between each year. However, what I want is to have the years stacked over each other. Any simple and clean ways to do

Pandas - How to replace string with zero values in a DataFrame series?

╄→尐↘猪︶ㄣ submitted on 2021-02-07 07:50:12
Question: I'm importing some CSV data into a pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, pandas reports it as a series of 'object'. What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical
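A minimal sketch of one way to do this, assuming zeros are acceptable for every non-numeric entry: `pd.to_numeric` with `errors="coerce"` turns anything it cannot parse into NaN, and `fillna(0)` then replaces those with zero. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column as it might arrive from a CSV: mostly numbers, some "$-".
raw = pd.Series(["1.5", "$-", "2", "$-", "3.25"])

# Unparseable strings become NaN, which are then replaced by 0.
# Note: this also zeroes any other non-numeric string and any genuine NaN.
cleaned = pd.to_numeric(raw, errors="coerce").fillna(0)
```

If only the literal "$-" should be replaced and other bad values should stay visible, `raw.replace("$-", 0)` followed by `pd.to_numeric` is a stricter alternative.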

pandas - concat with columns of same categories turns to object

て烟熏妆下的殇ゞ submitted on 2021-02-07 07:32:36
Question: I want to concatenate two dataframes with category-type columns, by first adding the missing categories to each column.

    df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
    df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})
    df["a"] = df["a"].cat.add_categories("baz")
    df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

In theory the categories for both "a" columns are the same:

    In [33]: df.a.cat.categories
    Out[33]: Index(['bar', 'foo', 'baz'], dtype=
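A likely cause, stated here as an assumption since the excerpt is cut off, is that `add_categories` leaves the two columns with the same categories in a different order, and `pd.concat` only preserves the categorical dtype when the dtypes match exactly. The sketch below gives both columns one shared category list via `pandas.api.types.union_categoricals` before concatenating.

```python
import pandas as pd
from pandas.api.types import union_categoricals

df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

# Build one category list covering both columns, in a single fixed order.
shared = union_categoricals([df["a"], df2["a"]]).categories

# Give both columns exactly the same categories, in the same order.
df["a"] = df["a"].cat.set_categories(shared)
df2["a"] = df2["a"].cat.set_categories(shared)

# With identical categorical dtypes, concat keeps the column categorical.
out = pd.concat([df, df2], ignore_index=True)
```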

pandas - concat with columns of same categories turns to object

£可爱£侵袭症+ submitted on 2021-02-07 07:32:33
Question: I want to concatenate two dataframes with category-type columns, by first adding the missing categories to each column.

    df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
    df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})
    df["a"] = df["a"].cat.add_categories("baz")
    df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

In theory the categories for both "a" columns are the same:

    In [33]: df.a.cat.categories
    Out[33]: Index(['bar', 'foo', 'baz'], dtype=

Pandas: Comparing rows within groups

余生颓废 submitted on 2021-02-07 07:26:45
Question: I have a dataframe that is grouped by 'Key'. I need to compare rows within each group to decide whether to keep every row of the group or just one row. The condition for keeping all rows of a group: if there is one row that has the color 'red', an area of '12' and a shape of 'circle' AND another row (within the same group) that has the color 'green', an area of '13' and a shape of 'square', then I want to keep all rows in that group. Otherwise if this scenario
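The keep-all condition can be sketched with boolean masks broadcast across each group. Several things here are assumptions: the sample data is made up, the areas are stored as integers, and because the excerpt is cut off before the "otherwise" case is described, the sketch simply keeps the first row of any group that does not meet the condition.

```python
import pandas as pd

# Hypothetical data: group A contains both required rows, group B does not.
df = pd.DataFrame({
    "Key": ["A", "A", "A", "B", "B"],
    "color": ["red", "green", "blue", "red", "blue"],
    "area": [12, 13, 5, 12, 7],
    "shape": ["circle", "square", "circle", "circle", "square"],
})

# Row-level tests. df["shape"] is used because df.shape is the (rows, cols) tuple.
is_red_row = (df["color"] == "red") & (df["area"] == 12) & (df["shape"] == "circle")
is_green_row = (df["color"] == "green") & (df["area"] == 13) & (df["shape"] == "square")

# A group qualifies if it contains at least one row of each kind.
group_qualifies = (
    is_red_row.groupby(df["Key"]).transform("any")
    & is_green_row.groupby(df["Key"]).transform("any")
)

# Assumption: non-qualifying groups keep only their first row.
first_in_group = ~df.duplicated("Key")
result = df[group_qualifies | first_in_group]
```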

Pandas datetime anchored offset for (-) MonthBegin doesn't work as expected

試著忘記壹切 submitted on 2021-02-07 07:16:49
Question: I need to move back to the beginning of the month, but if I'm already at the beginning I want to stay there. A pandas anchored offset with n=0 is supposed to do exactly that, but it doesn't produce the expected results between the anchored points for the (-) MonthBegin. For example, for this

    pd.Timestamp('2017-01-06 00:00:00') - pd.tseries.offsets.MonthBegin(n=0)

I expect it to move me back to

    Timestamp('2017-01-01 00:00:00')

but instead I get

    Timestamp('2017-02-01 00:00:00')

What am I doing wrong?
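One way to get the behaviour described, offered as a sketch rather than as the thread's accepted answer, is the offset's `rollback` method, which moves a timestamp back to the previous anchor only when it is not already on one.

```python
import pandas as pd

month_begin = pd.offsets.MonthBegin()

mid_month = pd.Timestamp("2017-01-06")   # between two anchor points
on_anchor = pd.Timestamp("2017-01-01")   # already at the start of a month

# rollback moves back to the month start only if the date is not on it already.
rolled_mid = month_begin.rollback(mid_month)
rolled_anchor = month_begin.rollback(on_anchor)
```

Here `rolled_mid` and `rolled_anchor` both come out as the first of January 2017.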

Show first 10 rows of multi-index pandas dataframe

笑着哭i submitted on 2021-02-07 07:11:42
Question: I have a multilevel-index pandas DataFrame where the first level is year and the second level is username. I only have one column, which is already sorted in descending order. I want to show the first 2 rows of each index level 0.

What I have:

                   count
    year username
    2010 b           677
         a           505
         c           400
         d           300
    ...
    2014 a           100
         b            80

What I want:

                   count
    year username
    2010 b           677
         a           505
    2011 c           677
         d           505
    2012 e           677
         f           505
    2013 g           677
         i           505
    2014 h           677
         j           505

Answer 1: Here is an answer. Maybe there is a better way to do
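Since the answer above is cut off, here is one option as a hedged sketch (not necessarily what that answer went on to say): group by the first index level and take `head(2)`, which keeps the first two rows of each group in their existing order. The sample frame reuses the rows shown under "What I have".

```python
import pandas as pd

# Rows taken from the "What I have" table; years 2011-2013 are omitted there.
index = pd.MultiIndex.from_tuples(
    [(2010, "b"), (2010, "a"), (2010, "c"), (2010, "d"), (2014, "a"), (2014, "b")],
    names=["year", "username"],
)
df = pd.DataFrame({"count": [677, 505, 400, 300, 100, 80]}, index=index)

# head(2) per group keeps the first two rows of each year, order preserved.
top2 = df.groupby(level=0).head(2)
```

Because the column is already sorted in descending order within each year, the first two rows are also the two largest counts.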

Passing datetime-like object to seaborn.lmplot

狂风中的少年 submitted on 2021-02-07 06:51:17
Question: I am trying to plot values over time using seaborn's linear model plot, but I get the error

    TypeError: invalid type promotion

I have read that it is not possible to plot pandas date objects, but that seems really strange given that seaborn requires you to pass a pandas DataFrame to the plots. Below is a simple example. Does anyone know how I can get this to work?

    import pandas as pd
    import seaborn as sns; sns.set(color_codes=True)
    import matplotlib.pyplot as plt

    date = ['1975-12-03','2008-08-20'
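A common workaround, sketched here under assumptions, is to give the regression a numeric x column by converting each date to its ordinal day number. The first two dates come from the question; the third date and all the values are hypothetical, and the column name `date_ordinal` is only illustrative. The seaborn call itself is left out of the block so that it runs with pandas alone.

```python
import pandas as pd

df = pd.DataFrame({
    # First two dates are from the question; the third is hypothetical.
    "date": ["1975-12-03", "2008-08-20", "2011-03-16"],
    # Hypothetical measurements.
    "value": [1.0, 2.0, 3.0],
})

# Parse the strings into datetimes, then map each to an integer day count.
df["date"] = pd.to_datetime(df["date"])
df["date_ordinal"] = df["date"].map(pd.Timestamp.toordinal)
```

The numeric column can then be passed as `x="date_ordinal"` to `sns.lmplot`, at the cost of tick labels showing ordinals unless they are reformatted back into dates.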