pandas | 易学教程

Fastest way to read huge MySQL table in python

Question: I was trying to read a very large MySQL table with several million rows. I have used the pandas library with chunks. See the code below:

    import pandas as pd
    import numpy as np
    import pymysql.cursors

    connection = pymysql.connect(user='xxx', password='xxx', database='xxx', host='xxx')
    try:
        with connection.cursor() as cursor:
            query = "SELECT * FROM example_table;"
            chunks = []
            for chunk in pd.read_sql(query, connection, chunksize=1000):
                chunks.append(chunk)
            #print(len(chunks))
            result = pd …
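
A minimal sketch of the chunked pattern, not taken from an answer. An in-memory sqlite3 database stands in for the MySQL server so the example runs anywhere (an assumption; with pymysql you would pass the pymysql connection instead, and pandas may warn that it only officially supports SQLAlchemy or sqlite3 connections). Reading in chunks does not by itself make a full read faster: if every row ends up in memory anyway, the important part is to collect the chunks in a list and call `pd.concat` once. If the rows can be reduced as they arrive, processing and discarding each chunk keeps memory flat. The helpers `read_all` and `sum_column` are hypothetical names.

```python
import sqlite3

import pandas as pd


def read_all(con, query, chunksize=1000):
    """Read every row chunk by chunk, then concatenate once at the end."""
    chunks = list(pd.read_sql(query, con, chunksize=chunksize))
    # A single concat avoids re-copying a growing DataFrame for every chunk.
    return pd.concat(chunks, ignore_index=True)


def sum_column(con, query, column, chunksize=1000):
    """Reduce each chunk as it arrives instead of keeping all rows in memory."""
    total = 0.0
    for chunk in pd.read_sql(query, con, chunksize=chunksize):
        total += chunk[column].sum()
    return total


# Stand-in database (assumption): in-memory sqlite3 instead of MySQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example_table (id INTEGER, value REAL)")
con.executemany(
    "INSERT INTO example_table VALUES (?, ?)",
    [(i, i * 0.5) for i in range(2500)],
)
con.commit()

result = read_all(con, "SELECT * FROM example_table")
total = sum_column(con, "SELECT value FROM example_table", "value")
print(result.shape)  # (2500, 2)
```

With pymysql specifically, passing `cursorclass=pymysql.cursors.SSCursor` to `pymysql.connect` requests an unbuffered server-side cursor, so the client does not buffer the entire result set before pandas starts fetching chunks; whether that speeds things up in a given setup is worth measuring.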

Pandas groupby year object plotting it year over year

Question: I want to plot 6 years of 12-month-period data on one 12-month axis from Dec - Jan.

    import pandas as pd
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt

    df = pd.Series(np.random.randn(72), index=pd.date_range('1/1/2000', periods=72, freq='M'))
    grouped = df.groupby(df.index.map(lambda x: x.year))
    grouped.plot()

So I'm getting breaks in the lines between each year. However, what I want is to have the years stacked over each other. Any simple and clean ways to do …
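
One common approach, sketched here rather than quoted from an answer: reshape the series so that month is the index and year is the columns, then plot the resulting table, which draws one line per year over a shared month axis (1 to 12). Assumptions: `freq="MS"` (month start) replaces the question's `"M"`, because recent pandas versions deprecate `"M"` in favour of `"ME"` while `"MS"` works on old and new versions; the `Agg` backend keeps the sketch runnable without a display; the random data is seeded only for reproducibility.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(
    rng.standard_normal(72),
    index=pd.date_range("2000-01-01", periods=72, freq="MS"),
)

# Rows = month (1..12), columns = year (2000..2005), one value per cell.
table = (
    s.to_frame("value")
    .assign(year=s.index.year, month=s.index.month)
    .pivot(index="month", columns="year", values="value")
)

ax = table.plot(title="Year over year")  # one line per year column
ax.set_xlabel("month")
plt.close(ax.figure)
```

The question's "Dec - Jan" ordering is not reproduced here; reordering the `table` rows before plotting would be one way to change the axis order.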

Pandas - How to replace string with zero values in a DataFrame series?

Question: I'm importing some CSV data into a pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings, left over from previous formatting. If I just import the series, pandas reports it as a series of 'object'. What's the best way to replace these "$-" strings with zeros? Or, more generally, how can I replace all the strings in a series (which is predominantly numerical) with a numerical …
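
A minimal sketch of two options, assuming a hypothetical CSV since the real file is not shown. The first tells `read_csv` to treat the literal string `"$-"` as missing while parsing (via `na_values`), so the column is read as float and the gaps can be filled with zero. The second is the more general route: `pd.to_numeric(..., errors="coerce")` turns every non-numeric string into NaN, which `fillna(0)` then replaces.

```python
import io

import pandas as pd

# Hypothetical CSV; the question's real data is not shown.
csv_text = "item,amount\na,10.5\nb,$-\nc,3\n"

# Option 1: treat "$-" as missing while parsing, then fill with zero.
df = pd.read_csv(io.StringIO(csv_text), na_values=["$-"])
df["amount"] = df["amount"].fillna(0)

# Option 2 (more general): coerce every non-numeric string to NaN, then zero.
raw = pd.Series([1.5, "$-", 3, "n/a", 7.25])
cleaned = pd.to_numeric(raw, errors="coerce").fillna(0)

print(df["amount"].tolist())  # [10.5, 0.0, 3.0]
print(cleaned.tolist())       # [1.5, 0.0, 3.0, 0.0, 7.25]
```

Option 2 is blunter: any string that merely looks numeric but fails to parse (for example one with thousands separators) also silently becomes 0, so Option 1 is safer when the placeholder string is known.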

pandas - concat with columns of same categories turns to object

Question: I want to concatenate two dataframes with category-type columns, by first adding the missing categories to each column.

    df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
    df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})
    df["a"] = df["a"].cat.add_categories("baz")
    df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

In theory, the categories for both "a" columns are the same:

    In [33]: df.a.cat.categories
    Out[33]: Index(['bar', 'foo', 'baz'], dtype= …
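
A likely cause is order: `add_categories` appends, so `df.a` ends up with `['bar', 'foo', 'baz']` while `df2.a` ends up with `['baz', 'foo', 'bar']`. In the pandas versions where this was reported, that mismatch was enough for `concat` to fall back to `object`. The sketch below, which is one safe approach rather than the only one, builds a single shared category list with `pandas.api.types.union_categoricals` and gives both columns exactly that list in the same order before concatenating.

```python
import pandas as pd
from pandas.api.types import union_categoricals

df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

# One shared category list, applied to both columns in the same order.
shared = union_categoricals([df["a"], df2["a"]]).categories
df["a"] = df["a"].cat.set_categories(shared)
df2["a"] = df2["a"].cat.set_categories(shared)

out = pd.concat([df, df2], ignore_index=True)
print(out["a"].dtype)  # category
```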

Pandas: Comparing rows within groups

Question: I have a dataframe that is grouped by 'Key'. I need to compare rows within each group to decide whether to keep every row of the group or just one row of it. The condition for keeping all rows of a group: if one row has the color 'red', an area of '12' and a shape of 'circle' AND another row (within the same group) has the color 'green', an area of '13' and a shape of 'square', then I want to keep all rows in that group. Otherwise if this scenario …
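
A sketch of the "keep all rows" test using per-row boolean masks broadcast back to every row of the group with `groupby(...).transform("any")`. Assumptions: the sample data is hypothetical, `area` is stored as an integer (the question quotes it as '12'), and because the "otherwise" branch is cut off in the snippet, the sketch simply keeps the first row of each remaining group.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Key":   [1, 1, 1, 2, 2],
    "color": ["red", "green", "blue", "red", "blue"],
    "area":  [12, 13, 5, 12, 7],
    "shape": ["circle", "square", "circle", "circle", "square"],
})

is_red = (df["color"] == "red") & (df["area"] == 12) & (df["shape"] == "circle")
is_green = (df["color"] == "green") & (df["area"] == 13) & (df["shape"] == "square")

# True on every row whose group contains both a red-circle row and a green-square row.
has_both = (
    is_red.groupby(df["Key"]).transform("any")
    & is_green.groupby(df["Key"]).transform("any")
)

keep_all = df[has_both]
# Assumption: for the other groups, keep only the first row.
keep_one = df[~has_both].groupby("Key").head(1)
result = pd.concat([keep_all, keep_one]).sort_index()
print(result)
```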

Pandas datetime anchored offset for (-) MonthBegin doesn't work as expected

Question: I need to move back to the beginning of the month, but if I'm already at the beginning I want to stay there. A pandas anchored offset with n=0 is supposed to do exactly that, but it doesn't produce the expected results between the anchored points for the (-) MonthBegin. For example, for

    pd.Timestamp('2017-01-06 00:00:00') - pd.tseries.offsets.MonthBegin(n=0)

I expect to be moved back to Timestamp('2017-01-01 00:00:00'), but instead I get Timestamp('2017-02-01 00:00:00'). What am I doing wrong?
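
A sketch of why this happens and one way around it. Subtracting an offset adds its negation, and negating `MonthBegin(n=0)` still gives n=0; for anchored offsets, n=0 means "stay if on an anchor, otherwise roll forward", so the result lands on 2017-02-01. The offset's `rollback` method has the requested semantics: it leaves a date that is already on the anchor unchanged and otherwise moves it back to the previous anchor.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

ts = pd.Timestamp("2017-01-06")

# -MonthBegin(0) is still n=0, and n=0 rolls *forward* when not on an anchor.
surprising = ts - MonthBegin(n=0)

# rollback() stays put on an anchor and otherwise moves back to the previous one.
start = MonthBegin().rollback(ts)
already_start = MonthBegin().rollback(pd.Timestamp("2017-01-01"))

print(surprising)     # 2017-02-01 00:00:00
print(start)          # 2017-01-01 00:00:00
print(already_start)  # 2017-01-01 00:00:00
```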

Show first 10 rows of multi-index pandas dataframe

Question: I have a multilevel-index pandas DataFrame where the first level is year and the second level is username. I only have one column, which is already sorted in descending order. I want to show the first 2 rows of each level-0 index value.

What I have:

                   count
    year username
    2010 b           677
         a           505
         c           400
         d           300
    ...
    2014 a           100
         b            80

What I want:

                   count
    year username
    2010 b           677
         a           505
    2011 c           677
         d           505
    2012 e           677
         f           505
    2013 g           677
         i           505
    2014 h           677
         j           505

Answer 1: Here is an answer. Maybe there is a better way to do …
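
A minimal sketch using `groupby(level=...).head(n)`, which keeps the first n rows of each group in their existing order. The sample frame is built only from the visible part of "What I have" (the elided years are left out).

```python
import pandas as pd

# Sample built from the visible rows of "What I have".
idx = pd.MultiIndex.from_tuples(
    [(2010, "b"), (2010, "a"), (2010, "c"), (2010, "d"), (2014, "a"), (2014, "b")],
    names=["year", "username"],
)
df = pd.DataFrame({"count": [677, 505, 400, 300, 100, 80]}, index=idx)

# First 2 rows of each year, in the order they already appear.
top2 = df.groupby(level="year").head(2)
print(top2)
```

Because `head` does not sort, this relies on `count` already being descending within each year, as the question states; if it were not, sorting first with `df.sort_values(["year", "count"], ascending=[True, False])` (index level names are accepted in `by`) would establish that order.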

Passing datetime-like object to seaborn.lmplot

Question: I am trying to plot values over time using seaborn's linear model plot, but I get the error `TypeError: invalid type promotion`. I have read that it is not possible to plot pandas date objects, but that seems really strange given that seaborn requires you to pass a pandas DataFrame to the plots. Below is a simple example. Does anyone know how I can get this to work?

    import pandas as pd
    import seaborn as sns; sns.set(color_codes=True)
    import matplotlib.pyplot as plt
    date = ['1975-12-03','2008-08-20' …
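
A common workaround, sketched here under stated assumptions rather than taken from an answer: the regression behind `lmplot` needs a numeric x, so convert the dates to a number first. This sketch uses day ordinals via `pd.Timestamp.toordinal`, and the data is hypothetical because the question's date list is truncated. To keep it runnable without seaborn, the straight-line fit that `lmplot` would draw is computed directly with `numpy.polyfit`.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's own date list is truncated.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-11", "2000-01-21", "2000-01-31"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Regression needs a numeric x: convert each date to its day ordinal.
df["date_ordinal"] = df["date"].map(pd.Timestamp.toordinal)

# The same straight-line fit lmplot would draw, computed directly.
slope, intercept = np.polyfit(df["date_ordinal"], df["value"], 1)
print(round(slope, 6))  # 0.1 (value units per day)
```

With seaborn installed, `sns.lmplot(data=df, x="date_ordinal", y="value")` should then fit on the numeric column; the x tick labels will be ordinals, which can be turned back into dates with `datetime.date.fromordinal` if readable labels are needed.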