pandas

Creating a Pandas dataframe from elements of a dictionary

非 Y 不嫁゛ submitted on 2021-02-07 14:46:39
Question: I'm trying to create a pandas DataFrame from a dictionary. The dictionary is set up as

    nvalues = {"y1": [1, 2, 3, 4], "y2": [5, 6, 7, 8], "y3": [a, b, c, d]}

I would like the DataFrame to include only "y1" and "y2". So far I can accomplish this using

    df = pd.DataFrame.from_dict(nvalues)
    df.drop("y3", axis=1, inplace=True)

I would like to know whether it is possible to accomplish this without calling df.drop().

Answer 1: You can specify columns in the DataFrame constructor: pd.DataFrame(nvalues, columns
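The answer excerpt is cut off, but it points at the `columns` argument of the `DataFrame` constructor. A minimal sketch of that idea, assuming the `y3` values are meant to be strings (the question leaves them unquoted) and that only `y1` and `y2` should be kept:

```python
import pandas as pd

# Assumption: the y3 entries are strings; the question shows them unquoted.
nvalues = {"y1": [1, 2, 3, 4], "y2": [5, 6, 7, 8], "y3": ["a", "b", "c", "d"]}

# When the data is a dict, `columns` selects which keys become columns,
# so "y3" is never added and no later drop() is needed.
df = pd.DataFrame(nvalues, columns=["y1", "y2"])
print(df)
```

Building the frame this way avoids both the intermediate `y3` column and the in-place mutation.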

Conditional formatting for 2- or 3-scale coloring of cells of a table

那年仲夏 submitted on 2021-02-07 14:45:56
Question: I would like to output a simple table to a PDF file with conditional formatting that applies 2- or 3-scale coloring to cells depending on their value, like the red-white-green color scale in Microsoft Excel's conditional formatting option.

    import pandas
    import numpy as np

    df = pandas.DataFrame(np.random.randn(10, 2), columns=list('ab'))
    print(df)

    # Output:
    #           a         b
    # 0 -1.625192 -0.949186
    # 1 -0.089884  0.825922
    # 2  2.117651 -0.046258
    # 3 -0.921751 -0.144447
    # 4 -0.294095 -1.774725
    # 5 -0.780523 -0.435909
    # 6  0.544958 0
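One possible way to approximate a red-white-green scale is to interpolate linearly between three anchor colors. The sketch below only computes a hex color per cell; it does not write a PDF. The helper names (`lerp_color`, `three_scale_hex`), the pure red/white/green anchors, and the use of each column's median as the midpoint are assumptions made for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Assumed anchor colors for the low, middle, and high ends of the scale.
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 128, 0)


def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(int(round(a + (b - a) * t)) for a, b in zip(c0, c1))


def three_scale_hex(value, vmin, vmid, vmax):
    """Map a value to a hex color on a red-white-green scale."""
    vmin, vmid, vmax = float(vmin), float(vmid), float(vmax)
    value = min(max(float(value), vmin), vmax)  # clamp into [vmin, vmax]
    if value <= vmid:
        span = vmid - vmin
        t = 1.0 if span == 0 else (value - vmin) / span
        rgb = lerp_color(RED, WHITE, t)
    else:
        span = vmax - vmid
        t = (value - vmid) / span
        rgb = lerp_color(WHITE, GREEN, t)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=list("ab"))

# One color per cell, scaled independently within each column.
colors = df.apply(
    lambda col: col.map(
        lambda v: three_scale_hex(v, col.min(), col.median(), col.max())
    )
)
print(colors)
```

The resulting `colors` frame has the same shape as `df`, so it can be handed cell by cell to whatever tool renders the table.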

Seaborn/Matplotlib: how to access line values in FacetGrid?

￣綄美尐妖づ submitted on 2021-02-07 14:27:29
Question: I'm trying to shade the area between two lines in a Seaborn FacetGrid. The fill_between method will do this, but I need to access the values of each line in each subplot to pass them in. Here's my code:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    data = [{'Change': 0.0, 'Language': 'Algonquin', 'Type': 'Mother tongue', 'Year': '2011'},
            {'Change': 0.0, 'Language': 'Algonquin', 'Type': 'Spoken at home', 'Year': '2011'},
            {'Change': -21.32, 'Language': 'Algonquin',

Python: Extracting XML to DataFrame (Pandas)

可紊 submitted on 2021-02-07 14:21:22
Question: I have an XML file that looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <comments>
      <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
      <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
      <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
    </comments>

What I'm trying to do is extract the Id, Text and CreationDate columns into a pandas DataFrame, and I've tried the following:

    import xml.etree
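The asker's own attempt is cut off, so the following is only one possible approach, not their code. It is a minimal sketch using the standard library's `xml.etree.ElementTree`, assuming the XML is available as bytes and that a missing `CreationDate` attribute (as in the third row) should become a missing value:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# The sample document from the question, embedded as bytes.
xml_bytes = b"""<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
  <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
  <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""

wanted = ["Id", "Text", "CreationDate"]
root = ET.fromstring(xml_bytes)

# Element.get() returns None when an attribute is absent,
# so rows without CreationDate still produce a complete record.
records = [{name: row.get(name) for name in wanted} for row in root.iter("row")]

df = pd.DataFrame(records, columns=wanted)
df["CreationDate"] = pd.to_datetime(df["CreationDate"])  # None becomes NaT
print(df)
```

The `Id` values stay as strings here; converting them to integers would be a separate step.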

Pandas Datetime Interval Resample to Seconds

人盡茶涼 submitted on 2021-02-07 14:20:15
Question: Given the following dataframe:

    import pandas as pd

    pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
                  "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
                  "status": ["OK", "ERROR", "OK"]})

HAVE:

    | start               | end                 | status |
    |---------------------|---------------------|--------|
    | 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK     |
    | 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR  |
    | 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK
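The desired output is cut off in this excerpt, so the sketch below rests on an assumption: that "resample to seconds" means one row per second inside each interval (both endpoints included), carrying that interval's status. The `per_second` and `timestamp` names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
    "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
    "status": ["OK", "ERROR", "OK"],
})
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])

one_second = pd.Timedelta(seconds=1)

# Expand each interval into its individual seconds, then stack the pieces.
per_second = pd.concat(
    [
        pd.DataFrame({
            "timestamp": pd.date_range(row.start, row.end, freq=one_second),
            "status": row.status,
        })
        for row in df.itertuples(index=False)
    ],
    ignore_index=True,
)
print(per_second)
```

Seconds that fall between intervals (for example 13:09:06) do not appear in this result; filling them in would require a decision the excerpt does not show.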

How to calculate a percentile ranking of a column of data relative to another column using python

梦想的初衷 submitted on 2021-02-07 14:19:44
Question: I have two columns of data representing the same quantity; one column is from my training data, the other is from my validation data. I know how to calculate the percentile rankings of the training data efficiently using:

    pandas.DataFrame(training_data).rank(pct=True).values

My question is, how can I efficiently get a similar set of percentile rankings of the validation data column relative to the training data column? That is, for each value in the validation data column, how can I find
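The question is truncated, so this is one plausible reading rather than a confirmed answer: for each validation value, report the fraction of training values that are less than or equal to it. A minimal sketch using `numpy.searchsorted` on the sorted training data; the function name `percentile_rank_against` is hypothetical, and ties are handled differently from `rank(pct=True)`, which averages tied ranks by default.

```python
import numpy as np


def percentile_rank_against(training, validation):
    """Fraction of training values <= each validation value."""
    training_sorted = np.sort(np.asarray(training, dtype=float))
    # side="right" counts training values less than or equal to each query.
    counts = np.searchsorted(
        training_sorted, np.asarray(validation, dtype=float), side="right"
    )
    return counts / training_sorted.size


training_data = [1.0, 2.0, 3.0, 4.0]
validation_data = [0.0, 2.5, 4.0, 10.0]
ranks = percentile_rank_against(training_data, validation_data)
print(ranks)
```

Sorting the training data once costs O(n log n), and each validation lookup is then a binary search, which keeps the approach efficient for large columns.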

pandas groupby: can I select an agg function by one level of a column MultiIndex?

自闭症网瘾萝莉.ら submitted on 2021-02-07 14:16:14
Question: I have a pandas DataFrame with a MultiIndex of columns:

    import numpy as np
    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [(c, i) for c in ['a', 'b'] for i in range(3)])
    df = pd.DataFrame(np.random.randn(4, 6), index=[0, 0, 1, 1], columns=columns)
    print(df)

              a                             b
              0         1         2         0         1         2
    0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
    0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
    1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
    1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349

I want to group by
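The question text stops mid-sentence, so the following only illustrates what the title describes: choosing an aggregation function from one level of the column MultiIndex. It is a sketch under assumptions: grouping is on the row index, the top column level decides the function, and the mapping of "a" to sum and "b" to mean is invented for the example.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ["a", "b"] for i in range(3)]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 6)), index=[0, 0, 1, 1], columns=columns)

# Hypothetical choice: sum everything under "a", average everything under "b".
agg_by_top_level = {"a": "sum", "b": "mean"}

# Expand the per-level choice into one entry per full column key (a tuple).
agg_spec = {col: agg_by_top_level[col[0]] for col in df.columns}

result = df.groupby(level=0).agg(agg_spec)
print(result)
```

To key on the second column level instead, the comprehension would look up `col[1]` rather than `col[0]`.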