pandas

TypeError: data must be list or dict-like in CUDF

好久不见. submitted on 2021-02-11 12:03:47
Question: I am using cuDF to speed up my Python process. First, I imported cuDF, removed the multiprocessing code, and initialized the variables with cuDF. After switching to cuDF, it raises a dictionary error. How can I remove these loops to make the implementation efficient? Code: import more_itertools import pandas as pd import numpy as np import itertools from os import cpu_count from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, auc import matplotlib.pyplot as plt import json

Separate pandas dataframe using sklearn's KFold

心不动则不痛 submitted on 2021-02-11 09:59:10
Question: I obtained the indices of the training set and the test set with the code below. df = pandas.read_pickle(filepath + filename) kf = KFold(n_splits = n_splits, shuffle = shuffle, random_state = randomState) result = next(kf.split(df), None) #train can be accessed with result[0] #test can be accessed with result[1] I wonder if there is a faster way to separate them into two DataFrames using the row indices I retrieved. Answer 1: You need DataFrame.iloc to select rows by position: Sample : np
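The answer excerpt above is cut off, so the following is only a minimal sketch of the idea it names: the arrays returned by KFold.split are positional indices, so DataFrame.iloc can select both subsets directly. The small DataFrame, the column names, and the KFold settings here are assumptions made for illustration; they stand in for the pickled file and parameters from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the DataFrame loaded with pandas.read_pickle
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2})

# Illustrative settings; the question's n_splits, shuffle and random_state are unknown
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# split() yields (train_positions, test_positions) for each fold; take the first fold
train_idx, test_idx = next(kf.split(df))

# iloc selects rows by integer position, which is what KFold returns
train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]
```

Because iloc works on positions rather than labels, this also holds when the DataFrame has a non-default index.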

Separate pandas dataframe using sklearn's KFold

寵の児 submitted on 2021-02-11 09:58:32
Question: I obtained the indices of the training set and the test set with the code below. df = pandas.read_pickle(filepath + filename) kf = KFold(n_splits = n_splits, shuffle = shuffle, random_state = randomState) result = next(kf.split(df), None) #train can be accessed with result[0] #test can be accessed with result[1] I wonder if there is a faster way to separate them into two DataFrames using the row indices I retrieved. Answer 1: You need DataFrame.iloc to select rows by position: Sample : np

Creating a grouped sorted bar plot using pandas

戏子无情 submitted on 2021-02-11 09:41:23
Question: I have been trying to create a grouped, sorted bar plot such as this one http://chrisalbon.com/python/matplotlib_grouped_bar_plot.html from a DataFrame created from a dict by doing: food = {'Apples as fruit': 4.68, 'Berries': 7.71, 'Butter': 12.73, 'Cheese': 4.11, 'Dairy, Other': 4.97} dframe = pd.DataFrame([food]) dframe.plot(kind='bar') Apples as fruit Berries Butter Cheese Dairy, Other 0 4.68 7.71 12.73 4.11 4.97 The first group should have Apples and Berries and the second should have Butter

Converting columns with date in names to separate rows in Python

筅森魡賤 submitted on 2021-02-11 09:41:08
Question: I already got an answer to this question in R and am wondering how it can be implemented in Python. Let's say we have a pandas DataFrame like this: import pandas as pd d = pd.DataFrame({'2019Q1':[1], '2019Q2':[2], '2019Q3':[3]}) which displays like this: 2019Q1 2019Q2 2019Q3 0 1 2 3 How can I transform it to look like this: Year Quarter Value 2019 1 1 2019 2 2 2019 3 3 Answer 1: Using DataFrame.stack with DataFrame.pop and Series.str.split: df = d.stack().reset_index(level=1).rename(columns={0:'Value'})
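The quoted answer is truncated after its first statement, so its remaining steps are not reproduced here. As an alternative sketch of the same reshaping, the snippet below uses DataFrame.melt and Series.str.extract instead of stack/pop/split. The intermediate names `long_df`, `parts`, and `result`, and the helper column `period`, are assumptions introduced for this example.

```python
import pandas as pd

d = pd.DataFrame({'2019Q1': [1], '2019Q2': [2], '2019Q3': [3]})

# Turn the wide columns into rows: one row per original column
long_df = d.melt(var_name='period', value_name='Value')

# Pull the year and quarter out of labels such as '2019Q1';
# named groups become the column names of the extracted frame
parts = long_df['period'].str.extract(r'(?P<Year>\d{4})Q(?P<Quarter>\d)').astype(int)

# Assemble the Year / Quarter / Value layout asked for in the question
result = pd.concat([parts, long_df['Value']], axis=1)
```

This variant assumes every column label follows the four-digit-year plus `Q` plus one-digit-quarter pattern; labels that do not match would produce missing values and make the integer conversion fail.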

Creating a grouped sorted bar plot using pandas

◇◆丶佛笑我妖孽 submitted on 2021-02-11 09:39:45
Question: I have been trying to create a grouped, sorted bar plot such as this one http://chrisalbon.com/python/matplotlib_grouped_bar_plot.html from a DataFrame created from a dict by doing: food = {'Apples as fruit': 4.68, 'Berries': 7.71, 'Butter': 12.73, 'Cheese': 4.11, 'Dairy, Other': 4.97} dframe = pd.DataFrame([food]) dframe.plot(kind='bar') Apples as fruit Berries Butter Cheese Dairy, Other 0 4.68 7.71 12.73 4.11 4.97 The first group should have Apples and Berries and the second should have Butter

Replace values based on multiple conditions with groupby mean in Pandas

此生再无相见时 submitted on 2021-02-11 09:38:54
Question: Say I have a dataframe as follows: df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=6, freq='M'), 'value': [3, 3.5, -5, 2, 7, 6.8], 'type': ['a', 'a', 'a', 'b', 'b', 'b']}) df['pct'] = df.groupby(['type'])['value'].pct_change() Output: date value type pct 0 2013-01-31 3.0 a NaN 1 2013-02-28 3.5 a 0.166667 2 2013-03-31 -5.0 a -2.428571 3 2013-04-30 2.0 b NaN 4 2013-05-31 7.0 b 2.500000 5 2013-06-30 6.8 b -0.028571 I want to replace the pct values that are bigger than 0.2 or

Replace values based on multiple conditions with groupby mean in Pandas

孤人 submitted on 2021-02-11 09:38:27
Question: Say I have a dataframe as follows: df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=6, freq='M'), 'value': [3, 3.5, -5, 2, 7, 6.8], 'type': ['a', 'a', 'a', 'b', 'b', 'b']}) df['pct'] = df.groupby(['type'])['value'].pct_change() Output: date value type pct 0 2013-01-31 3.0 a NaN 1 2013-02-28 3.5 a 0.166667 2 2013-03-31 -5.0 a -2.428571 3 2013-04-30 2.0 b NaN 4 2013-05-31 7.0 b 2.500000 5 2013-06-30 6.8 b -0.028571 I want to replace the pct values that are bigger than 0.2 or

Python - dual y axis chart, align zero

爷,独闯天下 submitted on 2021-02-11 09:18:10
Question: I'm trying to create a horizontal bar chart with dual x axes. The two axes are very different in scale: one set goes from something like -5 to 15 (positive and negative values), the other is more like 100 to 500 (all positive values). When I plot this, I'd like to align the two axes so that zero shows at the same position, and only the negative values are to the left of it. Currently the set with all positive values starts at the far left, and the set with positive and negative starts in the
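The excerpt ends before any answer, so what follows is not the accepted solution, only one possible way to compute axis limits that put zero at the same relative position on both axes. The function name `align_zero_limits` is hypothetical, and the sketch assumes each axis has a positive upper bound, as in the ranges described in the question.

```python
def align_zero_limits(lim1, lim2):
    """Return new (low, high) limits for two axes so that zero sits at the
    same fractional position on both. Hypothetical helper for illustration."""
    spans = []
    for low, high in (lim1, lim2):
        neg = max(0.0, -float(low))   # extent below zero (0 if the axis is all positive)
        pos = max(0.0, float(high))   # extent above zero
        if pos <= 0:
            raise ValueError("this sketch assumes a positive upper bound on each axis")
        spans.append((neg, pos))

    # Largest negative-to-positive ratio; applying it to both axes aligns zero
    ratio = max(neg / pos for neg, pos in spans)
    return tuple((-ratio * pos, pos) for _, pos in spans)


# Ranges taken from the question: roughly -5..15 and 100..500
lim1, lim2 = align_zero_limits((-5, 15), (100, 500))
```

With matplotlib, the two results could then be passed to `set_xlim` on the original axes and on the twin created by `ax.twiny()`. The second axis gains an empty negative region, which is the cost of lining up the zeros.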

Python - dual y axis chart, align zero

ぐ巨炮叔叔 submitted on 2021-02-11 09:17:07
Question: I'm trying to create a horizontal bar chart with dual x axes. The two axes are very different in scale: one set goes from something like -5 to 15 (positive and negative values), the other is more like 100 to 500 (all positive values). When I plot this, I'd like to align the two axes so that zero shows at the same position, and only the negative values are to the left of it. Currently the set with all positive values starts at the far left, and the set with positive and negative starts in the