pandas

Python/PyTables: Is it possible to have different data types for different columns of an array?

耗尽温柔 submitted on 2021-02-11 12:33:29
Question: I create an expandable EArray with Nx4 columns. Some columns require the float64 data type; the others can be stored as int32. Is it possible to vary the data types among the columns? Right now I use a single type (float64, below) for all of them, but it takes a huge amount of disk space for large (>10 GB) files. For example, how can I ensure that the elements of columns 1-2 are int32 and those of columns 3-4 are float64?

    import tables
    f1 = tables.open_file("table.h5", "w")
    a = f1.create_earray(f1.root, "dataset_1", atom=tables.Float32Atom(),

Pandas: Adding an excel SUMIF column like =A1/SUMIF(B:B,B1,A:A)

给你一囗甜甜゛ submitted on 2021-02-11 12:29:34
Question: I have a pandas DataFrame like:

       pet   treats  lbs
    0  cat        2  5.0
    1  dog        1  9.9
    2  snek       3  1.1
    3  cat        6  4.5
    4  dog        1  9.4

I would like to add a fourth column that gives each treat count as a percentage of the total treats for pets of that kind. So, the treats value in row 0, divided by the sum of all treats for pets matching "cat" (and so on for each row). In Excel, I think I would do something like this:

       A     B  C    D
    1  cat   2  5.0  =B1/SUMIF(A:A,A1,B:B)
    2  dog   1  9.9  =B2/SUMIF(A:A,A2,B:B)
    3  snek  3  1.1  =B3/SUMIF(A:A,A3,B:B
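One way to get a SUMIF-style denominator in pandas is a grouped `transform("sum")`, which returns the per-group total aligned to the original rows. The sketch below rebuilds the sample frame from the question; the column name `treat_share` is a hypothetical choice, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pet": ["cat", "dog", "snek", "cat", "dog"],
    "treats": [2, 1, 3, 6, 1],
    "lbs": [5.0, 9.9, 1.1, 4.5, 9.4],
})

# transform("sum") broadcasts each pet's total back onto its rows,
# so the division is row-aligned, like B1/SUMIF(A:A,A1,B:B) in Excel.
# "treat_share" is an illustrative column name.
df["treat_share"] = df["treats"] / df.groupby("pet")["treats"].transform("sum")

print(df)
```

The result is a fraction between 0 and 1; multiply by 100 if a percentage is wanted.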

Matplotlib Line vs. Bar plot DateTime axis formatting

ⅰ亾dé卋堺 submitted on 2021-02-11 12:29:30
Question: I have a DataFrame with a DateTime index:

    import pandas as pd
    from random import randrange
    dates = pd.date_range(start="2020-02-01", end='2020-04-18', freq='1d')
    df = pd.DataFrame(index=dates, data=[randrange(10000) for i in range(78)])

Now when I plot the data as a line plot, matplotlib produces a nicely formatted x axis:

    df.plot(figsize=(12,4))

However, if I instead do a bar plot, I get something quite horrible:

    df.plot(kind='bar', figsize=(12,4))

This is quite disconcerting, as it is the

How to perform a groupby and transform count with a condition in pandas

邮差的信 submitted on 2021-02-11 12:29:06
Question: I have the following dataframe:

    # Import pandas library
    import pandas as pd
    import numpy as np
    # data
    data = [['tom', 10, 2, 'c', 100, 'x'],
            ['tom', 16, 3, 'a', 100, 'x'],
            ['tom', 22, 2, 'a', 100, 'x'],
            ['matt', 10, 1, 'c', 100, 'x'],
            ['matt', 15, 5, 'b', 100, 'x'],
            ['matt', 14, 1, 'b', 100, 'x']]
    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])
    df['AttemptsbyRating'] = df.groupby(by=['Rating'])['Attempts'].transform('count')
    df

And I am then

How to perform a groupby and transform count with a condition in pandas

烂漫一生 submitted on 2021-02-11 12:25:45
Question: I have the following dataframe:

    # Import pandas library
    import pandas as pd
    import numpy as np
    # data
    data = [['tom', 10, 2, 'c', 100, 'x'],
            ['tom', 16, 3, 'a', 100, 'x'],
            ['tom', 22, 2, 'a', 100, 'x'],
            ['matt', 10, 1, 'c', 100, 'x'],
            ['matt', 15, 5, 'b', 100, 'x'],
            ['matt', 14, 1, 'b', 100, 'x']]
    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])
    df['AttemptsbyRating'] = df.groupby(by=['Rating'])['Attempts'].transform('count')
    df

And I am then

Programmatically picking an inequality operator

怎甘沉沦 submitted on 2021-02-11 12:23:23
Question: I'm trying to perform actions based on input from a config file. The config specifies a signal, a comparison, and a value. I'd like to translate that comparison string into a choice of inequality operator. Right now, this looks like:

    def compute_mask(self, signal, comparator, value, df):
        if comparator == '<':
            mask = df[signal] < value
        elif comparator == '<=':
            mask = df[signal] <= value
        elif comparator == '=':
            mask = df[signal] == value
        elif comparator == '>=':
            mask =
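A common alternative to an if/elif chain like the one in the question is a dictionary that maps each comparison string to a function from the standard `operator` module. The sketch below is one possible shape, not the original poster's solution: the `COMPARATORS` table, the free-function signature, the `'>'` entry (the excerpt is cut off before it), and the sample `speed` column are all assumptions.

```python
import operator

import pandas as pd

# Hypothetical lookup table: config string -> comparison function.
# '>' is assumed here; the quoted excerpt ends before that branch.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def compute_mask(df, signal, comparator, value):
    """Return a boolean Series for df[signal] <comparator> value."""
    try:
        compare = COMPARATORS[comparator]
    except KeyError:
        raise ValueError(f"unsupported comparator: {comparator!r}") from None
    # operator functions dispatch to the Series' own comparison methods,
    # so the result is element-wise, just like df[signal] >= value.
    return compare(df[signal], value)


# Illustrative data; the column name is made up for this example.
df = pd.DataFrame({"speed": [1, 5, 10]})
mask = compute_mask(df, "speed", ">=", 5)
print(mask.tolist())
```

Unknown comparison strings raise a `ValueError` instead of leaving `mask` undefined, which is the failure mode of an if/elif chain with no final `else`.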

Programmatically picking an inequality operator

我的未来我决定 submitted on 2021-02-11 12:22:33
Question: I'm trying to perform actions based on input from a config file. The config specifies a signal, a comparison, and a value. I'd like to translate that comparison string into a choice of inequality operator. Right now, this looks like:

    def compute_mask(self, signal, comparator, value, df):
        if comparator == '<':
            mask = df[signal] < value
        elif comparator == '<=':
            mask = df[signal] <= value
        elif comparator == '=':
            mask = df[signal] == value
        elif comparator == '>=':
            mask =

How do I make a sublist of every CSV row and put that sublist inside a list

被刻印的时光 ゝ submitted on 2021-02-11 12:22:28
Question: I'm making a sublist of every row inside a CSV file, but I'm getting a single list back:

    wines_list = []
    for row in wines:
        wines_list.append(row)
    print(wines_list)

This returns:

    ['id', 'country', 'description', 'designation', 'points', 'price',
     'province', 'taster_name', 'title', 'variety', 'winery',
     'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
     'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
     'pH', 'sulphates', 'alcohol']

But I want it to append
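The output in the question is what you get when iterating over a pandas DataFrame directly, since that yields column labels rather than rows. Assuming `wines` is a DataFrame and the goal is one sublist per row (the excerpt is cut off before it says so), a minimal sketch follows. The three-column CSV text is made-up sample data, not the poster's file.

```python
import io

import pandas as pd

# Made-up stand-in for the wine CSV; only the shape matters here.
csv_text = "id,country,points\n0,Italy,87\n1,Portugal,90\n"
wines = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame yields its column labels, which
# reproduces the single flat list seen in the question.
labels_only = [row for row in wines]

# .values.tolist() converts the underlying array into a list of
# row lists, one sublist per CSV row.
wines_list = wines.values.tolist()

print(labels_only)
print(wines_list)
```

If the file is read with the standard library instead, `list(csv.reader(f))` gives the same list-of-lists shape, with every field as a string and the header as the first sublist.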

Can not Install Geopandas [closed]

心不动则不痛 submitted on 2021-02-11 12:15:56
Question: I cannot install geopandas in Python 3.7. This error always pops up: (screenshots: "Problem", "Problem in anaconda"). Please help!

Answer 1: First, the GeoPandas documentation recommends using the conda package manager. Second, GeoPandas relies on other packages, and their installation is

Normalize a deeply nested json in pandas

喜欢而已 submitted on 2021-02-11 12:14:18
Question: I am trying to normalize a JSON file that looks like this (a small snippet):

    [{'trimestre': 'A2000',
      'cours': [{"sigle": "TECH 20701",
                 "titre": "La cybersécurité et le gestionnaire",
                 'etudiants': [{'matricule': '22000803', 'nom': 'Boyer,André', 'note': 'C+', 'valeur': 2.3},
                               {'matricule': '22000829', 'nom': 'Keighan,Maylis', 'note': 'A+', 'valeur': 4.3},
                               {'matricule': '22000869', 'nom': 'Lahaie,Lyes', 'note': 'B+', 'valeur': 3.3},
                               {'matricule': '22000973', 'nom': 'Conerardy,Rawaa', 'note': 'B+',
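For a structure nested like the snippet above (term, then courses, then students), `pandas.json_normalize` can flatten to one row per student by pointing `record_path` at the innermost list and listing the outer fields in `meta`. This is a sketch under the assumption that one row per student is the desired result; the sample data uses only the three complete student records visible in the excerpt and closes the brackets by hand.

```python
import pandas as pd

# Built from the complete records in the question's excerpt.
data = [{
    "trimestre": "A2000",
    "cours": [{
        "sigle": "TECH 20701",
        "titre": "La cybersécurité et le gestionnaire",
        "etudiants": [
            {"matricule": "22000803", "nom": "Boyer,André", "note": "C+", "valeur": 2.3},
            {"matricule": "22000829", "nom": "Keighan,Maylis", "note": "A+", "valeur": 4.3},
            {"matricule": "22000869", "nom": "Lahaie,Lyes", "note": "B+", "valeur": 3.3},
        ],
    }],
}]

# record_path walks down to the student records; meta pulls the
# term and the course fields along onto every student row.
flat = pd.json_normalize(
    data,
    record_path=["cours", "etudiants"],
    meta=["trimestre", ["cours", "sigle"], ["cours", "titre"]],
)

print(flat)
```

Nested meta paths become dotted column names such as `cours.sigle`; the `sep` parameter changes the separator if dots are inconvenient.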