pandas | 易学教程

Python/PyTables: Is it possible to have different data types for different columns of an array?

Question: I create an expandable EArray with Nx4 columns. Some columns require the float64 data type; the others can be handled with int32. Is it possible to vary the data types among the columns? Right now I use a single type (float64, below) for all of them, but that takes a huge amount of disk space for large (>10 GB) files. For example, how can I ensure that the elements of columns 1-2 are int32 and those of columns 3-4 are float64?

    import tables
    f1 = tables.open_file("table.h5", "w")
    a = f1.create_earray(f1.root, "dataset_1", atom=tables.Float32Atom(),
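
One possible direction, sketched under assumptions: an EArray uses a single atom type for every element, whereas a PyTables Table stores rows whose columns each carry their own type. The sketch below describes the four columns with a NumPy structured dtype and compares the per-row size with an all-float64 layout. The column names `c1` to `c4` and the helper `write_rows` are hypothetical and do not come from the question.

```python
import numpy as np

# Hypothetical column names: two int32 columns followed by two float64 columns.
ROW_DTYPE = np.dtype([
    ("c1", np.int32),
    ("c2", np.int32),
    ("c3", np.float64),
    ("c4", np.float64),
])


def write_rows(path, rows):
    """Write structured rows to a new Table named 'dataset_1' (overwrites path)."""
    import tables  # imported lazily so the size comparison runs without PyTables

    with tables.open_file(path, "w") as h5:
        table = h5.create_table(h5.root, "dataset_1", description=ROW_DTYPE)
        table.append(rows)
        table.flush()


# Build a few example rows; unset fields stay at zero.
rows = np.zeros(3, dtype=ROW_DTYPE)
rows["c1"] = [1, 2, 3]
rows["c3"] = [0.5, 1.5, 2.5]

bytes_mixed = ROW_DTYPE.itemsize                       # 4 + 4 + 8 + 8
bytes_all_float64 = 4 * np.dtype(np.float64).itemsize  # 4 * 8
```

Calling `write_rows("table.h5", rows)` would store the rows with mixed column types. Like an EArray, a Table can keep growing through repeated `table.append` calls while the file is open.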

Pandas: Adding an excel SUMIF column like =A1/SUMIF(B:B,B1,A:A)

Question: I have a pandas DataFrame like:

        pet  treats  lbs
    0   cat       2  5.0
    1   dog       1  9.9
    2  snek       3  1.1
    3   cat       6  4.5
    4   dog       1  9.4

I would like to add a fourth column that expresses each row's treats as a percentage of the total treats for pets of that kind. That is, the treats value in row 0 divided by the sum of all treats for pets matching "cat" (and so on for each row). In Excel, I think I would do something like this:

       A     B  C    D
    1  cat   2  5.0  =B1/SUMIF(A:A,A1,B:B)
    2  dog   1  9.9  =B2/SUMIF(A:A,A2,B:B)
    3  snek  3  1.1  =B3/SUMIF(A:A,A3,B:B
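
A minimal sketch of one way to get the SUMIF-style ratio in pandas: `groupby(...).transform("sum")` returns the per-group total aligned to every original row, so a plain division gives each row's share. The new column name `treat_share` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "pet": ["cat", "dog", "snek", "cat", "dog"],
    "treats": [2, 1, 3, 6, 1],
    "lbs": [5.0, 9.9, 1.1, 4.5, 9.4],
})

# Per-row total of treats for that row's pet (the SUMIF part).
group_totals = df.groupby("pet")["treats"].transform("sum")

# Each row's treats as a fraction of its group's total.
df["treat_share"] = df["treats"] / group_totals
```

Multiplying `treat_share` by 100 would give a percentage rather than a fraction.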

Matplotlib Line vs. Bar plot DateTime axis formatting

Question: I have a DataFrame with a DateTime index:

    import pandas as pd
    from random import randrange
    dates = pd.date_range(start="2020-02-01", end='2020-04-18', freq='1d')
    df = pd.DataFrame(index=dates, data=[randrange(10000) for i in range(78)])

Now when I plot the data as a line plot, matplotlib produces a nicely formatted x axis:

    df.plot(figsize=(12,4))

However, if I do a bar plot instead, I get something quite horrible:

    df.plot(kind='bar', figsize=(12,4))

This is quite disconcerting, as it is the
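
pandas draws bar plots on a categorical axis with one tick label per bar, which is a likely cause of the crowded labels. The following is a sketch of one workaround, assuming it is acceptable to call matplotlib's own `bar` with the DatetimeIndex so that the usual date locators and formatters apply. The choice of `AutoDateLocator` and `ConciseDateFormatter` is an assumption, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from random import randrange

dates = pd.date_range(start="2020-02-01", end="2020-04-18", freq="1d")
df = pd.DataFrame(index=dates, data=[randrange(10000) for _ in range(len(dates))])

fig, ax = plt.subplots(figsize=(12, 4))

# Pass real dates as x values; the bar width is expressed in days.
ax.bar(df.index, df[0], width=0.8)

# Let matplotlib choose sparse, date-aware tick positions and labels.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```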

How to perform a groupby and transform count with a condition in pandas

Question: I have the following dataframe:

    # Import pandas library
    import pandas as pd
    import numpy as np

    # data
    data = [['tom', 10, 2, 'c', 100, 'x'],
            ['tom', 16, 3, 'a', 100, 'x'],
            ['tom', 22, 2, 'a', 100, 'x'],
            ['matt', 10, 1, 'c', 100, 'x'],
            ['matt', 15, 5, 'b', 100, 'x'],
            ['matt', 14, 1, 'b', 100, 'x']]

    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])
    df['AttemptsbyRating'] = df.groupby(by=['Rating'])['Attempts'].transform('count')
    df

And I am then
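
The question is cut off before it states the condition, so the following is only a sketch of the general pattern for a conditional count broadcast back to every row: turn the condition into 0/1 values, group them, and `transform("sum")`. The condition (`Category == 'a'`), the grouping key (`Name`), and the column name `CategoryAbyName` are all hypothetical.

```python
import pandas as pd

data = [['tom', 10, 2, 'c', 100, 'x'],
        ['tom', 16, 3, 'a', 100, 'x'],
        ['tom', 22, 2, 'a', 100, 'x'],
        ['matt', 10, 1, 'c', 100, 'x'],
        ['matt', 15, 5, 'b', 100, 'x'],
        ['matt', 14, 1, 'b', 100, 'x']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])

# Hypothetical condition: 1 where Category is 'a', else 0.
is_match = df['Category'].eq('a').astype(int)

# Summing the 0/1 flags per Name counts the matching rows in each group,
# and transform keeps the result aligned with the original rows.
df['CategoryAbyName'] = is_match.groupby(df['Name']).transform('sum')
```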

Programmatically picking an inequality operator

Question: I'm trying to perform actions based on input from a config file. In the config, there will be specifications for a signal, a comparison, and a value. I'd like to translate that comparison string into a choice of inequality operator. Right now, this looks like:

    def compute_mask(self, signal, comparator, value, df):
        if comparator == '<':
            mask = df[signal] < value
        elif comparator == '<=':
            mask = df[signal] <= value
        elif comparator == '=':
            mask = df[signal] == value
        elif comparator == '>=':
            mask =
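
One common way to avoid the if/elif chain, sketched here under assumptions, is a dictionary that maps each comparison string to a function from the standard-library `operator` module. The `'>'` entry is an assumption, since the question is truncated before that branch. The free-function signature and the sample column `speed` are hypothetical.

```python
import operator

import pandas as pd

# Map config strings to comparison functions; '=' means equality, as in the question.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,  # assumed; not visible in the truncated question
}


def compute_mask(df, signal, comparator, value):
    """Return a boolean Series comparing df[signal] against value."""
    try:
        compare = COMPARATORS[comparator]
    except KeyError:
        raise ValueError(f"unsupported comparator: {comparator!r}") from None
    return compare(df[signal], value)


df = pd.DataFrame({"speed": [1, 5, 10]})
mask = compute_mask(df, "speed", ">=", 5)
```

Supporting another comparison then only requires a new dictionary entry, and an unknown string fails loudly instead of leaving `mask` undefined.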

How do I make a sublist of every CSV row and put that sublist inside a list

Question: I'm making a sublist of every row inside a CSV file, but I'm getting a single list back:

    wines_list = []
    for row in wines:
        wines_list.append(row)
    print(wines_list)

This returns:

    ['id', 'country', 'description', 'designation', 'points', 'price', 'province',
     'taster_name', 'title', 'variety', 'winery', 'fixed acidity', 'volatile acidity',
     'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide',
     'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']

But I want it to append
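
The question does not show how `wines` is created. The output resembles what iterating over a pandas DataFrame yields (its column labels), so the sketch below assumes that case and also shows the plain `csv.reader` route. The three-column sample data is hypothetical and stands in for the real wine file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the wine CSV file.
csv_text = "id,country,points\n0,Italy,87\n1,Portugal,90\n"

# Option 1: csv.reader yields one list of strings per row (header included).
rows_from_reader = list(csv.reader(io.StringIO(csv_text)))

# Option 2: iterating over a DataFrame yields only the column labels,
# so convert the underlying values to nested lists instead.
wines = pd.read_csv(io.StringIO(csv_text))
column_labels = [col for col in wines]
rows_from_frame = wines.values.tolist()
```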

Can not Install Geopandas [closed]

Question: I cannot install geopandas in Python 3.7. This error always pops up: [image: Problem] [image: Problem in anaconda]. Please help!

Answer 1: First, the Geopandas documentation recommends using the conda package manager. Second, Geopandas relies on other packages, and their installation is

Normalize a deeply nested json in pandas

Question: I am trying to normalize a JSON file that looks like this (a small snippet):

    [{'trimestre': 'A2000',
      'cours': [{"sigle": "TECH 20701",
                 "titre": "La cybersécurité et le gestionnaire",
                 'etudiants': [{'matricule': '22000803', 'nom': 'Boyer,André', 'note': 'C+', 'valeur': 2.3},
                               {'matricule': '22000829', 'nom': 'Keighan,Maylis', 'note': 'A+', 'valeur': 4.3},
                               {'matricule': '22000869', 'nom': 'Lahaie,Lyes', 'note': 'B+', 'valeur': 3.3},
                               {'matricule': '22000973', 'nom': 'Conerardy,Rawaa', 'note': 'B+',
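
A sketch of how `pandas.json_normalize` might flatten this shape, assuming the goal is one row per student with the term and course fields repeated alongside. Only the three complete student records visible in the snippet are used, and the variable name `flat` is an assumption.

```python
import pandas as pd

# Only the three complete student records visible in the question are used.
data = [{
    "trimestre": "A2000",
    "cours": [{
        "sigle": "TECH 20701",
        "titre": "La cybersécurité et le gestionnaire",
        "etudiants": [
            {"matricule": "22000803", "nom": "Boyer,André", "note": "C+", "valeur": 2.3},
            {"matricule": "22000829", "nom": "Keighan,Maylis", "note": "A+", "valeur": 4.3},
            {"matricule": "22000869", "nom": "Lahaie,Lyes", "note": "B+", "valeur": 3.3},
        ],
    }],
}]

# record_path walks down to the innermost list (one row per student);
# meta pulls fields from the enclosing term and course levels.
flat = pd.json_normalize(
    data,
    record_path=["cours", "etudiants"],
    meta=["trimestre", ["cours", "sigle"], ["cours", "titre"]],
)
```

Nested meta fields are named with a dot separator by default, so the course code ends up in a column called `cours.sigle`.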