pandas

Filtering rows of a dataframe based on values in columns

时光总嘲笑我的痴心妄想 · submitted on 2021-02-20 19:08:01
Question: I want to filter the rows of a dataframe so that only rows containing values less than, say, 10 are kept.

```python
import numpy as np
import pandas as pd
from pprint import pprint

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
df = df[df < 10]
```

gives

```
     A    B    C    D
0  5.0  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN
2  0.0  NaN  6.0  NaN
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN
5  6.0  NaN  NaN  NaN
6  NaN  NaN  NaN  NaN
7  NaN  NaN  NaN  7.0
8  NaN  NaN  NaN  NaN
9  NaN  NaN  NaN  NaN
```

Expected:

```
    A   B   C   D
0   5  57  87  95
2   0  80   6  82
5   6  33  74  75
7  71  44  60   7
```
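A minimal sketch of one way to get the expected output, assuming the goal is to keep whole rows that contain at least one value below 10: build a row-wise boolean mask with `any(axis=1)` instead of masking individual cells.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seeded only so this sketch is reproducible
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))

# df[df < 10] masks cell-by-cell, turning the rest into NaN;
# a row-wise mask instead keeps the qualifying rows intact.
filtered = df[(df < 10).any(axis=1)]
print(filtered)
```

Comparing with `(df < 10).all(axis=1)` would instead require every column in a row to be below 10.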

Pandas Dataframe - for each row, return count of other rows with overlapping dates

非 Y 不嫁゛ · submitted on 2021-02-20 19:05:33
Question: I've got a dataframe with projects, start dates, and end dates. For each row I would like to return the number of other projects in process when the project started. How do you nest loops when using df.apply()? I've tried using a for loop, but my dataframe is large and it takes way too long.

```python
import datetime as dt

data = {'project': ['A', 'B', 'C'],
        'pr_start_date': [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
        'pr_end_date': [dt.datetime(2019, 6, 15), dt.datetime
```
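A possible vectorized sketch using NumPy broadcasting instead of nested loops (the last two end dates below are invented, since the sample data is cut off): for each start date, count the other projects whose interval contains it.

```python
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'project': ['A', 'B', 'C'],
    'pr_start_date': [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
    # The question's data is truncated; these last two end dates are assumptions.
    'pr_end_date': [dt.datetime(2019, 6, 15), dt.datetime(2019, 7, 1), dt.datetime(2019, 12, 1)],
})

starts = df['pr_start_date'].to_numpy()
ends = df['pr_end_date'].to_numpy()

# running[i, j] is True when project j was already running as project i started.
running = (starts[:, None] >= starts[None, :]) & (starts[:, None] <= ends[None, :])
np.fill_diagonal(running, False)  # a project does not overlap itself
df['open_projects'] = running.sum(axis=1)
```

The broadcast builds an n-by-n matrix, so this trades memory for speed; for very large frames an interval-tree or sorted-events approach would bound memory better.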

efficient function to find harmonic mean across different pandas dataframes

两盒软妹~` · submitted on 2021-02-20 19:01:49
Question: I have several dataframes with identical shapes/types but slightly different numeric values. I can easily produce a new dataframe with the mean of all input dataframes via:

```python
df = pd.concat(input_dataframes)
df = df.groupby(df.index).mean()
```

I want to do the same with the harmonic mean (probably the scipy.stats.hmean function). I have attempted to do this using:

```python
.groupby(df.index).apply(scipy.stats.hmean)
```

But this alters the structure of the dataframe. Is there a better way to do this, or do I
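One sketch that preserves the frame's structure without groupby at all: apply the reciprocal identity hmean = n / Σ(1/x) elementwise across the frames (valid, like scipy.stats.hmean itself, only for strictly positive values). The two small frames below are hypothetical stand-ins.

```python
import pandas as pd

# Two stand-in frames with identical shape (hypothetical data).
dfa = pd.DataFrame({'x': [1.0, 2.0], 'y': [4.0, 4.0]})
dfb = pd.DataFrame({'x': [3.0, 6.0], 'y': [4.0, 12.0]})
dfs = [dfa, dfb]

# Elementwise harmonic mean: n divided by the elementwise sum of reciprocals.
# The result has exactly the same index/columns as each input frame.
hmean_df = len(dfs) / sum(1.0 / df for df in dfs)
```

This keeps each cell aligned by index and column, which is what the groupby-apply version loses when hmean collapses a whole group to one array.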

Merging Dataframe chunks in Pandas

纵然是瞬间 · submitted on 2021-02-20 18:54:42
Question: I currently have a script that combines multiple CSV files into one. The script works fine, except that we run out of RAM very quickly when larger files start being used. This is an issue for one reason: the script runs on an AWS server, and running out of RAM means a server crash. Currently the file size limit is around 250 MB each, which limits us to 2 files; however, as the company I work for is in biotech and we're using genetic-sequencing files, the files we use can range in size from
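A memory-bounded sketch (file names and chunk size here are placeholders): stream each CSV through `pd.read_csv(chunksize=...)` and append each chunk to the output file, so only one chunk ever sits in RAM.

```python
import pandas as pd

# Tiny demo inputs standing in for the real sequencing CSVs.
pd.DataFrame({'id': [1, 2], 'value': [10, 20]}).to_csv('part_a.csv', index=False)
pd.DataFrame({'id': [3], 'value': [30]}).to_csv('part_b.csv', index=False)

input_files = ['part_a.csv', 'part_b.csv']
output_file = 'combined.csv'

first = True
for path in input_files:
    # chunksize would be large (e.g. hundreds of thousands of rows) in practice;
    # 1 row per chunk here only demonstrates the streaming behaviour.
    for chunk in pd.read_csv(path, chunksize=1):
        # Write the header once, then append without it.
        chunk.to_csv(output_file, mode='w' if first else 'a', header=first, index=False)
        first = False
```

This assumes all inputs share the same columns; if pandas-level processing isn't needed at all, plain file-level concatenation (copying lines, skipping repeated headers) uses even less memory.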

pandas merge with MultiIndex, when only one level of index is to be used as key

不问归期 · submitted on 2021-02-20 17:56:35
Question: I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and '_ItemId'). There are multiple instances of each value of '_ItemId', like this:

```
                    _SomeOtherLabel
_Date      _ItemId
2014-10-05 6588921  AA
           6592520  AB
           6836143  BA
2014-10-11 6588921  CA
           6592520  CB
           6836143  DA
```

I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:

```
   _ItemId  _Cat
0  6588921  6_1
1  6592520  6_1
2  6836143  7_1
```

I want to
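A sketch of the reset_index round trip (data reconstructed from the excerpt above): drop the MultiIndex to columns, merge on the shared '_ItemId' key, then rebuild the index.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['2014-10-05', '2014-10-11'], [6588921, 6592520, 6836143]],
    names=['_Date', '_ItemId'],
)
df1 = pd.DataFrame({'_SomeOtherLabel': ['AA', 'AB', 'BA', 'CA', 'CB', 'DA']}, index=idx)
df2 = pd.DataFrame({'_ItemId': [6588921, 6592520, 6836143],
                    '_Cat': ['6_1', '6_1', '7_1']})

# Move the index levels into columns, join on the key, restore the MultiIndex.
merged = (df1.reset_index()
             .merge(df2, on='_ItemId', how='left')
             .set_index(['_Date', '_ItemId']))
```

In recent pandas, `df1.merge(df2, on='_ItemId')` can also match an index level name directly, but the reset_index round trip makes the resulting index explicit.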

Pandas: find maximum value, when and if conditions

纵然是瞬间 · submitted on 2021-02-20 10:09:56
Question: I have a dataframe, df:

```
id           volume  saturation   time_delay_normalised  speed        BPR_free_speed  BPR_speed    Volume  time_normalised
27WESTBOUND  580     0.351515152  57                     6.54248366   17.88           15.91366177  580     1.59375
27WESTBOUND  588     0.356363636  100                    5.107142857  17.88           15.86519847  588     2.041666667
27WESTBOUND  475     0.287878788  64                     6.25625      17.88           16.51161331  475     0.666666667
27EASTBOUND  401     0.243030303  59                     6.458064516  17.88           16.88283672  401     1.0914583333
27EASTBOUND  438     0.265454545  46                     7.049295775  17.88           16.70300418  438     1
```
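The question text is cut off before the actual ask; one plausible reading of the title ("find maximum value, when and if conditions") is a per-id maximum restricted by a condition. A hedged sketch under that assumption, rebuilding only the columns needed and using an arbitrary saturation threshold:

```python
import pandas as pd

# Rows rebuilt from the excerpt above (only the columns needed here).
df = pd.DataFrame({
    'id': ['27WESTBOUND', '27WESTBOUND', '27WESTBOUND', '27EASTBOUND', '27EASTBOUND'],
    'volume': [580, 588, 475, 401, 438],
    'saturation': [0.351515152, 0.356363636, 0.287878788, 0.243030303, 0.265454545],
})

# Hypothetical task: maximum volume per id, counting only rows that satisfy
# a condition (here, saturation below an arbitrarily chosen 0.355).
result = df[df['saturation'] < 0.355].groupby('id')['volume'].max()
```

`idxmax()` in place of `max()` would instead return the row label of each per-group maximum, which helps when the other columns of the winning row are needed.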

Python pandas: how to obtain the datatypes of objects in a mixed-datatype column?

陌路散爱 · submitted on 2021-02-20 09:29:31
Question: Given a pandas.DataFrame with a column holding mixed datatypes, e.g.

```python
df = pd.DataFrame({'mixed': [pd.Timestamp('2020-10-04'), 999, 'a string']})
```

I was wondering how to obtain the datatypes of the individual objects in the column (Series)? Suppose I want to modify all entries in the Series that are of a certain type, like multiply all integers by some factor. I could iteratively derive a mask and use it in loc, like

```python
m = np.array([isinstance(v, int) for v in df['mixed']])
df.loc[m, 'mixed
```
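One sketch: `Series.map(type)` reports each element's class, and an isinstance-based `map` builds the boolean mask directly as a Series, replacing the explicit list comprehension.

```python
import pandas as pd

df = pd.DataFrame({'mixed': [pd.Timestamp('2020-10-04'), 999, 'a string']})

# Per-element types of the object-dtype column.
types = df['mixed'].map(type)

# Boolean mask selecting the integers, then update only those entries
# (multiplying by 10 as an example factor).
m = df['mixed'].map(lambda v: isinstance(v, int))
df.loc[m, 'mixed'] = df.loc[m, 'mixed'] * 10
```

Note that `isinstance(v, int)` also matches Python bools; `type(v) is int` is the stricter test if bools must be excluded.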
