pandas | 易学教程

Python: pandas.cut labels are ignored

Question: I want to cut one column of my pandas.DataFrame using pandas.cut(), but the labels I pass in the labels argument are not applied. Here is an example. I have the following data frame:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'x': [-0.009, 0.089, 0.095, 0.096, 0.198]})
    >>> print(df)
           x
    0 -0.009
    1  0.089
    2  0.095
    3  0.096
    4  0.198

And I cut the x column like this:

    >>> bins = pd.IntervalIndex.from_tuples([(-0.2, -0.1), (-0.1, 0.0), (0.0, 0.1), (0.1, 0.2)])
    >>> labels = [100, 200, 300, 400]
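The excerpt stops before the actual pandas.cut() call, so the following is only a sketch under the assumption that the call was along the lines of pd.cut(df['x'], bins=bins, labels=labels). The pandas documentation states that labels is ignored when bins is an IntervalIndex. One possible workaround is to cut first and then rename the resulting interval categories; the column name label is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [-0.009, 0.089, 0.095, 0.096, 0.198]})
bins = pd.IntervalIndex.from_tuples([(-0.2, -0.1), (-0.1, 0.0), (0.0, 0.1), (0.1, 0.2)])
labels = [100, 200, 300, 400]

# With an IntervalIndex as bins, the categories of the result are the
# intervals themselves; any labels argument would be ignored.
binned = pd.cut(df['x'], bins=bins)

# Rename the interval categories to the desired labels. The labels are
# matched to the intervals by position, so both must be in the same order.
df['label'] = binned.cat.rename_categories(labels)  # 'label' is a hypothetical column name
print(df)
```

Values that fall outside every interval come out as NaN both before and after the rename.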

How to convert panda df to sparse df

Question: I have a huge sparse dataset in a dataframe and have been using df.to_sparse, but it will be deprecated soon, so I want to switch to pd.Series(pd.SparseArray()). I am not sure how to do that for an entire dataframe. My final df has 100K rows and 49K columns, so I need an automated way.

Answer 1: You could try something like this:

    dtype = {key: pd.SparseDtype(df.dtypes[key].type, fill_value=df[key].value_counts().argmax())
             for key in df.dtypes.keys()}
    df = df.astype(dtype)

And then check the density with
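A small, self-contained sketch of the same idea on toy data follows. Two things in it are assumptions rather than part of the quoted answer. First, it uses idxmax() instead of argmax(), because in recent pandas versions Series.argmax() returns an integer position while idxmax() returns the most frequent value itself. Second, the excerpt is cut off before it names how the density is checked; the DataFrame.sparse.density accessor used here is one available option, not necessarily the one the answer meant.

```python
import pandas as pd

# Toy data: each column is mostly a single repeated value.
df = pd.DataFrame({'a': [0, 0, 1, 0], 'b': [0.0, 0.0, 0.0, 2.5]})

# One SparseDtype per column, using the most frequent value as fill_value.
dtype = {
    col: pd.SparseDtype(df[col].dtype, fill_value=df[col].value_counts().idxmax())
    for col in df.columns
}
sparse_df = df.astype(dtype)

# Fraction of stored (non-fill) values across the whole frame.
density = sparse_df.sparse.density
print(sparse_df.dtypes)
print(density)
```

Computing value_counts() for each of 49K columns may itself be slow; if the fill value is known in advance (for example 0), passing it directly avoids that step.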

Adding spaces between strings after sum()

Question: Assuming that I have the following pandas dataframe:

    >>> data = pd.DataFrame({'X': ['a', 'b'], 'Y': ['c', 'd'], 'Z': ['e', 'f']})
       X  Y  Z
    0  a  c  e
    1  b  d  f

The desired output is:

    0    a c e
    1    b d f

When I run the following code, I get:

    >>> data.sum(axis=1)
    0    ace
    1    bdf

So how do I concatenate string columns with a space between them?

Answer 1: Use apply per row (axis=1) with join:

    a = data.apply(' '.join, axis=1)
    print(a)
    0    a c e
    1    b d f
    dtype: object

Another solution adds spaces, sums, and finally applies str.rstrip: a =
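For reference, here is a runnable version of the apply/join approach from the answer, next to an alternative based on Series.str.cat. The str.cat variant is not from the quoted answer; it is included only as another way to get the same result when the column names are known.

```python
import pandas as pd

data = pd.DataFrame({'X': ['a', 'b'], 'Y': ['c', 'd'], 'Z': ['e', 'f']})

# Row-wise join, as in the answer: each row is passed to ' '.join.
joined = data.apply(' '.join, axis=1)

# Alternative (not from the answer): concatenate named columns with a separator.
joined_cat = data['X'].str.cat([data['Y'], data['Z']], sep=' ')

print(joined)
print(joined_cat)
```

Both variants assume every cell is a string; missing values would need to be filled or handled first.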

Use keywords from dataframe to detect if any present in another dataframe or string

Question: I have two problems. The first is this: I have one dataframe with categories and keywords like this:

      Category                            Keywords
    0    Fruit  ['apple', 'pear', 'plum', 'grape']
    1    Color          ['red', 'purple', 'green']

And another dataframe like this:

                                              Summary
    0  This is a basket of red apples. They are sour.
    1        We found a bushel of fruit. They are red.
    2      There is a peck of pears that taste sweet.
    3                          We have a box of plums.

I want the end result to look like this:

            Category  Summary
    0  Fruit, Color   This is a basket of red apples. They are
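The desired output is cut off after the first row, so the sketch below is only one reading of the requirement: tag each summary with every category that has at least one keyword occurring in the text. It assumes the Keywords column holds Python lists (not strings) and uses case-insensitive substring matching; the helper name matching_categories is made up for illustration.

```python
import pandas as pd

categories = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Keywords': [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']],
})
summaries = pd.DataFrame({'Summary': [
    'This is a basket of red apples. They are sour.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
]})

def matching_categories(text):
    """Return a comma-separated list of categories with a keyword found in text."""
    lowered = text.lower()
    hits = [
        category
        for category, keywords in zip(categories['Category'], categories['Keywords'])
        if any(keyword in lowered for keyword in keywords)
    ]
    return ', '.join(hits)

summaries['Category'] = summaries['Summary'].apply(matching_categories)
print(summaries[['Category', 'Summary']])
```

Plain substring matching also fires inside longer words (for example "red" inside "bored"); a regular expression with word boundaries would be stricter, at the cost of no longer matching plurals such as "apples" unless the pattern allows for them.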

How do I efficiently apply pandas.Timestamp functions to a full dataframe/column?

Question: Pandas is a great tool for a number of data tasks. Many functions have been streamlined so that they can be applied efficiently to whole columns rather than to individual cells or rows. One such function is to_datetime(), which I use as an example later in this question. However, there are a number of commands in pandas that, as best I can tell from the documentation, do not directly relate to dataframes. The specific function I am interested in is the pandas.Timestamp.isocalendar() function, but there
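The excerpt ends before any answer is shown. As a hedged pointer rather than the original answer: pandas 1.1 and later expose a column-wise counterpart, Series.dt.isocalendar(), which returns a DataFrame with year, week and day columns. A minimal sketch on made-up dates:

```python
import pandas as pd

# Made-up sample dates, converted column-wise with to_datetime.
dates = pd.to_datetime(pd.Series(['2020-01-01', '2020-12-31']))

# Vectorized equivalent of calling Timestamp.isocalendar() on every element.
iso = dates.dt.isocalendar()
print(iso)

# The individual components are ordinary columns.
weeks = iso['week']
```

On older pandas versions without this accessor, dates.apply(lambda ts: ts.isocalendar()) gives the same information element by element, though more slowly.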

reshape a pandas dataframe with multiple columns

Question: I have an issue reshaping a pandas DataFrame. It looks like this (the number of rows and columns can vary):

    columns        col1        col2        col3        col4
    Species
    sp1      218.000000  521.000000  533.000000  793.000000
    sp1        0.105569    0.252300    0.258111    0.384019
    sp1               2           2           2           3
    sp2      225.000000  521.000000  540.000000  800.000000
    sp2        0.107862    0.249760    0.258869    0.383509
    sp2               2           2           2           3
    sp3      217.000000  477.000000  512.000000  725.000000
    sp3        0.112377    0.247022    0.265148    0.375453
    sp3               1           1           3           3

The column Species is my index. I want to

Excel Datetime SN Conversion in Python

Question: My CSV input file sometimes has Excel serial numbers in the date field. I am using the following code, as my input file should never contain dates prior to 01/2000. However, this solution is quite time-consuming and I am hoping to find a better one. Thank you.

    def DateCorrection(x):
        if pd.to_datetime(x) < pd.to_datetime('2000-01-01'):
            return pd.to_datetime(datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2))
        else:
            return pd.to_datetime(x)

Answer 1: Assuming your input looks
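The answer is cut off, so what follows is not a reconstruction of it but a sketch of one vectorized alternative. It assumes the column arrives as strings in which serial numbers are purely numeric and every other entry is a parseable date string; the sample values are made up. The origin 1899-12-30 corresponds to the "- 2" offset from 1900-01-01 in the question's function.

```python
import pandas as pd

# Made-up sample: a mix of date strings and Excel serial numbers.
raw = pd.Series(['2015-03-01', '42064', '2016-07-15', '43831'])

# Entries that parse as numbers are treated as Excel serials; others become NaN.
numeric = pd.to_numeric(raw, errors='coerce')

# Serial number -> days since 1899-12-30 (NaN stays NaT).
from_serial = pd.to_datetime(numeric, unit='D', origin='1899-12-30')

# Everything that is not a serial number is parsed as a regular date string.
from_text = pd.to_datetime(raw.where(numeric.isna()), errors='coerce')

# Combine both partial results into one datetime column.
dates = from_text.fillna(from_serial)
print(dates)
```

Because each conversion runs once over the whole column instead of once per cell, this avoids the per-row pd.to_datetime calls that make the original function slow.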

How to efficiently load mixed-type pandas DataFrame into an Oracle DB

Question: Happy new year everyone! I'm currently struggling with ETL performance issues as I'm trying to write larger pandas DataFrames (1-2 million rows, 150 columns) into an Oracle database. Even for just 1000 rows, pandas' default to_sql() method runs well over 2 minutes (see code snippet below). My strong hypothesis is that these performance issues are in some way related to the underlying data types (mostly strings). I ran the same job on 1000 rows of random strings (benchmark: 3 min) and 1000 rows