series

Convert pandas Series to DataFrame

社会主义新天地 submitted on 2019-11-26 08:00:02

Question: I have a pandas Series sf:

    email
    email1@email.com    [1.0, 0.0, 0.0]
    email2@email.com    [2.0, 0.0, 0.0]
    email3@email.com    [1.0, 0.0, 0.0]
    email4@email.com    [4.0, 0.0, 0.0]
    email5@email.com    [1.0, 0.0, 3.0]
    email6@email.com    [1.0, 5.0, 0.0]

and I would like to transform it into the following DataFrame:

    index | email            | list
    _____________________________________________
    0     | email1@email.com | [1.0, 0.0, 0.0]
    1     | email2@email.com | [2.0, 0.0, 0.0]
    2     | email3@email.com | [1.0, 0.0, 0.0]
    3     | email4@email.com
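The question is cut off above, but a minimal sketch of one common answer, assuming the Series is named `sf` with the emails as its index, is `reset_index`, which moves the index into an ordinary column:

```python
import pandas as pd

# Hypothetical reconstruction of part of the Series from the question:
# emails as the index, lists of floats as the values.
sf = pd.Series(
    [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0, 3.0]],
    index=["email1@email.com", "email2@email.com", "email5@email.com"],
)
sf.index.name = "email"

# reset_index() turns the index into a regular column and replaces it
# with the default RangeIndex 0, 1, 2, ...; name= labels the values column.
df = sf.reset_index(name="list")
print(df.columns.tolist())  # ['email', 'list']
```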

Combining two Series into a DataFrame in pandas

点点圈 submitted on 2019-11-26 07:52:37

Question: I have two Series s1 and s2 with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?

Answer 1: I think concat is a nice way to do this. If they are present, it uses the name attributes of the Series as the column names (otherwise it simply numbers them):

    In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')

    In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')

    In [3]: pd.concat([s1, s2], axis=1)
    Out
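The answer is truncated above; a runnable sketch of the same idea, extended with `reset_index` to also keep the shared index as a third column as the question asks, might look like this:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')
s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')

# concat aligns the two Series on their shared index and uses their
# name attributes as column labels; reset_index then demotes the old
# index to a regular column (labelled 'index' by default).
df = pd.concat([s1, s2], axis=1).reset_index()
print(df.columns.tolist())  # ['index', 's1', 's2']
```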

Print series of prime numbers in python

一世执手 submitted on 2019-11-26 05:29:24

Question: I am trying to learn Python programming, and I'm pretty new at this. I was having issues printing the series of prime numbers from one to a hundred. I can't figure out what's wrong with my code. Here's what I wrote; it prints all the odd numbers instead of the primes:

    for num in range(1, 101):
        for i in range(2, num):
            if (num % i == 0):
                break
            else:
                print(num)
                break

Answer 1: You need to check all numbers from 2 to n-1 (to sqrt(n) actually, but ok, let it be n). If n is divisible by any of the numbers, it
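The answer is cut off above. As a sketch of the fix it is describing: the `else` must belong to the `for` loop, not the `if`, so that a number is printed only after the whole loop finds no divisor:

```python
# Corrected version: the for/else construct runs the else block
# only when the inner loop completes without hitting break.
primes = []
for num in range(2, 101):
    for i in range(2, num):
        if num % i == 0:
            break  # found a divisor: num is not prime
    else:
        # no divisor found, so num is prime
        primes.append(num)
        print(num)
```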

Python: Pandas Series - Why use loc?

梦想的初衷 submitted on 2019-11-26 05:19:02

Question: Why do we use 'loc' for pandas DataFrames? It seems the following code, with or without loc, runs at a similar speed:

    %timeit df_user1 = df.loc[df.user_id == '5561']
    100 loops, best of 3: 11.9 ms per loop

or

    %timeit df_user1_noloc = df[df.user_id == '5561']
    100 loops, best of 3: 12 ms per loop

So why use loc?

Edit: This has been flagged as a duplicate question. But although "pandas iloc vs ix vs loc explanation?" does mention that * you can do column retrieval just by
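The entry is truncated before the answer. Speed aside, one practical difference (a sketch with made-up data) is assignment: a chained selection like `df[mask]['col'] = ...` may silently write to a copy, while `.loc` selects rows and columns in a single indexing operation and so is guaranteed to modify the original frame:

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['5561', '1234', '5561'],
                   'score':   [1, 2, 3]})

# One indexing call selecting both rows and a column, so the
# assignment writes into df itself rather than a temporary copy.
df.loc[df.user_id == '5561', 'score'] = 0
print(df['score'].tolist())  # [0, 2, 0]
```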

JFreechart series tool tip above shape annotation

家住魔仙堡 submitted on 2019-11-26 03:44:46

Question: I have an XYPlot containing series and a couple of dynamically added shape annotations with no fill (hence each of the series points is visible). Is it possible to display the series tooltips (which show the coordinates of the series point the mouse pointer is currently over) above the annotations? Or how can I rearrange the elements to make the tooltips visible?

Answer 1: I suspect you are adding the shape annotations to the plot, where they are drawn last. Instead, add

Keep only date part when using pandas.to_datetime

泄露秘密 submitted on 2019-11-26 02:27:53

Question: I use pandas.to_datetime to parse the dates in my data. By default, pandas represents the dates as datetime64[ns], even though the dates are all daily only. I wonder whether there is an elegant/clever way to convert the dates to datetime.date or datetime64[D], so that when I write the data to CSV, the dates are not appended with 00:00:00. I know I can convert the type manually, element by element:

    [dt.to_datetime().date() for dt in df.dates]

But this is really slow since I have many rows and
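The question breaks off above. A vectorized sketch of the usual alternatives (the column name `dates` is taken from the question) uses the `.dt` accessor instead of the element-by-element loop:

```python
import pandas as pd

df = pd.DataFrame({'dates': pd.to_datetime(['2019-11-26 08:00:02',
                                            '2019-11-25 22:49:00'])})

# .dt.date yields plain datetime.date objects (column dtype becomes
# object), which serialize to CSV without a time part.
df['date_only'] = df['dates'].dt.date
print(df['date_only'].astype(str).tolist())  # ['2019-11-26', '2019-11-25']

# Alternatively, .dt.normalize() keeps dtype datetime64[ns] but
# zeroes the time-of-day component.
df['midnight'] = df['dates'].dt.normalize()
```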

Pandas: convert categories to numbers

霸气de小男生 submitted on 2019-11-26 01:59:21

Question: Suppose I have a dataframe with countries that goes as:

    cc | temp
    US | 37.0
    CA | 12.0
    US | 35.0
    AU | 20.0

I know that there is a pd.get_dummies function to convert the countries to 'one-hot encodings'. However, I wish to convert them to indices instead, such that I will get cc_index = [1,2,1,3]. I'm assuming that there is a faster way than using get_dummies along with a numpy where clause, as shown below:

    [np.where(x) for x in df.cc.get_dummies().values]

This is somewhat easier
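The entry is truncated, but a sketch of the 'factors'-style approach the question hints at uses pd.factorize or the categorical cat.codes accessor (note both are 0-based, unlike the 1-based cc_index in the question):

```python
import pandas as pd

df = pd.DataFrame({'cc':   ['US', 'CA', 'US', 'AU'],
                   'temp': [37.0, 12.0, 35.0, 20.0]})

# factorize numbers the categories in order of first appearance:
codes, uniques = pd.factorize(df['cc'])
print(codes.tolist())           # [0, 1, 0, 2]

# cat.codes numbers them after sorting the category labels:
df['cc_index'] = df['cc'].astype('category').cat.codes
print(df['cc_index'].tolist())  # [2, 1, 2, 0]
```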

Strings in a DataFrame, but dtype is object

被刻印的时光 ゝ submitted on 2019-11-26 00:58:32

Question: Why does pandas tell me that I have objects, although every item in the selected column is a string, even after explicit conversion? This is my DataFrame:

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 56992 entries, 0 to 56991
    Data columns (total 7 columns):
    id       56992  non-null values
    attr1    56992  non-null values
    attr2    56992  non-null values
    attr3    56992  non-null values
    attr4    56992  non-null values
    attr5    56992  non-null values
    attr6    56992  non-null values
    dtypes: int64(2), object(5)

Five of
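The entry breaks off above. As background, a short sketch of the behaviour in question: pandas has historically stored strings as Python objects, so `object` is the expected dtype for a string column; pandas 1.0+ additionally offers an explicit, opt-in string dtype (that version requirement is an assumption about your install):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])
# Strings are held as generic Python objects, so the dtype is 'object'
# even though every element really is a str:
print(s.dtype)   # object

# Since pandas 1.0 there is a dedicated nullable string dtype:
s2 = s.astype('string')
print(s2.dtype)  # string
```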

Pandas filtering for multiple substrings in series

回眸只為那壹抹淺笑 submitted on 2019-11-25 22:49:00

Question: I need to filter rows in a pandas dataframe so that a specific string column contains at least one of a list of provided substrings. The substrings may have unusual / regex characters. The comparison should not involve regex and should be case-insensitive. For example:

    lst = ['kdSj;af-!?', 'aBC+dsfa?\-', 'sdKaJg|dksaf-*']

I currently apply the mask like this:

    mask = np.logical_or.reduce([df[col].str.contains(i, regex=False, case=False) for i in lst])
    df = df[mask]

My dataframe is large (~1mio
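The question is cut off above. One common alternative (a sketch with made-up data; it does go through the regex engine, but re.escape makes every substring match literally) collapses the list into a single alternation pattern so the column is scanned once:

```python
import re
import pandas as pd

lst = ['kdSj;af-!?', 'aBC+dsfa?\\-', 'sdKaJg|dksaf-*']
df = pd.DataFrame({'col': ['xkdSj;af-!?y', 'nothing here', 'SDKAJG|DKSAF-*']})

# re.escape neutralizes the regex metacharacters in each substring,
# and '|' joins them into one pattern matching any of them.
pattern = '|'.join(map(re.escape, lst))
mask = df['col'].str.contains(pattern, case=False)
print(mask.tolist())  # [True, False, True]
```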