series | 易学教程

Pandas: change data type of Series to String

I use pandas 0.12.0 with Python 2.7 and have a DataFrame as below:

    df = pd.DataFrame({'id': [123, 512, 'zhub1', 12354.3, 129, 753, 295, 610],
                       'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
                       'shape': ['round', 'triangular', 'triangular', 'triangular', 'square', 'triangular', 'round', 'triangular']},
                      columns=['id', 'colour', 'shape'])

The id Series consists of some integers and strings. Its dtype by default is object. I want to convert all contents of id to strings. I tried astype(str), which produces the output below.

    df['id'].astype(str)
    0    1
    1    5
    2    z
    3    1
    4
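As a minimal sketch of one way to get full-length strings, each element can be passed through Python's built-in str() with Series.map. The variable name id_as_str is an assumption made for illustration, and only the id column of the question's frame is rebuilt here. In recent pandas releases astype(str) is also expected to return full strings, so the truncation shown above appears to be specific to the old version.

```python
import pandas as pd

# Only the 'id' column from the question is reproduced here.
df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3, 129, 753, 295, 610]})

# Apply Python's str() to every element, one value at a time.
# 'id_as_str' is an illustrative name, not part of the original question.
id_as_str = df["id"].map(str)

print(id_as_str.tolist())
```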

Python: Pandas Series - Why use loc?

Why do we use 'loc' for pandas DataFrames? It seems the following code, with or without loc, runs at a similar speed:

    %timeit df_user1 = df.loc[df.user_id=='5561']
    100 loops, best of 3: 11.9 ms per loop

or

    %timeit df_user1_noloc = df[df.user_id=='5561']
    100 loops, best of 3: 12 ms per loop

So why use loc?

Edit: This has been flagged as a duplicate question. But although "pandas iloc vs ix vs loc explanation?" does mention that you can do column retrieval just by using the data frame's getitem:

    df['time']  # equivalent to df.loc[:, 'time']

it does not say why we use loc,
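One difference that timing alone does not show is that loc accepts a row selector and a column label together. The sketch below uses a small made-up frame (the values and the names mask, same_rows and times are assumptions for illustration) to show that both spellings return the same rows for a plain filter, while loc can also pick a column and assign to the selection in a single indexing step.

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({"user_id": ["5561", "1234", "5561"], "time": [10, 20, 30]})

mask = df.user_id == "5561"

# For plain row filtering, both spellings select the same rows.
same_rows = df[mask].equals(df.loc[mask])

# loc also takes a column label, so rows and column are chosen in one step.
times = df.loc[mask, "time"]

# Assigning through loc writes to df itself in a single indexing operation.
df.loc[mask, "time"] = 0

print(same_rows, times.tolist(), df["time"].tolist())
```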

Print series of prime numbers in python

I am trying to learn Python programming, and I'm pretty new at this. I was having issues printing the series of prime numbers from one to a hundred. I can't figure out what's wrong with my code. Here's what I wrote; it prints all the odd numbers instead of primes:

    for num in range(1,101):
        for i in range(2,num):
            if (num%i==0):
                break
            else:
                print(num)
                break

Answer by Igor Chubin:

You need to check all numbers from 2 to n-1 (to sqrt(n) actually, but ok, let it be n). If n is divisible by any of the numbers, it is not prime. If a number is prime, print it.

    for num in range(2,101):
        prime = True
        for i in range(2
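The answer's code is cut off above, so the following is a separate sketch of the same trial-division idea rather than a reconstruction of it. It uses Python's for/else, where the else branch runs only if the inner loop finished without hitting break. The function name primes_up_to is an assumption made for illustration.

```python
def primes_up_to(limit):
    """Return all primes p with 2 <= p <= limit, found by trial division."""
    primes = []
    for num in range(2, limit + 1):
        for i in range(2, num):
            if num % i == 0:
                break  # found a divisor, so num is not prime
        else:
            # The inner loop ended without break: no divisor was found.
            primes.append(num)
    return primes


print(primes_up_to(100))
```

The asker's version prints as soon as the first candidate divisor (2) fails to divide num, which is why it reports every odd number. Here the decision is postponed until every candidate has been tried.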

Remove NaN from pandas series

Question: Is there a way to remove NaN values from a pandas Series? I have a Series that may or may not have some NaN values in it, and I'd like to return a copy of the Series with all the NaNs removed.

Answer 1:

    >>> s = pd.Series([1,2,3,4,np.nan,5,np.nan])
    >>> s[~s.isnull()]
    0    1
    1    2
    2    3
    3    4
    5    5

Update: an even better approach, as @DSM suggested in the comments, is to use pandas.Series.dropna():

    >>> s.dropna()
    0    1
    1    2
    2    3
    3    4
    5    5

Answer 2: This makes use of the fact that np.nan != np.nan:

    s[s==s]
    Out[953]:
    0    1.0
    1    2.0
    2    3.0
    3    4.0
    5    5.0
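The approaches in the answers can be put side by side in a short, self-contained sketch. The names cleaned and by_mask are assumptions for illustration. Both results keep the original index labels, and the source Series is left unchanged.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan, 5, np.nan])

# dropna() returns a new Series without the missing values.
cleaned = s.dropna()

# Boolean masking with notna() selects the same elements.
by_mask = s[s.notna()]

print(cleaned)
print(cleaned.equals(by_mask))
```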

Convert pandas data frame to series

Question: I'm somewhat new to pandas. I have a pandas DataFrame that is 1 row by 23 columns, and I want to convert it into a Series. What is the most Pythonic way to do this? I've tried pd.Series(myResults), but it complains:

    ValueError: cannot copy sequence with size 23 to array axis with dimension 1

It's not smart enough to realize it's still a "vector" in math terms. Thanks!

Answer 1: "It's not smart enough to realize it's still a 'vector' in math terms." Say rather that it's smart enough to
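Since the answer is truncated above, the following is only a sketch of two common ways to turn a one-row DataFrame into a Series, not necessarily the method the answer goes on to recommend. A three-column frame named my_results stands in for the asker's 23-column one; the names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 1 x 23 frame from the question.
my_results = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

# Select the first (and only) row by position; the result is a Series
# whose index is the frame's column labels.
row = my_results.iloc[0]

# squeeze(axis=0) collapses the length-1 row axis to the same Series.
squeezed = my_results.squeeze(axis=0)

print(row)
```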

JFreechart series tool tip above shape annotation

I have an XYPlot on which there are series and a couple of dynamically added shape annotations with no fill (hence each of the series points is visible). Is it possible to display the series tool tips (which show the coordinates of the series point the mouse pointer is currently over) above the annotations? Or how can I rearrange the elements to make the tool tip visible?

I suspect you are adding the shape annotations to the plot, where they are drawn last. Instead, add them to the renderer in Layer.BACKGROUND. As shown below, the circle does not obscure the tool tip at (20,

Sorting a pandas series

Question: I am trying to figure out how to sort the Series generated as a result of a groupby aggregation in a smart way. I generate an aggregation of my DataFrame like this:

    means = df.testColumn.groupby(df.testCategory).mean()

This results in a Series. I now try to sort this by value, but get an error:

    means.sort()
    ...
    -> Exception: This Series is a view of some other array, to sort in-place you must create a copy

I then try creating a copy:

    meansCopy = Series(means)
    meansCopy.sort()
    -> Exception:
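In current pandas, a Series is sorted by value with Series.sort_values, which returns a new Series instead of sorting in place. The sketch below reuses the column names from the question on made-up data (the values and the name sorted_means are assumptions for illustration). It does not reproduce the exceptions quoted above, which come from an older pandas version.

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({
    "testCategory": ["a", "b", "a", "c", "b"],
    "testColumn": [3.0, 1.0, 5.0, 6.0, 3.0],
})

means = df.testColumn.groupby(df.testCategory).mean()

# sort_values returns a new, sorted Series; 'means' keeps its group order.
sorted_means = means.sort_values()

print(sorted_means)
```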

Pandas selecting by label sometimes returns Series, sometimes returns DataFrame

Question: In pandas, when I select a label that has only one entry in the index I get back a Series, but when I select a label that has more than one entry I get back a DataFrame. Why is that? Is there a way to ensure I always get back a DataFrame?

    In [1]: import pandas as pd
    In [2]: df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])
    In [3]: type(df.loc[3])
    Out[3]: pandas.core.frame.DataFrame
    In [4]: type(df.loc[1])
    Out[4]: pandas.core.series.Series

Answer 1: Granted that the behavior is
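One way to always get a DataFrame back is to pass the label inside a list, since loc with a list of labels returns a frame regardless of how many rows match. This is a sketch of that idiom on the question's own frame; the answer above is truncated, so it may or may not be the approach it describes. The names one_match and many_matches are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=range(5), index=[1, 2, 3, 3, 3])

# A list of labels always yields a DataFrame, even for a single matching row.
one_match = df.loc[[1]]
many_matches = df.loc[[3]]

print(type(one_match).__name__, one_match.shape)
print(type(many_matches).__name__, many_matches.shape)
```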

Pandas pd.Series.isin performance with set versus array

Question: In Python generally, membership of a hashable collection is best tested via set. We know this because hashing gives us O(1) lookup complexity versus O(n) for list or np.ndarray. In pandas, I often have to check for membership in very large collections. I presumed that the same would apply, i.e. checking each item of a Series for membership in a set is more efficient than using list or np.ndarray. However, this doesn't seem to be the case:

    import numpy as np
    import pandas as pd
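The asker's benchmark is cut off above, so no timings are reproduced here. As a small sketch of the API being discussed, Series.isin accepts either a set or a list-like and returns the same boolean mask for both. The data and the names mask_list and mask_set are assumptions for illustration; any speed comparison would need to be measured separately, for example with timeit on realistic data sizes.

```python
import pandas as pd

s = pd.Series([1, 5, 9, 5])
values = [5, 9]

# The same membership test, once with a list and once with a set.
mask_list = s.isin(values)
mask_set = s.isin(set(values))

print(mask_list.tolist())
print(mask_list.equals(mask_set))
```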