series | 易学教程

Python Pandas removing substring using another column

I've searched around and can't find an easy way to do this, so I'm hoping your expertise can help. I have a pandas DataFrame with two columns:

    import numpy as np
    import pandas as pd

    pd.options.display.width = 1000
    testing = pd.DataFrame({
        'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6'],
        'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST LAST', 'FIRST NAME3',
                      'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'],
    })

which gives me:

              FULL_NAME   NAME
    0        FIRST LAST  FIRST
    1               NaN    NaN
    2        FIRST LAST  NAME2
    3       FIRST NAME3  NAME3
    4  FIRST NAME4 LAST  NAME4
    5      ANOTHER NAME  NAME5
    6         LAST NAME  NAME6
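The excerpt stops before the desired output is stated, so the following is only a sketch of what the title suggests: removing the NAME value from FULL_NAME row by row. The helper `strip_name` and the new column `REMAINDER` are hypothetical names, and leaving rows with missing values untouched is an assumption.

```python
import numpy as np
import pandas as pd

testing = pd.DataFrame({
    'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6'],
    'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST LAST', 'FIRST NAME3',
                  'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'],
})


def strip_name(row):
    """Remove row['NAME'] from row['FULL_NAME'] (hypothetical helper)."""
    full, name = row['FULL_NAME'], row['NAME']
    # Assumption: rows with a missing value in either column are left as they are.
    if pd.isna(full) or pd.isna(name):
        return full
    # Drop the substring, then collapse any leftover whitespace.
    return ' '.join(full.replace(name, '').split())


# Apply row-wise so each row uses its own NAME value.
testing['REMAINDER'] = testing.apply(strip_name, axis=1)
```

A row-wise `apply` is used here for readability rather than speed; rows where NAME does not occur in FULL_NAME simply keep the full string.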

Python Reindex Producing Nan

Question

Here is the code that I am working with:

    import pandas as pd

    test3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
    test3 = test3.reindex(index=['f', 'g', 'z'])

Originally everything is fine: test3 has an index of 'a', 'b', 'c' and values 1, 2, 3. But when I reindex test3, my values 1, 2, 3 are lost. Why is that? The desired output would be:

    f    1
    g    2
    z    3

Answer 1:

The docs are clear on this behaviour: Conform Series to new index with optional filling logic, placing NA/NaN in
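As a minimal sketch of the behaviour the quoted docstring describes: `reindex` aligns on existing labels, so labels that are not already in the index come back as NaN. Assigning a new list to `.index` relabels the values in place instead. The variable names `reindexed` and `relabeled` are illustrative only.

```python
import pandas as pd

test3 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# reindex looks up each requested label in the current index.
# None of 'f', 'g', 'z' exist there, so every value becomes NaN.
reindexed = test3.reindex(['f', 'g', 'z'])

# To keep the values and only change the labels, replace the index itself.
relabeled = test3.copy()
relabeled.index = ['f', 'g', 'z']
```

Here `relabeled` has the shape the question asks for, while `reindexed` reproduces the all-NaN result.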

highcharts: dynamically define colors in pie chart

Question

I'm trying to dynamically define the color of each series depending on its type. Below is my code, which doesn't work but shows what I'm trying to do. I would like to define a color for a certain type, e.g. if type = 'IS' then color = '#FFCACA'. I cannot expect that ret will contain all types, so I need to know which types are returned in ret and then associate a color with each of them. How can I do that? This is the code that runs once the data is received:

    success: function (ret) {
        $(function () {
            var chart;
            $(document).ready

Python Pandas iterate over rows and access column names

I am trying to iterate over the rows of a pandas DataFrame. Within each row, I am trying to refer to each value by its column name. Here is what I have:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))
    print(df)

              A         B         C         D
    0  0.351741  0.186022  0.238705  0.081457
    1  0.950817  0.665594  0.671151  0.730102
    2  0.727996  0.442725  0.658816  0.003515
    3  0.155604  0.567044  0.943466  0.666576
    4  0.056922  0.751562  0.135624  0.597252
    5  0.577770  0.995546  0.984923  0.123392
    6  0.121061  0.490894  0.134702  0.358296
    7  0.895856  0.617628  0
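One common way to do what the question describes is `DataFrame.iterrows()`, which yields each row as a Series whose labels are the column names. The sketch below is not taken from the truncated thread; the list `rows` is a hypothetical name used only to collect the values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))

rows = []
# iterrows() yields (index label, row) pairs; each row is a Series
# indexed by column name, so row['A'] reads column A for that row.
for idx, row in df.iterrows():
    rows.append({col: row[col] for col in df.columns})
```

For large frames, `itertuples()` or a vectorized operation is usually faster, but `iterrows()` keeps access by column name straightforward.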

Pandas mask / where methods versus NumPy np.where

Question

I often use the pandas mask and where methods for cleaner logic when conditionally updating values in a Series. However, in relatively performance-critical code I notice a significant performance drop relative to numpy.where. While I'm happy to accept this in specific cases, I'm interested to know: do the pandas mask / where methods offer any additional functionality, apart from the inplace / errors / try-cast parameters? I understand those 3 parameters but rarely use them. For example, I have no idea
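For reference, the three calls being compared can be sketched side by side. The data and variable names below are made up for illustration, and the sketch only shows that the results agree; it makes no claim about timing.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, -2, 3, -4], index=list('abcd'))

# Series.where keeps values where the condition is True, replaces the rest.
via_where = s.where(s > 0, 0)

# Series.mask is the inverse: it replaces values where the condition is True.
via_mask = s.mask(s <= 0, 0)

# np.where returns a plain ndarray, so the index has to be reattached by hand.
via_numpy = pd.Series(np.where(s > 0, s, 0), index=s.index)
```

The pandas methods return a Series with the original index, whereas `np.where` drops the labels unless they are restored explicitly.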

How to get the number of the most frequent value in a column?

Question

I have a DataFrame and I would like to know how many times the most frequent value occurs in a given column. I tried to do it in the following way:

    items_counts = df['item'].value_counts()
    max_item = items_counts.max()

As a result I get:

    ValueError: cannot convert float NaN to integer

As far as I understand, the first line gives me a Series in which the values from the column are used as keys and the frequencies of these values are used as values. So, I just need to find the largest value in the Series and,

assigning column names to a pandas series

I have a pandas Series object x:

    Ezh2    2
    Hmgb    7
    Irf1    1

I want to save this as a DataFrame with the column names Gene and Count respectively. I tried

    x_df = pd.DataFrame(x, columns=['Gene', 'count'])

but it does not work. The final form I want is:

    Gene    Count
    Ezh2    2
    Hmgb    7
    Irf1    1

Can you suggest how to do this?

You can create a dict and pass this as the data param to the DataFrame constructor:

    In [235]: df = pd.DataFrame({'Gene': s.index, 'count': s.values})
              df
    Out[235]:
       Gene  count
    0  Ezh2      2
    1  Hmgb      7
    2  Irf1      1

Alternatively, you can create a df from the Series; you need to call reset_index as the index will be
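The `reset_index` route mentioned at the end of the excerpt can be sketched as follows. This is one possible spelling, not necessarily the one the truncated answer goes on to show: `rename_axis` names the index so it becomes the Gene column, and the `name` argument of `Series.reset_index` names the value column.

```python
import pandas as pd

# The Series from the question, rebuilt here so the example is self-contained.
x = pd.Series([2, 7, 1], index=['Ezh2', 'Hmgb', 'Irf1'])

# Name the index 'Gene', then move it into a column;
# name='Count' sets the label of the column holding the values.
x_df = x.rename_axis('Gene').reset_index(name='Count')
```

The result has a default integer index and two columns, Gene and Count.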

Extract values in Pandas value_counts()

Question

Say we have used pandas dataframe[column].value_counts(), which outputs:

    apple      5
    sausage    2
    banana     2
    cheese     1

How do you extract the values in the same order as shown above, from max to min? For example: [apple, sausage, banana, cheese]

Answer 1:

Try this:

    dataframe[column].value_counts().index.tolist()
    ['apple', 'sausage', 'banana', 'cheese']

Answer 2:

    #!/usr/bin/env python
    import pandas as pd

    # Make example dataframe
    df = pd.DataFrame([(1, 'Germany'), (2, 'France'), (3, 'Indonesia'), (4, 'France'), (5, 'France'),
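A self-contained sketch of the first answer's approach follows. The DataFrame and its `food` column are made-up illustration data, with distinct counts chosen so that the descending order is unambiguous.

```python
import pandas as pd

# Hypothetical data: apple x4, sausage x3, banana x2, cheese x1.
dataframe = pd.DataFrame({
    'food': ['apple'] * 4 + ['sausage'] * 3 + ['banana'] * 2 + ['cheese'],
})

counts = dataframe['food'].value_counts()

# value_counts() sorts by frequency in descending order by default,
# so the index already lists the labels from most to least frequent.
labels = counts.index.tolist()
frequencies = counts.tolist()
```

When two labels share the same count, as sausage and banana do in the question, their relative order should not be relied upon.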

How to get the first column of a pandas DataFrame as a Series?

Question

I tried:

    x = pandas.DataFrame(...)
    s = x.take([0], axis=1)

And s gets a DataFrame, not a Series.

Answer 1:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [4, 5, 6, 7]})
    >>> df
       x  y
    0  1  4
    1  2  5
    2  3  6
    3  4  7
    >>> s = df.ix[:,0]
    >>> type(s)
    <class 'pandas.core.series.Series'>

UPDATE: If you're reading this after June 2017, ix has been deprecated in pandas 0.20.2, so don't use it. Use loc or iloc
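Following the update's advice, a minimal sketch of the same selection with `iloc`, which indexes purely by position, might look like this. The names `s` and `as_frame` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [4, 5, 6, 7]})

# A scalar column position returns a Series.
s = df.iloc[:, 0]

# A list of positions, as in the question's take([0], axis=1),
# returns a one-column DataFrame instead.
as_frame = df.take([0], axis=1)
```

The difference comes from passing a scalar versus a list: `df.iloc[:, [0]]` would also return a DataFrame.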

Highcharts - Get crossing point of crossing series

I am currently trying to extract the points where multiple series (a, b, c, d) cross a specific series (x). I can't seem to find any function that can aid me in this task. My best bet is to measure the distance between every single point in x and every single point in a, b, c, d... and assume that when the distance falls under some threshold, the point must be a crossing point. I think this approach is far too computationally heavy and seems "dirty". I believe there must be easier or better ways, perhaps even functions within Highcharts' own API. I have searched various sources and sites, but I can't