series

Using if/else in pandas series to create new series based on conditions

倾然丶 夕夏残阳落幕 submitted on 2020-01-17 12:38:22
Question: I have a pandas DataFrame df. Say I have a column "activity" which can be "fun" or "work", and I want to convert it to an integer. What I do is:
    df["activity_id"] = 1*(df["activity"]=="fun") + 2*(df["activity"]=="work")
This works, and I use it because I do not know how to put an if/else in there (and if you have 10 activities it can get complicated). However, say I now have the opposite problem and want to convert from an id to a string. I cannot use this trick anymore because I cannot multiply a string with a
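A dictionary lookup with `Series.map` is one common way to avoid the multiplication trick, and it works in both directions. The following is a minimal sketch, assuming only the two activities named in the question; the ids 1 and 2 mirror the question's formula, and the column name `activity_name` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"activity": ["fun", "work", "fun"]})

# Forward lookup: string -> integer id (ids taken from the question's formula).
activity_to_id = {"fun": 1, "work": 2}
# Reverse lookup: integer id -> string, built by inverting the dictionary.
id_to_activity = {v: k for k, v in activity_to_id.items()}

df["activity_id"] = df["activity"].map(activity_to_id)
# "activity_name" is a hypothetical column name used only for this illustration.
df["activity_name"] = df["activity_id"].map(id_to_activity)
```

Values that are missing from the dictionary come back as NaN, so with ten activities the only thing that grows is the dictionary.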

Pandas Series.rename not reflected in DataFrame columns

大兔子大兔子 submitted on 2020-01-15 20:56:42
Question: I'm trying to rename a column by validating the values in that particular column. Here is the set-up:
    In [9]: import pandas as pd
    In [10]: df = pd.DataFrame(
        ...:     {"unknown_field": ['bob@gmail.com', 'shirley@gmail.com', 'groza@pubg.com']}
        ...: )
    In [11]: df
    Out[11]:
           unknown_field
    0      bob@gmail.com
    1  shirley@gmail.com
    2     groza@pubg.com
Using a validate_column(ser), which takes a pandas.Series object as a parameter, it validates the values in that column and modifies the column name of that
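The behaviour in the title can be reproduced in a few lines: `Series.rename` with a scalar returns a new Series and leaves the parent DataFrame's column labels alone, whereas `DataFrame.rename(columns=...)` changes them. This is only a sketch; the body of `validate_column` below is a hypothetical stand-in (it returns a name rather than renaming in place), since the original function is not shown.

```python
import pandas as pd

df = pd.DataFrame(
    {"unknown_field": ["bob@gmail.com", "shirley@gmail.com", "groza@pubg.com"]}
)

def validate_column(ser):
    """Hypothetical validator: propose "email" if every value contains "@"."""
    if ser.astype(str).str.contains("@", regex=False).all():
        return "email"
    return ser.name

# Renaming the Series gives back a new object; df keeps its old column label.
renamed = df["unknown_field"].rename("email")
columns_after_series_rename = list(df.columns)

# Renaming through the DataFrame is what actually changes the column label.
new_names = {col: validate_column(df[col]) for col in df.columns}
df = df.rename(columns=new_names)
```

Having the validator return the proposed name and applying it with `df.rename` keeps the renaming in one place, which is a design choice of this sketch rather than something stated in the question.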

Compute sum of series

怎甘沉沦 submitted on 2020-01-15 05:56:07
Question: I need to compute the sum of this series. I need the output this way: if n = 3 and x = function_name(n), I need to get x = 11; if n = 5 and x = function_name(n), I need to get x = 45. I believe I need a for-loop to iterate, but I am finding it difficult to iterate the increment value itself.
Answer 1:
    inc=2; sum=1; next=1;
    n=input('what is n?\n');
    for i=2:n
        next=next+inc;
        sum=sum+next;
        inc=inc+2;
    end
    disp('sum is '); disp(sum);
Answer 2: I guess you want the sum of the cumsum of the differences d of the numbers:
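For comparison, the loop in Answer 1 can be sketched in Python. The series itself is not reproduced in the question, so the terms 1, 3, 7, 13, 21, ... (each gap growing by 2) are an assumption inferred from that answer's code and from the two expected outputs; `series_sum` is a hypothetical name.

```python
def series_sum(n):
    """Sum the first n terms of 1, 3, 7, 13, 21, ... (assumed from Answer 1)."""
    total = 0
    term = 1   # current term of the series
    step = 2   # gap to the next term; grows by 2 on every iteration
    for _ in range(n):
        total += term
        term += step
        step += 2
    return total

print(series_sum(3))  # 1 + 3 + 7
print(series_sum(5))  # 1 + 3 + 7 + 13 + 21
```

Under the same assumption the k-th term is k^2 - k + 1, so the sum also has the closed form n(n^2 + 2)/3, which gives a quick check on the loop.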

python, how to convert a pandas series into a pandas DataFrame?

点点圈 submitted on 2020-01-14 03:59:27
Question: I have a pandas series sf:
    email
    email1@email.com    [1.0, 0.0, 0.0]
    email2@email.com    [2.0, 0.0, 0.0]
    email3@email.com    [1.0, 0.0, 0.0]
    email4@email.com    [4.0, 0.0, 0.0]
    email5@email.com    [1.0, 0.0, 3.0]
    email6@email.com    [1.0, 5.0, 0.0]
How can I convert it to the following pandas DataFrame:
    index | email
    ________________________
    0     | email1@email.com
    1     | email2@email.com
    2     | email3@email.com
    3     | email4@email.com
    4     | email5@email.com
    5     | email6@email.com
Thanks for your help!
Answer 1:
    >>> s = p.Series
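Since the desired frame contains only the index labels, one possible approach is to build the DataFrame from `sf.index` directly. The sketch below is not the truncated answer above; it uses a shortened, made-up copy of the series with three of the rows from the question.

```python
import pandas as pd

# Shortened stand-in for the series in the question (three rows only).
sf = pd.Series(
    [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0, 3.0]],
    index=pd.Index(
        ["email1@email.com", "email2@email.com", "email5@email.com"],
        name="email",
    ),
)

# Keep only the index labels as a column; the new frame gets a default 0..n-1 index.
emails = pd.DataFrame({"email": sf.index})
print(emails)
```

`sf.index.to_frame(index=False)` should give an equivalent result, and `sf.reset_index()` would be the choice if the list values need to be kept as a second column.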

how to remove redundant date time when x-axis is incontinuous pandas DatetimeIndex

两盒软妹~` submitted on 2020-01-14 01:06:10
Question: I want to plot a pandas series whose index is a non-continuous DatetimeIndex. My code is as follows:
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    index = pd.DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:01:00', '2000-01-01 00:02:00', '2000-01-01 00:03:00', '2000-01-01 00:07:00', '2000-01-01 00:08:00'], dtype='datetime64[ns]')
    df = pd.Series(range(6), index=index)
    print(df)
    plt.plot(df.index, df.values)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%M"))
    plt.show()
The output is: But the

Dask item assignment. Cannot use loc for item assignment

可紊 submitted on 2020-01-13 11:23:20
Question: I have a folder of parquet files that I can't fit in memory, so I am using dask to perform the data cleansing operations. I have a function where I want to perform item assignment, but I can't find any solution online that works for this particular function. Below is the function that works in pandas. How do I get the same results in a dask dataframe? I thought delayed might help, but none of the solutions I've tried to write have worked.
    def item_assignment(df):

Pandas: adding multiindex Series/Dataframes containing lists

半世苍凉 submitted on 2020-01-07 04:59:06
Question: How do I add / merge two multiindex Series/DataFrames which contain lists as elements (a port sequence or timestamp sequence in my case)? In particular, how do I deal with indices which appear in only one Series/DataFrame? Unfortunately, the .add() method allows only floats for the fill_value argument, not empty lists. My data:
    print series1
    print series2
    IP               sessionID
    195.12*.21*.11*  49           [5900]
                     50           [5900, 5900, 5900, 5900, ...
    IP               sessionID
    85.15*.24*.12*   63           [3389]
    91.20*.4*.14*    68           [445, 445, 139]
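One way around the float-only `fill_value` is to align both Series on the union of their indices yourself and treat missing entries as empty lists before concatenating. The following is a sketch under assumptions: `add_list_series` is a hypothetical helper, the sample data is invented, and each index is assumed to contain no duplicate entries.

```python
import pandas as pd

def add_list_series(s1, s2):
    """Concatenate list-valued Series element-wise over the union of their indices."""
    idx = s1.index.union(s2.index)

    def as_list(value):
        # reindex() puts NaN where a label is missing; treat that as an empty list.
        return value if isinstance(value, list) else []

    left = [as_list(v) for v in s1.reindex(idx)]
    right = [as_list(v) for v in s2.reindex(idx)]
    return pd.Series([a + b for a, b in zip(left, right)], index=idx)

# Invented sample data with a two-level index (IP, sessionID).
series1 = pd.Series(
    [[5900], [5900, 5900]],
    index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2)], names=["IP", "sessionID"]),
)
series2 = pd.Series(
    [[445], [139]],
    index=pd.MultiIndex.from_tuples([("a", 2), ("b", 3)], names=["IP", "sessionID"]),
)

result = add_list_series(series1, series2)
print(result)
```

Building the result with a plain list comprehension avoids relying on how pandas handles arithmetic on object-dtype columns, at the cost of running at Python speed.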

simple number series

天大地大妈咪最大 submitted on 2020-01-06 12:51:21
Question: This is a simple number series question. I have numbers in a series like 2, 4, 8, 16, 32, 64, 128, 256; these numbers are formed by 2, 2 squared, 2 cubed and so on. Now if I add 2+4+8 = 14, then 14 can only be obtained by adding 2, 4 and 8. So I have 14 in hand now, and by some logic I need to get back the values that were used to make 14. Example:
    2+4+8 = 14
    14 (some logic) = 2, 4, 8
Answer 1: This is an easy one:
    2+4+8=14 ... 14+2=16
    2+4+8+16=30 ... 30+2=32
    2+4+8+16+32=62 ... 62+2=64
So you just need to add 2 to your sum
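Because every number in the series is a distinct power of two, the terms that make up a sum are exactly the set bits of its binary representation. The sketch below uses that observation, which is a different technique from the truncated answer above; `powers_of_two_in` is a hypothetical name.

```python
def powers_of_two_in(total):
    """Return the distinct powers of two that add up to total, smallest first."""
    parts = []
    power = 1
    while power <= total:
        if total & power:      # this bit is set, so this power is part of the sum
            parts.append(power)
        power <<= 1
    return parts

print(powers_of_two_in(14))  # → [2, 4, 8]
```

An odd input would also yield 1 (2 to the power 0), which is not part of the series in the question, so a caller may want to reject odd totals.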

pandas Timedelta error

只谈情不闲聊 submitted on 2020-01-06 10:14:47
Question: I'm getting errors when running the code samples from the pandas documentation. I suspect it might be related to the version of pandas I'm using, but I haven't been able to confirm that.
    pandas VERSION 0.10.1
    numpy VERSION 1.7.0
    scipy VERSION 0.12.0.dev-14b1e07
The examples below are taken directly from the pandas documentation here: pandas - Time Deltas
This works:
    from datetime import datetime, timedelta
    from pandas import *
    s = Series(date_range('2012-1-1', periods=3, freq='D'))
    s
    Out[52]:
