pandas

Extracting year from datetime datatypes is giving output as float

江枫思渺然 submitted on 2021-02-11 15:30:20

Question: I'm a newbie in Pandas. I need your support with a problem I'm facing. I have a datetime column in a dataframe and I'm trying to extract the year from it, but the output it gives is a float:

fifa['Joined_date'].head()
0   1970-01-01
1   1970-01-01
2   1970-01-01
3   1970-01-01
4   1970-01-01
Name: Joined_date, dtype: datetime64[ns]

fifa['Joined_date'].dt.year
0    1970.0
1    1970.0
2    1970.0
3    1970.0
4    1970.0
Name: Joined, Length: 18207, dtype: float64

The expected output is 1970. Can you please help?

Answer 1:
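The answer itself is cut off in the source, but the usual cause of this symptom is that the column contains NaT values further down (note the 18207-row length), which forces the integer years into float64, since a plain integer column cannot hold missing values. A minimal sketch of one common fix, using pandas' nullable integer dtype (the NaT row here is invented to reproduce the symptom):

import pandas as pd

fifa = pd.DataFrame(
    {"Joined_date": pd.to_datetime(["1970-01-01", None, "1970-01-01"])}
)

# With a NaT present, .dt.year comes back as float64 (1970.0, NaN, ...).
# Casting to the nullable 'Int64' dtype keeps whole-number years and
# shows the missing entries as <NA> instead of NaN.
years = fifa["Joined_date"].dt.year.astype("Int64")
print(years)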

Define recursive function in Pandas dataframe

老子叫甜甜 submitted on 2021-02-11 15:27:58

Question: I can't seem to find the answer to my question, so I'm trying my luck on here. I would very much appreciate your help. I've got a Pandas dataframe with values in Col1 and Col2. In place of the np.nan values in Col2, I'd like to calculate the following: today's Col2 value = previous day's Col2 value multiplied by today's Col1 value. This should be some form of recursive function. I've tried several answers, including the for loop below, but none seem to work:

df = pd.read_excel('/Users/fhggshgf
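The excerpt is cut off above, but the recurrence itself (today's Col2 = yesterday's Col2 × today's Col1) is inherently sequential, so a plain row loop is a reasonable fit. A minimal sketch under that assumption (the sample values are invented; only the first Col2 value is taken as known):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [1.0, 1.1, 0.9, 1.2],
    "Col2": [100.0, np.nan, np.nan, np.nan],
})

# Walk the rows in order, filling each missing Col2 from the row above,
# which has already been filled by the time we reach it.
for i in range(1, len(df)):
    if pd.isna(df.loc[i, "Col2"]):
        df.loc[i, "Col2"] = df.loc[i - 1, "Col2"] * df.loc[i, "Col1"]

print(df)

Because the relation is purely multiplicative, the loop could also be replaced by a cumulative product of Col1 scaled by the last known Col2 value.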

Group by and fill missing datetime values with duplicates

筅森魡賤 submitted on 2021-02-11 15:24:49

Question: This question follows on from this one: Group by and fill missing datetime values. What I'm trying to do is group a Pandas dataframe by contract, check whether there are duplicated datetime values, and fill in the missing ones. If there are duplicates there will be a total of 25 hours, and if not, 24. My input is this:

contract  datetime             value1  value2
x         2019-01-01 00:00:00  50      60
x         2019-01-01 02:00:00  30      60
x         2019-01-01 02:00:00  70      80
x         2019-01-01 03:00:00  70      80
y         2019-01-01 00:00:00  30      100

With this Dataframe my
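The question is truncated, but one plausible reading is: within each contract, every hour of the day should appear at least once, while duplicated hours (the 25-hour case) are kept. A sketch under that assumption, using an outer merge against a full hourly range so duplicates survive and missing hours appear as NaN rows (the 24-hour span and the fill strategy are assumptions):

import pandas as pd

df = pd.DataFrame({
    "contract": ["x", "x", "x", "x", "y"],
    "datetime": pd.to_datetime([
        "2019-01-01 00:00:00", "2019-01-01 02:00:00",
        "2019-01-01 02:00:00", "2019-01-01 03:00:00",
        "2019-01-01 00:00:00",
    ]),
    "value1": [50, 30, 70, 70, 30],
    "value2": [60, 60, 80, 80, 100],
})

def fill_hours(group):
    start = group["datetime"].min().normalize()
    full = pd.DataFrame(
        {"datetime": pd.date_range(start, periods=24, freq="H")}
    )
    # The outer merge keeps the duplicated hours from the data and adds
    # NaN rows for hours that are missing entirely.
    merged = full.merge(group, on="datetime", how="outer").sort_values("datetime")
    merged["contract"] = group["contract"].iloc[0]
    return merged

out = (
    df.groupby("contract", group_keys=False)
      .apply(fill_hours)
      .reset_index(drop=True)
)

With the sample input, contract x yields 25 rows (one duplicated hour) and y yields 24; the NaN value1/value2 rows can then be filled however the original question intended, e.g. with ffill().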

Pandas Dataframe: Find the column with the closest coordinate point to another column's coordinate point

纵饮孤独 submitted on 2021-02-11 15:23:46

Question: I am working with soccer ball and soccer player tracking data. I am trying to find the player that is closest to the ball for each row of coordinate points, and to make a new column attributing the closest player to the ball.

Example data:

| ball_point  | home_player1_point | home_player2_point | away_player1_point |
| ----------- | ------------------ | ------------------ | ------------------ |
| (7.00,3.00) | (-15.37,8.22)      | (25.3,-.2)         | (12.0,12.9)        |

Desired output:

| ball_point | home_player1_point | home_player2
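The desired output is cut off, but the row-wise computation is straightforward once the coordinates are parsed. A minimal sketch, assuming each cell holds an (x, y) tuple and using the column names from the example (the DataFrame literal is invented to match the sample row):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ball_point": [(7.00, 3.00)],
    "home_player1_point": [(-15.37, 8.22)],
    "home_player2_point": [(25.3, -0.2)],
    "away_player1_point": [(12.0, 12.9)],
})

player_cols = [c for c in df.columns if c != "ball_point"]

def closest_player(row):
    bx, by = row["ball_point"]
    # Euclidean distance from the ball to each player column.
    dists = {c: np.hypot(row[c][0] - bx, row[c][1] - by) for c in player_cols}
    return min(dists, key=dists.get)

df["closest_player"] = df.apply(closest_player, axis=1)
print(df["closest_player"])  # away_player1_point is nearest in the sample row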

pandas: melt 100+ variables into 100+ new dataframes

久未见 submitted on 2021-02-11 15:19:09

Question: Pretty new to Stack Overflow, please bear with me if the format looks odd. I have a big set of data with 100+ columns, structured like:

countrya  countryb  year  variable1  variable2  ...  variable100

I want to have the 100 variables separated into 100 new dataframes and saved into CSVs. Below is the code I have for creating one new CSV:

dfm1 = pd.melt(df, id_vars=['countrya', 'countryb', 'year'],
               value_vars=['variable1'], value_name='variable1')
dfm1.drop('variable', axis=1)
dfm1.to_csv(
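The code is cut off at the to_csv call, but the stated goal only needs a loop over the variable columns. A minimal sketch (the out/ directory is an assumption, and a toy two-variable frame stands in for the real 100+); note that melting a single value_vars column is equivalent to simply selecting it alongside the id columns:

import os

import pandas as pd

df = pd.DataFrame({
    "countrya": ["AA", "AA"],
    "countryb": ["BB", "CC"],
    "year": [2019, 2020],
    "variable1": [1.5, 2.5],
    "variable2": [3.5, 4.5],
})

id_cols = ["countrya", "countryb", "year"]
var_cols = [c for c in df.columns if c not in id_cols]

os.makedirs("out", exist_ok=True)  # hypothetical output directory
for col in var_cols:
    # One id block plus a single variable column per output file.
    df[id_cols + [col]].to_csv(f"out/{col}.csv", index=False)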

cuDF - Not leveraging GPU cores

六眼飞鱼酱① submitted on 2021-02-11 15:16:55

Question: I am running the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

def arima(train):
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
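A likely explanation: cuDF accelerates dataframe operations themselves, but pm.auto_arima is CPU-bound pmdarima code, so moving the frame to the GPU does not change where the model fitting happens. With four local cores, one alternative is to parallelise the per-row fits on the CPU instead. A sketch using joblib (replacing the cuDF conversion is a suggested change, not the question's method; the suppress_warnings flag is an assumption, while the CSV layout comes from the question):

import ast

import numpy as np
import pandas as pd
import pmdarima as pm
from joblib import Parallel, delayed

def fit_one(series_str):
    # Each cell is a stringified list of past sales, e.g. "[3, 5, 2]".
    y = np.array(ast.literal_eval(series_str))
    model = pm.auto_arima(y, suppress_warnings=True)
    return float(model.predict(1)[0])

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    # Fit the per-row ARIMA models on all CPU cores instead of one.
    t_df["predicted"] = Parallel(n_jobs=-1)(
        delayed(fit_one)(s) for s in t_df["prev_sales"]
    )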

Transposing Data in Pandas

只谈情不闲聊 submitted on 2021-02-11 15:15:55

Question: I have an Excel file that contains a count of how many times a part has been used during its lifespan. The data is currently stored in such a way that the serial numbers are in column A, and each "Lifespan" count is stored in the adjacent columns, with a "Date" value as its heading. Here is an example: Image1. I want to be able to pivot/transpose ALL of the date columns in Python so that the output is in the following format, with the lifespan count as a new column named "Count": Image2. I've tried
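The attempt is cut off above, but wide-to-long reshaping of many date-headed columns is exactly what pd.melt does. A minimal sketch, assuming a serial-number column named "Serial" (the column and sample values are illustrative, since the example images are not reproduced here):

import pandas as pd

# Toy stand-in for the spreadsheet: serials in the first column,
# one count column per date heading.
df = pd.DataFrame({
    "Serial": ["SN001", "SN002"],
    "2021-01-01": [3, 5],
    "2021-02-01": [4, 6],
})

# Melt every non-id column: headings become a "Date" column and the
# cell values become the "Count" column.
long = df.melt(id_vars=["Serial"], var_name="Date", value_name="Count")
long["Date"] = pd.to_datetime(long["Date"])
print(long)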

How can I remove a substring from a given String using Pandas

假装没事ソ submitted on 2021-02-11 15:10:31

Question: Recently I started to analyse a data frame, and I want to remove all the rows that don't contain one of the substrings ('Aparelho Celular', 'Internet (Serviços e Produtos)', 'Serviços Telefônicos Diversos', 'Telefonia Celular', 'Telefonia Comunitária ( PABX, DDR, Etc. )', 'Telefonia Fixa', 'TV por Assinatura', 'Televisão / Aparelho DVD / Filmadora', 'Telemarketing'). But when I use this syntax:

df = df[~df["GrupoAssunto"].str.contains('Aparelho Celular','Internet (Serviços e Produtos)','Serviços Telefônicos Diversos',
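The snippet is cut off, but the call itself is the likely problem: str.contains takes a single pattern (its second positional argument is case, not another string), so listing the categories as separate arguments cannot work. A sketch of one common fix, joining the categories into a single regex alternation with re.escape so parentheses match literally (assuming the intent is to keep only rows whose GrupoAssunto matches one of the categories; the toy frame is invented):

import re

import pandas as pd

df = pd.DataFrame({"GrupoAssunto": ["Telefonia Fixa", "Outro Assunto"]})

keep = [
    "Aparelho Celular", "Internet (Serviços e Produtos)",
    "Serviços Telefônicos Diversos", "Telefonia Celular",
    "Telefonia Comunitária ( PABX, DDR, Etc. )", "Telefonia Fixa",
    "TV por Assinatura", "Televisão / Aparelho DVD / Filmadora",
    "Telemarketing",
]

# Escape each category so regex metacharacters like '(' are matched
# literally, then join them into one 'a|b|c' alternation pattern.
pattern = "|".join(re.escape(s) for s in keep)

df = df[df["GrupoAssunto"].str.contains(pattern, na=False)]
print(df)  # only the "Telefonia Fixa" row survives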