pandas

Extracting year from datetime datatypes is giving output as float

江枫思渺然 submitted on 2021-02-11 15:30:20

Question: I'm a newbie in Pandas. I need your support with a problem I'm facing. I have a datetime column in a dataframe and I'm trying to extract the year from it, but the output it gives is a float:

fifa['Joined_date'].head()
0   1970-01-01
1   1970-01-01
2   1970-01-01
3   1970-01-01
4   1970-01-01
Name: Joined_date, dtype: datetime64[ns]

fifa['Joined_date'].dt.year
0    1970.0
1    1970.0
2    1970.0
3    1970.0
4    1970.0
Name: Joined, Length: 18207, dtype: float64

The expected output is 1970. Can you please help?

Answer 1:
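The answer itself is cut off in the source, but the usual cause of this symptom is that the column contains NaT values further down (note the 18207-row length), which forces the integer years into float64, since a plain integer column cannot hold missing values. A minimal sketch of one common fix, using pandas' nullable integer dtype (the NaT row here is invented to reproduce the symptom):

import pandas as pd

fifa = pd.DataFrame(
    {"Joined_date": pd.to_datetime(["1970-01-01", None, "1970-01-01"])}
)

# With a NaT present, .dt.year comes back as float64 (1970.0, NaN, ...).
# Casting to the nullable 'Int64' dtype keeps whole-number years and
# shows the missing entries as <NA> instead of NaN.
years = fifa["Joined_date"].dt.year.astype("Int64")
print(years)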

Define recursive function in Pandas dataframe

老子叫甜甜 submitted on 2021-02-11 15:27:58

Question: I can't seem to find the answer to my question, so I'm trying my luck on here. I would very much appreciate your help. I've got a Pandas dataframe with values in Col1 and Col2. In place of the np.nan values in Col2, I'd like to calculate the following: today's Col2 value = previous day's Col2 value multiplied by today's Col1 value. This should be some form of recursive function. I've tried several answers, including the for loop below, but none seem to work:

df = pd.read_excel('/Users/fhggshgf
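The excerpt is cut off above, but the recurrence itself (today's Col2 = yesterday's Col2 × today's Col1) is inherently sequential, so a plain row loop is a reasonable fit. A minimal sketch under that assumption (the sample values are invented; only the first Col2 value is taken as known):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [1.0, 1.1, 0.9, 1.2],
    "Col2": [100.0, np.nan, np.nan, np.nan],
})

# Walk the rows in order, filling each missing Col2 from the row above,
# which has already been filled by the time we reach it.
for i in range(1, len(df)):
    if pd.isna(df.loc[i, "Col2"]):
        df.loc[i, "Col2"] = df.loc[i - 1, "Col2"] * df.loc[i, "Col1"]

print(df)

Because the relation is purely multiplicative, the loop could also be replaced by a cumulative product of Col1 scaled by the last known Col2 value.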

Group by and fill missing datetime values with duplicates

筅森魡賤 submitted on 2021-02-11 15:24:49

Question: This question follows on from this one: Group by and fill missing datetime values. What I'm trying to do is group a Pandas dataframe by contract, check whether there are duplicated datetime values, and fill in the missing ones. If there are duplicates there will be a total of 25 hours, and if not, 24. My input is this:

contract  datetime             value1  value2
x         2019-01-01 00:00:00  50      60
x         2019-01-01 02:00:00  30      60
x         2019-01-01 02:00:00  70      80
x         2019-01-01 03:00:00  70      80
y         2019-01-01 00:00:00  30      100

With this Dataframe my
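The question is truncated, but one plausible reading is: within each contract, every hour of the day should appear at least once, while duplicated hours (the 25-hour case) are kept. A sketch under that assumption, using an outer merge against a full hourly range so duplicates survive and missing hours appear as NaN rows (the 24-hour span and the fill strategy are assumptions):

import pandas as pd

df = pd.DataFrame({
    "contract": ["x", "x", "x", "x", "y"],
    "datetime": pd.to_datetime([
        "2019-01-01 00:00:00", "2019-01-01 02:00:00",
        "2019-01-01 02:00:00", "2019-01-01 03:00:00",
        "2019-01-01 00:00:00",
    ]),
    "value1": [50, 30, 70, 70, 30],
    "value2": [60, 60, 80, 80, 100],
})

def fill_hours(group):
    start = group["datetime"].min().normalize()
    full = pd.DataFrame(
        {"datetime": pd.date_range(start, periods=24, freq="H")}
    )
    # The outer merge keeps the duplicated hours from the data and adds
    # NaN rows for hours that are missing entirely.
    merged = full.merge(group, on="datetime", how="outer").sort_values("datetime")
    merged["contract"] = group["contract"].iloc[0]
    return merged

out = (
    df.groupby("contract", group_keys=False)
      .apply(fill_hours)
      .reset_index(drop=True)
)

With the sample input, contract x yields 25 rows (one duplicated hour) and y yields 24; the NaN value1/value2 rows can then be filled however the original question intended, e.g. with ffill().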

Pandas Dataframe: Find the column with the closest coordinate point to another column's coordinate point

纵饮孤独 submitted on 2021-02-11 15:23:46

Question: I am working with soccer ball and soccer player tracking data. I am trying to find the player that is closest to the ball for each row of coordinate points, and to make a new column attributing the closest player to the ball.

Example data:

| ball_point  | home_player1_point | home_player2_point | away_player1_point |
| ----------- | ------------------ | ------------------ | ------------------ |
| (7.00,3.00) | (-15.37,8.22)      | (25.3,-.2)         | (12.0,12.9)        |

Desired output:

| ball_point | home_player1_point | home_player2
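The desired output is cut off, but the row-wise computation is straightforward once the coordinates are parsed. A minimal sketch, assuming each cell holds an (x, y) tuple and using the column names from the example (the DataFrame literal is invented to match the sample row):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ball_point": [(7.00, 3.00)],
    "home_player1_point": [(-15.37, 8.22)],
    "home_player2_point": [(25.3, -0.2)],
    "away_player1_point": [(12.0, 12.9)],
})

player_cols = [c for c in df.columns if c != "ball_point"]

def closest_player(row):
    bx, by = row["ball_point"]
    # Euclidean distance from the ball to each player column.
    dists = {c: np.hypot(row[c][0] - bx, row[c][1] - by) for c in player_cols}
    return min(dists, key=dists.get)

df["closest_player"] = df.apply(closest_player, axis=1)
print(df["closest_player"])  # away_player1_point is nearest in the sample row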

pandas: melt 100+ variables into 100+ new dataframes

久未见 submitted on 2021-02-11 15:19:09

Question: Pretty new to Stack Overflow, please bear with me if the format looks odd. I have a big set of data with 100+ columns, structured like:

countrya  countryb  year  variable1  variable2  ...  variable100

I want to have the 100 variables separated into 100 new dataframes and saved into CSVs. Below is the code I have for creating one new CSV:

dfm1 = pd.melt(df, id_vars=['countrya', 'countryb', 'year'],
               value_vars=['variable1'], value_name='variable1')
dfm1.drop('variable', axis=1)
dfm1.to_csv(
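The code is cut off at the to_csv call, but the stated goal only needs a loop over the variable columns. A minimal sketch (the out/ directory is an assumption, and a toy two-variable frame stands in for the real 100+); note that melting a single value_vars column is equivalent to simply selecting it alongside the id columns:

import os

import pandas as pd

df = pd.DataFrame({
    "countrya": ["AA", "AA"],
    "countryb": ["BB", "CC"],
    "year": [2019, 2020],
    "variable1": [1.5, 2.5],
    "variable2": [3.5, 4.5],
})

id_cols = ["countrya", "countryb", "year"]
var_cols = [c for c in df.columns if c not in id_cols]

os.makedirs("out", exist_ok=True)  # hypothetical output directory
for col in var_cols:
    # One id block plus a single variable column per output file.
    df[id_cols + [col]].to_csv(f"out/{col}.csv", index=False)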

cuDF - Not leveraging GPU cores

六眼飞鱼酱① submitted on 2021-02-11 15:16:55

Question: I am running the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

def arima(train):
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
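A likely explanation: cuDF accelerates dataframe operations themselves, but pm.auto_arima is CPU-bound pmdarima code, so moving the frame to the GPU does not change where the model fitting happens. With four local cores, one alternative is to parallelise the per-row fits on the CPU instead. A sketch using joblib (replacing the cuDF conversion is a suggested change, not the question's method; the suppress_warnings flag is an assumption, while the CSV layout comes from the question):

import ast

import numpy as np
import pandas as pd
import pmdarima as pm
from joblib import Parallel, delayed

def fit_one(series_str):
    # Each cell is a stringified list of past sales, e.g. "[3, 5, 2]".
    y = np.array(ast.literal_eval(series_str))
    model = pm.auto_arima(y, suppress_warnings=True)
    return float(model.predict(1)[0])

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    # Fit the per-row ARIMA models on all CPU cores instead of one.
    t_df["predicted"] = Parallel(n_jobs=-1)(
        delayed(fit_one)(s) for s in t_df["prev_sales"]
    )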

Transposing Data in Pandas

只谈情不闲聊 submitted on 2021-02-11 15:15:55

Question: I have an Excel file that contains a count of how many times a part has been used during its lifespan. The data is currently stored in such a way that the serial numbers are in column A, and each "Lifespan" count is stored in the adjacent columns, with a "Date" value as its heading. Here is an example: Image1. I want to be able to pivot/transpose ALL of the date columns in Python so that the output is in the following format, with the lifespan count as a new column named "Count": Image2. I've tried
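The attempt is cut off above, but wide-to-long reshaping of many date-headed columns is exactly what pd.melt does. A minimal sketch, assuming a serial-number column named "Serial" (the column and sample values are illustrative, since the example images are not reproduced here):

import pandas as pd

# Toy stand-in for the spreadsheet: serials in the first column,
# one count column per date heading.
df = pd.DataFrame({
    "Serial": ["SN001", "SN002"],
    "2021-01-01": [3, 5],
    "2021-02-01": [4, 6],
})

# Melt every non-id column: headings become a "Date" column and the
# cell values become the "Count" column.
long = df.melt(id_vars=["Serial"], var_name="Date", value_name="Count")
long["Date"] = pd.to_datetime(long["Date"])
print(long)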

How can I remove a substring from a given String using Pandas

假装没事ソ submitted on 2021-02-11 15:10:31

Question: Recently I started to analyse a data frame, and I want to remove all the rows that don't contain one of the substrings ('Aparelho Celular', 'Internet (Serviços e Produtos)', 'Serviços Telefônicos Diversos', 'Telefonia Celular', 'Telefonia Comunitária ( PABX, DDR, Etc. )', 'Telefonia Fixa', 'TV por Assinatura', 'Televisão / Aparelho DVD / Filmadora', 'Telemarketing'). But when I use this syntax:

df = df[~df["GrupoAssunto"].str.contains('Aparelho Celular','Internet (Serviços e Produtos)','Serviços Telefônicos Diversos',
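The snippet is cut off, but the call itself is the likely problem: str.contains takes a single pattern (its second positional argument is case, not another string), so listing the categories as separate arguments cannot work. A sketch of one common fix, joining the categories into a single regex alternation with re.escape so parentheses match literally (assuming the intent is to keep only rows whose GrupoAssunto matches one of the categories; the toy frame is invented):

import re

import pandas as pd

df = pd.DataFrame({"GrupoAssunto": ["Telefonia Fixa", "Outro Assunto"]})

keep = [
    "Aparelho Celular", "Internet (Serviços e Produtos)",
    "Serviços Telefônicos Diversos", "Telefonia Celular",
    "Telefonia Comunitária ( PABX, DDR, Etc. )", "Telefonia Fixa",
    "TV por Assinatura", "Televisão / Aparelho DVD / Filmadora",
    "Telemarketing",
]

# Escape each category so regex metacharacters like '(' are matched
# literally, then join them into one 'a|b|c' alternation pattern.
pattern = "|".join(re.escape(s) for s in keep)

df = df[df["GrupoAssunto"].str.contains(pattern, na=False)]
print(df)  # only the "Telefonia Fixa" row survives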