dataframe

Is there an easy way to simplify this code using a loop?

最后都变了- submitted on 2021-01-29 09:50:44
Question: Is there a way to simplify this code using a loop?
    set.seed(100)
    AL_INDEX <- sample(1:nrow(AL_DF), 0.7*nrow(AL_DF))
    AL_TRAIN <- AL_DF[AL_INDEX,]
    AL_TEST <- AL_DF[-AL_INDEX,]
    AR_INDEX <- sample(1:nrow(AR_DF), 0.7*nrow(AR_DF))
    AR_TRAIN <- AR_DF[AR_INDEX,]
    AR_TEST <- AR_DF[-AR_INDEX,]
    AZ_INDEX <- sample(1:nrow(AZ_DF), 0.7*nrow(AZ_DF))
    AZ_TRAIN <- AZ_DF[AZ_INDEX,]
    AZ_TEST <- AZ_DF[-AZ_INDEX,]
AL_DF, AR_DF and AZ_DF are data frames that have the same field structure but different numbers of records.

time difference between timestamps in a pandas dataframe as histogram

别等时光非礼了梦想. submitted on 2021-01-29 09:46:31
Question: I have a data frame with one column that is all timestamps, like below. I need to calculate the difference between each pair of consecutive timestamps and then plot those differences as a histogram. I cannot work out how to calculate the differences. Any help would be appreciated.
    0 2020-09-16 00:00:02.713264
    1 2020-09-16 00:00:02.827854
    2 2020-09-16 00:00:05.919288
    3 2020-09-16 00:00:05.940775
    4 2020-09-16 00:00:06.682184
Answer 1: Given a dummy df
    # df
    # timestamp
    # 0
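The answer above is cut off, so here is a minimal sketch of one way to get the differences in pandas. It assumes the column is named `timestamp` (a name not confirmed by the question) and reuses the five sample values from the question. The bin count of 4 is arbitrary.

```python
import numpy as np
import pandas as pd

# The five timestamps quoted in the question.
# The column name "timestamp" is an assumption.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-09-16 00:00:02.713264",
        "2020-09-16 00:00:02.827854",
        "2020-09-16 00:00:05.919288",
        "2020-09-16 00:00:05.940775",
        "2020-09-16 00:00:06.682184",
    ])
})

# Gap between each timestamp and the previous one, in seconds.
# The first row has no predecessor, so its NaT is dropped.
diffs = df["timestamp"].diff().dropna().dt.total_seconds()

# Bin the gaps numerically; bins=4 is an arbitrary choice.
counts, bin_edges = np.histogram(diffs, bins=4)
```

With matplotlib installed, `diffs.plot.hist(bins=4)` should draw the same histogram directly from the Series.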

R: M3C library - Duplicate row.names error message

南楼画角 submitted on 2021-01-29 09:39:14
Question: I am trying to run consensus clustering using the M3C library in R. My dataset contains 451 samples and ~2500 genes. The row names are the ENTREZ IDs (numeric values) of the genes. I have cross-checked the dataset using the "any(duplicated(colnames(MyData)))" command to make sure that there are no duplicate entries in the row names. I ran the following command to perform the consensus clustering using the M3C library:
    res <- M3C(MyData, cores=8, seed = 123, des = annotation, removeplots = TRUE,

Adding multiple markers to a folium map using city names from pandas dataframe

可紊 submitted on 2021-01-29 09:21:50
Question: I'm trying to visualize data using folium maps, and I have to plot all of Finland's city names on the map. I've tried to use a pandas dataframe since all my data is in CSV format. Here's the code I've tried so far.
    import folium
    from folium import plugins
    import ipywidgets
    import geocoder
    import geopy
    import numpy as np
    import pandas as pd
    from vega_datasets import data as vds
    m = folium.Map(location=[65,26], zoom_start=5)
    # map
    map_layer_control = folium.Map(location=[65, 26], zoom_start=5)
    # add

Pandas and xlsxwriter: how to create a new sheet without exporting a dataframe?

僤鯓⒐⒋嵵緔 submitted on 2021-01-29 09:08:57
Question: If I call the xlsxwriter module directly, it is very easy to create a new sheet in a new file and write to its cells, e.g.:
    import xlsxwriter
    workbook = xlsxwriter.Workbook('test 1.xlsx')
    wks1=workbook.add_worksheet('Test sheet')
    wks1.write(0,0,'some random text')
    workbook.close()
However, my question is: how can I create a new sheet using a Pandas.ExcelWriter object? The object can create a new sheet when exporting a dataframe, but what if I don't have any dataframes to export? E.g. say I
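One possible approach, sketched here under the assumption that the `xlsxwriter` engine is installed: with that engine, `ExcelWriter.book` exposes the underlying `xlsxwriter.Workbook`, so a sheet can be added through it without exporting any dataframe. The temporary output path is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test 1.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # With the xlsxwriter engine, writer.book is the xlsxwriter Workbook.
    workbook = writer.book
    wks1 = workbook.add_worksheet("Test sheet")
    wks1.write(0, 0, "some random text")
# Leaving the with-block saves and closes the file.
```

Dataframes can still be exported through the same `writer` with `DataFrame.to_excel(writer, sheet_name=...)` before the block ends.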

Changing dataframe columns names by columns from another dataframe python

五迷三道 submitted on 2021-01-29 09:03:31
Question: I have a dataframe valence_data with columns word1, word2, word3, word4, ... and a second dataframe word_data with columns 1, 2, 3, 4, ... How can I replace the column names in word_data with the names from valence_data, i.e. get word_data with columns word1, word2, word3, word4, ...? I am using pandas to process my data. Thanks
Answer 1: You need to use DataFrame.rename
    original_names = ["1", "2", ...]
    new_names = ["word1", "word2", ...]
    new_columns = dict(zip(original_names, new_names))
    df.rename
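The `df.rename` call in the answer is cut off. A minimal sketch of how the mapping could be applied follows, assuming both frames have the same number of columns in matching order. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames described in the question.
valence_data = pd.DataFrame(columns=["word1", "word2", "word3", "word4"])
word_data = pd.DataFrame([[0.1, 0.2, 0.3, 0.4]], columns=["1", "2", "3", "4"])

# Map each old column name to the new name at the same position.
new_columns = dict(zip(word_data.columns, valence_data.columns))

# rename returns a new frame; the original is left unchanged.
word_data = word_data.rename(columns=new_columns)
```

When every column is replaced positionally like this, assigning `word_data.columns = valence_data.columns` is a shorter alternative.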

How to get count of words from DataFrame based on conditions

核能气质少年 submitted on 2021-01-29 08:46:39
Question: I have the following two dataframes, badges and comments. I have created a list of 'gold users' from the badges dataframe, i.e. those with Class=1. Here Name means the name of the badge and Class means the level of the badge (1=Gold, 2=Silver, 3=Bronze). I have already done the text preprocessing on comments['Text'] and now want to find the counts of the top 10 words for gold users from comments['Text']. I tried the given code but am getting the error "KeyError: "None of [Index(['1532', '290', '1946', '1459', '6094',
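The error message is truncated, but a KeyError of this shape usually means a list of values was used as labels and none matched. A sketch of filtering by value instead follows, assuming both frames share a user column called `UserId` (a hypothetical name, since the question does not show the schema). The sample rows are invented.

```python
from collections import Counter

import pandas as pd

# Hypothetical data; "UserId" is an assumed column name.
badges = pd.DataFrame({
    "UserId": ["1532", "290", "77"],
    "Name": ["Famous Question", "Great Answer", "Teacher"],
    "Class": [1, 1, 3],
})
comments = pd.DataFrame({
    "UserId": ["1532", "290", "77", "1532"],
    "Text": ["pandas merge index", "merge pandas", "hello world", "pandas index"],
})

# Users holding at least one gold badge (Class == 1).
gold_users = badges.loc[badges["Class"] == 1, "UserId"].unique()

# Select rows by value with a boolean mask rather than by label.
gold_comments = comments[comments["UserId"].isin(gold_users)]

# Count whitespace-separated words across the gold users' comments.
word_counts = Counter()
for text in gold_comments["Text"]:
    word_counts.update(text.split())

top_10 = word_counts.most_common(10)
```

The `isin` mask compares values, so the `UserId` dtype has to match in both frames (strings against strings, or integers against integers).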

Scala — Conditional replace column value of a data frame

蹲街弑〆低调 submitted on 2021-01-29 08:43:08
Question: DataFrame 1 is what I have now, and I want to write a Scala function to make DataFrame 1 look like DataFrame 2. Transfer is the big category; e-Transfer and IMT are subcategories. The logic is: for the same ID (31898), if both Transfer and e-Transfer are tagged to it, it should be only e-Transfer; if Transfer, IMT and e-Transfer are all tagged to the same ID (32614), it should be e-Transfer + IMT; if only Transfer is tagged to an ID (33987), it should be Other; if only e-Transfer or IMT tagged to a

How do I make my plot show simple input as loaded from a(nother) class file?

点点圈 submitted on 2021-01-29 08:27:50
Question: Got this:
    import pandas as pd
    from df import df
    import matplotlib as plt
    import seaborn as sns
    class Data_load:
        def __init__(self, df):
            self.df = pd.read_csv(df, delimiter=';')
        # Data information section
        def get_EDA_columns(self):
            return self.df.columns
        def get_EDA_info(self):
            return self.df.info()
        def get_EDA_describe(self):
            return self.df.describe()
        def get_EDA_shape(self):
            return self.df.shape
        def get_EDA_value_counts(self):
            return self.df.value_counts()
        def get_EDA_isnull(self):
            return

Combine series by date

a 夏天 submitted on 2021-01-29 08:22:44
Question: The following 2 series of stocks are in a single Excel file: Can they be combined using the date as index? The result should be like this:
Answer 1: You need a simple df.merge() here:
    df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
OR
    df = df1.join(df2, how='outer')
Answer 2: I am trying this:
    df3 = pd.concat([df1, df2]).sort_values('Date').reset_index(drop=True)
or
    df3 = df1.append(df2).sort_values('Date').reset_index(drop=True)
Source: https://stackoverflow.com/questions/64212463/combine
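A small sketch of the index-based merge from answer 1 follows, using made-up tickers and prices because the original spreadsheet is not shown. It assumes each series has already been loaded with its dates as the index. Note that `DataFrame.append` was removed in pandas 2.0, so of the two variants in answer 2 only `pd.concat` runs on current versions, and it stacks rows rather than aligning the two series side by side.

```python
import pandas as pd

# Hypothetical prices; the tickers "AAA" and "BBB" and all values are made up.
df1 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-10-01", "2020-10-02"]), "AAA": [10.0, 10.5]}
).set_index("Date")
df2 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-10-02", "2020-10-05"]), "BBB": [20.0, 21.0]}
).set_index("Date")

# Outer merge on the index keeps every date from either series;
# dates missing from one series get NaN in that column.
combined = pd.merge(df1, df2, left_index=True, right_index=True, how="outer")

# Equivalent result using join.
joined = df1.join(df2, how="outer")
```

Using `how="inner"` instead would keep only the dates present in both series.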