dataframe

Use keywords from dataframe to detect if any present in another dataframe or string

末鹿安然 submitted on 2021-02-10 18:19:19
Question: I have two problems. First, I have one dataframe with categories and keywords like this:

      Category  Keywords
    0 Fruit     ['apple', 'pear', 'plum', 'grape']
    1 Color     ['red', 'purple', 'green']

Another dataframe looks like this:

      Summary
    0 This is a basket of red apples. They are sour.
    1 We found a bushel of fruit. They are red.
    2 There is a peck of pears that taste sweet.
    3 We have a box of plums.

I want the end result like this:

      Category      Summary
    0 Fruit, Color  This is a basket of red apples. They are
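One way to approach the matching described in this question is sketched below. It rebuilds the two sample frames from the excerpt and uses case-insensitive substring matching, which is an assumption rather than something the question specifies: it catches plurals such as "apples", but it would also match a keyword inside a longer word (for example "red" inside "bored"). The helper name `matching_categories` is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question excerpt.
keywords = pd.DataFrame({
    "Category": ["Fruit", "Color"],
    "Keywords": [["apple", "pear", "plum", "grape"], ["red", "purple", "green"]],
})
summaries = pd.DataFrame({
    "Summary": [
        "This is a basket of red apples. They are sour.",
        "We found a bushel of fruit. They are red.",
        "There is a peck of pears that taste sweet.",
        "We have a box of plums.",
    ]
})


def matching_categories(text, keyword_frame):
    """Return a comma-separated list of categories whose keywords occur in text."""
    lowered = text.lower()
    hits = [
        category
        for category, words in zip(keyword_frame["Category"], keyword_frame["Keywords"])
        # Assumption: plain substring matching, so "apple" also matches "apples".
        if any(word in lowered for word in words)
    ]
    return ", ".join(hits)


summaries["Category"] = summaries["Summary"].apply(
    lambda text: matching_categories(text, keywords)
)
print(summaries[["Category", "Summary"]])
```

The same helper works on a single string, for example `matching_categories("a green grape", keywords)`, which covers the "dataframe or string" case in the title.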

Sorting rows in python pandas

房东的猫 submitted on 2021-02-10 18:17:30
Question: I have a dataframe (the sample looks like this):

    Type SKU Description FullDescription Size Price
    Variable 2 Boots Shoes on sale XL,S,M
    Variation 2.5 Boots XL XL 330
    Variation 2.6 Boots S S 330
    Variation 2.7 Boots M M 330
    Variable 3 Helmet Helmet Sizes E42,E41
    Variation 3.8 Helmet E42 E42 89
    Variation 3.2 Helmet E41 E41 89

What I want to do is sort the values based on Size, so the final dataframe should look like this:

    Type SKU Description FullDescription Size Price
    Variable 2 Boots Shoes on

How to extract information from a dataframe name and create a column based on it

99封情书 submitted on 2021-02-10 18:07:45
Question: Here's some mock data that represents the data I have:

    pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))
    pend5P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))

I need to add a column to each data frame that represents the first letter/number code within the dataframe name, so for each dataframe I've been doing the following:

    pend4P_17k$Pendant_ID <- "4P"
    pend5P_17k$Pendant_ID <- "5P"

How to extract information from a dataframe name and create a column based on it

随声附和 submitted on 2021-02-10 18:07:12
Question: Here's some mock data that represents the data I have:

    pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))
    pend5P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))

I need to add a column to each data frame that represents the first letter/number code within the dataframe name, so for each dataframe I've been doing the following:

    pend4P_17k$Pendant_ID <- "4P"
    pend5P_17k$Pendant_ID <- "5P"

Add a column value depending on a date range (if-else)

送分小仙女□ submitted on 2021-02-10 17:54:10
Question: I have a date column in my dataframe and want to add a column called location. The value of location in each row should depend on which date range the date falls in. For example, 13th November falls between 12th November and 16th November, so the location should be Seattle. 17th November falls between 17th November and 18th November, so the location must be New York. Below is an example of the dataframe I want to achieve:

    Dates | Location (column I want to add) .....................
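A minimal sketch of the date-range lookup this question describes, using a boolean mask per range. The two ranges and city names come from the excerpt; the year 2020, the sample dates, and the column names `Dates` and `Location` as actual identifiers are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample dates; the year is an assumption, the excerpt gives none.
df = pd.DataFrame(
    {"Dates": pd.to_datetime(["2020-11-13", "2020-11-17", "2020-11-20"])}
)

# (start, end, location) triples taken from the ranges named in the question.
ranges = [
    ("2020-11-12", "2020-11-16", "Seattle"),
    ("2020-11-17", "2020-11-18", "New York"),
]

df["Location"] = None  # rows outside every range stay empty
for start, end, place in ranges:
    # Series.between is inclusive on both ends by default.
    mask = df["Dates"].between(pd.Timestamp(start), pd.Timestamp(end))
    df.loc[mask, "Location"] = place

print(df)
```

With many ranges, a loop like this stays readable; an interval-based lookup may be a better fit when the ranges are numerous and non-overlapping.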

Add a column value depending on a date range (if-else)

我只是一个虾纸丫 submitted on 2021-02-10 17:50:26
Question: I have a date column in my dataframe and want to add a column called location. The value of location in each row should depend on which date range the date falls in. For example, 13th November falls between 12th November and 16th November, so the location should be Seattle. 17th November falls between 17th November and 18th November, so the location must be New York. Below is an example of the dataframe I want to achieve:

    Dates | Location (column I want to add) .....................

ffill not filling data in pandas dataframe

萝らか妹 submitted on 2021-02-10 16:55:32
Question: I have a dataframe like this:

    A B C E D
    ---------------
    0 a r g g
    1 x
    2 x f f r
    3 t 3 y

I am trying to forward fill using ffill, but it is not working:

    cols = df.columns[:4].tolist()
    df[cols] = df[cols].ffill()

I also tried:

    df[cols] = df[cols].fillna(method='ffill')

But it is not getting filled. Are the empty columns in the data causing this issue? The data is mocked; the exact data is different (it contains strings, numbers and empty columns). Desired output:

    A B C E D
    ---------------
    0 a r g g
    1 a r g x
    2
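The excerpt does not show the cause, so the following is only a sketch of one common explanation: `ffill` fills missing values (`NaN`), and cells that hold empty or whitespace-only strings are not missing as far as pandas is concerned. Assuming that is what happens here, converting blanks to `NaN` first lets `ffill` work. The sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the "empty" cells are really blank strings.
df = pd.DataFrame({
    "A": ["a", "", "x", "t"],
    "B": ["r", " ", "f", ""],
})
cols = ["A", "B"]

# Blank strings are not NaN, so ffill alone leaves them untouched.
unchanged = df[cols].ffill()

# Turn empty or whitespace-only strings into NaN, then forward fill.
cleaned = df[cols].replace(r"^\s*$", np.nan, regex=True)
df[cols] = cleaned.ffill()

print(df)
```

Checking `df.isna().sum()` before filling is a quick way to see whether the blanks are being recognised as missing at all.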

Importing Multiple Data-frames with Pandas

流过昼夜 submitted on 2021-02-10 16:48:22
Question: I'm trying to import multiple datasets into a single dataframe through a function.

    # function to import each of the new datasets
    def csvImport(yearOfDataset):
        import glob, os
        for items in yearOfDataset:
            # dataset name
            ds = pd.concat(map(pd.read_csv, glob.glob(os.path.join("PSNI_StreetCrime_"+str(yearOfDataset)),"*.csv")))

I want to pass the argument to the function as follows, as it means I can call it more quickly for the multiple folders I have. The folder names follow the pattern ChildFolder
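A possible reworking of the function in this question, offered as a sketch rather than the accepted answer. It addresses three things visible in the posted code: `"*.csv"` is passed to `glob.glob` instead of `os.path.join`, the loop variable is never used, and nothing is returned. The `base_dir` parameter and the assumption that each year has its own folder named `PSNI_StreetCrime_<year>` are hypothetical, since the excerpt cuts off before the real folder layout is described.

```python
import glob
import os

import pandas as pd


def csv_import(years, base_dir="."):
    """Read every CSV in each per-year folder and return one combined frame.

    Assumes folders named PSNI_StreetCrime_<year> under base_dir (hypothetical).
    """
    frames = []
    for year in years:
        # The "*.csv" pattern belongs inside os.path.join, not in glob.glob.
        pattern = os.path.join(base_dir, f"PSNI_StreetCrime_{year}", "*.csv")
        for path in sorted(glob.glob(pattern)):
            frames.append(pd.read_csv(path))
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

A call such as `csv_import([2016, 2017])` would then return a single dataframe covering both folders.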

Remove “days 00:00:00” from dataframe [duplicate]

杀马特。学长 韩版系。学妹 submitted on 2021-02-10 16:42:30
Question: This question already has answers here: Pandas Timedelta in Days (5 answers). Closed 1 year ago. I have a pandas dataframe with a lot of variables, including the start and end dates of loans. I subtract the two to get their difference in days. The result I get looks like 349 days 00:00:00. How can I keep only the number, for example 349, from this column?

Answer 1: Check this format: df['date'] = pd.to_timedelta(df['date'], errors='coerce').dt.days. Also, check the .normalize() function in
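As a small illustration of the approach in the answer, the sketch below subtracts two datetime columns and reads the whole-day count through the `.dt.days` accessor. The frame and the column names `start`, `end`, `duration` and `duration_days` are hypothetical.

```python
import pandas as pd

# Hypothetical loan dates; the column names are assumptions for illustration.
loans = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-01", "2019-03-15"]),
    "end": pd.to_datetime(["2019-12-16", "2019-03-20"]),
})

# Subtracting datetime columns gives a timedelta column ("349 days").
loans["duration"] = loans["end"] - loans["start"]

# .dt.days extracts the whole-day component as integers.
loans["duration_days"] = loans["duration"].dt.days

print(loans)
```

On a Series the accessor has to be `.dt.days`; a bare `.days` attribute only exists on a single `Timedelta` value.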

Converting a list of lists of strings to a data frame of numbers in R

蹲街弑〆低调 submitted on 2021-02-10 16:02:33
Question: I have a list of lists of strings as follows:

    > ll
    [[1]]
    [1] "2" "1"

    [[2]]
    character(0)

    [[3]]
    [1] "1"

    [[4]]
    [1] "1" "8"

The longest list is of length 2, and I want to build a data frame with 2 columns from this list. Bonus points for also converting each item in the list to a number, or NA for character(0). I have tried using mapply() and data.frame to convert to a data frame and fill with NAs as follows.

    # Find length of each list element
    len = sapply(awards2, length)
    # Number of NAs to fill