dataframe | 易学教程 (Easy Learning Tutorials)

Use keywords from dataframe to detect if any are present in another dataframe or string

Question: I have two problems. First: I have one dataframe with categories and keywords, like this:

   Category  Keywords
0  Fruit     ['apple', 'pear', 'plum', 'grape']
1  Color     ['red', 'purple', 'green']

And another dataframe like this:

   Summary
0  This is a basket of red apples. They are sour.
1  We found a bushel of fruit. They are red.
2  There is a peck of pears that taste sweet.
3  We have a box of plums.

I want the end result to look like this:

   Category      Summary
0  Fruit, Color  This is a basket of red apples. They are
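A minimal pandas sketch for the first problem, assuming the two frames hold exactly the sample data above and that a keyword should match at the start of a word, so "apple" also matches "apples" while "red" does not match inside "hundred". Matching is case-insensitive, and the helper name categories_for is hypothetical.

```python
import re

import pandas as pd

# Sample frames reconstructed from the question
keywords = pd.DataFrame({
    "Category": ["Fruit", "Color"],
    "Keywords": [["apple", "pear", "plum", "grape"], ["red", "purple", "green"]],
})
summaries = pd.DataFrame({"Summary": [
    "This is a basket of red apples. They are sour.",
    "We found a bushel of fruit. They are red.",
    "There is a peck of pears that taste sweet.",
    "We have a box of plums.",
]})

# One compiled pattern per category: any of its keywords, starting at a word boundary.
patterns = {
    cat: re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + r")", re.IGNORECASE)
    for cat, kws in zip(keywords["Category"], keywords["Keywords"])
}

def categories_for(text: str) -> str:
    """Hypothetical helper: comma-separated categories whose keywords occur in text."""
    return ", ".join(cat for cat, pat in patterns.items() if pat.search(text))

# Put the matched categories in front of the summary text
summaries.insert(0, "Category", summaries["Summary"].map(categories_for))
print(summaries)
```

Categories are listed in the order they appear in the keyword frame, which gives "Fruit, Color" for the first row as in the desired output.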

Sorting rows in python pandas

Question: I have a dataframe (the sample looks like this):

Type       SKU  Description  FullDescription  Size     Price
Variable   2    Boots        Shoes on sale    XL,S,M
Variation  2.5  Boots        XL               XL       330
Variation  2.6  Boots        S                S        330
Variation  2.7  Boots        M                M        330
Variable   3    Helmet       Helmet Sizes     E42,E41
Variation  3.8  Helmet       E42              E42      89
Variation  3.2  Helmet       E41              E41      89

What I want to do is sort the values based on Size, so the final data frame should look like this:

Type       SKU  Description  FullDescription  Size     Price
Variable   2    Boots        Shoes on
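The desired output is cut off, so the exact target order is unknown. Below is a sketch of one plausible interpretation, not a confirmed answer: each Variable row stays at the top of its product group, its Variation rows are ordered by size using a hypothetical ranking S < M < L < XL (labels outside that ranking, such as E41/E42, fall back to alphabetical order), and the comma-separated Size list on each parent row is reordered the same way. Group ids come from a cumulative sum over the Variable rows, which assumes every parent is immediately followed by its variations, as in the sample.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["Variable", "Variation", "Variation", "Variation",
             "Variable", "Variation", "Variation"],
    "SKU": [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    "Description": ["Boots"] * 4 + ["Helmet"] * 3,
    "FullDescription": ["Shoes on sale", "XL", "S", "M", "Helmet Sizes", "E42", "E41"],
    "Size": ["XL,S,M", "XL", "S", "M", "E42,E41", "E42", "E41"],
    "Price": [None, 330, 330, 330, None, 89, 89],
})

# Hypothetical ranking for letter sizes; unlisted labels sort after these.
SIZE_RANK = {"S": 0, "M": 1, "L": 2, "XL": 3}

def size_key(size: str) -> tuple:
    # (rank, label): known sizes by rank, everything else alphabetically after them
    return (SIZE_RANK.get(size, len(SIZE_RANK)), size)

group = (df["Type"] == "Variable").cumsum()             # new group at each parent row
is_variation = (df["Type"] == "Variation").astype(int)  # 0 keeps the parent first
rank = df["Size"].map(lambda s: size_key(s)[0])

result = (
    df.assign(_group=group, _child=is_variation, _rank=rank)
      .sort_values(["_group", "_child", "_rank", "Size"])
      .drop(columns=["_group", "_child", "_rank"])
      .reset_index(drop=True)
)

# Reorder the comma-separated size list on each parent row with the same key.
parents = result["Type"] == "Variable"
result.loc[parents, "Size"] = result.loc[parents, "Size"].map(
    lambda s: ",".join(sorted(s.split(","), key=size_key))
)
print(result)
```

If the intended order is instead the one listed on the parent row (XL,S,M), the same structure works with SIZE_RANK built from that list.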

How to extract information from a dataframe name and create a column based on it

Question: Here's some mock data that represents the data I have:

pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))
pend5P_17k <- data.frame(x = c(1, 2, 3, 4, 5), var1 = c('a', 'b', 'c', 'd', 'e'), var2 = c(1, 1, 0, 0, 1))

I need to add a column to each data frame that holds the first letter/number code within the data frame's name (e.g. "4P" for pend4P_17k), so for each data frame I've been doing the following:

pend4P_17k$Pendant_ID <- "4P"
pend5P_17k$Pendant_ID <- "5P"

Add a column value depending on a date range (if-else)

Question: I have a date column in my dataframe and want to add a column called location. The value of location in each row should depend on which date range the date falls in. For example, 13th November falls between 12th November and 16th November, so the location should be Seattle; 17th November falls between 17th November and 18th November, so it must be New York. Below is an example of the data frame I want to achieve:

Dates | Location (column I want to add)
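One way to express this if/elif/else logic in pandas is numpy.select over one Series.between condition per range; between includes both endpoints by default, so 17th November lands in the New York range. This is a sketch: the year 2020, the column names start and end, and the fallback label "Unknown" are assumptions, since the question gives only days and months.

```python
import numpy as np
import pandas as pd

# Hypothetical date ranges (the year is assumed; the question gives only day and month)
ranges = pd.DataFrame({
    "start": pd.to_datetime(["2020-11-12", "2020-11-17"]),
    "end": pd.to_datetime(["2020-11-16", "2020-11-18"]),
    "Location": ["Seattle", "New York"],
})

df = pd.DataFrame({"Dates": pd.to_datetime(["2020-11-13", "2020-11-17", "2020-11-20"])})

# One boolean condition per range; between() is inclusive on both ends by default.
conditions = [df["Dates"].between(s, e) for s, e in zip(ranges["start"], ranges["end"])]

# The first matching condition wins; dates outside every range get the default label.
df["Location"] = np.select(conditions, list(ranges["Location"]), default="Unknown")
print(df)
```

For many ranges, the same table-driven layout keeps the ranges as data rather than hard-coded if-else branches.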

ffill not filling data in pandas dataframe

Question: I have a dataframe like this:

A B C E D
---------------
0 a r g g
1 x
2 x f f r
3 t 3 y

I am trying to forward-fill using ffill, but it is not working:

cols = df.columns[:4].tolist()
df[cols] = df[cols].ffill()

I also tried:

df[cols] = df[cols].fillna(method='ffill')

But the values are still not filled. Could the empty columns in the data be causing this? The data is mocked; the real data is different (it contains strings, numbers and empty columns).

Desired output:

A B C E D
---------------
0 a r g g
1 a r g x
2
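A likely cause, though it is an assumption since the real data isn't shown: ffill() and fillna() only fill genuine missing values (NaN or None), so cells that merely look empty, such as empty strings produced by a CSV or Excel import, are left untouched. The sketch below uses a hypothetical mock frame with empty strings, converts empty or whitespace-only strings to NaN with replace(..., regex=True), and then forward-fills. Recent pandas versions also deprecate fillna(method='ffill') in favour of ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical mock data: the "blank" cells are empty strings, not NaN.
df = pd.DataFrame({
    "A": ["a", "", "x", ""],
    "B": ["r", "", "f", ""],
    "C": ["g", "", "f", ""],
    "E": ["g", "x", "r", ""],
    "D": ["", "", "", "y"],
})

cols = df.columns[:4].tolist()  # A, B, C, E as in the question

# ffill() sees no missing values here, so it changes nothing.
print(df[cols].ffill().equals(df[cols]))  # prints True

# Turn empty / whitespace-only strings into NaN, then forward-fill.
df[cols] = df[cols].replace(r"^\s*$", np.nan, regex=True).ffill()
print(df)
```

Row 1 then reads a r g x, matching the start of the desired output; column D is outside cols and keeps its empty strings.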

Importing Multiple Data-frames with Pandas

Question: I'm trying to import multiple datasets into a single data frame through a function:

# function to import each of the new datasets
def csvImport(yearOfDataset):
    import glob, os
    for items in yearOfDataset:
        # dataset name
        ds = pd.concat(map(pd.read_csv, glob.glob(os.path.join("PSNI_StreetCrime_"+str(yearOfDataset)),"*.csv")))

I want to pass the argument to the function as follows, because it means I can call it more quickly for the multiple folders I have; the folder names follow the pattern ChildFolder
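The function in the question has a few problems. The closing parenthesis of os.path.join comes before "*.csv", so "*.csv" is passed to glob.glob as a second positional argument, which raises a TypeError. The loop variable items is never used, because str(yearOfDataset) converts the whole argument. And ds is overwritten on every pass and never returned. Below is a hedged rewrite, assuming each year's CSV files sit in a folder named PSNI_StreetCrime_<year> under a base directory; the folder-name pattern in the question is cut off, and csv_import and base_dir are hypothetical names.

```python
import glob
import os

import pandas as pd

def csv_import(years, base_dir="."):
    """Read every CSV in each PSNI_StreetCrime_<year> folder into one DataFrame."""
    frames = []
    for year in years:
        folder = os.path.join(base_dir, f"PSNI_StreetCrime_{year}")
        # Build the full "<folder>/*.csv" pattern first; glob.glob takes one pattern string.
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            frames.append(pd.read_csv(path))
    if not frames:
        # pd.concat([]) would raise a less helpful ValueError
        raise FileNotFoundError("no CSV files matched the given years")
    return pd.concat(frames, ignore_index=True)
```

Usage would then be a single call such as csv_import([2015, 2016, 2017]). Sorting the glob results keeps the row order reproducible, since glob returns files in arbitrary order.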

Remove “days 00:00:00” from dataframe [duplicate]

Question: This question already has answers here: Pandas Timedelta in Days (5 answers). Closed 1 year ago.

So, I have a pandas dataframe with a lot of variables, including the start and end dates of loans. I subtract these two to get their difference in days. The result I get looks like 349 days 00:00:00. How can I keep only the number (for example, 349) in this column?

Answer 1: Check this format:

df['date'] = pd.to_timedelta(df['date'], errors='coerce').dt.days

Also, check the .normalize() function in
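For the loan case itself, subtracting two datetime columns already produces timedelta64 values, and a Series exposes their whole-day component through the .dt accessor. A small sketch; the column names start, end, duration and days and the dates are hypothetical.

```python
import pandas as pd

# Hypothetical loan dates; the real column names are not given in the question.
loans = pd.DataFrame({
    "start": pd.to_datetime(["2020-01-01", "2020-03-15"]),
    "end": pd.to_datetime(["2020-12-15", "2020-04-14"]),
})

# Datetime minus datetime gives timedelta64 values, displayed like "349 days 00:00:00".
loans["duration"] = loans["end"] - loans["start"]

# .dt.days keeps only the whole-day count as integers.
loans["days"] = loans["duration"].dt.days
print(loans)
```

If fractional days matter (e.g. partial-day timestamps), dividing the duration by pd.Timedelta(days=1) gives floats instead.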

Converting a list of lists of strings to a data frame of numbers in R

Question: I have a list of lists of strings as follows:

> ll
[[1]]
[1] "2" "1"

[[2]]
character(0)

[[3]]
[1] "1"

[[4]]
[1] "1" "8"

The longest list is of length 2, and I want to build a data frame with 2 columns from this list. Bonus points for also converting each item in the list to a number, or NA for character(0). I have tried using mapply() and data.frame to convert it to a data frame and fill with NAs as follows:

# Find length of each list element
len = sapply(awards2, length)
# Number of NAs to fill