dataframe

Creating datetime in pandas from year and julian day

随声附和 posted on 2021-01-29 04:19:57
Question:

       ad_name                   adl_name  year  JD
    0  united_states_of_america  colorado  2000  1
    1  united_states_of_america  colorado  2000  2
    2  united_states_of_america  colorado  2000  3
    3  united_states_of_america  colorado  2000  4
    4  united_states_of_america  colorado  2000  5

How do I add a datetime column using the year and JD (Julian day) columns? I was trying to use pd.to_datetime(df, format='%Y_%d'), but that does not work.

Answer 1: You need to add the JD column, converted with to_timedelta, to the year column: df['date'] = pd.to_datetime(df.year,
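The answer above is cut off, so here is a minimal sketch of the idea it describes rather than the answerer's exact code. It assumes JD is a 1-based day of year (JD = 1 means January 1), which the snippet does not confirm; the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns that matter here).
df = pd.DataFrame({"year": [2000] * 5, "JD": [1, 2, 3, 4, 5]})

# Parse the year as January 1 of that year, then add the day offset.
# Assumption: JD is 1-based, hence the "- 1".
year_start = pd.to_datetime(df["year"].astype(str), format="%Y")
df["date"] = year_start + pd.to_timedelta(df["JD"] - 1, unit="D")

print(df)
```

If JD turns out to be 0-based in the real data, dropping the `- 1` would be the only change needed.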

Proper way of handling JSON Parsing TypeError when element does not exist

爱⌒轻易说出口 posted on 2021-01-29 04:18:15
Question: The code gets me what I want in the end, which is a list of dictionaries holding the fields I want from a very large JSON dataset, so that I can create a dataframe for additional data processing. However, I have to construct a very large try/except block to get this done. I was wondering if there is a cleaner or cleverer way of doing this. The problem I'm having is that details['element'] sometimes doesn't exist or has no value, which throws a NoneType exception if it does not exist on
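One common way to avoid a large try/except block is a small lookup helper that returns a default as soon as a level is missing or None. The sketch below is only an illustration: `safe_get` and the sample `records` are hypothetical names and data, not taken from the question.

```python
from typing import Any


def safe_get(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key; return `default` if any level is missing or None."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Hypothetical records: one complete, one with a None element, one with no details.
records = [
    {"id": 1, "details": {"element": {"value": 7}}},
    {"id": 2, "details": {"element": None}},
    {"id": 3},
]

rows = [
    {"id": r.get("id"), "value": safe_get(r, "details", "element", "value")}
    for r in records
]
print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` for further processing.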

Pandas dataframe - move rows from one dataframe to another

南楼画角 posted on 2021-01-29 04:04:41
Question: I have a Python pandas dataframe that has 3 rows in it:

    Name  Time   count
    AAA   5:45   5
    BBB   13:01  8
    CCC   11:16  3

I am trying to loop through this dataframe and, if the count is greater than 5, I have to populate that row to a new dataframe. I know the count is 2 from a function, as only 2 rows are greater than 5. I tried the code below but it is not working. Any help would be appreciated.

    for i in range(2):
        if(row['Occurences'] >= 5 ):
            df6.loc[i] = [df4['MachineName'], df4['DateTime'], df4['Count']]
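A boolean mask usually replaces this kind of loop. The sketch below uses the column names from the table (Name, Time, count) rather than the ones in the asker's code, and assumes the intended condition is `>= 5`, since that is what the code tests and it matches the stated two rows.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df4 = pd.DataFrame(
    {
        "Name": ["AAA", "BBB", "CCC"],
        "Time": ["5:45", "13:01", "11:16"],
        "count": [5, 8, 3],
    }
)

# Assumption: rows with count >= 5 should move to the new dataframe.
mask = df4["count"] >= 5

df6 = df4[mask].reset_index(drop=True)             # rows that were "moved"
df4_remaining = df4[~mask].reset_index(drop=True)  # rows left behind

print(df6)
print(df4_remaining)
```

Bracket access (`df4["count"]`) matters here because `df4.count` is a DataFrame method.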

Find all indices/instances of all repeating patterns across columns and rows of pandas dataframe

廉价感情. posted on 2021-01-29 03:35:24
Question: Suppose I have a simple pandas dataframe df like so:

    |    | name  | car |
    |----|-------|-----|
    | 0  | 'bob' | 'b' |
    | 1  | 'bob' | 'c' |
    | 2  | 'fox' | 'b' |
    | 3  | 'fox' | 'c' |
    | 4  | 'cox' | 'b' |
    | 5  | 'cox' | 'c' |
    | 6  | 'jo'  | 'b' |
    | 7  | 'jo'  | 'c' |
    | 8  | 'bob' | 'b' |
    | 9  | 'bob' | 'c' |
    | 10 | 'bob' | 'b' |
    | 11 | 'bob' | 'c' |
    | 12 | 'rob' | 'b' |
    | 13 | 'rob' | 'c' |

I would like to find the row indices of a specific pattern that spans both columns. In my real application, the above

Applying Time series models for each row

非 Y 不嫁゛ posted on 2021-01-29 03:17:08
Question: I have a dataframe (df), which is a wide dataset with the following structure:

    ID  2015/01/01  2015/02/01  2015/03/01  2015/04/01
    A1  42          42          24          32
    A2  22          22          24          32
    A3  12          15          19          22
    A4  8           12          18          24

I want to build a time series model for each row, so there will be N time series models, where N is the number of rows in the dataframe. I tried the following:

    ts_1 <- ts(df[1:1,], start = c(2015, 05), frequency = 12)
    ts_1_stl <- stl(ts_1, s.window = "periodic")

But I got the error: Error in stl(ts_1, s.window =

Pandas groupby and correct with median in new column

扶醉桌前 posted on 2021-01-29 03:09:48
Question: My dataframe looks like this:

    Plate  Sample  LogRatio
    P1     S1      0.42
    P1     S2      0.23
    P2     S3      0.41
    P3     S4      0.36
    P3     S5      0.18

I have calculated the median of each plate (but it's probably not the best idea to start like this):

    grouped = df.groupby("Plate")
    medianesPlate = grouped["LogRatio"].median()

I want to add a column to my dataframe, CorrectedLogRatio = LogRatio - median(plate). I suppose with:

    df["CorrectedLogRatio"] = LogRatio-median(plate)

To have something like this: Plate Sample LogRatio
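One way to express "subtract each plate's median" is `groupby(...).transform("median")`, which returns the group median aligned to the original rows. This is a sketch of that approach using the sample data from the question, not necessarily what the answers on the original page proposed.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "Plate": ["P1", "P1", "P2", "P3", "P3"],
        "Sample": ["S1", "S2", "S3", "S4", "S5"],
        "LogRatio": [0.42, 0.23, 0.41, 0.36, 0.18],
    }
)

# transform("median") broadcasts each plate's median back onto its rows,
# so the subtraction lines up row by row without a merge.
plate_median = df.groupby("Plate")["LogRatio"].transform("median")
df["CorrectedLogRatio"] = df["LogRatio"] - plate_median

print(df)
```

With this data the plate medians are 0.325 (P1), 0.41 (P2) and 0.27 (P3), so the single-sample plate P2 is corrected to zero.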

How to convert python JSON rows to dataframe columns without looping

元气小坏坏 posted on 2021-01-29 03:09:13
Question: I'm trying to figure out how to do the following without using a loop. I have a dataframe with several columns, including one that holds a JSON string. What I'm trying to do is convert the keys of the JSON string column into their own columns within the dataframe. For example, I have the following dataframe:

    Column 1 | column 2 | Json Column
    123      | ABC      | {"anotherNumber":345,"anotherString":"DEF"}

I want to convert it to this:

    Column 1 | column 2 | anotherNumber | anotherString
    123      | ABC      | 345           | DEF

Answer 1: You
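Since the answer is cut off, the following is only a sketch of one possible approach: parse each JSON string with `json.loads`, expand the resulting dictionaries with `pd.json_normalize`, and join the result back. It assumes every row holds valid JSON. The column names come from the question's example.

```python
import json

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "Column 1": [123],
        "column 2": ["ABC"],
        "Json Column": ['{"anotherNumber":345,"anotherString":"DEF"}'],
    }
)

# Parse the strings into dicts, then expand the dicts into columns.
# Assumption: every value in "Json Column" is a valid JSON object.
parsed = pd.json_normalize(df["Json Column"].apply(json.loads).tolist())
parsed.index = df.index  # keep row alignment if df has a non-default index

result = pd.concat([df.drop(columns="Json Column"), parsed], axis=1)
print(result)
```

There is no explicit Python loop here, although `apply` still calls `json.loads` once per row internally.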

ordering n factor variables in data.frame(r) [duplicate]

强颜欢笑 posted on 2021-01-29 03:07:18
Question: This question already has answers here: Grouping functions (tapply, by, aggregate) and the *apply family (10 answers). Closed 3 years ago.

Suppose I have a data frame with 6 columns (all of them unordered factors).

       teacher$TC001Q01NA  TC026Q01NA      TC026Q02NA      TC026Q04NA         TC026Q05NA      TC026Q06NA
    1  <NA>                <NA>            <NA>            <NA>               <NA>            <NA>
    2  Female              Strongly agree  Strongly agree  Strongly disagree  Strongly agree  Strongly disagree
    3  Male                Agree           Agree           Disagree           Agree           Disagree
    4  Female              Agree           Agree           Disagree           Agree           Agree

How to extract URL from Pandas DataFrame?

笑着哭i posted on 2021-01-29 02:57:04
Question: I need to extract URLs from a column of a DataFrame that was created using the following values:

    creation_date,tweet_id,tweet_text
    2020-06-06 03:01:37,1269102116364324865,#Webinar: Sign up for @SumoLogic's June 16 webinar to learn how to navigate your #Kubernetes environment and unders… https://stackoverflow.com/questions/42237666/extracting-information-from-pandas-dataframe
    2020-06-06 01:29:38,1269078966985461767,"In this #webinar replay, @DisneyStreaming's @rothgar chats with @SumoLogic's
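A regular expression with `Series.str.extract` is one way to pull the first URL out of each tweet. The pattern below (`https?://\S+`) is an assumption about what counts as a URL, and the tweet texts are shortened stand-ins for the data in the question.

```python
import pandas as pd

# Shortened, illustrative tweet texts; only the URL is taken from the question.
df = pd.DataFrame(
    {
        "tweet_text": [
            "#Webinar: Sign up for the June 16 webinar "
            "https://stackoverflow.com/questions/42237666/extracting-information-from-pandas-dataframe",
            "A tweet with no link in it",
        ]
    }
)

# Capture the first http(s) URL in each row; rows without a match become NaN.
# Assumption: a URL runs until the next whitespace character.
df["url"] = df["tweet_text"].str.extract(r"(https?://\S+)", expand=False)

print(df["url"])
```

For tweets that may contain several links, `str.findall` with the same pattern would return a list per row instead of only the first match.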

Calculation within Pandas dataframe group

时间秒杀一切 posted on 2021-01-29 02:44:58
Question: I have a pandas DataFrame as shown below. What I'm trying to do is partition (or group) by BlockID, LineID, WordID, and then within each group compute current WordStartX - previous (WordStartX + WordWidth) to derive another column, e.g. WordDistance, that indicates the distance between this word and the previous word. The post "Row operations within a group of a pandas dataframe" is very helpful, but in my case multiple columns are involved (WordStartX and WordWidth). *BlockID LineID WordID WordStartX
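The table is cut off, so the sketch below uses made-up values and makes one assumption loudly: words are grouped by BlockID and LineID only, with WordID used for ordering, because grouping by WordID as well would leave one word per group and no "previous" word to compare against.

```python
import pandas as pd

# Hypothetical sample values; only the column names come from the question.
df = pd.DataFrame(
    {
        "BlockID": [1, 1, 1, 1],
        "LineID": [1, 1, 2, 2],
        "WordID": [1, 2, 1, 2],
        "WordStartX": [10, 50, 12, 40],
        "WordWidth": [30, 20, 20, 25],
    }
)

# Order words within each line so "previous" means the word to the left.
df = df.sort_values(["BlockID", "LineID", "WordID"]).reset_index(drop=True)

# Right edge of each word, shifted down by one row within its (block, line) group.
word_end = df["WordStartX"] + df["WordWidth"]
prev_end = word_end.groupby([df["BlockID"], df["LineID"]]).shift()

# First word of each line has no predecessor, so its distance is NaN.
df["WordDistance"] = df["WordStartX"] - prev_end

print(df)
```

Combining the two columns into `word_end` before grouping is what sidesteps the "multiple columns involved" problem: the shift then operates on a single Series.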