dataframe

PySpark explode JSON string

Submitted by 拜拜、爱过 on 2021-01-29 08:04:04
Question: Input_dataframe:

    id    name    collection
    111   aaaaa   {"1":{"city":"city_1","state":"state_1","country":"country_1"},
                   "2":{"city":"city_2","state":"state_2","country":"country_2"},
                   "3":{"city":"city_3","state":"state_3","country":"country_3"}}
    222   bbbbb   {"1":{"city":"city_1","state":"state_1","country":"country_1"},
                   "2":{"city":"city_2","state":"state_2","country":"country_2"},
                   "3":{"city":"city_3","state":"state_3","country":"country_3"}}

Here:

    id         ==> string
    name       ==> string
    collection ==> string
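A minimal sketch of one way to handle this in PySpark, assuming each map entry in `collection` should become its own row; the schema and names below are illustrative, not taken from an accepted answer:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Sample row shaped like the question's data (abbreviated to one map entry)
df = spark.createDataFrame(
    [("111", "aaaaa", '{"1":{"city":"city_1","state":"state_1","country":"country_1"}}')],
    ["id", "name", "collection"],
)

# Parse the JSON string as a map of index -> struct, then explode the map
value_schema = StructType([
    StructField("city", StringType()),
    StructField("state", StringType()),
    StructField("country", StringType()),
])
parsed = df.withColumn(
    "collection", F.from_json("collection", MapType(StringType(), value_schema))
)
exploded = parsed.select("id", "name", F.explode("collection").alias("seq", "loc"))

exploded.select("id", "name", "seq", "loc.city", "loc.state", "loc.country").show()
```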

Pandas Dataframe splitting a column with dict values into columns

Submitted by £可爱£侵袭症+ on 2021-01-29 08:01:04
Question: I am attempting to split and convert a column, in a pandas dataframe, with lists of dictionary values into new columns. Using "Splitting dictionary/list inside a Pandas Column into Separate Columns" as a reference, things appear to fail because some of the values are NaN. When these rows are encountered an error is thrown ("can't iterate over float"), and if I fillna with None the error changes to a str-related error. I have attempted to first use:

    df.explode('freshness_grades')
    df_new = pd.concat(
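One way around the NaN rows is to coerce every cell to an iterable before exploding; a sketch with a made-up frame, since the real data isn't shown:

```python
import pandas as pd

# Made-up data: 'freshness_grades' holds lists of dicts, with NaN in some rows
df = pd.DataFrame({
    "product": ["a", "b", "c"],
    "freshness_grades": [
        [{"grade": "A", "score": 1}, {"grade": "B", "score": 2}],
        float("nan"),
        [{"grade": "C", "score": 3}],
    ],
})

# Replace NaN with empty lists so every cell can be iterated
df["freshness_grades"] = df["freshness_grades"].apply(
    lambda v: v if isinstance(v, list) else []
)

exploded = df.explode("freshness_grades").reset_index(drop=True)

# Empty lists explode to NaN; map those to empty dicts before normalizing
records = [d if isinstance(d, dict) else {} for d in exploded["freshness_grades"]]
expanded = pd.json_normalize(records)

result = pd.concat([exploded.drop(columns="freshness_grades"), expanded], axis=1)
print(result)
```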

How can I turn this nested JSON into a DataFrame?

Submitted by 孤人 on 2021-01-29 08:00:37
Question: So I have a piece of JSON code and I want to turn it into a DataFrame; however, I am quite new to DataFrames, so I am a bit stuck. Any help would be appreciated :) So this is my code:

    data = response.json()
    data_pretty = json.dumps(data, sort_keys=True, indent=4)
    data_frame = pd.DataFrame(data)

    # Pretty print
    print(data_pretty)
    print(data_frame)

This is the output:

    {
        "status": "OK",
        "users": [
            {
                "email": "raf@webconexus.nl",
                "first_name": "Raf",
                "id": "24959",
                "last_name": "Rasenberg"
            },
            {
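Since the records of interest live under the nested "users" key, pd.json_normalize with a record_path flattens them into one row per user; a sketch using the sample output above:

```python
import pandas as pd

# Shape taken from the output shown above (truncated to one user)
data = {
    "status": "OK",
    "users": [
        {"email": "raf@webconexus.nl", "first_name": "Raf",
         "id": "24959", "last_name": "Rasenberg"},
    ],
}

# One row per entry in the nested "users" list
users_df = pd.json_normalize(data, record_path="users")
print(users_df)
```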

How to convert pandas dataframe to uniquely structured nested json

Submitted by 被刻印的时光 ゝ on 2021-01-29 07:53:51
Question: I have a DF with structure as follows:

      traffic_group app_id key category    factors
    0       desktop   app1  CI     html  16.618628
    1       desktop   app1  CI      xhr  35.497082
    2       desktop   app1  IP     html  18.294468
    3       desktop   app1  IP      xhr  30.422464
    4       desktop   app2  CI     html  11.028240
    5       desktop   app2  CI     json  33.548279
    6        mobile   app1  IP     html  12.808367
    7        mobile   app1  IP    image  14.410633

I need to output it to a JSON of the following structure:

    {
      "desktop": {
        app1: [
          {
            "key": "CI",
            "threshold": 1,
            "window": 60,
            "factors": {
              "html": 16.618628
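A sketch of one way to build that nesting with groupby; the threshold and window values are copied from the sample output, and treating them as constants is an assumption:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "traffic_group": ["desktop", "desktop", "desktop", "desktop"],
    "app_id": ["app1", "app1", "app1", "app1"],
    "key": ["CI", "CI", "IP", "IP"],
    "category": ["html", "xhr", "html", "xhr"],
    "factors": [16.618628, 35.497082, 18.294468, 30.422464],
})

# One entry per (traffic_group, app_id, key); categories fold into a dict
result = {}
for (traffic, app, key), grp in df.groupby(["traffic_group", "app_id", "key"]):
    entry = {
        "key": key,
        "threshold": 1,   # assumed constant, per the sample output
        "window": 60,     # assumed constant, per the sample output
        "factors": dict(zip(grp["category"], grp["factors"])),
    }
    result.setdefault(traffic, {}).setdefault(app, []).append(entry)

print(json.dumps(result, indent=2))
```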

Vectorization of multiple returns of a complex function in a dataframe

Submitted by 半世苍凉 on 2021-01-29 07:43:50
Question: I am trying to plot various data, including complex vectors. Thanks to contributors (see the answer at https://stackoverflow.com/a/64480659/13953414), I managed to generate the dataframes, but I get stuck when I want to add some additional calculations. I get an error:

    df['T_depth'] = (math.sqrt(D / (4 * (math.pi) * frequency)) / 1e-6)
    TypeError: only size-1 arrays can be converted to Python scalars

All calculations starting from T_depth are not executed due to a format issue. The function were
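The error itself comes from math.sqrt, which only accepts scalars; NumPy's vectorized equivalents work element-wise over a whole column. A sketch with made-up values for D and frequency:

```python
import numpy as np
import pandas as pd

# Stand-in values; the question's real D and frequency are not shown
df = pd.DataFrame({"D": [1.0e-6, 2.0e-6, 4.0e-6]})
frequency = 50.0

# np.sqrt operates element-wise over the Series, unlike math.sqrt
df["T_depth"] = np.sqrt(df["D"] / (4 * np.pi * frequency)) / 1e-6
print(df)
```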

Pandas DataFrames If else condition on multiple columns [duplicate]

Submitted by 核能气质少年 on 2021-01-29 07:36:07
Question: This question already has answers here: Multiple logical comparisons in pandas df (3 answers). Closed 8 months ago.

I have a DataFrame as shown below:

    import pandas as pd
    df = pd.DataFrame({
        "name": ["john", "peter", "john", "alex"],
        "height": [6, 5, 4, 4],
        "shape": ["null", "null", "null", "null"]
    })

I want to apply this:

    if name == john and height == 6, return shape = good;
    else if height == 4, return shape = bad;
    else change the shape to middle.

So the final DataFrame should look like this: df = ({
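One common pattern for an if/elif/else over columns (likely close to what the linked duplicate suggests, though not copied from it) is np.select:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "peter", "john", "alex"],
    "height": [6, 5, 4, 4],
})

# Conditions are checked in order, like an if/elif chain; default is the else
conditions = [
    (df["name"] == "john") & (df["height"] == 6),
    df["height"] == 4,
]
df["shape"] = np.select(conditions, ["good", "bad"], default="middle")
print(df)
```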

TypeError: only integer scalar arrays can be converted to a scalar index when using pandas fillna

Submitted by 喜夏-厌秋 on 2021-01-29 07:30:50
Question: Update: it seems to be due to .loc; if I use the original df from pd.read_excel, it is fine. I have a dataframe with dtypes as follows (this is the CSV for the dataframe: CSV File):

    Date        datetime64[ns]
    Amout       float64
    Currency    object
    ID          object

I used the following code to replace NaT and NaN:

    a = np.datetime64('2000-01-01')
    values = {'Date': a, 'Amount': 0, 'Currency': '0', 'ID': '0'}
    df.fillna(value=values, inplace=True)

However, I got the error: TypeError: only integer scalar arrays can be converted to a scalar
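A self-contained sketch of the per-column fillna, avoiding inplace=True on a .loc slice (which the update above suggests was the trigger); the sample values are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", None]),
    "Amount": [10.5, np.nan],
    "Currency": ["USD", None],
    "ID": ["a1", None],
})

values = {
    "Date": np.datetime64("2000-01-01"),
    "Amount": 0,
    "Currency": "0",
    "ID": "0",
}
# Assign the result instead of mutating a slice in place
df = df.fillna(value=values)
print(df.dtypes)
print(df)
```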

Select specific columns with a regular expression in pandas

Submitted by ぃ、小莉子 on 2021-01-29 07:19:47
Question: Using pandas, I want to do something like this while looping through data frames:

    for body_part, columns in zip(self.body_parts, usecols_gen()):
        body_part_df = self.read_csv(usecols=columns)
        if self.normalize:
            body_part_df[r'x(\.\d)?'] = body_part_df[r'x(\.\d)?'].apply(lambda x: x/x_max)
            print(body_part_df)
        result[body_part] = body_part_df

I use regular expressions because the column names I refer to are mangled: x, x.1, x.2, ..., x.n. This gives a KeyError, and I don't understand the reason.
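The KeyError is expected: plain [] indexing treats the string literally, not as a regex. DataFrame.filter(regex=...) does the pattern matching instead; a sketch with a hypothetical x_max:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "x.1": [3.0, 4.0], "y": [5.0, 6.0]})
x_max = 4.0  # hypothetical normalization constant

# filter(regex=...) selects the matching column labels: x, x.1, x.2, ...
cols = df.filter(regex=r"^x(\.\d+)?$").columns
df[cols] = df[cols] / x_max
print(df)
```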

Python pandas dtype detection from SQL

Submitted by 让人想犯罪 __ on 2021-01-29 07:03:43
Question: I am quite troubled by the behaviour of pandas DataFrame dtype detection. I use read_sql_query to retrieve data from a database to build a DataFrame and then dump it into a csv file. I don't need any transformation; just dump it into a csv file and format the date fields as '%d/%m/%Y'. However:

    self.dataframe.to_csv(self.fic, index=False, header=False, sep='|', mode='a',
                          encoding='utf-8', line_terminator='\n', date_format='%d/%m/%Y')

fails to transform/format some date
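One plausible cause (an assumption, since the actual dtypes aren't shown): to_csv's date_format only applies to genuine datetime64 columns, and date columns that come back from SQL as strings are written untouched. A sketch of coercing them first:

```python
import pandas as pd

df = pd.DataFrame({
    "created": pd.to_datetime(["2021-01-05", "2021-01-06"]),  # datetime64
    "updated": ["2021-01-05", "2021-01-06"],                  # object (string)
})

# Coerce string columns to datetime so date_format applies to them too
df["updated"] = pd.to_datetime(df["updated"])
df.to_csv("out.csv", index=False, sep="|", date_format="%d/%m/%Y")
```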

Convert dataframe column to datetime only if length of string is not zero

Submitted by 给你一囗甜甜゛ on 2021-01-29 07:02:17
Question: I'd like to convert a dataframe column which has a date string. But in some cases the date string might be empty due to certain conditions, so I want all the other rows in that column converted to datetime format, except the rows in that column which might be blank. Is that possible? What I've tried so far:

Option 1:

    df['etime'] = pd.to_datetime(df['etime'], errors='ignore').dt.strftime('%Y-%m-%d %H:%M')

Option 2:

    for ind in df.index:
        if (df['etime'].str.len()[ind] == 0):
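A sketch of one way to do this without a row loop: errors='coerce' turns empty strings into NaT, and .dt.strftime leaves NaT rows as NaN, so only the non-blank rows get formatted:

```python
import pandas as pd

df = pd.DataFrame({"etime": ["2021-01-29 07:02:17", "", "2021-01-28 10:00:00"]})

# Blank strings become NaT instead of raising or being silently ignored
parsed = pd.to_datetime(df["etime"], errors="coerce")
df["etime"] = parsed.dt.strftime("%Y-%m-%d %H:%M")
print(df)
```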