dataframe

PySpark explode JSON string

Submitted by 拜拜、爱过 on 2021-01-29 08:04:04
Question: Input_dataframe:

    id    name    collection
    111   aaaaa   {"1":{"city":"city_1","state":"state_1","country":"country_1"},
                   "2":{"city":"city_2","state":"state_2","country":"country_2"},
                   "3":{"city":"city_3","state":"state_3","country":"country_3"}}
    222   bbbbb   {"1":{"city":"city_1","state":"state_1","country":"country_1"},
                   "2":{"city":"city_2","state":"state_2","country":"country_2"},
                   "3":{"city":"city_3","state":"state_3","country":"country_3"}}

Here:

    id         ==> string
    name       ==> string
    collection ==> string
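A minimal sketch of one way to handle this in PySpark, assuming each map entry in `collection` should become its own row; the schema and names below are illustrative, not taken from an accepted answer:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Sample row shaped like the question's data (abbreviated to one map entry)
df = spark.createDataFrame(
    [("111", "aaaaa", '{"1":{"city":"city_1","state":"state_1","country":"country_1"}}')],
    ["id", "name", "collection"],
)

# Parse the JSON string as a map of index -> struct, then explode the map
value_schema = StructType([
    StructField("city", StringType()),
    StructField("state", StringType()),
    StructField("country", StringType()),
])
parsed = df.withColumn(
    "collection", F.from_json("collection", MapType(StringType(), value_schema))
)
exploded = parsed.select("id", "name", F.explode("collection").alias("seq", "loc"))

exploded.select("id", "name", "seq", "loc.city", "loc.state", "loc.country").show()
```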

Pandas Dataframe splitting a column with dict values into columns

Submitted by £可爱£侵袭症+ on 2021-01-29 08:01:04
Question: I am attempting to split and convert a column, in a pandas dataframe, with lists of dictionary values into new columns. Using "Splitting dictionary/list inside a Pandas Column into Separate Columns" as a reference, things appear to fail because some of the values are NaN. When these rows are encountered an error is thrown ("can't iterate over float"), and if I fillna with None the error changes to a str-related error. I have attempted to first use:

    df.explode('freshness_grades')
    df_new = pd.concat(
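One way around the NaN rows is to coerce every cell to an iterable before exploding; a sketch with a made-up frame, since the real data isn't shown:

```python
import pandas as pd

# Made-up data: 'freshness_grades' holds lists of dicts, with NaN in some rows
df = pd.DataFrame({
    "product": ["a", "b", "c"],
    "freshness_grades": [
        [{"grade": "A", "score": 1}, {"grade": "B", "score": 2}],
        float("nan"),
        [{"grade": "C", "score": 3}],
    ],
})

# Replace NaN with empty lists so every cell can be iterated
df["freshness_grades"] = df["freshness_grades"].apply(
    lambda v: v if isinstance(v, list) else []
)

exploded = df.explode("freshness_grades").reset_index(drop=True)

# Empty lists explode to NaN; map those to empty dicts before normalizing
records = [d if isinstance(d, dict) else {} for d in exploded["freshness_grades"]]
expanded = pd.json_normalize(records)

result = pd.concat([exploded.drop(columns="freshness_grades"), expanded], axis=1)
print(result)
```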

How can I turn this nested JSON into a DataFrame?

Submitted by 孤人 on 2021-01-29 08:00:37
Question: So I have a piece of JSON code and I want to turn it into a DataFrame; however, I am quite new to DataFrames, so I am a bit stuck. Any help would be appreciated :) So this is my code:

    data = response.json()
    data_pretty = json.dumps(data, sort_keys=True, indent=4)
    data_frame = pd.DataFrame(data)

    # Pretty print
    print(data_pretty)
    print(data_frame)

This is the output:

    {
        "status": "OK",
        "users": [
            {
                "email": "raf@webconexus.nl",
                "first_name": "Raf",
                "id": "24959",
                "last_name": "Rasenberg"
            },
            {
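Since the records of interest live under the nested "users" key, pd.json_normalize with a record_path flattens them into one row per user; a sketch using the sample output above:

```python
import pandas as pd

# Shape taken from the output shown above (truncated to one user)
data = {
    "status": "OK",
    "users": [
        {"email": "raf@webconexus.nl", "first_name": "Raf",
         "id": "24959", "last_name": "Rasenberg"},
    ],
}

# One row per entry in the nested "users" list
users_df = pd.json_normalize(data, record_path="users")
print(users_df)
```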

How to convert pandas dataframe to uniquely structured nested json

Submitted by 被刻印的时光 ゝ on 2021-01-29 07:53:51
Question: I have a DF with structure as follows:

      traffic_group app_id key category    factors
    0       desktop   app1  CI     html  16.618628
    1       desktop   app1  CI      xhr  35.497082
    2       desktop   app1  IP     html  18.294468
    3       desktop   app1  IP      xhr  30.422464
    4       desktop   app2  CI     html  11.028240
    5       desktop   app2  CI     json  33.548279
    6        mobile   app1  IP     html  12.808367
    7        mobile   app1  IP    image  14.410633

I need to output it to a JSON of the following structure:

    {
      "desktop": {
        app1: [
          {
            "key": "CI",
            "threshold": 1,
            "window": 60,
            "factors": {
              "html": 16.618628
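A sketch of one way to build that nesting with groupby; the threshold and window values are copied from the sample output, and treating them as constants is an assumption:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "traffic_group": ["desktop", "desktop", "desktop", "desktop"],
    "app_id": ["app1", "app1", "app1", "app1"],
    "key": ["CI", "CI", "IP", "IP"],
    "category": ["html", "xhr", "html", "xhr"],
    "factors": [16.618628, 35.497082, 18.294468, 30.422464],
})

# One entry per (traffic_group, app_id, key); categories fold into a dict
result = {}
for (traffic, app, key), grp in df.groupby(["traffic_group", "app_id", "key"]):
    entry = {
        "key": key,
        "threshold": 1,   # assumed constant, per the sample output
        "window": 60,     # assumed constant, per the sample output
        "factors": dict(zip(grp["category"], grp["factors"])),
    }
    result.setdefault(traffic, {}).setdefault(app, []).append(entry)

print(json.dumps(result, indent=2))
```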

Vectorization of multiple returns of a complex function in a dataframe

Submitted by 半世苍凉 on 2021-01-29 07:43:50
Question: I am trying to plot various data, including complex vectors. Thanks to contributors (see the answer at https://stackoverflow.com/a/64480659/13953414), I managed to generate the dataframes, but I get stuck when I want to add some additional calculations. I get an error:

    df['T_depth'] = (math.sqrt(D / (4 * (math.pi) * frequency)) / 1e-6)
    TypeError: only size-1 arrays can be converted to Python scalars

All calculations starting from T_depth are not executed due to a format issue. The function were
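The error itself comes from math.sqrt, which only accepts scalars; NumPy's vectorized equivalents work element-wise over a whole column. A sketch with made-up values for D and frequency:

```python
import numpy as np
import pandas as pd

# Stand-in values; the question's real D and frequency are not shown
df = pd.DataFrame({"D": [1.0e-6, 2.0e-6, 4.0e-6]})
frequency = 50.0

# np.sqrt operates element-wise over the Series, unlike math.sqrt
df["T_depth"] = np.sqrt(df["D"] / (4 * np.pi * frequency)) / 1e-6
print(df)
```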

Pandas DataFrames If else condition on multiple columns [duplicate]

Submitted by 核能气质少年 on 2021-01-29 07:36:07
Question: This question already has answers here: Multiple logical comparisons in pandas df (3 answers). Closed 8 months ago.

I have a DataFrame as shown below:

    import pandas as pd
    df = pd.DataFrame({
        "name": ["john", "peter", "john", "alex"],
        "height": [6, 5, 4, 4],
        "shape": ["null", "null", "null", "null"]
    })

I want to apply this:

    if name == john and height == 6, return shape = good;
    else if height == 4, return shape = bad;
    else change the shape to middle.

So the final DataFrame should look like this: df = ({
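One common pattern for an if/elif/else over columns (likely close to what the linked duplicate suggests, though not copied from it) is np.select:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "peter", "john", "alex"],
    "height": [6, 5, 4, 4],
})

# Conditions are checked in order, like an if/elif chain; default is the else
conditions = [
    (df["name"] == "john") & (df["height"] == 6),
    df["height"] == 4,
]
df["shape"] = np.select(conditions, ["good", "bad"], default="middle")
print(df)
```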

TypeError: only integer scalar arrays can be converted to a scalar index when using pandas fillna

Submitted by 喜夏-厌秋 on 2021-01-29 07:30:50
Question: Update: it seems to be due to .loc; if I use the original df from pd.read_excel, it is fine. I have a dataframe with dtypes as follows (this is the CSV for the dataframe: CSV File):

    Date        datetime64[ns]
    Amout       float64
    Currency    object
    ID          object

I used the following code to replace NaT and NaN:

    a = np.datetime64('2000-01-01')
    values = {'Date': a, 'Amount': 0, 'Currency': '0', 'ID': '0'}
    df.fillna(value=values, inplace=True)

However, I got the error: TypeError: only integer scalar arrays can be converted to a scalar
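A self-contained sketch of the per-column fillna, avoiding inplace=True on a .loc slice (which the update above suggests was the trigger); the sample values are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", None]),
    "Amount": [10.5, np.nan],
    "Currency": ["USD", None],
    "ID": ["a1", None],
})

values = {
    "Date": np.datetime64("2000-01-01"),
    "Amount": 0,
    "Currency": "0",
    "ID": "0",
}
# Assign the result instead of mutating a slice in place
df = df.fillna(value=values)
print(df.dtypes)
print(df)
```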

Select specific columns with a regular expression in pandas

Submitted by ぃ、小莉子 on 2021-01-29 07:19:47
Question: Using pandas, I want to do something like this while looping through data frames:

    for body_part, columns in zip(self.body_parts, usecols_gen()):
        body_part_df = self.read_csv(usecols=columns)
        if self.normalize:
            body_part_df[r'x(\.\d)?'] = body_part_df[r'x(\.\d)?'].apply(lambda x: x/x_max)
            print(body_part_df)
        result[body_part] = body_part_df

I use regular expressions because the column names I refer to are mangled: x, x.1, x.2, ..., x.n. This gives a KeyError, and I don't understand the reason.
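The KeyError is expected: plain [] indexing treats the string literally, not as a regex. DataFrame.filter(regex=...) does the pattern matching instead; a sketch with a hypothetical x_max:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "x.1": [3.0, 4.0], "y": [5.0, 6.0]})
x_max = 4.0  # hypothetical normalization constant

# filter(regex=...) selects the matching column labels: x, x.1, x.2, ...
cols = df.filter(regex=r"^x(\.\d+)?$").columns
df[cols] = df[cols] / x_max
print(df)
```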

Python pandas dtype detection from SQL

Submitted by 让人想犯罪 __ on 2021-01-29 07:03:43
Question: I am quite troubled by the behaviour of pandas DataFrame dtype detection. I use read_sql_query to retrieve data from a database to build a DataFrame and then dump it into a csv file. I don't need any transformation; just dump it into a csv file and format the date fields as '%d/%m/%Y'. However:

    self.dataframe.to_csv(self.fic, index=False, header=False, sep='|', mode='a',
                          encoding='utf-8', line_terminator='\n', date_format='%d/%m/%Y')

fails to transform/format some date
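One plausible cause (an assumption, since the actual dtypes aren't shown): to_csv's date_format only applies to genuine datetime64 columns, and date columns that come back from SQL as strings are written untouched. A sketch of coercing them first:

```python
import pandas as pd

df = pd.DataFrame({
    "created": pd.to_datetime(["2021-01-05", "2021-01-06"]),  # datetime64
    "updated": ["2021-01-05", "2021-01-06"],                  # object (string)
})

# Coerce string columns to datetime so date_format applies to them too
df["updated"] = pd.to_datetime(df["updated"])
df.to_csv("out.csv", index=False, sep="|", date_format="%d/%m/%Y")
```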

Convert dataframe column to datetime only if length of string is not zero

Submitted by 给你一囗甜甜゛ on 2021-01-29 07:02:17
Question: I'd like to convert a dataframe column which has a date string. But in some cases the date string might be empty due to certain conditions, so I want all the other rows in that column converted to datetime format, except the rows in that column which might be blank. Is that possible? What I've tried so far:

Option 1:

    df['etime'] = pd.to_datetime(df['etime'], errors='ignore').dt.strftime('%Y-%m-%d %H:%M')

Option 2:

    for ind in df.index:
        if (df['etime'].str.len()[ind] == 0):
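A sketch of one way to do this without a row loop: errors='coerce' turns empty strings into NaT, and .dt.strftime leaves NaT rows as NaN, so only the non-blank rows get formatted:

```python
import pandas as pd

df = pd.DataFrame({"etime": ["2021-01-29 07:02:17", "", "2021-01-28 10:00:00"]})

# Blank strings become NaT instead of raising or being silently ignored
parsed = pd.to_datetime(df["etime"], errors="coerce")
df["etime"] = parsed.dt.strftime("%Y-%m-%d %H:%M")
print(df)
```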