dataframe | 易学教程

Python Matplotlib: how to reduce number of x tick marks for non-numeric type

Question: I have a very simple pandas DataFrame and I'm trying to plot the index column (Date, stored as strings) against the 'Adj Close' column.

                Adj Close
    Date
    2010-01-01   1.438994
    2010-01-04   1.442398
    2010-01-05   1.436596
    2010-01-06   1.440403
    2010-01-07   1.431803
    ... (many more rows)

Here is my very simple code:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(df['Adj Close'], label='Close Price History')
    fig.autofmt_xdate()

However, the graph is unpleasant in the sense that
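
One common way to thin out the tick labels, sketched here on the assumption that the string index holds ISO-style dates, is to convert the index to datetimes and let a date locator pick the ticks. The sample values come from the question; the `maxticks=6` limit and the headless Agg backend are choices made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; the real frame has many more.
df = pd.DataFrame(
    {"Adj Close": [1.438994, 1.442398, 1.436596, 1.440403, 1.431803]},
    index=pd.Index(
        ["2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"],
        name="Date",
    ),
)

# String labels are plotted as categories, one tick per row.
# Converting to datetimes lets matplotlib space the ticks by time instead.
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots()
ax.plot(df.index, df["Adj Close"], label="Close Price History")

# Assumed limit for this sketch: aim for roughly three to six major ticks.
locator = mdates.AutoDateLocator(minticks=3, maxticks=6)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()
```

If the dates must stay as strings, the tick positions would have to be set by hand instead; that variant is not shown here.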

Construct pandas DataFrame from nested dictionaries having list as item

Question: I have several dictionaries that I want to convert to a pandas DataFrame. However, because of a key '0' that I don't need, the DataFrame comes out in an undesirable format when I convert these dicts. The dicts below are only a short part of the whole data.

    dict1 = {1: {0: [-0.022, -0.017]}, 2: {0: [0.269, 0.271]}, 3: {0: [0.118, 0.119]}, 4: {0: [0.057, 0.061]}, 5: {0: [-0.916, -0.924]}}
    dict2 = {1: {0: [0.384, 0.398]}, 2: {0: [0.485, 0.489]}, 3: {0: [0.465, 0.469]}, 4: {0: [0.456, 0.468]}, 5
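
A minimal sketch of one way to drop the unwanted inner key, using only `dict1` since it is the one dict shown in full. The excerpt does not show the desired layout, so the shape produced here (one column per outer key, one row per list position) is an assumption, and `unwrap` is a hypothetical helper name.

```python
import pandas as pd

dict1 = {1: {0: [-0.022, -0.017]}, 2: {0: [0.269, 0.271]}, 3: {0: [0.118, 0.119]},
         4: {0: [0.057, 0.061]}, 5: {0: [-0.916, -0.924]}}


def unwrap(nested):
    """Drop the inner key 0 so each outer key maps straight to its list."""
    return {outer: inner[0] for outer, inner in nested.items()}


# Assumed layout: outer keys become columns, list positions become rows.
df1 = pd.DataFrame(unwrap(dict1))
print(df1)
```

Transposing with `df1.T` gives the other orientation if the outer keys are meant to be rows.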

Self Joining in R

Question: Here is a sample tibble:

    test <- tibble(a = c("dd1","dd2","dd3","dd4","dd5"),
                   name = c("a", "b", "c", "d", "e"),
                   b = c("dd3","dd4","dd1","dd5","dd2"))

I want to add a new column b_name by self-joining test:

    dplyr::inner_join(test, test, by = c("a" = "b"))

My table is far too large (2.7M rows with 4 columns) and I get the following error:

    Error: std::bad_alloc

Please advise how to do this properly. My final goal is to get the following structure:

    a    name  b    b_name
    dd1  a     dd3  c
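
The question is about R, but the same idea can be sketched in pandas, the language used for the other examples on this page: build a lookup from `a` to `name` and apply it to `b`, instead of joining the table to itself. This assumes every value of `a` is unique. Whether the same approach avoids the memory error in R cannot be confirmed from the excerpt.

```python
import pandas as pd

# Same sample data as the tibble in the question.
test = pd.DataFrame({
    "a": ["dd1", "dd2", "dd3", "dd4", "dd5"],
    "name": ["a", "b", "c", "d", "e"],
    "b": ["dd3", "dd4", "dd1", "dd5", "dd2"],
})

# Lookup from key `a` to `name`; assumes the values of `a` are unique.
name_by_a = test.set_index("a")["name"]

# For each row, fetch the name of the row whose `a` equals this row's `b`.
test["b_name"] = test["b"].map(name_by_a)
print(test)
```

The first row comes out as `dd1, a, dd3, c`, which matches the target structure quoted in the question.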

Pandas dataframe - create new column based on simple calculation

Question: I want to make a calculation based on 4 columns in a DataFrame and store the result in a new column. The 4 columns I'm interested in are as follows.

    rating_1, time_1, rating_2, time_2  col_x  col_y  etc
    0  1  1  1  1  1  1

If time_1 is greater than time_2, I want rating_1 in the new column; if time_2 is greater, I want rating_2. What's the simplest way to do this, please?

Answer 1: You can use the numpy.where() function:

    In [241]: x
    Out[241]:
       rating_1  time_1  rating_2  time_2  col_x  col_y
    0        11       1        21       1      1      1
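
The answer is cut off before it shows the call, so here is a minimal sketch of how `numpy.where` can express the rule. The first row reuses the values from the answer's sample; the other two rows and the column name `new_rating` are made up for illustration. The question does not say what should happen when the two times are equal; in this sketch ties fall through to `rating_2`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "rating_1": [11, 12, 13],
    "time_1": [1, 5, 2],
    "rating_2": [21, 22, 23],
    "time_2": [1, 3, 4],
})

# Take rating_1 where time_1 is strictly greater, otherwise rating_2.
# Ties (row 0 here) therefore pick rating_2; that is a choice of this sketch.
df["new_rating"] = np.where(df["time_1"] > df["time_2"], df["rating_1"], df["rating_2"])
print(df)
```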

how to subset rows in specific columns based on minimum values in individual columns in a dataframe using R

Question: We have a data frame with thousands of rows and multiple columns. A sample data frame is shown below:

    df1 <- data.frame(X = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59, 7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45),
                      Y = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73, 48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23),
                      ID = c("B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1_2", "B_1

Find values from a column in a DF at very specific times for every unique date

Question: I asked this question before and got an answer that works for the general case with sequential, non-missing data, but not for my case specifically. I have a DataFrame that looks as follows.

    eventTime       MeteredEnergy  Demand  RunningHoursLamps
    6/7/2018 0:00   67.728         64      1037.82
    6/7/2018 1:00   67.793         64      1038.82
    6/7/2018 2:00   67.857         64      1039.82
    6/7/2018 3:00   67.922         64      1040.82
    6/7/2018 4:00   67.987         64      1041.82
    6/7/2018 5:00                  64      1042.82
    6/7/2018 6:00                          1043.43
    6/7/2018 23:00  68.288
    6/8/2018 0:00   67.728         64      1037.82
    6/8/2018
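
The excerpt ends before naming the times of interest, so the sketch below uses hypothetical target hours (0 and 23) purely for illustration. It also assumes the timestamps are month/day/year and that 68.288 in the 23:00 row belongs to MeteredEnergy. The idea is to filter on the hour and then pivot so that each date gets one row, with a missing reading left as NaN.

```python
import pandas as pd

# A few rows taken from the sample in the question.
df = pd.DataFrame({
    "eventTime": ["6/7/2018 0:00", "6/7/2018 1:00", "6/7/2018 23:00", "6/8/2018 0:00"],
    "MeteredEnergy": [67.728, 67.793, 68.288, 67.728],
})

# Assumption: timestamps are month/day/year with a 24-hour clock.
df["eventTime"] = pd.to_datetime(df["eventTime"], format="%m/%d/%Y %H:%M")

TARGET_HOURS = [0, 23]  # hypothetical; the question's actual times are not shown

# Keep only the rows that fall on one of the target hours.
at_targets = df[df["eventTime"].dt.hour.isin(TARGET_HOURS)].assign(
    date=lambda d: d["eventTime"].dt.date,
    hour=lambda d: d["eventTime"].dt.hour,
)

# One row per date, one column per target hour; absent readings become NaN.
per_day = at_targets.pivot(index="date", columns="hour", values="MeteredEnergy")
print(per_day)
```

Because the filter works on the timestamp itself and not on row position, gaps in the hourly sequence do not shift the result.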

Changing pandas dataframe values based on dictionary

Question: Is there any way to replace values using a dictionary or mapping? I have a DataFrame like this:

    Q14r63: Audi   Q14r2: BMW     Q14r1: VW
    Selected       Not Selected   Not Selected
    Not Selected   Selected       Selected
    Selected       Selected       Not Selected

I also have another DataFrame that provides codes for the brands. This DataFrame can of course be converted into a dictionary as well.

    Brand  Code
    Audi   63
    BMW    2
    VW     1

Is there any way to get output where the "Selected" values in the main DataFrame are replaced with the car brand? Desired output:

    Q14r63:
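
The desired output is cut off, so the following is only a sketch of one plausible reading: each "Selected" cell is replaced by the brand whose code appears in the column header, and "Not Selected" cells are left as they are. It assumes every header follows the pattern `Q14r<code>: <brand>`.

```python
import pandas as pd

df = pd.DataFrame({
    "Q14r63: Audi": ["Selected", "Not Selected", "Selected"],
    "Q14r2: BMW": ["Not Selected", "Selected", "Selected"],
    "Q14r1: VW": ["Not Selected", "Selected", "Not Selected"],
})
brands = pd.DataFrame({"Brand": ["Audi", "BMW", "VW"], "Code": [63, 2, 1]})

# Turn the code table into a plain dict: {63: "Audi", 2: "BMW", 1: "VW"}.
code_to_brand = dict(zip(brands["Code"].tolist(), brands["Brand"].tolist()))

result = df.copy()
for col in result.columns:
    # Assumed header pattern "Q14r<code>: <brand>": take the digits after the last "r".
    code = int(col.split(":")[0].rsplit("r", 1)[1])
    # Replace exact "Selected" cells only; "Not Selected" is left untouched.
    result[col] = result[col].mask(result[col] == "Selected", code_to_brand[code])

print(result)
```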

How can I summarize several pandas dataframe columns into a parent column name?

Question: I have a DataFrame which looks like this:

            some feature  another feature  label
    sample
    0       ...           ...              ...

and I'd like to get a DataFrame with multi-indexed columns like this:

            features           label
    sample  some     another
    0       ...      ...       ...

From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution must not depend on string parsing of the feature columns (some feature, another feature). The last column for the label is the last column and it's
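
A minimal sketch using `pandas.MultiIndex.from_tuples`, assuming the label is the last column and every column before it is a feature. The cell values are placeholders, since the question shows only `...`, and the short names are supplied by hand in `short_names` (a hypothetical variable) instead of being parsed from the original headers.

```python
import pandas as pd

# Placeholder values; the question shows only "..." in the cells.
df = pd.DataFrame(
    {"some feature": [0.1, 0.2], "another feature": [1.0, 2.0], "label": [0, 1]},
    index=pd.Index([0, 1], name="sample"),
)

# Assumption: the label is the last column, everything before it is a feature.
feature_cols = list(df.columns[:-1])
label_col = df.columns[-1]

# Short names are given explicitly, so no string parsing of the headers is needed.
short_names = ["some", "another"]
if len(short_names) != len(feature_cols):
    raise ValueError("need exactly one short name per feature column")

# Features share the top-level key "features"; the label gets an empty second level.
df.columns = pd.MultiIndex.from_tuples(
    [("features", short) for short in short_names] + [(label_col, "")]
)
print(df)
```

After this, `df["features"]` selects both feature columns at once.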

Python Pandas Groupby/Append columns

Question: This is my example DataFrame:

    Index  Param1  Param2
    A      1       2
    A      3       4
    B      1       3
    B      4       NaN
    C      2       4

What I would like to get is:

    Index  Param1  Param2  Param3  Param4
    A      1       2       3       4
    B      1       3       4
    C      2       4

What would be the best way to achieve this using pandas? Thanks in advance for your help.

Answer 1: You can use groupby with unstack:

    def f(x):
        # Flatten the group's values into one sorted column.
        return pd.DataFrame(np.sort(x.values.ravel()))

    # Select the columns with a list; current pandas rejects the bare 'Param1','Param2' form.
    df = df.groupby('Index')[['Param1', 'Param2']].apply(f).unstack()
    df.columns = df.columns.droplevel(0)
    print(df)

           0  1  2  3
    Index
    A      1  2  3  4
    B      1  3  4
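
For reference, here is a self-contained version of the answer's approach that can be run as-is, with the sample frame from the question and one extra step that renames the columns to Param1 through Param4 as the question asks. Note that the values are sorted within each group, as in the answer, so this matches the desired output only when the sorted order is the wanted order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Index": ["A", "A", "B", "B", "C"],
    "Param1": [1, 3, 1, 4, 2],
    "Param2": [2, 4, 3, np.nan, 4],
})


def flatten_sorted(group):
    """Flatten a group's values into one sorted column; NaN sorts to the end."""
    return pd.DataFrame(np.sort(group.to_numpy().ravel()))


# One row per group, one column per position in the flattened values.
wide = df.groupby("Index")[["Param1", "Param2"]].apply(flatten_sorted).unstack()

# Replace the positional column labels with Param1..ParamN.
wide.columns = [f"Param{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

Groups with fewer values than the widest group are padded with NaN on the right.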

How to remove duplicate comma separated character values from each cell of a column using R

Question: I have a data frame with 2 columns, ID and Product, as below:

    ID  Product
    A   Clothing, Clothing Food, Furniture, Furniture
    B   Food,Food,Food, Clothing
    C   Food, Clothing, Clothing

I need to have only the unique products for each ID, for example:

    ID  Product
    A   Clothing, Food, Furniture
    B   Food, Clothing
    C   Food, Clothing

How do I do this using R?

Answer 1: If there are multiple delimiters in the dataset, one way would be to split the 'Product' column using all the delimiters, get the unique values and then paste it
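
The question and answer concern R, and the answer's own code is cut off. As an illustration of the same split, unique, paste idea, here is a pandas sketch in the language used for the other examples on this page. It assumes commas and whitespace are the only delimiters and that no product name contains a space; `unique_products` is a hypothetical helper name.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "Product": [
        "Clothing, Clothing Food, Furniture, Furniture",
        "Food,Food,Food, Clothing",
        "Food, Clothing, Clothing",
    ],
})


def unique_products(cell):
    """Split on commas and/or whitespace, then keep the first occurrence of each item."""
    tokens = [t for t in re.split(r"[,\s]+", cell) if t]
    # dict.fromkeys drops repeats while preserving the original order.
    return ", ".join(dict.fromkeys(tokens))


df["Product"] = df["Product"].apply(unique_products)
print(df)
```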