dataframe

Python Matplotlib: how to reduce number of x tick marks for non-numeric type

两盒软妹~` submitted on 2021-01-28 22:10:23
Question: I have this very simple pandas dataframe and I'm trying to plot the index column (Date, which is formatted as a string) against the 'Adj Close' column.
                Adj Close
    Date
    2010-01-01   1.438994
    2010-01-04   1.442398
    2010-01-05   1.436596
    2010-01-06   1.440403
    2010-01-07   1.431803
    ... a lot more rows
Here's my very simple code:
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(df['Adj Close'], label='Close Price History')
    fig.autofmt_xdate()
However, the graph is unpleasant in the sense that
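Going by the title, the complaint is that a string index produces one tick per row. One possible approach, sketched here under that assumption, is to convert the index to real datetimes so that matplotlib's date locators can choose the ticks. The five sample rows come from the question; the `maxticks=8` value and the `Agg` backend are illustrative choices, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; the real frame has many more.
df = pd.DataFrame(
    {"Adj Close": [1.438994, 1.442398, 1.436596, 1.440403, 1.431803]},
    index=pd.Index(
        ["2010-01-01", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"],
        name="Date",
    ),
)

# Strings are plotted as categories (one tick each); datetimes are not.
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots()
ax.plot(df.index, df["Adj Close"], label="Close Price History")

# Let matplotlib pick date ticks, capped at an assumed maximum of 8.
locator = mdates.AutoDateLocator(maxticks=8)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()
```

If the index has to stay as strings, `matplotlib.ticker.MaxNLocator` on the x axis is another option that may be worth trying.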

Construct pandas DataFrame from nested dictionaries having list as item

这一生的挚爱 submitted on 2021-01-28 21:30:40
Question: I have several dictionaries that I want to convert to a pandas DataFrame. However, because of the key '0', which I don't need, I get an undesirable DataFrame format when I convert these dicts. These dicts are only a short part of the whole data.
    dict1 = {1: {0: [-0.022, -0.017]}, 2: {0: [0.269, 0.271]}, 3: {0: [0.118, 0.119]}, 4: {0: [0.057, 0.061]}, 5: {0: [-0.916, -0.924]}}
    dict2 = {1: {0: [0.384, 0.398]}, 2: {0: [0.485, 0.489]}, 3: {0: [0.465, 0.469]}, 4: {0: [0.456, 0.468]}, 5
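The desired layout is cut off in the excerpt, so the following is only a sketch of one plausible reading: drop the inner key `0` and turn each outer key into a row. The column names `first` and `second` are hypothetical placeholders.

```python
import pandas as pd

dict1 = {1: {0: [-0.022, -0.017]}, 2: {0: [0.269, 0.271]}, 3: {0: [0.118, 0.119]},
         4: {0: [0.057, 0.061]}, 5: {0: [-0.916, -0.924]}}

# Drop the inner key 0 so each outer key maps directly to its list.
flat = {outer: inner[0] for outer, inner in dict1.items()}

# orient="index" makes each outer key a row and each list element a column.
# The column names are hypothetical; the question does not name them.
df1 = pd.DataFrame.from_dict(flat, orient="index", columns=["first", "second"])
print(df1)
```

The same comprehension can be applied to each of the other dicts, and the resulting frames combined with `pd.concat` if a single table is wanted.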

Self Joining in R

淺唱寂寞╮ submitted on 2021-01-28 21:06:13
Question: Here is a sample tibble:
    test <- tibble(a = c("dd1","dd2","dd3","dd4","dd5"), name = c("a", "b", "c", "d", "e"), b = c("dd3","dd4","dd1","dd5","dd2"))
I want to add a new column b_name as a self-join to test using:
    dplyr::inner_join(test, test, by = c("a" = "b"))
My table is way too large (2.7M rows with 4 columns) and I get the following error:
    Error: std::bad_alloc
Please advise how to do it right (best practice). My final goal is to get the following structure:
    a    name  b    b_name
    dd1  a     dd3  c

Pandas dataframe - create new column based on simple calculation

情到浓时终转凉″ submitted on 2021-01-28 20:16:50
Question: I want to make a calculation based on 4 columns in a dataframe and apply the result to a new column. The 4 columns I'm interested in are rating_1, time_1, rating_2 and time_2:
       rating_1  time_1  rating_2  time_2  col_x  col_y  etc
    0         1       1         1       1      1      1
If time_1 is greater than time_2, I want rating_1 in the new column; if time_2 is greater, I want rating_2 in the column. What's the simplest way to do this, please?
Answer 1: You can use the numpy.where() method:
    In [241]: x
    Out[241]:
       rating_1  time_1  rating_2  time_2  col_x  col_y
    0        11       1        21       1      1      1
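The answer's code is cut off after its sample frame, so here is a minimal sketch of the `numpy.where()` idea it names. Only the first row matches the answer's data; the other rows and the new column name `rating` are invented for illustration. The question does not say what should happen on a tie, and this sketch falls back to `rating_2` in that case.

```python
import numpy as np
import pandas as pd

# First row taken from the answer's sample; the other rows are made up.
df = pd.DataFrame({
    "rating_1": [11, 12, 13],
    "time_1": [1, 5, 2],
    "rating_2": [21, 22, 23],
    "time_2": [1, 3, 4],
})

# Element-wise choice: rating_1 where time_1 > time_2, otherwise rating_2.
# Ties therefore pick rating_2 (an assumption, not stated in the question).
df["rating"] = np.where(df["time_1"] > df["time_2"], df["rating_1"], df["rating_2"])
print(df)
```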

How to subset rows in specific columns based on minimum values in individual columns in a dataframe using R

感情迁移 submitted on 2021-01-28 20:10:40
Question: We have a data frame that has thousands of rows with multiple columns. A sample data frame is presented below:
    df1 <- data.frame(X = c(7.48, 7.82, 8.15, 8.47, 8.80, 9.20, 9.51, 9.83, 10.13, 10.59, 7.59, 8.06, 8.39, 8.87, 9.26, 9.64, 10.09, 10.48, 10.88, 11.45), Y = c(49.16, 48.78, 48.40, 48.03, 47.65, 47.24, 46.87, 46.51, 46.15, 45.73, 48.70, 48.18, 47.72, 47.20, 46.71, 46.23, 45.72, 45.24, 44.77, 44.23), ID = c("B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1", "B_1_2", "B_1

Find values from a column in a DF at very specific times for every unique date

夙愿已清 submitted on 2021-01-28 20:01:47
Question: I asked this question and got an answer that works for the general case with sequential, non-missing data, but not for my specific case. I have a DF that looks as follows (some cells are empty):
    eventTime MeteredEnergy Demand RunningHoursLamps
    6/7/2018 0:00 67.728 64 1037.82
    6/7/2018 1:00 67.793 64 1038.82
    6/7/2018 2:00 67.857 64 1039.82
    6/7/2018 3:00 67.922 64 1040.82
    6/7/2018 4:00 67.987 64 1041.82
    6/7/2018 5:00 64 1042.82
    6/7/2018 6:00 1043.43
    6/7/2018 23:00 68.288
    6/8/2018 0:00 67.728 64 1037.82
    6/8/2018

Changing pandas dataframe values based on dictionary

强颜欢笑 submitted on 2021-01-28 19:58:06
Question: Is there any way to replace values using a dictionary or mapping? I have a dataframe like this:
    Q14r63: Audi   Q14r2: BMW     Q14r1: VW
    Selected       Not Selected   Not Selected
    Not Selected   Selected       Selected
    Selected       Selected       Not Selected
and I have another dataframe which provides codes for the brands. This df can of course be turned into a dictionary as well.
    Brand  Code
    Audi   63
    BMW    2
    VW     1
Is there any way to get output where the "Selected" values in the main df are replaced with the car brand? Desired Output Q14r63:
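The desired output is truncated, so this is a sketch under two assumptions: each "Selected" cell should become the brand whose code appears in the column name, and "Not Selected" cells stay untouched. The parsing rule (the code is whatever follows the last `r` before the colon) is also an assumption based on the three column names shown.

```python
import pandas as pd

# Main frame and brand table rebuilt from the question's sample.
df = pd.DataFrame({
    "Q14r63: Audi": ["Selected", "Not Selected", "Selected"],
    "Q14r2: BMW": ["Not Selected", "Selected", "Selected"],
    "Q14r1: VW": ["Not Selected", "Selected", "Not Selected"],
})
brands = pd.DataFrame({"Brand": ["Audi", "BMW", "VW"], "Code": [63, 2, 1]})

# Turn the brand table into a plain {code: brand} dictionary.
code_to_brand = {int(code): brand for code, brand in zip(brands["Code"], brands["Brand"])}

result = df.copy()
for column in result.columns:
    # "Q14r63: Audi" -> 63 (assumed naming pattern: code follows the last "r").
    code = int(column.split(":")[0].rsplit("r", 1)[1])
    # Exact-match replacement, so "Not Selected" is left as it is.
    result[column] = result[column].replace({"Selected": code_to_brand[code]})

print(result)
```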

How can I summarize several pandas dataframe columns into a parent column name?

二次信任 submitted on 2021-01-28 19:41:43
Question: I have a dataframe which looks like this:
            some feature  another feature  label
    sample
    0       ...           ...              ...
and I'd like to get a dataframe with multi-indexed columns like this:
            features          label
    sample  some     another
    0       ...      ...      ...
From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution shall not depend on string parsing of the feature columns (some feature, another feature). The last column for the label is the last column and it's
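One way to read the requirement is positional: every column except the last goes under a `features` parent, and the last column is the label. The sketch below uses `pd.MultiIndex.from_tuples()` on that assumption. Because string parsing is ruled out, the second level keeps the original column names rather than shortening them to `some` and `another`. The sample values are placeholders.

```python
import pandas as pd

# Placeholder data in the shape described by the question.
df = pd.DataFrame(
    {"some feature": [0.1, 0.2], "another feature": [1.0, 2.0], "label": [0, 1]},
    index=pd.Index([0, 1], name="sample"),
)

# Assumption: all columns but the last are features, the last is the label.
feature_columns = list(df.columns[:-1])
label_column = df.columns[-1]

# One (parent, child) tuple per column; the label gets an empty second level.
tuples = [("features", name) for name in feature_columns] + [(label_column, "")]
df.columns = pd.MultiIndex.from_tuples(tuples)

print(df)
print(df["features"])  # selects all feature columns at once
```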

Python Pandas Groupby/Append columns

て烟熏妆下的殇ゞ submitted on 2021-01-28 19:30:43
Question: This is my example dataframe:
    Index  Param1  Param2
    A      1       2
    A      3       4
    B      1       3
    B      4       Nan
    C      2       4
What I would like to get is:
    Index  Param1  Param2  Param3  Param4
    A      1       2       3       4
    B      1       3       4
    C      2       4
What would be the best way to achieve this using pandas? Thanks in advance for your help.
Answer 1: You can use groupby with unstack:
    def f(x):
        return pd.DataFrame(np.sort(x.values.ravel()))
    df = df.groupby('Index')[['Param1', 'Param2']].apply(f).unstack()
    df.columns = df.columns.droplevel(0)
    print(df)
           0  1  2  3
    Index
    A      1  2  3  4
    B      1  3  4
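As a self-contained variant of the same idea, the sketch below builds one sorted row of values per group and assembles the rows directly, instead of going through `apply()` and `unstack()`. It is not the answer's exact method. The sample data is taken from the question; naming the output columns `Param1` to `Param4` follows the desired output shown there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Index": ["A", "A", "B", "B", "C"],
    "Param1": [1, 3, 1, 4, 2],
    "Param2": [2, 4, 3, np.nan, 4],
})

# For each group, flatten both parameter columns and sort the values.
# np.sort places NaN last, so missing values end up in the rightmost columns.
rows = {
    key: pd.Series(np.sort(group[["Param1", "Param2"]].to_numpy().ravel()))
    for key, group in df.groupby("Index")
}

# Dict keys become columns, so transpose to get one row per group.
wide = pd.DataFrame(rows).T
wide.columns = [f"Param{i + 1}" for i in wide.columns]
wide.index.name = "Index"
print(wide)
```

Groups with fewer values than the largest group are padded with NaN on the right.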

How to remove duplicate comma separated character values from each cell of a column using R

大憨熊 submitted on 2021-01-28 19:16:37
Question: I have a data frame with 2 columns, ID and Product, as below:
    ID  Product
    A   Clothing, Clothing Food, Furniture, Furniture
    B   Food,Food,Food, Clothing
    C   Food, Clothing, Clothing
I need to have only unique products for each ID, for example:
    ID  Product
    A   Clothing, Food, Furniture
    B   Food, Clothing
    C   Food, Clothing
How do I do this using R?
Answer 1: If there are multiple delimiters in the dataset, one way would be to split the 'Product' column using all the delimiters, get the unique and then paste it