pandas

How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?

风流意气都作罢 submitted on 2021-02-11 08:47:26
Question: I have a pandas DataFrame as follows:

    import pandas as pd

    data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
             "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
             "column3": [5, 1, 8, 3, 731, 189, 9],
             "columnA": [5, 0, 75, 150, 0, 0, 0],
             "columnB": [0, 32, 0, 96, 0, 51, 0],
             "columnC": [0, 42, 0, 42, 0, 42, 42]}
    df = pd.DataFrame(data1)
    df

    >>>
      column1  column2  column3  columnA  columnB  columnC
    0       A      338        5        5        0        0
    1       B      519        1        0       32       42
    2       C      871        8       75        0        0
    3       D     1731        3      150       96       42
    4       E     2693      731        0        0        0
    5       F
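
The question text is cut off before the desired output, so the following is only a sketch of one possible reading of the title. It assumes that a value "exists" when it is non-zero and that the subset of interest is columnA, columnB, and columnC; the column name has_value is hypothetical.

```python
import pandas as pd

data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
         "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
         "column3": [5, 1, 8, 3, 731, 189, 9],
         "columnA": [5, 0, 75, 150, 0, 0, 0],
         "columnB": [0, 32, 0, 96, 0, 51, 0],
         "columnC": [0, 42, 0, 42, 0, 42, 42]}
df = pd.DataFrame(data1)

# Assumption: the subset to inspect is these three columns.
subset = ["columnA", "columnB", "columnC"]

# Assumption: a value "exists" when it is non-zero.
# ne(0) marks non-zero cells; any(axis=1) reduces each row to one boolean.
df["has_value"] = df[subset].ne(0).any(axis=1)
```

If a different rule is wanted (for example, all columns non-zero), swapping any for all keeps the same row-wise shape.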

Why doesn't iLocation based boolean indexing work?

时光怂恿深爱的人放手 submitted on 2021-02-11 08:40:53
Question: I was trying to filter a DataFrame and thought that if loc takes a boolean list as input for filtering, the same should also work for iloc. For example:

    import pandas as pd

    df = pd.read_csv('https://query.data.world/s/jldxidygjltewualzthzkaxtdrkdvq')
    df.iloc[[True, False, True]]           # works
    df.loc[[True, False, True]]            # works
    df.loc[df['PointsPerGame'] > 10.0]     # works
    df.iloc[df['PointsPerGame'] > 10.0]    # DOES NOT WORK

The documentation states that both loc and iloc accept a boolean array as an argument.
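
A minimal sketch of a common workaround, not taken from the truncated thread: iloc is purely positional, while a comparison such as df['PointsPerGame'] > 10.0 yields a boolean Series that carries index labels. Converting the mask to a plain NumPy array removes the index, so iloc can use it. The small DataFrame below is a hypothetical stand-in for the CSV in the question.

```python
import pandas as pd

# Hypothetical stand-in data; the real CSV from the question is not loaded here.
df = pd.DataFrame({
    "Player": ["a", "b", "c"],
    "PointsPerGame": [12.5, 8.0, 20.1],
})

# A boolean Series: it has an index, so loc can align it by label.
mask = df["PointsPerGame"] > 10.0
by_label = df.loc[mask]

# iloc works by position only, so hand it a plain boolean array instead.
by_position = df.iloc[mask.to_numpy()]
```

Both selections return the same rows here; the difference is only in how the mask is interpreted.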

using isin() for a column that has list values

这一生的挚爱 submitted on 2021-02-11 08:15:13
Question: I have two dataframes. Dataframe A has a column consisting of lists of ids (named items). Dataframe B has a column of int ids (named id).

Dataframe A:

    date       | items
    2019-06-05 | [121, 123, 124]
    2019-06-06 | [109, 125]
    2019-06-07 | [108, 126]

Dataframe B:

    name  | id
    item1 | 121
    item2 | 122
    item3 | 123
    item4 | 124
    item5 | 125
    item6 | 126

I want to filter Dataframe A and keep only the rows in which all values of items exist in the id column of Dataframe B. Based on
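
One way to express this filter, sketched with the sample data from the question: isin() tests scalar cells, so for list-valued cells a per-row subset test is used here instead. The variable names df_a, df_b, known_ids, and filtered are hypothetical.

```python
import pandas as pd

df_a = pd.DataFrame({
    "date": ["2019-06-05", "2019-06-06", "2019-06-07"],
    "items": [[121, 123, 124], [109, 125], [108, 126]],
})
df_b = pd.DataFrame({
    "name": ["item1", "item2", "item3", "item4", "item5", "item6"],
    "id": [121, 122, 123, 124, 125, 126],
})

# Build the lookup set once so each row check is a cheap subset test.
known_ids = set(df_b["id"].tolist())

# Keep a row only if every id in its list appears in Dataframe B.
keep = df_a["items"].apply(lambda ids: set(ids).issubset(known_ids))
filtered = df_a[keep]
```

With this sample data only the 2019-06-05 row survives, because 109 and 108 are missing from Dataframe B.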

how do I update pandas library using conda in mac os?

本秂侑毒 submitted on 2021-02-11 08:10:37
Question: I installed Anaconda on my Mac and it shows the pandas version as 1.0.5 (using conda list). I want to upgrade my pandas version. How can I do it using conda commands? I tried conda update pandas, but it shows me this:

    conda update pandas
    Collecting package metadata (current_repodata.json): done
    Solving environment: |
    Updating pandas is constricted by
    anaconda -> requires pandas==1.0.5=py38h959d312_0
    If you are sure you want an update of your package either try `conda update --all` or install a

Filter Folium Map based on marker color

拥有回忆 submitted on 2021-02-11 07:57:22
Question: I am mapping markers that have a row called "marker_color" indicating "red", "yellow", and "green" based on other column values. How can I add a filter option to the corner of my map that lets me show one, two, all, or none of the markers based on color? Basically, three clickable radio options to render the three colored markers. Currently, I am mapping all markers from my sales_colored dataframe like so:

    basemap2 = generateBaseMap()
    for index, row in sales_colored.iterrows():

Repeat sections of dataframe based on a column value

纵饮孤独 submitted on 2021-02-11 07:23:01
Question: I'm collecting data over the course of many days, and rather than filling it in for every day, I can elect to say that the data from one day should really be a repeat of another day. I'd like to repeat some of the rows from my existing data frame into the days specified as repeats. I have a column that indicates which day the current day is to repeat from, but I am getting stuck with errors. I have found ways to repeat rows n times based on a column value, but I am trying to use a column as an

Check multiple columns data format and append results to one column in Pandas

风流意气都作罢 submitted on 2021-02-11 07:14:10
Question: Given a toy dataset as follows:

       id    room   area           situation
    0   1   A-102  world  under construction
    1   2     NaN     24  under construction
    2   3    B309    NaN                 NaN
    3   4   C·102     25    under decoration
    4   5  E_1089  hello    under decoration
    5   6      27    NaN          under plan
    6   7      27    NaN                 NaN

I need to check three columns (room, area, situation) against the following conditions: (1) if the room name contains anything other than numbers, letters, or - (NaNs are also considered invalid), then return incorrect room name in the check column; (2) if area is not number or
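
Only condition (1) is fully visible in the truncated question, so the sketch below covers that rule alone. It assumes "alphabet" means ASCII letters and that the value 27 is stored as an integer; the name room_ok is hypothetical, while check and incorrect room name come from the question.

```python
import numpy as np
import pandas as pd

# Only the id and room columns of the toy dataset are needed for condition (1).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", 27, 27],
})

# Assumption: a valid room name consists solely of ASCII letters, digits, and "-".
# notna() is checked first because astype(str) would turn NaN into the text "nan",
# which would otherwise pass the letters-only pattern.
room_ok = df["room"].notna() & df["room"].astype(str).str.match(r"^[A-Za-z0-9-]+$")

# Empty string for valid rows, the message from the question for invalid ones.
df["check"] = np.where(room_ok, "", "incorrect room name")
```

Messages for the remaining conditions could be appended to the same check column once their exact rules are known.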

Pandas column multi-index to rows

女生的网名这么多〃 submitted on 2021-02-11 07:11:54
Question: I'm using yfinance to download price history for multiple symbols, which returns a df with multiple indexes. For example:

    import yfinance as yf
    df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d')

A similar dataframe could be constructed without yfinance like:

    import pandas as pd
    pd.options.display.float_format = '{:.2f}'.format
    import numpy as np
    attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
    symbols = ['AAPL', 'MSFT']
    dates = ['2020-07-23', '2020-07-24']
    data =
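
The desired output is cut off, so this is a sketch under assumptions rather than the thread's answer. It assumes the columns form a two-level MultiIndex of (attribute, symbol) and that the goal is one row per date and symbol. The numeric values are placeholders, and the level names attribute, symbol, and date are hypothetical.

```python
import numpy as np
import pandas as pd

attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
symbols = ['AAPL', 'MSFT']
dates = ['2020-07-23', '2020-07-24']

# Assumption: column levels are (attribute, symbol).
columns = pd.MultiIndex.from_product([attributes, symbols],
                                     names=["attribute", "symbol"])

# Placeholder numbers only; these are not real prices.
values = np.arange(len(dates) * len(columns), dtype=float)
values = values.reshape(len(dates), len(columns))
df = pd.DataFrame(values, index=pd.Index(dates, name="date"), columns=columns)

# Move the symbol level from the columns into the row index.
long_df = df.stack(level="symbol")
```

Calling long_df.reset_index() afterwards would turn date and symbol into ordinary columns if a flat table is preferred.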

Difference between two dates in Pandas DataFrame

℡╲_俬逩灬. submitted on 2021-02-11 07:10:57
Question: I have many columns in a data frame, and I need to find the time difference between two columns named in_time and out_time and put it in a new column in the same data frame. The time format looks like this: 2015-09-25T01:45:34.372Z. I am using a pandas DataFrame. I want to do something like this:

    df.days = df.out_time - df.in_time

I have many columns and need to add one more column named days to hold the differences.

Answer 1: You need to convert the strings to datetime dtype, you can then
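
The answer is truncated after its first step, so what follows is a sketch of how that step might continue, not the original answer. It converts both columns with pd.to_datetime, subtracts them, and keeps the whole-day part. The sample timestamps are made up in the format shown in the question. Note that assigning with df.days = ... sets an attribute rather than creating a column, so bracket assignment is used instead.

```python
import pandas as pd

# Made-up sample rows in the timestamp format from the question.
df = pd.DataFrame({
    "in_time": ["2015-09-25T01:45:34.372Z", "2015-09-20T10:00:00.000Z"],
    "out_time": ["2015-09-27T03:45:34.372Z", "2015-09-25T22:30:00.000Z"],
})

# Convert the ISO 8601 strings to datetime dtype.
df["in_time"] = pd.to_datetime(df["in_time"])
df["out_time"] = pd.to_datetime(df["out_time"])

# Subtracting datetimes gives timedeltas; .dt.days keeps the whole days.
df["days"] = (df["out_time"] - df["in_time"]).dt.days
```

If fractional days are wanted, dividing the timedelta by pd.Timedelta(days=1) is one alternative to .dt.days.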

Pandas column multi-index to rows

◇◆丶佛笑我妖孽 submitted on 2021-02-11 07:10:21
Question: I'm using yfinance to download price history for multiple symbols, which returns a df with multiple indexes. For example:

    import yfinance as yf
    df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d')

A similar dataframe could be constructed without yfinance like:

    import pandas as pd
    pd.options.display.float_format = '{:.2f}'.format
    import numpy as np
    attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
    symbols = ['AAPL', 'MSFT']
    dates = ['2020-07-23', '2020-07-24']
    data =