pandas | 易学教程

How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?

Question: I have a pandas DataFrame as follows:

    import pandas as pd

    data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
             "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
             "column3": [5, 1, 8, 3, 731, 189, 9],
             "columnA": [5, 0, 75, 150, 0, 0, 0],
             "columnB": [0, 32, 0, 96, 0, 51, 0],
             "columnC": [0, 42, 0, 42, 0, 42, 42]}
    df = pd.DataFrame(data1)
    df
    >>>
      column1  column2  column3  columnA  columnB  columnC
    0       A      338        5        5        0        0
    1       B      519        1        0       32       42
    2       C      871        8       75        0        0
    3       D     1731        3      150       96       42
    4       E     2693      731        0        0        0
    5       F
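
The excerpt is cut off before the desired output is shown, so the exact rule is unknown. As a minimal sketch, assuming the goal is to flag each row in which any of columnA, columnB, or columnC holds a non-zero value (the new column name `has_value` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["A", "B", "C", "D", "E", "F", "G"],
    "columnA": [5, 0, 75, 150, 0, 0, 0],
    "columnB": [0, 32, 0, 96, 0, 51, 0],
    "columnC": [0, 42, 0, 42, 0, 42, 42],
})

# Columns to inspect; "non-zero means a value exists" is an assumption.
subset = ["columnA", "columnB", "columnC"]

# ne(0) marks non-zero cells; any(axis=1) reduces across the subset, row by row.
df["has_value"] = df[subset].ne(0).any(axis=1)
```

If "existence" means non-missing rather than non-zero, `df[subset].notna().any(axis=1)` would be the analogous expression.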

Why doesn't iLocation based boolean indexing work?

Question: I was trying to filter a DataFrame and thought that if loc takes a boolean list as a filter, the same should work for iloc. For example:

    import pandas as pd

    df = pd.read_csv('https://query.data.world/s/jldxidygjltewualzthzkaxtdrkdvq')
    df.iloc[[True,False,True]]  # works
    df.loc[[True,False,True]]  # works
    df.loc[df['PointsPerGame'] > 10.0]  # works
    df.iloc[df['PointsPerGame'] > 10.0]  # DOES NOT WORK

The documentation states that both loc and iloc accept a boolean array as an argument.
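
One commonly used workaround can be sketched as follows. iloc is purely positional, so it accepts a plain boolean array but rejects a boolean Series, which carries index labels. Converting the mask to a NumPy array first sidesteps this. The small frame below is hypothetical stand-in data, not the CSV from the question:

```python
import pandas as pd

# Hypothetical stand-in for the question's CSV; only the column name is taken from it.
df = pd.DataFrame(
    {"PointsPerGame": [12.5, 8.0, 15.1]},
    index=["a", "b", "c"],
)

mask = df["PointsPerGame"] > 10.0   # boolean Series, labelled by the index

by_label = df.loc[mask]                 # loc aligns the Series on index labels
by_position = df.iloc[mask.to_numpy()]  # iloc needs a plain boolean array
```

Both selections return the same rows here; `mask.values` is an older spelling of the same conversion.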

using isin() for a column that has list values

how do I update pandas library using conda in mac os?

Question: I installed Anaconda on my Mac and conda list shows pandas version 1.0.5. I want to upgrade pandas. How can I do it using conda commands? I tried conda update pandas, but it shows me this:

    conda update pandas
    Collecting package metadata (current_repodata.json): done
    Solving environment: |
    Updating pandas is constricted by
    anaconda -> requires pandas==1.0.5=py38h959d312_0
    If you are sure you want an update of your package either try `conda update --all` or install a

Filter Folium Map based on marker color

Question: I am mapping markers that have a row called "marker_color" indicating "red", "yellow", and "green" based on other column values. How can I add a filter option to the corner of my map that lets me show one, two, all, or none of the markers based on color? Basically, three clickable radio options to render the three colored markers. Currently, I am mapping all markers from my sales_colored dataframe like so:

    basemap2 = generateBaseMap()
    for index, row in sales_colored.iterrows():

Repeat sections of dataframe based on a column value

Question: I'm collecting data over the course of many days and, rather than filling it in for every day, I can elect to say that the data from one day should really be a repeat of another day. I'd like to repeat some of the rows from my existing data frame into the days specified as repeats. I have a column that indicates which day the current day should repeat from, but I am getting stuck with errors. I have found ways to repeat rows n times based on a column value, but I am trying to use a column as an

Check multiple columns data format and append results to one column in Pandas

Question: Given a toy dataset as follows:

       id    room   area           situation
    0   1   A-102  world  under construction
    1   2     NaN     24  under construction
    2   3    B309    NaN                 NaN
    3   4   C·102     25    under decoration
    4   5  E_1089  hello    under decoration
    5   6      27    NaN          under plan
    6   7      27    NaN                 NaN

I need to check three columns (room, area, situation) based on the following conditions: (1) if the room name contains anything other than digits, letters, or - (NaNs are also considered invalid), return incorrect room name in the check column; (2) if area is not number or

Pandas column multi-index to rows

Question: I'm using yfinance to download price history for multiple symbols, which returns a DataFrame with multi-index columns. For example:

    import yfinance as yf
    df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d')

A similar DataFrame could be constructed without yfinance like:

    import pandas as pd
    pd.options.display.float_format = '{:.2f}'.format
    import numpy as np
    attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
    symbols = ['AAPL', 'MSFT']
    dates = ['2020-07-23', '2020-07-24']
    data =
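
The excerpt ends before the desired layout is shown. As a rough sketch, assuming the aim is one row per (date, symbol) pair with the attributes as ordinary columns, the symbol level of the column MultiIndex can be moved into the rows with stack. The level names "Attribute", "Symbol", and "Date", the reduced attribute list, and the numeric values are all placeholders, not data from the question:

```python
import numpy as np
import pandas as pd

# Placeholder frame shaped like the question's: (attribute, symbol) column pairs.
attributes = ["Close", "Volume"]
symbols = ["AAPL", "MSFT"]
dates = pd.to_datetime(["2020-07-23", "2020-07-24"])

columns = pd.MultiIndex.from_product(
    [attributes, symbols], names=["Attribute", "Symbol"]
)
data = np.arange(8, dtype=float).reshape(2, 4)  # made-up values
df = pd.DataFrame(data, index=pd.Index(dates, name="Date"), columns=columns)

# Move the "Symbol" column level into the row index, then flatten the index
# into regular columns: Date, Symbol, Close, Volume.
long_df = df.stack(level="Symbol").reset_index()
```

Some pandas releases emit a FutureWarning about a newer stack implementation; the call itself is the same either way.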

Difference between two dates in Pandas DataFrame

Question: I have many columns in a data frame, and I need to find the time difference between two columns named in_time and out_time and put it in a new column in the same data frame. The time format looks like this: 2015-09-25T01:45:34.372Z. I am using a pandas DataFrame. I want to do something like this: df.days = df.out_time - df.in_time. I have many columns and need to add one more, named days, and put the differences there.

Answer 1: You need to convert the strings to datetime dtype; you can then
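
The answer is truncated, but the step it names (convert the strings to datetime dtype, then subtract) might look like the sketch below. The first in_time value is the sample from the question; the other timestamps and the helper column name `duration` are invented for illustration:

```python
import pandas as pd

# Only the first in_time value comes from the question; the rest are made up.
df = pd.DataFrame({
    "in_time": ["2015-09-25T01:45:34.372Z", "2015-09-26T10:00:00.000Z"],
    "out_time": ["2015-09-27T01:45:34.372Z", "2015-09-26T22:00:00.000Z"],
})

# Parse the ISO 8601 strings into datetime values.
df["in_time"] = pd.to_datetime(df["in_time"])
df["out_time"] = pd.to_datetime(df["out_time"])

# Subtracting two datetime columns yields a timedelta column.
df["duration"] = df["out_time"] - df["in_time"]

# Whole days only; use df["duration"].dt.total_seconds() for finer resolution.
df["days"] = df["duration"].dt.days
```

Note the bracket assignment `df["days"] = ...`: attribute-style assignment such as `df.days = ...` does not create a new column.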
