pandas | 易学教程

Select rows from a DataFrame based on string values in a column in pandas

Question: How do I select rows from a DataFrame based on string values in a column in pandas? I want to display only the state rows, which are written in all caps. Each state row holds the totals for its cities.

    import pandas as pd
    import matplotlib.pyplot as plt
    %pylab inline

    df = pd.read_csv("states.csv")
    print(df)
    #    States/cities  B  C  D
    # 0             FL  3  5  6
    # 1        Orlando  1  2  3
    # 2          Miami  1  1  3
    # 3   Jacksonville  1  2  0
    # 4             CA  8  3  2
    # 5      San diego  3  1  0
    # 6  San Francisco  5  2  2
    # 7             WA  4  2  1
    # 8        Seattle  3  1  0
    # 9
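One way to approach the question above is a boolean mask built with `Series.str.isupper()`. This is a minimal sketch, not the accepted answer: the DataFrame is rebuilt by hand from the rows visible in the excerpt (the real data comes from `states.csv`), and it assumes that every state code is fully uppercase while every city name contains at least one lowercase letter.

```python
import pandas as pd

# Sample data copied from the rows visible in the question (row 9 is cut off there).
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "Miami", "Jacksonville", "CA",
                      "San diego", "San Francisco", "WA", "Seattle"],
    "B": [3, 1, 1, 1, 8, 3, 5, 4, 3],
    "C": [5, 2, 1, 2, 3, 1, 2, 2, 1],
    "D": [6, 3, 3, 0, 2, 0, 2, 1, 0],
})

# str.isupper() is True only when all cased characters in the string are uppercase,
# so it keeps "FL", "CA", "WA" and drops the mixed-case city names.
states = df[df["States/cities"].str.isupper()]
print(states)
```

If the column could contain missing values, the mask would need handling for them first, for example with `.fillna(False)` on the mask.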

How to use str.replace to replace multiple pairs at once? [duplicate]

Question: Currently I am using the following code to make replacements, which is a little cumbersome:

    df1['CompanyA'] = df1['CompanyA'].str.replace('.','')
    df1['CompanyA'] = df1['CompanyA'].str.replace('-','')
    df1['CompanyA'] = df1['CompanyA'].str.replace(',','')
    df1['CompanyA'] = df1['CompanyA'].str.replace(
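The repeated calls above could be collapsed into a single regular-expression replacement. The sketch below is one possible approach, not the linked answers' solution: it covers only the three characters visible in the excerpt (the fourth call is cut off), and the company names are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample values; the question's real data is not shown.
df1 = pd.DataFrame({"CompanyA": ["Acme, Inc.", "Foo-Bar Ltd.", "Plain"]})

# A character class matches any one of '.', ',' or '-'.
# regex=True is passed explicitly because the default has differed
# between pandas versions.
df1["CompanyA"] = df1["CompanyA"].str.replace(r"[.,\-]", "", regex=True)
print(df1)
```

Passing `regex` explicitly also sidesteps a subtle trap in the original code: when `'.'` is treated as a regular expression it matches any character, not just a literal period.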

Extract date from timestamps of multiple time zones in Pandas

Question: I have a pandas DataFrame in which I've converted hour to local_hour based on the time_zone column. I now want to extract the date from local_hour as local_date, but I get an error saying "Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True". How can I do this?

    # Create dataframe
    import pandas as pd
    df = pd.DataFrame({
        'hour': ['2019-01-01 05:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00'],
        'time_zone': ['US/Eastern', 'US/Central', 'US/Mountain']
    })
    # Convert hour
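The excerpt stops before the conversion code, so the following is only a sketch of one way to get a per-row local date. It assumes the `hour` strings are UTC (the question does not say) and converts each timestamp individually, which avoids asking pandas to hold several time zones in a single datetime64 column.

```python
import pandas as pd

df = pd.DataFrame({
    'hour': ['2019-01-01 05:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00'],
    'time_zone': ['US/Eastern', 'US/Central', 'US/Mountain'],
})

# Assumption: the 'hour' strings are UTC wall-clock times.
df['hour'] = pd.to_datetime(df['hour'], utc=True)

# Convert each timestamp to its own row's zone; the result is a plain
# Python list of tz-aware Timestamps, one zone per element.
local_hours = [ts.tz_convert(tz) for ts, tz in zip(df['hour'], df['time_zone'])]

# Take the calendar date from each local timestamp (datetime.date objects).
df['local_date'] = [ts.date() for ts in local_hours]
print(df)
```

The `local_date` column ends up with object dtype, which is the price of keeping each row in its own zone.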

Convert csv file to pandas dataframe

Question: I have a CSV file in the following format:

    DATES, 01-12-2010, 01-12-2010, 01-12-2010, 02-12-2010, 02-12-2010, 02-12-2010
    UNITS, Hz, kV, MW, Hz, kV, MW
    Interval, , , , , ,
    00:15, 49.82, 33.73755, 34.65, 49.92, 33.9009, 36.33,
    00:30, 49.9, 33.7722, 35.34, 49.89, 33.8382, 37.65,
    00:45, 49.94, 33.8316, 33.5, 50.09, 34.07745, 37.41,
    01:00, 49.86, 33.94875, 30.91, 50.18, 34.20945, 36.11,
    01:15, 49.97, 34.2243, 27.28, 50.11, 34.3596, 33.24,
    01:30, 50.02, 34.3332, 26.91, 50.12, 34.452, 31.03,
    01:45,
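The question is cut off before it states the desired result, so the sketch below shows just one plausible reading: a DataFrame indexed by interval with (date, unit) column pairs. It assumes the file really has the layout shown, including the trailing comma on data rows, and uses an in-memory copy of the first two data rows in place of the actual file.

```python
import io
import pandas as pd

# In-memory stand-in for the file, using rows visible in the question.
raw = """DATES, 01-12-2010, 01-12-2010, 01-12-2010, 02-12-2010, 02-12-2010, 02-12-2010
UNITS, Hz, kV, MW, Hz, kV, MW
Interval, , , , , ,
00:15, 49.82, 33.73755, 34.65, 49.92, 33.9009, 36.33,
00:30, 49.9, 33.7722, 35.34, 49.89, 33.8382, 37.65,
"""

# Read everything as raw cells. Eight column names are supplied because the
# data rows end with a trailing comma and so carry one extra empty field.
table = pd.read_csv(io.StringIO(raw), header=None,
                    names=list(range(8)), skipinitialspace=True)

dates = table.iloc[0, 1:7].tolist()   # first header row
units = table.iloc[1, 1:7].tolist()   # second header row

# Rows 0-2 are headers; keep the interval column plus the six value columns.
data = table.iloc[3:, 0:7].set_index(0)
data.index.name = "Interval"
data.columns = pd.MultiIndex.from_arrays([dates, units], names=["DATES", "UNITS"])
data = data.astype(float)
print(data)
```

With this layout, a single day can be selected with `data["01-12-2010"]`, and one measurement with `data[("01-12-2010", "Hz")]`.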

Using pandas to identify nearest objects

Question: I have an assignment that can be done in any programming language. I chose Python and pandas because I have little experience with them and thought it would be a good learning experience. I was able to complete the assignment using the traditional loops I know from conventional programming, and it ran fine over thousands of rows, but it brought my laptop to a screeching halt once I let it process millions of rows. The assignment is outlined below. You have a two-lane road on

How to groupby two columns and calculate the summation of rows using Pandas?

Question: I have a pandas DataFrame df like:

    Name  Hour  Activity
    A     4     TT
    A     3     TT
    A     5     UU
    B     1     TT
    C     1     TT
    D     1     TT
    D     2     TT
    D     3     UU
    D     4     UU

The next step is to sum the Hour values of rows that share the same Name and Activity. For example, Name: A and Activity: TT gives a sum of 7. The result should look like this:

       TT  UU
    A   7   5
    B   1   0
    C   1   0
    D   3   7

Is it possible to do something like this using pandas groupby?

Answer 1: Try groupby.sum and unstack

    df_final = df.groupby(['Name',
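The answer's code is cut off, so the following is a sketch of how the groupby, sum and unstack hint could be applied, not a reconstruction of the original line. The DataFrame is rebuilt from the table in the question, and `totals` is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name":     ["A", "A", "A", "B", "C", "D", "D", "D", "D"],
    "Hour":     [4, 3, 5, 1, 1, 1, 2, 3, 4],
    "Activity": ["TT", "TT", "UU", "TT", "TT", "TT", "TT", "UU", "UU"],
})

# Sum Hour per (Name, Activity) pair, then pivot Activity into columns.
# fill_value=0 supplies the zeros for pairs that never occur (e.g. B/UU).
totals = df.groupby(["Name", "Activity"])["Hour"].sum().unstack(fill_value=0)
print(totals)
```

`pd.pivot_table(df, index="Name", columns="Activity", values="Hour", aggfunc="sum", fill_value=0)` is an alternative that produces the same shape.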

Pandas replace zero as the nearest average non-zero value

Question: I have a dataframe:

    df = pd.DataFrame({'A':[0,0,15,0,0,12,0,0,0,5]})

I want to replace each 0 with the nearest non-zero value. For example, the first value is 0 and its nearest non-zero value is 15, so I replace it with 15, and the data becomes [15,0,15,0,0,12,0,0,0,5]. Then, for every value except the first one, I need to find the nearest non-zero value on each side and average the two. So the second 0 would be (15+15)/2, and the third zero would be (15+12)/2. I
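The rule described above can be sketched with a forward fill and a backward fill. The question is cut off, so this rests on an assumption: each zero becomes the mean of the closest non-zero value before it and the closest one after it, and a zero with a non-zero neighbour on only one side takes that value. The column name `A_filled` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 15, 0, 0, 12, 0, 0, 0, 5]})

# Hide the zeros so that fill operations treat them as gaps.
s = df['A'].mask(df['A'] == 0)

prev_nonzero = s.ffill()   # closest non-zero value at or before each row
next_nonzero = s.bfill()   # closest non-zero value at or after each row

# Average both sides; non-zero rows are unchanged because both fills agree.
filled = (prev_nonzero + next_nonzero) / 2

# Leading or trailing zeros have only one neighbour, so use that side alone.
filled = filled.fillna(next_nonzero).fillna(prev_nonzero)

df['A_filled'] = filled
print(df)
```

For the leading zeros this matches the worked example in the question: both become 15, and the zeros between 15 and 12 become 13.5.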