Question
I need some help.
Let's say I have the dataframe below, called venues_df.
I also have this function, return_most_common_venues:
def return_most_common_venues(row, num_top_venues):
    # Selects the row values, skipping the first column (presumably a label such as the neighborhood name)
    row_values = row.iloc[1:]
    # Sorts the selected row values, largest first
    row_values_sorted = row_values.sort_values(ascending=False)
    # Returns the column names of the first num_top_venues sorted values
    return row_values_sorted.index.values[0:num_top_venues]
If I apply my function to the first row:
return_most_common_venues(venues_df.iloc[0, :], 4)
The result is an array (the values below are for illustration purposes):
array(['Bar', 'Restaurant', 'Park', 'Gym'])
The problem appears when I apply my function to the second row:
return_most_common_venues(venues_df.iloc[1, :], 4)
I get:
array(['Park', 'Restaurant', 'Gym', 'SuperMarket'])
What I need is for it to return:
array(['Park', 'Restaurant', 'Not Available', 'Not Available'])
If a value is zero, I need it to return 'Not Available' instead of the column name (here 'Gym' and 'SuperMarket').
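To see why zero-count columns show up, here is a minimal reproduction. It is a sketch built on assumptions: this venues_df is hypothetical, its counts are borrowed from the example DataFrame in Answer 2, and the leading "Neighborhood" label column is assumed because the function skips the first column with row.iloc[1:].

```python
import pandas as pd

# Hypothetical venues_df: counts taken from Answer 2's example, plus an
# assumed "Neighborhood" label column that row.iloc[1:] is meant to skip.
venues_df = pd.DataFrame(
    {
        "Neighborhood": ["A", "B"],
        "Restaurant": [7, 5],
        "Gym": [4, 0],
        "Supermarket": [1, 0],
        "Park": [5, 8],
        "Bar": [9, 0],
        "Café": [3, 0],
    }
)


def return_most_common_venues(row, num_top_venues):
    # Same logic as the function in the question: skip the label column
    # and sort the venue counts, largest first
    row_values_sorted = row.iloc[1:].sort_values(ascending=False)
    # Column names of the top values, whether or not their count is zero
    return row_values_sorted.index.values[0:num_top_venues]


print(return_most_common_venues(venues_df.iloc[0, :], 4))
# Only two venues in row 1 have a positive count, so the last two names
# come from zero-count columns
print(return_most_common_venues(venues_df.iloc[1, :], 4))
```

The function only ranks the values and never checks them, so whenever fewer than four venues have a positive count, the remaining slots are filled with zero-count column names. Which of those names appear depends on how the sort orders ties.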
How can I modify my function to return what I need?
Thank you for your help!
Efren
Answer 1:
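import numpy as np

def return_most_common_venues(df, row, cols):
    # Selects the row values by index label
    row_values = df.loc[row]
    # Positions of the `cols` largest values, largest first (negating the
    # values gives a descending order; NaN still sorts last)
    top_positions = np.argsort(-row_values.to_numpy())[:cols]
    row_values_sorted = row_values.iloc[top_positions]
    # Returns the column name, or "Not Available" where the value is 0 or NaN
    # (NaN > 0 is False, so this check also covers NaN)
    return [index if value > 0 else "Not Available"
            for index, value in zip(row_values_sorted.index, row_values_sorted.values)]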
return_most_common_venues(df, row=1, cols=4)
Output:
['Park', 'Restaurant', 'Not Available', 'Not Available']
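The same idea can also be written with the question's original (row, num_top_venues) signature. This is a sketch, not part of the answer above: it assumes, as row.iloc[1:] in the question implies, that the first column of venues_df is a label column, and the small venues_df with its "Neighborhood" column is hypothetical.

```python
import numpy as np
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first column (assumed to be a label such as the neighborhood
    # name) and convert the counts to floats so NaN is handled uniformly
    row_values = row.iloc[1:].astype(float)
    # Keep the num_top_venues largest counts, largest first (NaN sorts last)
    top = row_values.sort_values(ascending=False).iloc[:num_top_venues]
    # Keep the column name where the count is positive; zero or NaN
    # (NaN > 0 is False) becomes "Not Available"
    return np.where(top.to_numpy() > 0, top.index.to_numpy(), "Not Available")


# Hypothetical data shaped like the question's venues_df
venues_df = pd.DataFrame(
    {"Neighborhood": ["A", "B"], "Restaurant": [7, 5], "Gym": [4, 0],
     "Supermarket": [1, 0], "Park": [5, 8], "Bar": [9, 0], "Café": [3, 0]}
)
print(return_most_common_venues(venues_df.iloc[1, :], 4))
# Park, Restaurant, Not Available, Not Available
```

Using np.where keeps the return type an array, as in the question's original function, instead of a list.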
Answer 2:
I suggest the following based on this question:
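import pandas as pd

def return_most_common_venues(row, nb_return_values=4):
    # Selects the row values (skips the first column, meant to hold a label
    # such as the neighborhood name)
    row_values = row.iloc[1:]
    # Keeps only the venues with a count above 0
    row_values = row_values[row_values > 0]
    # Sorts the selected row values, largest first
    row_values_sorted = row_values.sort_values(ascending=False)
    # Returns the names of the first nb_return_values sorted values, padded
    # with 'Not available' when fewer than nb_return_values are above 0
    output = list(row_values_sorted.index.values[0:nb_return_values]) \
        + ['Not available'] * (nb_return_values - len(row_values_sorted.index))
    return output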
df = pd.DataFrame([[7, 4, 1, 5, 9, 3], [5, 0, 0, 8, 0, 0]],
                  columns=["Restaurant", "Gym", "Supermarket", "Park", "Bar", "Café"],
                  index=[0, 1])
return_most_common_venues(df.iloc[1, :], 4)
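And the result is (note that 'Restaurant' is missing because row.iloc[1:] skips the first column; in the asker's venues_df that column presumably holds the neighborhood name, not a venue count):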
['Park', 'Not available', 'Not available', 'Not available']
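Both answers work on one row at a time. To build a table for every row at once, a sketch like the following combines the zero check from Answer 1 with the padding from Answer 2 and applies it with DataFrame.apply(axis=1). The helper name most_common_venues, the "Venue 1"… column names, and the "Neighborhood" label column are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical data shaped like the question's venues_df: a label column
# ("Neighborhood", an assumption) followed by venue counts
venues_df = pd.DataFrame(
    {"Neighborhood": ["A", "B"], "Restaurant": [7, 5], "Gym": [4, 0],
     "Supermarket": [1, 0], "Park": [5, 8], "Bar": [9, 0], "Café": [3, 0]}
)


def most_common_venues(row, num_top_venues=4):
    # Skip the label column; convert counts to floats so NaN compares cleanly
    counts = row.iloc[1:].astype(float)
    top = counts.sort_values(ascending=False).iloc[:num_top_venues]
    # Zero check as in Answer 1: zero or NaN counts become "Not Available"
    names = [name if value > 0 else "Not Available" for name, value in top.items()]
    # Padding as in Answer 2, for rows with fewer venue columns than requested
    names += ["Not Available"] * (num_top_venues - len(names))
    # Returning a Series makes apply() produce one column per entry
    return pd.Series(names, index=[f"Venue {i + 1}" for i in range(num_top_venues)])


summary = venues_df.apply(most_common_venues, axis=1)
summary.insert(0, "Neighborhood", venues_df["Neighborhood"])
print(summary)
```

Returning a pd.Series from the row function, rather than a list or array, lets apply assemble the results into a DataFrame with named columns.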
Source: https://stackoverflow.com/questions/61420721/function-that-will-go-though-a-column-if-the-number-is-above-0-return-column-na