In the following, male_trips is a big pandas data frame and stations is a small pandas data frame. For each station id I\'d like to know how many male trips took place. The
edit: after seeing in the answer above that isin
and value_counts
exist (and value_counts
even comes with its own entry in pandas.core.algorithm
and also isin
isn't simply np.in1d
) I updated the three methods below
male_trips.start_station_id[male_trips.start_station_id.isin(station.id)].value_counts()
You could also do an inner join on stations.id:
pd.merge(male_trips, station, left_on='start_station_id', right_on='id')
followed by value_counts
.
Or:
male_trips.set_index('start_station_id, inplace=True)
station.set_index('id, inplace=True)
male_trips.ix[male_trips.index.intersection(station.index)].reset_index().start_station_id.value_counts()
If you have the time I'd be interested how this performs differently with a huge DataFrame.