In the following, male_trips is a big pandas DataFrame and stations is a small pandas DataFrame. For each station id I'd like to know how many male trips took place.
I'd do it like Vishal, but use size() instead of sum() to get a count of the number of rows allocated to each group of 'start_station_id'. So:
df = male_trips.groupby('start_station_id').size()
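For example, a minimal sketch with made-up data (only the column names start_station_id and id come from the question; the sample values and the final reindex step are my own additions):
import pandas as pd

# toy stand-ins for the real frames; the values are invented
male_trips = pd.DataFrame({'start_station_id': [1, 1, 2, 3, 3, 3]})
stations = pd.DataFrame({'id': [1, 2, 3, 4]})

# number of rows per start_station_id
counts = male_trips.groupby('start_station_id').size()

# align with every station id, so stations with no male trips show up as 0
counts_per_station = counts.reindex(stations['id'], fill_value=0)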
Doesn't male_trips.count() work? http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html
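As far as I can tell, count() on its own won't do it: DataFrame.count() returns the number of non-NA values per column, not a per-station tally, so it still needs a groupby in front of it. A quick illustration (frame contents assumed, as in the toy data above):
# per-column non-NA counts for the whole frame, not a per-station tally
male_trips.count()

# per-station counts; equals groupby(...).size() when there are no NAs
male_trips.groupby('start_station_id')['start_station_id'].count()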
How long would this take:
df = male_trips.groupby('start_station_id').sum()
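One way to get a rough answer is to time it yourself on synthetic data of a similar shape (the sizes and the dummy column below are assumptions, not from the question):
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame({
    'start_station_id': rng.integers(0, 300, size=1_000_000),
    'dummy': 1,  # placeholder column so .sum() has something to add up
})

print(timeit.timeit(lambda: big.groupby('start_station_id')['dummy'].sum(), number=10))
print(timeit.timeit(lambda: big.groupby('start_station_id').size(), number=10))
print(timeit.timeit(lambda: big['start_station_id'].value_counts(), number=10))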
edit: after seeing in the answer above that isin and value_counts exist (and that value_counts even comes with its own entry in pandas.core.algorithm, and also that isin isn't simply np.in1d), I updated the three methods below; a self-contained toy run of all three follows after them.
male_trips.start_station_id[male_trips.start_station_id.isin(stations.id)].value_counts()
You could also do an inner join on stations.id:
pd.merge(male_trips, stations, left_on='start_station_id', right_on='id')
followed by value_counts.
Or:
male_trips.set_index('start_station_id', inplace=True)
stations.set_index('id', inplace=True)
male_trips.ix[male_trips.index.intersection(stations.index)].reset_index().start_station_id.value_counts()
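For reference, here is the promised self-contained toy run of all three methods (the data is invented, and I use .loc where the last method above uses .ix, since .ix has been removed from newer pandas):
import pandas as pd

# invented stand-ins for the real frames
male_trips = pd.DataFrame({
    'start_station_id': [1, 1, 2, 3, 3, 3, 9],  # station 9 is not in stations
    'duration': [5, 7, 3, 9, 2, 4, 6],
})
stations = pd.DataFrame({'id': [1, 2, 3, 4]})

# method 1: filter with isin, then count
m1 = male_trips.start_station_id[male_trips.start_station_id.isin(stations.id)].value_counts()

# method 2: inner join on the station id, then count
m2 = pd.merge(male_trips, stations, left_on='start_station_id', right_on='id').start_station_id.value_counts()

# method 3: index both frames by station id, intersect the indexes, then count
mt = male_trips.set_index('start_station_id')
st = stations.set_index('id')
m3 = mt.loc[mt.index.intersection(st.index)].reset_index().start_station_id.value_counts()

# all three drop station 9 and agree on the counts for stations 1, 2 and 3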
If you have the time, I'd be interested in how differently these perform with a huge DataFrame.
My answer below works in Pandas 0.7.3. Not sure about newer releases.
This is what the pandas.Series.value_counts method is for:
count_series = male_trips.start_station_id.value_counts()
It should be straightforward to then inspect count_series based on the values in stations['id']. However, if you insist on only considering those values, you could do the following:
count_series = (
    male_trips[male_trips.start_station_id.isin(stations.id.values)]
    .start_station_id
    .value_counts()
)
and this will only give counts for station IDs actually found in stations.id.
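As a follow-up to inspecting count_series against stations['id'], one option is a reindex (this line is my addition, not part of the answer above):
# line the counts up with every station id; stations with no male trips get 0
count_series.reindex(stations['id'], fill_value=0)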