How can I list all the stops associated with a route using GTFS?

Posted by 一世执手 on 2019-11-28 17:43:07

Question


I'm working with some GTFS data and would like to be able to create a list of all stops served by a route. I don't really understand how to do this with GTFS data.

Trips.txt comes in a format like this:

route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
1,A20120610WKD,A20120610WKD_000800_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_002700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_004700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_006700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_008700_1..S03R,SOUTH FERRY,1,,1..S03R

I tried reading in the matching shape using the shape_id and then looking for stops with matching latitudes and longitudes but that doesn't seem to work reliably. Does anybody know how to do this?


Answer 1:


As you've noticed, there isn't a direct relationship between routes and stops in GTFS. Instead, stops are associated with trips, where each trip represents a single "run" of a vehicle along a particular route. This reflects the fact that a route does not necessarily serve every one of its stops at all times; on weekends it might skip stops outside a high school, for instance.

So getting a list of every stop served by a route involves combining several models:

  • routes.txt gives you the route ID for the route you're interested in.
  • trips.txt gives you a set of trip IDs for that route.
  • stop_times.txt gives you a set of stop IDs for the stops served on each of these trips.
  • stops.txt gives you information about each of these stops.
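
If you'd rather not load the feed into a database first, the same chain of lookups can be sketched with Python's standard csv module. This is only an illustration: the three feed excerpts below are made up, and stops_for_route is a hypothetical helper name, not part of any GTFS library.

```python
import csv
import io

# Tiny made-up feed excerpts; with a real feed you would open the .txt files instead.
TRIPS_TXT = """route_id,service_id,trip_id
1,WKD,t1
1,WKD,t2
2,WKD,t3
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,A,1
t1,08:05:00,08:05:00,B,2
t2,09:00:00,09:00:00,A,1
t2,09:07:00,09:07:00,C,2
t3,10:00:00,10:00:00,D,1
"""

STOPS_TXT = """stop_id,stop_name
A,Alpha St
B,Bravo Ave
C,Charlie Rd
D,Delta Sq
"""


def stops_for_route(route_id, trips_file, stop_times_file, stops_file):
    """Return {stop_id: stop_name} for every stop served by any trip of route_id."""
    # trips.txt: collect the trip IDs that belong to the route.
    trip_ids = {row["trip_id"] for row in csv.DictReader(trips_file)
                if row["route_id"] == route_id}
    # stop_times.txt: collect the stop IDs visited by those trips.
    stop_ids = {row["stop_id"] for row in csv.DictReader(stop_times_file)
                if row["trip_id"] in trip_ids}
    # stops.txt: look up the details of each of those stops.
    return {row["stop_id"]: row["stop_name"]
            for row in csv.DictReader(stops_file)
            if row["stop_id"] in stop_ids}


result = stops_for_route("1", io.StringIO(TRIPS_TXT),
                         io.StringIO(STOP_TIMES_TXT), io.StringIO(STOPS_TXT))
print(sorted(result.items()))
```

With real files you would pass open("trips.txt", newline="", encoding="utf-8-sig") and so on in place of the StringIO objects; the utf-8-sig codec strips a byte-order mark if the feed has one. Note that stop_times.txt is usually by far the largest file, so this single pass over it is where most of the time goes.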

Assuming you're using an SQL database to store your GTFS data, you might use a query like this (once you've obtained the route ID):

SELECT stop_id, stop_name FROM stops WHERE stop_id IN (
  SELECT DISTINCT stop_id FROM stop_times WHERE trip_id IN (
    SELECT trip_id FROM trips WHERE route_id = <route_id>));

Remember, though, this will output a record for every stop that is ever served by the route. If you're generating schedule information for a rider you'll probably want to limit the query to only trips running today and only stop times with departures in, say, the next thirty minutes.
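
As a rough sketch of that kind of filtering, the Python below keeps only trips whose service runs on a given date (according to calendar.txt) and only stop times departing within a time window. It makes some simplifying assumptions: exceptions in calendar_dates.txt are ignored, the rows are plain dicts as csv.DictReader would produce them, and the function names and sample data are invented for illustration.

```python
from datetime import date

WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def gtfs_time_to_seconds(hhmmss):
    """Convert a GTFS time such as '25:10:00' to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def active_service_ids(calendar_rows, on_date):
    """Service IDs whose calendar.txt row covers on_date (no calendar_dates.txt handling)."""
    day_column = WEEKDAY_COLUMNS[on_date.weekday()]  # Monday is 0
    ymd = on_date.strftime("%Y%m%d")                 # GTFS dates are YYYYMMDD strings
    return {row["service_id"] for row in calendar_rows
            if row[day_column] == "1"
            and row["start_date"] <= ymd <= row["end_date"]}


def upcoming_stop_ids(route_id, on_date, now_seconds, window_seconds,
                      calendar_rows, trip_rows, stop_time_rows):
    """Stop IDs with a departure on route_id between now and now + window."""
    services = active_service_ids(calendar_rows, on_date)
    trip_ids = {t["trip_id"] for t in trip_rows
                if t["route_id"] == route_id and t["service_id"] in services}
    latest = now_seconds + window_seconds
    return {st["stop_id"] for st in stop_time_rows
            if st["trip_id"] in trip_ids
            and now_seconds <= gtfs_time_to_seconds(st["departure_time"]) <= latest}


# Made-up sample rows: a weekday service and a Saturday-only service.
days_off = dict.fromkeys(WEEKDAY_COLUMNS, "0")
calendar_rows = [
    {**days_off, "service_id": "WKD", "monday": "1", "tuesday": "1",
     "wednesday": "1", "thursday": "1", "friday": "1",
     "start_date": "20120610", "end_date": "20121231"},
    {**days_off, "service_id": "SAT", "saturday": "1",
     "start_date": "20120610", "end_date": "20121231"},
]
trip_rows = [
    {"trip_id": "t1", "route_id": "1", "service_id": "WKD"},
    {"trip_id": "t2", "route_id": "1", "service_id": "SAT"},
]
stop_time_rows = [
    {"trip_id": "t1", "stop_id": "A", "departure_time": "08:00:00"},
    {"trip_id": "t1", "stop_id": "B", "departure_time": "08:20:00"},
    {"trip_id": "t1", "stop_id": "C", "departure_time": "08:45:00"},
    {"trip_id": "t2", "stop_id": "D", "departure_time": "08:10:00"},
]

# Monday 11 June 2012 at 08:00, looking thirty minutes ahead.
upcoming = upcoming_stop_ids("1", date(2012, 6, 11),
                             gtfs_time_to_seconds("08:00:00"), 30 * 60,
                             calendar_rows, trip_rows, stop_time_rows)
print(sorted(upcoming))
```

Because GTFS times can run past 24:00:00 for trips that continue after midnight, a production version would also need to check the previous service day's trips; that is left out here to keep the sketch short.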


Update: I wrote the above SQL query the way I did as I felt it most simply illustrated the relationship between the GTFS models, but btse is correct (in his answer below) that a query like this would never actually be used in production. It's too slow. You would instead use table joins and indices to keep query times reasonable.

Here is an equivalent query, written in a way more suited to being copied and pasted into a real application:

SELECT DISTINCT stops.stop_id, stops.stop_name
  FROM trips
  INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
  INNER JOIN stops ON stops.stop_id = stop_times.stop_id
  WHERE route_id = <route_id>;

Typically you would also create an index for each column used in a JOIN or WHERE clause, which in this case would mean:

CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);

CREATE INDEX trips_route_id_index ON trips(route_id);

(Note that RDBMSes normally index each table by its primary key automatically, so there is no need to explicitly create an index on stops.stop_id.)

Many further optimizations are possible, depending on the specific DBMS in use and your willingness to sacrifice disk space for performance. But these commands will yield good performance on virtually any RDBMS without needlessly sacrificing clarity.
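
To try the join and the two indices without setting up a database server, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The table layouts are cut down to the columns the query needs and the rows are made up; a real import would carry every column of the feed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

    -- The two indices suggested in this answer.
    CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);
    CREATE INDEX trips_route_id_index ON trips(route_id);

    -- Made-up sample data: route 1 has two trips, route 2 has one.
    INSERT INTO stops VALUES ('A', 'Alpha St'), ('B', 'Bravo Ave'), ('C', 'Charlie Rd');
    INSERT INTO trips VALUES ('t1', '1'), ('t2', '1'), ('t3', '2');
    INSERT INTO stop_times VALUES
        ('t1', 'A', 1), ('t1', 'B', 2),
        ('t2', 'A', 1), ('t2', 'B', 2),
        ('t3', 'C', 1);
""")

# Same join as above, with a bound parameter in place of <route_id>
# and an ORDER BY so the output order is predictable.
QUERY = """
    SELECT DISTINCT stops.stop_id, stops.stop_name
      FROM trips
      INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
      INNER JOIN stops ON stops.stop_id = stop_times.stop_id
      WHERE trips.route_id = ?
      ORDER BY stops.stop_id
"""

rows = conn.execute(QUERY, ("1",)).fetchall()
print(rows)

# EXPLAIN QUERY PLAN is SQLite-specific; its last column describes each step,
# which lets you check whether the indices are actually being used.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("1",)):
    print(step[-1])

index_names = {name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}
```

Passing the route ID as a bound parameter (the ? placeholder) rather than pasting it into the SQL string also avoids quoting problems and SQL injection. The exact wording of the query plan differs between SQLite versions, so treat it as a diagnostic rather than something to rely on.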




Answer 2:


I came across this post in my Google searches and figured I would update it with a better answer in case anyone else stumbles upon it. The answer that Simon gave is 100% correct; however, the query he provided is quite slow for large GTFS feeds. Here is a query that does the same thing but performs significantly faster.

Just to give you some anecdotal evidence, for a GTFS feed of about 50 MB, Simon's query took anywhere from 10 to 25 seconds to complete. The statement below consistently takes under 0.2 seconds.

SELECT T3.stop_id, T3.stop_name 
FROM trips AS T1
JOIN
stop_times AS T2
ON T1.trip_id=T2.trip_id AND route_id = <routeid>
JOIN stops AS T3
ON T2.stop_id=T3.stop_id
GROUP BY T3.stop_id, T3.stop_name

UPDATE:

I realized I didn't mention this before, but of course you will want to have indexes on the columns used to join the tables.




Answer 3:


If you GROUP BY shape_id when selecting from trips you can make the query even faster.

Using @btse's query to get the unique stops for two routes takes 1.147s.

My equivalent query takes 0.4s.

SELECT unique_stops.route_id, unique_stops.stop_id, stop_name, stop_desc, stop_lat, stop_lon
FROM
  stops,
  (SELECT stop_id, route_id
   FROM
     stop_times,
     (SELECT trip_id, route_id
      FROM trips
      WHERE route_id IN (801, 803)
      GROUP BY shape_id
     ) AS unique_trips
   WHERE stop_times.trip_id = unique_trips.trip_id
   GROUP BY stop_id) AS unique_stops
WHERE stops.stop_id = unique_stops.stop_id



Answer 4:


If you're working in R you could do this to find routes that stop at your target destination X:

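library(dplyr)

# Routes with at least one trip calling at a stop whose name contains 'X'.
routesX <- routes %>%
  left_join(trips %>% select(trip_id, route_id, shape_id), by = 'route_id') %>%
  left_join(stop_times %>% select(trip_id, stop_id), by = 'trip_id') %>%
  # Matches stop_times.stop_id against stops.stop_code; use by = 'stop_id' if your feed keys on stop_id.
  semi_join(stops %>% filter(grepl('X', stop_name, ignore.case = TRUE)), by = c('stop_id' = 'stop_code')) %>%
  select(all_of(names(routes)), shape_id) %>%
  distinct()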



Answer 5:


If you use OneBusAway, there is a quick way to do this without touching the GTFS files directly.

Let's say you want to know the bus stops for bus route "M1" in Manhattan, NYC:

http://bustime.mta.info/api/where/stops-for-route/MTA%20NYCT_M1.json?key=yourapikey&includePolylines=false&version=2

This returns a JSON feed from which you can extract the bus stops for both directions on route M1.



Source: https://stackoverflow.com/questions/13407468/how-can-i-list-all-the-stops-associated-with-a-route-using-gtfs
