I’ve posted many times about the considerable value of open standards for real-time transit data. While it’s always best if a transit authority offers its own feeds using open standards like GTFS-realtime or SIRI, converting available real-time data from a proprietary API into an open format still gets the job done. After a few months of kicking the problem around, I’ve finally written a tool to produce GTFS-realtime StopTimeUpdate, VehiclePosition, and Alert messages for Metrobus, as well as GTFS-realtime Alert messages for Metrorail.
The tool, wmata-gtfsrealtime, isn’t nearly as straightforward as it might be, because while the WMATA API appears to provide all of the information you’d need to create a GTFS-realtime feed, you’ll quickly discover that the route, stop, and trip identifiers returned by the API bear no relation to those used in WMATA’s GTFS feed.
One of the basic tenets of GTFS-realtime is that it is designed to directly integrate with GTFS, and for that reason identifiers must be shared across GTFS and GTFS-realtime feeds.
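To make the shared-identifier requirement concrete, here is a minimal sketch of the kind of check a consumer effectively performs when joining a real-time feed to a GTFS feed. Everything in it is hypothetical: the `trips.txt` excerpt, the route and trip IDs, and the helper names are made up for illustration and are not taken from WMATA's data or from wmata-gtfsrealtime.

```python
import csv
import io

# Hypothetical excerpt of a static GTFS trips.txt; the IDs are made up.
TRIPS_TXT = """route_id,service_id,trip_id
90,WEEKDAY,1001
90,WEEKDAY,1002
X2,WEEKDAY,2001
"""


def load_gtfs_ids(trips_txt):
    """Return the sets of route_id and trip_id values found in trips.txt."""
    rows = list(csv.DictReader(io.StringIO(trips_txt)))
    return {r["route_id"] for r in rows}, {r["trip_id"] for r in rows}


def unmatched_updates(updates, trips_txt):
    """Return the real-time records whose IDs are absent from the GTFS feed."""
    route_ids, trip_ids = load_gtfs_ids(trips_txt)
    return [
        u for u in updates
        if u["route_id"] not in route_ids or u["trip_id"] not in trip_ids
    ]


# Hypothetical real-time records. The second one uses API-style IDs that
# do not appear in the GTFS feed, so a consumer could not join it.
updates = [
    {"route_id": "90", "trip_id": "1001"},
    {"route_id": "90v1", "trip_id": "6783517"},
]
bad = unmatched_updates(updates, TRIPS_TXT)
```

Any record that ends up in `bad` is one a GTFS-realtime consumer would have no way to attach to a scheduled trip, which is why the identifiers have to be mapped before a feed can be produced.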
In WMATA’s case, this means that it is necessary to first map routes in the API to their counterparts in the GTFS feed, and then, for each vehicle, map its trip to the corresponding trip in the GTFS feed. This is done by querying a OneBusAway TransitDataService (via Hessian remoting) for active trips for the mapped route, then finding the active trip which most closely matches the vehicle’s trip.
Matching is done by constructing a metric space in which each stoptime, whether from the API data or the GTFS feed, is a point (x, y, t), so that the “distance” between a stoptime in the API data and its counterpart in the GTFS feed is a distance in both space and time. The spatial distances fed into the metric are actually halved, in order to bias the scores towards matching based on time, while allowing some leeway for stops which are wrongly located in either the GTFS or real-time data.
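The idea can be sketched roughly as follows. This is an illustration, not the code in wmata-gtfsrealtime: it assumes a plain Euclidean metric over (x, y, t) with x and y in metres and t in seconds, assumes the halving applies to the spatial components, and scores a candidate trip by the mean distance from each API stoptime to its nearest GTFS stoptime. The function names, the scoring rule, and the sample coordinates are all hypothetical.

```python
import math

# Assumption: spatial offsets are halved so that time dominates the score.
SPATIAL_WEIGHT = 0.5


def stoptime_distance(a, b):
    """Distance between two stoptimes, each an (x_m, y_m, t_s) tuple."""
    dx = (a[0] - b[0]) * SPATIAL_WEIGHT
    dy = (a[1] - b[1]) * SPATIAL_WEIGHT
    dt = a[2] - b[2]
    return math.sqrt(dx * dx + dy * dy + dt * dt)


def trip_score(api_stoptimes, gtfs_stoptimes):
    """Mean distance from each API stoptime to its nearest GTFS stoptime."""
    total = sum(
        min(stoptime_distance(a, g) for g in gtfs_stoptimes)
        for a in api_stoptimes
    )
    return total / len(api_stoptimes)


def best_match(api_stoptimes, candidates):
    """Return the ID of the candidate GTFS trip with the lowest score."""
    return min(
        candidates,
        key=lambda trip_id: trip_score(api_stoptimes, candidates[trip_id]),
    )


# Hypothetical data: one vehicle's trip from the API and two active GTFS
# trips on the mapped route, the second running about 15 minutes later.
api_trip = [(0, 0, 0), (400, 0, 120), (800, 0, 240)]
candidates = {
    "gtfs_trip_A": [(0, 0, 10), (410, 5, 125), (790, 0, 250)],
    "gtfs_trip_B": [(0, 0, 900), (400, 0, 1020), (800, 0, 1140)],
}
matched = best_match(api_trip, candidates)
```

With these made-up numbers, both candidates serve the same stops, so the time component is what separates them; the down-weighted spatial term means a stop that is misplaced by a few metres in one data set barely moves the score.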
The resulting algorithm will map all but one or two of the 900-odd vehicles on the road during peak hours. Spot-checking arrivals for stops in OneBusAway against arrivals for the same stop in NextBus shows relatively good agreement; of course, considering that NextBus is a “black box”, unexplained variances in NextBus arrival times are to be expected.
You may wonder why we can’t provide better data for Metrorail; the answer is simple: the API is deficient. As I’ve previously discussed, the rail API only provides the same data you get from looking at the PIDS (the passenger information displays) in stations. Unfortunately, that’s not what we need to produce a GTFS-realtime feed. At a minimum, we would need to be able to get a list of all revenue trains in the system, including their current schedule deviation, and a trip ID which would either match a trip ID in the GTFS feed, or be something we could easily map to a trip ID in the GTFS feed.
This isn’t how it’s supposed to be. Look at this diagram, then, for a reality check, look at this one (both are from a presentation by Jamey Harvey, WMATA’s former Enterprise Architect). WMATA’s data management practices are, to say the least, sorely lacking. For most data, there’s no single source of truth. The problem is particularly acute for bus stops; one database might have the stop in one location and identified with one ID, while another database might have the same physical stop identified with a different number, and coordinates that place it in an entirely different location.
Better data management practices would make it easier for developers to build innovative applications which increase the usability of transit services and, ultimately, improve mobility for the entire region. Isn’t that what it’s supposed to be about, at the end of the day?



Bike sharing and bike rental should cooperate, not compete, on tourism
While in Boston recently, I came across the following advertisement for Urban AdvenTours, a bike shop in Boston, stuck to a Hubway station.
Aside from the unseemly nature of having public property (the Hubway station) subverted as advertising for a private company, I take issue with the general tone of the advertisement: that Hubway is “not cost effective for hassle free exploration”, that bike rental and Hubway are “recreation vs. transportation”, etc.
At the core of this is a simple question: can bike sharing services, like Hubway, be used effectively by tourists? Or are they indeed better served by full-service bike rental firms? I would argue that there isn’t a precise, clear-cut answer, but it’s not as one-sided as Urban AdvenTours makes it out to be.