So you want to do a public transit data project, part 1: Working with common sources

A short guide to common types of transit data and where to find them.

Getting started with transit data

Public transit offers fertile ground for tech and data projects. There is abundant publicly available data, including geospatial, temporal, financial, and other common data types, often with at least some historical coverage. Plus, you know, people love trains.

Public transit data can also be tricky. The data often comes in unfamiliar formats and from unfamiliar sources, there is industry jargon, and there’s a lot of variation from agency to agency.

With that in mind, I am planning a series of posts to cover some key concepts and common data types related to public transit. I have worked professionally in the public transit data space and I’ve also done volunteer civic technology work with transit data. I’ve seen how often transit data is used by people working on journalism, volunteer, or side projects, and I hope that these posts can help people who are excited about transit data but aren’t totally sure where to start.

These posts are most likely to be useful to people who have some familiarity with data analysis but who are not domain experts in public transit. I come from a data analytics background, not from an urban planning or transit operations background, so apologies to the planners of the world if my framing is sometimes imprecise.

So, with all that said, let’s get started with a quick overview of some of the most common types or categories of transit data.

Types of data

The following types of data will often be available across agencies.

  • Ridership data: Ridership data includes counts of how many trips were taken on transit during a given time period. Ridership data can help you answer questions about the usage level of different transit services in your community. For example, you would use ridership data to determine which bus route is most popular for a given agency. Ridership data will be covered in more detail in Part 2.
  • Schedule data: Many transit agencies publish their schedule in a dedicated data format called GTFS (General Transit Feed Specification). GTFS data tells you what the agency’s routes are, where they stop, when and how often they run, etc. Schedule data can help you answer questions about levels of scheduled transit service: for example, which bus is the most frequent in your community, or how far you can travel from different neighborhoods based on their available bus or train service. Schedule data also includes the geographic information you would need to make a map of transit lines or stops. Schedule data, also called static GTFS, will be covered in more detail in Part 3.
  • Realtime data: In the last 5-10 years, it has become more common for transit agencies to publish realtime data about their services, including the realtime vehicle tracking and arrival predictions that you see in transit apps. For example, realtime transit data is what you would need to show when the next bus is going to arrive at a given bus stop. Realtime data will often be published in the GTFS Realtime (GTFS-RT) data format.
  • Other operational data: Operational data could include information like how many vehicles an agency operates, the agency’s budget, information about agency staffing, the agency’s safety record, etc. Metrics like on-time performance might also be published as part of an agency’s operational data reporting.
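
To make the schedule data item above a bit more concrete, here is a minimal sketch of the kind of question static GTFS can answer: which route has the most scheduled trips. A GTFS feed is a zip of CSV-style text files, and this sketch relies only on the routes.txt and trips.txt files and their standard route_id, route_short_name, and trip_id columns. The sample rows and the trips_per_route helper are made up for illustration; a real feed would be read from the files an agency publishes. Note that this counts trips across every service pattern in the feed, so it is a rough proxy for frequency rather than a per-day figure.

```python
import csv
import io
from collections import Counter

# Tiny made-up samples shaped like two GTFS files (illustrative data only,
# not from any real agency).
ROUTES_TXT = """route_id,route_short_name,route_long_name,route_type
R1,1,Main Street,3
R2,2,Lakeshore,3
"""

TRIPS_TXT = """route_id,service_id,trip_id
R1,WKDY,t1
R1,WKDY,t2
R1,WKDY,t3
R2,WKDY,t4
"""


def trips_per_route(routes_file, trips_file):
    """Return (route_short_name, trip_count) pairs, most trips first.

    Hypothetical helper: counts every row in trips.txt per route_id,
    regardless of which days the trip's service_id runs on.
    """
    # Map each route_id to its rider-facing short name.
    names = {
        row["route_id"]: row["route_short_name"]
        for row in csv.DictReader(routes_file)
    }
    # Count scheduled trips per route_id.
    counts = Counter(row["route_id"] for row in csv.DictReader(trips_file))
    # Fall back to the raw route_id if a route is missing from routes.txt.
    return [(names.get(route_id, route_id), n) for route_id, n in counts.most_common()]


result = trips_per_route(io.StringIO(ROUTES_TXT), io.StringIO(TRIPS_TXT))
print(result)
```

With a real feed, you would pass open file handles for routes.txt and trips.txt instead of the in-memory samples. For a true per-day frequency you would also need to filter trips by service_id using the feed's calendar files, which this sketch leaves out.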

Common sources

The common data types listed above are often available through standardized or similar locations across transit agencies. When you’re looking for this data, you might find it in one of the following places:

Example projects

It can also be helpful to start by looking at existing related projects and, if available, their code and data sources. Some Chicago-focused transit data projects that may provide some inspiration include:

These lists (especially the example projects) are far from exhaustive, but hopefully they provide some helpful pointers for anyone who is trying to figure out what data they need and where they might be able to find it. Happy analyzing!

Post photo by Jeremiah Higgins on Unsplash

This post is licensed under CC BY 4.0 by the author.