Many stations have 2-3 points or more on map. Fix this #153

Open
mbjackson-capp opened this issue May 21, 2024 · 3 comments

@mbjackson-capp
Contributor

For many subway, commuter rail, and streetcar lines, what looks like a single station point is actually 2-3 points on the same spot. This results in a massive overcount of stations. It will also distort things like ridership and station/route relationships once tooltips are further built out.

See O'Hare Blue Line CTA stop -- at max zoom, it looks like one point:
Screen Shot 2024-05-21 at 11 14 41 AM

But when you zoom out the clusters, it says there are 3:
Screen Shot 2024-05-21 at 11 14 57 AM

And clicking the cluster shows all three:
Screen Shot 2024-05-21 at 11 15 32 AM

I strongly suspect that at least two of these are GTFS treating inbound-direction and outbound-direction as separate stations. Not sure where the third one comes in.

(This shouldn't affect bus stations. Bus stations accurately appear as separate objects on different sides of the same street block.)

A very hacky fix might be a SELECT DISTINCT (in its Django ORM form) when querying the database, so that no two returned stops have exactly matching location coordinates. It may also make sense to set a fuzzy tolerance, so that stops with the same name and transit mode whose coordinates fall within ~10 meters of each other are treated as duplicates.
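
A minimal sketch of the fuzzy-tolerance idea, in plain Python rather than the Django ORM so it can be tried outside the project. The dictionary keys (`name`, `mode`, `lat`, `lon`), the function names, and the sample coordinates are all assumptions for illustration, not the project's actual schema or data. It keeps the first row it sees and drops any later row with the same name and mode within the tolerance:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def dedupe_stations(stations, tolerance_m=10.0):
    """Keep the first row per (name, mode) within tolerance_m; drop the rest.

    `stations` is a list of dicts with hypothetical keys
    'name', 'mode', 'lat', 'lon'.
    """
    kept = []
    for s in stations:
        is_dup = any(
            k["name"] == s["name"]
            and k["mode"] == s["mode"]
            and haversine_m(k["lat"], k["lon"], s["lat"], s["lon"]) <= tolerance_m
            for k in kept
        )
        if not is_dup:
            kept.append(s)
    return kept


# Placeholder data: three stacked rail rows, two bus stops across a street.
sample = [
    {"name": "O'Hare", "mode": "rail", "lat": 41.98, "lon": -87.90},
    {"name": "O'Hare", "mode": "rail", "lat": 41.98, "lon": -87.90},
    {"name": "O'Hare", "mode": "rail", "lat": 41.98, "lon": -87.90},
    {"name": "Main St", "mode": "bus", "lat": 41.0000, "lon": -87.0},
    {"name": "Main St", "mode": "bus", "lat": 41.0003, "lon": -87.0},
]
deduped = dedupe_stations(sample)
```

The pairwise comparison is quadratic in the number of stations, which is probably fine for one city's rail network but would want a spatial index for anything larger. Because it matches on exact name, it would not merge rows whose names differ slightly.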

There are more backend questions about how to ensure that all the important data for each stop is aggregated into the one point that remains.

It would also help, as part of this, to refactor the views.py and map.js code so that the actual station GeoJSON objects are passed into the map JavaScript (the way the routes are), so we can add more stuff to tooltips (like station_id) and test which attributes are duplicated vs. not across the redundant points. I may make a separate issue for that.
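
As a rough sketch of what passing station objects to the map could look like, the helper below builds a GeoJSON FeatureCollection from station rows. The function name and the dictionary keys (`station_id`, `name`, `mode`, `lat`, `lon`) are assumptions, not the project's actual model fields, and how the result gets handed to map.js is left out. Note that GeoJSON orders coordinates as longitude, then latitude:

```python
import json


def stations_to_geojson(stations):
    """Build a GeoJSON FeatureCollection from station dicts (hypothetical keys)."""
    features = []
    for s in stations:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            # Anything placed in properties is available to tooltips.
            "properties": {
                "station_id": s["station_id"],
                "name": s["name"],
                "mode": s["mode"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder row; the ID and coordinates are illustrative only.
collection = stations_to_geojson([
    {"station_id": "40890", "name": "O'Hare", "mode": "rail",
     "lat": 41.98, "lon": -87.90},
])
payload = json.dumps(collection)  # string that a view could embed in a template
```

With station_id carried in `properties`, the redundant points could be inspected directly in the tooltip to see which attributes differ between them.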

@mbjackson-capp mbjackson-capp added the bug Something isn't working label May 21, 2024
@mbjackson-capp mbjackson-capp self-assigned this May 21, 2024
@mbjackson-capp
Contributor Author

@JPMartinezClaeys in standup: it's okay for there to be multiple rows for same station on backend, just make sure only "the first one" displays on map

@mbjackson-capp mbjackson-capp changed the title Many stations have 2-3 points or more on map Many stations have 2-3 points or more on map. Fix this May 21, 2024
@mbjackson-capp
Contributor Author

mbjackson-capp commented May 21, 2024

Exact issue differs by service.
For Portland MAX and the Tigard TC WES Station, we see three station markers in a row, with separate station_ids and distinct locations -- e.g.:

Screen Shot 2024-05-21 at 2 34 48 PM

Of these, some, BUT NOT ALL, of the real-life stations have one or two rows with five-digit station_ids, and a third with station_id of form station-<2 or 3 digits>. (There are 97 total MAX stations and this is true of 68 of them. Others, such as "Clackamas Town Center TC MAX Station", have just one row with a numeric station_id.)

Screen Shot 2024-05-21 at 2 54 48 PM

For NYC, the station markers are exactly overlapping, with an overarching station_id number and then two separate entries of the form <number>N and <number>S (perhaps for "northbound" and "southbound", though the nomenclature is consistent even for lines that mainly run east/west).

Screen Shot 2024-05-21 at 2 37 57 PM

For CTA (but not Metra), the station markers are also exactly overlapping, with two station_ids starting with 3 (likely representing inbound and outbound travel) and one starting with 4 (likely representing the "overarching" station):
Screen Shot 2024-05-21 at 2 45 24 PM

Interestingly, the ones starting with 4 have slightly mismatching names sometimes -- for example, "Addison - Red" might be rendered as "Addison (Red)":

Screen Shot 2024-05-21 at 2 44 27 PM

There are 141 rows for CTA stations whose stop_id starts with 4. The exact number of CTA stations in reality is 145; we would want to double-check that the four missing ones are missing for good reason (e.g. closed for repairs).

@mbjackson-capp
Contributor Author

mbjackson-capp commented May 21, 2024

Potential "hacky" cleanup strategy for each city:

  • Portland: if a station has a row with an ID of the form station-<digits>, use that; otherwise, if there are multiple rows, choose one at random; otherwise use the only one
  • New York: select out only stations that don't have an N or S at the end of their station_id (again, would require checks that no stations are missing)
  • Chicago: select out only stations whose station_id starts with 4

Add filtering in views.py where the stations variable is defined, using a conditional or dictionary to pick the cleanup rule based on which city context you're in.
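
The per-city rules above could be sketched as a dictionary of filter functions. This is plain Python over lists of dicts, not the actual views.py query; the city keys, function names, the `station_id` and `name` keys, and the sample IDs are all assumptions. Two deliberate deviations: Portland rows are grouped by name (an assumption about how the duplicates can be matched), and where the list says "choose one at random" the sketch keeps the first row so the output is deterministic:

```python
def keep_portland(rows):
    """Per name, prefer a 'station-' ID; otherwise keep the first row seen."""
    chosen = {}
    for row in rows:
        current = chosen.get(row["name"])
        if current is None:
            chosen[row["name"]] = row
        elif (str(row["station_id"]).startswith("station-")
              and not str(current["station_id"]).startswith("station-")):
            chosen[row["name"]] = row
    return list(chosen.values())


def keep_new_york(rows):
    """Drop directional rows whose ID ends in N or S."""
    return [r for r in rows if not str(r["station_id"]).endswith(("N", "S"))]


def keep_chicago(rows):
    """Keep only rows whose ID starts with 4."""
    return [r for r in rows if str(r["station_id"]).startswith("4")]


# Hypothetical city keys; the project may identify cities differently.
CLEANUP_BY_CITY = {
    "portland": keep_portland,
    "nyc": keep_new_york,
    "chicago": keep_chicago,
}


def clean_stations(city, rows):
    """Apply the city's cleanup rule, or return rows unchanged if none is set."""
    cleanup = CLEANUP_BY_CITY.get(city)
    return cleanup(rows) if cleanup else list(rows)


# Placeholder rows shaped like the patterns described in this thread.
chicago = clean_stations("chicago", [
    {"station_id": "30171", "name": "O'Hare"},
    {"station_id": "30172", "name": "O'Hare"},
    {"station_id": "40890", "name": "O'Hare"},
])
```

Adding a city then means adding one function and one dictionary entry, though each new feed would still need manual inspection to decide what the rule should be.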

This would not scale well if new cities are added, though the process of adding new cities would likely require inspecting GTFS feeds for issues like this in some manner no matter what.

Aspirational: figure out how to tie each station to all routes that go through it and display those on tooltips
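
One possible way to approach this, sketched under assumptions: in the GTFS Schedule format, stop_times.txt links each trip_id to the stop_ids it serves, trips.txt links each trip_id to a route_id, and stops.txt has an optional parent_station field that points a platform row at its station row. Whether the feeds used here actually populate parent_station has not been checked, and the function name and sample rows below are hypothetical. The sketch rolls platform-level stops up to their parent and collects the routes serving each:

```python
from collections import defaultdict


def routes_by_station(stops, stop_times, trips):
    """Map each station ID to the sorted route IDs that serve it.

    Inputs are lists of dicts keyed by GTFS column names
    (stops.txt, stop_times.txt, trips.txt).
    """
    # A platform with a parent_station is counted under that parent;
    # a row with no parent is counted under its own stop_id.
    parent_of = {
        s["stop_id"]: (s.get("parent_station") or s["stop_id"]) for s in stops
    }
    route_of_trip = {t["trip_id"]: t["route_id"] for t in trips}

    routes = defaultdict(set)
    for st in stop_times:
        station = parent_of.get(st["stop_id"], st["stop_id"])
        route = route_of_trip.get(st["trip_id"])
        if route is not None:
            routes[station].add(route)
    return {station: sorted(found) for station, found in routes.items()}


# Placeholder rows: one station with two directional platforms.
stops = [
    {"stop_id": "40890", "parent_station": ""},
    {"stop_id": "30171", "parent_station": "40890"},
    {"stop_id": "30172", "parent_station": "40890"},
]
stop_times = [
    {"trip_id": "t1", "stop_id": "30171"},
    {"trip_id": "t2", "stop_id": "30172"},
]
trips = [
    {"trip_id": "t1", "route_id": "Blue"},
    {"trip_id": "t2", "route_id": "Blue"},
]
station_routes = routes_by_station(stops, stop_times, trips)
```

If parent_station is populated, the same parent/child relationship might also serve as a feed-independent way to pick the one row to display per station, instead of the per-city ID rules.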
