Merge changes from TfWM #58
base: master
Conversation
… columns not being explicitly set - and some fields assumed to be present in route table (route_desc) are not present in BODS data
…quencies' and 'feed_info' when reading/writing BODS data. Removed assumptions about table names; all additional tables encountered in the zip are read and written. Also a performance improvement when writing.
…ay anything else that it wasn't expecting. OK for the GTFS tables generated by this tool from CIF files, but caused data loss when using BODS data. Modified to keep all encountered columns and tables; they pass through if there isn't any specific code for the table in question.
renamed ZZ to reflect that it represents generic obfuscated freight, not a specific operator
…lendar_date exists with no calendar (rare but valid data configuration)
…g about merge() failing
…GTFS into oo-tfwm-performance
merge up to latest leeds master
1) Make load and merge agnostic about expected tables and fields, so that optional tables and fields are at least passed through rather than discarded (allows GTFS generated by this tool and by BODS to be merged).
2) Tighten up data type expectations for fields in BODS data.
3) Fix bugs when calendar dates have no calendar (this is a legitimate but rare arrangement of data).
4) Performance improvement while formatting files before writing to the file system (3-minute improvement when writing a 400 MB zipped GTFS file, i.e. the full GB BODS extract).
There is one line in merge that needs resolution: a complex line that I couldn't understand, so I commented it out. calendar_dates <- calendar_dates[sapply(calendar_dates, function(x){ifelse(is.null(nrow(x)),0,nrow(x))}) > 0] No idea what it's supposed to do, and there's no explanatory comment.
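For reference, a minimal standalone reproduction (the list contents below are made up purely to show the behaviour, not taken from the package's data) suggests the mystery line just filters a list of tables down to the elements that are data frames with at least one row; `nrow()` returns `NULL` for non-tables, which the `ifelse` maps to 0:

```r
# Standalone sketch of the commented-out filter with invented inputs.
calendar_dates <- list(
  a = data.frame(service_id = "S1", date = "20240101"),          # 1 row
  b = data.frame(service_id = character(), date = character()),  # 0 rows
  c = NULL                                                       # not a table
)

keep <- sapply(calendar_dates, function(x) {
  ifelse(is.null(nrow(x)), 0, nrow(x))  # nrow(NULL) is NULL, so non-tables score 0
}) > 0

calendar_dates <- calendar_dates[keep]
names(calendar_dates)  # only "a" survives
```

If that reading is right, the line was dropping empty or missing calendar_dates tables before merging.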
…o the caller being a different shaped structure being passed back
…tries in LO table
change to use data.table (likely a breaking change); performance improvements; route_long_name "Train from..." now changes based on mode
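As a rough illustration of the mode-based naming (a base-R sketch; the branch itself uses data.table, and the prefix mapping and "origin" placeholder below are assumed, not taken from the code):

```r
# Hypothetical mapping from GTFS route_type to a long-name prefix, so buses
# are no longer labelled "Train from ...". Only the route_type codes are
# real GTFS values; everything else here is illustrative.
routes <- data.frame(
  route_id   = c("R1", "R2", "R3"),
  route_type = c(2L, 3L, 1L)  # GTFS: 2 = rail, 3 = bus, 1 = subway/metro
)

prefix <- c(`1` = "Metro from", `2` = "Train from", `3` = "Bus from")
routes$route_long_name <- paste(prefix[as.character(routes$route_type)], "origin")

routes$route_long_name  # "Train from origin" "Bus from origin" "Metro from origin"
```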
…wrong, overlays not being applied correctly
…bles, change default options to highest performing write combination
…as also been extensively profiled and reasonable level of optimisation applied
…sts for revised overlay functionality
…s from too much scrolling
…of date+time (this means that if GitHub can't be reached, it should only nag once per day to update instead of on every single package load) 2) add a timeout to the update check on package load and make it short (but keep the default long when the check is explicitly called)
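A minimal sketch of the once-per-day idea (the function and file names here are hypothetical, not the package's actual API): record the date of the last check in a stamp file and skip the network call if a check already happened today.

```r
# Hypothetical once-per-day gate for the update check; the real package code
# may differ. The network request itself would separately get a short timeout
# on package load, per point 2.
should_check_for_update <- function(stamp_file) {
  last <- if (file.exists(stamp_file)) as.Date(readLines(stamp_file, n = 1)) else NA
  if (!is.na(last) && last >= Sys.Date()) {
    return(FALSE)  # already checked (and possibly nagged) today; stay quiet
  }
  writeLines(as.character(Sys.Date()), stamp_file)  # remember this check
  TRUE
}

stamp <- file.path(tempdir(), "uk2gtfs_last_update_check")
should_check_for_update(stamp)  # TRUE on the first call of the day
should_check_for_update(stamp)  # FALSE immediately afterwards
```

The same gate works for "at most once a month" by comparing against `Sys.Date() - 30` instead of today's date.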
(atoc data doesn't seem to have many non-public moves in it to start with)
…ily vectorise process_activity
Wow, that's quite the PR! This may take some time to check.
Yeah, probably the easiest way is to review the unit tests first, then review the completely new file atoc_overlay.R, then pull down both branches and diff across them rather than using the line-by-line view that the online diff tool tends to give, which is a bit myopic for changes on this scale. The CI build seems to be falling over on something related to the update_data stuff, and then doesn't seem to get any further into the unit tests in the GitHub CI process; they work OK in RStudio:
Warning in file(con, "w") :
Running specific tests for package ‘UK2GTFS’
…e loaded into OTP
… more permissive licence than ATCO data.
… they are NA instead
…t completely follow the schema. Also add 'metro' to the long name for Underground & Tyne & Wear Metro services
…e name instead of NA
…ass time 2) when looking at non-public services include passes marked as 'no pickup/dropoff available' from a GTFS perspective
Resolving conflicts with @oweno-tfwm's PR in a new branch
Large number of changes in this fork - likely to result in some instability until they are bedded in.
Accompanied by a reasonable number of unit tests for a range of overlay situations
Now at a minimum maintains all input tables and fields, even if it doesn’t do any special processing on them
Have also changed the default options on this fn to the highest performing combination
This means you can do things like using it to convert CIF to CSV like this…
# Read in each file
public_only <- FALSE  # define before use; TRUE would import only public services

mca <- importMCA(
  file = "../RAIL-CIF/RAIL-CIF.gz",
  silent = FALSE,
  working_timetable = FALSE,
  public_only = public_only
)

writeGtfs(mca, filename = "RAIL-CIF-CSV")
(I’ve not actually yet tried pumping these into OTP to see if it moans about the extra fields)
So have put in a fix to stop this getting activated when creating a child process.
Still seems to be absolutely battering the API to get fresh data every time I build; I think the caching may need a bit more work to look at dates so that it only ‘phones home’ at most once a month?
performance
There is a high chance that this is a breaking change in some code paths.
– While I’ve got good confidence in the main NR code path, because that’s the one I’m testing heavily
– and also confidence in paths I’ve written unit tests for (although they are mainly testing the success path and don’t probe edge cases)
– I’m relying on the unit tests for all the other code paths, so it’s quite likely something somewhere is broken by this change.
So that’s a lot of change in one go; there is likely to be a bit of instability on code paths with low unit test coverage.